
File copy to FAT FS on NVDIMM hits BUG_ON at fs/buffer.c:3305!

Message ID 87d18neemb.fsf@devron (mailing list archive)
State New, archived

Commit Message

OGAWA Hirofumi July 26, 2017, 9:23 a.m. UTC
"Kani, Toshimitsu" <toshi.kani@hpe.com> writes:

>  kernel BUG at fs/buffer.c:3305!
>  invalid opcode: 0000 [#1] SMP
>   :
>  Workqueue: writeback wb_workfn (flush-259:0)
>  task: ffff8d02595b8000 task.stack: ffffa22242400000
>  RIP: 0010:try_to_free_buffers+0xd2/0xe0
>  RSP: 0018:ffffa22242403830 EFLAGS: 00010246
>  RAX: 00afffc000001028 RBX: 0000000000000008 RCX: ffff8d012dcf19c0
>  RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffc468e3b52b80
>  RBP: ffffa22242403858 R08: 0000000000000000 R09: 000000000002067c
>  R10: ffff8d027ffe6000 R11: 0000000000000000 R12: 0000000000000000
>  R13: ffff8d022fccdbe0 R14: ffffc468e3b52b80 R15: ffffa22242403ad0
>  FS:  0000000000000000(0000) GS:ffff8d027fd40000(0000)

The locking in this path seems to be broken. Someone familiar with the
bdev_write_page() path will need to make the real fix, but the following
patch should be enough to explain what is wrong.

In short, clean_buffers() must be called while the page is still locked,
i.e. before unlock_page().

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
---

 fs/block_dev.c |    2 --
 fs/mpage.c     |    1 +
 mm/page_io.c   |    1 +
 3 files changed, 2 insertions(+), 2 deletions(-)
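
To make the ordering problem concrete, here is a small user-space sketch of
the two orderings. It is only a model, not kernel code: struct model_page and
every model_* function are hypothetical stand-ins, and it assumes the BUG_ON
being hit is a "page must be locked" check reached through clean_buffers().

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel type. */
struct model_page {
	bool locked;      /* models PG_locked */
	bool has_buffers; /* models buffer heads attached to the page */
};

/*
 * Models clean_buffers() -> try_to_free_buffers(), assumed to be legal
 * only under the page lock. Returns false where the kernel would BUG.
 */
static bool model_clean_buffers(struct model_page *page)
{
	if (!page->locked)
		return false;
	page->has_buffers = false;
	return true;
}

static void model_unlock_page(struct model_page *page)
{
	page->locked = false;
}

/*
 * Models a successful synchronous ->rw_page() write (returns 0).
 * unlock_in_callee == true is the old behaviour, where the page is
 * unlocked on success before control returns to the caller.
 */
static int model_bdev_write_page(struct model_page *page, bool unlock_in_callee)
{
	if (unlock_in_callee)
		model_unlock_page(page);
	return 0;
}

/*
 * Models the writepage caller. Returns true if no BUG would fire and
 * the page ends up unlocked.
 */
bool model_writepage(bool unlock_in_callee)
{
	struct model_page page = { .locked = true, .has_buffers = true };
	bool ok = true;

	if (!model_bdev_write_page(&page, unlock_in_callee)) {
		ok = model_clean_buffers(&page);
		/* New ordering: the caller unlocks after cleaning buffers. */
		if (!unlock_in_callee)
			model_unlock_page(&page);
	}
	return ok && !page.locked;
}
```

In this model, model_writepage(true) returns false because the buffers are
cleaned after the page was already unlocked, while model_writepage(false)
returns true because the unlock happens last.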

Comments

Ross Zwisler July 26, 2017, 2:23 p.m. UTC | #1
On Wed, Jul 26, 2017 at 06:23:08PM +0900, OGAWA Hirofumi wrote:
> "Kani, Toshimitsu" <toshi.kani@hpe.com> writes:
> 
> >  kernel BUG at fs/buffer.c:3305!
> >  invalid opcode: 0000 [#1] SMP
> >   :
> >  Workqueue: writeback wb_workfn (flush-259:0)
> >  task: ffff8d02595b8000 task.stack: ffffa22242400000
> >  RIP: 0010:try_to_free_buffers+0xd2/0xe0
> >  RSP: 0018:ffffa22242403830 EFLAGS: 00010246
> >  RAX: 00afffc000001028 RBX: 0000000000000008 RCX: ffff8d012dcf19c0
> >  RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffc468e3b52b80
> >  RBP: ffffa22242403858 R08: 0000000000000000 R09: 000000000002067c
> >  R10: ffff8d027ffe6000 R11: 0000000000000000 R12: 0000000000000000
> >  R13: ffff8d022fccdbe0 R14: ffffc468e3b52b80 R15: ffffa22242403ad0
> >  FS:  0000000000000000(0000) GS:ffff8d027fd40000(0000)
> 
> The locking in this path seems to be broken. Someone familiar with the
> bdev_write_page() path will need to make the real fix, but the following
> patch should be enough to explain what is wrong.

Is there someone in particular who is familiar with bdev_write_page() that is
working on this fix, or does someone need to pick this up?

Dan Williams July 26, 2017, 4:08 p.m. UTC | #2
On Wed, Jul 26, 2017 at 7:23 AM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> On Wed, Jul 26, 2017 at 06:23:08PM +0900, OGAWA Hirofumi wrote:
>> "Kani, Toshimitsu" <toshi.kani@hpe.com> writes:
>>
>> >  kernel BUG at fs/buffer.c:3305!
>> >  invalid opcode: 0000 [#1] SMP
>> >   :
>> >  Workqueue: writeback wb_workfn (flush-259:0)
>> >  task: ffff8d02595b8000 task.stack: ffffa22242400000
>> >  RIP: 0010:try_to_free_buffers+0xd2/0xe0
>> >  RSP: 0018:ffffa22242403830 EFLAGS: 00010246
>> >  RAX: 00afffc000001028 RBX: 0000000000000008 RCX: ffff8d012dcf19c0
>> >  RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffc468e3b52b80
>> >  RBP: ffffa22242403858 R08: 0000000000000000 R09: 000000000002067c
>> >  R10: ffff8d027ffe6000 R11: 0000000000000000 R12: 0000000000000000
>> >  R13: ffff8d022fccdbe0 R14: ffffc468e3b52b80 R15: ffffa22242403ad0
>> >  FS:  0000000000000000(0000) GS:ffff8d027fd40000(0000)
>>
>> The locking in this path seems to be broken. Someone familiar with the
>> bdev_write_page() path will need to make the real fix, but the following
>> patch should be enough to explain what is wrong.
>
> Is there someone in particular who is familiar with bdev_write_page() that is
> working on this fix, or does someone need to pick this up?

Another question, does ->rw_page() really buy us that much with the
pmem driver? If applications want to enjoy the lowest latency access
they can just use DAX. There's now only 4 drivers that use rw_page
since nvme dropped its usage and I'd be inclined to just rip it out.

Christoph Hellwig July 26, 2017, 5:03 p.m. UTC | #3
On Wed, Jul 26, 2017 at 09:08:00AM -0700, Dan Williams wrote:
> Another question, does ->rw_page() really buy us that much with the
> pmem driver? If applications want to enjoy the lowest latency access
> they can just use DAX. There's now only 4 drivers that use rw_page
> since nvme dropped its usage and I'd be inclined to just rip it out.

nvme never supported rw_page (there was a patch for it, but it
fortunately never got merged).

rw_page is a massive pain in the ass and the method should go away.
For make_request drivers that actually operate synchronously (e.g.
the ramdisk) it's not much of a benefit, and even for normally
asynchronous drivers like nvme the block layer polling interface
is much more suitable.

Patch

diff -puN fs/mpage.c~bdev_write_page-fix fs/mpage.c
--- linux/fs/mpage.c~bdev_write_page-fix	2017-07-26 18:05:53.078204737 +0900
+++ linux-hirofumi/fs/mpage.c	2017-07-26 18:07:03.960043665 +0900
@@ -605,6 +605,7 @@  alloc_new:
 			if (!bdev_write_page(bdev, blocks[0] << (blkbits - 9),
 								page, wbc)) {
 				clean_buffers(page, first_unmapped);
+				unlock_page(page);
 				goto out;
 			}
 		}
diff -puN mm/page_io.c~bdev_write_page-fix mm/page_io.c
--- linux/mm/page_io.c~bdev_write_page-fix	2017-07-26 18:06:16.807150810 +0900
+++ linux-hirofumi/mm/page_io.c	2017-07-26 18:06:23.425135771 +0900
@@ -308,6 +308,7 @@  int __swap_writepage(struct page *page,
 
 	ret = bdev_write_page(sis->bdev, swap_page_sector(page), page, wbc);
 	if (!ret) {
+		unlock_page(page);
 		count_vm_event(PSWPOUT);
 		return 0;
 	}
diff -puN fs/block_dev.c~bdev_write_page-fix fs/block_dev.c
--- linux/fs/block_dev.c~bdev_write_page-fix	2017-07-26 18:08:53.490794861 +0900
+++ linux-hirofumi/fs/block_dev.c	2017-07-26 18:08:58.375783767 +0900
@@ -714,8 +714,6 @@  int bdev_write_page(struct block_device
 	result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, true);
 	if (result)
 		end_page_writeback(page);
-	else
-		unlock_page(page);
 	blk_queue_exit(bdev->bd_queue);
 	return result;
 }