Message ID | 87d18neemb.fsf@devron (mailing list archive) |
---|---|
State | New, archived |
On Wed, Jul 26, 2017 at 06:23:08PM +0900, OGAWA Hirofumi wrote:
> "Kani, Toshimitsu" <toshi.kani@hpe.com> writes:
>
> > kernel BUG at fs/buffer.c:3305!
> > invalid opcode: 0000 [#1] SMP
> > :
> > Workqueue: writeback wb_workfn (flush-259:0)
> > task: ffff8d02595b8000 task.stack: ffffa22242400000
> > RIP: 0010:try_to_free_buffers+0xd2/0xe0
> > RSP: 0018:ffffa22242403830 EFLAGS: 00010246
> > RAX: 00afffc000001028 RBX: 0000000000000008 RCX: ffff8d012dcf19c0
> > RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffc468e3b52b80
> > RBP: ffffa22242403858 R08: 0000000000000000 R09: 000000000002067c
> > R10: ffff8d027ffe6000 R11: 0000000000000000 R12: 0000000000000000
> > R13: ffff8d022fccdbe0 R14: ffffc468e3b52b80 R15: ffffa22242403ad0
> > FS: 0000000000000000(0000) GS:ffff8d027fd40000(0000)
>
> The locking of this path seems to be broken. The guy familiar to
> bdev_write_page() path will made real fix though, The following patch
> should be explaining enough what is wrong.

Is there someone in particular who is familiar with bdev_write_page() that is
working on this fix, or does someone need to pick this up?
On Wed, Jul 26, 2017 at 7:23 AM, Ross Zwisler <ross.zwisler@linux.intel.com> wrote:
> On Wed, Jul 26, 2017 at 06:23:08PM +0900, OGAWA Hirofumi wrote:
>> "Kani, Toshimitsu" <toshi.kani@hpe.com> writes:
>>
>> > kernel BUG at fs/buffer.c:3305!
>> > invalid opcode: 0000 [#1] SMP
>> > :
>> > Workqueue: writeback wb_workfn (flush-259:0)
>> > task: ffff8d02595b8000 task.stack: ffffa22242400000
>> > RIP: 0010:try_to_free_buffers+0xd2/0xe0
>> > RSP: 0018:ffffa22242403830 EFLAGS: 00010246
>> > RAX: 00afffc000001028 RBX: 0000000000000008 RCX: ffff8d012dcf19c0
>> > RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffc468e3b52b80
>> > RBP: ffffa22242403858 R08: 0000000000000000 R09: 000000000002067c
>> > R10: ffff8d027ffe6000 R11: 0000000000000000 R12: 0000000000000000
>> > R13: ffff8d022fccdbe0 R14: ffffc468e3b52b80 R15: ffffa22242403ad0
>> > FS: 0000000000000000(0000) GS:ffff8d027fd40000(0000)
>>
>> The locking of this path seems to be broken. The guy familiar to
>> bdev_write_page() path will made real fix though, The following patch
>> should be explaining enough what is wrong.
>
> Is there someone in particular who is familiar with bdev_write_page() that is
> working on this fix, or does someone need to pick this up?

Another question, does ->rw_page() really buy us that much with the
pmem driver? If applications want to enjoy the lowest latency access
they can just use DAX. There's now only 4 drivers that use rw_page
since nvme dropped its usage and I'd be inclined to just rip it out.
On Wed, Jul 26, 2017 at 09:08:00AM -0700, Dan Williams wrote:
> Another question, does ->rw_page() really buy us that much with the
> pmem driver? If applications want to enjoy the lowest latency access
> they can just use DAX. There's now only 4 drivers that use rw_page
> since nvme dropped its usage and I'd be inclined to just rip it out.

nvme never supported rw_page (there was a patch for it, but it fortunately
never got merged).

rw_page is a massive pain in the ass and the method should go away. For
make_request drivers that actually operate synchronously (e.g. the ramdisk)
it's not much of a benefit, and even for normally asynchronous drivers
like nvme the block layer polling interface is much more suitable.
diff -puN fs/mpage.c~bdev_write_page-fix fs/mpage.c
--- linux/fs/mpage.c~bdev_write_page-fix	2017-07-26 18:05:53.078204737 +0900
+++ linux-hirofumi/fs/mpage.c	2017-07-26 18:07:03.960043665 +0900
@@ -605,6 +605,7 @@ alloc_new:
 			if (!bdev_write_page(bdev, blocks[0] << (blkbits - 9),
 								page, wbc)) {
 				clean_buffers(page, first_unmapped);
+				unlock_page(page);
 				goto out;
 			}
 		}
diff -puN mm/page_io.c~bdev_write_page-fix mm/page_io.c
--- linux/mm/page_io.c~bdev_write_page-fix	2017-07-26 18:06:16.807150810 +0900
+++ linux-hirofumi/mm/page_io.c	2017-07-26 18:06:23.425135771 +0900
@@ -308,6 +308,7 @@ int __swap_writepage(struct page *page,
 
 	ret = bdev_write_page(sis->bdev, swap_page_sector(page), page, wbc);
 	if (!ret) {
+		unlock_page(page);
 		count_vm_event(PSWPOUT);
 		return 0;
 	}
diff -puN fs/block_dev.c~bdev_write_page-fix fs/block_dev.c
--- linux/fs/block_dev.c~bdev_write_page-fix	2017-07-26 18:08:53.490794861 +0900
+++ linux-hirofumi/fs/block_dev.c	2017-07-26 18:08:58.375783767 +0900
@@ -714,8 +714,6 @@ int bdev_write_page(struct block_device
 	result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, true);
 	if (result)
 		end_page_writeback(page);
-	else
-		unlock_page(page);
 	blk_queue_exit(bdev->bd_queue);
 	return result;
 }
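
For readers following the thread, the ordering problem the patch points at can be sketched in userspace C. This is only a toy model, not kernel code: every model_* name is hypothetical, struct model_page stands in for struct page, and it assumes the BUG in try_to_free_buffers() is a check that the page is still locked when clean_buffers() runs. The old variant drops the page lock inside the write helper on success, so the caller cleans buffers on an unlocked page; the new variant leaves the unlock to the caller, as the diff does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page; only the two
 * flags relevant to this thread are modeled. */
struct model_page {
	bool locked;
	bool writeback;
};

static void model_unlock_page(struct model_page *page)
{
	assert(page->locked);	/* unlocking an unlocked page is a bug */
	page->locked = false;
}

/* Models ->rw_page() completing synchronously and successfully. */
static int model_rw_page(struct model_page *page)
{
	page->writeback = false;
	return 0;
}

/* Before the patch: the helper unlocks the page itself on success. */
static int model_bdev_write_page_old(struct model_page *page)
{
	int result;

	page->writeback = true;
	result = model_rw_page(page);
	if (result)
		page->writeback = false;
	else
		model_unlock_page(page);
	return result;
}

/* After the patch: on success the page is returned still locked. */
static int model_bdev_write_page_new(struct model_page *page)
{
	int result;

	page->writeback = true;
	result = model_rw_page(page);
	if (result)
		page->writeback = false;
	return result;
}

/* Models clean_buffers(): reports whether the lock was held at the
 * point where the kernel is assumed to BUG if it is not. */
static bool model_clean_buffers(const struct model_page *page)
{
	return page->locked;
}

/* Caller before the patch; returns false if buffers were cleaned
 * without the page lock. */
static bool model_writepage_old(struct model_page *page)
{
	bool ok = true;

	page->locked = true;
	if (!model_bdev_write_page_old(page))
		ok = model_clean_buffers(page);
	return ok;
}

/* Caller after the patch: clean buffers first, then unlock. */
static bool model_writepage_new(struct model_page *page)
{
	bool ok = true;

	page->locked = true;
	if (!model_bdev_write_page_new(page)) {
		ok = model_clean_buffers(page);
		model_unlock_page(page);
	}
	return ok;
}
```

In this model, model_writepage_old() reports that buffers were cleaned without the lock, while model_writepage_new() cleans them under the lock and still leaves the page unlocked on return. The error path, where the page would stay locked for the caller's fallback, is not exercised here.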