Message ID | 20181121151052.18744-1-nborisov@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [PATCHv3] btrfs: Fix error handling in btrfs_cleanup_ordered_extents | expand |
On 21.11.18 г. 17:10 ч., Nikolay Borisov wrote: > Running btrfs/124 in a loop hung up on me sporadically with the > following call trace: > btrfs D 0 5760 5324 0x00000000 > Call Trace: > ? __schedule+0x243/0x800 > schedule+0x33/0x90 > btrfs_start_ordered_extent+0x10c/0x1b0 [btrfs] > ? wait_woken+0xa0/0xa0 > btrfs_wait_ordered_range+0xbb/0x100 [btrfs] > btrfs_relocate_block_group+0x1ff/0x230 [btrfs] > btrfs_relocate_chunk+0x49/0x100 [btrfs] > btrfs_balance+0xbeb/0x1740 [btrfs] > btrfs_ioctl_balance+0x2ee/0x380 [btrfs] > btrfs_ioctl+0x1691/0x3110 [btrfs] > ? lockdep_hardirqs_on+0xed/0x180 > ? __handle_mm_fault+0x8e7/0xfb0 > ? _raw_spin_unlock+0x24/0x30 > ? __handle_mm_fault+0x8e7/0xfb0 > ? do_vfs_ioctl+0xa5/0x6e0 > ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] > do_vfs_ioctl+0xa5/0x6e0 > ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe > ksys_ioctl+0x3a/0x70 > __x64_sys_ioctl+0x16/0x20 > do_syscall_64+0x60/0x1b0 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > This happens because during page writeback it's valid for > writepage_delalloc to instantiate a delalloc range which doesn't > belong to the page currently being written back. > > The reason this case is valid is due to find_lock_delalloc_range > returning any available range after the passed delalloc_start and > ignorting whether the page under writeback is within that range. > In turn ordered extents (OE) are always created for the returned range > from find_lock_delalloc_range. If, however, a failure occurs while OE > are being created then the clean up code in btrfs_cleanup_ordered_extents > will be called. > > Unfortunately the code in btrfs_cleanup_ordered_extents doesn't consider > the case of such 'foreign' range being processed and instead it always > assumes that the range OE are created for belongs to the page. This > leads to the first page of such foregin range to not be cleaned up since > it's deliberately missed skipped by the current cleaning up code. > > Fix this by correctly checking whether the current page belongs to the > range being instantiated and if so adjsut the range parameters > passed for cleaning up. If it doesn't, then just clean the whole OE > range directly. > > Signed-off-by: Nikolay Borisov <nborisov@suse.com> > Reviewed-by: Josef Bacik <josef@toxicpanda.com> > --- > V3: > * Re-worded the commit for easier comprehension > * Added RB tag from Josef > > V2: > * Fix compilation failure due to missing parentheses > * Fixed the "Fixes" tag. > fs/btrfs/inode.c | 29 ++++++++++++++++++++--------- > 1 file changed, 20 insertions(+), 9 deletions(-) > Ping, Also this patch needs: Fixes: 524272607e88 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang") and it needs to be applied to the stable releases 4.14 <snip>
On Wed, Nov 21, 2018 at 05:10:52PM +0200, Nikolay Borisov wrote: > Running btrfs/124 in a loop hung up on me sporadically with the > following call trace: > btrfs D 0 5760 5324 0x00000000 > Call Trace: > ? __schedule+0x243/0x800 > schedule+0x33/0x90 > btrfs_start_ordered_extent+0x10c/0x1b0 [btrfs] > ? wait_woken+0xa0/0xa0 > btrfs_wait_ordered_range+0xbb/0x100 [btrfs] > btrfs_relocate_block_group+0x1ff/0x230 [btrfs] > btrfs_relocate_chunk+0x49/0x100 [btrfs] > btrfs_balance+0xbeb/0x1740 [btrfs] > btrfs_ioctl_balance+0x2ee/0x380 [btrfs] > btrfs_ioctl+0x1691/0x3110 [btrfs] > ? lockdep_hardirqs_on+0xed/0x180 > ? __handle_mm_fault+0x8e7/0xfb0 > ? _raw_spin_unlock+0x24/0x30 > ? __handle_mm_fault+0x8e7/0xfb0 > ? do_vfs_ioctl+0xa5/0x6e0 > ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] > do_vfs_ioctl+0xa5/0x6e0 > ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe > ksys_ioctl+0x3a/0x70 > __x64_sys_ioctl+0x16/0x20 > do_syscall_64+0x60/0x1b0 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > This happens because during page writeback it's valid for > writepage_delalloc to instantiate a delalloc range which doesn't > belong to the page currently being written back. > > The reason this case is valid is due to find_lock_delalloc_range > returning any available range after the passed delalloc_start and > ignorting whether the page under writeback is within that range. > In turn ordered extents (OE) are always created for the returned range > from find_lock_delalloc_range. If, however, a failure occurs while OE > are being created then the clean up code in btrfs_cleanup_ordered_extents > will be called. > > Unfortunately the code in btrfs_cleanup_ordered_extents doesn't consider > the case of such 'foreign' range being processed and instead it always > assumes that the range OE are created for belongs to the page. This > leads to the first page of such foregin range to not be cleaned up since > it's deliberately missed skipped by the current cleaning up code. > > Fix this by correctly checking whether the current page belongs to the > range being instantiated and if so adjsut the range parameters > passed for cleaning up. If it doesn't, then just clean the whole OE > range directly. > > Signed-off-by: Nikolay Borisov <nborisov@suse.com> > Reviewed-by: Josef Bacik <josef@toxicpanda.com> Added to misc-next, thanks.
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index a0fc564626df..9f65f131f244 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -110,17 +110,17 @@ static void __endio_write_update_ordered(struct inode *inode, * extent_clear_unlock_delalloc() to clear both the bits EXTENT_DO_ACCOUNTING * and EXTENT_DELALLOC simultaneously, because that causes the reserved metadata * to be released, which we want to happen only when finishing the ordered - * extent (btrfs_finish_ordered_io()). Also note that the caller of - * btrfs_run_delalloc_range already does proper cleanup for the first page of - * the range, that is, it invokes the callback writepage_end_io_hook() for the - * range of the first page. + * extent (btrfs_finish_ordered_io()). */ static inline void btrfs_cleanup_ordered_extents(struct inode *inode, - const u64 offset, - const u64 bytes) + struct page *locked_page, + u64 offset, u64 bytes) { unsigned long index = offset >> PAGE_SHIFT; unsigned long end_index = (offset + bytes - 1) >> PAGE_SHIFT; + u64 page_start = page_offset(locked_page); + u64 page_end = page_start + PAGE_SIZE - 1; + struct page *page; while (index <= end_index) { @@ -131,8 +131,18 @@ static inline void btrfs_cleanup_ordered_extents(struct inode *inode, ClearPagePrivate2(page); put_page(page); } - return __endio_write_update_ordered(inode, offset + PAGE_SIZE, - bytes - PAGE_SIZE, false); + + /* + * In case this page belongs to the delalloc range being instantiated + * then skip it, since the first page of a range is going to be + * properly cleaned up by the caller of run_delalloc_range + */ + if (page_start >= offset && page_end <= (offset + bytes - 1)) { + offset += PAGE_SIZE; + bytes -= PAGE_SIZE; + } + + return __endio_write_update_ordered(inode, offset, bytes, false); } static int btrfs_dirty_inode(struct inode *inode); @@ -1605,7 +1615,8 @@ int btrfs_run_delalloc_range(void *private_data, struct page *locked_page, write_flags); } if (ret) - btrfs_cleanup_ordered_extents(inode, start, end - start + 1); + btrfs_cleanup_ordered_extents(inode, locked_page, start, + end - start + 1); return ret; }