Message ID | 58e956200dc2e8c65c6a3fdf0cf05de8d77969ab.1709676577.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: fix data corruption/hang/rsv leak in subpage zoned cases |
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fb63055f42f3..bdd0e29ba848 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2290,10 +2290,8 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
 
 		page = find_get_page(mapping, cur >> PAGE_SHIFT);
 		ASSERT(PageLocked(page));
-		if (pages_dirty && page != locked_page) {
+		if (pages_dirty && page != locked_page)
 			ASSERT(PageDirty(page));
-			clear_page_dirty_for_io(page);
-		}
 
 		ret = __extent_writepage_io(BTRFS_I(inode), page, cur, cur_len,
 					    &bio_ctrl, i_size, &nr);
[BUG]
For subpage + zoned case, btrfs can easily hang with the following
workload, even with the previous subpage delalloc rework:

  # mkfs.btrfs -f $dev
  # mount $dev $mnt
  # xfs_io -f -c "pwrite 32k 128k" $mnt/foobar
  # umount $mnt

The system would hang at unmount due to unfinished ordered extents.

Above, $dev is a tcmu-runner emulated zoned HDD, which has a max zone
append size of 64K.

[CAUSE]
There is a bug in extent_write_locked_range() (well, I'm already
surprised by how much subpage-incompatible code is inside that
function):

- If @pages_dirty is true, we will clear the page dirty flag for the
  whole page

  This means, for the above case, since the max zone append size is 64K,
  we get an ordered extent sized 64K, resulting in the following
  writeback range:

  0       32K        64K       96K       128K    192K    256K
  |       |//////////|/////////|/////////|///////|       |
          \     Write back     /

  |///| = subpage dirty range

  Since we clear the dirty flag for the page at 64K before entering
  __extent_writepage_io(), this results in the following page flags:

  0       32K        64K       96K       128K    192K    256K
  |       |          |         |         |///////|       |

  Then for the next delalloc range run, we would create an ordered
  extent for the range [96K, 192K) and write back the range.

  But since the whole 2nd page has no dirty flag set, we only submit the
  range [128K, 192K). Meanwhile our ordered extent is still 64K in size,
  so it would never be properly finished.

  This also means dirty data is not properly submitted for writeback,
  which would cause data corruption.

This bug only affects the subpage and zoned case.
For the non-subpage and zoned case, find_next_dirty_byte() just returns
the whole page no matter if it has dirty flags or not.

For the subpage and non-zoned case, we never go into
extent_write_locked_range().

[FIX]
Just do not clear the page dirty flag at all, as __extent_writepage_io()
would do a more accurate, subpage-compatible clear of the page dirty
flag anyway.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
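
For reviewers who want to see the effect described in [CAUSE] in isolation, here is a rough userspace toy model. It is not kernel code and does not call any btrfs function. It assumes a 64K page, 4K sectors and one dirty bit per sector, and it uses the ranges quoted in the description above ([32K, 96K) for the first writeback, [96K, 192K) for the second). All toy_* names and TOY_* macros are hypothetical and exist only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed geometry for the sketch: 64K pages, 4K sectors, 256K of file. */
#define KIB(x)		((uint32_t)(x) * 1024u)
#define TOY_PAGE_SIZE	KIB(64)
#define TOY_SECTOR_SIZE	KIB(4)
#define TOY_FILE_SIZE	KIB(256)
#define TOY_NR_SECTORS	(TOY_FILE_SIZE / TOY_SECTOR_SIZE)

/* One dirty bit per sector, standing in for the subpage dirty bitmap. */
static bool toy_dirty[TOY_NR_SECTORS];

static void toy_mark_dirty(uint32_t start, uint32_t end)
{
	for (uint32_t off = start; off < end; off += TOY_SECTOR_SIZE)
		toy_dirty[off / TOY_SECTOR_SIZE] = true;
}

/*
 * Submit every dirty sector in [start, end) and return the number of
 * bytes submitted.  With clear_whole_page set, the dirty state of every
 * page touched by the range is dropped as well, which models the
 * whole-page clear that the patch removes.
 */
static uint32_t toy_writeback(uint32_t start, uint32_t end,
			      bool clear_whole_page)
{
	uint32_t submitted = 0;

	for (uint32_t off = start; off < end; off += TOY_SECTOR_SIZE) {
		if (toy_dirty[off / TOY_SECTOR_SIZE]) {
			submitted += TOY_SECTOR_SIZE;
			toy_dirty[off / TOY_SECTOR_SIZE] = false;
		}
	}

	if (clear_whole_page) {
		/* Round the range out to page boundaries. */
		uint32_t first = start - start % TOY_PAGE_SIZE;
		uint32_t last = (end + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE *
				TOY_PAGE_SIZE;

		for (uint32_t off = first; off < last; off += TOY_SECTOR_SIZE)
			toy_dirty[off / TOY_SECTOR_SIZE] = false;
	}
	return submitted;
}

/* Bytes submitted by the second writeback, for the range [96K, 192K). */
uint32_t toy_second_pass(bool clear_whole_page)
{
	memset(toy_dirty, 0, sizeof(toy_dirty));
	toy_mark_dirty(KIB(32), KIB(192));
	toy_writeback(KIB(32), KIB(96), clear_whole_page);
	return toy_writeback(KIB(96), KIB(192), clear_whole_page);
}
```

In this model toy_second_pass(true) returns 64K, because the sectors in [96K, 128K) lost their dirty state during the first pass, while toy_second_pass(false) returns the full 96K. The real writeback path is considerably more involved, so take this only as an illustration of why the whole-page clear loses data.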