Message ID | 3fa83d6d472d78beb5fd519d0290b73d02d53d15.1733176045.git.wqu@suse.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] btrfs: properly wait for writeback before buffered write | expand |
On Mon, Dec 2, 2024 at 9:48 PM Qu Wenruo <wqu@suse.com> wrote: > > [BUG] > Before commit e820dbeb6ad1 ("btrfs: convert btrfs_buffered_write() to > use folios"), function prepare_one_folio() will always wait for folio > writeback to finish before returning the folio. > > However commit e820dbeb6ad1 ("btrfs: convert btrfs_buffered_write() to > use folios") changed to use FGP_STABLE to do the writeback wait, but > FGP_STABLE is calling folio_wait_stable(), which only calls > folio_wait_writeback() if the address space has AS_STABLE_WRITES, which > is not set for btrfs inodes. > > This means we will not wait for folio writeback at all. > > [CAUSE] > The cause is FGP_STABLE is not wait for writeback unconditionally, but > only for address spaces with AS_STABLE_WRITES, normally such flag is set > when the super block has SB_I_STABLE_WRITES flag. > > Such super block flag is set when the block device has hardware digest > support or has internal checksum requirement. > > I'd argue btrfs should set such super block due to its default data > checksum behavior, but it is not set yet, so this means FGP_STABLE flag > will have no effect at all. > > (For NODATACSUM inodes, we can skip the wait in theory but that should > be an optimization in the future) > > This can lead to data checksum mismatch, as we can modify the folio > meanwhile it's still under writeback, this will make the contents > differ from the contents at submission and checksum calculation. > > [FIX] > Instead of fully rely on FGP_STABLE, manually do the folio writeback > wait, until we set the address space or super flag. > > Fixes: e820dbeb6ad1 ("btrfs: convert btrfs_buffered_write() to use folios") > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Looks good, thanks for the update of the changelog. > --- > v2: > - Update the commit message and fixes by tag > I was too focused on the syzbot report, not noticing it's a different > commit causing the regression. > Now removed the syzbot report part. > --- > fs/btrfs/file.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index fbb753300071..8734f5fc681f 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -911,6 +911,7 @@ static noinline int prepare_one_folio(struct inode *inode, struct folio **folio_ > ret = PTR_ERR(folio); > return ret; > } > + folio_wait_writeback(folio); > /* Only support page sized folio yet. */ > ASSERT(folio_order(folio) == 0); > ret = set_folio_extent_mapped(folio); > -- > 2.47.1 > >
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index fbb753300071..8734f5fc681f 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -911,6 +911,7 @@ static noinline int prepare_one_folio(struct inode *inode, struct folio **folio_ ret = PTR_ERR(folio); return ret; } + folio_wait_writeback(folio); /* Only support page sized folio yet. */ ASSERT(folio_order(folio) == 0); ret = set_folio_extent_mapped(folio);
[BUG] Before commit e820dbeb6ad1 ("btrfs: convert btrfs_buffered_write() to use folios"), function prepare_one_folio() will always wait for folio writeback to finish before returning the folio. However commit e820dbeb6ad1 ("btrfs: convert btrfs_buffered_write() to use folios") changed to use FGP_STABLE to do the writeback wait, but FGP_STABLE is calling folio_wait_stable(), which only calls folio_wait_writeback() if the address space has AS_STABLE_WRITES, which is not set for btrfs inodes. This means we will not wait for folio writeback at all. [CAUSE] The cause is FGP_STABLE is not wait for writeback unconditionally, but only for address spaces with AS_STABLE_WRITES, normally such flag is set when the super block has SB_I_STABLE_WRITES flag. Such super block flag is set when the block device has hardware digest support or has internal checksum requirement. I'd argue btrfs should set such super block due to its default data checksum behavior, but it is not set yet, so this means FGP_STABLE flag will have no effect at all. (For NODATACSUM inodes, we can skip the wait in theory but that should be an optimization in the future) This can lead to data checksum mismatch, as we can modify the folio meanwhile it's still under writeback, this will make the contents differ from the contents at submission and checksum calculation. [FIX] Instead of fully rely on FGP_STABLE, manually do the folio writeback wait, until we set the address space or super flag. Fixes: e820dbeb6ad1 ("btrfs: convert btrfs_buffered_write() to use folios") Signed-off-by: Qu Wenruo <wqu@suse.com> --- v2: - Update the commit message and fixes by tag I was too focused on the syzbot report, not noticing it's a different commit causing the regression. Now removed the syzbot report part. --- fs/btrfs/file.c | 1 + 1 file changed, 1 insertion(+)