btrfs: zoned: fix silent data loss after failure splitting ordered extent

Message ID 0aba70d8929db6eeb640c795f512957db7a0b34a.1619011437.git.fdmanana@suse.com
State New, archived

Commit Message

Filipe Manana April 21, 2021, 1:31 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

On a zoned filesystem, sometimes we need to split an ordered extent into 3
different ordered extents. The original ordered extent is shortened, at
the front and at the rear, and we create two other new ordered extents to
represent the trimmed parts of the original ordered extent.
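
As a rough illustration of the three-way split described above, here is a
minimal userspace sketch of the range arithmetic. The 'struct range' type and
the split_range() helper are hypothetical and exist only for this example;
they are not the kernel's btrfs_ordered_extent or any btrfs API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a byte range; not a kernel structure. */
struct range {
	uint64_t start;
	uint64_t len;
};

/*
 * Split 'orig' into a pre, a middle and a post piece. 'pre' and 'post' are
 * the number of bytes trimmed from the front and from the rear. The middle
 * piece plays the role of the shortened original ordered extent.
 */
static void split_range(struct range orig, uint64_t pre, uint64_t post,
			struct range *pre_out, struct range *mid_out,
			struct range *post_out)
{
	/* The trimmed parts cannot exceed the original length. */
	assert(pre + post <= orig.len);

	pre_out->start = orig.start;
	pre_out->len = pre;

	mid_out->start = orig.start + pre;
	mid_out->len = orig.len - pre - post;

	/* The post piece begins right after the shortened middle piece. */
	post_out->start = mid_out->start + mid_out->len;
	post_out->len = post;
}
```

The offset of the post piece relative to the original start is 'pre' plus the
length of the shortened middle piece, which matches the
'pre + ordered->disk_num_bytes' argument visible in the patch.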

After adjusting the original ordered extent, we create an ordered extent
to represent the pre-range, and that may fail with -ENOMEM for example.
After that we always try to create the ordered extent for the post-range,
and if that happens to succeed we end up returning success to the caller
as we overwrite the 'ret' variable which contained the previous error.

This means we end up with a file range for which there is no ordered
extent, which results in the range never getting a new file extent item
pointing to the new data location. And since the split operation did
not return an error, writeback does not fail and the inode's mapping is
not flagged with an error, resulting in a subsequent fsync not reporting
an error either.

It's probably very unlikely for the creation of the post-range ordered
extent to succeed after the creation of the pre-range ordered extent has
failed, but it's not impossible.

So fix this by making sure we only create the post-range ordered extent
if there was no error creating the ordered extent for the pre-range.
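
The difference between the two control flows can be reproduced outside the
kernel with a small sketch. Everything below is hypothetical scaffolding:
clone_stub() merely stands in for clone_ordered_extent() and fails on request,
and split_buggy()/split_fixed() mirror only the shape of the 'ret' handling
before and after this patch, not the real function.

```c
#include <errno.h>

/* Hypothetical stub: returns -ENOMEM when 'fail' is nonzero, else 0. */
static int clone_stub(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Shape of the code before the fix: the second call overwrites 'ret'. */
static int split_buggy(int pre, int post, int pre_fails, int post_fails)
{
	int ret = 0;

	if (pre)
		ret = clone_stub(pre_fails);
	if (post)
		ret = clone_stub(post_fails);	/* an earlier error is lost here */

	return ret;
}

/* Shape of the code after the fix: skip the post clone on error. */
static int split_fixed(int pre, int post, int pre_fails, int post_fails)
{
	int ret = 0;

	if (pre)
		ret = clone_stub(pre_fails);
	if (ret == 0 && post)
		ret = clone_stub(post_fails);

	return ret;
}
```

With both pre and post non-zero, a failing pre clone and a succeeding post
clone, split_buggy() returns 0 while split_fixed() returns -ENOMEM, which is
the case the commit message describes.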

Fixes: d22002fd37bd97 ("btrfs: zoned: split ordered extent when bio is sent")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ordered-data.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Sterba April 28, 2021, 5:15 p.m. UTC | #1
On Wed, Apr 21, 2021 at 02:31:50PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> On a zoned filesystem, sometimes we need to split an ordered extent into 3
> different ordered extents. The original ordered extent is shortened, at
> the front and at the rear, and we create two other new ordered extents to
> represent the trimmed parts of the original ordered extent.
> 
> After adjusting the original ordered extent, we create an ordered extent
> to represent the pre-range, and that may fail with -ENOMEM for example.
> After that we always try to create the ordered extent for the post-range,
> and if that happens to succeed we end up returning success to the caller
> as we overwrite the 'ret' variable which contained the previous error.
> 
> This means we end up with a file range for which there is no ordered
> extent, which results in the range never getting a new file extent item
> pointing to the new data location. And since the split operation did
> not return an error, writeback does not fail and the inode's mapping is
> not flagged with an error, resulting in a subsequent fsync not reporting
> an error either.
> 
> It's possibly very unlikely to have the creation of the post-range ordered
> extent succeed after the creation of the pre-range ordered extent failed,
> but it's not impossible.
> 
> So fix this by making sure we only create the post-range ordered extent
> if there was no error creating the ordered extent for the pre-range.
> 
> Fixes: d22002fd37bd97 ("btrfs: zoned: split ordered extent when bio is sent")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Added to misc-next, thanks.
Patch

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 07b0b4218791..6c413bb451a3 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -984,7 +984,7 @@  int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
 
 	if (pre)
 		ret = clone_ordered_extent(ordered, 0, pre);
-	if (post)
+	if (ret == 0 && post)
 		ret = clone_ordered_extent(ordered, pre + ordered->disk_num_bytes,
 					   post);