Btrfs: fix hole extent items with a zero size after range cloning
diff mbox series

Message ID 20191205165841.18524-1-fdmanana@kernel.org
State New
Headers show
Series
  • Btrfs: fix hole extent items with a zero size after range cloning
Related show

Commit Message

Filipe Manana Dec. 5, 2019, 4:58 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

Normally when cloning a file range if we find an implicit hole at the end
of the range we assume it is because the NO_HOLES feature is enabled.
However that is not always the case. One well known case [1] is when we
have a power failure after mixing buffered and direct IO writes against
the same file.

In such cases we need to punch a hole in the destination file, and if
the NO_HOLES feature is not enabled, we need to insert explicit file
extent items to represent the hole. After commit 690a5dbfc51315
("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning
extents"), we started to insert file extent items representing the hole
with an item size of 0, which is invalid and should be 53 bytes (the size
of a btrfs_file_extent_item structure), resulting in all sorts of
corruptions and invalid memory accesses. This is detected by the tree
checker when we attempt to write a leaf to disk.

The problem can be sporadically triggered by test case generic/561 from
fstests. That test case does not exercise power failure and creates a new
filesystem when it starts, so it does not use a filesystem created by any
previous test that tests power failure. However the test does both
buffered and direct IO writes (through fsstress) and it's precisely that
which is creating the implicit holes in files. That happens even before
the commit mentioned earlier. I need to investigate why we get those
implicit holes to check if there is a real problem or not. For now this
change fixes the regression of introducing file extent items with an item
size of 0 bytes.

Fix the issue by calling btrfs_punch_hole_range() without passing a
btrfs_clone_extent_info structure, which ensures file extent items are
inserted to represent the hole with a correct item size. We were passing
a btrfs_clone_extent_info with a value of 0 for its 'item_size' field,
which was causing the insertion of file extent items with an item size
of 0.

[1] https://www.spinics.net/lists/linux-btrfs/msg75350.html

Fixes: 690a5dbfc51315 ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents")
Reported-by: David Sterba <dsterba@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ioctl.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

Comments

Josef Bacik Dec. 6, 2019, 3:11 p.m. UTC | #1
On Thu, Dec 05, 2019 at 04:58:41PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Normally when cloning a file range if we find an implicit hole at the end
> of the range we assume it is because the NO_HOLES feature is enabled.
> However that is not always the case. One well known case [1] is when we
> have a power failure after mixing buffered and direct IO writes against
> the same file.
> 
> In such cases we need to punch a hole in the destination file, and if
> the NO_HOLES feature is not enabled, we need to insert explicit file
> extent items to represent the hole. After commit 690a5dbfc51315
> ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning
> extents"), we started to insert file extent items representing the hole
> with an item size of 0, which is invalid and should be 53 bytes (the size
> of a btrfs_file_extent_item structure), resulting in all sorts of
> corruptions and invalid memory accesses. This is detected by the tree
> checker when we attempt to write a leaf to disk.
> 
> The problem can be sporadically triggered by test case generic/561 from
> fstests. That test case does not exercise power failure and creates a new
> filesystem when it starts, so it does not use a filesystem created by any
> previous test that tests power failure. However the test does both
> buffered and direct IO writes (through fsstress) and it's precisely that
> which is creating the implicit holes in files. That happens even before
> the commit mentioned earlier. I need to investigate why we get those
> implicit holes to check if there is a real problem or not. For now this
> change fixes the regression of introducing file extent items with an item
> size of 0 bytes.
> 
> Fix the issue by calling btrfs_punch_hole_range() without passing a
> btrfs_clone_extent_info structure, which ensures file extent items are
> inserted to represent the hole with a correct item size. We were passing
> a btrfs_clone_extent_info with a value of 0 for its 'item_size' field,
> which was causing the insertion of file extent items with an item size
> of 0.
> 
> [1] https://www.spinics.net/lists/linux-btrfs/msg75350.html
> 
> Fixes: 690a5dbfc51315 ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents")
> Reported-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
David Sterba Dec. 9, 2019, 4:23 p.m. UTC | #2
On Thu, Dec 05, 2019 at 04:58:41PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Normally when cloning a file range if we find an implicit hole at the end
> of the range we assume it is because the NO_HOLES feature is enabled.
> However that is not always the case. One well known case [1] is when we
> have a power failure after mixing buffered and direct IO writes against
> the same file.
> 
> In such cases we need to punch a hole in the destination file, and if
> the NO_HOLES feature is not enabled, we need to insert explicit file
> extent items to represent the hole. After commit 690a5dbfc51315
> ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning
> extents"), we started to insert file extent items representing the hole
> with an item size of 0, which is invalid and should be 53 bytes (the size
> of a btrfs_file_extent_item structure), resulting in all sorts of
> corruptions and invalid memory accesses. This is detected by the tree
> checker when we attempt to write a leaf to disk.
> 
> The problem can be sporadically triggered by test case generic/561 from
> fstests. That test case does not exercise power failure and creates a new
> filesystem when it starts, so it does not use a filesystem created by any
> previous test that tests power failure. However the test does both
> buffered and direct IO writes (through fsstress) and it's precisely that
> which is creating the implicit holes in files. That happens even before
> the commit mentioned earlier. I need to investigate why we get those
> implicit holes to check if there is a real problem or not. For now this
> change fixes the regression of introducing file extent items with an item
> size of 0 bytes.
> 
> Fix the issue by calling btrfs_punch_hole_range() without passing a
> btrfs_clone_extent_info structure, which ensures file extent items are
> inserted to represent the hole with a correct item size. We were passing
> a btrfs_clone_extent_info with a value of 0 for its 'item_size' field,
> which was causing the insertion of file extent items with an item size
> of 0.
> 
> [1] https://www.spinics.net/lists/linux-btrfs/msg75350.html
> 
> Fixes: 690a5dbfc51315 ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents")
> Reported-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Added to misc-next, thanks.

Patch
diff mbox series

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a1ee0b775e65..3418decb9e61 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3720,24 +3720,18 @@  static int btrfs_clone(struct inode *src, struct inode *inode,
 	ret = 0;
 
 	if (last_dest_end < destoff + len) {
-		struct btrfs_clone_extent_info clone_info = { 0 };
 		/*
-		 * We have an implicit hole (NO_HOLES feature is enabled) that
-		 * fully or partially overlaps our cloning range at its end.
+		 * We have an implicit hole that fully or partially overlaps our
+		 * cloning range at its end. This means that we either have the
+		 * NO_HOLES feature enabled or the implicit hole happened due to
+		 * mixing buffered and direct IO writes against this file.
 		 */
 		btrfs_release_path(path);
 		path->leave_spinning = 0;
 
-		/*
-		 * We are dealing with a hole and our clone_info already has a
-		 * disk_offset of 0, we only need to fill the data length and
-		 * file offset.
-		 */
-		clone_info.data_len = destoff + len - last_dest_end;
-		clone_info.file_offset = last_dest_end;
 		ret = btrfs_punch_hole_range(inode, path,
 					     last_dest_end, destoff + len - 1,
-					     &clone_info, &trans);
+					     NULL, &trans);
 		if (ret)
 			goto out;