Message ID | 20191205165841.18524-1-fdmanana@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Btrfs: fix hole extent items with a zero size after range cloning | expand |
On Thu, Dec 05, 2019 at 04:58:41PM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > Normally when cloning a file range if we find an implicit hole at the end > of the range we assume it is because the NO_HOLES feature is enabled. > However that is not always the case. One well known case [1] is when we > have a power failure after mixing buffered and direct IO writes against > the same file. > > In such cases we need to punch a hole in the destination file, and if > the NO_HOLES feature is not enabled, we need to insert explicit file > extent items to represent the hole. After commit 690a5dbfc51315 > ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning > extents"), we started to insert file extent items representing the hole > with an item size of 0, which is invalid and should be 53 bytes (the size > of a btrfs_file_extent_item structure), resulting in all sorts of > corruptions and invalid memory accesses. This is detected by the tree > checker when we attempt to write a leaf to disk. > > The problem can be sporadically triggered by test case generic/561 from > fstests. That test case does not exercise power failure and creates a new > filesystem when it starts, so it does not use a filesystem created by any > previous test that tests power failure. However the test does both > buffered and direct IO writes (through fsstress) and it's precisely that > which is creating the implicit holes in files. That happens even before > the commit mentioned earlier. I need to investigate why we get those > implicit holes to check if there is a real problem or not. For now this > change fixes the regression of introducing file extent items with an item > size of 0 bytes. > > Fix the issue by calling btrfs_punch_hole_range() without passing a > btrfs_clone_extent_info structure, which ensures file extent items are > inserted to represent the hole with a correct item size. We were passing > a btrfs_clone_extent_info with a value of 0 for its 'item_size' field, > which was causing the insertion of file extent items with an item size > of 0. > > [1] https://www.spinics.net/lists/linux-btrfs/msg75350.html > > Fixes: 690a5dbfc51315 ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents") > Reported-by: David Sterba <dsterba@suse.com> > Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
On Thu, Dec 05, 2019 at 04:58:41PM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > Normally when cloning a file range if we find an implicit hole at the end > of the range we assume it is because the NO_HOLES feature is enabled. > However that is not always the case. One well known case [1] is when we > have a power failure after mixing buffered and direct IO writes against > the same file. > > In such cases we need to punch a hole in the destination file, and if > the NO_HOLES feature is not enabled, we need to insert explicit file > extent items to represent the hole. After commit 690a5dbfc51315 > ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning > extents"), we started to insert file extent items representing the hole > with an item size of 0, which is invalid and should be 53 bytes (the size > of a btrfs_file_extent_item structure), resulting in all sorts of > corruptions and invalid memory accesses. This is detected by the tree > checker when we attempt to write a leaf to disk. > > The problem can be sporadically triggered by test case generic/561 from > fstests. That test case does not exercise power failure and creates a new > filesystem when it starts, so it does not use a filesystem created by any > previous test that tests power failure. However the test does both > buffered and direct IO writes (through fsstress) and it's precisely that > which is creating the implicit holes in files. That happens even before > the commit mentioned earlier. I need to investigate why we get those > implicit holes to check if there is a real problem or not. For now this > change fixes the regression of introducing file extent items with an item > size of 0 bytes. > > Fix the issue by calling btrfs_punch_hole_range() without passing a > btrfs_clone_extent_info structure, which ensures file extent items are > inserted to represent the hole with a correct item size. We were passing > a btrfs_clone_extent_info with a value of 0 for its 'item_size' field, > which was causing the insertion of file extent items with an item size > of 0. > > [1] https://www.spinics.net/lists/linux-btrfs/msg75350.html > > Fixes: 690a5dbfc51315 ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents") > Reported-by: David Sterba <dsterba@suse.com> > Signed-off-by: Filipe Manana <fdmanana@suse.com> Added to misc-next, thanks.
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index a1ee0b775e65..3418decb9e61 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3720,24 +3720,18 @@ static int btrfs_clone(struct inode *src, struct inode *inode, ret = 0; if (last_dest_end < destoff + len) { - struct btrfs_clone_extent_info clone_info = { 0 }; /* - * We have an implicit hole (NO_HOLES feature is enabled) that - * fully or partially overlaps our cloning range at its end. + * We have an implicit hole that fully or partially overlaps our + * cloning range at its end. This means that we either have the + * NO_HOLES feature enabled or the implicit hole happened due to + * mixing buffered and direct IO writes against this file. */ btrfs_release_path(path); path->leave_spinning = 0; - /* - * We are dealing with a hole and our clone_info already has a - * disk_offset of 0, we only need to fill the data length and - * file offset. - */ - clone_info.data_len = destoff + len - last_dest_end; - clone_info.file_offset = last_dest_end; ret = btrfs_punch_hole_range(inode, path, last_dest_end, destoff + len - 1, - &clone_info, &trans); + NULL, &trans); if (ret) goto out;