Message ID | 20200124115204.4086-1-fdmanana@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Btrfs: send, fix emission of invalid clone operations within the same file | expand |
On 1/24/20 6:52 AM, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > When doing an incremental send and a file has extents shared with itself > at different file offsets, it's possible for send to emit clone operations > that will fail at the destination because the source range goes beyond the > file's current size. This happens when the file size has increased in the > send snapshot, there is a hole between the shared extents and both shared > extents are at file offsets which are greater the file's size in the > parent snapshot. > > Example: > > $ mkfs.btrfs -f /dev/sdb > $ mount /dev/sdb /mnt/sdb > > $ xfs_io -f -c "pwrite -S 0xf1 0 64K" /mnt/sdb/foobar > $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base > $ btrfs send -f /tmp/1.snap /mnt/sdb/base > > # Create a 320K extent at file offset 512K. > $ xfs_io -c "pwrite -S 0xab 512K 64K" /mnt/sdb/foobar > $ xfs_io -c "pwrite -S 0xcd 576K 64K" /mnt/sdb/foobar > $ xfs_io -c "pwrite -S 0xef 640K 64K" /mnt/sdb/foobar > $ xfs_io -c "pwrite -S 0x64 704K 64K" /mnt/sdb/foobar > $ xfs_io -c "pwrite -S 0x73 768K 64K" /mnt/sdb/foobar > > # Clone part of that 320K extent into a lower file offset (192K). > # This file offset is greater than the file's size in the parent > # snapshot (64K). Also the clone range is a bit behind the offset of > # the 320K extent so that we leave a hole between the shared extents. > $ xfs_io -c "reflink /mnt/sdb/foobar 448K 192K 192K" /mnt/sdb/foobar > > $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr > $ btrfs send -p /mnt/sdb/base -f /tmp/2.snap /mnt/sdb/incr > > $ mkfs.btrfs -f /dev/sdc > $ mount /dev/sdc /mnt/sdc > > $ btrfs receive -f /tmp/1.snap /mnt/sdc > $ btrfs receive -f /tmp/2.snap /mnt/sdc > ERROR: failed to clone extents to foobar: Invalid argument > > The problem is that after processing the extent at file offset 192K, send > does not issue a write operation full of zeroes for the hole between that > extent and the extent starting at file offset 520K (hole range from 384K > to 512K), this is because the hole is at an offset larger the size of the > file in the parent snapshot (384K > 64K). As a consequence the field > 'cur_inode_next_write_offset' of the send context remains with a value of > 384K when we start to process the extent at file offset 512K, which is the > value set after processing the extent at offset 192K. > > This screws up the lookup of possible extents to clone because due to an > incorrect value of 'cur_inode_next_write_offset' we can now consider > extents for cloning, in the same inode, that lie beyond the current size > of the file in the receiver of the send stream. Also, when checking if > an extent in the same file can be used for cloning, we must also check > that not only its start offset doesn't start at or beyond the current eof > of the file in the receiver but that the source range doesn't go beyond > current eof, that is we must check offset + length does not cross the > current eof, as that makes clone operations fail with -EINVAL. > > So fix this by updating 'cur_inode_next_write_offset' whenever we start > processing an extent and checking an extent's offset and length when > considering it for cloning operations. > > A test case for fstests follows soon. > > Fixes: 11f2069c113e02 ("Btrfs: send, allow clone operations within the same file") > Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
On Fri, Jan 24, 2020 at 11:54 AM <fdmanana@kernel.org> wrote: > > From: Filipe Manana <fdmanana@suse.com> > > When doing an incremental send and a file has extents shared with itself > at different file offsets, it's possible for send to emit clone operations > that will fail at the destination because the source range goes beyond the > file's current size. This happens when the file size has increased in the > send snapshot, there is a hole between the shared extents and both shared > extents are at file offsets which are greater the file's size in the > parent snapshot. > > Example: > > $ mkfs.btrfs -f /dev/sdb > $ mount /dev/sdb /mnt/sdb > > $ xfs_io -f -c "pwrite -S 0xf1 0 64K" /mnt/sdb/foobar > $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base > $ btrfs send -f /tmp/1.snap /mnt/sdb/base > > # Create a 320K extent at file offset 512K. > $ xfs_io -c "pwrite -S 0xab 512K 64K" /mnt/sdb/foobar > $ xfs_io -c "pwrite -S 0xcd 576K 64K" /mnt/sdb/foobar > $ xfs_io -c "pwrite -S 0xef 640K 64K" /mnt/sdb/foobar > $ xfs_io -c "pwrite -S 0x64 704K 64K" /mnt/sdb/foobar > $ xfs_io -c "pwrite -S 0x73 768K 64K" /mnt/sdb/foobar > > # Clone part of that 320K extent into a lower file offset (192K). > # This file offset is greater than the file's size in the parent > # snapshot (64K). Also the clone range is a bit behind the offset of > # the 320K extent so that we leave a hole between the shared extents. > $ xfs_io -c "reflink /mnt/sdb/foobar 448K 192K 192K" /mnt/sdb/foobar > > $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr > $ btrfs send -p /mnt/sdb/base -f /tmp/2.snap /mnt/sdb/incr > > $ mkfs.btrfs -f /dev/sdc > $ mount /dev/sdc /mnt/sdc > > $ btrfs receive -f /tmp/1.snap /mnt/sdc > $ btrfs receive -f /tmp/2.snap /mnt/sdc > ERROR: failed to clone extents to foobar: Invalid argument > > The problem is that after processing the extent at file offset 192K, send > does not issue a write operation full of zeroes for the hole between that > extent and the extent starting at file offset 520K (hole range from 384K > to 512K), this is because the hole is at an offset larger the size of the > file in the parent snapshot (384K > 64K). As a consequence the field > 'cur_inode_next_write_offset' of the send context remains with a value of > 384K when we start to process the extent at file offset 512K, which is the > value set after processing the extent at offset 192K. > > This screws up the lookup of possible extents to clone because due to an > incorrect value of 'cur_inode_next_write_offset' we can now consider > extents for cloning, in the same inode, that lie beyond the current size > of the file in the receiver of the send stream. Also, when checking if > an extent in the same file can be used for cloning, we must also check > that not only its start offset doesn't start at or beyond the current eof > of the file in the receiver but that the source range doesn't go beyond > current eof, that is we must check offset + length does not cross the > current eof, as that makes clone operations fail with -EINVAL. > > So fix this by updating 'cur_inode_next_write_offset' whenever we start > processing an extent and checking an extent's offset and length when > considering it for cloning operations. > > A test case for fstests follows soon. > > Fixes: 11f2069c113e02 ("Btrfs: send, allow clone operations within the same file") > Signed-off-by: Filipe Manana <fdmanana@suse.com> Tested-by: Craig Andrews <candrews@integralblue.com> (on behalf of Craig, see https://lore.kernel.org/linux-btrfs/f2ca887d98c1b5aacc4dde88cba74d98@integralblue.com/) > --- > fs/btrfs/send.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c > index 091e5bc8c7ea..0b42dac8a35f 100644 > --- a/fs/btrfs/send.c > +++ b/fs/btrfs/send.c > @@ -1269,7 +1269,8 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 root, void *ctx_) > * destination of the stream. > */ > if (ino == bctx->cur_objectid && > - offset >= bctx->sctx->cur_inode_next_write_offset) > + offset + bctx->extent_len > > + bctx->sctx->cur_inode_next_write_offset) > return 0; > } > > @@ -5804,6 +5805,18 @@ static int process_extent(struct send_ctx *sctx, > } > } > > + /* > + * There might be a hole between the end of the last processed extent > + * and this extent, and we may have not sent a write operation for that > + * hole because it was not needed (range is beyond eof in the parent > + * snapshot). So adjust the next write offset to the offset of this > + * extent, as we want to make sure we don't do mistakes when checking if > + * we can clone this extent from some other offset in this inode or when > + * detecting if we need to issue a truncate operation when finishing the > + * processing this inode. > + */ > + sctx->cur_inode_next_write_offset = key->offset; > + > ret = find_extent_clone(sctx, path, key->objectid, key->offset, > sctx->cur_inode_size, &found_clone); > if (ret != -ENOENT && ret < 0) > -- > 2.11.0 >
On Fri, Jan 24, 2020 at 11:52:04AM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > When doing an incremental send and a file has extents shared with itself > at different file offsets, it's possible for send to emit clone operations > that will fail at the destination because the source range goes beyond the > file's current size. This happens when the file size has increased in the > send snapshot, there is a hole between the shared extents and both shared > extents are at file offsets which are greater the file's size in the > parent snapshot. > > Example: > > $ mkfs.btrfs -f /dev/sdb > $ mount /dev/sdb /mnt/sdb > > $ xfs_io -f -c "pwrite -S 0xf1 0 64K" /mnt/sdb/foobar > $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base > $ btrfs send -f /tmp/1.snap /mnt/sdb/base > > # Create a 320K extent at file offset 512K. > $ xfs_io -c "pwrite -S 0xab 512K 64K" /mnt/sdb/foobar > $ xfs_io -c "pwrite -S 0xcd 576K 64K" /mnt/sdb/foobar > $ xfs_io -c "pwrite -S 0xef 640K 64K" /mnt/sdb/foobar > $ xfs_io -c "pwrite -S 0x64 704K 64K" /mnt/sdb/foobar > $ xfs_io -c "pwrite -S 0x73 768K 64K" /mnt/sdb/foobar > > # Clone part of that 320K extent into a lower file offset (192K). > # This file offset is greater than the file's size in the parent > # snapshot (64K). Also the clone range is a bit behind the offset of > # the 320K extent so that we leave a hole between the shared extents. > $ xfs_io -c "reflink /mnt/sdb/foobar 448K 192K 192K" /mnt/sdb/foobar > > $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr > $ btrfs send -p /mnt/sdb/base -f /tmp/2.snap /mnt/sdb/incr > > $ mkfs.btrfs -f /dev/sdc > $ mount /dev/sdc /mnt/sdc > > $ btrfs receive -f /tmp/1.snap /mnt/sdc > $ btrfs receive -f /tmp/2.snap /mnt/sdc > ERROR: failed to clone extents to foobar: Invalid argument > > The problem is that after processing the extent at file offset 192K, send > does not issue a write operation full of zeroes for the hole between that > extent and the extent starting at file offset 520K (hole range from 384K > to 512K), this is because the hole is at an offset larger the size of the > file in the parent snapshot (384K > 64K). As a consequence the field > 'cur_inode_next_write_offset' of the send context remains with a value of > 384K when we start to process the extent at file offset 512K, which is the > value set after processing the extent at offset 192K. > > This screws up the lookup of possible extents to clone because due to an > incorrect value of 'cur_inode_next_write_offset' we can now consider > extents for cloning, in the same inode, that lie beyond the current size > of the file in the receiver of the send stream. Also, when checking if > an extent in the same file can be used for cloning, we must also check > that not only its start offset doesn't start at or beyond the current eof > of the file in the receiver but that the source range doesn't go beyond > current eof, that is we must check offset + length does not cross the > current eof, as that makes clone operations fail with -EINVAL. > > So fix this by updating 'cur_inode_next_write_offset' whenever we start > processing an extent and checking an extent's offset and length when > considering it for cloning operations. > > A test case for fstests follows soon. > > Fixes: 11f2069c113e02 ("Btrfs: send, allow clone operations within the same file") > Signed-off-by: Filipe Manana <fdmanana@suse.com> Added to misc-next, with the tested-by tag, thanks.
On Tue, Jan 28, 2020 at 4:22 PM David Sterba <dsterba@suse.cz> wrote: > > On Fri, Jan 24, 2020 at 11:52:04AM +0000, fdmanana@kernel.org wrote: > > From: Filipe Manana <fdmanana@suse.com> > > > > When doing an incremental send and a file has extents shared with itself > > at different file offsets, it's possible for send to emit clone operations > > that will fail at the destination because the source range goes beyond the > > file's current size. This happens when the file size has increased in the > > send snapshot, there is a hole between the shared extents and both shared > > extents are at file offsets which are greater the file's size in the > > parent snapshot. > > > > Example: > > > > $ mkfs.btrfs -f /dev/sdb > > $ mount /dev/sdb /mnt/sdb > > > > $ xfs_io -f -c "pwrite -S 0xf1 0 64K" /mnt/sdb/foobar > > $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base > > $ btrfs send -f /tmp/1.snap /mnt/sdb/base > > > > # Create a 320K extent at file offset 512K. > > $ xfs_io -c "pwrite -S 0xab 512K 64K" /mnt/sdb/foobar > > $ xfs_io -c "pwrite -S 0xcd 576K 64K" /mnt/sdb/foobar > > $ xfs_io -c "pwrite -S 0xef 640K 64K" /mnt/sdb/foobar > > $ xfs_io -c "pwrite -S 0x64 704K 64K" /mnt/sdb/foobar > > $ xfs_io -c "pwrite -S 0x73 768K 64K" /mnt/sdb/foobar > > > > # Clone part of that 320K extent into a lower file offset (192K). > > # This file offset is greater than the file's size in the parent > > # snapshot (64K). Also the clone range is a bit behind the offset of > > # the 320K extent so that we leave a hole between the shared extents. > > $ xfs_io -c "reflink /mnt/sdb/foobar 448K 192K 192K" /mnt/sdb/foobar > > > > $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr > > $ btrfs send -p /mnt/sdb/base -f /tmp/2.snap /mnt/sdb/incr > > > > $ mkfs.btrfs -f /dev/sdc > > $ mount /dev/sdc /mnt/sdc > > > > $ btrfs receive -f /tmp/1.snap /mnt/sdc > > $ btrfs receive -f /tmp/2.snap /mnt/sdc > > ERROR: failed to clone extents to foobar: Invalid argument > > > > The problem is that after processing the extent at file offset 192K, send > > does not issue a write operation full of zeroes for the hole between that > > extent and the extent starting at file offset 520K (hole range from 384K > > to 512K), this is because the hole is at an offset larger the size of the > > file in the parent snapshot (384K > 64K). As a consequence the field > > 'cur_inode_next_write_offset' of the send context remains with a value of > > 384K when we start to process the extent at file offset 512K, which is the > > value set after processing the extent at offset 192K. > > > > This screws up the lookup of possible extents to clone because due to an > > incorrect value of 'cur_inode_next_write_offset' we can now consider > > extents for cloning, in the same inode, that lie beyond the current size > > of the file in the receiver of the send stream. Also, when checking if > > an extent in the same file can be used for cloning, we must also check > > that not only its start offset doesn't start at or beyond the current eof > > of the file in the receiver but that the source range doesn't go beyond > > current eof, that is we must check offset + length does not cross the > > current eof, as that makes clone operations fail with -EINVAL. > > > > So fix this by updating 'cur_inode_next_write_offset' whenever we start > > processing an extent and checking an extent's offset and length when > > considering it for cloning operations. > > > > A test case for fstests follows soon. > > > > Fixes: 11f2069c113e02 ("Btrfs: send, allow clone operations within the same file") > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > > Added to misc-next, with the tested-by tag, thanks. Hold on a bit, one hunk of the patch besides not being necessary, it causes problems with fallocated extents beyond i_size and trailing holes. I'll run some more tests to confirm the hunk is not needed and see if Craig can test it too in the meanwhile. Thanks.
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 091e5bc8c7ea..0b42dac8a35f 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -1269,7 +1269,8 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 root, void *ctx_) * destination of the stream. */ if (ino == bctx->cur_objectid && - offset >= bctx->sctx->cur_inode_next_write_offset) + offset + bctx->extent_len > + bctx->sctx->cur_inode_next_write_offset) return 0; } @@ -5804,6 +5805,18 @@ static int process_extent(struct send_ctx *sctx, } } + /* + * There might be a hole between the end of the last processed extent + * and this extent, and we may have not sent a write operation for that + * hole because it was not needed (range is beyond eof in the parent + * snapshot). So adjust the next write offset to the offset of this + * extent, as we want to make sure we don't do mistakes when checking if + * we can clone this extent from some other offset in this inode or when + * detecting if we need to issue a truncate operation when finishing the + * processing this inode. + */ + sctx->cur_inode_next_write_offset = key->offset; + ret = find_extent_clone(sctx, path, key->objectid, key->offset, sctx->cur_inode_size, &found_clone); if (ret != -ENOENT && ret < 0)