Btrfs: send, fix emission of invalid clone operations within the same file
diff mbox series

Message ID 20200124115204.4086-1-fdmanana@kernel.org
State New
Headers show
Series
  • Btrfs: send, fix emission of invalid clone operations within the same file
Related show

Commit Message

Filipe Manana Jan. 24, 2020, 11:52 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

When doing an incremental send and a file has extents shared with itself
at different file offsets, it's possible for send to emit clone operations
that will fail at the destination because the source range goes beyond the
file's current size. This happens when the file size has increased in the
send snapshot, there is a hole between the shared extents and both shared
extents are at file offsets which are greater the file's size in the
parent snapshot.

Example:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt/sdb

  $ xfs_io -f -c "pwrite -S 0xf1 0 64K" /mnt/sdb/foobar
  $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base
  $ btrfs send -f /tmp/1.snap /mnt/sdb/base

  # Create a 320K extent at file offset 512K.
  $ xfs_io -c "pwrite -S 0xab 512K 64K" /mnt/sdb/foobar
  $ xfs_io -c "pwrite -S 0xcd 576K 64K" /mnt/sdb/foobar
  $ xfs_io -c "pwrite -S 0xef 640K 64K" /mnt/sdb/foobar
  $ xfs_io -c "pwrite -S 0x64 704K 64K" /mnt/sdb/foobar
  $ xfs_io -c "pwrite -S 0x73 768K 64K" /mnt/sdb/foobar

  # Clone part of that 320K extent into a lower file offset (192K).
  # This file offset is greater than the file's size in the parent
  # snapshot (64K). Also the clone range is a bit behind the offset of
  # the 320K extent so that we leave a hole between the shared extents.
  $ xfs_io -c "reflink /mnt/sdb/foobar 448K 192K 192K" /mnt/sdb/foobar

  $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr
  $ btrfs send -p /mnt/sdb/base -f /tmp/2.snap /mnt/sdb/incr

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt/sdc

  $ btrfs receive -f /tmp/1.snap /mnt/sdc
  $ btrfs receive -f /tmp/2.snap /mnt/sdc
  ERROR: failed to clone extents to foobar: Invalid argument

The problem is that after processing the extent at file offset 192K, send
does not issue a write operation full of zeroes for the hole between that
extent and the extent starting at file offset 520K (hole range from 384K
to 512K), this is because the hole is at an offset larger the size of the
file in the parent snapshot (384K > 64K). As a consequence the field
'cur_inode_next_write_offset' of the send context remains with a value of
384K when we start to process the extent at file offset 512K, which is the
value set after processing the extent at offset 192K.

This screws up the lookup of possible extents to clone because due to an
incorrect value of 'cur_inode_next_write_offset' we can now consider
extents for cloning, in the same inode, that lie beyond the current size
of the file in the receiver of the send stream. Also, when checking if
an extent in the same file can be used for cloning, we must also check
that not only its start offset doesn't start at or beyond the current eof
of the file in the receiver but that the source range doesn't go beyond
current eof, that is we must check offset + length does not cross the
current eof, as that makes clone operations fail with -EINVAL.

So fix this by updating 'cur_inode_next_write_offset' whenever we start
processing an extent and checking an extent's offset and length when
considering it for cloning operations.

A test case for fstests follows soon.

Fixes: 11f2069c113e02 ("Btrfs: send, allow clone operations within the same file")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/send.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Comments

Josef Bacik Jan. 24, 2020, 1:53 p.m. UTC | #1
On 1/24/20 6:52 AM, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When doing an incremental send and a file has extents shared with itself
> at different file offsets, it's possible for send to emit clone operations
> that will fail at the destination because the source range goes beyond the
> file's current size. This happens when the file size has increased in the
> send snapshot, there is a hole between the shared extents and both shared
> extents are at file offsets which are greater the file's size in the
> parent snapshot.
> 
> Example:
> 
>    $ mkfs.btrfs -f /dev/sdb
>    $ mount /dev/sdb /mnt/sdb
> 
>    $ xfs_io -f -c "pwrite -S 0xf1 0 64K" /mnt/sdb/foobar
>    $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base
>    $ btrfs send -f /tmp/1.snap /mnt/sdb/base
> 
>    # Create a 320K extent at file offset 512K.
>    $ xfs_io -c "pwrite -S 0xab 512K 64K" /mnt/sdb/foobar
>    $ xfs_io -c "pwrite -S 0xcd 576K 64K" /mnt/sdb/foobar
>    $ xfs_io -c "pwrite -S 0xef 640K 64K" /mnt/sdb/foobar
>    $ xfs_io -c "pwrite -S 0x64 704K 64K" /mnt/sdb/foobar
>    $ xfs_io -c "pwrite -S 0x73 768K 64K" /mnt/sdb/foobar
> 
>    # Clone part of that 320K extent into a lower file offset (192K).
>    # This file offset is greater than the file's size in the parent
>    # snapshot (64K). Also the clone range is a bit behind the offset of
>    # the 320K extent so that we leave a hole between the shared extents.
>    $ xfs_io -c "reflink /mnt/sdb/foobar 448K 192K 192K" /mnt/sdb/foobar
> 
>    $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr
>    $ btrfs send -p /mnt/sdb/base -f /tmp/2.snap /mnt/sdb/incr
> 
>    $ mkfs.btrfs -f /dev/sdc
>    $ mount /dev/sdc /mnt/sdc
> 
>    $ btrfs receive -f /tmp/1.snap /mnt/sdc
>    $ btrfs receive -f /tmp/2.snap /mnt/sdc
>    ERROR: failed to clone extents to foobar: Invalid argument
> 
> The problem is that after processing the extent at file offset 192K, send
> does not issue a write operation full of zeroes for the hole between that
> extent and the extent starting at file offset 520K (hole range from 384K
> to 512K), this is because the hole is at an offset larger the size of the
> file in the parent snapshot (384K > 64K). As a consequence the field
> 'cur_inode_next_write_offset' of the send context remains with a value of
> 384K when we start to process the extent at file offset 512K, which is the
> value set after processing the extent at offset 192K.
> 
> This screws up the lookup of possible extents to clone because due to an
> incorrect value of 'cur_inode_next_write_offset' we can now consider
> extents for cloning, in the same inode, that lie beyond the current size
> of the file in the receiver of the send stream. Also, when checking if
> an extent in the same file can be used for cloning, we must also check
> that not only its start offset doesn't start at or beyond the current eof
> of the file in the receiver but that the source range doesn't go beyond
> current eof, that is we must check offset + length does not cross the
> current eof, as that makes clone operations fail with -EINVAL.
> 
> So fix this by updating 'cur_inode_next_write_offset' whenever we start
> processing an extent and checking an extent's offset and length when
> considering it for cloning operations.
> 
> A test case for fstests follows soon.
> 
> Fixes: 11f2069c113e02 ("Btrfs: send, allow clone operations within the same file")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
Filipe Manana Jan. 28, 2020, 3:07 p.m. UTC | #2
On Fri, Jan 24, 2020 at 11:54 AM <fdmanana@kernel.org> wrote:
>
> From: Filipe Manana <fdmanana@suse.com>
>
> When doing an incremental send and a file has extents shared with itself
> at different file offsets, it's possible for send to emit clone operations
> that will fail at the destination because the source range goes beyond the
> file's current size. This happens when the file size has increased in the
> send snapshot, there is a hole between the shared extents and both shared
> extents are at file offsets which are greater the file's size in the
> parent snapshot.
>
> Example:
>
>   $ mkfs.btrfs -f /dev/sdb
>   $ mount /dev/sdb /mnt/sdb
>
>   $ xfs_io -f -c "pwrite -S 0xf1 0 64K" /mnt/sdb/foobar
>   $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base
>   $ btrfs send -f /tmp/1.snap /mnt/sdb/base
>
>   # Create a 320K extent at file offset 512K.
>   $ xfs_io -c "pwrite -S 0xab 512K 64K" /mnt/sdb/foobar
>   $ xfs_io -c "pwrite -S 0xcd 576K 64K" /mnt/sdb/foobar
>   $ xfs_io -c "pwrite -S 0xef 640K 64K" /mnt/sdb/foobar
>   $ xfs_io -c "pwrite -S 0x64 704K 64K" /mnt/sdb/foobar
>   $ xfs_io -c "pwrite -S 0x73 768K 64K" /mnt/sdb/foobar
>
>   # Clone part of that 320K extent into a lower file offset (192K).
>   # This file offset is greater than the file's size in the parent
>   # snapshot (64K). Also the clone range is a bit behind the offset of
>   # the 320K extent so that we leave a hole between the shared extents.
>   $ xfs_io -c "reflink /mnt/sdb/foobar 448K 192K 192K" /mnt/sdb/foobar
>
>   $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr
>   $ btrfs send -p /mnt/sdb/base -f /tmp/2.snap /mnt/sdb/incr
>
>   $ mkfs.btrfs -f /dev/sdc
>   $ mount /dev/sdc /mnt/sdc
>
>   $ btrfs receive -f /tmp/1.snap /mnt/sdc
>   $ btrfs receive -f /tmp/2.snap /mnt/sdc
>   ERROR: failed to clone extents to foobar: Invalid argument
>
> The problem is that after processing the extent at file offset 192K, send
> does not issue a write operation full of zeroes for the hole between that
> extent and the extent starting at file offset 520K (hole range from 384K
> to 512K), this is because the hole is at an offset larger the size of the
> file in the parent snapshot (384K > 64K). As a consequence the field
> 'cur_inode_next_write_offset' of the send context remains with a value of
> 384K when we start to process the extent at file offset 512K, which is the
> value set after processing the extent at offset 192K.
>
> This screws up the lookup of possible extents to clone because due to an
> incorrect value of 'cur_inode_next_write_offset' we can now consider
> extents for cloning, in the same inode, that lie beyond the current size
> of the file in the receiver of the send stream. Also, when checking if
> an extent in the same file can be used for cloning, we must also check
> that not only its start offset doesn't start at or beyond the current eof
> of the file in the receiver but that the source range doesn't go beyond
> current eof, that is we must check offset + length does not cross the
> current eof, as that makes clone operations fail with -EINVAL.
>
> So fix this by updating 'cur_inode_next_write_offset' whenever we start
> processing an extent and checking an extent's offset and length when
> considering it for cloning operations.
>
> A test case for fstests follows soon.
>
> Fixes: 11f2069c113e02 ("Btrfs: send, allow clone operations within the same file")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Tested-by: Craig Andrews <candrews@integralblue.com>

(on behalf of Craig, see
https://lore.kernel.org/linux-btrfs/f2ca887d98c1b5aacc4dde88cba74d98@integralblue.com/)

> ---
>  fs/btrfs/send.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 091e5bc8c7ea..0b42dac8a35f 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -1269,7 +1269,8 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 root, void *ctx_)
>                  * destination of the stream.
>                  */
>                 if (ino == bctx->cur_objectid &&
> -                   offset >= bctx->sctx->cur_inode_next_write_offset)
> +                   offset + bctx->extent_len >
> +                   bctx->sctx->cur_inode_next_write_offset)
>                         return 0;
>         }
>
> @@ -5804,6 +5805,18 @@ static int process_extent(struct send_ctx *sctx,
>                 }
>         }
>
> +       /*
> +        * There might be a hole between the end of the last processed extent
> +        * and this extent, and we may have not sent a write operation for that
> +        * hole because it was not needed (range is beyond eof in the parent
> +        * snapshot). So adjust the next write offset to the offset of this
> +        * extent, as we want to make sure we don't do mistakes when checking if
> +        * we can clone this extent from some other offset in this inode or when
> +        * detecting if we need to issue a truncate operation when finishing the
> +        * processing this inode.
> +        */
> +       sctx->cur_inode_next_write_offset = key->offset;
> +
>         ret = find_extent_clone(sctx, path, key->objectid, key->offset,
>                         sctx->cur_inode_size, &found_clone);
>         if (ret != -ENOENT && ret < 0)
> --
> 2.11.0
>
David Sterba Jan. 28, 2020, 4:21 p.m. UTC | #3
On Fri, Jan 24, 2020 at 11:52:04AM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When doing an incremental send and a file has extents shared with itself
> at different file offsets, it's possible for send to emit clone operations
> that will fail at the destination because the source range goes beyond the
> file's current size. This happens when the file size has increased in the
> send snapshot, there is a hole between the shared extents and both shared
> extents are at file offsets which are greater the file's size in the
> parent snapshot.
> 
> Example:
> 
>   $ mkfs.btrfs -f /dev/sdb
>   $ mount /dev/sdb /mnt/sdb
> 
>   $ xfs_io -f -c "pwrite -S 0xf1 0 64K" /mnt/sdb/foobar
>   $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base
>   $ btrfs send -f /tmp/1.snap /mnt/sdb/base
> 
>   # Create a 320K extent at file offset 512K.
>   $ xfs_io -c "pwrite -S 0xab 512K 64K" /mnt/sdb/foobar
>   $ xfs_io -c "pwrite -S 0xcd 576K 64K" /mnt/sdb/foobar
>   $ xfs_io -c "pwrite -S 0xef 640K 64K" /mnt/sdb/foobar
>   $ xfs_io -c "pwrite -S 0x64 704K 64K" /mnt/sdb/foobar
>   $ xfs_io -c "pwrite -S 0x73 768K 64K" /mnt/sdb/foobar
> 
>   # Clone part of that 320K extent into a lower file offset (192K).
>   # This file offset is greater than the file's size in the parent
>   # snapshot (64K). Also the clone range is a bit behind the offset of
>   # the 320K extent so that we leave a hole between the shared extents.
>   $ xfs_io -c "reflink /mnt/sdb/foobar 448K 192K 192K" /mnt/sdb/foobar
> 
>   $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr
>   $ btrfs send -p /mnt/sdb/base -f /tmp/2.snap /mnt/sdb/incr
> 
>   $ mkfs.btrfs -f /dev/sdc
>   $ mount /dev/sdc /mnt/sdc
> 
>   $ btrfs receive -f /tmp/1.snap /mnt/sdc
>   $ btrfs receive -f /tmp/2.snap /mnt/sdc
>   ERROR: failed to clone extents to foobar: Invalid argument
> 
> The problem is that after processing the extent at file offset 192K, send
> does not issue a write operation full of zeroes for the hole between that
> extent and the extent starting at file offset 520K (hole range from 384K
> to 512K), this is because the hole is at an offset larger the size of the
> file in the parent snapshot (384K > 64K). As a consequence the field
> 'cur_inode_next_write_offset' of the send context remains with a value of
> 384K when we start to process the extent at file offset 512K, which is the
> value set after processing the extent at offset 192K.
> 
> This screws up the lookup of possible extents to clone because due to an
> incorrect value of 'cur_inode_next_write_offset' we can now consider
> extents for cloning, in the same inode, that lie beyond the current size
> of the file in the receiver of the send stream. Also, when checking if
> an extent in the same file can be used for cloning, we must also check
> that not only its start offset doesn't start at or beyond the current eof
> of the file in the receiver but that the source range doesn't go beyond
> current eof, that is we must check offset + length does not cross the
> current eof, as that makes clone operations fail with -EINVAL.
> 
> So fix this by updating 'cur_inode_next_write_offset' whenever we start
> processing an extent and checking an extent's offset and length when
> considering it for cloning operations.
> 
> A test case for fstests follows soon.
> 
> Fixes: 11f2069c113e02 ("Btrfs: send, allow clone operations within the same file")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Added to misc-next, with the tested-by tag, thanks.
Filipe Manana Jan. 28, 2020, 5:35 p.m. UTC | #4
On Tue, Jan 28, 2020 at 4:22 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Fri, Jan 24, 2020 at 11:52:04AM +0000, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When doing an incremental send and a file has extents shared with itself
> > at different file offsets, it's possible for send to emit clone operations
> > that will fail at the destination because the source range goes beyond the
> > file's current size. This happens when the file size has increased in the
> > send snapshot, there is a hole between the shared extents and both shared
> > extents are at file offsets which are greater the file's size in the
> > parent snapshot.
> >
> > Example:
> >
> >   $ mkfs.btrfs -f /dev/sdb
> >   $ mount /dev/sdb /mnt/sdb
> >
> >   $ xfs_io -f -c "pwrite -S 0xf1 0 64K" /mnt/sdb/foobar
> >   $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base
> >   $ btrfs send -f /tmp/1.snap /mnt/sdb/base
> >
> >   # Create a 320K extent at file offset 512K.
> >   $ xfs_io -c "pwrite -S 0xab 512K 64K" /mnt/sdb/foobar
> >   $ xfs_io -c "pwrite -S 0xcd 576K 64K" /mnt/sdb/foobar
> >   $ xfs_io -c "pwrite -S 0xef 640K 64K" /mnt/sdb/foobar
> >   $ xfs_io -c "pwrite -S 0x64 704K 64K" /mnt/sdb/foobar
> >   $ xfs_io -c "pwrite -S 0x73 768K 64K" /mnt/sdb/foobar
> >
> >   # Clone part of that 320K extent into a lower file offset (192K).
> >   # This file offset is greater than the file's size in the parent
> >   # snapshot (64K). Also the clone range is a bit behind the offset of
> >   # the 320K extent so that we leave a hole between the shared extents.
> >   $ xfs_io -c "reflink /mnt/sdb/foobar 448K 192K 192K" /mnt/sdb/foobar
> >
> >   $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr
> >   $ btrfs send -p /mnt/sdb/base -f /tmp/2.snap /mnt/sdb/incr
> >
> >   $ mkfs.btrfs -f /dev/sdc
> >   $ mount /dev/sdc /mnt/sdc
> >
> >   $ btrfs receive -f /tmp/1.snap /mnt/sdc
> >   $ btrfs receive -f /tmp/2.snap /mnt/sdc
> >   ERROR: failed to clone extents to foobar: Invalid argument
> >
> > The problem is that after processing the extent at file offset 192K, send
> > does not issue a write operation full of zeroes for the hole between that
> > extent and the extent starting at file offset 520K (hole range from 384K
> > to 512K), this is because the hole is at an offset larger the size of the
> > file in the parent snapshot (384K > 64K). As a consequence the field
> > 'cur_inode_next_write_offset' of the send context remains with a value of
> > 384K when we start to process the extent at file offset 512K, which is the
> > value set after processing the extent at offset 192K.
> >
> > This screws up the lookup of possible extents to clone because due to an
> > incorrect value of 'cur_inode_next_write_offset' we can now consider
> > extents for cloning, in the same inode, that lie beyond the current size
> > of the file in the receiver of the send stream. Also, when checking if
> > an extent in the same file can be used for cloning, we must also check
> > that not only its start offset doesn't start at or beyond the current eof
> > of the file in the receiver but that the source range doesn't go beyond
> > current eof, that is we must check offset + length does not cross the
> > current eof, as that makes clone operations fail with -EINVAL.
> >
> > So fix this by updating 'cur_inode_next_write_offset' whenever we start
> > processing an extent and checking an extent's offset and length when
> > considering it for cloning operations.
> >
> > A test case for fstests follows soon.
> >
> > Fixes: 11f2069c113e02 ("Btrfs: send, allow clone operations within the same file")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Added to misc-next, with the tested-by tag, thanks.

Hold on a bit, one hunk of the patch besides not being necessary, it
causes problems with fallocated extents beyond i_size and trailing
holes.
I'll run some more tests to confirm the hunk is not needed and see if
Craig can test it too in the meanwhile.

Thanks.

Patch
diff mbox series

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 091e5bc8c7ea..0b42dac8a35f 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -1269,7 +1269,8 @@  static int __iterate_backrefs(u64 ino, u64 offset, u64 root, void *ctx_)
 		 * destination of the stream.
 		 */
 		if (ino == bctx->cur_objectid &&
-		    offset >= bctx->sctx->cur_inode_next_write_offset)
+		    offset + bctx->extent_len >
+		    bctx->sctx->cur_inode_next_write_offset)
 			return 0;
 	}
 
@@ -5804,6 +5805,18 @@  static int process_extent(struct send_ctx *sctx,
 		}
 	}
 
+	/*
+	 * There might be a hole between the end of the last processed extent
+	 * and this extent, and we may have not sent a write operation for that
+	 * hole because it was not needed (range is beyond eof in the parent
+	 * snapshot). So adjust the next write offset to the offset of this
+	 * extent, as we want to make sure we don't do mistakes when checking if
+	 * we can clone this extent from some other offset in this inode or when
+	 * detecting if we need to issue a truncate operation when finishing the
+	 * processing this inode.
+	 */
+	sctx->cur_inode_next_write_offset = key->offset;
+
 	ret = find_extent_clone(sctx, path, key->objectid, key->offset,
 			sctx->cur_inode_size, &found_clone);
 	if (ret != -ENOENT && ret < 0)