diff mbox series

[v2] btrfs: Remove received information from snapshot on ro->rw switch

Message ID 20210908135135.1474055-1-nborisov@suse.com (mailing list archive)
State New, archived
Headers show
Series [v2] btrfs: Remove received information from snapshot on ro->rw switch | expand

Commit Message

Nikolay Borisov Sept. 8, 2021, 1:51 p.m. UTC
Currently when a read-only snapshot is received and subsequently its
ro property is set to false i.e. switched to rw-mode the
received_uuid/stime/rtime/stransid/rtransid of that subvol remains
intact. However, once the received volume is switched to RW mode we
cannot guaranteee that it contains the same data, so it makes sense
to remove those fields which indicate this volume was ever
send/received. Additionally, sending such volume can cause conflicts
due to the presence of received_uuid.

Preserving the received_uuid (and related fields) after transitioning the
snapshot from RO to RW and then changing the snapshot has a potential for
causing send to fail in many unexpected ways if we later turn it back to
RO and use it for an incremental send operation.

A recent example, in the thread to which the Link tag below points to, we
had a user with a filesystem that had multiple snapshots with the same
received_uuid but with different content due to a transition from RO to RW
and then back to RO. When an incremental send was attempted using two of
those snapshots, it resulted in send emitting a clone operation that was
intended to clone from the parent root to the send root - however because
both roots had the same received_uuid, the receiver ended up picking the
send root as the source root for the clone operation, which happened to
result in the clone operation to fail with -EINVAL, due to the source and
destination offsets being the same (and on the same root and file). In this
particular case there was no harm, but we could end up in a case where the
clone operation succeeds but clones wrong data due to picking up an
incorrect root - as in the case of multiple snapshots with the same
received_uuid, it has no way to know which one is the correct one if they
have different content.

Link: https://lore.kernel.org/linux-btrfs/CAOaVUnV3L6RpcqJ5gaqzNXWXK0VMkEVXCdihawH1PgS6TiMchQ@mail.gmail.com/
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Suggested-by: David Sterba <dsterba@suse.cz>
---
 fs/btrfs/ioctl.c | 41 +++++++++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 6 deletions(-)

Comments

Filipe Manana Sept. 8, 2021, 2:08 p.m. UTC | #1
On Wed, Sep 8, 2021 at 2:53 PM Nikolay Borisov <nborisov@suse.com> wrote:
>
> Currently when a read-only snapshot is received and subsequently its
> ro property is set to false i.e. switched to rw-mode the
> received_uuid/stime/rtime/stransid/rtransid of that subvol remains
> intact. However, once the received volume is switched to RW mode we
> cannot guaranteee that it contains the same data, so it makes sense
> to remove those fields which indicate this volume was ever
> send/received. Additionally, sending such volume can cause conflicts
> due to the presence of received_uuid.
>
> Preserving the received_uuid (and related fields) after transitioning the
> snapshot from RO to RW and then changing the snapshot has a potential for
> causing send to fail in many unexpected ways if we later turn it back to
> RO and use it for an incremental send operation.
>
> A recent example, in the thread to which the Link tag below points to, we
> had a user with a filesystem that had multiple snapshots with the same
> received_uuid but with different content due to a transition from RO to RW
> and then back to RO. When an incremental send was attempted using two of
> those snapshots, it resulted in send emitting a clone operation that was
> intended to clone from the parent root to the send root - however because
> both roots had the same received_uuid, the receiver ended up picking the
> send root as the source root for the clone operation, which happened to
> result in the clone operation to fail with -EINVAL, due to the source and
> destination offsets being the same (and on the same root and file). In this
> particular case there was no harm, but we could end up in a case where the
> clone operation succeeds but clones wrong data due to picking up an
> incorrect root - as in the case of multiple snapshots with the same
> received_uuid, it has no way to know which one is the correct one if they
> have different content.
>
> Link: https://lore.kernel.org/linux-btrfs/CAOaVUnV3L6RpcqJ5gaqzNXWXK0VMkEVXCdihawH1PgS6TiMchQ@mail.gmail.com/
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> Suggested-by: David Sterba <dsterba@suse.cz>

Thanks, looks good.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

> ---
>  fs/btrfs/ioctl.c | 41 +++++++++++++++++++++++++++++++++++------
>  1 file changed, 35 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 9eb0c1eb568e..67709d274489 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1927,9 +1927,11 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>         struct inode *inode = file_inode(file);
>         struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>         struct btrfs_root *root = BTRFS_I(inode)->root;
> +       struct btrfs_root_item *root_item = &root->root_item;
>         struct btrfs_trans_handle *trans;
>         u64 root_flags;
>         u64 flags;
> +       bool clear_received_state = false;
>         int ret = 0;
>
>         if (!inode_owner_or_capable(file_mnt_user_ns(file), inode))
> @@ -1960,9 +1962,9 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>         if (!!(flags & BTRFS_SUBVOL_RDONLY) == btrfs_root_readonly(root))
>                 goto out_drop_sem;
>
> -       root_flags = btrfs_root_flags(&root->root_item);
> +       root_flags = btrfs_root_flags(root_item);
>         if (flags & BTRFS_SUBVOL_RDONLY) {
> -               btrfs_set_root_flags(&root->root_item,
> +               btrfs_set_root_flags(root_item,
>                                      root_flags | BTRFS_ROOT_SUBVOL_RDONLY);
>         } else {
>                 /*
> @@ -1971,9 +1973,10 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>                  */
>                 spin_lock(&root->root_item_lock);
>                 if (root->send_in_progress == 0) {
> -                       btrfs_set_root_flags(&root->root_item,
> +                       btrfs_set_root_flags(root_item,
>                                      root_flags & ~BTRFS_ROOT_SUBVOL_RDONLY);
>                         spin_unlock(&root->root_item_lock);
> +                       clear_received_state = true;
>                 } else {
>                         spin_unlock(&root->root_item_lock);
>                         btrfs_warn(fs_info,
> @@ -1984,14 +1987,40 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>                 }
>         }
>
> -       trans = btrfs_start_transaction(root, 1);
> +       /*
> +        * 1 item for updating the uuid root in the root tree
> +        * 1 item for actually removing the uuid record in the uuid tree
> +        */
> +       trans = btrfs_start_transaction(root, 2);
>         if (IS_ERR(trans)) {
>                 ret = PTR_ERR(trans);
>                 goto out_reset;
>         }
>
> -       ret = btrfs_update_root(trans, fs_info->tree_root,
> -                               &root->root_key, &root->root_item);
> +       if (clear_received_state &&
> +           !btrfs_is_empty_uuid(root_item->received_uuid)) {
> +
> +               ret = btrfs_uuid_tree_remove(trans, root_item->received_uuid,
> +                                            BTRFS_UUID_KEY_RECEIVED_SUBVOL,
> +                                            root->root_key.objectid);
> +
> +               if (ret && ret != -ENOENT) {
> +                       btrfs_abort_transaction(trans, ret);
> +                       btrfs_end_transaction(trans);
> +                       goto out_reset;
> +               }
> +
> +               memset(root_item->received_uuid, 0, BTRFS_UUID_SIZE);
> +               btrfs_set_root_stransid(root_item, 0);
> +               btrfs_set_root_rtransid(root_item, 0);
> +               btrfs_set_stack_timespec_sec(&root_item->stime, 0);
> +               btrfs_set_stack_timespec_nsec(&root_item->stime, 0);
> +               btrfs_set_stack_timespec_sec(&root_item->rtime, 0);
> +               btrfs_set_stack_timespec_nsec(&root_item->rtime, 0);
> +       }
> +
> +       ret = btrfs_update_root(trans, fs_info->tree_root, &root->root_key,
> +                               root_item);
>         if (ret < 0) {
>                 btrfs_end_transaction(trans);
>                 goto out_reset;
> --
> 2.25.1
>
Martin Raiber Sept. 8, 2021, 4:34 p.m. UTC | #2
On 08.09.2021 15:51 Nikolay Borisov wrote:
> Currently when a read-only snapshot is received and subsequently its
> ro property is set to false i.e. switched to rw-mode the
> received_uuid/stime/rtime/stransid/rtransid of that subvol remains
> intact. However, once the received volume is switched to RW mode we
> cannot guaranteee that it contains the same data, so it makes sense
> to remove those fields which indicate this volume was ever
> send/received. Additionally, sending such volume can cause conflicts
> due to the presence of received_uuid.
>
> Preserving the received_uuid (and related fields) after transitioning the
> snapshot from RO to RW and then changing the snapshot has a potential for
> causing send to fail in many unexpected ways if we later turn it back to
> RO and use it for an incremental send operation.
>
> A recent example, in the thread to which the Link tag below points to, we
> had a user with a filesystem that had multiple snapshots with the same
> received_uuid but with different content due to a transition from RO to RW
> and then back to RO. When an incremental send was attempted using two of
> those snapshots, it resulted in send emitting a clone operation that was
> intended to clone from the parent root to the send root - however because
> both roots had the same received_uuid, the receiver ended up picking the
> send root as the source root for the clone operation, which happened to
> result in the clone operation to fail with -EINVAL, due to the source and
> destination offsets being the same (and on the same root and file). In this
> particular case there was no harm, but we could end up in a case where the
> clone operation succeeds but clones wrong data due to picking up an
> incorrect root - as in the case of multiple snapshots with the same
> received_uuid, it has no way to know which one is the correct one if they
> have different content.
Not to overly discourage this change but I think this will cause some issues in user space.

For example I had the problem of partial subvols after a sudden restart during receive. No problem, just receive to a directory that gets deleted on startup then move the subvol to the final location after completion. To move the subvol it needs to be temporarily set rw for some reason. I'm sure there is some better solution but patterns like this might be out there.

>
> Link: https://lore.kernel.org/linux-btrfs/CAOaVUnV3L6RpcqJ5gaqzNXWXK0VMkEVXCdihawH1PgS6TiMchQ@mail.gmail.com/
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> Suggested-by: David Sterba <dsterba@suse.cz>
> ---
>  fs/btrfs/ioctl.c | 41 +++++++++++++++++++++++++++++++++++------
>  1 file changed, 35 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 9eb0c1eb568e..67709d274489 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1927,9 +1927,11 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>  	struct inode *inode = file_inode(file);
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
> +	struct btrfs_root_item *root_item = &root->root_item;
>  	struct btrfs_trans_handle *trans;
>  	u64 root_flags;
>  	u64 flags;
> +	bool clear_received_state = false;
>  	int ret = 0;
>  
>  	if (!inode_owner_or_capable(file_mnt_user_ns(file), inode))
> @@ -1960,9 +1962,9 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>  	if (!!(flags & BTRFS_SUBVOL_RDONLY) == btrfs_root_readonly(root))
>  		goto out_drop_sem;
>  
> -	root_flags = btrfs_root_flags(&root->root_item);
> +	root_flags = btrfs_root_flags(root_item);
>  	if (flags & BTRFS_SUBVOL_RDONLY) {
> -		btrfs_set_root_flags(&root->root_item,
> +		btrfs_set_root_flags(root_item,
>  				     root_flags | BTRFS_ROOT_SUBVOL_RDONLY);
>  	} else {
>  		/*
> @@ -1971,9 +1973,10 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>  		 */
>  		spin_lock(&root->root_item_lock);
>  		if (root->send_in_progress == 0) {
> -			btrfs_set_root_flags(&root->root_item,
> +			btrfs_set_root_flags(root_item,
>  				     root_flags & ~BTRFS_ROOT_SUBVOL_RDONLY);
>  			spin_unlock(&root->root_item_lock);
> +			clear_received_state = true;
>  		} else {
>  			spin_unlock(&root->root_item_lock);
>  			btrfs_warn(fs_info,
> @@ -1984,14 +1987,40 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>  		}
>  	}
>  
> -	trans = btrfs_start_transaction(root, 1);
> +	/*
> +	 * 1 item for updating the uuid root in the root tree
> +	 * 1 item for actually removing the uuid record in the uuid tree
> +	 */
> +	trans = btrfs_start_transaction(root, 2);
>  	if (IS_ERR(trans)) {
>  		ret = PTR_ERR(trans);
>  		goto out_reset;
>  	}
>  
> -	ret = btrfs_update_root(trans, fs_info->tree_root,
> -				&root->root_key, &root->root_item);
> +	if (clear_received_state &&
> +	    !btrfs_is_empty_uuid(root_item->received_uuid)) {
> +
> +		ret = btrfs_uuid_tree_remove(trans, root_item->received_uuid,
> +					     BTRFS_UUID_KEY_RECEIVED_SUBVOL,
> +					     root->root_key.objectid);
> +
> +		if (ret && ret != -ENOENT) {
> +			btrfs_abort_transaction(trans, ret);
> +			btrfs_end_transaction(trans);
> +			goto out_reset;
> +		}
> +
> +		memset(root_item->received_uuid, 0, BTRFS_UUID_SIZE);
> +		btrfs_set_root_stransid(root_item, 0);
> +		btrfs_set_root_rtransid(root_item, 0);
> +		btrfs_set_stack_timespec_sec(&root_item->stime, 0);
> +		btrfs_set_stack_timespec_nsec(&root_item->stime, 0);
> +		btrfs_set_stack_timespec_sec(&root_item->rtime, 0);
> +		btrfs_set_stack_timespec_nsec(&root_item->rtime, 0);
> +	}
> +
> +	ret = btrfs_update_root(trans, fs_info->tree_root, &root->root_key,
> +				root_item);
>  	if (ret < 0) {
>  		btrfs_end_transaction(trans);
>  		goto out_reset;
David Sterba Sept. 8, 2021, 6:33 p.m. UTC | #3
On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote:
> On 08.09.2021 15:51 Nikolay Borisov wrote:
> > Currently when a read-only snapshot is received and subsequently its
> > ro property is set to false i.e. switched to rw-mode the
> > received_uuid/stime/rtime/stransid/rtransid of that subvol remains
> > intact. However, once the received volume is switched to RW mode we
> > cannot guaranteee that it contains the same data, so it makes sense
> > to remove those fields which indicate this volume was ever
> > send/received. Additionally, sending such volume can cause conflicts
> > due to the presence of received_uuid.
> >
> > Preserving the received_uuid (and related fields) after transitioning the
> > snapshot from RO to RW and then changing the snapshot has a potential for
> > causing send to fail in many unexpected ways if we later turn it back to
> > RO and use it for an incremental send operation.
> >
> > A recent example, in the thread to which the Link tag below points to, we
> > had a user with a filesystem that had multiple snapshots with the same
> > received_uuid but with different content due to a transition from RO to RW
> > and then back to RO. When an incremental send was attempted using two of
> > those snapshots, it resulted in send emitting a clone operation that was
> > intended to clone from the parent root to the send root - however because
> > both roots had the same received_uuid, the receiver ended up picking the
> > send root as the source root for the clone operation, which happened to
> > result in the clone operation to fail with -EINVAL, due to the source and
> > destination offsets being the same (and on the same root and file). In this
> > particular case there was no harm, but we could end up in a case where the
> > clone operation succeeds but clones wrong data due to picking up an
> > incorrect root - as in the case of multiple snapshots with the same
> > received_uuid, it has no way to know which one is the correct one if they
> > have different content.
> Not to overly discourage this change but I think this will cause some
> issues in user space.

That this change can cause issues for users is the reason why it hasn't
been merged. The change on the kernel side is simple, but I've been
missing the progs part and the "what-if"s that happen in practice.

This hasn't been communicated properly so we've got resends without
changes. I had a chat with Nikolay about what's missing so hopefully we
can move forward this time.

> For example I had the problem of partial subvols after a sudden
> restart during receive. No problem, just receive to a directory that
> gets deleted on startup then move the subvol to the final location
> after completion. To move the subvol it needs to be temporarily set rw
> for some reason. I'm sure there is some better solution but patterns
> like this might be out there.

Thanks, that's a case we should take into account. And there are
probably more, but I'm not using send/receive much so that's another
reason why I've been hesitant to merge the patch due to lack of
understanding of the use case.
Graham Cobb Sept. 8, 2021, 9:24 p.m. UTC | #4
On 08/09/2021 19:33, David Sterba wrote:
> On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote:

<snip>

>> For example I had the problem of partial subvols after a sudden
>> restart during receive. No problem, just receive to a directory that
>> gets deleted on startup then move the subvol to the final location
>> after completion. To move the subvol it needs to be temporarily set rw
>> for some reason. I'm sure there is some better solution but patterns
>> like this might be out there.
> 
> Thanks, that's a case we should take into account. And there are
> probably more, but I'm not using send/receive much so that's another
> reason why I've been hesitant to merge the patch due to lack of
> understanding of the use case.
> 

This seems to be an important change to make, but is user-affecting. A
few ideas:

1) Can it be made optional? On by default but possible to turn off
(mount option, sys file, ...) if you really need to retain the current
behaviour.

2) Is there a way to engage with the developers and user community for
popular tools which make heavy use of snapshotting and/or send/receive
for discussion and testing? For example, btrbk, snapper, distros.

3) How about adding an IOCTL to allow the user to set/delete the
received_uuid? Only intended for cases which really need to emulate the
removed behaviour. This could be a less complex long term solution than
keeping option 1 indefinitely.

Graham
Nikolay Borisov Sept. 9, 2021, 6:46 a.m. UTC | #5
On 9.09.21 г. 0:24, Graham Cobb wrote:
> 
> On 08/09/2021 19:33, David Sterba wrote:
>> On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote:
> 
> <snip>
> 
>>> For example I had the problem of partial subvols after a sudden
>>> restart during receive. No problem, just receive to a directory that
>>> gets deleted on startup then move the subvol to the final location
>>> after completion. To move the subvol it needs to be temporarily set rw
>>> for some reason. I'm sure there is some better solution but patterns
>>> like this might be out there.
>>
>> Thanks, that's a case we should take into account. And there are
>> probably more, but I'm not using send/receive much so that's another
>> reason why I've been hesitant to merge the patch due to lack of
>> understanding of the use case.
>>
> 
> This seems to be an important change to make, but is user-affecting. A
> few ideas:
> 
> 1) Can it be made optional? On by default but possible to turn off
> (mount option, sys file, ...) if you really need to retain the current
> behaviour.

But the current behavior is buggy and non-intentional so it should be
rectified. Basically we've made it easy for users to do something which
they shouldn't really be doing, they then bear the consequences and come
to the mailing list for support thinking something is broken.

> 
> 2) Is there a way to engage with the developers and user community for
> popular tools which make heavy use of snapshotting and/or send/receive
> for discussion and testing? For example, btrbk, snapper, distros.
> 
> 3) How about adding an IOCTL to allow the user to set/delete the
> received_uuid? Only intended for cases which really need to emulate the
> removed behaviour. This could be a less complex long term solution than
> keeping option 1 indefinitely.
> 
> Graham
>
Filipe Manana Sept. 9, 2021, 8:22 a.m. UTC | #6
On Wed, Sep 8, 2021 at 7:35 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote:
> > On 08.09.2021 15:51 Nikolay Borisov wrote:
> > > Currently when a read-only snapshot is received and subsequently its
> > > ro property is set to false i.e. switched to rw-mode the
> > > received_uuid/stime/rtime/stransid/rtransid of that subvol remains
> > > intact. However, once the received volume is switched to RW mode we
> > > cannot guaranteee that it contains the same data, so it makes sense
> > > to remove those fields which indicate this volume was ever
> > > send/received. Additionally, sending such volume can cause conflicts
> > > due to the presence of received_uuid.
> > >
> > > Preserving the received_uuid (and related fields) after transitioning the
> > > snapshot from RO to RW and then changing the snapshot has a potential for
> > > causing send to fail in many unexpected ways if we later turn it back to
> > > RO and use it for an incremental send operation.
> > >
> > > A recent example, in the thread to which the Link tag below points to, we
> > > had a user with a filesystem that had multiple snapshots with the same
> > > received_uuid but with different content due to a transition from RO to RW
> > > and then back to RO. When an incremental send was attempted using two of
> > > those snapshots, it resulted in send emitting a clone operation that was
> > > intended to clone from the parent root to the send root - however because
> > > both roots had the same received_uuid, the receiver ended up picking the
> > > send root as the source root for the clone operation, which happened to
> > > result in the clone operation to fail with -EINVAL, due to the source and
> > > destination offsets being the same (and on the same root and file). In this
> > > particular case there was no harm, but we could end up in a case where the
> > > clone operation succeeds but clones wrong data due to picking up an
> > > incorrect root - as in the case of multiple snapshots with the same
> > > received_uuid, it has no way to know which one is the correct one if they
> > > have different content.
> > Not to overly discourage this change but I think this will cause some
> > issues in user space.
>
> That this change can cause issues for users is the reason why it hasn't
> been merged. The change on the kernel side is simple, but I've been
> missing the progs part and the "what-if"s that happen in practice.
>
> This hasn't been communicated properly so we've got resends without
> changes. I had a chat with Nikolay about what's missing so hopefully we
> can move forward this time.

Any reason why that wasn't shared in the past?

>
> > For example I had the problem of partial subvols after a sudden
> > restart during receive. No problem, just receive to a directory that
> > gets deleted on startup then move the subvol to the final location
> > after completion. To move the subvol it needs to be temporarily set rw
> > for some reason. I'm sure there is some better solution but patterns
> > like this might be out there.
>
> Thanks, that's a case we should take into account. And there are
> probably more, but I'm not using send/receive much so that's another
> reason why I've been hesitant to merge the patch due to lack of
> understanding of the use case.

So this has been going on for years, and it's well known that changing
a snapshot to RW, do changes on it, then back to RO and then use it
for incremental sends, often causes problems.

Most of the time, when there is a problem, the receive operation
simply fails somewhere, and it's not obvious what went wrong.
In last week's case I spent quite a considerable amount of time
looking at why a clone operation was failing, convinced the problem
was due to a bug in the algorithm of the sending side.

But looking at that what scares the most is not those cases that
result in an error - it's the cases where the incremental send
succeeds but results in a silent corruption, like cloning from the
wrong root, because there are multiple snapshots with the same
received_uuid but different content. For the cloning case, we can work
around and disable cloning completely for incremental sends - not
ideal, but at least no silent corruptions due to cloning. For all the
other cases not related to cloning, I have no idea how to prevent
them.

Yes, cases like Martin's are unfortunate and there's no easy way
around. But having such failures, and especially silent corruption, is
even worse IMO.
Do you have suggestions on how to proceed? How would you solve the problem?

Thanks.
Graham Cobb Sept. 9, 2021, 9:37 a.m. UTC | #7
On 09/09/2021 07:46, Nikolay Borisov wrote:
> 
> 
> On 9.09.21 г. 0:24, Graham Cobb wrote:
>>
>> On 08/09/2021 19:33, David Sterba wrote:
>>> On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote:
>>
>> <snip>
>>
>>>> For example I had the problem of partial subvols after a sudden
>>>> restart during receive. No problem, just receive to a directory that
>>>> gets deleted on startup then move the subvol to the final location
>>>> after completion. To move the subvol it needs to be temporarily set rw
>>>> for some reason. I'm sure there is some better solution but patterns
>>>> like this might be out there.
>>>
>>> Thanks, that's a case we should take into account. And there are
>>> probably more, but I'm not using send/receive much so that's another
>>> reason why I've been hesitant to merge the patch due to lack of
>>> understanding of the use case.
>>>
>>
>> This seems to be an important change to make, but is user-affecting. A
>> few ideas:
>>
>> 1) Can it be made optional? On by default but possible to turn off
>> (mount option, sys file, ...) if you really need to retain the current
>> behaviour.
> 
> But the current behavior is buggy and non-intentional so it should be
> rectified. Basically we've made it easy for users to do something which
> they shouldn't really be doing, they then bear the consequences and come
> to the mailing list for support thinking something is broken.

Yes, I agree completely. I was disappointed the change wasn't made last
time this was discussed on the list. But it wasn't. And I think that was
because of concern over the impact: the change will break some users'
workflows or scripts and could break some important tools, apps or
things like distro installation/upgrade scripts. That was the purpose of
my suggestions: to break down barriers which might delay making this happen.

>> 2) Is there a way to engage with the developers and user community for
>> popular tools which make heavy use of snapshotting and/or send/receive
>> for discussion and testing? For example, btrbk, snapper, distros.

The point of this suggestion was to address David's concern of not
understanding the use case. This could be useful when discussing the
timing of the fix (and whether it can be backported to stable kernels).

>> 3) How about adding an IOCTL to allow the user to set/delete the
>> received_uuid? Only intended for cases which really need to emulate the
>> removed behaviour. This could be a less complex long term solution than
>> keeping option 1 indefinitely.

And this suggestion was to make it "possible" to work round the change
but, in practice, harder than just doing the right thing :-) I agree
this is probably too far.

Graham
Andrei Borzenkov Sept. 9, 2021, 12:24 p.m. UTC | #8
On 09.09.2021 00:24, Graham Cobb wrote:
> 
> On 08/09/2021 19:33, David Sterba wrote:
>> On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote:
> 
> <snip>
> 
>>> For example I had the problem of partial subvols after a sudden
>>> restart during receive. No problem, just receive to a directory that
>>> gets deleted on startup then move the subvol to the final location
>>> after completion. To move the subvol it needs to be temporarily set rw
>>> for some reason. I'm sure there is some better solution but patterns
>>> like this might be out there.
>>

Better solution is to delete partial subvolume and start over. No need
to mess with internals.

>> Thanks, that's a case we should take into account. And there are
>> probably more, but I'm not using send/receive much so that's another
>> reason why I've been hesitant to merge the patch due to lack of
>> understanding of the use case.
>>
> 
> This seems to be an important change to make, but is user-affecting. A
> few ideas:
> 
> 1) Can it be made optional? On by default but possible to turn off
> (mount option, sys file, ...) if you really need to retain the current
> behaviour.
> 

And then next btrfs update will start modifying read-write subvolume
when you do not expect it and we are back on square one.

> 2) Is there a way to engage with the developers and user community for
> popular tools which make heavy use of snapshotting and/or send/receive
> for discussion and testing? For example, btrbk, snapper, distros.
> 

The only reason you need possibility to flip read-only bit is to allow
btrfs receive to write to subvolume and mark it read-only after that.
What should have happened instead is subvolume state where only one
process (btrfs receive) is allowed to write and subvolume is
automatically removed if this process dies before having completed.

Even better would be support for checkpoints and restarts.

But current situation is really the worst possible.

> 3) How about adding an IOCTL to allow the user to set/delete the
> received_uuid? Only intended for cases which really need to emulate the
> removed behaviour. This could be a less complex long term solution than
> keeping option 1 indefinitely.
> 

This IOCTL already exists and it is used by btrfs receive but it does
not really solve anything. Why do you expect users to remember to use
this IOCTL if users do not remember to never switch read-only subvolume
that is part of replication to read-write?

tw:/home/bor/python-btrfs/examples # btrfs sub sh /mnt/rcv2/src-snap1/
rcv2/src-snap1
	Name: 			src-snap1
	UUID: 			05bb994c-605d-3042-b45a-1ba9d88822ea
	Parent UUID: 		-
	Received UUID: 		213b224e-016e-0b43-a681-d534027cfe95
	Creation time: 		2021-09-01 20:30:35 +0300
	Subvolume ID: 		263
	Generation: 		118
	Gen at creation: 	114
	Parent ID: 		262
	Top level ID: 		262
	Flags: 			readonly
	Snapshot(s):
				src-snap1-clone
				src-snap1-clone1
tw:/home/bor/python-btrfs/examples # btrfs property set
/mnt/rcv2/src-snap1/ ro false
tw:/home/bor/python-btrfs/examples # btrfs sub sh /mnt/rcv2/src-snap1/
rcv2/src-snap1
	Name: 			src-snap1
	UUID: 			05bb994c-605d-3042-b45a-1ba9d88822ea
	Parent UUID: 		-
	Received UUID: 		213b224e-016e-0b43-a681-d534027cfe95
	Creation time: 		2021-09-01 20:30:35 +0300
	Subvolume ID: 		263
	Generation: 		118
	Gen at creation: 	114
	Parent ID: 		262
	Top level ID: 		262
	Flags: 			-
	Snapshot(s):
				src-snap1-clone
				src-snap1-clone1
tw:/home/bor/python-btrfs/examples # ./set_received_uuid.py
00000000-0000-0000-0000-000000000000 0 0.0 /mnt/rcv2/src-snap1/
Current subvolume information:
  subvol_id: 263
  received_uuid: 213b224e-016e-0b43-a681-d534027cfe95
  stime: 0.0 (1970-01-01T00:00:00)
  stransid: 111
  rtime: 1630517435.297119052 (2021-09-01T17:30:35.297119)
  rtransid: 115

Setting received subvolume...

Resulting subvolume information:
  subvol_id: 263
  received_uuid: 00000000-0000-0000-0000-000000000000
  stime: 0.0 (1970-01-01T00:00:00)
  stransid: 0
  rtime: 1631189234.720887105 (2021-09-09T12:07:14.720887)
  rtransid: 167

tw:/home/bor/python-btrfs/examples # btrfs sub sh /mnt/rcv2/src-snap1/
rcv2/src-snap1
	Name: 			src-snap1
	UUID: 			05bb994c-605d-3042-b45a-1ba9d88822ea
	Parent UUID: 		-
	Received UUID: 		-
	Creation time: 		2021-09-01 20:30:35 +0300
	Subvolume ID: 		263
	Generation: 		118
	Gen at creation: 	114
	Parent ID: 		262
	Top level ID: 		262
	Flags: 			-
	Snapshot(s):
				src-snap1-clone
				src-snap1-clone1
tw:/home/bor/python-btrfs/examples #
Martin Raiber Sept. 9, 2021, 3:39 p.m. UTC | #9
On 09.09.2021 11:37 Graham Cobb wrote:
> On 09/09/2021 07:46, Nikolay Borisov wrote:
>
>
> On 9.09.21 г. 0:24, Graham Cobb wrote:
>>> On 08/09/2021 19:33, David Sterba wrote:
>>>> On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote:
>>> <snip>
>>>
>>>>> For example I had the problem of partial subvols after a sudden
>>>>> restart during receive. No problem, just receive to a directory that
>>>>> gets deleted on startup then move the subvol to the final location
>>>>> after completion. To move the subvol it needs to be temporarily set rw
>>>>> for some reason. I'm sure there is some better solution but patterns
>>>>> like this might be out there.
>>>> Thanks, that's a case we should take into account. And there are
>>>> probably more, but I'm not using send/receive much so that's another
>>>> reason why I've been hesitant to merge the patch due to lack of
>>>> understanding of the use case.
>>>>
>>> This seems to be an important change to make, but is user-affecting. A
>>> few ideas:
>>>
>>> 1) Can it be made optional? On by default but possible to turn off
>>> (mount option, sys file, ...) if you really need to retain the current
>>> behaviour.
>> But the current behavior is buggy and non-intentional so it should be
>> rectified. Basically we've made it easy for users to do something which
>> they shouldn't really be doing, they then bear the consequences and come
>> to the mailing list for support thinking something is broken.
> Yes, I agree completely. I was disappointed the change wasn't made last
> time this was discussed on the list. But it wasn't. And I think that was
> because of concern over the impact: the change will break some users'
> workflows or scripts and could break some important tools, apps or
> things like distro installation/upgrade scripts. That was the purpose of
> my suggestions: to break down barriers which might delay making this happen.
>
>>> 2) Is there a way to engage with the developers and user community for
>>> popular tools which make heavy use of snapshotting and/or send/receive
>>> for discussion and testing? For example, btrbk, snapper, distros.
> The point of this suggestion was to address David's concern of not
> understanding the use case. This could be useful when discussing the
> timing of the fix (and whether it can be backported to stable kernels).
>
>>> 3) How about adding an IOCTL to allow the user to set/delete the
>>> received_uuid? Only intended for cases which really need to emulate the
>>> removed behaviour. This could be a less complex long term solution than
>>> keeping option 1 indefinitely.
> And this suggestion was to make it "possible" to work round the change
> but, in practice, harder than just doing the right thing :-) I agree
> this is probably too far.

5) Have a new subvol flag and only reset received uuid on ro -> rw if this new subvol flag is set. At this point it is completely backwards compatible (if older kernels ignore unknown subvol flags?). Then change btrfs-progs to set this flag during receive (under whatever policy btrfs-progs has for backwards-compatibility). Maybe only for v2 streams but that might further delay the transition. btrfs_ioctl_received_subvol_args even has an already existing flags member.
Qu Wenruo Sept. 10, 2021, 5:14 a.m. UTC | #10
On 2021/9/8 下午9:51, Nikolay Borisov wrote:
> Currently when a read-only snapshot is received and subsequently its
> ro property is set to false i.e. switched to rw-mode the
> received_uuid/stime/rtime/stransid/rtransid of that subvol remains
> intact. However, once the received volume is switched to RW mode we
> cannot guaranteee that it contains the same data, so it makes sense
> to remove those fields which indicate this volume was ever
> send/received. Additionally, sending such volume can cause conflicts
> due to the presence of received_uuid.
>
> Preserving the received_uuid (and related fields) after transitioning the
> snapshot from RO to RW and then changing the snapshot has a potential for
> causing send to fail in many unexpected ways if we later turn it back to
> RO and use it for an incremental send operation.
>
> A recent example, in the thread to which the Link tag below points to, we
> had a user with a filesystem that had multiple snapshots with the same
> received_uuid but with different content due to a transition from RO to RW
> and then back to RO. When an incremental send was attempted using two of
> those snapshots, it resulted in send emitting a clone operation that was
> intended to clone from the parent root to the send root - however because
> both roots had the same received_uuid, the receiver ended up picking the
> send root as the source root for the clone operation, which happened to
> result in the clone operation to fail with -EINVAL, due to the source and
> destination offsets being the same (and on the same root and file). In this
> particular case there was no harm, but we could end up in a case where the
> clone operation succeeds but clones wrong data due to picking up an
> incorrect root - as in the case of multiple snapshots with the same
> received_uuid, it has no way to know which one is the correct one if they
> have different content.
>
> Link: https://lore.kernel.org/linux-btrfs/CAOaVUnV3L6RpcqJ5gaqzNXWXK0VMkEVXCdihawH1PgS6TiMchQ@mail.gmail.com/
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> Suggested-by: David Sterba <dsterba@suse.cz>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Will add some warning for btrfs-progs to educate users.

Thanks,
Qu
> ---
>   fs/btrfs/ioctl.c | 41 +++++++++++++++++++++++++++++++++++------
>   1 file changed, 35 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 9eb0c1eb568e..67709d274489 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1927,9 +1927,11 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>   	struct inode *inode = file_inode(file);
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   	struct btrfs_root *root = BTRFS_I(inode)->root;
> +	struct btrfs_root_item *root_item = &root->root_item;
>   	struct btrfs_trans_handle *trans;
>   	u64 root_flags;
>   	u64 flags;
> +	bool clear_received_state = false;
>   	int ret = 0;
>
>   	if (!inode_owner_or_capable(file_mnt_user_ns(file), inode))
> @@ -1960,9 +1962,9 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>   	if (!!(flags & BTRFS_SUBVOL_RDONLY) == btrfs_root_readonly(root))
>   		goto out_drop_sem;
>
> -	root_flags = btrfs_root_flags(&root->root_item);
> +	root_flags = btrfs_root_flags(root_item);
>   	if (flags & BTRFS_SUBVOL_RDONLY) {
> -		btrfs_set_root_flags(&root->root_item,
> +		btrfs_set_root_flags(root_item,
>   				     root_flags | BTRFS_ROOT_SUBVOL_RDONLY);
>   	} else {
>   		/*
> @@ -1971,9 +1973,10 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>   		 */
>   		spin_lock(&root->root_item_lock);
>   		if (root->send_in_progress == 0) {
> -			btrfs_set_root_flags(&root->root_item,
> +			btrfs_set_root_flags(root_item,
>   				     root_flags & ~BTRFS_ROOT_SUBVOL_RDONLY);
>   			spin_unlock(&root->root_item_lock);
> +			clear_received_state = true;
>   		} else {
>   			spin_unlock(&root->root_item_lock);
>   			btrfs_warn(fs_info,
> @@ -1984,14 +1987,40 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>   		}
>   	}
>
> -	trans = btrfs_start_transaction(root, 1);
> +	/*
> +	 * 1 item for updating the uuid root in the root tree
> +	 * 1 item for actually removing the uuid record in the uuid tree
> +	 */
> +	trans = btrfs_start_transaction(root, 2);
>   	if (IS_ERR(trans)) {
>   		ret = PTR_ERR(trans);
>   		goto out_reset;
>   	}
>
> -	ret = btrfs_update_root(trans, fs_info->tree_root,
> -				&root->root_key, &root->root_item);
> +	if (clear_received_state &&
> +	    !btrfs_is_empty_uuid(root_item->received_uuid)) {
> +
> +		ret = btrfs_uuid_tree_remove(trans, root_item->received_uuid,
> +					     BTRFS_UUID_KEY_RECEIVED_SUBVOL,
> +					     root->root_key.objectid);
> +
> +		if (ret && ret != -ENOENT) {
> +			btrfs_abort_transaction(trans, ret);
> +			btrfs_end_transaction(trans);
> +			goto out_reset;
> +		}
> +
> +		memset(root_item->received_uuid, 0, BTRFS_UUID_SIZE);
> +		btrfs_set_root_stransid(root_item, 0);
> +		btrfs_set_root_rtransid(root_item, 0);
> +		btrfs_set_stack_timespec_sec(&root_item->stime, 0);
> +		btrfs_set_stack_timespec_nsec(&root_item->stime, 0);
> +		btrfs_set_stack_timespec_sec(&root_item->rtime, 0);
> +		btrfs_set_stack_timespec_nsec(&root_item->rtime, 0);
> +	}
> +
> +	ret = btrfs_update_root(trans, fs_info->tree_root, &root->root_key,
> +				root_item);
>   	if (ret < 0) {
>   		btrfs_end_transaction(trans);
>   		goto out_reset;
>
diff mbox series

Patch

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9eb0c1eb568e..67709d274489 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1927,9 +1927,11 @@  static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_root_item *root_item = &root->root_item;
 	struct btrfs_trans_handle *trans;
 	u64 root_flags;
 	u64 flags;
+	bool clear_received_state = false;
 	int ret = 0;
 
 	if (!inode_owner_or_capable(file_mnt_user_ns(file), inode))
@@ -1960,9 +1962,9 @@  static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
 	if (!!(flags & BTRFS_SUBVOL_RDONLY) == btrfs_root_readonly(root))
 		goto out_drop_sem;
 
-	root_flags = btrfs_root_flags(&root->root_item);
+	root_flags = btrfs_root_flags(root_item);
 	if (flags & BTRFS_SUBVOL_RDONLY) {
-		btrfs_set_root_flags(&root->root_item,
+		btrfs_set_root_flags(root_item,
 				     root_flags | BTRFS_ROOT_SUBVOL_RDONLY);
 	} else {
 		/*
@@ -1971,9 +1973,10 @@  static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
 		 */
 		spin_lock(&root->root_item_lock);
 		if (root->send_in_progress == 0) {
-			btrfs_set_root_flags(&root->root_item,
+			btrfs_set_root_flags(root_item,
 				     root_flags & ~BTRFS_ROOT_SUBVOL_RDONLY);
 			spin_unlock(&root->root_item_lock);
+			clear_received_state = true;
 		} else {
 			spin_unlock(&root->root_item_lock);
 			btrfs_warn(fs_info,
@@ -1984,14 +1987,40 @@  static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
 		}
 	}
 
-	trans = btrfs_start_transaction(root, 1);
+	/*
+	 * 1 item for updating the uuid root in the root tree
+	 * 1 item for actually removing the uuid record in the uuid tree
+	 */
+	trans = btrfs_start_transaction(root, 2);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
 		goto out_reset;
 	}
 
-	ret = btrfs_update_root(trans, fs_info->tree_root,
-				&root->root_key, &root->root_item);
+	if (clear_received_state &&
+	    !btrfs_is_empty_uuid(root_item->received_uuid)) {
+
+		ret = btrfs_uuid_tree_remove(trans, root_item->received_uuid,
+					     BTRFS_UUID_KEY_RECEIVED_SUBVOL,
+					     root->root_key.objectid);
+
+		if (ret && ret != -ENOENT) {
+			btrfs_abort_transaction(trans, ret);
+			btrfs_end_transaction(trans);
+			goto out_reset;
+		}
+
+		memset(root_item->received_uuid, 0, BTRFS_UUID_SIZE);
+		btrfs_set_root_stransid(root_item, 0);
+		btrfs_set_root_rtransid(root_item, 0);
+		btrfs_set_stack_timespec_sec(&root_item->stime, 0);
+		btrfs_set_stack_timespec_nsec(&root_item->stime, 0);
+		btrfs_set_stack_timespec_sec(&root_item->rtime, 0);
+		btrfs_set_stack_timespec_nsec(&root_item->rtime, 0);
+	}
+
+	ret = btrfs_update_root(trans, fs_info->tree_root, &root->root_key,
+				root_item);
 	if (ret < 0) {
 		btrfs_end_transaction(trans);
 		goto out_reset;