diff mbox series

[RESEND] btrfs: Remove received information from snapshot on ro->rw switch

Message ID 20200403134436.9095-1-nborisov@suse.com (mailing list archive)
State New, archived
Headers show
Series [RESEND] btrfs: Remove received information from snapshot on ro->rw switch | expand

Commit Message

Nikolay Borisov April 3, 2020, 1:44 p.m. UTC
Currently when a read-only snapshot is received and subsequently its
ro property is set to false i.e. switched to rw-mode the
received_uuid/stime/rtime/stransid/rtransid of that subvol remains
intact. However, once the received volume is switched to RW mode we
cannot guaranteee that it contains the same data, so it makes sense
to remove those fields which indicate this volume was ever
send/received. Additionally, sending such volume can cause conflicts
due to the presence of received_uuid.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Suggested-by: David Sterba <dsterba@suse.cz>
---

I've been carrying this patch in my tree for around 2.5 years. It stems from
multiple reports on the mailing list about people changing the RO->RW mode on
a received snapshot and getting unexpected behavior. AFAIR this patch resolved
that.


 fs/btrfs/ioctl.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

--
2.17.1

Comments

Josef Bacik April 10, 2020, 7:23 p.m. UTC | #1
On 4/3/20 9:44 AM, Nikolay Borisov wrote:
> Currently when a read-only snapshot is received and subsequently its
> ro property is set to false i.e. switched to rw-mode the
> received_uuid/stime/rtime/stransid/rtransid of that subvol remains
> intact. However, once the received volume is switched to RW mode we
> cannot guaranteee that it contains the same data, so it makes sense
> to remove those fields which indicate this volume was ever
> send/received. Additionally, sending such volume can cause conflicts
> due to the presence of received_uuid.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> Suggested-by: David Sterba <dsterba@suse.cz>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
Filipe Manana Sept. 8, 2021, 11:32 a.m. UTC | #2
On Fri, Apr 3, 2020 at 2:45 PM Nikolay Borisov <nborisov@suse.com> wrote:
>
> Currently when a read-only snapshot is received and subsequently its
> ro property is set to false i.e. switched to rw-mode the
> received_uuid/stime/rtime/stransid/rtransid of that subvol remains
> intact. However, once the received volume is switched to RW mode we
> cannot guaranteee that it contains the same data, so it makes sense
> to remove those fields which indicate this volume was ever
> send/received. Additionally, sending such volume can cause conflicts
> due to the presence of received_uuid.
>
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> Suggested-by: David Sterba <dsterba@suse.cz>
> ---
>
> I've been carrying this patch in my tree for around 2.5 years. It stems from
> multiple reports on the mailing list about people changing the RO->RW mode on
> a received snapshot and getting unexpected behavior. AFAIR this patch resolved
> that.

I agree with it, and this sporadically causes problems that get
reported on the list.
I spent a bunch of time back and forth with a user that ran into a
problem caused by switching snapshots from RO to RW.

So I would like to add the following paragraphs and Link tag to the changelog:

"
Preserving the received_uuid (and related fields) after transitioning the
snapshot from RO to RW and then changing the snapshot has a potential for
causing send to fail in many unexpected ways if we later turn it back to
RO and use it for an incremental send operation.

A recent example, in the thread to which the Link tag below points to, we
had a user with a filesystem that had multiple snapshots with the same
received_uuid but with different content due to a transition from RO to RW
and then back to RO. When an incremental send was attempted using two of
those snapshots, it resulted in send emitting a clone operation that was
intended to clone from the parent root to the send root - however because
both roots had the same received_uuid, the receiver ended up picking the
send root as the source root for the clone operation, which happened to
result in the clone operation to fail with -EINVAL, due to the source and
destination offsets being the same (and on the same root and file). In this
particular case there was no harm, but we could end up in a case where the
clone operation succeeds but clones wrong data due to picking up an
incorrect root - as in the case of multiple snapshots with the same
received_uuid, it has no way to know which one is the correct one if they
have different content.

Link: https://lore.kernel.org/linux-btrfs/CAOaVUnV3L6RpcqJ5gaqzNXWXK0VMkEVXCdihawH1PgS6TiMchQ@mail.gmail.com/
"


A few comments below as well.

>
>
>  fs/btrfs/ioctl.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 40b729dce91c..39840b654600 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1873,6 +1873,7 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>         struct btrfs_trans_handle *trans;
>         u64 root_flags;
>         u64 flags;
> +       bool clear_received_state = false;
>         int ret = 0;
>
>         if (!inode_owner_or_capable(inode))
> @@ -1917,6 +1918,7 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>                         btrfs_set_root_flags(&root->root_item,
>                                      root_flags & ~BTRFS_ROOT_SUBVOL_RDONLY);
>                         spin_unlock(&root->root_item_lock);
> +                       clear_received_state = true;
>                 } else {
>                         spin_unlock(&root->root_item_lock);
>                         btrfs_warn(fs_info,
> @@ -1933,6 +1935,31 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>                 goto out_reset;
>         }
>
> +       if (clear_received_state) {
> +               if (!btrfs_is_empty_uuid(root->root_item.received_uuid)) {

This could be combined into a single if to reduce indentation:   if
(clear_received_state && !btrfs_is_empty_uuid(...))

Also, there's here indentation weirdness due to using spaces instead
of tabs, and checkpatch complains about it:

ERROR: code indent should use tabs where possible
#80: FILE: fs/btrfs/ioctl.c:1996:
+^I        if (!btrfs_is_empty_uuid(root->root_item.received_uuid)) {$

ERROR: code indent should use tabs where possible
#83: FILE: fs/btrfs/ioctl.c:1999:
+^I                ret = btrfs_uuid_tree_remove(trans,$

ERROR: code indent should use tabs where possible
#84: FILE: fs/btrfs/ioctl.c:2000:
+^I                                root->root_item.received_uuid,$
(...)

> +                       struct btrfs_root_item *root_item = &root->root_item;
> +
> +                       ret = btrfs_uuid_tree_remove(trans,
> +                                       root->root_item.received_uuid,
> +                                       BTRFS_UUID_KEY_RECEIVED_SUBVOL,
> +                                       root->root_key.objectid);

If we reach here, we should have reserved one additional metadata item
when starting the transaction.
And add a comment about what each item is used for - 1 for updating
the root item in the root tree, another 1 for removing an item from
the uuid tree.

Also, we can use root_item there instead of root->root_item.

Other than that, it looks fine to me.

Thanks.

> +
> +                       if (ret && ret != -ENOENT) {
> +                               btrfs_abort_transaction(trans, ret);
> +                               btrfs_end_transaction(trans);
> +                               goto out_reset;
> +                       }
> +
> +                       memset(root_item->received_uuid, 0, BTRFS_UUID_SIZE);
> +                       btrfs_set_root_stransid(root_item, 0);
> +                       btrfs_set_root_rtransid(root_item, 0);
> +                       btrfs_set_stack_timespec_sec(&root_item->stime, 0);
> +                       btrfs_set_stack_timespec_nsec(&root_item->stime, 0);
> +                       btrfs_set_stack_timespec_sec(&root_item->rtime, 0);
> +                       btrfs_set_stack_timespec_nsec(&root_item->rtime, 0);
> +               }
> +       }
> +
>         ret = btrfs_update_root(trans, fs_info->tree_root,
>                                 &root->root_key, &root->root_item);
>         if (ret < 0) {
> --
> 2.17.1
>
Graham Cobb Sept. 8, 2021, 12:01 p.m. UTC | #3
On 08/09/2021 12:32, Filipe Manana wrote:
> On Fri, Apr 3, 2020 at 2:45 PM Nikolay Borisov <nborisov@suse.com> wrote:
>>
>> Currently when a read-only snapshot is received and subsequently its
>> ro property is set to false i.e. switched to rw-mode the
>> received_uuid/stime/rtime/stransid/rtransid of that subvol remains
>> intact. However, once the received volume is switched to RW mode we
>> cannot guaranteee that it contains the same data, so it makes sense
>> to remove those fields which indicate this volume was ever
>> send/received. Additionally, sending such volume can cause conflicts
>> due to the presence of received_uuid.
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> Suggested-by: David Sterba <dsterba@suse.cz>
>> ---
>>
>> I've been carrying this patch in my tree for around 2.5 years. It stems from
>> multiple reports on the mailing list about people changing the RO->RW mode on
>> a received snapshot and getting unexpected behavior. AFAIR this patch resolved
>> that.
> 
> I agree with it, and this sporadically causes problems that get
> reported on the list.
> I spent a bunch of time back and forth with a user that ran into a
> problem caused by switching snapshots from RO to RW.
> 
> So I would like to add the following paragraphs and Link tag to the changelog:
> 
> "
> Preserving the received_uuid (and related fields) after transitioning the
> snapshot from RO to RW and then changing the snapshot has a potential for
> causing send to fail in many unexpected ways if we later turn it back to
> RO and use it for an incremental send operation.
> 
> A recent example, in the thread to which the Link tag below points to, we
> had a user with a filesystem that had multiple snapshots with the same
> received_uuid but with different content due to a transition from RO to RW
> and then back to RO. 

Should btrfs-check be enhanced to detect subvolumes with duplicated
received_uuid? Presumably the repair would be to change all of them to
new UUIDs (or maybe keep the oldest one unchanged?).

Graham

When an incremental send was attempted using two of
> those snapshots, it resulted in send emitting a clone operation that was
> intended to clone from the parent root to the send root - however because
> both roots had the same received_uuid, the receiver ended up picking the
> send root as the source root for the clone operation, which happened to
> result in the clone operation to fail with -EINVAL, due to the source and
> destination offsets being the same (and on the same root and file). In this
> particular case there was no harm, but we could end up in a case where the
> clone operation succeeds but clones wrong data due to picking up an
> incorrect root - as in the case of multiple snapshots with the same
> received_uuid, it has no way to know which one is the correct one if they
> have different content.
> 
> Link: https://lore.kernel.org/linux-btrfs/CAOaVUnV3L6RpcqJ5gaqzNXWXK0VMkEVXCdihawH1PgS6TiMchQ@mail.gmail.com/
> "
> 
> 
> A few comments below as well.
> 
>>
>>
>>  fs/btrfs/ioctl.c | 27 +++++++++++++++++++++++++++
>>  1 file changed, 27 insertions(+)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 40b729dce91c..39840b654600 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1873,6 +1873,7 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>>         struct btrfs_trans_handle *trans;
>>         u64 root_flags;
>>         u64 flags;
>> +       bool clear_received_state = false;
>>         int ret = 0;
>>
>>         if (!inode_owner_or_capable(inode))
>> @@ -1917,6 +1918,7 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>>                         btrfs_set_root_flags(&root->root_item,
>>                                      root_flags & ~BTRFS_ROOT_SUBVOL_RDONLY);
>>                         spin_unlock(&root->root_item_lock);
>> +                       clear_received_state = true;
>>                 } else {
>>                         spin_unlock(&root->root_item_lock);
>>                         btrfs_warn(fs_info,
>> @@ -1933,6 +1935,31 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>>                 goto out_reset;
>>         }
>>
>> +       if (clear_received_state) {
>> +               if (!btrfs_is_empty_uuid(root->root_item.received_uuid)) {
> 
> This could be combined into a single if to reduce indentation:   if
> (clear_received_state && !btrfs_is_empty_uuid(...))
> 
> Also, there's here indentation weirdness due to using spaces instead
> of tabs, and checkpatch complains about it:
> 
> ERROR: code indent should use tabs where possible
> #80: FILE: fs/btrfs/ioctl.c:1996:
> +^I        if (!btrfs_is_empty_uuid(root->root_item.received_uuid)) {$
> 
> ERROR: code indent should use tabs where possible
> #83: FILE: fs/btrfs/ioctl.c:1999:
> +^I                ret = btrfs_uuid_tree_remove(trans,$
> 
> ERROR: code indent should use tabs where possible
> #84: FILE: fs/btrfs/ioctl.c:2000:
> +^I                                root->root_item.received_uuid,$
> (...)
> 
>> +                       struct btrfs_root_item *root_item = &root->root_item;
>> +
>> +                       ret = btrfs_uuid_tree_remove(trans,
>> +                                       root->root_item.received_uuid,
>> +                                       BTRFS_UUID_KEY_RECEIVED_SUBVOL,
>> +                                       root->root_key.objectid);
> 
> If we reach here, we should have reserved one additional metadata item
> when starting the transaction.
> And add a comment about what each item is used for - 1 for updating
> the root item in the root tree, another 1 for removing an item from
> the uuid tree.
> 
> Also, we can use root_item there instead of root->root_item.
> 
> Other than that, it looks fine to me.
> 
> Thanks.
> 
>> +
>> +                       if (ret && ret != -ENOENT) {
>> +                               btrfs_abort_transaction(trans, ret);
>> +                               btrfs_end_transaction(trans);
>> +                               goto out_reset;
>> +                       }
>> +
>> +                       memset(root_item->received_uuid, 0, BTRFS_UUID_SIZE);
>> +                       btrfs_set_root_stransid(root_item, 0);
>> +                       btrfs_set_root_rtransid(root_item, 0);
>> +                       btrfs_set_stack_timespec_sec(&root_item->stime, 0);
>> +                       btrfs_set_stack_timespec_nsec(&root_item->stime, 0);
>> +                       btrfs_set_stack_timespec_sec(&root_item->rtime, 0);
>> +                       btrfs_set_stack_timespec_nsec(&root_item->rtime, 0);
>> +               }
>> +       }
>> +
>>         ret = btrfs_update_root(trans, fs_info->tree_root,
>>                                 &root->root_key, &root->root_item);
>>         if (ret < 0) {
>> --
>> 2.17.1
>>
> 
>
diff mbox series

Patch

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 40b729dce91c..39840b654600 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1873,6 +1873,7 @@  static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
 	struct btrfs_trans_handle *trans;
 	u64 root_flags;
 	u64 flags;
+	bool clear_received_state = false;
 	int ret = 0;

 	if (!inode_owner_or_capable(inode))
@@ -1917,6 +1918,7 @@  static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
 			btrfs_set_root_flags(&root->root_item,
 				     root_flags & ~BTRFS_ROOT_SUBVOL_RDONLY);
 			spin_unlock(&root->root_item_lock);
+			clear_received_state = true;
 		} else {
 			spin_unlock(&root->root_item_lock);
 			btrfs_warn(fs_info,
@@ -1933,6 +1935,31 @@  static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
 		goto out_reset;
 	}

+	if (clear_received_state) {
+	        if (!btrfs_is_empty_uuid(root->root_item.received_uuid)) {
+			struct btrfs_root_item *root_item = &root->root_item;
+
+	                ret = btrfs_uuid_tree_remove(trans,
+	                                root->root_item.received_uuid,
+	                                BTRFS_UUID_KEY_RECEIVED_SUBVOL,
+	                                root->root_key.objectid);
+
+	                if (ret && ret != -ENOENT) {
+	                        btrfs_abort_transaction(trans, ret);
+	                        btrfs_end_transaction(trans);
+	                        goto out_reset;
+	                }
+
+	                memset(root_item->received_uuid, 0, BTRFS_UUID_SIZE);
+			btrfs_set_root_stransid(root_item, 0);
+			btrfs_set_root_rtransid(root_item, 0);
+			btrfs_set_stack_timespec_sec(&root_item->stime, 0);
+			btrfs_set_stack_timespec_nsec(&root_item->stime, 0);
+			btrfs_set_stack_timespec_sec(&root_item->rtime, 0);
+			btrfs_set_stack_timespec_nsec(&root_item->rtime, 0);
+	        }
+	}
+
 	ret = btrfs_update_root(trans, fs_info->tree_root,
 				&root->root_key, &root->root_item);
 	if (ret < 0) {