diff mbox series

[RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit

Message ID 12b53ab9683c805873d26a4881308137e0bd324e.1692097274.git.anand.jain@oracle.com (mailing list archive)
State New, archived
Headers show
Series [RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit | expand

Commit Message

Anand Jain Aug. 15, 2023, 12:53 p.m. UTC
The btrfstune -m|M operation changes the fsid and preserves the original
fsid in the metadata_uuid.

Changing the fsid happens in two commits in the function
set_metadata_uuid()
- Stage 1 commits the superblock with the CHANGING_FSID_V2 flag.
- Stage 2 updates the fsid in fs_info->super_copy (local memory),
  resets the CHANGING_FSID_V2 flag (local memory),
  and then calls commit.

The two-stage operation with the CHANGING_FSID flag makes sense for
btrfstune -u|U, where the fsid is changed on each and every tree block,
involving many commits. The CHANGING_FSID flag helps to identify the
transient incomplete operation.

However, for btrfstune -m|M, we don't need the CHANGING_FSID_V2 flag, and
a single commit would have been sufficient. The original git commit that
added it, 493dfc9d1fc4 ("btrfs-progs: btrfstune: Add support for changing
the metadata uuid"), provides no reasoning or did I miss something?
(So marking this patch as RFC).

This patch removes the two-stage commit for btrfstune -m|M and instead
performs it in a single commit.

With this change, the CHANGING_FSID_V2 flag is unused. Nevertheless, for
legacy purposes, we will still reset the CHANGING_FSID_V2 flag during the
btrfstune -m|M commit. Furthermore, the patchset titled:

	[PATCH 00/16] btrfs-progs: recover from failed metadata_uuid

will help recover from the split brain situation due to two stage
method in the older btrfstune.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 tune/change-metadata-uuid.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

Comments

Qu Wenruo Aug. 16, 2023, 2:17 a.m. UTC | #1
On 2023/8/15 20:53, Anand Jain wrote:
> The btrfstune -m|M operation changes the fsid and preserves the original
> fsid in the metadata_uuid.
>
> Changing the fsid happens in two commits in the function
> set_metadata_uuid()
> - Stage 1 commits the superblock with the CHANGING_FSID_V2 flag.
> - Stage 2 updates the fsid in fs_info->super_copy (local memory),
>    resets the CHANGING_FSID_V2 flag (local memory),
>    and then calls commit.
>
> The two-stage operation with the CHANGING_FSID flag makes sense for
> btrfstune -u|U, where the fsid is changed on each and every tree block,
> involving many commits. The CHANGING_FSID flag helps to identify the
> transient incomplete operation.
>
> However, for btrfstune -m|M, we don't need the CHANGING_FSID_V2 flag, and
> a single commit would have been sufficient. The original git commit that
> added it, 493dfc9d1fc4 ("btrfs-progs: btrfstune: Add support for changing
> the metadata uuid"), provides no reasoning or did I miss something?

To me, it looks very sane to do all the operations just in one transaction.

Furthermore unlike kernel, btrfs-progs has full control on whether to
commit the transaction (exclusively owns the fs, thus no one else can
modify/commit the fs), and all the operations are just modifying the
super block flags.

Thus merging the operations should be fine, and I'm a little surprised
why we didn't do it in the first place.

Thanks,
Qu
> (So marking this patch as RFC).
>
> This patch removes the two-stage commit for btrfstune -m|M and instead
> performs it in a single commit.
>
> With this change, the CHANGING_FSID_V2 flag is unused. Nevertheless, for
> legacy purposes, we will still reset the CHANGING_FSID_V2 flag during the
> btrfstune -m|M commit. Furthermore, the patchset titled:
>
> 	[PATCH 00/16] btrfs-progs: recover from failed metadata_uuid
>
> will help recover from the split brain situation due to two stage
> method in the older btrfstune.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>   tune/change-metadata-uuid.c | 12 +-----------
>   1 file changed, 1 insertion(+), 11 deletions(-)
>
> diff --git a/tune/change-metadata-uuid.c b/tune/change-metadata-uuid.c
> index 371f34e679b4..ac8c2fd398c2 100644
> --- a/tune/change-metadata-uuid.c
> +++ b/tune/change-metadata-uuid.c
> @@ -33,7 +33,6 @@ int set_metadata_uuid(struct btrfs_root *root, const char *new_fsid_string)
>   	u64 incompat_flags;
>   	bool fsid_changed;
>   	u64 super_flags;
> -	int ret;
>
>   	disk_super = root->fs_info->super_copy;
>   	super_flags = btrfs_super_flags(disk_super);
> @@ -66,14 +65,6 @@ int set_metadata_uuid(struct btrfs_root *root, const char *new_fsid_string)
>
>   	new_fsid = (memcmp(fsid, disk_super->fsid, BTRFS_FSID_SIZE) != 0);
>
> -	/* Step 1 sets the in progress flag */
> -	trans = btrfs_start_transaction(root, 1);
> -	super_flags |= BTRFS_SUPER_FLAG_CHANGING_FSID_V2;
> -	btrfs_set_super_flags(disk_super, super_flags);
> -	ret = btrfs_commit_transaction(trans, root);
> -	if (ret < 0)
> -		return ret;
> -
>   	if (new_fsid && fsid_changed && memcmp(disk_super->metadata_uuid,
>   					       fsid, BTRFS_FSID_SIZE) == 0) {
>   		/*
> @@ -106,8 +97,7 @@ int set_metadata_uuid(struct btrfs_root *root, const char *new_fsid_string)
>   	trans = btrfs_start_transaction(root, 1);
>
>   	/*
> -	 * Step 2 is to write the metadata_uuid, set the incompat flag and
> -	 * clear the in progress flag
> +	 * For leagcy purpose reset the BTRFS_SUPER_FLAG_CHANGING_FSID_V2
>   	 */
>   	super_flags &= ~BTRFS_SUPER_FLAG_CHANGING_FSID_V2;
>   	btrfs_set_super_flags(disk_super, super_flags);
David Sterba Aug. 17, 2023, 11:52 a.m. UTC | #2
On Tue, Aug 15, 2023 at 08:53:24PM +0800, Anand Jain wrote:
> The btrfstune -m|M operation changes the fsid and preserves the original
> fsid in the metadata_uuid.
> 
> Changing the fsid happens in two commits in the function
> set_metadata_uuid()
> - Stage 1 commits the superblock with the CHANGING_FSID_V2 flag.
> - Stage 2 updates the fsid in fs_info->super_copy (local memory),
>   resets the CHANGING_FSID_V2 flag (local memory),
>   and then calls commit.
> 
> The two-stage operation with the CHANGING_FSID flag makes sense for
> btrfstune -u|U, where the fsid is changed on each and every tree block,
> involving many commits. The CHANGING_FSID flag helps to identify the
> transient incomplete operation.
> 
> However, for btrfstune -m|M, we don't need the CHANGING_FSID_V2 flag, and
> a single commit would have been sufficient. The original git commit that
> added it, 493dfc9d1fc4 ("btrfs-progs: btrfstune: Add support for changing
> the metadata uuid"), provides no reasoning or did I miss something?
> (So marking this patch as RFC).

I remember discussions with Nikolay about the corner cases that could
happen due to interrupted conversion and there was a document explaining
that. Part of that ended up in kernel in the logic to detect partially
enabled metadata_uuid.  So there was reason to do it in two phases but
I'd have to look it up in mails or if I still have a link or copy of the
document.
Anand Jain Aug. 18, 2023, 9:21 a.m. UTC | #3
On 17/08/2023 19:52, David Sterba wrote:
> On Tue, Aug 15, 2023 at 08:53:24PM +0800, Anand Jain wrote:
>> The btrfstune -m|M operation changes the fsid and preserves the original
>> fsid in the metadata_uuid.
>>
>> Changing the fsid happens in two commits in the function
>> set_metadata_uuid()
>> - Stage 1 commits the superblock with the CHANGING_FSID_V2 flag.
>> - Stage 2 updates the fsid in fs_info->super_copy (local memory),
>>    resets the CHANGING_FSID_V2 flag (local memory),
>>    and then calls commit.
>>
>> The two-stage operation with the CHANGING_FSID flag makes sense for
>> btrfstune -u|U, where the fsid is changed on each and every tree block,
>> involving many commits. The CHANGING_FSID flag helps to identify the
>> transient incomplete operation.
>>
>> However, for btrfstune -m|M, we don't need the CHANGING_FSID_V2 flag, and
>> a single commit would have been sufficient. The original git commit that
>> added it, 493dfc9d1fc4 ("btrfs-progs: btrfstune: Add support for changing
>> the metadata uuid"), provides no reasoning or did I miss something?
>> (So marking this patch as RFC).
> 
> I remember discussions with Nikolay about the corner cases that could
> happen due to interrupted conversion and there was a document explaining
> that. Part of that ended up in kernel in the logic to detect partially
> enabled metadata_uuid.  So there was reason to do it in two phases but
> I'd have to look it up in mails or if I still have a link or copy of the
> document.


On 18/08/2023 08:21, Qu Wenruo wrote:

 > Oh, my memory comes back, the original design for the two stage
 > commitment is to avoid split brain cases when one device is committed
 > with the new flag, while the remaining one doesn't.
 >
 > With the extra stage, even if at stage 1 or 2 the transaction is
 > interrupted and only one device got the new flag, it can help us to
 > locate the stage and recover.

It is useful for `btrfstune -u`
when there are many transaction commits to write. It uses the
`CHANGING_FSID` flag for this purpose. Any device with the
`CHANGING_FSID` flag fails to mount, and `btrfstune` should be called
again to continue rewrite the new FSID. This is a fair process.


However, in the case of `CHANGING_FSID_V2`, which the `btrfstune -m` 
command uses, only one transaction is required. How does this help?

                 Disk1              Disk2

Commit1     CHANGING_FSID_V2   CHANGING_FSID_V2
Commit2     new-fsid           new-fsid
Qu Wenruo Aug. 18, 2023, 9:27 a.m. UTC | #4
On 2023/8/18 17:21, Anand Jain wrote:
> On 17/08/2023 19:52, David Sterba wrote:
>> On Tue, Aug 15, 2023 at 08:53:24PM +0800, Anand Jain wrote:
>>> The btrfstune -m|M operation changes the fsid and preserves the original
>>> fsid in the metadata_uuid.
>>>
>>> Changing the fsid happens in two commits in the function
>>> set_metadata_uuid()
>>> - Stage 1 commits the superblock with the CHANGING_FSID_V2 flag.
>>> - Stage 2 updates the fsid in fs_info->super_copy (local memory),
>>>    resets the CHANGING_FSID_V2 flag (local memory),
>>>    and then calls commit.
>>>
>>> The two-stage operation with the CHANGING_FSID flag makes sense for
>>> btrfstune -u|U, where the fsid is changed on each and every tree block,
>>> involving many commits. The CHANGING_FSID flag helps to identify the
>>> transient incomplete operation.
>>>
>>> However, for btrfstune -m|M, we don't need the CHANGING_FSID_V2 flag, 
>>> and
>>> a single commit would have been sufficient. The original git commit that
>>> added it, 493dfc9d1fc4 ("btrfs-progs: btrfstune: Add support for 
>>> changing
>>> the metadata uuid"), provides no reasoning or did I miss something?
>>> (So marking this patch as RFC).
>>
>> I remember discussions with Nikolay about the corner cases that could
>> happen due to interrupted conversion and there was a document explaining
>> that. Part of that ended up in kernel in the logic to detect partially
>> enabled metadata_uuid.  So there was reason to do it in two phases but
>> I'd have to look it up in mails or if I still have a link or copy of the
>> document.
> 
> 
> On 18/08/2023 08:21, Qu Wenruo wrote:
> 
>  > Oh, my memory comes back, the original design for the two stage
>  > commitment is to avoid split brain cases when one device is committed
>  > with the new flag, while the remaining one doesn't.
>  >
>  > With the extra stage, even if at stage 1 or 2 the transaction is
>  > interrupted and only one device got the new flag, it can help us to
>  > locate the stage and recover.
> 
> It is useful for `btrfstune -u`
> when there are many transaction commits to write. It uses the
> `CHANGING_FSID` flag for this purpose. Any device with the
> `CHANGING_FSID` flag fails to mount, and `btrfstune` should be called
> again to continue rewrite the new FSID. This is a fair process.
> 
> 
> However, in the case of `CHANGING_FSID_V2`, which the `btrfstune -m` 
> command uses, only one transaction is required. How does this help?
> 
>                  Disk1              Disk2
> 
> Commit1     CHANGING_FSID_V2   CHANGING_FSID_V2
> Commit2     new-fsid           new-fsid
> 
> 
> 

Instead if there is only one transaction to enable metadata_uuid 
feature, we can have the following situation where for the single 
transaction we only committed on disk 1:

	Disk 1		Disk 2
	Meta uuid	Regular UUID

Then how do kernel/progs proper recover from this situation?

Although I'd say, it's still possible to recover, but significantly more 
difficult, as we need to properly handle different generations of super 
blocks.

For the two stage one, it's a little simpler but I'm not sure if we have 
the extra handling. E.g. if commit 1 failed partially:

	Disk 1			Disk 2
	Changing_fsid_v2	no changing_fsid_v2

In this case, we can detects we're trying to start fsid change using 
metadata uuid.

The same thing for the 2nd commit.

Thanks,
Qu
Anand Jain Aug. 19, 2023, 11:14 a.m. UTC | #5
On 18/08/2023 17:27, Qu Wenruo wrote:
> 
> 
> On 2023/8/18 17:21, Anand Jain wrote:
>> On 17/08/2023 19:52, David Sterba wrote:
>>> On Tue, Aug 15, 2023 at 08:53:24PM +0800, Anand Jain wrote:
>>>> The btrfstune -m|M operation changes the fsid and preserves the 
>>>> original
>>>> fsid in the metadata_uuid.
>>>>
>>>> Changing the fsid happens in two commits in the function
>>>> set_metadata_uuid()
>>>> - Stage 1 commits the superblock with the CHANGING_FSID_V2 flag.
>>>> - Stage 2 updates the fsid in fs_info->super_copy (local memory),
>>>>    resets the CHANGING_FSID_V2 flag (local memory),
>>>>    and then calls commit.
>>>>
>>>> The two-stage operation with the CHANGING_FSID flag makes sense for
>>>> btrfstune -u|U, where the fsid is changed on each and every tree block,
>>>> involving many commits. The CHANGING_FSID flag helps to identify the
>>>> transient incomplete operation.
>>>>
>>>> However, for btrfstune -m|M, we don't need the CHANGING_FSID_V2 
>>>> flag, and
>>>> a single commit would have been sufficient. The original git commit 
>>>> that
>>>> added it, 493dfc9d1fc4 ("btrfs-progs: btrfstune: Add support for 
>>>> changing
>>>> the metadata uuid"), provides no reasoning or did I miss something?
>>>> (So marking this patch as RFC).
>>>
>>> I remember discussions with Nikolay about the corner cases that could
>>> happen due to interrupted conversion and there was a document explaining
>>> that. Part of that ended up in kernel in the logic to detect partially
>>> enabled metadata_uuid.  So there was reason to do it in two phases but
>>> I'd have to look it up in mails or if I still have a link or copy of the
>>> document.
>>
>>
>> On 18/08/2023 08:21, Qu Wenruo wrote:
>>
>>  > Oh, my memory comes back, the original design for the two stage
>>  > commitment is to avoid split brain cases when one device is committed
>>  > with the new flag, while the remaining one doesn't.
>>  >
>>  > With the extra stage, even if at stage 1 or 2 the transaction is
>>  > interrupted and only one device got the new flag, it can help us to
>>  > locate the stage and recover.
>>
>> It is useful for `btrfstune -u`
>> when there are many transaction commits to write. It uses the
>> `CHANGING_FSID` flag for this purpose. Any device with the
>> `CHANGING_FSID` flag fails to mount, and `btrfstune` should be called
>> again to continue rewrite the new FSID. This is a fair process.
>>
>>
>> However, in the case of `CHANGING_FSID_V2`, which the `btrfstune -m` 
>> command uses, only one transaction is required. How does this help?
>>
>>                  Disk1              Disk2
>>
>> Commit1     CHANGING_FSID_V2   CHANGING_FSID_V2
>> Commit2     new-fsid           new-fsid
>>
>>
>>
> 
> Instead if there is only one transaction to enable metadata_uuid 
> feature, we can have the following situation where for the single 
> transaction we only committed on disk 1:
> 
>      Disk 1        Disk 2
>      Meta uuid    Regular UUID
> 
> Then how do kernel/progs proper recover from this situation?
> 
> Although I'd say, it's still possible to recover, but significantly more 
> difficult, as we need to properly handle different generations of super 
> blocks.
> 
> For the two stage one, it's a little simpler but I'm not sure if we have 
> the extra handling. E.g. if commit 1 failed partially:
> 
>      Disk 1            Disk 2
>      Changing_fsid_v2    no changing_fsid_v2
> 
> In this case, we can detects we're trying to start fsid change using 
> metadata uuid.
> 
> The same thing for the 2nd commit.



The changing_fsid_v2 flag an unnecessary overhead, right? As it could be 
something like:

  . Write new-fsid and Verify. Revert if needed and verify.


------
Summarizing the overall patches:

Patchset [1] is a port of kernel changing_fsid_v2 recovery automation 
functions to the progs. So, hosts with older progs and can't upgrade to 
find a tmp host with the newer progs to fix the disks?

   [1] [PATCH 00/16] btrfs-progs: recover from failed metadata_uuid


Automation in progs/kernel cannot fix all possible scenarios; we still 
need user intervention to reunite, as in patchset [2]. It adds --device 
and --noscan options to reunite. (Needs comments, so that I can rebase).

   [2] [PATCH 00/10] btrfs-progs: check and tune: add device and noscan 
options


Patchset [3] removes the changing_fsid_v2 recovery code from the kernel, 
as pointed out- recovery code is robust in general, except in corner 
cases. So, after this, similar to changing_fsid, disks with 
changing_fsid_v2 are rejected.
But when progs can recover the failed situation and could remove the use 
of the changing_fsid_v2 flag (patch [4]), we don't actually need the 
recovery in the kernel.

   [3] [PATCH] btrfs: reject device with CHANGING_FSID_V2

   [4] [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit
-------

Thanks,
Anand
David Sterba Sept. 18, 2023, 10:38 p.m. UTC | #6
On Sat, Aug 19, 2023 at 07:14:40PM +0800, Anand Jain wrote:
> 
> 
> On 18/08/2023 17:27, Qu Wenruo wrote:
> > 
> > 
> > On 2023/8/18 17:21, Anand Jain wrote:
> >> On 17/08/2023 19:52, David Sterba wrote:
> >>> On Tue, Aug 15, 2023 at 08:53:24PM +0800, Anand Jain wrote:
> >>>> The btrfstune -m|M operation changes the fsid and preserves the 
> >>>> original
> >>>> fsid in the metadata_uuid.
> >>>>
> >>>> Changing the fsid happens in two commits in the function
> >>>> set_metadata_uuid()
> >>>> - Stage 1 commits the superblock with the CHANGING_FSID_V2 flag.
> >>>> - Stage 2 updates the fsid in fs_info->super_copy (local memory),
> >>>>    resets the CHANGING_FSID_V2 flag (local memory),
> >>>>    and then calls commit.
> >>>>
> >>>> The two-stage operation with the CHANGING_FSID flag makes sense for
> >>>> btrfstune -u|U, where the fsid is changed on each and every tree block,
> >>>> involving many commits. The CHANGING_FSID flag helps to identify the
> >>>> transient incomplete operation.
> >>>>
> >>>> However, for btrfstune -m|M, we don't need the CHANGING_FSID_V2 
> >>>> flag, and
> >>>> a single commit would have been sufficient. The original git commit 
> >>>> that
> >>>> added it, 493dfc9d1fc4 ("btrfs-progs: btrfstune: Add support for 
> >>>> changing
> >>>> the metadata uuid"), provides no reasoning or did I miss something?
> >>>> (So marking this patch as RFC).
> >>>
> >>> I remember discussions with Nikolay about the corner cases that could
> >>> happen due to interrupted conversion and there was a document explaining
> >>> that. Part of that ended up in kernel in the logic to detect partially
> >>> enabled metadata_uuid.  So there was reason to do it in two phases but
> >>> I'd have to look it up in mails or if I still have a link or copy of the
> >>> document.
> >>
> >>
> >> On 18/08/2023 08:21, Qu Wenruo wrote:
> >>
> >>  > Oh, my memory comes back, the original design for the two stage
> >>  > commitment is to avoid split brain cases when one device is committed
> >>  > with the new flag, while the remaining one doesn't.
> >>  >
> >>  > With the extra stage, even if at stage 1 or 2 the transaction is
> >>  > interrupted and only one device got the new flag, it can help us to
> >>  > locate the stage and recover.
> >>
> >> It is useful for `btrfstune -u`
> >> when there are many transaction commits to write. It uses the
> >> `CHANGING_FSID` flag for this purpose. Any device with the
> >> `CHANGING_FSID` flag fails to mount, and `btrfstune` should be called
> >> again to continue rewrite the new FSID. This is a fair process.
> >>
> >>
> >> However, in the case of `CHANGING_FSID_V2`, which the `btrfstune -m` 
> >> command uses, only one transaction is required. How does this help?
> >>
> >>                  Disk1              Disk2
> >>
> >> Commit1     CHANGING_FSID_V2   CHANGING_FSID_V2
> >> Commit2     new-fsid           new-fsid
> >>
> >>
> >>
> > 
> > Instead if there is only one transaction to enable metadata_uuid 
> > feature, we can have the following situation where for the single 
> > transaction we only committed on disk 1:
> > 
> >      Disk 1        Disk 2
> >      Meta uuid    Regular UUID
> > 
> > Then how do kernel/progs proper recover from this situation?
> > 
> > Although I'd say, it's still possible to recover, but significantly more 
> > difficult, as we need to properly handle different generations of super 
> > blocks.
> > 
> > For the two stage one, it's a little simpler but I'm not sure if we have 
> > the extra handling. E.g. if commit 1 failed partially:
> > 
> >      Disk 1            Disk 2
> >      Changing_fsid_v2    no changing_fsid_v2
> > 
> > In this case, we can detects we're trying to start fsid change using 
> > metadata uuid.
> > 
> > The same thing for the 2nd commit.
> 
> 
> 
> The changing_fsid_v2 flag an unnecessary overhead, right? As it could be 
> something like:
> 
>   . Write new-fsid and Verify. Revert if needed and verify.
> 
> 
> ------
> Summarizing the overall patches:
> 
> Patchset [1] is a port of kernel changing_fsid_v2 recovery automation 
> functions to the progs. So, hosts with older progs and can't upgrade to 
> find a tmp host with the newer progs to fix the disks?
> 
>    [1] [PATCH 00/16] btrfs-progs: recover from failed metadata_uuid
> 
> 
> Automation in progs/kernel cannot fix all possible scenarios; we still 
> need user intervention to reunite, as in patchset [2]. It adds --device 
> and --noscan options to reunite. (Needs comments, so that I can rebase).
> 
>    [2] [PATCH 00/10] btrfs-progs: check and tune: add device and noscan 
> options
> 
> 
> Patchset [3] removes the changing_fsid_v2 recovery code from the kernel, 
> as pointed out- recovery code is robust in general, except in corner 
> cases. So, after this, similar to changing_fsid, disks with 
> changing_fsid_v2 are rejected.

Hm I think this is acceptable though degrading the feature a bit. We
can't tell how often the kernel recovery of the metadata_uuid has
happened but fixing all cases there might be too complex, more than it
already is. The whole conversion in btrfstune is quite fast, crashing in
the middle of that is possible but highly unlikely.

We don't have a proper documentation for the metadata_uuid use case,
there are the option and sysfs descriptions but not really how it's
supposed to be used and what to do if setting the metadata_uuid fails.

> But when progs can recover the failed situation and could remove the use 
> of the changing_fsid_v2 flag (patch [4]), we don't actually need the 
> recovery in the kernel.

Ok, let's do it.
Anand Jain Sept. 20, 2023, 11:01 p.m. UTC | #7
On 19/09/2023 06:38, David Sterba wrote:
> On Sat, Aug 19, 2023 at 07:14:40PM +0800, Anand Jain wrote:
>>
>>
>> On 18/08/2023 17:27, Qu Wenruo wrote:
>>>
>>>
>>> On 2023/8/18 17:21, Anand Jain wrote:
>>>> On 17/08/2023 19:52, David Sterba wrote:
>>>>> On Tue, Aug 15, 2023 at 08:53:24PM +0800, Anand Jain wrote:
>>>>>> The btrfstune -m|M operation changes the fsid and preserves the
>>>>>> original
>>>>>> fsid in the metadata_uuid.
>>>>>>
>>>>>> Changing the fsid happens in two commits in the function
>>>>>> set_metadata_uuid()
>>>>>> - Stage 1 commits the superblock with the CHANGING_FSID_V2 flag.
>>>>>> - Stage 2 updates the fsid in fs_info->super_copy (local memory),
>>>>>>     resets the CHANGING_FSID_V2 flag (local memory),
>>>>>>     and then calls commit.
>>>>>>
>>>>>> The two-stage operation with the CHANGING_FSID flag makes sense for
>>>>>> btrfstune -u|U, where the fsid is changed on each and every tree block,
>>>>>> involving many commits. The CHANGING_FSID flag helps to identify the
>>>>>> transient incomplete operation.
>>>>>>
>>>>>> However, for btrfstune -m|M, we don't need the CHANGING_FSID_V2
>>>>>> flag, and
>>>>>> a single commit would have been sufficient. The original git commit
>>>>>> that
>>>>>> added it, 493dfc9d1fc4 ("btrfs-progs: btrfstune: Add support for
>>>>>> changing
>>>>>> the metadata uuid"), provides no reasoning or did I miss something?
>>>>>> (So marking this patch as RFC).
>>>>>
>>>>> I remember discussions with Nikolay about the corner cases that could
>>>>> happen due to interrupted conversion and there was a document explaining
>>>>> that. Part of that ended up in kernel in the logic to detect partially
>>>>> enabled metadata_uuid.  So there was reason to do it in two phases but
>>>>> I'd have to look it up in mails or if I still have a link or copy of the
>>>>> document.
>>>>
>>>>
>>>> On 18/08/2023 08:21, Qu Wenruo wrote:
>>>>
>>>>   > Oh, my memory comes back, the original design for the two stage
>>>>   > commitment is to avoid split brain cases when one device is committed
>>>>   > with the new flag, while the remaining one doesn't.
>>>>   >
>>>>   > With the extra stage, even if at stage 1 or 2 the transaction is
>>>>   > interrupted and only one device got the new flag, it can help us to
>>>>   > locate the stage and recover.
>>>>
>>>> It is useful for `btrfstune -u`
>>>> when there are many transaction commits to write. It uses the
>>>> `CHANGING_FSID` flag for this purpose. Any device with the
>>>> `CHANGING_FSID` flag fails to mount, and `btrfstune` should be called
>>>> again to continue rewrite the new FSID. This is a fair process.
>>>>
>>>>
>>>> However, in the case of `CHANGING_FSID_V2`, which the `btrfstune -m`
>>>> command uses, only one transaction is required. How does this help?
>>>>
>>>>                   Disk1              Disk2
>>>>
>>>> Commit1     CHANGING_FSID_V2   CHANGING_FSID_V2
>>>> Commit2     new-fsid           new-fsid
>>>>
>>>>
>>>>
>>>
>>> Instead if there is only one transaction to enable metadata_uuid
>>> feature, we can have the following situation where for the single
>>> transaction we only committed on disk 1:
>>>
>>>       Disk 1        Disk 2
>>>       Meta uuid    Regular UUID
>>>
>>> Then how do kernel/progs proper recover from this situation?
>>>
>>> Although I'd say, it's still possible to recover, but significantly more
>>> difficult, as we need to properly handle different generations of super
>>> blocks.
>>>
>>> For the two stage one, it's a little simpler but I'm not sure if we have
>>> the extra handling. E.g. if commit 1 failed partially:
>>>
>>>       Disk 1            Disk 2
>>>       Changing_fsid_v2    no changing_fsid_v2
>>>
>>> In this case, we can detects we're trying to start fsid change using
>>> metadata uuid.
>>>
>>> The same thing for the 2nd commit.
>>
>>
>>
>> The changing_fsid_v2 flag an unnecessary overhead, right? As it could be
>> something like:
>>
>>    . Write new-fsid and Verify. Revert if needed and verify.
>>
>>
>> ------
>> Summarizing the overall patches:
>>
>> Patchset [1] is a port of kernel changing_fsid_v2 recovery automation
>> functions to the progs. So, hosts with older progs and can't upgrade to
>> find a tmp host with the newer progs to fix the disks?
>>
>>     [1] [PATCH 00/16] btrfs-progs: recover from failed metadata_uuid
>>
>>
>> Automation in progs/kernel cannot fix all possible scenarios; we still
>> need user intervention to reunite, as in patchset [2]. It adds --device
>> and --noscan options to reunite. (Needs comments, so that I can rebase).
>>
>>     [2] [PATCH 00/10] btrfs-progs: check and tune: add device and noscan
>> options
>>
>>
>> Patchset [3] removes the changing_fsid_v2 recovery code from the kernel,
>> as pointed out- recovery code is robust in general, except in corner
>> cases. So, after this, similar to changing_fsid, disks with
>> changing_fsid_v2 are rejected.
> 
> Hm I think this is acceptable though degrading the feature a bit. We
> can't tell how often the kernel recovery of the metadata_uuid has
> happened but fixing all cases there might be too complex, more than it
> already is. The whole conversion in btrfstune is quite fast, crashing in
> the middle of that is possible but highly unlikely.
> 
> We don't have a proper documentation for the metadata_uuid use case,
> there are the option and sysfs descriptions but not really how it's
> supposed to be used and what to do if setting the metadata_uuid fails.

> 
>> But when progs can recover the failed situation and could remove the use
>> of the changing_fsid_v2 flag (patch [4]), we don't actually need the
>> recovery in the kernel.
> 
> Ok, let's do it.

Done.

Kernel:
  [PATCH 0/2 v2] btrfs: reject device with CHANGING_FSID_V2 flag


btrfs-progs (in the same order):

  [PATCH 0/4 v4] btrfs-progs: recover from failed metadata_uuid port kernel

and

  [PATCH] btrfs-progs: misc-tests/034-metadata-uuid remove kernel support

Thanks, Anand
diff mbox series

Patch

diff --git a/tune/change-metadata-uuid.c b/tune/change-metadata-uuid.c
index 371f34e679b4..ac8c2fd398c2 100644
--- a/tune/change-metadata-uuid.c
+++ b/tune/change-metadata-uuid.c
@@ -33,7 +33,6 @@  int set_metadata_uuid(struct btrfs_root *root, const char *new_fsid_string)
 	u64 incompat_flags;
 	bool fsid_changed;
 	u64 super_flags;
-	int ret;
 
 	disk_super = root->fs_info->super_copy;
 	super_flags = btrfs_super_flags(disk_super);
@@ -66,14 +65,6 @@  int set_metadata_uuid(struct btrfs_root *root, const char *new_fsid_string)
 
 	new_fsid = (memcmp(fsid, disk_super->fsid, BTRFS_FSID_SIZE) != 0);
 
-	/* Step 1 sets the in progress flag */
-	trans = btrfs_start_transaction(root, 1);
-	super_flags |= BTRFS_SUPER_FLAG_CHANGING_FSID_V2;
-	btrfs_set_super_flags(disk_super, super_flags);
-	ret = btrfs_commit_transaction(trans, root);
-	if (ret < 0)
-		return ret;
-
 	if (new_fsid && fsid_changed && memcmp(disk_super->metadata_uuid,
 					       fsid, BTRFS_FSID_SIZE) == 0) {
 		/*
@@ -106,8 +97,7 @@  int set_metadata_uuid(struct btrfs_root *root, const char *new_fsid_string)
 	trans = btrfs_start_transaction(root, 1);
 
 	/*
-	 * Step 2 is to write the metadata_uuid, set the incompat flag and
-	 * clear the in progress flag
+	 * For leagcy purpose reset the BTRFS_SUPER_FLAG_CHANGING_FSID_V2
 	 */
 	super_flags &= ~BTRFS_SUPER_FLAG_CHANGING_FSID_V2;
 	btrfs_set_super_flags(disk_super, super_flags);