diff mbox series

[v3,3/5] btrfs-progs: separate block group tree from extent tree v2

Message ID 5eef4fd2d55a02dab38a6d1dec43dbcd82652508.1660024949.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs-progs: separate BLOCK_GROUP_TREE feature from extent-tree-v2 | expand

Commit Message

Qu Wenruo Aug. 9, 2022, 6:03 a.m. UTC
Block group tree feature is completely a standalone feature, and it has
been over 5 years before the initial introduction to solve the long
mount time.

I don't really want to waste another 5 years waiting for a feature which
may or may not work, but definitely not properly reviewed for its
preparation patches.

So this patch will separate the block group tree feature into a
standalone compat RO feature.

There is a catch, in mkfs create_block_group_tree(), current
tree-checker only accepts block group item with valid chunk_objectid,
but the existing code from extent-tree-v2 didn't properly initialize it.

This patch will also fix above mentioned problem so kernel can mount it
correctly.

Now mkfs/fsck should be able to handle the fs with block group tree.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 check/main.c               |  8 ++------
 common/fsfeatures.c        |  8 ++++++++
 common/fsfeatures.h        |  2 ++
 kernel-shared/ctree.h      |  9 ++++++++-
 kernel-shared/disk-io.c    |  4 ++--
 kernel-shared/disk-io.h    |  2 +-
 kernel-shared/print-tree.c |  5 ++---
 mkfs/common.c              | 31 ++++++++++++++++++++++++-------
 mkfs/main.c                |  3 ++-
 9 files changed, 51 insertions(+), 21 deletions(-)

Comments

David Sterba Aug. 31, 2022, 7:14 p.m. UTC | #1
On Tue, Aug 09, 2022 at 02:03:53PM +0800, Qu Wenruo wrote:
> Block group tree feature is completely a standalone feature, and it has
> been over 5 years before the initial introduction to solve the long
> mount time.
> 
> I don't really want to waste another 5 years waiting for a feature which
> may or may not work, but definitely not properly reviewed for its
> preparation patches.

This should go to the cover letter but in the commit such ranting does
not bring much information for the code change. And I rephrase or delete
such things unless it's somehow relevant.

> So this patch will separate the block group tree feature into a
> standalone compat RO feature.
> 
> There is a catch, in mkfs create_block_group_tree(), current
> tree-checker only accepts block group item with valid chunk_objectid,
> but the existing code from extent-tree-v2 didn't properly initialize it.
> 
> This patch will also fix above mentioned problem so kernel can mount it
> correctly.
> 
> Now mkfs/fsck should be able to handle the fs with block group tree.
> 
> --- a/common/fsfeatures.c
> +++ b/common/fsfeatures.c
> @@ -172,6 +172,14 @@ static const struct btrfs_feature runtime_features[] = {
>  		VERSION_TO_STRING2(safe, 4,9),
>  		VERSION_TO_STRING2(default, 5,15),
>  		.desc		= "free space tree (space_cache=v2)"
> +	}, {
> +		.name		= "block-group-tree",
> +		.flag		= BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
> +		.sysfs_name = "block_group_tree",
> +		VERSION_TO_STRING2(compat, 6,0),
> +		VERSION_NULL(safe),
> +		VERSION_NULL(default),
> +		.desc		= "block group tree to reduce mount time"

Like explaining that this is a runtime feature and I have not noticed
until I tried to test it expecting to see it among the mkfs-time
features but there was nothing in 'mkfs.btrfs -O list-all'.

This is a mkfs-time feature as it creates a fundamental on-disk
structure, basically a subset of extent tree.

As it's in one patch please send a fixup so I can fold it. Thanks.
Qu Wenruo Aug. 31, 2022, 9:43 p.m. UTC | #2
On 2022/9/1 03:14, David Sterba wrote:
> On Tue, Aug 09, 2022 at 02:03:53PM +0800, Qu Wenruo wrote:
>> Block group tree feature is completely a standalone feature, and it has
>> been over 5 years before the initial introduction to solve the long
>> mount time.
>>
>> I don't really want to waste another 5 years waiting for a feature which
>> may or may not work, but definitely not properly reviewed for its
>> preparation patches.
>
> This should go to the cover letter but in the commit such ranting does
> not bring much information for the code change. And I rephrase or delete
> such things unless it's somehow relevant.
>
>> So this patch will separate the block group tree feature into a
>> standalone compat RO feature.
>>
>> There is a catch, in mkfs create_block_group_tree(), current
>> tree-checker only accepts block group item with valid chunk_objectid,
>> but the existing code from extent-tree-v2 didn't properly initialize it.
>>
>> This patch will also fix above mentioned problem so kernel can mount it
>> correctly.
>>
>> Now mkfs/fsck should be able to handle the fs with block group tree.
>>
>> --- a/common/fsfeatures.c
>> +++ b/common/fsfeatures.c
>> @@ -172,6 +172,14 @@ static const struct btrfs_feature runtime_features[] = {
>>   		VERSION_TO_STRING2(safe, 4,9),
>>   		VERSION_TO_STRING2(default, 5,15),
>>   		.desc		= "free space tree (space_cache=v2)"
>> +	}, {
>> +		.name		= "block-group-tree",
>> +		.flag		= BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
>> +		.sysfs_name = "block_group_tree",
>> +		VERSION_TO_STRING2(compat, 6,0),
>> +		VERSION_NULL(safe),
>> +		VERSION_NULL(default),
>> +		.desc		= "block group tree to reduce mount time"
>
> Like explaining that this is a runtime feature and I have not noticed
> until I tried to test it expecting to see it among the mkfs-time
> features but there was nothing in 'mkfs.btrfs -O list-all'.
>
> This is a mkfs-time feature as it creates a fundamental on-disk
> structure, basically a subset of extent tree.

This comes to the decision to make bg-tree feature as a compat RO flag.

As we didn't put free-space-tree into "-O" options, but "-R" options.
So the same should be done for most compat RO flags.

Furthermore I remember I discussed about this before, extent tree change
should not need a full incompat flag, as pure read-only tools, like
btrfs-fuse should still be able to read the subvolume/csum/chunk/root
trees without any problem.

So following above reasons, bg-tree is compat RO, and compat RO goes
into "-R" options, I see no reason to put it into "-O" options.

Thanks,
Qu

>
> As it's in one patch please send a fixup so I can fold it. Thanks.
Qu Wenruo Sept. 1, 2022, 12:15 p.m. UTC | #3
On 2022/9/1 05:43, Qu Wenruo wrote:
>
>
> On 2022/9/1 03:14, David Sterba wrote:
>> On Tue, Aug 09, 2022 at 02:03:53PM +0800, Qu Wenruo wrote:
>>> Block group tree feature is completely a standalone feature, and it has
>>> been over 5 years before the initial introduction to solve the long
>>> mount time.
>>>
>>> I don't really want to waste another 5 years waiting for a feature which
>>> may or may not work, but definitely not properly reviewed for its
>>> preparation patches.
>>
>> This should go to the cover letter but in the commit such ranting does
>> not bring much information for the code change. And I rephrase or delete
>> such things unless it's somehow relevant.
>>
>>> So this patch will separate the block group tree feature into a
>>> standalone compat RO feature.
>>>
>>> There is a catch, in mkfs create_block_group_tree(), current
>>> tree-checker only accepts block group item with valid chunk_objectid,
>>> but the existing code from extent-tree-v2 didn't properly initialize it.
>>>
>>> This patch will also fix above mentioned problem so kernel can mount it
>>> correctly.
>>>
>>> Now mkfs/fsck should be able to handle the fs with block group tree.
>>>
>>> --- a/common/fsfeatures.c
>>> +++ b/common/fsfeatures.c
>>> @@ -172,6 +172,14 @@ static const struct btrfs_feature
>>> runtime_features[] = {
>>>           VERSION_TO_STRING2(safe, 4,9),
>>>           VERSION_TO_STRING2(default, 5,15),
>>>           .desc        = "free space tree (space_cache=v2)"
>>> +    }, {
>>> +        .name        = "block-group-tree",
>>> +        .flag        = BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
>>> +        .sysfs_name = "block_group_tree",
>>> +        VERSION_TO_STRING2(compat, 6,0),
>>> +        VERSION_NULL(safe),
>>> +        VERSION_NULL(default),
>>> +        .desc        = "block group tree to reduce mount time"
>>
>> Like explaining that this is a runtime feature and I have not noticed
>> until I tried to test it expecting to see it among the mkfs-time
>> features but there was nothing in 'mkfs.btrfs -O list-all'.
>>
>> This is a mkfs-time feature as it creates a fundamental on-disk
>> structure, basically a subset of extent tree.
>
> This comes to the decision to make bg-tree feature as a compat RO flag.
>
> As we didn't put free-space-tree into "-O" options, but "-R" options.
> So the same should be done for most compat RO flags.
>
> Furthermore I remember I discussed about this before, extent tree change
> should not need a full incompat flag, as pure read-only tools, like
> btrfs-fuse should still be able to read the subvolume/csum/chunk/root
> trees without any problem.
>
> So following above reasons, bg-tree is compat RO, and compat RO goes
> into "-R" options, I see no reason to put it into "-O" options.

After more consideration, I believe we shouldn't split all the features
(including quota) between "-O" and "-R" options.

Firstly, although free space tree is compat RO (and a lot of future
features will also be compat RO), it's still a on-disk format change (a
new tree, some new keys).

It's even a bigger change compared to NO_HOLES features.
No to mention the block group tree.

Now we have a very bad split for -R and -O, some of them are on-disk
format change that is large enough, but still compat RO.

Some of them should be compat RO, but still set as incompt flags.

To me, end users should not really bother what the feature is
implemented, they only need to bother:

- What the feature is doing
- What is the compatibility
   The incompat and compat RO doesn't make too much difference for most
   users, they just care about which kernel version is compatible.

So from this point of view, -O/-R split it not really helpful from the
very beginning.

It may make sense for quota, which is the only exception, it's supported
from the very beginning, without a compat RO/incompat flag.

But for more and more features, -O/-R split doesn't make much sense.

Thanks,
Qu

>
> Thanks,
> Qu
>
>>
>> As it's in one patch please send a fixup so I can fold it. Thanks.
David Sterba Sept. 2, 2022, 9:21 a.m. UTC | #4
On Thu, Sep 01, 2022 at 08:15:07PM +0800, Qu Wenruo wrote:
> >>> --- a/common/fsfeatures.c
> >>> +++ b/common/fsfeatures.c
> >>> @@ -172,6 +172,14 @@ static const struct btrfs_feature
> >>> runtime_features[] = {
> >>>           VERSION_TO_STRING2(safe, 4,9),
> >>>           VERSION_TO_STRING2(default, 5,15),
> >>>           .desc        = "free space tree (space_cache=v2)"
> >>> +    }, {
> >>> +        .name        = "block-group-tree",
> >>> +        .flag        = BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
> >>> +        .sysfs_name = "block_group_tree",
> >>> +        VERSION_TO_STRING2(compat, 6,0),
> >>> +        VERSION_NULL(safe),
> >>> +        VERSION_NULL(default),
> >>> +        .desc        = "block group tree to reduce mount time"
> >>
> >> Like explaining that this is a runtime feature and I have not noticed
> >> until I tried to test it expecting to see it among the mkfs-time
> >> features but there was nothing in 'mkfs.btrfs -O list-all'.
> >>
> >> This is a mkfs-time feature as it creates a fundamental on-disk
> >> structure, basically a subset of extent tree.
> >
> > This comes to the decision to make bg-tree feature as a compat RO flag.
> >
> > As we didn't put free-space-tree into "-O" options, but "-R" options.
> > So the same should be done for most compat RO flags.
> >
> > Furthermore I remember I discussed about this before, extent tree change
> > should not need a full incompat flag, as pure read-only tools, like
> > btrfs-fuse should still be able to read the subvolume/csum/chunk/root
> > trees without any problem.
> >
> > So following above reasons, bg-tree is compat RO, and compat RO goes
> > into "-R" options, I see no reason to put it into "-O" options.
> 
> After more consideration, I believe we shouldn't split all the features
> (including quota) between "-O" and "-R" options.

After reading your previous I got to the same conclusion.

> Firstly, although free space tree is compat RO (and a lot of future
> features will also be compat RO), it's still a on-disk format change (a
> new tree, some new keys).
> 
> It's even a bigger change compared to NO_HOLES features.
> No to mention the block group tree.
> 
> Now we have a very bad split for -R and -O, some of them are on-disk
> format change that is large enough, but still compat RO.

Agreed.

> Some of them should be compat RO, but still set as incompt flags.
> 
> To me, end users should not really bother what the feature is
> implemented, they only need to bother:
> 
> - What the feature is doing
> - What is the compatibility
>    The incompat and compat RO doesn't make too much difference for most
>    users, they just care about which kernel version is compatible.
> 
> So from this point of view, -O/-R split it not really helpful from the
> very beginning.
> 
> It may make sense for quota, which is the only exception, it's supported
> from the very beginning, without a compat RO/incompat flag.
> 
> But for more and more features, -O/-R split doesn't make much sense.

Yeah, the free-space-tree is misplaced and I did not realize that back
then. That something is possible to switch on at run time by a mount
option should not be the only condition to put the option to the -R option.

Quota are maybe still a good example of the runtime feature, there's a
command to enable and disable it. There are additional structures
created or deleted but it's not something fundamental. The distinction
in the options should hint at what's the type "what if I don't select
this now, can I turn it on later?", perhaps documentation should be more
explicit about that.

For compatibility we need to keep free-space-tree under -R but we can
add an alias to -O and everything of that sort add there too, like the
block group tree.
Qu Wenruo Sept. 2, 2022, 9:37 a.m. UTC | #5
On 2022/9/2 17:21, David Sterba wrote:
> On Thu, Sep 01, 2022 at 08:15:07PM +0800, Qu Wenruo wrote:
>>>>> --- a/common/fsfeatures.c
>>>>> +++ b/common/fsfeatures.c
>>>>> @@ -172,6 +172,14 @@ static const struct btrfs_feature
>>>>> runtime_features[] = {
>>>>>            VERSION_TO_STRING2(safe, 4,9),
>>>>>            VERSION_TO_STRING2(default, 5,15),
>>>>>            .desc        = "free space tree (space_cache=v2)"
>>>>> +    }, {
>>>>> +        .name        = "block-group-tree",
>>>>> +        .flag        = BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
>>>>> +        .sysfs_name = "block_group_tree",
>>>>> +        VERSION_TO_STRING2(compat, 6,0),
>>>>> +        VERSION_NULL(safe),
>>>>> +        VERSION_NULL(default),
>>>>> +        .desc        = "block group tree to reduce mount time"
>>>>
>>>> Like explaining that this is a runtime feature and I have not noticed
>>>> until I tried to test it expecting to see it among the mkfs-time
>>>> features but there was nothing in 'mkfs.btrfs -O list-all'.
>>>>
>>>> This is a mkfs-time feature as it creates a fundamental on-disk
>>>> structure, basically a subset of extent tree.
>>>
>>> This comes to the decision to make bg-tree feature as a compat RO flag.
>>>
>>> As we didn't put free-space-tree into "-O" options, but "-R" options.
>>> So the same should be done for most compat RO flags.
>>>
>>> Furthermore I remember I discussed about this before, extent tree change
>>> should not need a full incompat flag, as pure read-only tools, like
>>> btrfs-fuse should still be able to read the subvolume/csum/chunk/root
>>> trees without any problem.
>>>
>>> So following above reasons, bg-tree is compat RO, and compat RO goes
>>> into "-R" options, I see no reason to put it into "-O" options.
>>
>> After more consideration, I believe we shouldn't split all the features
>> (including quota) between "-O" and "-R" options.
> 
> After reading your previous I got to the same conclusion.
> 
>> Firstly, although free space tree is compat RO (and a lot of future
>> features will also be compat RO), it's still a on-disk format change (a
>> new tree, some new keys).
>>
>> It's even a bigger change compared to NO_HOLES features.
>> No to mention the block group tree.
>>
>> Now we have a very bad split for -R and -O, some of them are on-disk
>> format change that is large enough, but still compat RO.
> 
> Agreed.
> 
>> Some of them should be compat RO, but still set as incompt flags.
>>
>> To me, end users should not really bother what the feature is
>> implemented, they only need to bother:
>>
>> - What the feature is doing
>> - What is the compatibility
>>     The incompat and compat RO doesn't make too much difference for most
>>     users, they just care about which kernel version is compatible.
>>
>> So from this point of view, -O/-R split it not really helpful from the
>> very beginning.
>>
>> It may make sense for quota, which is the only exception, it's supported
>> from the very beginning, without a compat RO/incompat flag.
>>
>> But for more and more features, -O/-R split doesn't make much sense.
> 
> Yeah, the free-space-tree is misplaced and I did not realize that back
> then. That something is possible to switch on at run time by a mount
> option should not be the only condition to put the option to the -R option.
> 
> Quota are maybe still a good example of the runtime feature, there's a
> command to enable and disable it. There are additional structures
> created or deleted but it's not something fundamental. The distinction
> in the options should hint at what's the type "what if I don't select
> this now, can I turn it on later?", perhaps documentation should be more
> explicit about that.

Quota tree is a special case, just because it's from day-one, thus no 
compat/compat ro/incompat flags needed at all.

To me, we can accept one exception.

> 
> For compatibility we need to keep free-space-tree under -R but we can
> add an alias to -O and everything of that sort add there too, like the
> block group tree.

That's simple, make -R deprecated, and treat -R just as -O internally, 
and put all features including quota into -O.

Of course, we may need some small changes, as now one fs feature needs 1 
or 0 compat/compat ro/incompat flags set.
But everything else, from the compat/safe/default string can be 
inherited from the existing format.

By this, we have the minimal code change, while still keeps the same 
compatibility (in fact, greatly enlarged -O options)

Thanks,
Qu
David Sterba Sept. 2, 2022, 12:10 p.m. UTC | #6
On Fri, Sep 02, 2022 at 05:37:53PM +0800, Qu Wenruo wrote:
> > Yeah, the free-space-tree is misplaced and I did not realize that back
> > then. That something is possible to switch on at run time by a mount
> > option should not be the only condition to put the option to the -R option.
> > 
> > Quota are maybe still a good example of the runtime feature, there's a
> > command to enable and disable it. There are additional structures
> > created or deleted but it's not something fundamental. The distinction
> > in the options should hint at what's the type "what if I don't select
> > this now, can I turn it on later?", perhaps documentation should be more
> > explicit about that.
> 
> Quota tree is a special case, just because it's from day-one, thus no 
> compat/compat ro/incompat flags needed at all.
> 
> To me, we can accept one exception.
> 
> > 
> > For compatibility we need to keep free-space-tree under -R but we can
> > add an alias to -O and everything of that sort add there too, like the
> > block group tree.
> 
> That's simple, make -R deprecated, and treat -R just as -O internally, 
> and put all features including quota into -O.
> 
> Of course, we may need some small changes, as now one fs feature needs 1 
> or 0 compat/compat ro/incompat flags set.
> But everything else, from the compat/safe/default string can be 
> inherited from the existing format.
> 
> By this, we have the minimal code change, while still keeps the same 
> compatibility (in fact, greatly enlarged -O options)

It's a change to the interface so it's always with some consequences but
I think a single option for features is indeed an improvement. We now
have only 2 under -R so it's not that bad yet.

I've looked to manual pages of other filesystems' mkfs, there are
separate options but for specific features like for journal, or
additional tunables. For the global features there's one option.

We can add the quota and f-s-tree in a minor release, it's not breaking
compatibility and add a warning to -R in some future major release.
Anand Jain Oct. 3, 2022, 2:48 p.m. UTC | #7
This patch is causing regressions; now can't mkfs with extent-tree-v2.


$ mkfs.btrfs -f -O block-group-tree  /dev/nvme0n1
btrfs-progs v5.19.1
See http://btrfs.wiki.kernel.org for more information.

ERROR: superblock magic doesn't match
ERROR: illegal nodesize 16384 (not equal to 4096 for mixed block group)



$ mkfs.btrfs -f -O extent-tree-v2  /dev/nvme0n1
btrfs-progs v5.19.1
See http://btrfs.wiki.kernel.org for more information.

ERROR: superblock magic doesn't match
NOTE: several default settings have changed in version 5.15, please make 
sure
       this does not affect your deployments:
       - DUP for metadata (-m dup)
       - enabled no-holes (-O no-holes)
       - enabled free-space-tree (-R free-space-tree)

Unable to find block group for 0
Unable to find block group for 0
Unable to find block group for 0
ERROR: no space to allocate metadata chunk
ERROR: failed to create default block groups: -28




On 09/08/2022 14:03, Qu Wenruo wrote:
> Block group tree feature is completely a standalone feature, and it has
> been over 5 years before the initial introduction to solve the long
> mount time.
> 
> I don't really want to waste another 5 years waiting for a feature which
> may or may not work, but definitely not properly reviewed for its
> preparation patches.
> 
> So this patch will separate the block group tree feature into a
> standalone compat RO feature.
> 
> There is a catch, in mkfs create_block_group_tree(), current
> tree-checker only accepts block group item with valid chunk_objectid,
> but the existing code from extent-tree-v2 didn't properly initialize it.
> 
> This patch will also fix above mentioned problem so kernel can mount it
> correctly.
> 
> Now mkfs/fsck should be able to handle the fs with block group tree.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   check/main.c               |  8 ++------
>   common/fsfeatures.c        |  8 ++++++++
>   common/fsfeatures.h        |  2 ++
>   kernel-shared/ctree.h      |  9 ++++++++-
>   kernel-shared/disk-io.c    |  4 ++--
>   kernel-shared/disk-io.h    |  2 +-
>   kernel-shared/print-tree.c |  5 ++---
>   mkfs/common.c              | 31 ++++++++++++++++++++++++-------
>   mkfs/main.c                |  3 ++-
>   9 files changed, 51 insertions(+), 21 deletions(-)
> 
> diff --git a/check/main.c b/check/main.c
> index 4f7ab8b29309..02abbd5289f9 100644
> --- a/check/main.c
> +++ b/check/main.c
> @@ -6293,7 +6293,7 @@ static int check_type_with_root(u64 rootid, u8 key_type)
>   			goto err;
>   		break;
>   	case BTRFS_BLOCK_GROUP_ITEM_KEY:
> -		if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
> +		if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
>   			if (rootid != BTRFS_BLOCK_GROUP_TREE_OBJECTID)
>   				goto err;
>   		} else if (rootid != BTRFS_EXTENT_TREE_OBJECTID) {
> @@ -9071,10 +9071,6 @@ again:
>   	ret = load_super_root(&normal_trees, gfs_info->chunk_root);
>   	if (ret < 0)
>   		goto out;
> -	ret = load_super_root(&normal_trees, gfs_info->block_group_root);
> -	if (ret < 0)
> -		goto out;
> -
>   	ret = parse_tree_roots(&normal_trees, &dropping_trees);
>   	if (ret < 0)
>   		goto out;
> @@ -9574,7 +9570,7 @@ again:
>   	 * If we are extent tree v2 then we can reint the block group root as
>   	 * well.
>   	 */
> -	if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
> +	if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
>   		ret = btrfs_fsck_reinit_root(trans, gfs_info->block_group_root);
>   		if (ret) {
>   			fprintf(stderr, "block group initialization failed\n");
> diff --git a/common/fsfeatures.c b/common/fsfeatures.c
> index 23a92c21a2cc..90704959b13b 100644
> --- a/common/fsfeatures.c
> +++ b/common/fsfeatures.c
> @@ -172,6 +172,14 @@ static const struct btrfs_feature runtime_features[] = {
>   		VERSION_TO_STRING2(safe, 4,9),
>   		VERSION_TO_STRING2(default, 5,15),
>   		.desc		= "free space tree (space_cache=v2)"
> +	}, {
> +		.name		= "block-group-tree",
> +		.flag		= BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
> +		.sysfs_name = "block_group_tree",
> +		VERSION_TO_STRING2(compat, 6,0),
> +		VERSION_NULL(safe),
> +		VERSION_NULL(default),
> +		.desc		= "block group tree to reduce mount time"
>   	},
>   	/* Keep this one last */
>   	{
> diff --git a/common/fsfeatures.h b/common/fsfeatures.h
> index 9e39c667b900..a8d77fd4da05 100644
> --- a/common/fsfeatures.h
> +++ b/common/fsfeatures.h
> @@ -45,6 +45,8 @@
>   
>   #define BTRFS_RUNTIME_FEATURE_QUOTA		(1ULL << 0)
>   #define BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE	(1ULL << 1)
> +#define BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE	(1ULL << 2)
> +
>   
>   void btrfs_list_all_fs_features(u64 mask_disallowed);
>   void btrfs_list_all_runtime_features(u64 mask_disallowed);
> diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
> index c12076202577..d8909b3fdf20 100644
> --- a/kernel-shared/ctree.h
> +++ b/kernel-shared/ctree.h
> @@ -479,6 +479,12 @@ BUILD_ASSERT(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
>    */
>   #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID	(1ULL << 1)
>   
> +/*
> + * Save all block group items into a dedicated block group tree, to greatly
> + * reduce mount time for large fs.
> + */
> +#define BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE	(1ULL << 5)
> +
>   #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF	(1ULL << 0)
>   #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL	(1ULL << 1)
>   #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS	(1ULL << 2)
> @@ -508,7 +514,8 @@ BUILD_ASSERT(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
>    */
>   #define BTRFS_FEATURE_COMPAT_RO_SUPP			\
>   	(BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |	\
> -	 BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID)
> +	 BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID| \
> +	 BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE)
>   
>   #if EXPERIMENTAL
>   #define BTRFS_FEATURE_INCOMPAT_SUPP			\
> diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
> index 80db5976cc3f..6eeb5ecd1d59 100644
> --- a/kernel-shared/disk-io.c
> +++ b/kernel-shared/disk-io.c
> @@ -1203,7 +1203,7 @@ static int load_important_roots(struct btrfs_fs_info *fs_info,
>   		backup = sb->super_roots + index;
>   	}
>   
> -	if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
> +	if (!btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
>   		free(fs_info->block_group_root);
>   		fs_info->block_group_root = NULL;
>   		goto tree_root;
> @@ -1256,7 +1256,7 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
>   	if (ret)
>   		return ret;
>   
> -	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
> +	if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
>   		ret = find_and_setup_root(root, fs_info,
>   				BTRFS_BLOCK_GROUP_TREE_OBJECTID,
>   				fs_info->block_group_root);
> diff --git a/kernel-shared/disk-io.h b/kernel-shared/disk-io.h
> index bba97fc1a814..6c8eaa2bd13d 100644
> --- a/kernel-shared/disk-io.h
> +++ b/kernel-shared/disk-io.h
> @@ -232,7 +232,7 @@ int btrfs_global_root_insert(struct btrfs_fs_info *fs_info,
>   static inline struct btrfs_root *btrfs_block_group_root(
>   						struct btrfs_fs_info *fs_info)
>   {
> -	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
> +	if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE))
>   		return fs_info->block_group_root;
>   	return btrfs_extent_root(fs_info, 0);
>   }
> diff --git a/kernel-shared/print-tree.c b/kernel-shared/print-tree.c
> index bffe30b405c7..b2ee77c2fb73 100644
> --- a/kernel-shared/print-tree.c
> +++ b/kernel-shared/print-tree.c
> @@ -1668,6 +1668,7 @@ struct readable_flag_entry {
>   static struct readable_flag_entry compat_ro_flags_array[] = {
>   	DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE),
>   	DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE_VALID),
> +	DEF_COMPAT_RO_FLAG_ENTRY(BLOCK_GROUP_TREE),
>   };
>   static const int compat_ro_flags_num = sizeof(compat_ro_flags_array) /
>   				       sizeof(struct readable_flag_entry);
> @@ -1754,9 +1755,7 @@ static void print_readable_compat_ro_flag(u64 flag)
>   	 */
>   	return __print_readable_flag(flag, compat_ro_flags_array,
>   				     compat_ro_flags_num,
> -				     BTRFS_FEATURE_COMPAT_RO_SUPP |
> -				     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
> -				     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
> +				     BTRFS_FEATURE_COMPAT_RO_SUPP);
>   }
>   
>   static void print_readable_incompat_flag(u64 flag)
> diff --git a/mkfs/common.c b/mkfs/common.c
> index b72338551dfb..cb616f13ef9b 100644
> --- a/mkfs/common.c
> +++ b/mkfs/common.c
> @@ -75,6 +75,8 @@ static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg,
>   	int blk;
>   	int i;
>   	u8 uuid[BTRFS_UUID_SIZE];
> +	bool block_group_tree = !!(cfg->runtime_features &
> +				   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
>   
>   	memset(buf->data + sizeof(struct btrfs_header), 0,
>   		cfg->nodesize - sizeof(struct btrfs_header));
> @@ -101,6 +103,9 @@ static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg,
>   		if (blk == MKFS_ROOT_TREE || blk == MKFS_CHUNK_TREE)
>   			continue;
>   
> +		if (!block_group_tree && blk == MKFS_BLOCK_GROUP_TREE)
> +			continue;
> +
>   		btrfs_set_root_bytenr(&root_item, cfg->blocks[blk]);
>   		btrfs_set_disk_key_objectid(&disk_key,
>   			reference_root_table[blk]);
> @@ -216,7 +221,8 @@ static int create_block_group_tree(int fd, struct btrfs_mkfs_config *cfg,
>   
>   	memset(buf->data + sizeof(struct btrfs_header), 0,
>   		cfg->nodesize - sizeof(struct btrfs_header));
> -	write_block_group_item(buf, 0, bg_offset, bg_size, bg_used, 0,
> +	write_block_group_item(buf, 0, bg_offset, bg_size, bg_used,
> +			       BTRFS_FIRST_CHUNK_TREE_OBJECTID,
>   			       cfg->leaf_data_size -
>   			       sizeof(struct btrfs_block_group_item));
>   	btrfs_set_header_bytenr(buf, cfg->blocks[MKFS_BLOCK_GROUP_TREE]);
> @@ -357,6 +363,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>   	u32 array_size;
>   	u32 item_size;
>   	u64 total_used = 0;
> +	u64 ro_flags = 0;
>   	int skinny_metadata = !!(cfg->features &
>   				 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA);
>   	u64 num_bytes;
> @@ -365,6 +372,8 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>   	bool add_block_group = true;
>   	bool free_space_tree = !!(cfg->runtime_features &
>   				  BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE);
> +	bool block_group_tree = !!(cfg->runtime_features &
> +				   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
>   	bool extent_tree_v2 = !!(cfg->features &
>   				 BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2);
>   
> @@ -372,8 +381,13 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>   	       sizeof(enum btrfs_mkfs_block) * ARRAY_SIZE(default_blocks));
>   	blocks_nr = ARRAY_SIZE(default_blocks);
>   
> -	/* Extent tree v2 needs an extra block for block group tree.*/
> -	if (extent_tree_v2) {
> +	/*
> +	 * Add one new block for block group tree.
> +	 * And for block group tree, we don't need to add block group item
> +	 * into extent tree, the item will be handled in block group tree
> +	 * initialization.
> +	 */
> +	if (block_group_tree) {
>   		mkfs_blocks_add(blocks, &blocks_nr, MKFS_BLOCK_GROUP_TREE);
>   		add_block_group = false;
>   	}
> @@ -433,12 +447,15 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>   		btrfs_set_super_cache_generation(&super, -1);
>   	btrfs_set_super_incompat_flags(&super, cfg->features);
>   	if (free_space_tree) {
> -		u64 ro_flags = BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
> -			BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID;
> +		ro_flags |= (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
> +			     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
>   
> -		btrfs_set_super_compat_ro_flags(&super, ro_flags);
>   		btrfs_set_super_cache_generation(&super, 0);
>   	}
> +	if (block_group_tree)
> +		ro_flags |= BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE;
> +	btrfs_set_super_compat_ro_flags(&super, ro_flags);
> +
>   	if (extent_tree_v2)
>   		btrfs_set_super_nr_global_roots(&super, 1);
>   
> @@ -695,7 +712,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>   			goto out;
>   	}
>   
> -	if (extent_tree_v2) {
> +	if (block_group_tree) {
>   		ret = create_block_group_tree(fd, cfg, buf,
>   					      system_group_offset,
>   					      system_group_size, total_used);
> diff --git a/mkfs/main.c b/mkfs/main.c
> index ce096d362171..518ce0fd7523 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -299,7 +299,8 @@ static int recow_roots(struct btrfs_trans_handle *trans,
>   	ret = __recow_root(trans, info->dev_root);
>   	if (ret)
>   		return ret;
> -        if (btrfs_fs_incompat(info, EXTENT_TREE_V2)) {
> +
> +	if (btrfs_fs_compat_ro(info, BLOCK_GROUP_TREE)) {
>   		ret = __recow_root(trans, info->block_group_root);
>   		if (ret)
>   			return ret;
Qu Wenruo Oct. 3, 2022, 11:28 p.m. UTC | #8
On 2022/10/3 22:48, Anand Jain wrote:
> 
> This patch is causing regressions; now can't mkfs with extent-tree-v2.

I'm already looking at it.

> 
> 
> $ mkfs.btrfs -f -O block-group-tree  /dev/nvme0n1
> btrfs-progs v5.19.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> ERROR: superblock magic doesn't match
> ERROR: illegal nodesize 16384 (not equal to 4096 for mixed block group)
> 
> 
> 
> $ mkfs.btrfs -f -O extent-tree-v2  /dev/nvme0n1
> btrfs-progs v5.19.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> ERROR: superblock magic doesn't match
> NOTE: several default settings have changed in version 5.15, please make 
> sure
>        this does not affect your deployments:
>        - DUP for metadata (-m dup)
>        - enabled no-holes (-O no-holes)
>        - enabled free-space-tree (-R free-space-tree)
> 
> Unable to find block group for 0
> Unable to find block group for 0
> Unable to find block group for 0
> ERROR: no space to allocate metadata chunk
> ERROR: failed to create default block groups: -28
> 
> 
> 
> 
> On 09/08/2022 14:03, Qu Wenruo wrote:
>> Block group tree feature is completely a standalone feature, and it has
>> been over 5 years before the initial introduction to solve the long
>> mount time.
>>
>> I don't really want to waste another 5 years waiting for a feature which
>> may or may not work, but definitely not properly reviewed for its
>> preparation patches.
>>
>> So this patch will separate the block group tree feature into a
>> standalone compat RO feature.
>>
>> There is a catch, in mkfs create_block_group_tree(), current
>> tree-checker only accepts block group item with valid chunk_objectid,
>> but the existing code from extent-tree-v2 didn't properly initialize it.
>>
>> This patch will also fix above mentioned problem so kernel can mount it
>> correctly.
>>
>> Now mkfs/fsck should be able to handle the fs with block group tree.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   check/main.c               |  8 ++------
>>   common/fsfeatures.c        |  8 ++++++++
>>   common/fsfeatures.h        |  2 ++
>>   kernel-shared/ctree.h      |  9 ++++++++-
>>   kernel-shared/disk-io.c    |  4 ++--
>>   kernel-shared/disk-io.h    |  2 +-
>>   kernel-shared/print-tree.c |  5 ++---
>>   mkfs/common.c              | 31 ++++++++++++++++++++++++-------
>>   mkfs/main.c                |  3 ++-
>>   9 files changed, 51 insertions(+), 21 deletions(-)
>>
>> diff --git a/check/main.c b/check/main.c
>> index 4f7ab8b29309..02abbd5289f9 100644
>> --- a/check/main.c
>> +++ b/check/main.c
>> @@ -6293,7 +6293,7 @@ static int check_type_with_root(u64 rootid, u8 
>> key_type)
>>               goto err;
>>           break;
>>       case BTRFS_BLOCK_GROUP_ITEM_KEY:
>> -        if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
>> +        if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
>>               if (rootid != BTRFS_BLOCK_GROUP_TREE_OBJECTID)
>>                   goto err;
>>           } else if (rootid != BTRFS_EXTENT_TREE_OBJECTID) {
>> @@ -9071,10 +9071,6 @@ again:
>>       ret = load_super_root(&normal_trees, gfs_info->chunk_root);
>>       if (ret < 0)
>>           goto out;
>> -    ret = load_super_root(&normal_trees, gfs_info->block_group_root);
>> -    if (ret < 0)
>> -        goto out;
>> -
>>       ret = parse_tree_roots(&normal_trees, &dropping_trees);
>>       if (ret < 0)
>>           goto out;
>> @@ -9574,7 +9570,7 @@ again:
>>        * If we are extent tree v2 then we can reint the block group 
>> root as
>>        * well.
>>        */
>> -    if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
>> +    if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
>>           ret = btrfs_fsck_reinit_root(trans, 
>> gfs_info->block_group_root);
>>           if (ret) {
>>               fprintf(stderr, "block group initialization failed\n");
>> diff --git a/common/fsfeatures.c b/common/fsfeatures.c
>> index 23a92c21a2cc..90704959b13b 100644
>> --- a/common/fsfeatures.c
>> +++ b/common/fsfeatures.c
>> @@ -172,6 +172,14 @@ static const struct btrfs_feature 
>> runtime_features[] = {
>>           VERSION_TO_STRING2(safe, 4,9),
>>           VERSION_TO_STRING2(default, 5,15),
>>           .desc        = "free space tree (space_cache=v2)"
>> +    }, {
>> +        .name        = "block-group-tree",
>> +        .flag        = BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
>> +        .sysfs_name = "block_group_tree",
>> +        VERSION_TO_STRING2(compat, 6,0),
>> +        VERSION_NULL(safe),
>> +        VERSION_NULL(default),
>> +        .desc        = "block group tree to reduce mount time"
>>       },
>>       /* Keep this one last */
>>       {
>> diff --git a/common/fsfeatures.h b/common/fsfeatures.h
>> index 9e39c667b900..a8d77fd4da05 100644
>> --- a/common/fsfeatures.h
>> +++ b/common/fsfeatures.h
>> @@ -45,6 +45,8 @@
>>   #define BTRFS_RUNTIME_FEATURE_QUOTA        (1ULL << 0)
>>   #define BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE    (1ULL << 1)
>> +#define BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE    (1ULL << 2)
>> +
>>   void btrfs_list_all_fs_features(u64 mask_disallowed);
>>   void btrfs_list_all_runtime_features(u64 mask_disallowed);
>> diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
>> index c12076202577..d8909b3fdf20 100644
>> --- a/kernel-shared/ctree.h
>> +++ b/kernel-shared/ctree.h
>> @@ -479,6 +479,12 @@ BUILD_ASSERT(sizeof(struct btrfs_super_block) == 
>> BTRFS_SUPER_INFO_SIZE);
>>    */
>>   #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID    (1ULL << 1)
>> +/*
>> + * Save all block group items into a dedicated block group tree, to 
>> greatly
>> + * reduce mount time for large fs.
>> + */
>> +#define BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE    (1ULL << 5)
>> +
>>   #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF    (1ULL << 0)
>>   #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL    (1ULL << 1)
>>   #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS    (1ULL << 2)
>> @@ -508,7 +514,8 @@ BUILD_ASSERT(sizeof(struct btrfs_super_block) == 
>> BTRFS_SUPER_INFO_SIZE);
>>    */
>>   #define BTRFS_FEATURE_COMPAT_RO_SUPP            \
>>       (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |    \
>> -     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID)
>> +     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID| \
>> +     BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE)
>>   #if EXPERIMENTAL
>>   #define BTRFS_FEATURE_INCOMPAT_SUPP            \
>> diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
>> index 80db5976cc3f..6eeb5ecd1d59 100644
>> --- a/kernel-shared/disk-io.c
>> +++ b/kernel-shared/disk-io.c
>> @@ -1203,7 +1203,7 @@ static int load_important_roots(struct 
>> btrfs_fs_info *fs_info,
>>           backup = sb->super_roots + index;
>>       }
>> -    if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
>> +    if (!btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
>>           free(fs_info->block_group_root);
>>           fs_info->block_group_root = NULL;
>>           goto tree_root;
>> @@ -1256,7 +1256,7 @@ int btrfs_setup_all_roots(struct btrfs_fs_info 
>> *fs_info, u64 root_tree_bytenr,
>>       if (ret)
>>           return ret;
>> -    if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
>> +    if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
>>           ret = find_and_setup_root(root, fs_info,
>>                   BTRFS_BLOCK_GROUP_TREE_OBJECTID,
>>                   fs_info->block_group_root);
>> diff --git a/kernel-shared/disk-io.h b/kernel-shared/disk-io.h
>> index bba97fc1a814..6c8eaa2bd13d 100644
>> --- a/kernel-shared/disk-io.h
>> +++ b/kernel-shared/disk-io.h
>> @@ -232,7 +232,7 @@ int btrfs_global_root_insert(struct btrfs_fs_info 
>> *fs_info,
>>   static inline struct btrfs_root *btrfs_block_group_root(
>>                           struct btrfs_fs_info *fs_info)
>>   {
>> -    if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
>> +    if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE))
>>           return fs_info->block_group_root;
>>       return btrfs_extent_root(fs_info, 0);
>>   }
>> diff --git a/kernel-shared/print-tree.c b/kernel-shared/print-tree.c
>> index bffe30b405c7..b2ee77c2fb73 100644
>> --- a/kernel-shared/print-tree.c
>> +++ b/kernel-shared/print-tree.c
>> @@ -1668,6 +1668,7 @@ struct readable_flag_entry {
>>   static struct readable_flag_entry compat_ro_flags_array[] = {
>>       DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE),
>>       DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE_VALID),
>> +    DEF_COMPAT_RO_FLAG_ENTRY(BLOCK_GROUP_TREE),
>>   };
>>   static const int compat_ro_flags_num = sizeof(compat_ro_flags_array) /
>>                          sizeof(struct readable_flag_entry);
>> @@ -1754,9 +1755,7 @@ static void print_readable_compat_ro_flag(u64 flag)
>>        */
>>       return __print_readable_flag(flag, compat_ro_flags_array,
>>                        compat_ro_flags_num,
>> -                     BTRFS_FEATURE_COMPAT_RO_SUPP |
>> -                     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
>> -                     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
>> +                     BTRFS_FEATURE_COMPAT_RO_SUPP);
>>   }
>>   static void print_readable_incompat_flag(u64 flag)
>> diff --git a/mkfs/common.c b/mkfs/common.c
>> index b72338551dfb..cb616f13ef9b 100644
>> --- a/mkfs/common.c
>> +++ b/mkfs/common.c
>> @@ -75,6 +75,8 @@ static int btrfs_create_tree_root(int fd, struct 
>> btrfs_mkfs_config *cfg,
>>       int blk;
>>       int i;
>>       u8 uuid[BTRFS_UUID_SIZE];
>> +    bool block_group_tree = !!(cfg->runtime_features &
>> +                   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
>>       memset(buf->data + sizeof(struct btrfs_header), 0,
>>           cfg->nodesize - sizeof(struct btrfs_header));
>> @@ -101,6 +103,9 @@ static int btrfs_create_tree_root(int fd, struct 
>> btrfs_mkfs_config *cfg,
>>           if (blk == MKFS_ROOT_TREE || blk == MKFS_CHUNK_TREE)
>>               continue;
>> +        if (!block_group_tree && blk == MKFS_BLOCK_GROUP_TREE)
>> +            continue;
>> +
>>           btrfs_set_root_bytenr(&root_item, cfg->blocks[blk]);
>>           btrfs_set_disk_key_objectid(&disk_key,
>>               reference_root_table[blk]);
>> @@ -216,7 +221,8 @@ static int create_block_group_tree(int fd, struct 
>> btrfs_mkfs_config *cfg,
>>       memset(buf->data + sizeof(struct btrfs_header), 0,
>>           cfg->nodesize - sizeof(struct btrfs_header));
>> -    write_block_group_item(buf, 0, bg_offset, bg_size, bg_used, 0,
>> +    write_block_group_item(buf, 0, bg_offset, bg_size, bg_used,
>> +                   BTRFS_FIRST_CHUNK_TREE_OBJECTID,
>>                      cfg->leaf_data_size -
>>                      sizeof(struct btrfs_block_group_item));
>>       btrfs_set_header_bytenr(buf, cfg->blocks[MKFS_BLOCK_GROUP_TREE]);
>> @@ -357,6 +363,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>>       u32 array_size;
>>       u32 item_size;
>>       u64 total_used = 0;
>> +    u64 ro_flags = 0;
>>       int skinny_metadata = !!(cfg->features &
>>                    BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA);
>>       u64 num_bytes;
>> @@ -365,6 +372,8 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>>       bool add_block_group = true;
>>       bool free_space_tree = !!(cfg->runtime_features &
>>                     BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE);
>> +    bool block_group_tree = !!(cfg->runtime_features &
>> +                   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
>>       bool extent_tree_v2 = !!(cfg->features &
>>                    BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2);
>> @@ -372,8 +381,13 @@ int make_btrfs(int fd, struct btrfs_mkfs_config 
>> *cfg)
>>              sizeof(enum btrfs_mkfs_block) * ARRAY_SIZE(default_blocks));
>>       blocks_nr = ARRAY_SIZE(default_blocks);
>> -    /* Extent tree v2 needs an extra block for block group tree.*/
>> -    if (extent_tree_v2) {
>> +    /*
>> +     * Add one new block for block group tree.
>> +     * And for block group tree, we don't need to add block group item
>> +     * into extent tree, the item will be handled in block group tree
>> +     * initialization.
>> +     */
>> +    if (block_group_tree) {
>>           mkfs_blocks_add(blocks, &blocks_nr, MKFS_BLOCK_GROUP_TREE);
>>           add_block_group = false;
>>       }
>> @@ -433,12 +447,15 @@ int make_btrfs(int fd, struct btrfs_mkfs_config 
>> *cfg)
>>           btrfs_set_super_cache_generation(&super, -1);
>>       btrfs_set_super_incompat_flags(&super, cfg->features);
>>       if (free_space_tree) {
>> -        u64 ro_flags = BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
>> -            BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID;
>> +        ro_flags |= (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
>> +                 BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
>> -        btrfs_set_super_compat_ro_flags(&super, ro_flags);
>>           btrfs_set_super_cache_generation(&super, 0);
>>       }
>> +    if (block_group_tree)
>> +        ro_flags |= BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE;
>> +    btrfs_set_super_compat_ro_flags(&super, ro_flags);
>> +
>>       if (extent_tree_v2)
>>           btrfs_set_super_nr_global_roots(&super, 1);
>> @@ -695,7 +712,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
>>               goto out;
>>       }
>> -    if (extent_tree_v2) {
>> +    if (block_group_tree) {
>>           ret = create_block_group_tree(fd, cfg, buf,
>>                             system_group_offset,
>>                             system_group_size, total_used);
>> diff --git a/mkfs/main.c b/mkfs/main.c
>> index ce096d362171..518ce0fd7523 100644
>> --- a/mkfs/main.c
>> +++ b/mkfs/main.c
>> @@ -299,7 +299,8 @@ static int recow_roots(struct btrfs_trans_handle 
>> *trans,
>>       ret = __recow_root(trans, info->dev_root);
>>       if (ret)
>>           return ret;
>> -        if (btrfs_fs_incompat(info, EXTENT_TREE_V2)) {
>> +
>> +    if (btrfs_fs_compat_ro(info, BLOCK_GROUP_TREE)) {
>>           ret = __recow_root(trans, info->block_group_root);
>>           if (ret)
>>               return ret;
>
Qu Wenruo Oct. 4, 2022, 12:05 a.m. UTC | #9
On 2022/10/4 07:28, Qu Wenruo wrote:
> 
> 
> On 2022/10/3 22:48, Anand Jain wrote:
>>
>> This patch is causing regressions; now can't mkfs with extent-tree-v2.
> 
> I'm already looking at it.

It's a more complex thing, not just a simple regression.

Firstly, commit "btrfs-progs: prepare merging compat feature lists" 
tries to merge the -O and -R options, which is a good idea.

The problem is, we're still just using the initial u64 numbers for 
btrfs_parse_fs_feaetures(), which we expect to get a simple U64 bit flags.

But unfortunately this means the u64 will have conflicting bits for 
compat_ro and incompat flags.

And for block group tree case, it's 1<<2 in compat_ro, while 1<<2 in 
incompat it's mixed bg.

Thus we trigger the problem.

I'll rework the merge patch to avoid the problem.

Thanks,
Qu

> 
>>
>>
>> $ mkfs.btrfs -f -O block-group-tree  /dev/nvme0n1
>> btrfs-progs v5.19.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> ERROR: superblock magic doesn't match
>> ERROR: illegal nodesize 16384 (not equal to 4096 for mixed block group)
>>
>>
>>
>> $ mkfs.btrfs -f -O extent-tree-v2  /dev/nvme0n1
>> btrfs-progs v5.19.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> ERROR: superblock magic doesn't match
>> NOTE: several default settings have changed in version 5.15, please 
>> make sure
>>        this does not affect your deployments:
>>        - DUP for metadata (-m dup)
>>        - enabled no-holes (-O no-holes)
>>        - enabled free-space-tree (-R free-space-tree)
>>
>> Unable to find block group for 0
>> Unable to find block group for 0
>> Unable to find block group for 0
>> ERROR: no space to allocate metadata chunk
>> ERROR: failed to create default block groups: -28
>>
>>
>>
>>
>> On 09/08/2022 14:03, Qu Wenruo wrote:
>>> Block group tree feature is completely a standalone feature, and it has
>>> been over 5 years before the initial introduction to solve the long
>>> mount time.
>>>
>>> I don't really want to waste another 5 years waiting for a feature which
>>> may or may not work, but definitely not properly reviewed for its
>>> preparation patches.
>>>
>>> So this patch will separate the block group tree feature into a
>>> standalone compat RO feature.
>>>
>>> There is a catch, in mkfs create_block_group_tree(), current
>>> tree-checker only accepts block group item with valid chunk_objectid,
>>> but the existing code from extent-tree-v2 didn't properly initialize it.
>>>
>>> This patch will also fix above mentioned problem so kernel can mount it
>>> correctly.
>>>
>>> Now mkfs/fsck should be able to handle the fs with block group tree.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>   check/main.c               |  8 ++------
>>>   common/fsfeatures.c        |  8 ++++++++
>>>   common/fsfeatures.h        |  2 ++
>>>   kernel-shared/ctree.h      |  9 ++++++++-
>>>   kernel-shared/disk-io.c    |  4 ++--
>>>   kernel-shared/disk-io.h    |  2 +-
>>>   kernel-shared/print-tree.c |  5 ++---
>>>   mkfs/common.c              | 31 ++++++++++++++++++++++++-------
>>>   mkfs/main.c                |  3 ++-
>>>   9 files changed, 51 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/check/main.c b/check/main.c
>>> index 4f7ab8b29309..02abbd5289f9 100644
>>> --- a/check/main.c
>>> +++ b/check/main.c
>>> @@ -6293,7 +6293,7 @@ static int check_type_with_root(u64 rootid, u8 
>>> key_type)
>>>               goto err;
>>>           break;
>>>       case BTRFS_BLOCK_GROUP_ITEM_KEY:
>>> -        if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
>>> +        if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
>>>               if (rootid != BTRFS_BLOCK_GROUP_TREE_OBJECTID)
>>>                   goto err;
>>>           } else if (rootid != BTRFS_EXTENT_TREE_OBJECTID) {
>>> @@ -9071,10 +9071,6 @@ again:
>>>       ret = load_super_root(&normal_trees, gfs_info->chunk_root);
>>>       if (ret < 0)
>>>           goto out;
>>> -    ret = load_super_root(&normal_trees, gfs_info->block_group_root);
>>> -    if (ret < 0)
>>> -        goto out;
>>> -
>>>       ret = parse_tree_roots(&normal_trees, &dropping_trees);
>>>       if (ret < 0)
>>>           goto out;
>>> @@ -9574,7 +9570,7 @@ again:
>>>        * If we are extent tree v2 then we can reint the block group 
>>> root as
>>>        * well.
>>>        */
>>> -    if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
>>> +    if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
>>>           ret = btrfs_fsck_reinit_root(trans, 
>>> gfs_info->block_group_root);
>>>           if (ret) {
>>>               fprintf(stderr, "block group initialization failed\n");
>>> diff --git a/common/fsfeatures.c b/common/fsfeatures.c
>>> index 23a92c21a2cc..90704959b13b 100644
>>> --- a/common/fsfeatures.c
>>> +++ b/common/fsfeatures.c
>>> @@ -172,6 +172,14 @@ static const struct btrfs_feature 
>>> runtime_features[] = {
>>>           VERSION_TO_STRING2(safe, 4,9),
>>>           VERSION_TO_STRING2(default, 5,15),
>>>           .desc        = "free space tree (space_cache=v2)"
>>> +    }, {
>>> +        .name        = "block-group-tree",
>>> +        .flag        = BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
>>> +        .sysfs_name = "block_group_tree",
>>> +        VERSION_TO_STRING2(compat, 6,0),
>>> +        VERSION_NULL(safe),
>>> +        VERSION_NULL(default),
>>> +        .desc        = "block group tree to reduce mount time"
>>>       },
>>>       /* Keep this one last */
>>>       {
>>> diff --git a/common/fsfeatures.h b/common/fsfeatures.h
>>> index 9e39c667b900..a8d77fd4da05 100644
>>> --- a/common/fsfeatures.h
>>> +++ b/common/fsfeatures.h
>>> @@ -45,6 +45,8 @@
>>>   #define BTRFS_RUNTIME_FEATURE_QUOTA        (1ULL << 0)
>>>   #define BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE    (1ULL << 1)
>>> +#define BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE    (1ULL << 2)
>>> +
>>>   void btrfs_list_all_fs_features(u64 mask_disallowed);
>>>   void btrfs_list_all_runtime_features(u64 mask_disallowed);
>>> diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
>>> index c12076202577..d8909b3fdf20 100644
>>> --- a/kernel-shared/ctree.h
>>> +++ b/kernel-shared/ctree.h
>>> @@ -479,6 +479,12 @@ BUILD_ASSERT(sizeof(struct btrfs_super_block) == 
>>> BTRFS_SUPER_INFO_SIZE);
>>>    */
>>>   #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID    (1ULL << 1)
>>> +/*
>>> + * Save all block group items into a dedicated block group tree, to 
>>> greatly
>>> + * reduce mount time for large fs.
>>> + */
>>> +#define BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE    (1ULL << 5)
>>> +
>>>   #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF    (1ULL << 0)
>>>   #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL    (1ULL << 1)
>>>   #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS    (1ULL << 2)
>>> @@ -508,7 +514,8 @@ BUILD_ASSERT(sizeof(struct btrfs_super_block) == 
>>> BTRFS_SUPER_INFO_SIZE);
>>>    */
>>>   #define BTRFS_FEATURE_COMPAT_RO_SUPP            \
>>>       (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |    \
>>> -     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID)
>>> +     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID| \
>>> +     BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE)
>>>   #if EXPERIMENTAL
>>>   #define BTRFS_FEATURE_INCOMPAT_SUPP            \
>>> diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
>>> index 80db5976cc3f..6eeb5ecd1d59 100644
>>> --- a/kernel-shared/disk-io.c
>>> +++ b/kernel-shared/disk-io.c
>>> @@ -1203,7 +1203,7 @@ static int load_important_roots(struct 
>>> btrfs_fs_info *fs_info,
>>>           backup = sb->super_roots + index;
>>>       }
>>> -    if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
>>> +    if (!btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
>>>           free(fs_info->block_group_root);
>>>           fs_info->block_group_root = NULL;
>>>           goto tree_root;
>>> @@ -1256,7 +1256,7 @@ int btrfs_setup_all_roots(struct btrfs_fs_info 
>>> *fs_info, u64 root_tree_bytenr,
>>>       if (ret)
>>>           return ret;
>>> -    if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
>>> +    if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
>>>           ret = find_and_setup_root(root, fs_info,
>>>                   BTRFS_BLOCK_GROUP_TREE_OBJECTID,
>>>                   fs_info->block_group_root);
>>> diff --git a/kernel-shared/disk-io.h b/kernel-shared/disk-io.h
>>> index bba97fc1a814..6c8eaa2bd13d 100644
>>> --- a/kernel-shared/disk-io.h
>>> +++ b/kernel-shared/disk-io.h
>>> @@ -232,7 +232,7 @@ int btrfs_global_root_insert(struct btrfs_fs_info 
>>> *fs_info,
>>>   static inline struct btrfs_root *btrfs_block_group_root(
>>>                           struct btrfs_fs_info *fs_info)
>>>   {
>>> -    if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
>>> +    if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE))
>>>           return fs_info->block_group_root;
>>>       return btrfs_extent_root(fs_info, 0);
>>>   }
>>> diff --git a/kernel-shared/print-tree.c b/kernel-shared/print-tree.c
>>> index bffe30b405c7..b2ee77c2fb73 100644
>>> --- a/kernel-shared/print-tree.c
>>> +++ b/kernel-shared/print-tree.c
>>> @@ -1668,6 +1668,7 @@ struct readable_flag_entry {
>>>   static struct readable_flag_entry compat_ro_flags_array[] = {
>>>       DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE),
>>>       DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE_VALID),
>>> +    DEF_COMPAT_RO_FLAG_ENTRY(BLOCK_GROUP_TREE),
>>>   };
>>>   static const int compat_ro_flags_num = sizeof(compat_ro_flags_array) /
>>>                          sizeof(struct readable_flag_entry);
>>> @@ -1754,9 +1755,7 @@ static void print_readable_compat_ro_flag(u64 
>>> flag)
>>>        */
>>>       return __print_readable_flag(flag, compat_ro_flags_array,
>>>                        compat_ro_flags_num,
>>> -                     BTRFS_FEATURE_COMPAT_RO_SUPP |
>>> -                     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
>>> -                     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
>>> +                     BTRFS_FEATURE_COMPAT_RO_SUPP);
>>>   }
>>>   static void print_readable_incompat_flag(u64 flag)
>>> diff --git a/mkfs/common.c b/mkfs/common.c
>>> index b72338551dfb..cb616f13ef9b 100644
>>> --- a/mkfs/common.c
>>> +++ b/mkfs/common.c
>>> @@ -75,6 +75,8 @@ static int btrfs_create_tree_root(int fd, struct 
>>> btrfs_mkfs_config *cfg,
>>>       int blk;
>>>       int i;
>>>       u8 uuid[BTRFS_UUID_SIZE];
>>> +    bool block_group_tree = !!(cfg->runtime_features &
>>> +                   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
>>>       memset(buf->data + sizeof(struct btrfs_header), 0,
>>>           cfg->nodesize - sizeof(struct btrfs_header));
>>> @@ -101,6 +103,9 @@ static int btrfs_create_tree_root(int fd, struct 
>>> btrfs_mkfs_config *cfg,
>>>           if (blk == MKFS_ROOT_TREE || blk == MKFS_CHUNK_TREE)
>>>               continue;
>>> +        if (!block_group_tree && blk == MKFS_BLOCK_GROUP_TREE)
>>> +            continue;
>>> +
>>>           btrfs_set_root_bytenr(&root_item, cfg->blocks[blk]);
>>>           btrfs_set_disk_key_objectid(&disk_key,
>>>               reference_root_table[blk]);
>>> @@ -216,7 +221,8 @@ static int create_block_group_tree(int fd, struct 
>>> btrfs_mkfs_config *cfg,
>>>       memset(buf->data + sizeof(struct btrfs_header), 0,
>>>           cfg->nodesize - sizeof(struct btrfs_header));
>>> -    write_block_group_item(buf, 0, bg_offset, bg_size, bg_used, 0,
>>> +    write_block_group_item(buf, 0, bg_offset, bg_size, bg_used,
>>> +                   BTRFS_FIRST_CHUNK_TREE_OBJECTID,
>>>                      cfg->leaf_data_size -
>>>                      sizeof(struct btrfs_block_group_item));
>>>       btrfs_set_header_bytenr(buf, cfg->blocks[MKFS_BLOCK_GROUP_TREE]);
>>> @@ -357,6 +363,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config 
>>> *cfg)
>>>       u32 array_size;
>>>       u32 item_size;
>>>       u64 total_used = 0;
>>> +    u64 ro_flags = 0;
>>>       int skinny_metadata = !!(cfg->features &
>>>                    BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA);
>>>       u64 num_bytes;
>>> @@ -365,6 +372,8 @@ int make_btrfs(int fd, struct btrfs_mkfs_config 
>>> *cfg)
>>>       bool add_block_group = true;
>>>       bool free_space_tree = !!(cfg->runtime_features &
>>>                     BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE);
>>> +    bool block_group_tree = !!(cfg->runtime_features &
>>> +                   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
>>>       bool extent_tree_v2 = !!(cfg->features &
>>>                    BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2);
>>> @@ -372,8 +381,13 @@ int make_btrfs(int fd, struct btrfs_mkfs_config 
>>> *cfg)
>>>              sizeof(enum btrfs_mkfs_block) * 
>>> ARRAY_SIZE(default_blocks));
>>>       blocks_nr = ARRAY_SIZE(default_blocks);
>>> -    /* Extent tree v2 needs an extra block for block group tree.*/
>>> -    if (extent_tree_v2) {
>>> +    /*
>>> +     * Add one new block for block group tree.
>>> +     * And for block group tree, we don't need to add block group item
>>> +     * into extent tree, the item will be handled in block group tree
>>> +     * initialization.
>>> +     */
>>> +    if (block_group_tree) {
>>>           mkfs_blocks_add(blocks, &blocks_nr, MKFS_BLOCK_GROUP_TREE);
>>>           add_block_group = false;
>>>       }
>>> @@ -433,12 +447,15 @@ int make_btrfs(int fd, struct btrfs_mkfs_config 
>>> *cfg)
>>>           btrfs_set_super_cache_generation(&super, -1);
>>>       btrfs_set_super_incompat_flags(&super, cfg->features);
>>>       if (free_space_tree) {
>>> -        u64 ro_flags = BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
>>> -            BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID;
>>> +        ro_flags |= (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
>>> +                 BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
>>> -        btrfs_set_super_compat_ro_flags(&super, ro_flags);
>>>           btrfs_set_super_cache_generation(&super, 0);
>>>       }
>>> +    if (block_group_tree)
>>> +        ro_flags |= BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE;
>>> +    btrfs_set_super_compat_ro_flags(&super, ro_flags);
>>> +
>>>       if (extent_tree_v2)
>>>           btrfs_set_super_nr_global_roots(&super, 1);
>>> @@ -695,7 +712,7 @@ int make_btrfs(int fd, struct btrfs_mkfs_config 
>>> *cfg)
>>>               goto out;
>>>       }
>>> -    if (extent_tree_v2) {
>>> +    if (block_group_tree) {
>>>           ret = create_block_group_tree(fd, cfg, buf,
>>>                             system_group_offset,
>>>                             system_group_size, total_used);
>>> diff --git a/mkfs/main.c b/mkfs/main.c
>>> index ce096d362171..518ce0fd7523 100644
>>> --- a/mkfs/main.c
>>> +++ b/mkfs/main.c
>>> @@ -299,7 +299,8 @@ static int recow_roots(struct btrfs_trans_handle 
>>> *trans,
>>>       ret = __recow_root(trans, info->dev_root);
>>>       if (ret)
>>>           return ret;
>>> -        if (btrfs_fs_incompat(info, EXTENT_TREE_V2)) {
>>> +
>>> +    if (btrfs_fs_compat_ro(info, BLOCK_GROUP_TREE)) {
>>>           ret = __recow_root(trans, info->block_group_root);
>>>           if (ret)
>>>               return ret;
>>
diff mbox series

Patch

diff --git a/check/main.c b/check/main.c
index 4f7ab8b29309..02abbd5289f9 100644
--- a/check/main.c
+++ b/check/main.c
@@ -6293,7 +6293,7 @@  static int check_type_with_root(u64 rootid, u8 key_type)
 			goto err;
 		break;
 	case BTRFS_BLOCK_GROUP_ITEM_KEY:
-		if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
+		if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
 			if (rootid != BTRFS_BLOCK_GROUP_TREE_OBJECTID)
 				goto err;
 		} else if (rootid != BTRFS_EXTENT_TREE_OBJECTID) {
@@ -9071,10 +9071,6 @@  again:
 	ret = load_super_root(&normal_trees, gfs_info->chunk_root);
 	if (ret < 0)
 		goto out;
-	ret = load_super_root(&normal_trees, gfs_info->block_group_root);
-	if (ret < 0)
-		goto out;
-
 	ret = parse_tree_roots(&normal_trees, &dropping_trees);
 	if (ret < 0)
 		goto out;
@@ -9574,7 +9570,7 @@  again:
 	 * If we are extent tree v2 then we can reint the block group root as
 	 * well.
 	 */
-	if (btrfs_fs_incompat(gfs_info, EXTENT_TREE_V2)) {
+	if (btrfs_fs_compat_ro(gfs_info, BLOCK_GROUP_TREE)) {
 		ret = btrfs_fsck_reinit_root(trans, gfs_info->block_group_root);
 		if (ret) {
 			fprintf(stderr, "block group initialization failed\n");
diff --git a/common/fsfeatures.c b/common/fsfeatures.c
index 23a92c21a2cc..90704959b13b 100644
--- a/common/fsfeatures.c
+++ b/common/fsfeatures.c
@@ -172,6 +172,14 @@  static const struct btrfs_feature runtime_features[] = {
 		VERSION_TO_STRING2(safe, 4,9),
 		VERSION_TO_STRING2(default, 5,15),
 		.desc		= "free space tree (space_cache=v2)"
+	}, {
+		.name		= "block-group-tree",
+		.flag		= BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE,
+		.sysfs_name = "block_group_tree",
+		VERSION_TO_STRING2(compat, 6,0),
+		VERSION_NULL(safe),
+		VERSION_NULL(default),
+		.desc		= "block group tree to reduce mount time"
 	},
 	/* Keep this one last */
 	{
diff --git a/common/fsfeatures.h b/common/fsfeatures.h
index 9e39c667b900..a8d77fd4da05 100644
--- a/common/fsfeatures.h
+++ b/common/fsfeatures.h
@@ -45,6 +45,8 @@ 
 
 #define BTRFS_RUNTIME_FEATURE_QUOTA		(1ULL << 0)
 #define BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE	(1ULL << 1)
+#define BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE	(1ULL << 2)
+
 
 void btrfs_list_all_fs_features(u64 mask_disallowed);
 void btrfs_list_all_runtime_features(u64 mask_disallowed);
diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index c12076202577..d8909b3fdf20 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -479,6 +479,12 @@  BUILD_ASSERT(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
  */
 #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID	(1ULL << 1)
 
+/*
+ * Save all block group items into a dedicated block group tree, to greatly
+ * reduce mount time for large fs.
+ */
+#define BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE	(1ULL << 5)
+
 #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF	(1ULL << 0)
 #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL	(1ULL << 1)
 #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS	(1ULL << 2)
@@ -508,7 +514,8 @@  BUILD_ASSERT(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
  */
 #define BTRFS_FEATURE_COMPAT_RO_SUPP			\
 	(BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |	\
-	 BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID)
+	 BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID| \
+	 BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE)
 
 #if EXPERIMENTAL
 #define BTRFS_FEATURE_INCOMPAT_SUPP			\
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index 80db5976cc3f..6eeb5ecd1d59 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -1203,7 +1203,7 @@  static int load_important_roots(struct btrfs_fs_info *fs_info,
 		backup = sb->super_roots + index;
 	}
 
-	if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
+	if (!btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
 		free(fs_info->block_group_root);
 		fs_info->block_group_root = NULL;
 		goto tree_root;
@@ -1256,7 +1256,7 @@  int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
 	if (ret)
 		return ret;
 
-	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
+	if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
 		ret = find_and_setup_root(root, fs_info,
 				BTRFS_BLOCK_GROUP_TREE_OBJECTID,
 				fs_info->block_group_root);
diff --git a/kernel-shared/disk-io.h b/kernel-shared/disk-io.h
index bba97fc1a814..6c8eaa2bd13d 100644
--- a/kernel-shared/disk-io.h
+++ b/kernel-shared/disk-io.h
@@ -232,7 +232,7 @@  int btrfs_global_root_insert(struct btrfs_fs_info *fs_info,
 static inline struct btrfs_root *btrfs_block_group_root(
 						struct btrfs_fs_info *fs_info)
 {
-	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
+	if (btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE))
 		return fs_info->block_group_root;
 	return btrfs_extent_root(fs_info, 0);
 }
diff --git a/kernel-shared/print-tree.c b/kernel-shared/print-tree.c
index bffe30b405c7..b2ee77c2fb73 100644
--- a/kernel-shared/print-tree.c
+++ b/kernel-shared/print-tree.c
@@ -1668,6 +1668,7 @@  struct readable_flag_entry {
 static struct readable_flag_entry compat_ro_flags_array[] = {
 	DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE),
 	DEF_COMPAT_RO_FLAG_ENTRY(FREE_SPACE_TREE_VALID),
+	DEF_COMPAT_RO_FLAG_ENTRY(BLOCK_GROUP_TREE),
 };
 static const int compat_ro_flags_num = sizeof(compat_ro_flags_array) /
 				       sizeof(struct readable_flag_entry);
@@ -1754,9 +1755,7 @@  static void print_readable_compat_ro_flag(u64 flag)
 	 */
 	return __print_readable_flag(flag, compat_ro_flags_array,
 				     compat_ro_flags_num,
-				     BTRFS_FEATURE_COMPAT_RO_SUPP |
-				     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
-				     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
+				     BTRFS_FEATURE_COMPAT_RO_SUPP);
 }
 
 static void print_readable_incompat_flag(u64 flag)
diff --git a/mkfs/common.c b/mkfs/common.c
index b72338551dfb..cb616f13ef9b 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -75,6 +75,8 @@  static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg,
 	int blk;
 	int i;
 	u8 uuid[BTRFS_UUID_SIZE];
+	bool block_group_tree = !!(cfg->runtime_features &
+				   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
 
 	memset(buf->data + sizeof(struct btrfs_header), 0,
 		cfg->nodesize - sizeof(struct btrfs_header));
@@ -101,6 +103,9 @@  static int btrfs_create_tree_root(int fd, struct btrfs_mkfs_config *cfg,
 		if (blk == MKFS_ROOT_TREE || blk == MKFS_CHUNK_TREE)
 			continue;
 
+		if (!block_group_tree && blk == MKFS_BLOCK_GROUP_TREE)
+			continue;
+
 		btrfs_set_root_bytenr(&root_item, cfg->blocks[blk]);
 		btrfs_set_disk_key_objectid(&disk_key,
 			reference_root_table[blk]);
@@ -216,7 +221,8 @@  static int create_block_group_tree(int fd, struct btrfs_mkfs_config *cfg,
 
 	memset(buf->data + sizeof(struct btrfs_header), 0,
 		cfg->nodesize - sizeof(struct btrfs_header));
-	write_block_group_item(buf, 0, bg_offset, bg_size, bg_used, 0,
+	write_block_group_item(buf, 0, bg_offset, bg_size, bg_used,
+			       BTRFS_FIRST_CHUNK_TREE_OBJECTID,
 			       cfg->leaf_data_size -
 			       sizeof(struct btrfs_block_group_item));
 	btrfs_set_header_bytenr(buf, cfg->blocks[MKFS_BLOCK_GROUP_TREE]);
@@ -357,6 +363,7 @@  int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	u32 array_size;
 	u32 item_size;
 	u64 total_used = 0;
+	u64 ro_flags = 0;
 	int skinny_metadata = !!(cfg->features &
 				 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA);
 	u64 num_bytes;
@@ -365,6 +372,8 @@  int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	bool add_block_group = true;
 	bool free_space_tree = !!(cfg->runtime_features &
 				  BTRFS_RUNTIME_FEATURE_FREE_SPACE_TREE);
+	bool block_group_tree = !!(cfg->runtime_features &
+				   BTRFS_RUNTIME_FEATURE_BLOCK_GROUP_TREE);
 	bool extent_tree_v2 = !!(cfg->features &
 				 BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2);
 
@@ -372,8 +381,13 @@  int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 	       sizeof(enum btrfs_mkfs_block) * ARRAY_SIZE(default_blocks));
 	blocks_nr = ARRAY_SIZE(default_blocks);
 
-	/* Extent tree v2 needs an extra block for block group tree.*/
-	if (extent_tree_v2) {
+	/*
+	 * Add one new block for block group tree.
+	 * And for block group tree, we don't need to add block group item
+	 * into extent tree, the item will be handled in block group tree
+	 * initialization.
+	 */
+	if (block_group_tree) {
 		mkfs_blocks_add(blocks, &blocks_nr, MKFS_BLOCK_GROUP_TREE);
 		add_block_group = false;
 	}
@@ -433,12 +447,15 @@  int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 		btrfs_set_super_cache_generation(&super, -1);
 	btrfs_set_super_incompat_flags(&super, cfg->features);
 	if (free_space_tree) {
-		u64 ro_flags = BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
-			BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID;
+		ro_flags |= (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |
+			     BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID);
 
-		btrfs_set_super_compat_ro_flags(&super, ro_flags);
 		btrfs_set_super_cache_generation(&super, 0);
 	}
+	if (block_group_tree)
+		ro_flags |= BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE;
+	btrfs_set_super_compat_ro_flags(&super, ro_flags);
+
 	if (extent_tree_v2)
 		btrfs_set_super_nr_global_roots(&super, 1);
 
@@ -695,7 +712,7 @@  int make_btrfs(int fd, struct btrfs_mkfs_config *cfg)
 			goto out;
 	}
 
-	if (extent_tree_v2) {
+	if (block_group_tree) {
 		ret = create_block_group_tree(fd, cfg, buf,
 					      system_group_offset,
 					      system_group_size, total_used);
diff --git a/mkfs/main.c b/mkfs/main.c
index ce096d362171..518ce0fd7523 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -299,7 +299,8 @@  static int recow_roots(struct btrfs_trans_handle *trans,
 	ret = __recow_root(trans, info->dev_root);
 	if (ret)
 		return ret;
-        if (btrfs_fs_incompat(info, EXTENT_TREE_V2)) {
+
+	if (btrfs_fs_compat_ro(info, BLOCK_GROUP_TREE)) {
 		ret = __recow_root(trans, info->block_group_root);
 		if (ret)
 			return ret;