diff mbox series

btrfs-progs: Drop the type check in init_alloc_chunk_ctl_policy_regular

Message ID 20210809182613.4466-1-mpdesouza@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs-progs: Drop the type check in init_alloc_chunk_ctl_policy_regular | expand

Commit Message

Marcos Paulo de Souza Aug. 9, 2021, 6:26 p.m. UTC
[PROBLEM]
Our documentation says that a DATA block group can have up to 1G of
size, or the at most 10% of the filesystem size. Currently, by default,
mkfs.btrfs is creating an initial DATA block group of 8M of size,
regardless of the filesystem size. It happens because we check for the
ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the
BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE
(default) DATA block group:

$ mkfs.btrfs -f /storage/btrfs.disk
btrfs-progs v4.19.1
See http://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               39e3492f-41f2-4bd7-8c25-93032606b9fe
Node size:          16384
Sector size:        4096
Filesystem size:    55.00GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP               1.00GiB
  System:           DUP               8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1    55.00GiB  /storage/btrfs.disk

In this case, for single data profile, it should create a data block
group of 1G, since the filesystem if bigger than 50G.

[FIX]
Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in
init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is
later on assigned to ctl.num_bytes, which is used by
btrfs_make_block_group to create the block group.

By removing the check we allow the code to always configure the correct
stripe_size regardless of the PROFILE, looking on the block group TYPE.

With the fix applied, it now created the BG correctly:

$ ./mkfs.btrfs -f /storage/btrfs.disk
btrfs-progs v5.10.1
See http://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               5145e343-5639-462d-82ee-3eb75dc41c31
Node size:          16384
Sector size:        4096
Filesystem size:    55.00GiB
Block group profiles:
  Data:             single            1.00GiB
  Metadata:         DUP               1.00GiB
  System:           DUP               8.00MiB
SSD detected:       no
Zoned device:       no
Incompat features:  extref, skinny-metadata
Runtime features:
Checksum:           crc32c
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1    55.00GiB  /storage/btrfs.disk

Using a disk >50G it creates a 1G single data block group. Another
example:

$ ./mkfs.btrfs -f /storage/btrfs2.disk
btrfs-progs v5.10.1
See http://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               c0910857-e512-4e76-9efa-50a475aafc87
Node size:          16384
Sector size:        4096
Filesystem size:    5.00GiB
Block group profiles:
  Data:             single          512.00MiB
  Metadata:         DUP             256.00MiB
  System:           DUP               8.00MiB
SSD detected:       no
Zoned device:       no
Incompat features:  extref, skinny-metadata
Runtime features:
Checksum:           crc32c
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1     5.00GiB  /storage/btrfs2.disk

The code now created a single data block group of 512M, which is exactly
10% of the size of the filesystem, which is 5G in this case.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---

 This change mimics what the kernel currently does, which is set the stripe_size
 regardless of the profile. Any thoughts on it? Thanks!

 kernel-shared/volumes.c | 40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

Comments

Nikolay Borisov Aug. 9, 2021, 6:44 p.m. UTC | #1
On 9.08.21 г. 21:26, Marcos Paulo de Souza wrote:
> [PROBLEM]
> Our documentation says that a DATA block group can have up to 1G of
> size, or the at most 10% of the filesystem size. Currently, by default,
> mkfs.btrfs is creating an initial DATA block group of 8M of size,
> regardless of the filesystem size. It happens because we check for the
> ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the
> BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE
> (default) DATA block group:
> 
> $ mkfs.btrfs -f /storage/btrfs.disk
> btrfs-progs v4.19.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> Label:              (null)
> UUID:               39e3492f-41f2-4bd7-8c25-93032606b9fe
> Node size:          16384
> Sector size:        4096
> Filesystem size:    55.00GiB
> Block group profiles:
>   Data:             single            8.00MiB
>   Metadata:         DUP               1.00GiB
>   System:           DUP               8.00MiB
> SSD detected:       no
> Incompat features:  extref, skinny-metadata
> Number of devices:  1
> Devices:
>    ID        SIZE  PATH
>     1    55.00GiB  /storage/btrfs.disk
> 
> In this case, for single data profile, it should create a data block
> group of 1G, since the filesystem if bigger than 50G.
> 
> [FIX]
> Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in
> init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is
> later on assigned to ctl.num_bytes, which is used by
> btrfs_make_block_group to create the block group.
> 
> By removing the check we allow the code to always configure the correct
> stripe_size regardless of the PROFILE, looking on the block group TYPE.
> 
> With the fix applied, it now created the BG correctly:
> 
> $ ./mkfs.btrfs -f /storage/btrfs.disk
> btrfs-progs v5.10.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> Label:              (null)
> UUID:               5145e343-5639-462d-82ee-3eb75dc41c31
> Node size:          16384
> Sector size:        4096
> Filesystem size:    55.00GiB
> Block group profiles:
>   Data:             single            1.00GiB
>   Metadata:         DUP               1.00GiB
>   System:           DUP               8.00MiB
> SSD detected:       no
> Zoned device:       no
> Incompat features:  extref, skinny-metadata
> Runtime features:
> Checksum:           crc32c
> Number of devices:  1
> Devices:
>    ID        SIZE  PATH
>     1    55.00GiB  /storage/btrfs.disk
> 
> Using a disk >50G it creates a 1G single data block group. Another
> example:
> 
> $ ./mkfs.btrfs -f /storage/btrfs2.disk
> btrfs-progs v5.10.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> Label:              (null)
> UUID:               c0910857-e512-4e76-9efa-50a475aafc87
> Node size:          16384
> Sector size:        4096
> Filesystem size:    5.00GiB
> Block group profiles:
>   Data:             single          512.00MiB
>   Metadata:         DUP             256.00MiB
>   System:           DUP               8.00MiB
> SSD detected:       no
> Zoned device:       no
> Incompat features:  extref, skinny-metadata
> Runtime features:
> Checksum:           crc32c
> Number of devices:  1
> Devices:
>    ID        SIZE  PATH
>     1     5.00GiB  /storage/btrfs2.disk
> 
> The code now created a single data block group of 512M, which is exactly
> 10% of the size of the filesystem, which is 5G in this case.
> 
> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

I see no reason why we should care about BTRFS_BLOCK_GROUP_PROFILE_MASK
when creating the chunks. Without this patch ctl's various member are
being initialized to their defaults in init_alloc_chunk_ctl,
subsequently in create_chunk, chunk_bytes_by_type returns
return stripe_size * ctl->num_stripes; which is really SZ_8M * 1.

> ---
> 
>  This change mimics what the kernel currently does, which is set the stripe_size
>  regardless of the profile. Any thoughts on it? Thanks!
> 
>  kernel-shared/volumes.c | 40 +++++++++++++++++++---------------------
>  1 file changed, 19 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
> index aeeb25fe..3677dd7c 100644
> --- a/kernel-shared/volumes.c
> +++ b/kernel-shared/volumes.c
> @@ -1105,27 +1105,25 @@ static void init_alloc_chunk_ctl_policy_regular(struct btrfs_fs_info *info,
>  	u64 type = ctl->type;
>  	u64 percent_max;
>  
> -	if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
> -		if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
> -			ctl->stripe_size = SZ_8M;
> -			ctl->max_chunk_size = ctl->stripe_size * 2;
> -			ctl->min_stripe_size = SZ_1M;
> -			ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK;
> -		} else if (type & BTRFS_BLOCK_GROUP_DATA) {
> -			ctl->stripe_size = SZ_1G;
> -			ctl->max_chunk_size = 10 * ctl->stripe_size;
> -			ctl->min_stripe_size = SZ_64M;
> -			ctl->max_stripes = BTRFS_MAX_DEVS(info);
> -		} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
> -			/* For larger filesystems, use larger metadata chunks */
> -			if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G)
> -				ctl->max_chunk_size = SZ_1G;
> -			else
> -				ctl->max_chunk_size = SZ_256M;
> -			ctl->stripe_size = ctl->max_chunk_size;
> -			ctl->min_stripe_size = SZ_32M;
> -			ctl->max_stripes = BTRFS_MAX_DEVS(info);
> -		}
> +	if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
> +		ctl->stripe_size = SZ_8M;
> +		ctl->max_chunk_size = ctl->stripe_size * 2;
> +		ctl->min_stripe_size = SZ_1M;
> +		ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK;
> +	} else if (type & BTRFS_BLOCK_GROUP_DATA) {
> +		ctl->stripe_size = SZ_1G;
> +		ctl->max_chunk_size = 10 * ctl->stripe_size;
> +		ctl->min_stripe_size = SZ_64M;
> +		ctl->max_stripes = BTRFS_MAX_DEVS(info);
> +	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
> +		/* For larger filesystems, use larger metadata chunks */
> +		if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G)
> +			ctl->max_chunk_size = SZ_1G;
> +		else
> +			ctl->max_chunk_size = SZ_256M;
> +		ctl->stripe_size = ctl->max_chunk_size;
> +		ctl->min_stripe_size = SZ_32M;
> +		ctl->max_stripes = BTRFS_MAX_DEVS(info);
>  	}
>  
>  	/* We don't want a chunk larger than 10% of the FS */
>
David Sterba Aug. 17, 2021, 1:24 p.m. UTC | #2
On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote:
> [PROBLEM]
> Our documentation says that a DATA block group can have up to 1G of
> size, or the at most 10% of the filesystem size. Currently, by default,
> mkfs.btrfs is creating an initial DATA block group of 8M of size,
> regardless of the filesystem size. It happens because we check for the
> ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the
> BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE
> (default) DATA block group:
> 
> $ mkfs.btrfs -f /storage/btrfs.disk
> btrfs-progs v4.19.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> Label:              (null)
> UUID:               39e3492f-41f2-4bd7-8c25-93032606b9fe
> Node size:          16384
> Sector size:        4096
> Filesystem size:    55.00GiB
> Block group profiles:
>   Data:             single            8.00MiB
>   Metadata:         DUP               1.00GiB
>   System:           DUP               8.00MiB
> SSD detected:       no
> Incompat features:  extref, skinny-metadata
> Number of devices:  1
> Devices:
>    ID        SIZE  PATH
>     1    55.00GiB  /storage/btrfs.disk
> 
> In this case, for single data profile, it should create a data block
> group of 1G, since the filesystem if bigger than 50G.
> 
> [FIX]
> Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in
> init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is
> later on assigned to ctl.num_bytes, which is used by
> btrfs_make_block_group to create the block group.
> 
> By removing the check we allow the code to always configure the correct
> stripe_size regardless of the PROFILE, looking on the block group TYPE.
> 
> With the fix applied, it now created the BG correctly:
> 
> $ ./mkfs.btrfs -f /storage/btrfs.disk
> btrfs-progs v5.10.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> Label:              (null)
> UUID:               5145e343-5639-462d-82ee-3eb75dc41c31
> Node size:          16384
> Sector size:        4096
> Filesystem size:    55.00GiB
> Block group profiles:
>   Data:             single            1.00GiB
>   Metadata:         DUP               1.00GiB
>   System:           DUP               8.00MiB
> SSD detected:       no
> Zoned device:       no
> Incompat features:  extref, skinny-metadata
> Runtime features:
> Checksum:           crc32c
> Number of devices:  1
> Devices:
>    ID        SIZE  PATH
>     1    55.00GiB  /storage/btrfs.disk
> 
> Using a disk >50G it creates a 1G single data block group. Another
> example:
> 
> $ ./mkfs.btrfs -f /storage/btrfs2.disk
> btrfs-progs v5.10.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> Label:              (null)
> UUID:               c0910857-e512-4e76-9efa-50a475aafc87
> Node size:          16384
> Sector size:        4096
> Filesystem size:    5.00GiB
> Block group profiles:
>   Data:             single          512.00MiB
>   Metadata:         DUP             256.00MiB
>   System:           DUP               8.00MiB
> SSD detected:       no
> Zoned device:       no
> Incompat features:  extref, skinny-metadata
> Runtime features:
> Checksum:           crc32c
> Number of devices:  1
> Devices:
>    ID        SIZE  PATH
>     1     5.00GiB  /storage/btrfs2.disk
> 
> The code now created a single data block group of 512M, which is exactly
> 10% of the size of the filesystem, which is 5G in this case.
> 
> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
> ---
> 
>  This change mimics what the kernel currently does, which is set the stripe_size
>  regardless of the profile. Any thoughts on it? Thanks!

Makes sense to unify that, it works well for the large sizes. Please
write tests that verify that the chunk sizes are correct after mkfs on
various device sizes. Patch added to devel, thanks.
Anand Jain Aug. 18, 2021, 4:02 a.m. UTC | #3
On 17/08/2021 21:24, David Sterba wrote:
> On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote:
>> [PROBLEM]
>> Our documentation says that a DATA block group can have up to 1G of
>> size, or the at most 10% of the filesystem size. Currently, by default,
>> mkfs.btrfs is creating an initial DATA block group of 8M of size,
>> regardless of the filesystem size. It happens because we check for the
>> ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the
>> BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE
>> (default) DATA block group:
>>
>> $ mkfs.btrfs -f /storage/btrfs.disk
>> btrfs-progs v4.19.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Label:              (null)
>> UUID:               39e3492f-41f2-4bd7-8c25-93032606b9fe
>> Node size:          16384
>> Sector size:        4096
>> Filesystem size:    55.00GiB
>> Block group profiles:
>>    Data:             single            8.00MiB
>>    Metadata:         DUP               1.00GiB
>>    System:           DUP               8.00MiB
>> SSD detected:       no
>> Incompat features:  extref, skinny-metadata
>> Number of devices:  1
>> Devices:
>>     ID        SIZE  PATH
>>      1    55.00GiB  /storage/btrfs.disk
>>
>> In this case, for single data profile, it should create a data block
>> group of 1G, since the filesystem if bigger than 50G.
>>
>> [FIX]
>> Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in
>> init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is
>> later on assigned to ctl.num_bytes, which is used by
>> btrfs_make_block_group to create the block group.
>>
>> By removing the check we allow the code to always configure the correct
>> stripe_size regardless of the PROFILE, looking on the block group TYPE.
>>
>> With the fix applied, it now created the BG correctly:
>>
>> $ ./mkfs.btrfs -f /storage/btrfs.disk
>> btrfs-progs v5.10.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Label:              (null)
>> UUID:               5145e343-5639-462d-82ee-3eb75dc41c31
>> Node size:          16384
>> Sector size:        4096
>> Filesystem size:    55.00GiB
>> Block group profiles:
>>    Data:             single            1.00GiB
>>    Metadata:         DUP               1.00GiB
>>    System:           DUP               8.00MiB
>> SSD detected:       no
>> Zoned device:       no
>> Incompat features:  extref, skinny-metadata
>> Runtime features:
>> Checksum:           crc32c
>> Number of devices:  1
>> Devices:
>>     ID        SIZE  PATH
>>      1    55.00GiB  /storage/btrfs.disk
>>
>> Using a disk >50G it creates a 1G single data block group. Another
>> example:
>>
>> $ ./mkfs.btrfs -f /storage/btrfs2.disk
>> btrfs-progs v5.10.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Label:              (null)
>> UUID:               c0910857-e512-4e76-9efa-50a475aafc87
>> Node size:          16384
>> Sector size:        4096
>> Filesystem size:    5.00GiB
>> Block group profiles:
>>    Data:             single          512.00MiB
>>    Metadata:         DUP             256.00MiB
>>    System:           DUP               8.00MiB
>> SSD detected:       no
>> Zoned device:       no
>> Incompat features:  extref, skinny-metadata
>> Runtime features:
>> Checksum:           crc32c
>> Number of devices:  1
>> Devices:
>>     ID        SIZE  PATH
>>      1     5.00GiB  /storage/btrfs2.disk
>>
>> The code now created a single data block group of 512M, which is exactly
>> 10% of the size of the filesystem, which is 5G in this case.
>>
>> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
>> ---
>>
>>   This change mimics what the kernel currently does, which is set the stripe_size
>>   regardless of the profile. Any thoughts on it? Thanks!
> 


> Makes sense to unify that,

  Patch [1] also said the same thing 1.5years back.
  Probably in a wrong context and timing? ;-)

  [1]
 
https://patchwork.kernel.org/project/linux-btrfs/patch/1583926429-28485-1-git-send-email-anand.jain@oracle.com/

-Anand

 >  it works well for the large sizes.
> Please
> write tests that verify that the chunk sizes are correct after mkfs on
> various device sizes. Patch added to devel, thanks.
>
Qu Wenruo Aug. 18, 2021, 5:17 a.m. UTC | #4
On 2021/8/17 下午9:24, David Sterba wrote:
> On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote:
>> [PROBLEM]
>> Our documentation says that a DATA block group can have up to 1G of
>> size, or the at most 10% of the filesystem size. Currently, by default,
>> mkfs.btrfs is creating an initial DATA block group of 8M of size,
>> regardless of the filesystem size. It happens because we check for the
>> ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the
>> BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE
>> (default) DATA block group:
>>
>> $ mkfs.btrfs -f /storage/btrfs.disk
>> btrfs-progs v4.19.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Label:              (null)
>> UUID:               39e3492f-41f2-4bd7-8c25-93032606b9fe
>> Node size:          16384
>> Sector size:        4096
>> Filesystem size:    55.00GiB
>> Block group profiles:
>>    Data:             single            8.00MiB
>>    Metadata:         DUP               1.00GiB
>>    System:           DUP               8.00MiB
>> SSD detected:       no
>> Incompat features:  extref, skinny-metadata
>> Number of devices:  1
>> Devices:
>>     ID        SIZE  PATH
>>      1    55.00GiB  /storage/btrfs.disk
>>
>> In this case, for single data profile, it should create a data block
>> group of 1G, since the filesystem if bigger than 50G.
>>
>> [FIX]
>> Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in
>> init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is
>> later on assigned to ctl.num_bytes, which is used by
>> btrfs_make_block_group to create the block group.
>>
>> By removing the check we allow the code to always configure the correct
>> stripe_size regardless of the PROFILE, looking on the block group TYPE.
>>
>> With the fix applied, it now created the BG correctly:
>>
>> $ ./mkfs.btrfs -f /storage/btrfs.disk
>> btrfs-progs v5.10.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Label:              (null)
>> UUID:               5145e343-5639-462d-82ee-3eb75dc41c31
>> Node size:          16384
>> Sector size:        4096
>> Filesystem size:    55.00GiB
>> Block group profiles:
>>    Data:             single            1.00GiB
>>    Metadata:         DUP               1.00GiB
>>    System:           DUP               8.00MiB
>> SSD detected:       no
>> Zoned device:       no
>> Incompat features:  extref, skinny-metadata
>> Runtime features:
>> Checksum:           crc32c
>> Number of devices:  1
>> Devices:
>>     ID        SIZE  PATH
>>      1    55.00GiB  /storage/btrfs.disk
>>
>> Using a disk >50G it creates a 1G single data block group. Another
>> example:
>>
>> $ ./mkfs.btrfs -f /storage/btrfs2.disk
>> btrfs-progs v5.10.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Label:              (null)
>> UUID:               c0910857-e512-4e76-9efa-50a475aafc87
>> Node size:          16384
>> Sector size:        4096
>> Filesystem size:    5.00GiB
>> Block group profiles:
>>    Data:             single          512.00MiB
>>    Metadata:         DUP             256.00MiB
>>    System:           DUP               8.00MiB
>> SSD detected:       no
>> Zoned device:       no
>> Incompat features:  extref, skinny-metadata
>> Runtime features:
>> Checksum:           crc32c
>> Number of devices:  1
>> Devices:
>>     ID        SIZE  PATH
>>      1     5.00GiB  /storage/btrfs2.disk
>>
>> The code now created a single data block group of 512M, which is exactly
>> 10% of the size of the filesystem, which is 5G in this case.
>>
>> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
>> ---
>>
>>   This change mimics what the kernel currently does, which is set the stripe_size
>>   regardless of the profile. Any thoughts on it? Thanks!
>
> Makes sense to unify that, it works well for the large sizes. Please
> write tests that verify that the chunk sizes are correct after mkfs on
> various device sizes. Patch added to devel, thanks.
>

It in fact makes fsck/025 to fail, bisection points to this patch
surprisingly.

Now "mkfs.btrfs -f" on a 128M file will just fail.

This looks like a big problem to me though...

Thanks,
Qu
Anand Jain Aug. 18, 2021, 5:27 a.m. UTC | #5
On 10/08/2021 02:26, Marcos Paulo de Souza wrote:
> [PROBLEM]
> Our documentation says that a DATA block group can have up to 1G of
> size, or the at most 10% of the filesystem size. Currently, by default,
> mkfs.btrfs is creating an initial DATA block group of 8M of size,
> regardless of the filesystem size. It happens because we check for the
> ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the
> BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE
> (default) DATA block group:


> $ mkfs.btrfs -f /storage/btrfs.disk
> btrfs-progs v4.19.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> Label:              (null)
> UUID:               39e3492f-41f2-4bd7-8c25-93032606b9fe
> Node size:          16384
> Sector size:        4096
> Filesystem size:    55.00GiB
> Block group profiles:
>    Data:             single            8.00MiB
>    Metadata:         DUP               1.00GiB
>    System:           DUP               8.00MiB
> SSD detected:       no
> Incompat features:  extref, skinny-metadata
> Number of devices:  1
> Devices:
>     ID        SIZE  PATH
>      1    55.00GiB  /storage/btrfs.disk
> 
> In this case, for single data profile, it should create a data block
> group of 1G, since the filesystem if bigger than 50G.
> 
> [FIX]
> Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in
> init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is
> later on assigned to ctl.num_bytes, which is used by
> btrfs_make_block_group to create the block group.
> 
> By removing the check we allow the code to always configure the correct
> stripe_size regardless of the PROFILE, looking on the block group TYPE.
> 
> With the fix applied, it now created the BG correctly:
> 
> $ ./mkfs.btrfs -f /storage/btrfs.disk
> btrfs-progs v5.10.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> Label:              (null)
> UUID:               5145e343-5639-462d-82ee-3eb75dc41c31
> Node size:          16384
> Sector size:        4096
> Filesystem size:    55.00GiB
> Block group profiles:
>    Data:             single            1.00GiB
>    Metadata:         DUP               1.00GiB
>    System:           DUP               8.00MiB
> SSD detected:       no
> Zoned device:       no
> Incompat features:  extref, skinny-metadata
> Runtime features:
> Checksum:           crc32c
> Number of devices:  1
> Devices:
>     ID        SIZE  PATH
>      1    55.00GiB  /storage/btrfs.disk
> 
> Using a disk >50G it creates a 1G single data block group. Another
> example:
> 
> $ ./mkfs.btrfs -f /storage/btrfs2.disk
> btrfs-progs v5.10.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> Label:              (null)
> UUID:               c0910857-e512-4e76-9efa-50a475aafc87
> Node size:          16384
> Sector size:        4096
> Filesystem size:    5.00GiB
> Block group profiles:
>    Data:             single          512.00MiB
>    Metadata:         DUP             256.00MiB
>    System:           DUP               8.00MiB
> SSD detected:       no
> Zoned device:       no
> Incompat features:  extref, skinny-metadata
> Runtime features:
> Checksum:           crc32c
> Number of devices:  1
> Devices:
>     ID        SIZE  PATH
>      1     5.00GiB  /storage/btrfs2.disk
> 
> The code now created a single data block group of 512M, which is exactly
> 10% of the size of the filesystem, which is 5G in this case.
> 
> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>

Looks good to me.

I hope there isn't any xfstests/tests/btrfs test case that is hardcoded
to the older block group alloc (like swap file test with relocation?).
But then the test case will need a fix, not btrfs-progs. So

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

> ---
> 
>   This change mimics what the kernel currently does, which is set the stripe_size
>   regardless of the profile. Any thoughts on it? Thanks!
> 
>   kernel-shared/volumes.c | 40 +++++++++++++++++++---------------------
>   1 file changed, 19 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
> index aeeb25fe..3677dd7c 100644
> --- a/kernel-shared/volumes.c
> +++ b/kernel-shared/volumes.c
> @@ -1105,27 +1105,25 @@ static void init_alloc_chunk_ctl_policy_regular(struct btrfs_fs_info *info,
>   	u64 type = ctl->type;
>   	u64 percent_max;
>   
> -	if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
> -		if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
> -			ctl->stripe_size = SZ_8M;
> -			ctl->max_chunk_size = ctl->stripe_size * 2;
> -			ctl->min_stripe_size = SZ_1M;
> -			ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK;
> -		} else if (type & BTRFS_BLOCK_GROUP_DATA) {
> -			ctl->stripe_size = SZ_1G;
> -			ctl->max_chunk_size = 10 * ctl->stripe_size;
> -			ctl->min_stripe_size = SZ_64M;
> -			ctl->max_stripes = BTRFS_MAX_DEVS(info);
> -		} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
> -			/* For larger filesystems, use larger metadata chunks */
> -			if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G)
> -				ctl->max_chunk_size = SZ_1G;
> -			else
> -				ctl->max_chunk_size = SZ_256M;
> -			ctl->stripe_size = ctl->max_chunk_size;
> -			ctl->min_stripe_size = SZ_32M;
> -			ctl->max_stripes = BTRFS_MAX_DEVS(info);
> -		}
> +	if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
> +		ctl->stripe_size = SZ_8M;
> +		ctl->max_chunk_size = ctl->stripe_size * 2;
> +		ctl->min_stripe_size = SZ_1M;
> +		ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK;
> +	} else if (type & BTRFS_BLOCK_GROUP_DATA) {
> +		ctl->stripe_size = SZ_1G;
> +		ctl->max_chunk_size = 10 * ctl->stripe_size;
> +		ctl->min_stripe_size = SZ_64M;
> +		ctl->max_stripes = BTRFS_MAX_DEVS(info);
> +	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
> +		/* For larger filesystems, use larger metadata chunks */
> +		if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G)
> +			ctl->max_chunk_size = SZ_1G;
> +		else
> +			ctl->max_chunk_size = SZ_256M;
> +		ctl->stripe_size = ctl->max_chunk_size;
> +		ctl->min_stripe_size = SZ_32M;
> +		ctl->max_stripes = BTRFS_MAX_DEVS(info);
>   	}
>   
>   	/* We don't want a chunk larger than 10% of the FS */
>
Qu Wenruo Aug. 18, 2021, 5:34 a.m. UTC | #6
On 2021/8/18 下午1:17, Qu Wenruo wrote:
>
>
> On 2021/8/17 下午9:24, David Sterba wrote:
>> On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote:
>>> [PROBLEM]
>>> Our documentation says that a DATA block group can have up to 1G of
>>> size, or the at most 10% of the filesystem size. Currently, by default,
>>> mkfs.btrfs is creating an initial DATA block group of 8M of size,
>>> regardless of the filesystem size. It happens because we check for the
>>> ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the
>>> BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE
>>> (default) DATA block group:
>>>
>>> $ mkfs.btrfs -f /storage/btrfs.disk
>>> btrfs-progs v4.19.1
>>> See http://btrfs.wiki.kernel.org for more information.
>>>
>>> Label:              (null)
>>> UUID:               39e3492f-41f2-4bd7-8c25-93032606b9fe
>>> Node size:          16384
>>> Sector size:        4096
>>> Filesystem size:    55.00GiB
>>> Block group profiles:
>>>    Data:             single            8.00MiB
>>>    Metadata:         DUP               1.00GiB
>>>    System:           DUP               8.00MiB
>>> SSD detected:       no
>>> Incompat features:  extref, skinny-metadata
>>> Number of devices:  1
>>> Devices:
>>>     ID        SIZE  PATH
>>>      1    55.00GiB  /storage/btrfs.disk
>>>
>>> In this case, for single data profile, it should create a data block
>>> group of 1G, since the filesystem if bigger than 50G.
>>>
>>> [FIX]
>>> Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in
>>> init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is
>>> later on assigned to ctl.num_bytes, which is used by
>>> btrfs_make_block_group to create the block group.
>>>
>>> By removing the check we allow the code to always configure the correct
>>> stripe_size regardless of the PROFILE, looking on the block group TYPE.
>>>
>>> With the fix applied, it now created the BG correctly:
>>>
>>> $ ./mkfs.btrfs -f /storage/btrfs.disk
>>> btrfs-progs v5.10.1
>>> See http://btrfs.wiki.kernel.org for more information.
>>>
>>> Label:              (null)
>>> UUID:               5145e343-5639-462d-82ee-3eb75dc41c31
>>> Node size:          16384
>>> Sector size:        4096
>>> Filesystem size:    55.00GiB
>>> Block group profiles:
>>>    Data:             single            1.00GiB
>>>    Metadata:         DUP               1.00GiB
>>>    System:           DUP               8.00MiB
>>> SSD detected:       no
>>> Zoned device:       no
>>> Incompat features:  extref, skinny-metadata
>>> Runtime features:
>>> Checksum:           crc32c
>>> Number of devices:  1
>>> Devices:
>>>     ID        SIZE  PATH
>>>      1    55.00GiB  /storage/btrfs.disk
>>>
>>> Using a disk >50G it creates a 1G single data block group. Another
>>> example:
>>>
>>> $ ./mkfs.btrfs -f /storage/btrfs2.disk
>>> btrfs-progs v5.10.1
>>> See http://btrfs.wiki.kernel.org for more information.
>>>
>>> Label:              (null)
>>> UUID:               c0910857-e512-4e76-9efa-50a475aafc87
>>> Node size:          16384
>>> Sector size:        4096
>>> Filesystem size:    5.00GiB
>>> Block group profiles:
>>>    Data:             single          512.00MiB
>>>    Metadata:         DUP             256.00MiB
>>>    System:           DUP               8.00MiB
>>> SSD detected:       no
>>> Zoned device:       no
>>> Incompat features:  extref, skinny-metadata
>>> Runtime features:
>>> Checksum:           crc32c
>>> Number of devices:  1
>>> Devices:
>>>     ID        SIZE  PATH
>>>      1     5.00GiB  /storage/btrfs2.disk
>>>
>>> The code now created a single data block group of 512M, which is exactly
>>> 10% of the size of the filesystem, which is 5G in this case.
>>>
>>> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
>>> ---
>>>
>>>   This change mimics what the kernel currently does, which is set the
>>> stripe_size
>>>   regardless of the profile. Any thoughts on it? Thanks!
>>
>> Makes sense to unify that, it works well for the large sizes. Please
>> write tests that verify that the chunk sizes are correct after mkfs on
>> various device sizes. Patch added to devel, thanks.
>>
>
> It in fact makes fsck/025 to fail, bisection points to this patch
> surprisingly.
>
> Now "mkfs.btrfs -f" on a 128M file will just fail.
>
> This looks like a big problem to me though...

The cause here is that, we changed the minimal stripe size for SINGLE
profile.

The old code for profile SINGLE will not go through any of the type
specific limits, and falls back to use the default limits, which has
only 1MiB minimal stripe limit.

While the new code changes the minimal stripe length to 64MiB.

Furthermore, currently mkfs using default profiles will lead to the
following chunk layout:

   Data:             single            8.00MiB
   Metadata:         DUP              32.00MiB
   System:           DUP               8.00MiB

This means, System + Metadata will eat up 80MiB already (40MiB x2 caused
by DUP profile). leaving only 48 MiB space left for data, which is
smaller than the 64MiB minimal stripe requirement.

In fact, this behavior is still not the same as kernel, as kernel can
create 16MiB SINGLE data chunk without any problem, so the 64MiB limit
is not correct either.

Since we're here, I hope we can work on the initial chunk size with
better flexibility, not creating too large metadata chunk which easily
eat up half of the space.

Thanks,
Qu
>
> Thanks,
> Qu
Qu Wenruo Aug. 18, 2021, 5:36 a.m. UTC | #7
On 2021/8/18 下午1:27, Anand Jain wrote:
> On 10/08/2021 02:26, Marcos Paulo de Souza wrote:
>> [PROBLEM]
>> Our documentation says that a DATA block group can have up to 1G of
>> size, or the at most 10% of the filesystem size. Currently, by default,
>> mkfs.btrfs is creating an initial DATA block group of 8M of size,
>> regardless of the filesystem size. It happens because we check for the
>> ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the
>> BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE
>> (default) DATA block group:
> 
> 
>> $ mkfs.btrfs -f /storage/btrfs.disk
>> btrfs-progs v4.19.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Label:              (null)
>> UUID:               39e3492f-41f2-4bd7-8c25-93032606b9fe
>> Node size:          16384
>> Sector size:        4096
>> Filesystem size:    55.00GiB
>> Block group profiles:
>>    Data:             single            8.00MiB
>>    Metadata:         DUP               1.00GiB
>>    System:           DUP               8.00MiB
>> SSD detected:       no
>> Incompat features:  extref, skinny-metadata
>> Number of devices:  1
>> Devices:
>>     ID        SIZE  PATH
>>      1    55.00GiB  /storage/btrfs.disk
>>
>> In this case, for single data profile, it should create a data block
>> group of 1G, since the filesystem if bigger than 50G.
>>
>> [FIX]
>> Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in
>> init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is
>> later on assigned to ctl.num_bytes, which is used by
>> btrfs_make_block_group to create the block group.
>>
>> By removing the check we allow the code to always configure the correct
>> stripe_size regardless of the PROFILE, looking on the block group TYPE.
>>
>> With the fix applied, it now created the BG correctly:
>>
>> $ ./mkfs.btrfs -f /storage/btrfs.disk
>> btrfs-progs v5.10.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Label:              (null)
>> UUID:               5145e343-5639-462d-82ee-3eb75dc41c31
>> Node size:          16384
>> Sector size:        4096
>> Filesystem size:    55.00GiB
>> Block group profiles:
>>    Data:             single            1.00GiB
>>    Metadata:         DUP               1.00GiB
>>    System:           DUP               8.00MiB
>> SSD detected:       no
>> Zoned device:       no
>> Incompat features:  extref, skinny-metadata
>> Runtime features:
>> Checksum:           crc32c
>> Number of devices:  1
>> Devices:
>>     ID        SIZE  PATH
>>      1    55.00GiB  /storage/btrfs.disk
>>
>> Using a disk >50G it creates a 1G single data block group. Another
>> example:
>>
>> $ ./mkfs.btrfs -f /storage/btrfs2.disk
>> btrfs-progs v5.10.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Label:              (null)
>> UUID:               c0910857-e512-4e76-9efa-50a475aafc87
>> Node size:          16384
>> Sector size:        4096
>> Filesystem size:    5.00GiB
>> Block group profiles:
>>    Data:             single          512.00MiB
>>    Metadata:         DUP             256.00MiB
>>    System:           DUP               8.00MiB
>> SSD detected:       no
>> Zoned device:       no
>> Incompat features:  extref, skinny-metadata
>> Runtime features:
>> Checksum:           crc32c
>> Number of devices:  1
>> Devices:
>>     ID        SIZE  PATH
>>      1     5.00GiB  /storage/btrfs2.disk
>>
>> The code now created a single data block group of 512M, which is exactly
>> 10% of the size of the filesystem, which is 5G in this case.
>>
>> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
> 
> Looks good to me.
> 
> I hope there isn't any xfstests/tests/btrfs test case that is hardcoded
> to the older block group alloc (like swap file test with relocation?).
> But then the test case will need a fix, not btrfs-progs. So

Well, fsck/025 will fail, and it's not the test case needs a fix...

We just suddenly can't mkfs on a 128MiB device due to the enlarged 
minimal stripe length of data chunks.

This still exposes a behavior difference between kernel and btrfs-progs...

Thanks,
Qu
> 
> Reviewed-by: Anand Jain <anand.jain@oracle.com>
> 
> Thanks, Anand
> 
>> ---
>>
>>   This change mimics what the kernel currently does, which is set the 
>> stripe_size
>>   regardless of the profile. Any thoughts on it? Thanks!
>>
>>   kernel-shared/volumes.c | 40 +++++++++++++++++++---------------------
>>   1 file changed, 19 insertions(+), 21 deletions(-)
>>
>> diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
>> index aeeb25fe..3677dd7c 100644
>> --- a/kernel-shared/volumes.c
>> +++ b/kernel-shared/volumes.c
>> @@ -1105,27 +1105,25 @@ static void 
>> init_alloc_chunk_ctl_policy_regular(struct btrfs_fs_info *info,
>>       u64 type = ctl->type;
>>       u64 percent_max;
>> -    if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
>> -        if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
>> -            ctl->stripe_size = SZ_8M;
>> -            ctl->max_chunk_size = ctl->stripe_size * 2;
>> -            ctl->min_stripe_size = SZ_1M;
>> -            ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK;
>> -        } else if (type & BTRFS_BLOCK_GROUP_DATA) {
>> -            ctl->stripe_size = SZ_1G;
>> -            ctl->max_chunk_size = 10 * ctl->stripe_size;
>> -            ctl->min_stripe_size = SZ_64M;
>> -            ctl->max_stripes = BTRFS_MAX_DEVS(info);
>> -        } else if (type & BTRFS_BLOCK_GROUP_METADATA) {
>> -            /* For larger filesystems, use larger metadata chunks */
>> -            if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G)
>> -                ctl->max_chunk_size = SZ_1G;
>> -            else
>> -                ctl->max_chunk_size = SZ_256M;
>> -            ctl->stripe_size = ctl->max_chunk_size;
>> -            ctl->min_stripe_size = SZ_32M;
>> -            ctl->max_stripes = BTRFS_MAX_DEVS(info);
>> -        }
>> +    if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
>> +        ctl->stripe_size = SZ_8M;
>> +        ctl->max_chunk_size = ctl->stripe_size * 2;
>> +        ctl->min_stripe_size = SZ_1M;
>> +        ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK;
>> +    } else if (type & BTRFS_BLOCK_GROUP_DATA) {
>> +        ctl->stripe_size = SZ_1G;
>> +        ctl->max_chunk_size = 10 * ctl->stripe_size;
>> +        ctl->min_stripe_size = SZ_64M;
>> +        ctl->max_stripes = BTRFS_MAX_DEVS(info);
>> +    } else if (type & BTRFS_BLOCK_GROUP_METADATA) {
>> +        /* For larger filesystems, use larger metadata chunks */
>> +        if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G)
>> +            ctl->max_chunk_size = SZ_1G;
>> +        else
>> +            ctl->max_chunk_size = SZ_256M;
>> +        ctl->stripe_size = ctl->max_chunk_size;
>> +        ctl->min_stripe_size = SZ_32M;
>> +        ctl->max_stripes = BTRFS_MAX_DEVS(info);
>>       }
>>       /* We don't want a chunk larger than 10% of the FS */
>>
>
David Sterba Aug. 18, 2021, 10:35 a.m. UTC | #8
On Wed, Aug 18, 2021 at 01:17:46PM +0800, Qu Wenruo wrote:
> On 2021/8/17 下午9:24, David Sterba wrote:
> > On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote:
> >>
> >> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
> >> ---
> >>
> >>   This change mimics what the kernel currently does, which is set the stripe_size
> >>   regardless of the profile. Any thoughts on it? Thanks!
> >
> > Makes sense to unify that, it works well for the large sizes. Please
> > write tests that verify that the chunk sizes are correct after mkfs on
> > various device sizes. Patch added to devel, thanks.
> 
> It in fact makes fsck/025 to fail, bisection points to this patch
> surprisingly.
> 
> Now "mkfs.btrfs -f" on a 128M file will just fail.
> 
> This looks like a big problem to me though...

This is known that the small filesystem size and intial chunk layout is
not scaled properly, the patch OTOH fixes the more common case where the
normal block group sizes fit and leave enough room for the rest.

Can the test 025 be scaled up so we don't have to create the 128M
filesystem? I'd rather go that way.
David Sterba Aug. 18, 2021, 10:40 a.m. UTC | #9
On Wed, Aug 18, 2021 at 12:02:45PM +0800, Anand Jain wrote:
> On 17/08/2021 21:24, David Sterba wrote:
> > On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote:
> >> [PROBLEM]
> >> Our documentation says that a DATA block group can have up to 1G of
> >> size, or the at most 10% of the filesystem size. Currently, by default,
> >> mkfs.btrfs is creating an initial DATA block group of 8M of size,
> >> regardless of the filesystem size. It happens because we check for the
> >> ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the
> >> BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE
> >> (default) DATA block group:
> >>
> >> $ mkfs.btrfs -f /storage/btrfs.disk
> >> btrfs-progs v4.19.1
> >> See http://btrfs.wiki.kernel.org for more information.
> >>
> >> Label:              (null)
> >> UUID:               39e3492f-41f2-4bd7-8c25-93032606b9fe
> >> Node size:          16384
> >> Sector size:        4096
> >> Filesystem size:    55.00GiB
> >> Block group profiles:
> >>    Data:             single            8.00MiB
> >>    Metadata:         DUP               1.00GiB
> >>    System:           DUP               8.00MiB
> >> SSD detected:       no
> >> Incompat features:  extref, skinny-metadata
> >> Number of devices:  1
> >> Devices:
> >>     ID        SIZE  PATH
> >>      1    55.00GiB  /storage/btrfs.disk
> >>
> >> In this case, for single data profile, it should create a data block
> >> group of 1G, since the filesystem if bigger than 50G.
> >>
> >> [FIX]
> >> Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in
> >> init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is
> >> later on assigned to ctl.num_bytes, which is used by
> >> btrfs_make_block_group to create the block group.
> >>
> >> By removing the check we allow the code to always configure the correct
> >> stripe_size regardless of the PROFILE, looking on the block group TYPE.
> >>
> >> With the fix applied, it now created the BG correctly:
> >>
> >> $ ./mkfs.btrfs -f /storage/btrfs.disk
> >> btrfs-progs v5.10.1
> >> See http://btrfs.wiki.kernel.org for more information.
> >>
> >> Label:              (null)
> >> UUID:               5145e343-5639-462d-82ee-3eb75dc41c31
> >> Node size:          16384
> >> Sector size:        4096
> >> Filesystem size:    55.00GiB
> >> Block group profiles:
> >>    Data:             single            1.00GiB
> >>    Metadata:         DUP               1.00GiB
> >>    System:           DUP               8.00MiB
> >> SSD detected:       no
> >> Zoned device:       no
> >> Incompat features:  extref, skinny-metadata
> >> Runtime features:
> >> Checksum:           crc32c
> >> Number of devices:  1
> >> Devices:
> >>     ID        SIZE  PATH
> >>      1    55.00GiB  /storage/btrfs.disk
> >>
> >> Using a disk >50G it creates a 1G single data block group. Another
> >> example:
> >>
> >> $ ./mkfs.btrfs -f /storage/btrfs2.disk
> >> btrfs-progs v5.10.1
> >> See http://btrfs.wiki.kernel.org for more information.
> >>
> >> Label:              (null)
> >> UUID:               c0910857-e512-4e76-9efa-50a475aafc87
> >> Node size:          16384
> >> Sector size:        4096
> >> Filesystem size:    5.00GiB
> >> Block group profiles:
> >>    Data:             single          512.00MiB
> >>    Metadata:         DUP             256.00MiB
> >>    System:           DUP               8.00MiB
> >> SSD detected:       no
> >> Zoned device:       no
> >> Incompat features:  extref, skinny-metadata
> >> Runtime features:
> >> Checksum:           crc32c
> >> Number of devices:  1
> >> Devices:
> >>     ID        SIZE  PATH
> >>      1     5.00GiB  /storage/btrfs2.disk
> >>
> >> The code now created a single data block group of 512M, which is exactly
> >> 10% of the size of the filesystem, which is 5G in this case.
> >>
> >> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
> >> ---
> >>
> >>   This change mimics what the kernel currently does, which is set the stripe_size
> >>   regardless of the profile. Any thoughts on it? Thanks!
> > 
> 
> 
> > Makes sense to unify that,
> 
>   Patch [1] also said the same thing 1.5years back.
>   Probably in a wrong context and timing? ;-)

The way it was implemented was not the same, but the idea is right.
Current code does the same allocation like kernel, while previously you
just tweaked some constant.
Qu Wenruo Aug. 18, 2021, 11:26 a.m. UTC | #10
On 2021/8/18 下午6:35, David Sterba wrote:
> On Wed, Aug 18, 2021 at 01:17:46PM +0800, Qu Wenruo wrote:
>> On 2021/8/17 下午9:24, David Sterba wrote:
>>> On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote:
>>>>
>>>> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
>>>> ---
>>>>
>>>>    This change mimics what the kernel currently does, which is set the stripe_size
>>>>    regardless of the profile. Any thoughts on it? Thanks!
>>>
>>> Makes sense to unify that, it works well for the large sizes. Please
>>> write tests that verify that the chunk sizes are correct after mkfs on
>>> various device sizes. Patch added to devel, thanks.
>>
>> It in fact makes fsck/025 to fail, bisection points to this patch
>> surprisingly.
>>
>> Now "mkfs.btrfs -f" on a 128M file will just fail.
>>
>> This looks like a big problem to me though...
>
> This is known that the small filesystem size and intial chunk layout is
> not scaled properly, the patch OTOH fixes the more common case where the
> normal block group sizes fit and leave enough room for the rest.
>
> Can the test 025 be scaled up so we don't have to create the 128M
> filesystem? I'd rather go that way.
>
Sure, we can scale up the size, but this still indicates the problem of
the progs chunk allocator.

So far that's the only failure, but that also means, the minimal device
size calculation is no longer correct.

As kernel has no problem allocate SINGLE chunk smaller than 64M.

I'd prefer to unify the minimal stripe size with kernel.
(And of course, also do proper chunk size calculation for metadata chunks)

Thanks,
Qu
David Sterba Aug. 20, 2021, 12:29 p.m. UTC | #11
On Wed, Aug 18, 2021 at 01:36:32PM +0800, Qu Wenruo wrote:
> > Looks good to me.
> > 
> > I hope there isn't any xfstests/tests/btrfs test case that is hardcoded
> > to the older block group alloc (like swap file test with relocation?).
> > But then the test case will need a fix, not btrfs-progs. So
> 
> Well, fsck/025 will fail, and it's not the test case needs a fix...
> 
> We just suddenly can't mkfs on a 128MiB device due to the enlarged 
> minimal stripe length of data chunks.
> 
> This still exposes a behavior difference between kernel and btrfs-progs...

So I'll reply here as all people involved are also CCed - I'm dropping
this patch for now.

The test 025 fails and my idea was to skip it for now but that still
leaves the problems with creating small filesystems, that's something
that has been working and it should keep doing so.

Proper fix: scale the chunk size for large filesystems and also small
filesystems. We can probably set the minimum size to somewhere like 64M
(for non-mixed setups), and scale the chunk sizes according to that.

d.
Qu Wenruo Aug. 21, 2021, 12:57 a.m. UTC | #12
On 2021/8/20 下午8:29, David Sterba wrote:
> On Wed, Aug 18, 2021 at 01:36:32PM +0800, Qu Wenruo wrote:
>>> Looks good to me.
>>>
>>> I hope there isn't any xfstests/tests/btrfs test case that is hardcoded
>>> to the older block group alloc (like swap file test with relocation?).
>>> But then the test case will need a fix, not btrfs-progs. So
>>
>> Well, fsck/025 will fail, and it's not the test case needs a fix...
>>
>> We just suddenly can't mkfs on a 128MiB device due to the enlarged
>> minimal stripe length of data chunks.
>>
>> This still exposes a behavior difference between kernel and btrfs-progs...
>
> So I'll reply here as all people involved are also CCed - I'm dropping
> this patch for now.
>
> The test 025 fails and my idea was to skip it for now but that still
> leaves the problems with creating small filesystems, that's something
> that has been working and it should keep doing so.
>
> Proper fix: scale the chunk size for large filesystems

This is what the patch is doing, so no problem.

> and also small
> filesystems. We can probably set the minimum size to somewhere like 64M
> (for non-mixed setups), and scale the chunk sizes according to that.

In fact, it's the minimal stripe size causing the problem and could be
easily fixed by just synchronizing the same value from kernel.

But for sure, we still need to do extra tests for small fs size to be
extra safe.

Thanks,
Qu

>
> d.
>
diff mbox series

Patch

diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c
index aeeb25fe..3677dd7c 100644
--- a/kernel-shared/volumes.c
+++ b/kernel-shared/volumes.c
@@ -1105,27 +1105,25 @@  static void init_alloc_chunk_ctl_policy_regular(struct btrfs_fs_info *info,
 	u64 type = ctl->type;
 	u64 percent_max;
 
-	if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
-		if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
-			ctl->stripe_size = SZ_8M;
-			ctl->max_chunk_size = ctl->stripe_size * 2;
-			ctl->min_stripe_size = SZ_1M;
-			ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK;
-		} else if (type & BTRFS_BLOCK_GROUP_DATA) {
-			ctl->stripe_size = SZ_1G;
-			ctl->max_chunk_size = 10 * ctl->stripe_size;
-			ctl->min_stripe_size = SZ_64M;
-			ctl->max_stripes = BTRFS_MAX_DEVS(info);
-		} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
-			/* For larger filesystems, use larger metadata chunks */
-			if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G)
-				ctl->max_chunk_size = SZ_1G;
-			else
-				ctl->max_chunk_size = SZ_256M;
-			ctl->stripe_size = ctl->max_chunk_size;
-			ctl->min_stripe_size = SZ_32M;
-			ctl->max_stripes = BTRFS_MAX_DEVS(info);
-		}
+	if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
+		ctl->stripe_size = SZ_8M;
+		ctl->max_chunk_size = ctl->stripe_size * 2;
+		ctl->min_stripe_size = SZ_1M;
+		ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK;
+	} else if (type & BTRFS_BLOCK_GROUP_DATA) {
+		ctl->stripe_size = SZ_1G;
+		ctl->max_chunk_size = 10 * ctl->stripe_size;
+		ctl->min_stripe_size = SZ_64M;
+		ctl->max_stripes = BTRFS_MAX_DEVS(info);
+	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
+		/* For larger filesystems, use larger metadata chunks */
+		if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G)
+			ctl->max_chunk_size = SZ_1G;
+		else
+			ctl->max_chunk_size = SZ_256M;
+		ctl->stripe_size = ctl->max_chunk_size;
+		ctl->min_stripe_size = SZ_32M;
+		ctl->max_stripes = BTRFS_MAX_DEVS(info);
 	}
 
 	/* We don't want a chunk larger than 10% of the FS */