Message ID | 20210809182613.4466-1-mpdesouza@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs-progs: Drop the type check in init_alloc_chunk_ctl_policy_regular | expand |
On 9.08.21 г. 21:26, Marcos Paulo de Souza wrote: > [PROBLEM] > Our documentation says that a DATA block group can have up to 1G of > size, or the at most 10% of the filesystem size. Currently, by default, > mkfs.btrfs is creating an initial DATA block group of 8M of size, > regardless of the filesystem size. It happens because we check for the > ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the > BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE > (default) DATA block group: > > $ mkfs.btrfs -f /storage/btrfs.disk > btrfs-progs v4.19.1 > See http://btrfs.wiki.kernel.org for more information. > > Label: (null) > UUID: 39e3492f-41f2-4bd7-8c25-93032606b9fe > Node size: 16384 > Sector size: 4096 > Filesystem size: 55.00GiB > Block group profiles: > Data: single 8.00MiB > Metadata: DUP 1.00GiB > System: DUP 8.00MiB > SSD detected: no > Incompat features: extref, skinny-metadata > Number of devices: 1 > Devices: > ID SIZE PATH > 1 55.00GiB /storage/btrfs.disk > > In this case, for single data profile, it should create a data block > group of 1G, since the filesystem if bigger than 50G. > > [FIX] > Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in > init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is > later on assigned to ctl.num_bytes, which is used by > btrfs_make_block_group to create the block group. > > By removing the check we allow the code to always configure the correct > stripe_size regardless of the PROFILE, looking on the block group TYPE. > > With the fix applied, it now created the BG correctly: > > $ ./mkfs.btrfs -f /storage/btrfs.disk > btrfs-progs v5.10.1 > See http://btrfs.wiki.kernel.org for more information. > > Label: (null) > UUID: 5145e343-5639-462d-82ee-3eb75dc41c31 > Node size: 16384 > Sector size: 4096 > Filesystem size: 55.00GiB > Block group profiles: > Data: single 1.00GiB > Metadata: DUP 1.00GiB > System: DUP 8.00MiB > SSD detected: no > Zoned device: no > Incompat features: extref, skinny-metadata > Runtime features: > Checksum: crc32c > Number of devices: 1 > Devices: > ID SIZE PATH > 1 55.00GiB /storage/btrfs.disk > > Using a disk >50G it creates a 1G single data block group. Another > example: > > $ ./mkfs.btrfs -f /storage/btrfs2.disk > btrfs-progs v5.10.1 > See http://btrfs.wiki.kernel.org for more information. > > Label: (null) > UUID: c0910857-e512-4e76-9efa-50a475aafc87 > Node size: 16384 > Sector size: 4096 > Filesystem size: 5.00GiB > Block group profiles: > Data: single 512.00MiB > Metadata: DUP 256.00MiB > System: DUP 8.00MiB > SSD detected: no > Zoned device: no > Incompat features: extref, skinny-metadata > Runtime features: > Checksum: crc32c > Number of devices: 1 > Devices: > ID SIZE PATH > 1 5.00GiB /storage/btrfs2.disk > > The code now created a single data block group of 512M, which is exactly > 10% of the size of the filesystem, which is 5G in this case. > > Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> I see no reason why we should care about BTRFS_BLOCK_GROUP_PROFILE_MASK when creating the chunks. Without this patch ctl's various member are being initialized to their defaults in init_alloc_chunk_ctl, subsequently in create_chunk, chunk_bytes_by_type returns return stripe_size * ctl->num_stripes; which is really SZ_8M * 1. > --- > > This change mimics what the kernel currently does, which is set the stripe_size > regardless of the profile. Any thoughts on it? Thanks! > > kernel-shared/volumes.c | 40 +++++++++++++++++++--------------------- > 1 file changed, 19 insertions(+), 21 deletions(-) > > diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c > index aeeb25fe..3677dd7c 100644 > --- a/kernel-shared/volumes.c > +++ b/kernel-shared/volumes.c > @@ -1105,27 +1105,25 @@ static void init_alloc_chunk_ctl_policy_regular(struct btrfs_fs_info *info, > u64 type = ctl->type; > u64 percent_max; > > - if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { > - if (type & BTRFS_BLOCK_GROUP_SYSTEM) { > - ctl->stripe_size = SZ_8M; > - ctl->max_chunk_size = ctl->stripe_size * 2; > - ctl->min_stripe_size = SZ_1M; > - ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK; > - } else if (type & BTRFS_BLOCK_GROUP_DATA) { > - ctl->stripe_size = SZ_1G; > - ctl->max_chunk_size = 10 * ctl->stripe_size; > - ctl->min_stripe_size = SZ_64M; > - ctl->max_stripes = BTRFS_MAX_DEVS(info); > - } else if (type & BTRFS_BLOCK_GROUP_METADATA) { > - /* For larger filesystems, use larger metadata chunks */ > - if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G) > - ctl->max_chunk_size = SZ_1G; > - else > - ctl->max_chunk_size = SZ_256M; > - ctl->stripe_size = ctl->max_chunk_size; > - ctl->min_stripe_size = SZ_32M; > - ctl->max_stripes = BTRFS_MAX_DEVS(info); > - } > + if (type & BTRFS_BLOCK_GROUP_SYSTEM) { > + ctl->stripe_size = SZ_8M; > + ctl->max_chunk_size = ctl->stripe_size * 2; > + ctl->min_stripe_size = SZ_1M; > + ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK; > + } else if (type & BTRFS_BLOCK_GROUP_DATA) { > + ctl->stripe_size = SZ_1G; > + ctl->max_chunk_size = 10 * ctl->stripe_size; > + ctl->min_stripe_size = SZ_64M; > + ctl->max_stripes = BTRFS_MAX_DEVS(info); > + } else if (type & BTRFS_BLOCK_GROUP_METADATA) { > + /* For larger filesystems, use larger metadata chunks */ > + if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G) > + ctl->max_chunk_size = SZ_1G; > + else > + ctl->max_chunk_size = SZ_256M; > + ctl->stripe_size = ctl->max_chunk_size; > + ctl->min_stripe_size = SZ_32M; > + ctl->max_stripes = BTRFS_MAX_DEVS(info); > } > > /* We don't want a chunk larger than 10% of the FS */ >
On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote: > [PROBLEM] > Our documentation says that a DATA block group can have up to 1G of > size, or the at most 10% of the filesystem size. Currently, by default, > mkfs.btrfs is creating an initial DATA block group of 8M of size, > regardless of the filesystem size. It happens because we check for the > ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the > BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE > (default) DATA block group: > > $ mkfs.btrfs -f /storage/btrfs.disk > btrfs-progs v4.19.1 > See http://btrfs.wiki.kernel.org for more information. > > Label: (null) > UUID: 39e3492f-41f2-4bd7-8c25-93032606b9fe > Node size: 16384 > Sector size: 4096 > Filesystem size: 55.00GiB > Block group profiles: > Data: single 8.00MiB > Metadata: DUP 1.00GiB > System: DUP 8.00MiB > SSD detected: no > Incompat features: extref, skinny-metadata > Number of devices: 1 > Devices: > ID SIZE PATH > 1 55.00GiB /storage/btrfs.disk > > In this case, for single data profile, it should create a data block > group of 1G, since the filesystem if bigger than 50G. > > [FIX] > Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in > init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is > later on assigned to ctl.num_bytes, which is used by > btrfs_make_block_group to create the block group. > > By removing the check we allow the code to always configure the correct > stripe_size regardless of the PROFILE, looking on the block group TYPE. > > With the fix applied, it now created the BG correctly: > > $ ./mkfs.btrfs -f /storage/btrfs.disk > btrfs-progs v5.10.1 > See http://btrfs.wiki.kernel.org for more information. > > Label: (null) > UUID: 5145e343-5639-462d-82ee-3eb75dc41c31 > Node size: 16384 > Sector size: 4096 > Filesystem size: 55.00GiB > Block group profiles: > Data: single 1.00GiB > Metadata: DUP 1.00GiB > System: DUP 8.00MiB > SSD detected: no > Zoned device: no > Incompat features: extref, skinny-metadata > Runtime features: > Checksum: crc32c > Number of devices: 1 > Devices: > ID SIZE PATH > 1 55.00GiB /storage/btrfs.disk > > Using a disk >50G it creates a 1G single data block group. Another > example: > > $ ./mkfs.btrfs -f /storage/btrfs2.disk > btrfs-progs v5.10.1 > See http://btrfs.wiki.kernel.org for more information. > > Label: (null) > UUID: c0910857-e512-4e76-9efa-50a475aafc87 > Node size: 16384 > Sector size: 4096 > Filesystem size: 5.00GiB > Block group profiles: > Data: single 512.00MiB > Metadata: DUP 256.00MiB > System: DUP 8.00MiB > SSD detected: no > Zoned device: no > Incompat features: extref, skinny-metadata > Runtime features: > Checksum: crc32c > Number of devices: 1 > Devices: > ID SIZE PATH > 1 5.00GiB /storage/btrfs2.disk > > The code now created a single data block group of 512M, which is exactly > 10% of the size of the filesystem, which is 5G in this case. > > Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> > --- > > This change mimics what the kernel currently does, which is set the stripe_size > regardless of the profile. Any thoughts on it? Thanks! Makes sense to unify that, it works well for the large sizes. Please write tests that verify that the chunk sizes are correct after mkfs on various device sizes. Patch added to devel, thanks.
On 17/08/2021 21:24, David Sterba wrote: > On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote: >> [PROBLEM] >> Our documentation says that a DATA block group can have up to 1G of >> size, or the at most 10% of the filesystem size. Currently, by default, >> mkfs.btrfs is creating an initial DATA block group of 8M of size, >> regardless of the filesystem size. It happens because we check for the >> ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the >> BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE >> (default) DATA block group: >> >> $ mkfs.btrfs -f /storage/btrfs.disk >> btrfs-progs v4.19.1 >> See http://btrfs.wiki.kernel.org for more information. >> >> Label: (null) >> UUID: 39e3492f-41f2-4bd7-8c25-93032606b9fe >> Node size: 16384 >> Sector size: 4096 >> Filesystem size: 55.00GiB >> Block group profiles: >> Data: single 8.00MiB >> Metadata: DUP 1.00GiB >> System: DUP 8.00MiB >> SSD detected: no >> Incompat features: extref, skinny-metadata >> Number of devices: 1 >> Devices: >> ID SIZE PATH >> 1 55.00GiB /storage/btrfs.disk >> >> In this case, for single data profile, it should create a data block >> group of 1G, since the filesystem if bigger than 50G. >> >> [FIX] >> Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in >> init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is >> later on assigned to ctl.num_bytes, which is used by >> btrfs_make_block_group to create the block group. >> >> By removing the check we allow the code to always configure the correct >> stripe_size regardless of the PROFILE, looking on the block group TYPE. >> >> With the fix applied, it now created the BG correctly: >> >> $ ./mkfs.btrfs -f /storage/btrfs.disk >> btrfs-progs v5.10.1 >> See http://btrfs.wiki.kernel.org for more information. >> >> Label: (null) >> UUID: 5145e343-5639-462d-82ee-3eb75dc41c31 >> Node size: 16384 >> Sector size: 4096 >> Filesystem size: 55.00GiB >> Block group profiles: >> Data: single 1.00GiB >> Metadata: DUP 1.00GiB >> System: DUP 8.00MiB >> SSD detected: no >> Zoned device: no >> Incompat features: extref, skinny-metadata >> Runtime features: >> Checksum: crc32c >> Number of devices: 1 >> Devices: >> ID SIZE PATH >> 1 55.00GiB /storage/btrfs.disk >> >> Using a disk >50G it creates a 1G single data block group. Another >> example: >> >> $ ./mkfs.btrfs -f /storage/btrfs2.disk >> btrfs-progs v5.10.1 >> See http://btrfs.wiki.kernel.org for more information. >> >> Label: (null) >> UUID: c0910857-e512-4e76-9efa-50a475aafc87 >> Node size: 16384 >> Sector size: 4096 >> Filesystem size: 5.00GiB >> Block group profiles: >> Data: single 512.00MiB >> Metadata: DUP 256.00MiB >> System: DUP 8.00MiB >> SSD detected: no >> Zoned device: no >> Incompat features: extref, skinny-metadata >> Runtime features: >> Checksum: crc32c >> Number of devices: 1 >> Devices: >> ID SIZE PATH >> 1 5.00GiB /storage/btrfs2.disk >> >> The code now created a single data block group of 512M, which is exactly >> 10% of the size of the filesystem, which is 5G in this case. >> >> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> >> --- >> >> This change mimics what the kernel currently does, which is set the stripe_size >> regardless of the profile. Any thoughts on it? Thanks! > > Makes sense to unify that, Patch [1] also said the same thing 1.5years back. Probably in a wrong context and timing? ;-) [1] https://patchwork.kernel.org/project/linux-btrfs/patch/1583926429-28485-1-git-send-email-anand.jain@oracle.com/ -Anand > it works well for the large sizes. > Please > write tests that verify that the chunk sizes are correct after mkfs on > various device sizes. Patch added to devel, thanks. >
On 2021/8/17 下午9:24, David Sterba wrote: > On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote: >> [PROBLEM] >> Our documentation says that a DATA block group can have up to 1G of >> size, or the at most 10% of the filesystem size. Currently, by default, >> mkfs.btrfs is creating an initial DATA block group of 8M of size, >> regardless of the filesystem size. It happens because we check for the >> ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the >> BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE >> (default) DATA block group: >> >> $ mkfs.btrfs -f /storage/btrfs.disk >> btrfs-progs v4.19.1 >> See http://btrfs.wiki.kernel.org for more information. >> >> Label: (null) >> UUID: 39e3492f-41f2-4bd7-8c25-93032606b9fe >> Node size: 16384 >> Sector size: 4096 >> Filesystem size: 55.00GiB >> Block group profiles: >> Data: single 8.00MiB >> Metadata: DUP 1.00GiB >> System: DUP 8.00MiB >> SSD detected: no >> Incompat features: extref, skinny-metadata >> Number of devices: 1 >> Devices: >> ID SIZE PATH >> 1 55.00GiB /storage/btrfs.disk >> >> In this case, for single data profile, it should create a data block >> group of 1G, since the filesystem if bigger than 50G. >> >> [FIX] >> Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in >> init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is >> later on assigned to ctl.num_bytes, which is used by >> btrfs_make_block_group to create the block group. >> >> By removing the check we allow the code to always configure the correct >> stripe_size regardless of the PROFILE, looking on the block group TYPE. >> >> With the fix applied, it now created the BG correctly: >> >> $ ./mkfs.btrfs -f /storage/btrfs.disk >> btrfs-progs v5.10.1 >> See http://btrfs.wiki.kernel.org for more information. >> >> Label: (null) >> UUID: 5145e343-5639-462d-82ee-3eb75dc41c31 >> Node size: 16384 >> Sector size: 4096 >> Filesystem size: 55.00GiB >> Block group profiles: >> Data: single 1.00GiB >> Metadata: DUP 1.00GiB >> System: DUP 8.00MiB >> SSD detected: no >> Zoned device: no >> Incompat features: extref, skinny-metadata >> Runtime features: >> Checksum: crc32c >> Number of devices: 1 >> Devices: >> ID SIZE PATH >> 1 55.00GiB /storage/btrfs.disk >> >> Using a disk >50G it creates a 1G single data block group. Another >> example: >> >> $ ./mkfs.btrfs -f /storage/btrfs2.disk >> btrfs-progs v5.10.1 >> See http://btrfs.wiki.kernel.org for more information. >> >> Label: (null) >> UUID: c0910857-e512-4e76-9efa-50a475aafc87 >> Node size: 16384 >> Sector size: 4096 >> Filesystem size: 5.00GiB >> Block group profiles: >> Data: single 512.00MiB >> Metadata: DUP 256.00MiB >> System: DUP 8.00MiB >> SSD detected: no >> Zoned device: no >> Incompat features: extref, skinny-metadata >> Runtime features: >> Checksum: crc32c >> Number of devices: 1 >> Devices: >> ID SIZE PATH >> 1 5.00GiB /storage/btrfs2.disk >> >> The code now created a single data block group of 512M, which is exactly >> 10% of the size of the filesystem, which is 5G in this case. >> >> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> >> --- >> >> This change mimics what the kernel currently does, which is set the stripe_size >> regardless of the profile. Any thoughts on it? Thanks! > > Makes sense to unify that, it works well for the large sizes. Please > write tests that verify that the chunk sizes are correct after mkfs on > various device sizes. Patch added to devel, thanks. > It in fact makes fsck/025 to fail, bisection points to this patch surprisingly. Now "mkfs.btrfs -f" on a 128M file will just fail. This looks like a big problem to me though... Thanks, Qu
On 10/08/2021 02:26, Marcos Paulo de Souza wrote: > [PROBLEM] > Our documentation says that a DATA block group can have up to 1G of > size, or the at most 10% of the filesystem size. Currently, by default, > mkfs.btrfs is creating an initial DATA block group of 8M of size, > regardless of the filesystem size. It happens because we check for the > ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the > BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE > (default) DATA block group: > $ mkfs.btrfs -f /storage/btrfs.disk > btrfs-progs v4.19.1 > See http://btrfs.wiki.kernel.org for more information. > > Label: (null) > UUID: 39e3492f-41f2-4bd7-8c25-93032606b9fe > Node size: 16384 > Sector size: 4096 > Filesystem size: 55.00GiB > Block group profiles: > Data: single 8.00MiB > Metadata: DUP 1.00GiB > System: DUP 8.00MiB > SSD detected: no > Incompat features: extref, skinny-metadata > Number of devices: 1 > Devices: > ID SIZE PATH > 1 55.00GiB /storage/btrfs.disk > > In this case, for single data profile, it should create a data block > group of 1G, since the filesystem if bigger than 50G. > > [FIX] > Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in > init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is > later on assigned to ctl.num_bytes, which is used by > btrfs_make_block_group to create the block group. > > By removing the check we allow the code to always configure the correct > stripe_size regardless of the PROFILE, looking on the block group TYPE. > > With the fix applied, it now created the BG correctly: > > $ ./mkfs.btrfs -f /storage/btrfs.disk > btrfs-progs v5.10.1 > See http://btrfs.wiki.kernel.org for more information. > > Label: (null) > UUID: 5145e343-5639-462d-82ee-3eb75dc41c31 > Node size: 16384 > Sector size: 4096 > Filesystem size: 55.00GiB > Block group profiles: > Data: single 1.00GiB > Metadata: DUP 1.00GiB > System: DUP 8.00MiB > SSD detected: no > Zoned device: no > Incompat features: extref, skinny-metadata > Runtime features: > Checksum: crc32c > Number of devices: 1 > Devices: > ID SIZE PATH > 1 55.00GiB /storage/btrfs.disk > > Using a disk >50G it creates a 1G single data block group. Another > example: > > $ ./mkfs.btrfs -f /storage/btrfs2.disk > btrfs-progs v5.10.1 > See http://btrfs.wiki.kernel.org for more information. > > Label: (null) > UUID: c0910857-e512-4e76-9efa-50a475aafc87 > Node size: 16384 > Sector size: 4096 > Filesystem size: 5.00GiB > Block group profiles: > Data: single 512.00MiB > Metadata: DUP 256.00MiB > System: DUP 8.00MiB > SSD detected: no > Zoned device: no > Incompat features: extref, skinny-metadata > Runtime features: > Checksum: crc32c > Number of devices: 1 > Devices: > ID SIZE PATH > 1 5.00GiB /storage/btrfs2.disk > > The code now created a single data block group of 512M, which is exactly > 10% of the size of the filesystem, which is 5G in this case. > > Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Looks good to me. I hope there isn't any xfstests/tests/btrfs test case that is hardcoded to the older block group alloc (like swap file test with relocation?). But then the test case will need a fix, not btrfs-progs. So Reviewed-by: Anand Jain <anand.jain@oracle.com> Thanks, Anand > --- > > This change mimics what the kernel currently does, which is set the stripe_size > regardless of the profile. Any thoughts on it? Thanks! > > kernel-shared/volumes.c | 40 +++++++++++++++++++--------------------- > 1 file changed, 19 insertions(+), 21 deletions(-) > > diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c > index aeeb25fe..3677dd7c 100644 > --- a/kernel-shared/volumes.c > +++ b/kernel-shared/volumes.c > @@ -1105,27 +1105,25 @@ static void init_alloc_chunk_ctl_policy_regular(struct btrfs_fs_info *info, > u64 type = ctl->type; > u64 percent_max; > > - if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { > - if (type & BTRFS_BLOCK_GROUP_SYSTEM) { > - ctl->stripe_size = SZ_8M; > - ctl->max_chunk_size = ctl->stripe_size * 2; > - ctl->min_stripe_size = SZ_1M; > - ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK; > - } else if (type & BTRFS_BLOCK_GROUP_DATA) { > - ctl->stripe_size = SZ_1G; > - ctl->max_chunk_size = 10 * ctl->stripe_size; > - ctl->min_stripe_size = SZ_64M; > - ctl->max_stripes = BTRFS_MAX_DEVS(info); > - } else if (type & BTRFS_BLOCK_GROUP_METADATA) { > - /* For larger filesystems, use larger metadata chunks */ > - if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G) > - ctl->max_chunk_size = SZ_1G; > - else > - ctl->max_chunk_size = SZ_256M; > - ctl->stripe_size = ctl->max_chunk_size; > - ctl->min_stripe_size = SZ_32M; > - ctl->max_stripes = BTRFS_MAX_DEVS(info); > - } > + if (type & BTRFS_BLOCK_GROUP_SYSTEM) { > + ctl->stripe_size = SZ_8M; > + ctl->max_chunk_size = ctl->stripe_size * 2; > + ctl->min_stripe_size = SZ_1M; > + ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK; > + } else if (type & BTRFS_BLOCK_GROUP_DATA) { > + ctl->stripe_size = SZ_1G; > + ctl->max_chunk_size = 10 * ctl->stripe_size; > + ctl->min_stripe_size = SZ_64M; > + ctl->max_stripes = BTRFS_MAX_DEVS(info); > + } else if (type & BTRFS_BLOCK_GROUP_METADATA) { > + /* For larger filesystems, use larger metadata chunks */ > + if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G) > + ctl->max_chunk_size = SZ_1G; > + else > + ctl->max_chunk_size = SZ_256M; > + ctl->stripe_size = ctl->max_chunk_size; > + ctl->min_stripe_size = SZ_32M; > + ctl->max_stripes = BTRFS_MAX_DEVS(info); > } > > /* We don't want a chunk larger than 10% of the FS */ >
On 2021/8/18 下午1:17, Qu Wenruo wrote: > > > On 2021/8/17 下午9:24, David Sterba wrote: >> On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote: >>> [PROBLEM] >>> Our documentation says that a DATA block group can have up to 1G of >>> size, or the at most 10% of the filesystem size. Currently, by default, >>> mkfs.btrfs is creating an initial DATA block group of 8M of size, >>> regardless of the filesystem size. It happens because we check for the >>> ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the >>> BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE >>> (default) DATA block group: >>> >>> $ mkfs.btrfs -f /storage/btrfs.disk >>> btrfs-progs v4.19.1 >>> See http://btrfs.wiki.kernel.org for more information. >>> >>> Label: (null) >>> UUID: 39e3492f-41f2-4bd7-8c25-93032606b9fe >>> Node size: 16384 >>> Sector size: 4096 >>> Filesystem size: 55.00GiB >>> Block group profiles: >>> Data: single 8.00MiB >>> Metadata: DUP 1.00GiB >>> System: DUP 8.00MiB >>> SSD detected: no >>> Incompat features: extref, skinny-metadata >>> Number of devices: 1 >>> Devices: >>> ID SIZE PATH >>> 1 55.00GiB /storage/btrfs.disk >>> >>> In this case, for single data profile, it should create a data block >>> group of 1G, since the filesystem if bigger than 50G. >>> >>> [FIX] >>> Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in >>> init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is >>> later on assigned to ctl.num_bytes, which is used by >>> btrfs_make_block_group to create the block group. >>> >>> By removing the check we allow the code to always configure the correct >>> stripe_size regardless of the PROFILE, looking on the block group TYPE. >>> >>> With the fix applied, it now created the BG correctly: >>> >>> $ ./mkfs.btrfs -f /storage/btrfs.disk >>> btrfs-progs v5.10.1 >>> See http://btrfs.wiki.kernel.org for more information. >>> >>> Label: (null) >>> UUID: 5145e343-5639-462d-82ee-3eb75dc41c31 >>> Node size: 16384 >>> Sector size: 4096 >>> Filesystem size: 55.00GiB >>> Block group profiles: >>> Data: single 1.00GiB >>> Metadata: DUP 1.00GiB >>> System: DUP 8.00MiB >>> SSD detected: no >>> Zoned device: no >>> Incompat features: extref, skinny-metadata >>> Runtime features: >>> Checksum: crc32c >>> Number of devices: 1 >>> Devices: >>> ID SIZE PATH >>> 1 55.00GiB /storage/btrfs.disk >>> >>> Using a disk >50G it creates a 1G single data block group. Another >>> example: >>> >>> $ ./mkfs.btrfs -f /storage/btrfs2.disk >>> btrfs-progs v5.10.1 >>> See http://btrfs.wiki.kernel.org for more information. >>> >>> Label: (null) >>> UUID: c0910857-e512-4e76-9efa-50a475aafc87 >>> Node size: 16384 >>> Sector size: 4096 >>> Filesystem size: 5.00GiB >>> Block group profiles: >>> Data: single 512.00MiB >>> Metadata: DUP 256.00MiB >>> System: DUP 8.00MiB >>> SSD detected: no >>> Zoned device: no >>> Incompat features: extref, skinny-metadata >>> Runtime features: >>> Checksum: crc32c >>> Number of devices: 1 >>> Devices: >>> ID SIZE PATH >>> 1 5.00GiB /storage/btrfs2.disk >>> >>> The code now created a single data block group of 512M, which is exactly >>> 10% of the size of the filesystem, which is 5G in this case. >>> >>> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> >>> --- >>> >>> This change mimics what the kernel currently does, which is set the >>> stripe_size >>> regardless of the profile. Any thoughts on it? Thanks! >> >> Makes sense to unify that, it works well for the large sizes. Please >> write tests that verify that the chunk sizes are correct after mkfs on >> various device sizes. Patch added to devel, thanks. >> > > It in fact makes fsck/025 to fail, bisection points to this patch > surprisingly. > > Now "mkfs.btrfs -f" on a 128M file will just fail. > > This looks like a big problem to me though... The cause here is that, we changed the minimal stripe size for SINGLE profile. The old code for profile SINGLE will not go through any of the type specific limits, and falls back to use the default limits, which has only 1MiB minimal stripe limit. While the new code changes the minimal stripe length to 64MiB. Furthermore, currently mkfs using default profiles will lead to the following chunk layout: Data: single 8.00MiB Metadata: DUP 32.00MiB System: DUP 8.00MiB This means, System + Metadata will eat up 80MiB already (40MiB x2 caused by DUP profile). leaving only 48 MiB space left for data, which is smaller than the 64MiB minimal stripe requirement. In fact, this behavior is still not the same as kernel, as kernel can create 16MiB SINGLE data chunk without any problem, so the 64MiB limit is not correct either. Since we're here, I hope we can work on the initial chunk size with better flexibility, not creating too large metadata chunk which easily eat up half of the space. Thanks, Qu > > Thanks, > Qu
On 2021/8/18 下午1:27, Anand Jain wrote: > On 10/08/2021 02:26, Marcos Paulo de Souza wrote: >> [PROBLEM] >> Our documentation says that a DATA block group can have up to 1G of >> size, or the at most 10% of the filesystem size. Currently, by default, >> mkfs.btrfs is creating an initial DATA block group of 8M of size, >> regardless of the filesystem size. It happens because we check for the >> ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the >> BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE >> (default) DATA block group: > > >> $ mkfs.btrfs -f /storage/btrfs.disk >> btrfs-progs v4.19.1 >> See http://btrfs.wiki.kernel.org for more information. >> >> Label: (null) >> UUID: 39e3492f-41f2-4bd7-8c25-93032606b9fe >> Node size: 16384 >> Sector size: 4096 >> Filesystem size: 55.00GiB >> Block group profiles: >> Data: single 8.00MiB >> Metadata: DUP 1.00GiB >> System: DUP 8.00MiB >> SSD detected: no >> Incompat features: extref, skinny-metadata >> Number of devices: 1 >> Devices: >> ID SIZE PATH >> 1 55.00GiB /storage/btrfs.disk >> >> In this case, for single data profile, it should create a data block >> group of 1G, since the filesystem if bigger than 50G. >> >> [FIX] >> Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in >> init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is >> later on assigned to ctl.num_bytes, which is used by >> btrfs_make_block_group to create the block group. >> >> By removing the check we allow the code to always configure the correct >> stripe_size regardless of the PROFILE, looking on the block group TYPE. >> >> With the fix applied, it now created the BG correctly: >> >> $ ./mkfs.btrfs -f /storage/btrfs.disk >> btrfs-progs v5.10.1 >> See http://btrfs.wiki.kernel.org for more information. >> >> Label: (null) >> UUID: 5145e343-5639-462d-82ee-3eb75dc41c31 >> Node size: 16384 >> Sector size: 4096 >> Filesystem size: 55.00GiB >> Block group profiles: >> Data: single 1.00GiB >> Metadata: DUP 1.00GiB >> System: DUP 8.00MiB >> SSD detected: no >> Zoned device: no >> Incompat features: extref, skinny-metadata >> Runtime features: >> Checksum: crc32c >> Number of devices: 1 >> Devices: >> ID SIZE PATH >> 1 55.00GiB /storage/btrfs.disk >> >> Using a disk >50G it creates a 1G single data block group. Another >> example: >> >> $ ./mkfs.btrfs -f /storage/btrfs2.disk >> btrfs-progs v5.10.1 >> See http://btrfs.wiki.kernel.org for more information. >> >> Label: (null) >> UUID: c0910857-e512-4e76-9efa-50a475aafc87 >> Node size: 16384 >> Sector size: 4096 >> Filesystem size: 5.00GiB >> Block group profiles: >> Data: single 512.00MiB >> Metadata: DUP 256.00MiB >> System: DUP 8.00MiB >> SSD detected: no >> Zoned device: no >> Incompat features: extref, skinny-metadata >> Runtime features: >> Checksum: crc32c >> Number of devices: 1 >> Devices: >> ID SIZE PATH >> 1 5.00GiB /storage/btrfs2.disk >> >> The code now created a single data block group of 512M, which is exactly >> 10% of the size of the filesystem, which is 5G in this case. >> >> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> > > Looks good to me. > > I hope there isn't any xfstests/tests/btrfs test case that is hardcoded > to the older block group alloc (like swap file test with relocation?). > But then the test case will need a fix, not btrfs-progs. So Well, fsck/025 will fail, and it's not the test case needs a fix... We just suddenly can't mkfs on a 128MiB device due to the enlarged minimal stripe length of data chunks. This still exposes a behavior difference between kernel and btrfs-progs... Thanks, Qu > > Reviewed-by: Anand Jain <anand.jain@oracle.com> > > Thanks, Anand > >> --- >> >> This change mimics what the kernel currently does, which is set the >> stripe_size >> regardless of the profile. Any thoughts on it? Thanks! >> >> kernel-shared/volumes.c | 40 +++++++++++++++++++--------------------- >> 1 file changed, 19 insertions(+), 21 deletions(-) >> >> diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c >> index aeeb25fe..3677dd7c 100644 >> --- a/kernel-shared/volumes.c >> +++ b/kernel-shared/volumes.c >> @@ -1105,27 +1105,25 @@ static void >> init_alloc_chunk_ctl_policy_regular(struct btrfs_fs_info *info, >> u64 type = ctl->type; >> u64 percent_max; >> - if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { >> - if (type & BTRFS_BLOCK_GROUP_SYSTEM) { >> - ctl->stripe_size = SZ_8M; >> - ctl->max_chunk_size = ctl->stripe_size * 2; >> - ctl->min_stripe_size = SZ_1M; >> - ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK; >> - } else if (type & BTRFS_BLOCK_GROUP_DATA) { >> - ctl->stripe_size = SZ_1G; >> - ctl->max_chunk_size = 10 * ctl->stripe_size; >> - ctl->min_stripe_size = SZ_64M; >> - ctl->max_stripes = BTRFS_MAX_DEVS(info); >> - } else if (type & BTRFS_BLOCK_GROUP_METADATA) { >> - /* For larger filesystems, use larger metadata chunks */ >> - if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G) >> - ctl->max_chunk_size = SZ_1G; >> - else >> - ctl->max_chunk_size = SZ_256M; >> - ctl->stripe_size = ctl->max_chunk_size; >> - ctl->min_stripe_size = SZ_32M; >> - ctl->max_stripes = BTRFS_MAX_DEVS(info); >> - } >> + if (type & BTRFS_BLOCK_GROUP_SYSTEM) { >> + ctl->stripe_size = SZ_8M; >> + ctl->max_chunk_size = ctl->stripe_size * 2; >> + ctl->min_stripe_size = SZ_1M; >> + ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK; >> + } else if (type & BTRFS_BLOCK_GROUP_DATA) { >> + ctl->stripe_size = SZ_1G; >> + ctl->max_chunk_size = 10 * ctl->stripe_size; >> + ctl->min_stripe_size = SZ_64M; >> + ctl->max_stripes = BTRFS_MAX_DEVS(info); >> + } else if (type & BTRFS_BLOCK_GROUP_METADATA) { >> + /* For larger filesystems, use larger metadata chunks */ >> + if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G) >> + ctl->max_chunk_size = SZ_1G; >> + else >> + ctl->max_chunk_size = SZ_256M; >> + ctl->stripe_size = ctl->max_chunk_size; >> + ctl->min_stripe_size = SZ_32M; >> + ctl->max_stripes = BTRFS_MAX_DEVS(info); >> } >> /* We don't want a chunk larger than 10% of the FS */ >> >
On Wed, Aug 18, 2021 at 01:17:46PM +0800, Qu Wenruo wrote: > On 2021/8/17 下午9:24, David Sterba wrote: > > On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote: > >> > >> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> > >> --- > >> > >> This change mimics what the kernel currently does, which is set the stripe_size > >> regardless of the profile. Any thoughts on it? Thanks! > > > > Makes sense to unify that, it works well for the large sizes. Please > > write tests that verify that the chunk sizes are correct after mkfs on > > various device sizes. Patch added to devel, thanks. > > It in fact makes fsck/025 to fail, bisection points to this patch > surprisingly. > > Now "mkfs.btrfs -f" on a 128M file will just fail. > > This looks like a big problem to me though... This is known that the small filesystem size and intial chunk layout is not scaled properly, the patch OTOH fixes the more common case where the normal block group sizes fit and leave enough room for the rest. Can the test 025 be scaled up so we don't have to create the 128M filesystem? I'd rather go that way.
On Wed, Aug 18, 2021 at 12:02:45PM +0800, Anand Jain wrote: > On 17/08/2021 21:24, David Sterba wrote: > > On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote: > >> [PROBLEM] > >> Our documentation says that a DATA block group can have up to 1G of > >> size, or the at most 10% of the filesystem size. Currently, by default, > >> mkfs.btrfs is creating an initial DATA block group of 8M of size, > >> regardless of the filesystem size. It happens because we check for the > >> ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the > >> BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE > >> (default) DATA block group: > >> > >> $ mkfs.btrfs -f /storage/btrfs.disk > >> btrfs-progs v4.19.1 > >> See http://btrfs.wiki.kernel.org for more information. > >> > >> Label: (null) > >> UUID: 39e3492f-41f2-4bd7-8c25-93032606b9fe > >> Node size: 16384 > >> Sector size: 4096 > >> Filesystem size: 55.00GiB > >> Block group profiles: > >> Data: single 8.00MiB > >> Metadata: DUP 1.00GiB > >> System: DUP 8.00MiB > >> SSD detected: no > >> Incompat features: extref, skinny-metadata > >> Number of devices: 1 > >> Devices: > >> ID SIZE PATH > >> 1 55.00GiB /storage/btrfs.disk > >> > >> In this case, for single data profile, it should create a data block > >> group of 1G, since the filesystem if bigger than 50G. > >> > >> [FIX] > >> Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in > >> init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is > >> later on assigned to ctl.num_bytes, which is used by > >> btrfs_make_block_group to create the block group. > >> > >> By removing the check we allow the code to always configure the correct > >> stripe_size regardless of the PROFILE, looking on the block group TYPE. > >> > >> With the fix applied, it now created the BG correctly: > >> > >> $ ./mkfs.btrfs -f /storage/btrfs.disk > >> btrfs-progs v5.10.1 > >> See http://btrfs.wiki.kernel.org for more information. > >> > >> Label: (null) > >> UUID: 5145e343-5639-462d-82ee-3eb75dc41c31 > >> Node size: 16384 > >> Sector size: 4096 > >> Filesystem size: 55.00GiB > >> Block group profiles: > >> Data: single 1.00GiB > >> Metadata: DUP 1.00GiB > >> System: DUP 8.00MiB > >> SSD detected: no > >> Zoned device: no > >> Incompat features: extref, skinny-metadata > >> Runtime features: > >> Checksum: crc32c > >> Number of devices: 1 > >> Devices: > >> ID SIZE PATH > >> 1 55.00GiB /storage/btrfs.disk > >> > >> Using a disk >50G it creates a 1G single data block group. Another > >> example: > >> > >> $ ./mkfs.btrfs -f /storage/btrfs2.disk > >> btrfs-progs v5.10.1 > >> See http://btrfs.wiki.kernel.org for more information. > >> > >> Label: (null) > >> UUID: c0910857-e512-4e76-9efa-50a475aafc87 > >> Node size: 16384 > >> Sector size: 4096 > >> Filesystem size: 5.00GiB > >> Block group profiles: > >> Data: single 512.00MiB > >> Metadata: DUP 256.00MiB > >> System: DUP 8.00MiB > >> SSD detected: no > >> Zoned device: no > >> Incompat features: extref, skinny-metadata > >> Runtime features: > >> Checksum: crc32c > >> Number of devices: 1 > >> Devices: > >> ID SIZE PATH > >> 1 5.00GiB /storage/btrfs2.disk > >> > >> The code now created a single data block group of 512M, which is exactly > >> 10% of the size of the filesystem, which is 5G in this case. > >> > >> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> > >> --- > >> > >> This change mimics what the kernel currently does, which is set the stripe_size > >> regardless of the profile. Any thoughts on it? Thanks! > > > > > > Makes sense to unify that, > > Patch [1] also said the same thing 1.5years back. > Probably in a wrong context and timing? ;-) The way it was implemented was not the same, but the idea is right. Current code does the same allocation like kernel, while previously you just tweaked some constant.
On 2021/8/18 下午6:35, David Sterba wrote: > On Wed, Aug 18, 2021 at 01:17:46PM +0800, Qu Wenruo wrote: >> On 2021/8/17 下午9:24, David Sterba wrote: >>> On Mon, Aug 09, 2021 at 03:26:13PM -0300, Marcos Paulo de Souza wrote: >>>> >>>> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> >>>> --- >>>> >>>> This change mimics what the kernel currently does, which is set the stripe_size >>>> regardless of the profile. Any thoughts on it? Thanks! >>> >>> Makes sense to unify that, it works well for the large sizes. Please >>> write tests that verify that the chunk sizes are correct after mkfs on >>> various device sizes. Patch added to devel, thanks. >> >> It in fact makes fsck/025 to fail, bisection points to this patch >> surprisingly. >> >> Now "mkfs.btrfs -f" on a 128M file will just fail. >> >> This looks like a big problem to me though... > > This is known that the small filesystem size and intial chunk layout is > not scaled properly, the patch OTOH fixes the more common case where the > normal block group sizes fit and leave enough room for the rest. > > Can the test 025 be scaled up so we don't have to create the 128M > filesystem? I'd rather go that way. > Sure, we can scale up the size, but this still indicates the problem of the progs chunk allocator. So far that's the only failure, but that also means, the minimal device size calculation is no longer correct. As kernel has no problem allocate SINGLE chunk smaller than 64M. I'd prefer to unify the minimal stripe size with kernel. (And of course, also do proper chunk size calculation for metadata chunks) Thanks, Qu
On Wed, Aug 18, 2021 at 01:36:32PM +0800, Qu Wenruo wrote: > > Looks good to me. > > > > I hope there isn't any xfstests/tests/btrfs test case that is hardcoded > > to the older block group alloc (like swap file test with relocation?). > > But then the test case will need a fix, not btrfs-progs. So > > Well, fsck/025 will fail, and it's not the test case needs a fix... > > We just suddenly can't mkfs on a 128MiB device due to the enlarged > minimal stripe length of data chunks. > > This still exposes a behavior difference between kernel and btrfs-progs... So I'll reply here as all people involved are also CCed - I'm dropping this patch for now. The test 025 fails and my idea was to skip it for now but that still leaves the problems with creating small filesystems, that's something that has been working and it should keep doing so. Proper fix: scale the chunk size for large filesystems and also small filesystems. We can probably set the minimum size to somewhere like 64M (for non-mixed setups), and scale the chunk sizes according to that. d.
On 2021/8/20 下午8:29, David Sterba wrote: > On Wed, Aug 18, 2021 at 01:36:32PM +0800, Qu Wenruo wrote: >>> Looks good to me. >>> >>> I hope there isn't any xfstests/tests/btrfs test case that is hardcoded >>> to the older block group alloc (like swap file test with relocation?). >>> But then the test case will need a fix, not btrfs-progs. So >> >> Well, fsck/025 will fail, and it's not the test case needs a fix... >> >> We just suddenly can't mkfs on a 128MiB device due to the enlarged >> minimal stripe length of data chunks. >> >> This still exposes a behavior difference between kernel and btrfs-progs... > > So I'll reply here as all people involved are also CCed - I'm dropping > this patch for now. > > The test 025 fails and my idea was to skip it for now but that still > leaves the problems with creating small filesystems, that's something > that has been working and it should keep doing so. > > Proper fix: scale the chunk size for large filesystems This is what the patch is doing, so no problem. > and also small > filesystems. We can probably set the minimum size to somewhere like 64M > (for non-mixed setups), and scale the chunk sizes according to that. In fact, it's the minimal stripe size causing the problem and could be easily fixed by just synchronizing the same value from kernel. But for sure, we still need to do extra tests for small fs size to be extra safe. Thanks, Qu > > d. >
diff --git a/kernel-shared/volumes.c b/kernel-shared/volumes.c index aeeb25fe..3677dd7c 100644 --- a/kernel-shared/volumes.c +++ b/kernel-shared/volumes.c @@ -1105,27 +1105,25 @@ static void init_alloc_chunk_ctl_policy_regular(struct btrfs_fs_info *info, u64 type = ctl->type; u64 percent_max; - if (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) { - if (type & BTRFS_BLOCK_GROUP_SYSTEM) { - ctl->stripe_size = SZ_8M; - ctl->max_chunk_size = ctl->stripe_size * 2; - ctl->min_stripe_size = SZ_1M; - ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK; - } else if (type & BTRFS_BLOCK_GROUP_DATA) { - ctl->stripe_size = SZ_1G; - ctl->max_chunk_size = 10 * ctl->stripe_size; - ctl->min_stripe_size = SZ_64M; - ctl->max_stripes = BTRFS_MAX_DEVS(info); - } else if (type & BTRFS_BLOCK_GROUP_METADATA) { - /* For larger filesystems, use larger metadata chunks */ - if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G) - ctl->max_chunk_size = SZ_1G; - else - ctl->max_chunk_size = SZ_256M; - ctl->stripe_size = ctl->max_chunk_size; - ctl->min_stripe_size = SZ_32M; - ctl->max_stripes = BTRFS_MAX_DEVS(info); - } + if (type & BTRFS_BLOCK_GROUP_SYSTEM) { + ctl->stripe_size = SZ_8M; + ctl->max_chunk_size = ctl->stripe_size * 2; + ctl->min_stripe_size = SZ_1M; + ctl->max_stripes = BTRFS_MAX_DEVS_SYS_CHUNK; + } else if (type & BTRFS_BLOCK_GROUP_DATA) { + ctl->stripe_size = SZ_1G; + ctl->max_chunk_size = 10 * ctl->stripe_size; + ctl->min_stripe_size = SZ_64M; + ctl->max_stripes = BTRFS_MAX_DEVS(info); + } else if (type & BTRFS_BLOCK_GROUP_METADATA) { + /* For larger filesystems, use larger metadata chunks */ + if (info->fs_devices->total_rw_bytes > 50ULL * SZ_1G) + ctl->max_chunk_size = SZ_1G; + else + ctl->max_chunk_size = SZ_256M; + ctl->stripe_size = ctl->max_chunk_size; + ctl->min_stripe_size = SZ_32M; + ctl->max_stripes = BTRFS_MAX_DEVS(info); } /* We don't want a chunk larger than 10% of the FS */
[PROBLEM] Our documentation says that a DATA block group can have up to 1G of size, or the at most 10% of the filesystem size. Currently, by default, mkfs.btrfs is creating an initial DATA block group of 8M of size, regardless of the filesystem size. It happens because we check for the ctl->type in init_alloc_chunk_ctl_policy_regular to be one of the BTRFS_BLOCK_GROUP_PROFILE_MASK bits, which is not the case for SINGLE (default) DATA block group: $ mkfs.btrfs -f /storage/btrfs.disk btrfs-progs v4.19.1 See http://btrfs.wiki.kernel.org for more information. Label: (null) UUID: 39e3492f-41f2-4bd7-8c25-93032606b9fe Node size: 16384 Sector size: 4096 Filesystem size: 55.00GiB Block group profiles: Data: single 8.00MiB Metadata: DUP 1.00GiB System: DUP 8.00MiB SSD detected: no Incompat features: extref, skinny-metadata Number of devices: 1 Devices: ID SIZE PATH 1 55.00GiB /storage/btrfs.disk In this case, for single data profile, it should create a data block group of 1G, since the filesystem if bigger than 50G. [FIX] Remove the check for BTRFS_BLOCK_GROUP_PROFILE_MASK in init_alloc_chunk_ctl_policy_regular function. The ctl->stripe_size is later on assigned to ctl.num_bytes, which is used by btrfs_make_block_group to create the block group. By removing the check we allow the code to always configure the correct stripe_size regardless of the PROFILE, looking on the block group TYPE. With the fix applied, it now created the BG correctly: $ ./mkfs.btrfs -f /storage/btrfs.disk btrfs-progs v5.10.1 See http://btrfs.wiki.kernel.org for more information. Label: (null) UUID: 5145e343-5639-462d-82ee-3eb75dc41c31 Node size: 16384 Sector size: 4096 Filesystem size: 55.00GiB Block group profiles: Data: single 1.00GiB Metadata: DUP 1.00GiB System: DUP 8.00MiB SSD detected: no Zoned device: no Incompat features: extref, skinny-metadata Runtime features: Checksum: crc32c Number of devices: 1 Devices: ID SIZE PATH 1 55.00GiB /storage/btrfs.disk Using a disk >50G it creates a 1G single data block group. Another example: $ ./mkfs.btrfs -f /storage/btrfs2.disk btrfs-progs v5.10.1 See http://btrfs.wiki.kernel.org for more information. Label: (null) UUID: c0910857-e512-4e76-9efa-50a475aafc87 Node size: 16384 Sector size: 4096 Filesystem size: 5.00GiB Block group profiles: Data: single 512.00MiB Metadata: DUP 256.00MiB System: DUP 8.00MiB SSD detected: no Zoned device: no Incompat features: extref, skinny-metadata Runtime features: Checksum: crc32c Number of devices: 1 Devices: ID SIZE PATH 1 5.00GiB /storage/btrfs2.disk The code now created a single data block group of 512M, which is exactly 10% of the size of the filesystem, which is 5G in this case. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> --- This change mimics what the kernel currently does, which is set the stripe_size regardless of the profile. Any thoughts on it? Thanks! kernel-shared/volumes.c | 40 +++++++++++++++++++--------------------- 1 file changed, 19 insertions(+), 21 deletions(-)