mbox series

[v6,0/4] btrfs: sysfs: set / query btrfs chunk size

Message ID 20211203220445.2312182-1-shr@fb.com (mailing list archive)
Headers show
Series btrfs: sysfs: set / query btrfs chunk size | expand

Message

Stefan Roesch Dec. 3, 2021, 10:04 p.m. UTC
The btrfs allocator is currently not ideal for all workloads. It tends
to suffer from overallocating data block groups and underallocating
metadata block groups. This results in filesystems becoming read-only
even though there is plenty of "free" space.

This is naturally confusing and distressing to users.

Patches:
1) Store the chunk size in the btrfs_space_info structure
2) Add a sysfs entry to read and write the above information
3) Add a sysfs entry to force a space allocation
4) Increase the default size of the metadata chunk allocation to 5GB
   for volumes greater than 50GB.

Testing:
  A new test is being added to the xfstest suite. For reference the
  corresponding patch has the title:
    [PATCH] btrfs: Test chunk allocation with different sizes

  In addition also manual testing has been performed.
    - Run xfstests with the changes and the new test. It does not
      show new diffs.
    - Test with storage devices 10G, 20G, 30G, 50G, 60G
      - Default allocation
      - Increase of chunk size
      - If the stripe size is > the free space, it allocates
        free space - 1MB. The 1MB is left as free space.
      - If the device has a storage size > 50G, it uses a 5GB
        chunk size for new allocations.

---
v6: - Update btrfs_force_chunk_alloc_store to use tree_root
      instead of extent_root.
V5: - Changed the field name in the btrfs_space_info struct from
      default_chunk_size to chunk_size and made it atomic
    - Removed the compute_chunk_size_zoned function
    - Added further checks when writing /sys/fs/btrfs/<id>/allocation/<type>/chunk_size
    - Removed the ability to query /sys/fs/btrfs/<id>/allocation/<type>/force_alloc_chunk

V4: - Patch email contained duplicate entries.

V3: - Rename sysfs entry from stripe_size to chunk_size
    - Remove max_chunk_size field in data structure btrfs_space_info
    - Rename max_stripe_size field to default_chunk_size in data
      structure btrfs_space_info
    - Remove max_chunk_size logic
    - Use stripe_size = chunk_size

V2:
   - Split the patch in 4 patches
   - Added checks for zone volumes in sysfs.c
   - Replaced the BUG() with ASSERT()
   - Changed if with fallthrough
   - Removed comments in space-info.h
--
Stefan Roesch (4):
  btrfs: store chunk size in space-info struct.
  btrfs: expose chunk size in sysfs.
  btrfs: add force_chunk_alloc sysfs entry to force allocation
  btrfs: increase metadata alloc size to 5GB for volumes > 50GB

 fs/btrfs/space-info.c |  41 ++++++++++++
 fs/btrfs/space-info.h |   3 +
 fs/btrfs/sysfs.c      | 152 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c    |  28 +++-----
 4 files changed, 205 insertions(+), 19 deletions(-)


Signed-off-by: Stefan Roesch <shr@fb.com>
base-commit: 87c673b657a7e4784fb7274a77a8529712232d0c

Comments

Qu Wenruo Aug. 18, 2022, 10:40 a.m. UTC | #1
On 2021/12/4 06:04, Stefan Roesch wrote:
> The btrfs allocator is currently not ideal for all workloads. It tends
> to suffer from overallocating data block groups and underallocating
> metadata block groups. This results in filesystems becoming read-only
> even though there is plenty of "free" space.
>
> This is naturally confusing and distressing to users.
>
> Patches:
> 1) Store the chunk size in the btrfs_space_info structure

I guess it's too late, since the series is already in upstream, but I
believe we need to do some more review on this feature.

Firstly, I don't believe a single chunk_size (even it's per space-info)
is good enough.

The default chunk allocator has both checks on stripe_size and
chunk_size, but the series only added chunk_size knob, missing
stripe_size one.

It would be much better to add another stripe_size knob for it.

> 2) Add a sysfs entry to read and write the above information
> 3) Add a sysfs entry to force a space allocation

Any idea on how we handle the auto-reclaim on empty bgs?
AKA, what's preventing the newly allocated bgs from being immediately
reclaimed?

I didn't see any special handling in the 3rd patch.

> 4) Increase the default size of the metadata chunk allocation to 5GB
>     for volumes greater than 50GB.

It would be better to also allow users to tweak the size threshold.
(E.g. allowing a knob for threshold on different chunk/stripe size
limits, along with above mentioned stripe size knob).

Thanks,
Qu
>
> Testing:
>    A new test is being added to the xfstest suite. For reference the
>    corresponding patch has the title:
>      [PATCH] btrfs: Test chunk allocation with different sizes
>
>    In addition also manual testing has been performed.
>      - Run xfstests with the changes and the new test. It does not
>        show new diffs.
>      - Test with storage devices 10G, 20G, 30G, 50G, 60G
>        - Default allocation
>        - Increase of chunk size
>        - If the stripe size is > the free space, it allocates
>          free space - 1MB. The 1MB is left as free space.
>        - If the device has a storage size > 50G, it uses a 5GB
>          chunk size for new allocations.
>
> ---
> v6: - Update btrfs_force_chunk_alloc_store to use tree_root
>        instead of extent_root.
> V5: - Changed the field name in the btrfs_space_info struct from
>        default_chunk_size to chunk_size and made it atomic
>      - Removed the compute_chunk_size_zoned function
>      - Added further checks when writing /sys/fs/btrfs/<id>/allocation/<type>/chunk_size
>      - Removed the ability to query /sys/fs/btrfs/<id>/allocation/<type>/force_alloc_chunk
>
> V4: - Patch email contained duplicate entries.
>
> V3: - Rename sysfs entry from stripe_size to chunk_size
>      - Remove max_chunk_size field in data structure btrfs_space_info
>      - Rename max_stripe_size field to default_chunk_size in data
>        structure btrfs_space_info
>      - Remove max_chunk_size logic
>      - Use stripe_size = chunk_size
>
> V2:
>     - Split the patch in 4 patches
>     - Added checks for zone volumes in sysfs.c
>     - Replaced the BUG() with ASSERT()
>     - Changed if with fallthrough
>     - Removed comments in space-info.h
> --
> Stefan Roesch (4):
>    btrfs: store chunk size in space-info struct.
>    btrfs: expose chunk size in sysfs.
>    btrfs: add force_chunk_alloc sysfs entry to force allocation
>    btrfs: increase metadata alloc size to 5GB for volumes > 50GB
>
>   fs/btrfs/space-info.c |  41 ++++++++++++
>   fs/btrfs/space-info.h |   3 +
>   fs/btrfs/sysfs.c      | 152 ++++++++++++++++++++++++++++++++++++++++++
>   fs/btrfs/volumes.c    |  28 +++-----
>   4 files changed, 205 insertions(+), 19 deletions(-)
>
>
> Signed-off-by: Stefan Roesch <shr@fb.com>
> base-commit: 87c673b657a7e4784fb7274a77a8529712232d0c