[0/5] btrfs: qgroup: address the performance penalty for subvolume dropping

Message ID 20210831094903.111432-1-wqu@suse.com (mailing list archive)

Message

Qu Wenruo Aug. 31, 2021, 9:48 a.m. UTC
Btrfs qgroup has a long history of imposing a huge performance penalty,
on operations ranging from subvolume dropping to balance.

Although we solved the problem for balance, the subvolume dropping
problem is still unresolved, as we really need to do all the costly
backref walks for all the involved subtrees, or qgroup numbers will
become inconsistent.

But the performance penalty is sometimes too big, so big that it's
better just to disable qgroup, do the drop, then do the rescan.

This patchset addresses the problem by introducing a user-configurable
sysfs interface that allows dropping a sufficiently high subtree to
mark qgroup inconsistent and skip the whole accounting.

The following things are needed for this objective:

- New qgroups attributes

  Instead of plain qgroup kobjects, we need extra attributes like
  drop_subtree_threshold.

  This patchset will introduce two new attributes to the existing
  qgroups kobject:
  * qgroups_flags
    To indicate the qgroup status flags like ON, RESCAN, INCONSISTENT.

  * drop_subtree_threshold
    To show the subtree dropping level threshold.
    The default value is BTRFS_MAX_LEVEL (8), which means all subtree
    dropping will go through the qgroup accounting. While costly, this
    tries to keep qgroup numbers as consistent as possible.

    Users can specify values like 3, meaning that dropping any subtree
    at level 3 or higher will mark qgroup inconsistent and skip all the
    costly accounting.

    This only affects subvolume dropping.

- Skip qgroup accounting when the numbers are already inconsistent

  But still keep the qgroup relationships correct, so users can keep
  their qgroup organization and do the rescan later.
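
The two rules above can be modelled as a small sketch. This is a toy
user-space model, not the kernel code: the class QgroupModel, its
method names and the "accounted"/"skipped" return strings are
hypothetical, and it assumes that once qgroup is marked inconsistent
every later drop skips accounting until a rescan.

```python
BTRFS_MAX_LEVEL = 8  # default threshold, as stated in the cover letter


class QgroupModel:
    """Toy model of the proposed drop_subtree_threshold behavior (hypothetical)."""

    def __init__(self, drop_subtree_threshold=BTRFS_MAX_LEVEL):
        self.drop_subtree_threshold = drop_subtree_threshold
        self.inconsistent = False

    def drop_subtree(self, level):
        """Return "accounted" or "skipped" for dropping a subtree at `level`."""
        if self.inconsistent:
            # Numbers are already wrong, so further accounting is pointless.
            return "skipped"
        if level >= self.drop_subtree_threshold:
            # Too high: mark qgroup inconsistent instead of paying the cost.
            self.inconsistent = True
            return "skipped"
        return "accounted"

    def rescan(self):
        """Model a later rescan that makes the numbers consistent again."""
        self.inconsistent = False
```

With the default threshold of 8 no valid tree level reaches the
threshold, so every drop is accounted. With a threshold of 3, a level 2
drop is still accounted, while a level 3 drop marks qgroup inconsistent.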


This sysfs interface needs user space tools to monitor and set the
values for each btrfs filesystem.
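
As a rough sketch of what such a tool might do, the helpers below read
and write the attribute. The path layout
/sys/fs/btrfs/<fsid>/qgroups/drop_subtree_threshold is an assumption
based on the existing qgroups kobject and is not spelled out in this
cover letter. The helper names and the 0..8 range check are
hypothetical as well.

```python
from pathlib import Path

SYSFS_BTRFS = Path("/sys/fs/btrfs")  # assumed sysfs root for btrfs
BTRFS_MAX_LEVEL = 8


def threshold_path(fsid, sysfs_root=SYSFS_BTRFS):
    """Build the assumed path of the attribute for one filesystem."""
    return Path(sysfs_root) / fsid / "qgroups" / "drop_subtree_threshold"


def get_threshold(fsid, sysfs_root=SYSFS_BTRFS):
    """Read the current threshold as an integer."""
    return int(threshold_path(fsid, sysfs_root).read_text().strip())


def set_threshold(fsid, level, sysfs_root=SYSFS_BTRFS):
    """Write a new threshold; reject values outside 0..BTRFS_MAX_LEVEL."""
    if not 0 <= level <= BTRFS_MAX_LEVEL:
        raise ValueError(f"level must be in 0..{BTRFS_MAX_LEVEL}, got {level}")
    threshold_path(fsid, sysfs_root).write_text(f"{level}\n")
```

The sysfs_root parameter exists only so the helpers can be tried
against a scratch directory. Writing to the real file would need root
privileges and a kernel carrying this patchset.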

Currently the target user space tool is snapper, which by default
utilizes qgroups for its space-aware snapshot reclaim mechanism.

Qu Wenruo (5):
  btrfs: sysfs: introduce qgroup global attribute groups
  btrfs: introduce BTRFS_QGROUP_STATUS_FLAGS_MASK for later expansion
  btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN
  btrfs: introduce BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING to skip
    qgroup accounting
  btrfs: skip subtree scan if it's too high to avoid low stall in
    btrfs_commit_transaction()

 fs/btrfs/ctree.h                |   1 +
 fs/btrfs/disk-io.c              |   1 +
 fs/btrfs/qgroup.c               |  87 +++++++++++++++++++-------
 fs/btrfs/qgroup.h               |   3 +
 fs/btrfs/sysfs.c                | 106 ++++++++++++++++++++++++++++++--
 include/uapi/linux/btrfs_tree.h |   4 ++
 6 files changed, 176 insertions(+), 26 deletions(-)

Comments

David Sterba Sept. 2, 2021, 4:25 p.m. UTC | #1
On Tue, Aug 31, 2021 at 05:48:58PM +0800, Qu Wenruo wrote:
> [...]

This is an interesting approach, though there are some usability
questions. First, as a user, how do I know I need to use it?  The height
of the subvolume fs tree is not easily accessible.

The sysfs file is not protected in any way, so multiple tools can decide
to set it to different values. And whether a rescan is required or not
depends on the value that was set.

It might be better to attach the level (or a bit) to the subvol deletion
request, e.g. a "fast" mode that would internally do the slow deletion
up to a maximum height of 3 and the fast deletion for anything higher,
leaving qgroup numbers inconsistent.

Qu Wenruo Sept. 2, 2021, 10:28 p.m. UTC | #2
On 2021/9/3 12:25 AM, David Sterba wrote:
> On Tue, Aug 31, 2021 at 05:48:58PM +0800, Qu Wenruo wrote:
>> [...]
>
> This is an interesting approach, though there are some usability
> questions. First, as a user, how do I know I need to use it?  The height
> of the subvolume fs tree is not easily accessible.

The general idea is: if you're using qgroup and find btrfs-cleaner
taking all the CPU for a while, then that's the case where you need it.
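
One possible way to check for that symptom is to sample the CPU time of
the btrfs-cleaner kernel thread from /proc/<pid>/stat. The helper names
below are hypothetical, and the clock tick rate of 100 is an assumption
that should be replaced with os.sysconf("SC_CLK_TCK") on a real system.

```python
def cpu_ticks_from_stat(stat_line):
    """Return (comm, utime + stime) parsed from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may contain spaces,
    # so split on the last ')' rather than on whitespace.
    comm_end = stat_line.rindex(")")
    comm = stat_line[stat_line.index("(") + 1:comm_end]
    rest = stat_line[comm_end + 2:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return comm, int(rest[11]) + int(rest[12])


def cpu_share(ticks_before, ticks_after, interval_s, clk_tck=100):
    """Fraction of one CPU used between two samples taken interval_s apart."""
    return (ticks_after - ticks_before) / (interval_s * clk_tck)
```

A share close to 1.0 for btrfs-cleaner over a long interval, on a
filesystem with qgroup enabled, would match the situation described
here.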

>
> The sysfs file is not protected in any way, so multiple tools can decide
> to set it to different values. And whether a rescan is required or not
> depends on the value that was set.

That's true, but shouldn't that be the problem of the users?

>
> It might be better to attach the level (or a bit) to the subvol deletion
> request, e.g. a "fast" mode that would internally do the slow deletion
> up to a maximum height of 3 and the fast deletion for anything higher,
> leaving qgroup numbers inconsistent.
>
The problem is, the qgroup part is completely optional, thus I'm not
sure if it's a good idea to add a new interface just for an optional
feature.

Furthermore, when deleting a subvolume, it's only unlinked. The real
deletion can happen after a mount cycle, in which case runtime values
will be lost.

If we really want consistent behavior, then we need a new on-disk
format, which looks like overkill to me.

Thanks,
Qu