Message ID: 20240122-reclaim-fix-v1-2-761234a6d005@wdc.com (mailing list archive)
State: New, archived
Series: btrfs: zoned: kick reclaim earlier on fast zoned devices
On Mon, Jan 22, 2024 at 02:51:04AM -0800, Johannes Thumshirn wrote:
> On very fast but small devices, waiting for a transaction commit can be
> too long of a wait in order to wake up the cleaner kthread to remove unused
> and reclaimable block-groups.
>
> Check every time we're adding back free space to a block group, if we need
> to activate the cleaner kthread.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/free-space-cache.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index d372c7ce0e6b..2d98b9ca0e83 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -30,6 +30,7 @@
>  #include "file-item.h"
>  #include "file.h"
>  #include "super.h"
> +#include "zoned.h"
>
>  #define BITS_PER_BITMAP		(PAGE_SIZE * 8UL)
>  #define MAX_CACHE_BYTES_PER_GIG	SZ_64K
> @@ -2694,6 +2695,7 @@ int __btrfs_add_free_space(struct btrfs_block_group *block_group,
>  static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
>  					u64 bytenr, u64 size, bool used)
>  {
> +	struct btrfs_fs_info *fs_info = block_group->fs_info;
>  	struct btrfs_space_info *sinfo = block_group->space_info;
>  	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
>  	u64 offset = bytenr - block_group->start;
> @@ -2745,6 +2747,10 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
>  		btrfs_mark_bg_to_reclaim(block_group);
>  	}
>
> +	if (btrfs_zoned_should_reclaim(fs_info) &&
> +	    !test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags))
> +		wake_up_process(fs_info->cleaner_kthread);
> +

Isn't it too costly to call btrfs_zoned_should_reclaim() every time
something is updated? Can we wake it up in btrfs_mark_bg_to_reclaim and
btrfs_mark_bg_unused?

Also, looking into btrfs_zoned_should_reclaim(), it sums device->bytes_used
for each fs_devices->devices. And, device->bytes_used is set at
create_chunk() or at btrfs_remove_chunk(). Isn't it feasible to do the
calculation only there?

>  	return 0;
>  }
>
> --
> 2.43.0
On 22.01.24 13:22, Naohiro Aota wrote:
> On Mon, Jan 22, 2024 at 02:51:04AM -0800, Johannes Thumshirn wrote:
>> On very fast but small devices, waiting for a transaction commit can be
>> too long of a wait in order to wake up the cleaner kthread to remove unused
>> and reclaimable block-groups.
>>
>> Check every time we're adding back free space to a block group, if we need
>> to activate the cleaner kthread.
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> ---
>> [...]
>> @@ -2745,6 +2747,10 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
>>  		btrfs_mark_bg_to_reclaim(block_group);
>>  	}
>>
>> +	if (btrfs_zoned_should_reclaim(fs_info) &&
>> +	    !test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags))
>> +		wake_up_process(fs_info->cleaner_kthread);
>> +
>
> Isn't it too costly to call btrfs_zoned_should_reclaim() every time
> something is updated? Can we wake it up in btrfs_mark_bg_to_reclaim and
> btrfs_mark_bg_unused?

Hmm yes, I've thought of adding a list_count() for the reclaim and
unused lists, and only calling into should_reclaim if these lists have
entries. Or even better !list_is_singular().

> Also, looking into btrfs_zoned_should_reclaim(), it sums device->bytes_used
> for each fs_devices->devices. And, device->bytes_used is set at
> create_chunk() or at btrfs_remove_chunk(). Isn't it feasible to do the
> calculation only there?

Oh sh*t! Right, we should check bytes_used from all space_infos in
btrfs_zoned_should_reclaim() and compare that to the disk total bytes.
On Mon, Jan 22, 2024 at 12:30:52PM +0000, Johannes Thumshirn wrote:
> On 22.01.24 13:22, Naohiro Aota wrote:
> > On Mon, Jan 22, 2024 at 02:51:04AM -0800, Johannes Thumshirn wrote:
> >> [...]
> >> +	if (btrfs_zoned_should_reclaim(fs_info) &&
> >> +	    !test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags))
> >> +		wake_up_process(fs_info->cleaner_kthread);
> >> +
> >
> > Isn't it too costly to call btrfs_zoned_should_reclaim() every time
> > something is updated? Can we wake it up in btrfs_mark_bg_to_reclaim and
> > btrfs_mark_bg_unused?
>
> Hmm yes, I've thought of adding a list_count() for the reclaim and
> unused lists, and only calling into should_reclaim if these lists have
> entries. Or even better !list_is_singular().

That sounds reasonable.

> > Also, looking into btrfs_zoned_should_reclaim(), it sums device->bytes_used
> > for each fs_devices->devices. And, device->bytes_used is set at
> > create_chunk() or at btrfs_remove_chunk(). Isn't it feasible to do the
> > calculation only there?
>
> Oh sh*t! Right we should check bytes_used from all space_infos in
> btrfs_zoned_should_reclaim() and compare that to the disk total bytes.

You mean device->bytes_used? space_info->bytes_used does not count free
space and zone_unusable in BGs, so using that changes the behavior. It
won't even kick the thread if there is a lot of zone_unusable space but
little used space.
On 22.01.24 15:39, Naohiro Aota wrote:
>>> Also, looking into btrfs_zoned_should_reclaim(), it sums device->bytes_used
>>> for each fs_devices->devices. And, device->bytes_used is set at
>>> create_chunk() or at btrfs_remove_chunk(). Isn't it feasible to do the
>>> calculation only there?
>>
>> Oh sh*t! Right we should check bytes_used from all space_infos in
>> btrfs_zoned_should_reclaim() and compare that to the disk total bytes.
>
> You mean device->bytes_used? space_info->bytes_used does not count free
> space and zone_unusable in BGs, so using that changes the behavior. It
> won't even kick the thread if there is a lot of zone_unusable space but
> little used space.

I did mean btrfs_space_info_used():

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index b7e7b5a5a6fa..d5242c96c97c 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -2414,6 +2414,7 @@ bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 	struct btrfs_device *device;
+	struct btrfs_space_info *space_info;
 	u64 used = 0;
 	u64 total = 0;
 	u64 factor;
@@ -2429,10 +2430,15 @@ bool btrfs_zoned_should_reclaim(struct btrfs_fs_info *fs_info)
 			continue;
 
 		total += device->disk_total_bytes;
-		used += device->bytes_used;
 	}
 	rcu_read_unlock();
 
+	list_for_each_entry(space_info, &fs_info->space_info, list) {
+		spin_lock(&space_info->lock);
+		used += btrfs_space_info_used(space_info, true);
+		spin_unlock(&space_info->lock);
+	}
+
 	factor = div64_u64(used * 100, total);
 	return factor >= fs_info->bg_reclaim_threshold;
 }
On Mon, Jan 22, 2024 at 02:43:03PM +0000, Johannes Thumshirn wrote:
> On 22.01.24 15:39, Naohiro Aota wrote:
> >>> Also, looking into btrfs_zoned_should_reclaim(), it sums device->bytes_used
> >>> for each fs_devices->devices. And, device->bytes_used is set at
> >>> create_chunk() or at btrfs_remove_chunk(). Isn't it feasible to do the
> >>> calculation only there?
> >>
> >> Oh sh*t! Right we should check bytes_used from all space_infos in
> >> btrfs_zoned_should_reclaim() and compare that to the disk total bytes.
> >
> > You mean device->bytes_used? space_info->bytes_used does not count free
> > space and zone_unusable in BGs, so using that changes the behavior.
>
> I did mean btrfs_space_info_used():
>
> [...]
> 		total += device->disk_total_bytes;
> -		used += device->bytes_used;
> 	}
> 	rcu_read_unlock();
>
> +	list_for_each_entry(space_info, &fs_info->space_info, list) {
> +		spin_lock(&space_info->lock);
> +		used += btrfs_space_info_used(space_info, true);
> +		spin_unlock(&space_info->lock);
> +	}
> +
> 	factor = div64_u64(used * 100, total);
> 	return factor >= fs_info->bg_reclaim_threshold;

This changes the behavior. btrfs_space_info_used() excludes unallocated
space.

Also, if we calculate it with device->disk_total_bytes, it screws up on
DUP/RAID* profiles because btrfs_space_info_used() counts logical space
while disk_total_bytes counts physical space.
On Mon, Jan 22, 2024 at 02:51:04AM -0800, Johannes Thumshirn wrote:
> On very fast but small devices, waiting for a transaction commit can be
> too long of a wait in order to wake up the cleaner kthread to remove unused
> and reclaimable block-groups.
>
> Check every time we're adding back free space to a block group, if we need
> to activate the cleaner kthread.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> [...]
> @@ -2694,6 +2695,7 @@ int __btrfs_add_free_space(struct btrfs_block_group *block_group,
>  static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
>  					u64 bytenr, u64 size, bool used)
>  {

I thought add_free_space is only called from various error/backout
conditions and then for real from unpin_extent_range, which is also in the
transaction commit. The normal reclaim/unused decision is made in
btrfs_update_block_group for that reason.

OTOH, I assume you had some repro that was performing poorly and this patch
fixed it, so I am very likely missing something.

Thanks,
Boris

> +	struct btrfs_fs_info *fs_info = block_group->fs_info;
>  	struct btrfs_space_info *sinfo = block_group->space_info;
>  	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
>  	u64 offset = bytenr - block_group->start;
> @@ -2745,6 +2747,10 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
>  		btrfs_mark_bg_to_reclaim(block_group);
>  	}
>
> +	if (btrfs_zoned_should_reclaim(fs_info) &&
> +	    !test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags))
> +		wake_up_process(fs_info->cleaner_kthread);
> +
>  	return 0;
>  }
>
> --
> 2.43.0
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index d372c7ce0e6b..2d98b9ca0e83 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -30,6 +30,7 @@
 #include "file-item.h"
 #include "file.h"
 #include "super.h"
+#include "zoned.h"
 
 #define BITS_PER_BITMAP		(PAGE_SIZE * 8UL)
 #define MAX_CACHE_BYTES_PER_GIG	SZ_64K
@@ -2694,6 +2695,7 @@ int __btrfs_add_free_space(struct btrfs_block_group *block_group,
 static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
 					u64 bytenr, u64 size, bool used)
 {
+	struct btrfs_fs_info *fs_info = block_group->fs_info;
 	struct btrfs_space_info *sinfo = block_group->space_info;
 	struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
 	u64 offset = bytenr - block_group->start;
@@ -2745,6 +2747,10 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
 		btrfs_mark_bg_to_reclaim(block_group);
 	}
 
+	if (btrfs_zoned_should_reclaim(fs_info) &&
+	    !test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags))
+		wake_up_process(fs_info->cleaner_kthread);
+
 	return 0;
 }
On very fast but small devices, waiting for a transaction commit can be
too long of a wait in order to wake up the cleaner kthread to remove unused
and reclaimable block-groups.

Check every time we're adding back free space to a block group, if we need
to activate the cleaner kthread.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/free-space-cache.c | 6 ++++++
 1 file changed, 6 insertions(+)