diff mbox series

[1/4] btrfs: delete unused BGs while reclaiming BGs

Message ID 06288ff9c090a5bfe76e0a2080eb1fbd640cdd62.1686028197.git.naohiro.aota@wdc.com (mailing list archive)
State New, archived
Headers show
Series btrfs: fixes for reclaim | expand

Commit Message

Naohiro Aota June 6, 2023, 5:36 a.m. UTC
The reclaiming process only starts after the FS volumes are allocated to a
certain level (75% by default). Thus, the list of reclaiming target block
groups can build up so huge at the time the reclaim process kicks in. On a
test run, there were over 1000 BGs in the reclaim list.

As the reclaim involves rewriting the data, it takes really long time to
reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs()
won't proceed because the reclaim side is holding
fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused
BGs kept in the unused list. On my test run, I got 1057 unused BGs.

Since deleting a block group is relatively easy and fast work, we can call
btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up
unused BGs.

Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Filipe Manana June 6, 2023, 9:54 a.m. UTC | #1
On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote:
>
> The reclaiming process only starts after the FS volumes are allocated to a
> certain level (75% by default). Thus, the list of reclaiming target block
> groups can build up so huge at the time the reclaim process kicks in. On a
> test run, there were over 1000 BGs in the reclaim list.
>
> As the reclaim involves rewriting the data, it takes really long time to
> reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs()
> won't proceed because the reclaim side is holding
> fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused
> BGs kept in the unused list. On my test run, I got 1057 unused BGs.
>
> Since deleting a block group is relatively easy and fast work, we can call
> btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up
> unused BGs.
>
> Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
> CC: stable@vger.kernel.org # 5.15+
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, great catch.

> ---
>  fs/btrfs/block-group.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 618ba7670e66..c5547da0f6eb 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1824,10 +1824,24 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>
>  next:
>                 btrfs_put_block_group(bg);
> +
> +               mutex_unlock(&fs_info->reclaim_bgs_lock);
> +               /*
> +                * Reclaiming all the BGs in the list can take really long.
> +                * Prioritize cleanning up unused BGs.
> +                */
> +               btrfs_delete_unused_bgs(fs_info);
> +               /*
> +                * If we are interrupted by a balance, we can just bail out. The
> +                * cleaner thread call me again if necessary.
> +                */
> +               if (!mutex_trylock(&fs_info->reclaim_bgs_lock))
> +                       goto end;
>                 spin_lock(&fs_info->unused_bgs_lock);
>         }
>         spin_unlock(&fs_info->unused_bgs_lock);
>         mutex_unlock(&fs_info->reclaim_bgs_lock);
> +end:
>         btrfs_exclop_finish(fs_info);
>         sb_end_write(fs_info->sb);
>  }
> --
> 2.40.1
>
Johannes Thumshirn June 6, 2023, 11:32 a.m. UTC | #2
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Boris Burkov June 7, 2023, 6:48 p.m. UTC | #3
On Tue, Jun 06, 2023 at 02:36:33PM +0900, Naohiro Aota wrote:
> The reclaiming process only starts after the FS volumes are allocated to a
> certain level (75% by default). Thus, the list of reclaiming target block
> groups can build up so huge at the time the reclaim process kicks in. On a
> test run, there were over 1000 BGs in the reclaim list.
> 
> As the reclaim involves rewriting the data, it takes really long time to
> reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs()
> won't proceed because the reclaim side is holding
> fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused
> BGs kept in the unused list. On my test run, I got 1057 unused BGs.
> 
> Since deleting a block group is relatively easy and fast work, we can call
> btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up
> unused BGs.
> 
> Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
> CC: stable@vger.kernel.org # 5.15+
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  fs/btrfs/block-group.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 618ba7670e66..c5547da0f6eb 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1824,10 +1824,24 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>  
>  next:
>  		btrfs_put_block_group(bg);
> +
> +		mutex_unlock(&fs_info->reclaim_bgs_lock);
> +		/*
> +		 * Reclaiming all the BGs in the list can take really long.
> +		 * Prioritize cleanning up unused BGs.
> +		 */
> +		btrfs_delete_unused_bgs(fs_info);
> +		/*
> +		 * If we are interrupted by a balance, we can just bail out. The
> +		 * cleaner thread call me again if necessary.
> +		 */
> +		if (!mutex_trylock(&fs_info->reclaim_bgs_lock))
> +			goto end;

I agree that this fix makes sense and a lot of reclaim should not block
deleting unused bgs.

However, it feels quite hacky to me, because by the current design, we
are explicitly calling btrfs_delete_unused_bgs as a first class cleanup
action from the cleaner thread, before calling reclaim. I think a little
more cleanup to integrate the two together would reduce the "throw
things at the wall" feel here.

I would propose either:
1. Run them in parallel and make sure they release locks appropriately
   so that everyone makes good forward progress. I think it's a good
   model, but kind of a departure from the single cleaner thread so
   maybe risky/a pain to code.
2. Just get rid of the explicit delete unused from cleaner and
   integrate it as a first class component of this reclaim loop. This
   loop becomes *the* delete unused work except shutdown type cases.

FWIW, not a NAK, just my 2c.

Thanks,
Boris

>  		spin_lock(&fs_info->unused_bgs_lock);
>  	}
>  	spin_unlock(&fs_info->unused_bgs_lock);
>  	mutex_unlock(&fs_info->reclaim_bgs_lock);
> +end:
>  	btrfs_exclop_finish(fs_info);
>  	sb_end_write(fs_info->sb);
>  }
> -- 
> 2.40.1
>
Naohiro Aota June 12, 2023, 12:50 p.m. UTC | #4
On Wed, Jun 07, 2023 at 11:48:37AM -0700, Boris Burkov wrote:
> On Tue, Jun 06, 2023 at 02:36:33PM +0900, Naohiro Aota wrote:
> > The reclaiming process only starts after the FS volumes are allocated to a
> > certain level (75% by default). Thus, the list of reclaiming target block
> > groups can build up so huge at the time the reclaim process kicks in. On a
> > test run, there were over 1000 BGs in the reclaim list.
> > 
> > As the reclaim involves rewriting the data, it takes really long time to
> > reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs()
> > won't proceed because the reclaim side is holding
> > fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused
> > BGs kept in the unused list. On my test run, I got 1057 unused BGs.
> > 
> > Since deleting a block group is relatively easy and fast work, we can call
> > btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up
> > unused BGs.
> > 
> > Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
> > CC: stable@vger.kernel.org # 5.15+
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> >  fs/btrfs/block-group.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > index 618ba7670e66..c5547da0f6eb 100644
> > --- a/fs/btrfs/block-group.c
> > +++ b/fs/btrfs/block-group.c
> > @@ -1824,10 +1824,24 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
> >  
> >  next:
> >  		btrfs_put_block_group(bg);
> > +
> > +		mutex_unlock(&fs_info->reclaim_bgs_lock);
> > +		/*
> > +		 * Reclaiming all the BGs in the list can take really long.
> > +		 * Prioritize cleanning up unused BGs.
> > +		 */
> > +		btrfs_delete_unused_bgs(fs_info);
> > +		/*
> > +		 * If we are interrupted by a balance, we can just bail out. The
> > +		 * cleaner thread call me again if necessary.
> > +		 */
> > +		if (!mutex_trylock(&fs_info->reclaim_bgs_lock))
> > +			goto end;
> 
> I agree that this fix makes sense and a lot of reclaim should not block
> deleting unused bgs.
> 
> However, it feels quite hacky to me, because by the current design, we
> are explicitly calling btrfs_delete_unused_bgs as a first class cleanup
> action from the cleaner thread, before calling reclaim. I think a little
> more cleanup to integrate the two together would reduce the "throw
> things at the wall" feel here.
> 
> I would propose either:
> 1. Run them in parallel and make sure they release locks appropriately
>    so that everyone makes good forward progress. I think it's a good
>    model, but kind of a departure from the single cleaner thread so
>    maybe risky/a pain to code.
> 2. Just get rid of the explicit delete unused from cleaner and
>    integrate it as a first class component of this reclaim loop. This
>    loop becomes *the* delete unused work except shutdown type cases.
> 
> FWIW, not a NAK, just my 2c.

Thank you for your suggestion. Yeah, I also considered something like your
option 2, but I hesitated to do so as a fix patch because it will change
the current cleaner and reclaim thread model. I'll rework them an a
integrated loop if btrfs devs agree with it.

> Thanks,
> Boris
> 
> >  		spin_lock(&fs_info->unused_bgs_lock);
> >  	}
> >  	spin_unlock(&fs_info->unused_bgs_lock);
> >  	mutex_unlock(&fs_info->reclaim_bgs_lock);
> > +end:
> >  	btrfs_exclop_finish(fs_info);
> >  	sb_end_write(fs_info->sb);
> >  }
> > -- 
> > 2.40.1
> >
diff mbox series

Patch

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 618ba7670e66..c5547da0f6eb 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1824,10 +1824,24 @@  void btrfs_reclaim_bgs_work(struct work_struct *work)
 
 next:
 		btrfs_put_block_group(bg);
+
+		mutex_unlock(&fs_info->reclaim_bgs_lock);
+		/*
+		 * Reclaiming all the BGs in the list can take really long.
+		 * Prioritize cleanning up unused BGs.
+		 */
+		btrfs_delete_unused_bgs(fs_info);
+		/*
+		 * If we are interrupted by a balance, we can just bail out. The
+		 * cleaner thread call me again if necessary.
+		 */
+		if (!mutex_trylock(&fs_info->reclaim_bgs_lock))
+			goto end;
 		spin_lock(&fs_info->unused_bgs_lock);
 	}
 	spin_unlock(&fs_info->unused_bgs_lock);
 	mutex_unlock(&fs_info->reclaim_bgs_lock);
+end:
 	btrfs_exclop_finish(fs_info);
 	sb_end_write(fs_info->sb);
 }