diff mbox series

[4/4] btrfs: reinsert BGs failed to reclaim

Message ID e8acfcfefeb3156e11e60ea97dcd2c6ecf984101.1686028197.git.naohiro.aota@wdc.com (mailing list archive)
State New, archived
Headers show
Series btrfs: fixes for reclaim | expand

Commit Message

Naohiro Aota June 6, 2023, 5:36 a.m. UTC
The reclaim process can temporally fail. For example, if the space is
getting tight, it fails to make the block group read-only. If there are no
further writes on that block group, the block group will never get back to
the reclaim list, and the BG never get reclaimed. In a certain workload, we
can leave many such block groups never reclaimed.

So, let's get it back to the list and give it a chance to be reclaimed.

Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Filipe Manana June 6, 2023, 10:25 a.m. UTC | #1
On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote:
>
> The reclaim process can temporally fail. For example, if the space is

temporally -> temporarily

> getting tight, it fails to make the block group read-only. If there are no
> further writes on that block group, the block group will never get back to
> the reclaim list, and the BG never get reclaimed. In a certain workload, we

get -> gets

> can leave many such block groups never reclaimed.
>
> So, let's get it back to the list and give it a chance to be reclaimed.
>
> Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
> CC: stable@vger.kernel.org # 5.15+
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  fs/btrfs/block-group.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index db9bee071434..36e0d9b5d824 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1833,7 +1833,18 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>                 }
>
>  next:
> -               btrfs_put_block_group(bg);
> +               if (!ret) {
> +                       btrfs_put_block_group(bg);
> +               } else {
> +                       spin_lock(&bg->lock);

Why do we need to take bg->lock?
The ->bg_list is protected by fs_info->unused_bgs_lock alone.

> +                       spin_lock(&fs_info->unused_bgs_lock);
> +                       if (list_empty(&bg->bg_list))
> +                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
> +                       else
> +                               btrfs_put_block_group(bg);
> +                       spin_unlock(&fs_info->unused_bgs_lock);
> +                       spin_unlock(&bg->lock);
> +               }

Also, this is very similar to btrfs_mark_bg_to_reclaim(), so we should
use that, and simply have:

btrfs_mark_bg_to_reclaim();
btrfs_put_block_group(bg);

Thanks.

>
>                 mutex_unlock(&fs_info->reclaim_bgs_lock);
>                 /*
> --
> 2.40.1
>
David Sterba June 6, 2023, 4:23 p.m. UTC | #2
On Tue, Jun 06, 2023 at 11:25:23AM +0100, Filipe Manana wrote:
> On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote:
> > +                       spin_lock(&fs_info->unused_bgs_lock);
> > +                       if (list_empty(&bg->bg_list))
> > +                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
> > +                       else
> > +                               btrfs_put_block_group(bg);
> > +                       spin_unlock(&fs_info->unused_bgs_lock);
> > +                       spin_unlock(&bg->lock);
> > +               }
> 
> Also, this is very similar to btrfs_mark_bg_to_reclaim(), so we should
> use that, and simply have:
> 
> btrfs_mark_bg_to_reclaim();
> btrfs_put_block_group(bg);

I can fold the diff below if you agree

--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1833,18 +1833,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
                }
 
 next:
-               if (!ret) {
-                       btrfs_put_block_group(bg);
-               } else {
-                       spin_lock(&bg->lock);
-                       spin_lock(&fs_info->unused_bgs_lock);
-                       if (list_empty(&bg->bg_list))
-                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
-                       else
-                               btrfs_put_block_group(bg);
-                       spin_unlock(&fs_info->unused_bgs_lock);
-                       spin_unlock(&bg->lock);
-               }
+               if (!ret)
+                       btrfs_mark_bg_to_reclaim(bg);
+               btrfs_put_block_group(bg);
 
                mutex_unlock(&fs_info->reclaim_bgs_lock);
                /*
---
Naohiro Aota June 6, 2023, 11:55 p.m. UTC | #3
On Tue, Jun 06, 2023 at 06:23:23PM +0200, David Sterba wrote:
> On Tue, Jun 06, 2023 at 11:25:23AM +0100, Filipe Manana wrote:
> > On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote:
> > > +                       spin_lock(&fs_info->unused_bgs_lock);
> > > +                       if (list_empty(&bg->bg_list))
> > > +                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
> > > +                       else
> > > +                               btrfs_put_block_group(bg);
> > > +                       spin_unlock(&fs_info->unused_bgs_lock);
> > > +                       spin_unlock(&bg->lock);
> > > +               }
> > 
> > Also, this is very similar to btrfs_mark_bg_to_reclaim(), so we should
> > use that, and simply have:
> > 
> > btrfs_mark_bg_to_reclaim();
> > btrfs_put_block_group(bg);

Yeah, this looks nice. Thank you.

> 
> I can fold the diff below if you agree
> 
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1833,18 +1833,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>                 }
>  
>  next:
> -               if (!ret) {
> -                       btrfs_put_block_group(bg);
> -               } else {
> -                       spin_lock(&bg->lock);
> -                       spin_lock(&fs_info->unused_bgs_lock);
> -                       if (list_empty(&bg->bg_list))
> -                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
> -                       else
> -                               btrfs_put_block_group(bg);
> -                       spin_unlock(&fs_info->unused_bgs_lock);
> -                       spin_unlock(&bg->lock);
> -               }
> +               if (!ret)
> +                       btrfs_mark_bg_to_reclaim(bg);

Thank you for folding this, but the condition is flipped.  We should add it
back to the list in a failure case.

> +               btrfs_put_block_group(bg);
>  
>                 mutex_unlock(&fs_info->reclaim_bgs_lock);
>                 /*
> ---
David Sterba June 8, 2023, 11:48 a.m. UTC | #4
On Tue, Jun 06, 2023 at 11:55:48PM +0000, Naohiro Aota wrote:
> On Tue, Jun 06, 2023 at 06:23:23PM +0200, David Sterba wrote:
> > On Tue, Jun 06, 2023 at 11:25:23AM +0100, Filipe Manana wrote:
> > > On Tue, Jun 6, 2023 at 7:04 AM Naohiro Aota <naota@elisp.net> wrote:
> > > > +                       spin_lock(&fs_info->unused_bgs_lock);
> > > > +                       if (list_empty(&bg->bg_list))
> > > > +                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
> > > > +                       else
> > > > +                               btrfs_put_block_group(bg);
> > > > +                       spin_unlock(&fs_info->unused_bgs_lock);
> > > > +                       spin_unlock(&bg->lock);
> > > > +               }
> > > 
> > > Also, this is very similar to btrfs_mark_bg_to_reclaim(), so we should
> > > use that, and simply have:
> > > 
> > > btrfs_mark_bg_to_reclaim();
> > > btrfs_put_block_group(bg);
> 
> Yeah, this looks nice. Thank you.
> 
> > 
> > I can fold the diff below if you agree
> > 
> > --- a/fs/btrfs/block-group.c
> > +++ b/fs/btrfs/block-group.c
> > @@ -1833,18 +1833,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
> >                 }
> >  
> >  next:
> > -               if (!ret) {
> > -                       btrfs_put_block_group(bg);
> > -               } else {
> > -                       spin_lock(&bg->lock);
> > -                       spin_lock(&fs_info->unused_bgs_lock);
> > -                       if (list_empty(&bg->bg_list))
> > -                               list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
> > -                       else
> > -                               btrfs_put_block_group(bg);
> > -                       spin_unlock(&fs_info->unused_bgs_lock);
> > -                       spin_unlock(&bg->lock);
> > -               }
> > +               if (!ret)
> > +                       btrfs_mark_bg_to_reclaim(bg);
> 
> Thank you for folding this, but the condition is flipped.  We should add it
> back to the list in a failure case.

Ah right, fixed and pushed. The whole diff becomes:

               if (ret)
                       btrfs_mark_bg_to_reclaim(bg);
diff mbox series

Patch

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index db9bee071434..36e0d9b5d824 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1833,7 +1833,18 @@  void btrfs_reclaim_bgs_work(struct work_struct *work)
 		}
 
 next:
-		btrfs_put_block_group(bg);
+		if (!ret) {
+			btrfs_put_block_group(bg);
+		} else {
+			spin_lock(&bg->lock);
+			spin_lock(&fs_info->unused_bgs_lock);
+			if (list_empty(&bg->bg_list))
+				list_add_tail(&bg->bg_list, &fs_info->reclaim_bgs);
+			else
+				btrfs_put_block_group(bg);
+			spin_unlock(&fs_info->unused_bgs_lock);
+			spin_unlock(&bg->lock);
+		}
 
 		mutex_unlock(&fs_info->reclaim_bgs_lock);
 		/*