btrfs: retry bg reclaim without infinite loop

Message ID	3ed106331a7ff61303c9e1c2930f8a7673b80a86.1717790627.git.boris@bur.io (mailing list archive)
State	New, archived
Headers	show Received: from fout2-smtp.messagingengine.com (fout2-smtp.messagingengine.com [103.168.172.145]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 96DE933EE for <linux-btrfs@vger.kernel.org>; Fri, 7 Jun 2024 20:04:20 +0000 (UTC) Feedback-ID: i083147f8:Fastmail From: Boris Burkov <boris@bur.io> To: linux-btrfs@vger.kernel.org, kernel-team@fb.com Subject: [PATCH] btrfs: retry bg reclaim without infinite loop Date: Fri, 7 Jun 2024 13:04:04 -0700 Message-ID: <3ed106331a7ff61303c9e1c2930f8a7673b80a86.1717790627.git.boris@bur.io> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	btrfs: retry bg reclaim without infinite loop \| expand btrfs: retry bg reclaim without infinite loop

Message ID

3ed106331a7ff61303c9e1c2930f8a7673b80a86.1717790627.git.boris@bur.io (mailing list archive)

State

New, archived

Headers

Feedback-ID: i083147f8:Fastmail
From: Boris Burkov <boris@bur.io>
To: linux-btrfs@vger.kernel.org,
	kernel-team@fb.com
Subject: [PATCH] btrfs: retry bg reclaim without infinite loop
Date: Fri,  7 Jun 2024 13:04:04 -0700
Message-ID: 
 <3ed106331a7ff61303c9e1c2930f8a7673b80a86.1717790627.git.boris@bur.io>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

btrfs: retry bg reclaim without infinite loop | expand

Commit Message

Boris Burkov June 7, 2024, 8:04 p.m. UTC

If inc_block_group_ro systematically fails (e.g. due to ETXTBUSY from
swap!) or btrfs_relocate_chunk systematically fails (from lack of
space), then this worker becomes an infinite loop.

At the very least, this strands the cleaner thread, but can also result
in hung tasks/rcu stalls on PREEMPT_NONE kernels and if the
reclaim_bgs_lock mutex is not contended.

I believe the best long term fix is to manage reclaim via work queue,
where we queue up a relocation on the triggering condition and re-queue
on failure. In the meantime, this is an easy fix to apply to avoid the
immediate pain.

Cc: stable@vger.kernel.org
Fixes: 7e2718099438 ("btrfs: reinsert BGs failed to reclaim")
Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/block-group.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Johannes Thumshirn June 8, 2024, 12:30 p.m. UTC | #1

On 07.06.24 22:04, Boris Burkov wrote:
[...]
> Cc: stable@vger.kernel.org
> Fixes: 7e2718099438 ("btrfs: reinsert BGs failed to reclaim")
> Signed-off-by: Boris Burkov <boris@bur.io>

I think the code used to look like that before, but I can't remember if 
my mind serves me right.

> -		if (ret)
> -			btrfs_mark_bg_to_reclaim(bg);
> +		if (ret) {
> +			// refcount held by the reclaim_bgs list after splice

Nit: I don't think we use C++ style comments

Anyways, as is it looks good to me.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Filipe Manana June 11, 2024, 10:41 a.m. UTC | #2

On Fri, Jun 7, 2024 at 9:04 PM Boris Burkov <boris@bur.io> wrote:
>
> If inc_block_group_ro systematically fails (e.g. due to ETXTBUSY from
> swap!) or btrfs_relocate_chunk systematically fails (from lack of
> space), then this worker becomes an infinite loop.
>
> At the very least, this strands the cleaner thread, but can also result
> in hung tasks/rcu stalls on PREEMPT_NONE kernels and if the
> reclaim_bgs_lock mutex is not contended.
>
> I believe the best long term fix is to manage reclaim via work queue,
> where we queue up a relocation on the triggering condition and re-queue
> on failure. In the meantime, this is an easy fix to apply to avoid the
> immediate pain.
>
> Cc: stable@vger.kernel.org
> Fixes: 7e2718099438 ("btrfs: reinsert BGs failed to reclaim")
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>  fs/btrfs/block-group.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 9910bae89966..fe61fc2df950 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1792,6 +1792,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>                 container_of(work, struct btrfs_fs_info, reclaim_bgs_work);
>         struct btrfs_block_group *bg;
>         struct btrfs_space_info *space_info;
> +       LIST_HEAD(retry_list);
>
>         if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
>                 return;
> @@ -1928,8 +1929,11 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>                 }
>
>  next:
> -               if (ret)
> -                       btrfs_mark_bg_to_reclaim(bg);
> +               if (ret) {
> +                       // refcount held by the reclaim_bgs list after splice

Apart from the comment style as pointed out by Johannes before:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> +                       btrfs_get_block_group(bg);
> +                       list_add_tail(&bg->bg_list, &retry_list);
> +               }
>                 btrfs_put_block_group(bg);
>
>                 mutex_unlock(&fs_info->reclaim_bgs_lock);
> @@ -1949,6 +1953,9 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
>         spin_unlock(&fs_info->unused_bgs_lock);
>         mutex_unlock(&fs_info->reclaim_bgs_lock);
>  end:
> +       spin_lock(&fs_info->unused_bgs_lock);
> +       list_splice_tail(&retry_list, &fs_info->reclaim_bgs);
> +       spin_unlock(&fs_info->unused_bgs_lock);
>         btrfs_exclop_finish(fs_info);
>         sb_end_write(fs_info->sb);
>  }
> --
> 2.45.2
>
>

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 9910bae89966..fe61fc2df950 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1792,6 +1792,7 @@  void btrfs_reclaim_bgs_work(struct work_struct *work)
 		container_of(work, struct btrfs_fs_info, reclaim_bgs_work);
 	struct btrfs_block_group *bg;
 	struct btrfs_space_info *space_info;
+	LIST_HEAD(retry_list);
 
 	if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
 		return;
@@ -1928,8 +1929,11 @@  void btrfs_reclaim_bgs_work(struct work_struct *work)
 		}
 
 next:
-		if (ret)
-			btrfs_mark_bg_to_reclaim(bg);
+		if (ret) {
+			// refcount held by the reclaim_bgs list after splice
+			btrfs_get_block_group(bg);
+			list_add_tail(&bg->bg_list, &retry_list);
+		}
 		btrfs_put_block_group(bg);
 
 		mutex_unlock(&fs_info->reclaim_bgs_lock);
@@ -1949,6 +1953,9 @@  void btrfs_reclaim_bgs_work(struct work_struct *work)
 	spin_unlock(&fs_info->unused_bgs_lock);
 	mutex_unlock(&fs_info->reclaim_bgs_lock);
 end:
+	spin_lock(&fs_info->unused_bgs_lock);
+	list_splice_tail(&retry_list, &fs_info->reclaim_bgs);
+	spin_unlock(&fs_info->unused_bgs_lock);
 	btrfs_exclop_finish(fs_info);
 	sb_end_write(fs_info->sb);
 }

btrfs: retry bg reclaim without infinite loop

Commit Message

Comments

Patch