diff mbox series

[3/5] btrfs: make discard_workfn block_group ref explicit

Message ID ab8a6f9b2e3ce6ee7196a481233956ea9629b7b8.1741306938.git.boris@bur.io (mailing list archive)
State New
Headers show
Series btrfs: block_group refcounting fixes | expand

Commit Message

Boris Burkov March 7, 2025, 12:29 a.m. UTC
Currently, the async discard machinery owns a ref to the block_group
when the block_group is queued on a discard list. However, to handle
races with discard cancellation and the discard workfn, we have some
cute logic to detect that the block_group is *currently* running in the
workfn, to protect the workfn's usage amidst cancellation.

As far as I can tell, this doesn't have any overt bugs (though
finish_discard_pass and remove_from_discard_list racing can have a
surprising outcome for the caller of remove_from_discard_list in that it
is again added at the end)

But it is needlessly complicated to rely on locking and the nullity of
discard_ctl->block_group. Simplify this significantly by just taking a
refcount while we are in the workfn and unconditionally drop it in both
the remove and workfn paths, regardless of whether they race.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/discard.c | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

Comments

Filipe Manana March 7, 2025, 2:33 p.m. UTC | #1
On Fri, Mar 7, 2025 at 12:31 AM Boris Burkov <boris@bur.io> wrote:
>
> Currently, the async discard machinery owns a ref to the block_group
> when the block_group is queued on a discard list. However, to handle
> races with discard cancellation and the discard workfn, we have some
> cute logic to detect that the block_group is *currently* running in the
> workfn, to protect the workfn's usage amidst cancellation.
>
> As far as I can tell, this doesn't have any overt bugs (though
> finish_discard_pass and remove_from_discard_list racing can have a
> surprising outcome for the caller of remove_from_discard_list in that it
> is again added at the end)
>
> But it is needlessly complicated to rely on locking and the nullity of
> discard_ctl->block_group. Simplify this significantly by just taking a
> refcount while we are in the workfn and unconditionally drop it in both
> the remove and workfn paths, regardless of whether they race.
>
> Signed-off-by: Boris Burkov <boris@bur.io>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/discard.c | 34 ++++++++++++++++------------------
>  1 file changed, 16 insertions(+), 18 deletions(-)
>
> diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> index e815d165cccc..d6eef4bd9e9d 100644
> --- a/fs/btrfs/discard.c
> +++ b/fs/btrfs/discard.c
> @@ -167,13 +167,7 @@ static bool remove_from_discard_list(struct btrfs_discard_ctl *discard_ctl,
>         block_group->discard_eligible_time = 0;
>         queued = !list_empty(&block_group->discard_list);
>         list_del_init(&block_group->discard_list);
> -       /*
> -        * If the block group is currently running in the discard workfn, we
> -        * don't want to deref it, since it's still being used by the workfn.
> -        * The workfn will notice this case and deref the block group when it is
> -        * finished.
> -        */
> -       if (queued && !running)
> +       if (queued)
>                 btrfs_put_block_group(block_group);
>
>         spin_unlock(&discard_ctl->lock);
> @@ -260,9 +254,10 @@ static struct btrfs_block_group *peek_discard_list(
>                         block_group->discard_cursor = block_group->start;
>                         block_group->discard_state = BTRFS_DISCARD_EXTENTS;
>                 }
> -               discard_ctl->block_group = block_group;
>         }
>         if (block_group) {
> +               btrfs_get_block_group(block_group);
> +               discard_ctl->block_group = block_group;
>                 *discard_state = block_group->discard_state;
>                 *discard_index = block_group->discard_index;
>         }
> @@ -493,9 +488,20 @@ static void btrfs_discard_workfn(struct work_struct *work)
>
>         block_group = peek_discard_list(discard_ctl, &discard_state,
>                                         &discard_index, now);
> -       if (!block_group || !btrfs_run_discard_work(discard_ctl))
> +       if (!block_group)
>                 return;
> +       if (!btrfs_run_discard_work(discard_ctl)) {
> +               spin_lock(&discard_ctl->lock);
> +               btrfs_put_block_group(block_group);
> +               discard_ctl->block_group = NULL;
> +               spin_unlock(&discard_ctl->lock);
> +               return;
> +       }
>         if (now < block_group->discard_eligible_time) {
> +               spin_lock(&discard_ctl->lock);
> +               btrfs_put_block_group(block_group);
> +               discard_ctl->block_group = NULL;
> +               spin_unlock(&discard_ctl->lock);
>                 btrfs_discard_schedule_work(discard_ctl, false);
>                 return;
>         }
> @@ -547,15 +553,7 @@ static void btrfs_discard_workfn(struct work_struct *work)
>         spin_lock(&discard_ctl->lock);
>         discard_ctl->prev_discard = trimmed;
>         discard_ctl->prev_discard_time = now;
> -       /*
> -        * If the block group was removed from the discard list while it was
> -        * running in this workfn, then we didn't deref it, since this function
> -        * still owned that reference. But we set the discard_ctl->block_group
> -        * back to NULL, so we can use that condition to know that now we need
> -        * to deref the block_group.
> -        */
> -       if (discard_ctl->block_group == NULL)
> -               btrfs_put_block_group(block_group);
> +       btrfs_put_block_group(block_group);
>         discard_ctl->block_group = NULL;
>         __btrfs_discard_schedule_work(discard_ctl, now, false);
>         spin_unlock(&discard_ctl->lock);
> --
> 2.48.1
>
>
diff mbox series

Patch

diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index e815d165cccc..d6eef4bd9e9d 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -167,13 +167,7 @@  static bool remove_from_discard_list(struct btrfs_discard_ctl *discard_ctl,
 	block_group->discard_eligible_time = 0;
 	queued = !list_empty(&block_group->discard_list);
 	list_del_init(&block_group->discard_list);
-	/*
-	 * If the block group is currently running in the discard workfn, we
-	 * don't want to deref it, since it's still being used by the workfn.
-	 * The workfn will notice this case and deref the block group when it is
-	 * finished.
-	 */
-	if (queued && !running)
+	if (queued)
 		btrfs_put_block_group(block_group);
 
 	spin_unlock(&discard_ctl->lock);
@@ -260,9 +254,10 @@  static struct btrfs_block_group *peek_discard_list(
 			block_group->discard_cursor = block_group->start;
 			block_group->discard_state = BTRFS_DISCARD_EXTENTS;
 		}
-		discard_ctl->block_group = block_group;
 	}
 	if (block_group) {
+		btrfs_get_block_group(block_group);
+		discard_ctl->block_group = block_group;
 		*discard_state = block_group->discard_state;
 		*discard_index = block_group->discard_index;
 	}
@@ -493,9 +488,20 @@  static void btrfs_discard_workfn(struct work_struct *work)
 
 	block_group = peek_discard_list(discard_ctl, &discard_state,
 					&discard_index, now);
-	if (!block_group || !btrfs_run_discard_work(discard_ctl))
+	if (!block_group)
 		return;
+	if (!btrfs_run_discard_work(discard_ctl)) {
+		spin_lock(&discard_ctl->lock);
+		btrfs_put_block_group(block_group);
+		discard_ctl->block_group = NULL;
+		spin_unlock(&discard_ctl->lock);
+		return;
+	}
 	if (now < block_group->discard_eligible_time) {
+		spin_lock(&discard_ctl->lock);
+		btrfs_put_block_group(block_group);
+		discard_ctl->block_group = NULL;
+		spin_unlock(&discard_ctl->lock);
 		btrfs_discard_schedule_work(discard_ctl, false);
 		return;
 	}
@@ -547,15 +553,7 @@  static void btrfs_discard_workfn(struct work_struct *work)
 	spin_lock(&discard_ctl->lock);
 	discard_ctl->prev_discard = trimmed;
 	discard_ctl->prev_discard_time = now;
-	/*
-	 * If the block group was removed from the discard list while it was
-	 * running in this workfn, then we didn't deref it, since this function
-	 * still owned that reference. But we set the discard_ctl->block_group
-	 * back to NULL, so we can use that condition to know that now we need
-	 * to deref the block_group.
-	 */
-	if (discard_ctl->block_group == NULL)
-		btrfs_put_block_group(block_group);
+	btrfs_put_block_group(block_group);
 	discard_ctl->block_group = NULL;
 	__btrfs_discard_schedule_work(discard_ctl, now, false);
 	spin_unlock(&discard_ctl->lock);