md: do not _put wrong device

Message ID	20230911105657.6816-1-mariusz.tkaczyk@linux.intel.com (mailing list archive)
State	Superseded, archived
Headers	show Return-Path: <linux-raid-owner@vger.kernel.org> From: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> To: song@kernel.org Cc: linux-raid@vger.kernel.org Subject: [PATCH] md: do not _put wrong device Date: Mon, 11 Sep 2023 12:56:57 +0200 Message-Id: <20230911105657.6816-1-mariusz.tkaczyk@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	md: do not _put wrong device \| expand md: do not _put wrong device

Message ID

20230911105657.6816-1-mariusz.tkaczyk@linux.intel.com (mailing list archive)

State

Superseded, archived

Headers

From: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
To: song@kernel.org
Cc: linux-raid@vger.kernel.org
Subject: [PATCH] md: do not _put wrong device
Date: Mon, 11 Sep 2023 12:56:57 +0200
Message-Id: <20230911105657.6816-1-mariusz.tkaczyk@linux.intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

md: do not _put wrong device | expand

Commit Message

Mariusz Tkaczyk Sept. 11, 2023, 10:56 a.m. UTC

Put the device which has been _get with previous md_seq_next call.
Add guard for improper mddev_put usage(). It shouldn't be called if
there are less than 1 for mddev->active.

I didn't convert atomic_t to refcount_t because refcount warns when 0 is
achieved which is likely to happen for mddev->active.

It fixes [1], I'm unable to reproduce this issue now.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217798
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
---

There is md_seq cleanup proposed by Yu, this one should be merged
first, because it is low risk fix for particular regression.

This is not complete fix for the problem, we need to prevent new open
in the middle of removal, I will propose patchset separately.

 drivers/md/md.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Song Liu Sept. 11, 2023, 9:35 p.m. UTC | #1

On Mon, Sep 11, 2023 at 3:57 AM Mariusz Tkaczyk
<mariusz.tkaczyk@linux.intel.com> wrote:
>
> Put the device which has been _get with previous md_seq_next call.
> Add guard for improper mddev_put usage(). It shouldn't be called if
> there are less than 1 for mddev->active.
>
> I didn't convert atomic_t to refcount_t because refcount warns when 0 is
> achieved which is likely to happen for mddev->active.
>
> It fixes [1], I'm unable to reproduce this issue now.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217798
> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>

This patch somehow didn't make it through lore. Please repost and
CC folks so we can test it thoroughly. Also, please add more information
about the failure for future reference.

Thanks,
Song


> ---
>
> There is md_seq cleanup proposed by Yu, this one should be merged
> first, because it is low risk fix for particular regression.
>
> This is not complete fix for the problem, we need to prevent new open
> in the middle of removal, I will propose patchset separately.
>
>  drivers/md/md.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 0fe7ab6e8ab9..ffae02406175 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -618,6 +618,12 @@ static void mddev_delayed_delete(struct work_struct *ws);
>
>  void mddev_put(struct mddev *mddev)
>  {
> +       /* Guard for ambiguous put. */
> +       if (unlikely(atomic_read(&mddev->active) < 1)) {
> +               pr_warn("%s: active refcount is lower than 1\n", mdname(mddev));
> +               return;
> +       }
> +
>         if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
>                 return;
>         if (!mddev->raid_disks && list_empty(&mddev->disks) &&
> @@ -8250,8 +8256,7 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>                 next_mddev = list_entry(tmp, struct mddev, all_mddevs);
>                 if (mddev_get(next_mddev))
>                         break;
> -               mddev = next_mddev;
> -               tmp = mddev->all_mddevs.next;
> +               tmp = next_mddev->all_mddevs.next;
>         }
>         spin_unlock(&all_mddevs_lock);
>
> --
> 2.26.2
>

Yu Kuai Sept. 12, 2023, 2:04 a.m. UTC | #2

Hi,

在 2023/09/11 18:56, Mariusz Tkaczyk 写道:
> Put the device which has been _get with previous md_seq_next call.
> Add guard for improper mddev_put usage(). It shouldn't be called if
> there are less than 1 for mddev->active.
> 
> I didn't convert atomic_t to refcount_t because refcount warns when 0 is
> achieved which is likely to happen for mddev->active.
> 
> It fixes [1], I'm unable to reproduce this issue now.
> 

Please add a fix tag:

Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the 
disk is freed")

> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217798
> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
> ---
> 
> There is md_seq cleanup proposed by Yu, this one should be merged
> first, because it is low risk fix for particular regression.
> 
> This is not complete fix for the problem, we need to prevent new open
> in the middle of removal, I will propose patchset separately.
> 
>   drivers/md/md.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 0fe7ab6e8ab9..ffae02406175 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -618,6 +618,12 @@ static void mddev_delayed_delete(struct work_struct *ws);
>   
>   void mddev_put(struct mddev *mddev)
>   {
> +	/* Guard for ambiguous put. */
> +	if (unlikely(atomic_read(&mddev->active) < 1)) {
> +		pr_warn("%s: active refcount is lower than 1\n", mdname(mddev));
> +		return;
> +	}
> +
>   	if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
>   		return;
>   	if (!mddev->raid_disks && list_empty(&mddev->disks) &&
> @@ -8250,8 +8256,7 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>   		next_mddev = list_entry(tmp, struct mddev, all_mddevs);
>   		if (mddev_get(next_mddev))
>   			break;
> -		mddev = next_mddev;
> -		tmp = mddev->all_mddevs.next;
> +		tmp = next_mddev->all_mddevs.next;
>   	}
>   	spin_unlock(&all_mddevs_lock);
>   
> 

And looks like commit 12a6caf27324 wants to mddev_put(to_put), and with
your change, 'to_put' is not needed as well, can you also remove the
local variable 'to_put'?

Thanks,
Kuai

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0fe7ab6e8ab9..ffae02406175 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -618,6 +618,12 @@  static void mddev_delayed_delete(struct work_struct *ws);
 
 void mddev_put(struct mddev *mddev)
 {
+	/* Guard for ambiguous put. */
+	if (unlikely(atomic_read(&mddev->active) < 1)) {
+		pr_warn("%s: active refcount is lower than 1\n", mdname(mddev));
+		return;
+	}
+
 	if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
 		return;
 	if (!mddev->raid_disks && list_empty(&mddev->disks) &&
@@ -8250,8 +8256,7 @@  static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 		next_mddev = list_entry(tmp, struct mddev, all_mddevs);
 		if (mddev_get(next_mddev))
 			break;
-		mddev = next_mddev;
-		tmp = mddev->all_mddevs.next;
+		tmp = next_mddev->all_mddevs.next;
 	}
 	spin_unlock(&all_mddevs_lock);

md: do not _put wrong device

Commit Message

Comments

Patch