diff mbox series

md: do not _put wrong device

Message ID 20230911105657.6816-1-mariusz.tkaczyk@linux.intel.com (mailing list archive)
State Superseded, archived
Headers show
Series md: do not _put wrong device | expand

Commit Message

Mariusz Tkaczyk Sept. 11, 2023, 10:56 a.m. UTC
Put the device which has been _get with previous md_seq_next call.
Add guard for improper mddev_put usage(). It shouldn't be called if
there are less than 1 for mddev->active.

I didn't convert atomic_t to refcount_t because refcount warns when 0 is
achieved which is likely to happen for mddev->active.

It fixes [1], I'm unable to reproduce this issue now.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217798
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
---

There is md_seq cleanup proposed by Yu, this one should be merged
first, because it is low risk fix for particular regression.

This is not complete fix for the problem, we need to prevent new open
in the middle of removal, I will propose patchset separately.

 drivers/md/md.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Song Liu Sept. 11, 2023, 9:35 p.m. UTC | #1
On Mon, Sep 11, 2023 at 3:57 AM Mariusz Tkaczyk
<mariusz.tkaczyk@linux.intel.com> wrote:
>
> Put the device which has been _get with previous md_seq_next call.
> Add guard for improper mddev_put usage(). It shouldn't be called if
> there are less than 1 for mddev->active.
>
> I didn't convert atomic_t to refcount_t because refcount warns when 0 is
> achieved which is likely to happen for mddev->active.
>
> It fixes [1], I'm unable to reproduce this issue now.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217798
> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>

This patch somehow didn't make it through lore. Please repost and
CC folks so we can test it thoroughly. Also, please add more information
about the failure for future reference.

Thanks,
Song


> ---
>
> There is md_seq cleanup proposed by Yu, this one should be merged
> first, because it is low risk fix for particular regression.
>
> This is not complete fix for the problem, we need to prevent new open
> in the middle of removal, I will propose patchset separately.
>
>  drivers/md/md.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 0fe7ab6e8ab9..ffae02406175 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -618,6 +618,12 @@ static void mddev_delayed_delete(struct work_struct *ws);
>
>  void mddev_put(struct mddev *mddev)
>  {
> +       /* Guard for ambiguous put. */
> +       if (unlikely(atomic_read(&mddev->active) < 1)) {
> +               pr_warn("%s: active refcount is lower than 1\n", mdname(mddev));
> +               return;
> +       }
> +
>         if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
>                 return;
>         if (!mddev->raid_disks && list_empty(&mddev->disks) &&
> @@ -8250,8 +8256,7 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>                 next_mddev = list_entry(tmp, struct mddev, all_mddevs);
>                 if (mddev_get(next_mddev))
>                         break;
> -               mddev = next_mddev;
> -               tmp = mddev->all_mddevs.next;
> +               tmp = next_mddev->all_mddevs.next;
>         }
>         spin_unlock(&all_mddevs_lock);
>
> --
> 2.26.2
>
Yu Kuai Sept. 12, 2023, 2:04 a.m. UTC | #2
Hi,

在 2023/09/11 18:56, Mariusz Tkaczyk 写道:
> Put the device which has been _get with previous md_seq_next call.
> Add guard for improper mddev_put usage(). It shouldn't be called if
> there are less than 1 for mddev->active.
> 
> I didn't convert atomic_t to refcount_t because refcount warns when 0 is
> achieved which is likely to happen for mddev->active.
> 
> It fixes [1], I'm unable to reproduce this issue now.
> 

Please add a fix tag:

Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the 
disk is freed")

> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217798
> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
> ---
> 
> There is md_seq cleanup proposed by Yu, this one should be merged
> first, because it is low risk fix for particular regression.
> 
> This is not complete fix for the problem, we need to prevent new open
> in the middle of removal, I will propose patchset separately.
> 
>   drivers/md/md.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 0fe7ab6e8ab9..ffae02406175 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -618,6 +618,12 @@ static void mddev_delayed_delete(struct work_struct *ws);
>   
>   void mddev_put(struct mddev *mddev)
>   {
> +	/* Guard for ambiguous put. */
> +	if (unlikely(atomic_read(&mddev->active) < 1)) {
> +		pr_warn("%s: active refcount is lower than 1\n", mdname(mddev));
> +		return;
> +	}
> +
>   	if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
>   		return;
>   	if (!mddev->raid_disks && list_empty(&mddev->disks) &&
> @@ -8250,8 +8256,7 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
>   		next_mddev = list_entry(tmp, struct mddev, all_mddevs);
>   		if (mddev_get(next_mddev))
>   			break;
> -		mddev = next_mddev;
> -		tmp = mddev->all_mddevs.next;
> +		tmp = next_mddev->all_mddevs.next;
>   	}
>   	spin_unlock(&all_mddevs_lock);
>   
> 

And looks like commit 12a6caf27324 wants to mddev_put(to_put), and with
your change, 'to_put' is not needed as well, can you also remove the
local variable 'to_put'?

Thanks,
Kuai
diff mbox series

Patch

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0fe7ab6e8ab9..ffae02406175 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -618,6 +618,12 @@  static void mddev_delayed_delete(struct work_struct *ws);
 
 void mddev_put(struct mddev *mddev)
 {
+	/* Guard for ambiguous put. */
+	if (unlikely(atomic_read(&mddev->active) < 1)) {
+		pr_warn("%s: active refcount is lower than 1\n", mdname(mddev));
+		return;
+	}
+
 	if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
 		return;
 	if (!mddev->raid_disks && list_empty(&mddev->disks) &&
@@ -8250,8 +8256,7 @@  static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 		next_mddev = list_entry(tmp, struct mddev, all_mddevs);
 		if (mddev_get(next_mddev))
 			break;
-		mddev = next_mddev;
-		tmp = mddev->all_mddevs.next;
+		tmp = next_mddev->all_mddevs.next;
 	}
 	spin_unlock(&all_mddevs_lock);