Message ID | 20230911105657.6816-1-mariusz.tkaczyk@linux.intel.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | md: do not _put wrong device | expand |
On Mon, Sep 11, 2023 at 3:57 AM Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> wrote: > > Put the device which has been _get with previous md_seq_next call. > Add guard for improper mddev_put usage(). It shouldn't be called if > there are less than 1 for mddev->active. > > I didn't convert atomic_t to refcount_t because refcount warns when 0 is > achieved which is likely to happen for mddev->active. > > It fixes [1], I'm unable to reproduce this issue now. > > [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217798 > Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> This patch somehow didn't make it through lore. Please repost and CC folks so we can test it thoroughly. Also, please add more information about the failure for future reference. Thanks, Song > --- > > There is md_seq cleanup proposed by Yu, this one should be merged > first, because it is low risk fix for particular regression. > > This is not complete fix for the problem, we need to prevent new open > in the middle of removal, I will propose patchset separately. > > drivers/md/md.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 0fe7ab6e8ab9..ffae02406175 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -618,6 +618,12 @@ static void mddev_delayed_delete(struct work_struct *ws); > > void mddev_put(struct mddev *mddev) > { > + /* Guard for ambiguous put. */ > + if (unlikely(atomic_read(&mddev->active) < 1)) { > + pr_warn("%s: active refcount is lower than 1\n", mdname(mddev)); > + return; > + } > + > if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock)) > return; > if (!mddev->raid_disks && list_empty(&mddev->disks) && > @@ -8250,8 +8256,7 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) > next_mddev = list_entry(tmp, struct mddev, all_mddevs); > if (mddev_get(next_mddev)) > break; > - mddev = next_mddev; > - tmp = mddev->all_mddevs.next; > + tmp = next_mddev->all_mddevs.next; > } > spin_unlock(&all_mddevs_lock); > > -- > 2.26.2 >
Hi, 在 2023/09/11 18:56, Mariusz Tkaczyk 写道: > Put the device which has been _get with previous md_seq_next call. > Add guard for improper mddev_put usage(). It shouldn't be called if > there are less than 1 for mddev->active. > > I didn't convert atomic_t to refcount_t because refcount warns when 0 is > achieved which is likely to happen for mddev->active. > > It fixes [1], I'm unable to reproduce this issue now. > Please add a fix tag: Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the disk is freed") > [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217798 > Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> > --- > > There is md_seq cleanup proposed by Yu, this one should be merged > first, because it is low risk fix for particular regression. > > This is not complete fix for the problem, we need to prevent new open > in the middle of removal, I will propose patchset separately. > > drivers/md/md.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 0fe7ab6e8ab9..ffae02406175 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -618,6 +618,12 @@ static void mddev_delayed_delete(struct work_struct *ws); > > void mddev_put(struct mddev *mddev) > { > + /* Guard for ambiguous put. */ > + if (unlikely(atomic_read(&mddev->active) < 1)) { > + pr_warn("%s: active refcount is lower than 1\n", mdname(mddev)); > + return; > + } > + > if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock)) > return; > if (!mddev->raid_disks && list_empty(&mddev->disks) && > @@ -8250,8 +8256,7 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) > next_mddev = list_entry(tmp, struct mddev, all_mddevs); > if (mddev_get(next_mddev)) > break; > - mddev = next_mddev; > - tmp = mddev->all_mddevs.next; > + tmp = next_mddev->all_mddevs.next; > } > spin_unlock(&all_mddevs_lock); > > And looks like commit 12a6caf27324 wants to mddev_put(to_put), and with your change, 'to_put' is not needed as well, can you also remove the local variable 'to_put'? Thanks, Kuai
diff --git a/drivers/md/md.c b/drivers/md/md.c index 0fe7ab6e8ab9..ffae02406175 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -618,6 +618,12 @@ static void mddev_delayed_delete(struct work_struct *ws); void mddev_put(struct mddev *mddev) { + /* Guard for ambiguous put. */ + if (unlikely(atomic_read(&mddev->active) < 1)) { + pr_warn("%s: active refcount is lower than 1\n", mdname(mddev)); + return; + } + if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock)) return; if (!mddev->raid_disks && list_empty(&mddev->disks) && @@ -8250,8 +8256,7 @@ static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos) next_mddev = list_entry(tmp, struct mddev, all_mddevs); if (mddev_get(next_mddev)) break; - mddev = next_mddev; - tmp = mddev->all_mddevs.next; + tmp = next_mddev->all_mddevs.next; } spin_unlock(&all_mddevs_lock);
Put the device which has been _get with previous md_seq_next call. Add guard for improper mddev_put usage(). It shouldn't be called if there are less than 1 for mddev->active. I didn't convert atomic_t to refcount_t because refcount warns when 0 is achieved which is likely to happen for mddev->active. It fixes [1], I'm unable to reproduce this issue now. [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217798 Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> --- There is md_seq cleanup proposed by Yu, this one should be merged first, because it is low risk fix for particular regression. This is not complete fix for the problem, we need to prevent new open in the middle of removal, I will propose patchset separately. drivers/md/md.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)