
[1/1] md/raid5: Wait sync io to finish before changing group cnt

Message ID 20241106095124.74577-1-xni@redhat.com (mailing list archive)
State Accepted

Checks

Context                            Check    Description
mdraidci/vmtest-md-6_13-PR         success  PR summary
mdraidci/vmtest-md-6_13-VM_Test-0  success  Logs for per-patch-testing

Commit Message

Xiao Ni Nov. 6, 2024, 9:51 a.m. UTC
A customer reported that raid5 hangs when the group thread count is
changed while a resync is running. The stripes are all on
conf->handle_list and the new threads cannot handle them.

Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removed
pers->quiesce from mddev_suspend/resume. Before that commit,
mddev_suspend waited for all I/O, including sync I/O, to finish. Now it
only waits for normal I/O.

Call raid5_quiesce directly in raid5_store_group_thread_cnt so that all
sync requests finish before the group count is changed.

Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
Signed-off-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/raid5.c | 4 ++++
 1 file changed, 4 insertions(+)
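
The quiesce bracket described above can be illustrated with a rough
user-space analogy. This is only a sketch, not the kernel implementation:
struct gate and every gate_* function are hypothetical names invented for
this example, and it assumes that callers of gate_quiesce() are serialized
(in the patch, the store function already holds the mddev lock when it
calls raid5_quiesce). Each request, normal or sync, is bracketed by
gate_enter()/gate_exit(), and the worker count is only changed once
nothing is in flight.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct gate {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int active;      /* requests currently in flight */
	bool quiesced;   /* true while a reconfiguration is in progress */
	int nr_workers;  /* setting that may only change while idle */
};

void gate_init(struct gate *g, int nr_workers)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->cond, NULL);
	g->active = 0;
	g->quiesced = false;
	g->nr_workers = nr_workers;
}

/* A request enters; it blocks for as long as the gate is quiesced. */
void gate_enter(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	while (g->quiesced)
		pthread_cond_wait(&g->cond, &g->lock);
	g->active++;
	pthread_mutex_unlock(&g->lock);
}

/* A request finishes; wake a waiting quiescer once the gate is idle. */
void gate_exit(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	if (--g->active == 0)
		pthread_cond_broadcast(&g->cond);
	pthread_mutex_unlock(&g->lock);
}

/* quiesce=true: stop new requests and wait for in-flight ones to drain.
 * quiesce=false: reopen the gate and wake blocked requests. */
void gate_quiesce(struct gate *g, bool quiesce)
{
	pthread_mutex_lock(&g->lock);
	if (quiesce) {
		g->quiesced = true;
		while (g->active > 0)
			pthread_cond_wait(&g->cond, &g->lock);
	} else {
		g->quiesced = false;
		pthread_cond_broadcast(&g->cond);
	}
	pthread_mutex_unlock(&g->lock);
}

/* Change the worker count only inside the quiesce bracket. */
void gate_set_workers(struct gate *g, int nr_workers)
{
	gate_quiesce(g, true);
	pthread_mutex_lock(&g->lock);
	assert(g->active == 0); /* nothing can be in flight here */
	g->nr_workers = nr_workers;
	pthread_mutex_unlock(&g->lock);
	gate_quiesce(g, false);
}
```

In this model, dropping the two gate_quiesce() calls from
gate_set_workers() would let the setting change while requests are still
in flight, which is the kind of window the patch closes for sync I/O.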

Comments

Yu Kuai Nov. 7, 2024, 1:19 p.m. UTC | #1
On 2024/11/06 17:51, Xiao Ni wrote:
> A customer reported that raid5 hangs when the group thread count is
> changed while a resync is running. The stripes are all on
> conf->handle_list and the new threads cannot handle them.
> 
> Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removed
> pers->quiesce from mddev_suspend/resume. Before that commit,
> mddev_suspend waited for all I/O, including sync I/O, to finish. Now it
> only waits for normal I/O.
> 
> Call raid5_quiesce directly in raid5_store_group_thread_cnt so that all
> sync requests finish before the group count is changed.
> 
> Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
> Signed-off-by: Xiao Ni <xni@redhat.com>
> ---
>   drivers/md/raid5.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
LGTM
Reviewed-by: Yu Kuai <yukuai3@huawei.com>

> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index dc2ea636d173..2fa1f270fb1d 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7177,6 +7177,8 @@ raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
>   	err = mddev_suspend_and_lock(mddev);
>   	if (err)
>   		return err;
> +	raid5_quiesce(mddev, true);
> +
>   	conf = mddev->private;
>   	if (!conf)
>   		err = -ENODEV;
> @@ -7198,6 +7200,8 @@ raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
>   			kfree(old_groups);
>   		}
>   	}
> +
> +	raid5_quiesce(mddev, false);
>   	mddev_unlock_and_resume(mddev);
>   
>   	return err ?: len;
>
Song Liu Nov. 7, 2024, 11:37 p.m. UTC | #2
On Thu, Nov 7, 2024 at 5:20 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> LGTM
> Reviewed-by: Yu Kuai <yukuai3@huawei.com>

Applied to md-6.13. Thanks for the fix!

Song

Patch

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index dc2ea636d173..2fa1f270fb1d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7177,6 +7177,8 @@  raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
 	err = mddev_suspend_and_lock(mddev);
 	if (err)
 		return err;
+	raid5_quiesce(mddev, true);
+
 	conf = mddev->private;
 	if (!conf)
 		err = -ENODEV;
@@ -7198,6 +7200,8 @@  raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
 			kfree(old_groups);
 		}
 	}
+
+	raid5_quiesce(mddev, false);
 	mddev_unlock_and_resume(mddev);
 
 	return err ?: len;