diff mbox series

[RFC,2/4] md: Set MD_RECOVERY_FROZEN before stop sync thread

Message ID 20240220153059.11233-3-xni@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Mike Snitzer
Headers show
Series Fix regression bugs | expand

Commit Message

Xiao Ni Feb. 20, 2024, 3:30 p.m. UTC
After patch commit f52f5c71f3d4b ("md: fix stopping sync thread"), dmraid
stops sync thread asynchronously. The calling process is:
dev_remove->dm_destroy->__dm_destroy->raid_postsuspend->raid_dtr

raid_postsuspend does two jobs. First, it stops sync thread. Then it
suspend array. Now it can stop sync thread successfully. But it doesn't
set MD_RECOVERY_FROZEN. It's introduced by patch f52f5c71f3d4b. So after
raid_postsuspend, the sync thread starts again. raid_dtr can't stop the
sync thread because the array is already suspended.

This can be reproduced easily by those commands:
while [ 1 ]; do
vgcreate test_vg /dev/loop0 /dev/loop1
lvcreate --type raid1 -L 400M -m 1 -n test_lv test_vg
lvchange -an test_vg
vgremove test_vg -ff
done

Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/md.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Yu Kuai Feb. 23, 2024, 3:12 a.m. UTC | #1
Hi,

在 2024/02/20 23:30, Xiao Ni 写道:
> After patch commit f52f5c71f3d4b ("md: fix stopping sync thread"), dmraid
> stops sync thread asynchronously. The calling process is:
> dev_remove->dm_destroy->__dm_destroy->raid_postsuspend->raid_dtr
> 
> raid_postsuspend does two jobs. First, it stops sync thread. Then it
> suspend array. Now it can stop sync thread successfully. But it doesn't
> set MD_RECOVERY_FROZEN. It's introduced by patch f52f5c71f3d4b. So after
> raid_postsuspend, the sync thread starts again. raid_dtr can't stop the
> sync thread because the array is already suspended.
> 
> This can be reproduced easily by those commands:
> while [ 1 ]; do
> vgcreate test_vg /dev/loop0 /dev/loop1
> lvcreate --type raid1 -L 400M -m 1 -n test_lv test_vg
> lvchange -an test_vg
> vgremove test_vg -ff
> done
> 
> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> Signed-off-by: Xiao Ni <xni@redhat.com>

I agree with this change, but this patch is part of my patch in the
other thread:

dm-raid: really frozen sync_thread during suspend

I still think that fix found problems completely is better, however,
we'll let Song to make decision.

Thanks,
Kuai

> ---
>   drivers/md/md.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 54790261254d..77e3af7cf6bb 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6337,6 +6337,7 @@ static void __md_stop_writes(struct mddev *mddev)
>   void md_stop_writes(struct mddev *mddev)
>   {
>   	mddev_lock_nointr(mddev);
> +	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>   	__md_stop_writes(mddev);
>   	mddev_unlock(mddev);
>   }
>
Song Liu Feb. 23, 2024, 3:58 a.m. UTC | #2
On Thu, Feb 22, 2024 at 7:12 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2024/02/20 23:30, Xiao Ni 写道:
> > After patch commit f52f5c71f3d4b ("md: fix stopping sync thread"), dmraid
> > stops sync thread asynchronously. The calling process is:
> > dev_remove->dm_destroy->__dm_destroy->raid_postsuspend->raid_dtr
> >
> > raid_postsuspend does two jobs. First, it stops sync thread. Then it
> > suspend array. Now it can stop sync thread successfully. But it doesn't
> > set MD_RECOVERY_FROZEN. It's introduced by patch f52f5c71f3d4b. So after
> > raid_postsuspend, the sync thread starts again. raid_dtr can't stop the
> > sync thread because the array is already suspended.
> >
> > This can be reproduced easily by those commands:
> > while [ 1 ]; do
> > vgcreate test_vg /dev/loop0 /dev/loop1
> > lvcreate --type raid1 -L 400M -m 1 -n test_lv test_vg
> > lvchange -an test_vg
> > vgremove test_vg -ff
> > done
> >
> > Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> > Signed-off-by: Xiao Ni <xni@redhat.com>
>
> I agree with this change, but this patch is part of my patch in the
> other thread:
>
> dm-raid: really frozen sync_thread during suspend
>
> I still think that fix found problems completely is better, however,
> we'll let Song to make decision.

I think we still need more time (and maybe more iterations) for the
other thread, so we can ship this change sooner. We should add
SoB Kuai here.

Thanks,
Song
diff mbox series

Patch

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 54790261254d..77e3af7cf6bb 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6337,6 +6337,7 @@  static void __md_stop_writes(struct mddev *mddev)
 void md_stop_writes(struct mddev *mddev)
 {
 	mddev_lock_nointr(mddev);
+	set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	__md_stop_writes(mddev);
 	mddev_unlock(mddev);
 }