Message ID | 20240201092559.910982-4-yukuai1@huaweicloud.com (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
Series | dm-raid/md/raid: fix v6.7 regressions | expand |
On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > From: Yu Kuai <yukuai3@huawei.com> > > stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must > set MD_RECOVERY_DONE, so that follow up md_check_recovery() will > unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up > stop_sync_thread(). > > If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will > return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3d4 > ("md: fix stopping sync thread"), dm-raid switch from > md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread > from md_stop() and md_stop_writes(), causing the test > shell/lvconvert-raid-reshape.sh hang. > > We shouldn't switch back to md_reap_sync_thread() because it's > problematic in the first place. Fix the problem by making sure > md_do_sync() will set MD_RECOVERY_DONE. > > Reported-by: Mikulas Patocka <mpatocka@redhat.com> > Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com/ > Fixes: d5d885fd514f ("md: introduce new personality funciton start()") > Fixes: 5fd6c1dce06e ("[PATCH] md: allow checkpoint of recovery with version-1 superblock") > Fixes: f52f5c71f3d4 ("md: fix stopping sync thread") > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > --- > drivers/md/md.c | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 6906d023f1d6..c65dfd156090 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -8788,12 +8788,16 @@ void md_do_sync(struct md_thread *thread) > int ret; > > /* just incase thread restarts... */ > - if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) || > - test_bit(MD_RECOVERY_WAIT, &mddev->recovery)) > + if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) > return; > - if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */ > + > + if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) > + goto skip; > + > + if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) || > + !md_is_rdwr(mddev)) {/* never try to sync a read-only array */ > set_bit(MD_RECOVERY_INTR, &mddev->recovery); > - return; > + goto skip; > } Hi all I have a question here. The codes above means if MD_RECOVERY_WAIT is set, it sets MD_RECOVERY_INTR. If so, the sync thread can't happen. But from the codes in md_start function: set_bit(MD_RECOVERY_WAIT, &mddev->recovery); md_wakeup_thread(mddev->thread); ret = mddev->pers->start(mddev); clear_bit(MD_RECOVERY_WAIT, &mddev->recovery); md_wakeup_thread(mddev->sync_thread); MD_RECOVERY_WAIT means "it'll run sync thread later not interrupt it". I guess this patch can introduce a new bug for raid5 journal? And to resolve this deadlock, we can use this patch: --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -3796,8 +3796,10 @@ static void raid_postsuspend(struct dm_target *ti) struct raid_set *rs = ti->private; if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) { + if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery)) + clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery); Regards Xiao > > if (mddev_is_clustered(mddev)) { > -- > 2.39.2 >
Hi, 在 2024/02/18 13:56, Xiao Ni 写道: > On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >> >> From: Yu Kuai <yukuai3@huawei.com> >> >> stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must >> set MD_RECOVERY_DONE, so that follow up md_check_recovery() will >> unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up >> stop_sync_thread(). >> >> If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will >> return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3d4 >> ("md: fix stopping sync thread"), dm-raid switch from >> md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread >> from md_stop() and md_stop_writes(), causing the test >> shell/lvconvert-raid-reshape.sh hang. >> >> We shouldn't switch back to md_reap_sync_thread() because it's >> problematic in the first place. Fix the problem by making sure >> md_do_sync() will set MD_RECOVERY_DONE. >> >> Reported-by: Mikulas Patocka <mpatocka@redhat.com> >> Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com/ >> Fixes: d5d885fd514f ("md: introduce new personality funciton start()") >> Fixes: 5fd6c1dce06e ("[PATCH] md: allow checkpoint of recovery with version-1 superblock") >> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread") >> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >> --- >> drivers/md/md.c | 12 ++++++++---- >> 1 file changed, 8 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/md/md.c b/drivers/md/md.c >> index 6906d023f1d6..c65dfd156090 100644 >> --- a/drivers/md/md.c >> +++ b/drivers/md/md.c >> @@ -8788,12 +8788,16 @@ void md_do_sync(struct md_thread *thread) >> int ret; >> >> /* just incase thread restarts... */ >> - if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) || >> - test_bit(MD_RECOVERY_WAIT, &mddev->recovery)) >> + if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) >> return; >> - if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */ >> + >> + if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) >> + goto skip; >> + >> + if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) || >> + !md_is_rdwr(mddev)) {/* never try to sync a read-only array */ >> set_bit(MD_RECOVERY_INTR, &mddev->recovery); >> - return; >> + goto skip; >> } > > Hi all > > I have a question here. The codes above means if MD_RECOVERY_WAIT is > set, it sets MD_RECOVERY_INTR. If so, the sync thread can't happen. > But from the codes in md_start function: > > set_bit(MD_RECOVERY_WAIT, &mddev->recovery); > md_wakeup_thread(mddev->thread); > ret = mddev->pers->start(mddev); > clear_bit(MD_RECOVERY_WAIT, &mddev->recovery); > md_wakeup_thread(mddev->sync_thread); > > MD_RECOVERY_WAIT means "it'll run sync thread later not interrupt it". > I guess this patch can introduce a new bug for raid5 journal? I'm not sure what kind of problem you're talking about. After patch 4, md_start_sync() should be the only place to register sync_thread, hence md_start() should not see registered sync_thread. Perhaps MD_RECOVERY_WAIT and md_wakeup_thread(mddev->sync_thread) can be removed after patch 4? > > And to resolve this deadlock, we can use this patch: > > --- a/drivers/md/dm-raid.c > +++ b/drivers/md/dm-raid.c > @@ -3796,8 +3796,10 @@ static void raid_postsuspend(struct dm_target *ti) > struct raid_set *rs = ti->private; > > if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) { > + if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery)) > + clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery); You must make sure md_do_sync() is called after this if sync_thread is already registered, and I don't understand yet how this is guranteed. :( Thanks, Kuai > > Regards > Xiao >> >> if (mddev_is_clustered(mddev)) { >> -- >> 2.39.2 >> > > . >
On Sun, Feb 18, 2024 at 2:51 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > Hi, > > 在 2024/02/18 13:56, Xiao Ni 写道: > > On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > >> > >> From: Yu Kuai <yukuai3@huawei.com> > >> > >> stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must > >> set MD_RECOVERY_DONE, so that follow up md_check_recovery() will > >> unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up > >> stop_sync_thread(). > >> > >> If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will > >> return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3d4 > >> ("md: fix stopping sync thread"), dm-raid switch from > >> md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread > >> from md_stop() and md_stop_writes(), causing the test > >> shell/lvconvert-raid-reshape.sh hang. > >> > >> We shouldn't switch back to md_reap_sync_thread() because it's > >> problematic in the first place. Fix the problem by making sure > >> md_do_sync() will set MD_RECOVERY_DONE. > >> > >> Reported-by: Mikulas Patocka <mpatocka@redhat.com> > >> Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com/ > >> Fixes: d5d885fd514f ("md: introduce new personality funciton start()") > >> Fixes: 5fd6c1dce06e ("[PATCH] md: allow checkpoint of recovery with version-1 superblock") > >> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread") > >> Signed-off-by: Yu Kuai <yukuai3@huawei.com> > >> --- > >> drivers/md/md.c | 12 ++++++++---- > >> 1 file changed, 8 insertions(+), 4 deletions(-) > >> > >> diff --git a/drivers/md/md.c b/drivers/md/md.c > >> index 6906d023f1d6..c65dfd156090 100644 > >> --- a/drivers/md/md.c > >> +++ b/drivers/md/md.c > >> @@ -8788,12 +8788,16 @@ void md_do_sync(struct md_thread *thread) > >> int ret; > >> > >> /* just incase thread restarts... */ > >> - if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) || > >> - test_bit(MD_RECOVERY_WAIT, &mddev->recovery)) > >> + if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) > >> return; > >> - if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */ > >> + > >> + if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) > >> + goto skip; > >> + > >> + if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) || > >> + !md_is_rdwr(mddev)) {/* never try to sync a read-only array */ > >> set_bit(MD_RECOVERY_INTR, &mddev->recovery); > >> - return; > >> + goto skip; > >> } > > > > Hi all > > > > I have a question here. The codes above means if MD_RECOVERY_WAIT is > > set, it sets MD_RECOVERY_INTR. If so, the sync thread can't happen. > > But from the codes in md_start function: > > > > set_bit(MD_RECOVERY_WAIT, &mddev->recovery); > > md_wakeup_thread(mddev->thread); > > ret = mddev->pers->start(mddev); > > clear_bit(MD_RECOVERY_WAIT, &mddev->recovery); > > md_wakeup_thread(mddev->sync_thread); > > > > MD_RECOVERY_WAIT means "it'll run sync thread later not interrupt it". > > I guess this patch can introduce a new bug for raid5 journal? > > I'm not sure what kind of problem you're talking about. After patch 4, > md_start_sync() should be the only place to register sync_thread, hence > md_start() should not see registered sync_thread. Perhaps > MD_RECOVERY_WAIT and md_wakeup_thread(mddev->sync_thread) can be removed > after patch 4? Hi Kuai Before this patch, the process is: 1. set MD_RECOVERY_WAIT 2. start sync thread, sync thread can't run until MD_RECOVERY_WAIT is cleared 3. do something 4. clear MD_RECOVERY_WAIT 5. sync thread (md_do_sync) can run After this patch, step2 returns directly because MD_RECOVERY_INTR is set. By this patch, MD_RECOVERY_WAIT has the same meaning as MD_RECOVERY_INTR. So this patch breaks one logic. MD_RECOVERY_WAIT is introduced by patch d5d885fd514fcebc9da5503c88aa0112df7514ef (md: introduce new personality funciton start()). Then dm raid uses it to delay sync thread too. Back to the deadlock which this patch tries to fix. The original report is reshape is stuck and can be reproduced easily by these commands: modprobe brd rd_size=34816 rd_nr=5 vgcreate test_vg /dev/ram* lvcreate --type raid5 -L 16M -n test_lv test_vg lvconvert -y --stripes 4 /dev/test_vg/test_lv vgremove test_vg -ff The root cause is that dm raid stopped the sync thread directly before. It works even MD_RECOVERY_WAIT is set. Now we stop sync thread asynchronously. Because MD_RECOVERY_WAIT is set, when stopping dm raid, it can't set MD_RECOVERY_DONE in md_do_sync. It's the reason it's stuck at stop_sync_thread. So to fix this deadlock, dm raid should clear MD_RECOVERY_WAIT before stopping the sync thread. dm raid stop process: 1. dm_table_postsuspend_targets -> raid_postsuspend -> md_stop_writes. 2. dm_table_destroy -> raid_dtr So we need to clear MD_RECOVERY_WAIT before calling md_stop_writes. > > > > > And to resolve this deadlock, we can use this patch: > > > > --- a/drivers/md/dm-raid.c > > +++ b/drivers/md/dm-raid.c > > @@ -3796,8 +3796,10 @@ static void raid_postsuspend(struct dm_target *ti) > > struct raid_set *rs = ti->private; > > > > if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) { > > + if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery)) > > + clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery); > > You must make sure md_do_sync() is called after this if sync_thread is > already registered, and I don't understand yet how this is guranteed. :( md_stop_writes -> __md_stop_writes -> stop_sync_thread guarantee this. Best Regards Xiao > > Thanks, > Kuai > > > > > Regards > > Xiao > >> > >> if (mddev_is_clustered(mddev)) { > >> -- > >> 2.39.2 > >> > > > > . > > >
Hi, 在 2024/02/18 16:41, Xiao Ni 写道: > On Sun, Feb 18, 2024 at 2:51 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >> >> Hi, >> >> 在 2024/02/18 13:56, Xiao Ni 写道: >>> On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >>>> >>>> From: Yu Kuai <yukuai3@huawei.com> >>>> >>>> stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must >>>> set MD_RECOVERY_DONE, so that follow up md_check_recovery() will >>>> unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up >>>> stop_sync_thread(). >>>> >>>> If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will >>>> return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3d4 >>>> ("md: fix stopping sync thread"), dm-raid switch from >>>> md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread >>>> from md_stop() and md_stop_writes(), causing the test >>>> shell/lvconvert-raid-reshape.sh hang. >>>> >>>> We shouldn't switch back to md_reap_sync_thread() because it's >>>> problematic in the first place. Fix the problem by making sure >>>> md_do_sync() will set MD_RECOVERY_DONE. >>>> >>>> Reported-by: Mikulas Patocka <mpatocka@redhat.com> >>>> Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com/ >>>> Fixes: d5d885fd514f ("md: introduce new personality funciton start()") >>>> Fixes: 5fd6c1dce06e ("[PATCH] md: allow checkpoint of recovery with version-1 superblock") >>>> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread") >>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >>>> --- >>>> drivers/md/md.c | 12 ++++++++---- >>>> 1 file changed, 8 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/drivers/md/md.c b/drivers/md/md.c >>>> index 6906d023f1d6..c65dfd156090 100644 >>>> --- a/drivers/md/md.c >>>> +++ b/drivers/md/md.c >>>> @@ -8788,12 +8788,16 @@ void md_do_sync(struct md_thread *thread) >>>> int ret; >>>> >>>> /* just incase thread restarts... */ >>>> - if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) || >>>> - test_bit(MD_RECOVERY_WAIT, &mddev->recovery)) >>>> + if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) >>>> return; >>>> - if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */ >>>> + >>>> + if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) >>>> + goto skip; >>>> + >>>> + if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) || >>>> + !md_is_rdwr(mddev)) {/* never try to sync a read-only array */ >>>> set_bit(MD_RECOVERY_INTR, &mddev->recovery); >>>> - return; >>>> + goto skip; >>>> } >>> >>> Hi all >>> >>> I have a question here. The codes above means if MD_RECOVERY_WAIT is >>> set, it sets MD_RECOVERY_INTR. If so, the sync thread can't happen. >>> But from the codes in md_start function: >>> >>> set_bit(MD_RECOVERY_WAIT, &mddev->recovery); >>> md_wakeup_thread(mddev->thread); >>> ret = mddev->pers->start(mddev); >>> clear_bit(MD_RECOVERY_WAIT, &mddev->recovery); >>> md_wakeup_thread(mddev->sync_thread); >>> >>> MD_RECOVERY_WAIT means "it'll run sync thread later not interrupt it". >>> I guess this patch can introduce a new bug for raid5 journal? >> >> I'm not sure what kind of problem you're talking about. After patch 4, >> md_start_sync() should be the only place to register sync_thread, hence >> md_start() should not see registered sync_thread. Perhaps >> MD_RECOVERY_WAIT and md_wakeup_thread(mddev->sync_thread) can be removed >> after patch 4? > > Hi Kuai > > Before this patch, the process is: > 1. set MD_RECOVERY_WAIT > 2. start sync thread, sync thread can't run until MD_RECOVERY_WAIT is cleared Do you take a look at patch 4 and patch 9? sync thread will not start before step 4 now. > 3. do something > 4. clear MD_RECOVERY_WAIT > 5. sync thread (md_do_sync) can run > > After this patch, step2 returns directly because MD_RECOVERY_INTR is > set. By this patch, MD_RECOVERY_WAIT has the same meaning as > MD_RECOVERY_INTR. So this patch breaks one logic. And nothing is broke here. > > MD_RECOVERY_WAIT is introduced by patch > d5d885fd514fcebc9da5503c88aa0112df7514ef (md: introduce new > personality funciton start()). Then dm raid uses it to delay sync > thread too. > > Back to the deadlock which this patch tries to fix. > The original report is reshape is stuck and can be reproduced easily > by these commands: > modprobe brd rd_size=34816 rd_nr=5 > vgcreate test_vg /dev/ram* > lvcreate --type raid5 -L 16M -n test_lv test_vg > lvconvert -y --stripes 4 /dev/test_vg/test_lv > vgremove test_vg -ff > And can you still reporduce this problem after this patchset? > The root cause is that dm raid stopped the sync thread directly > before. It works even MD_RECOVERY_WAIT is set. Now we stop sync thread > asynchronously. Because MD_RECOVERY_WAIT is set, when stopping dm > raid, it can't set MD_RECOVERY_DONE in md_do_sync. It's the reason > it's stuck at stop_sync_thread. So to fix this deadlock, dm raid > should clear MD_RECOVERY_WAIT before stopping the sync thread. Or are you saying that it's better to fix this problem this way? You dind't explain that what's the problem to set MD_RECOVERY_DONE in md_so_sync(). > > dm raid stop process: > 1. dm_table_postsuspend_targets -> raid_postsuspend -> md_stop_writes. > 2. dm_table_destroy -> raid_dtr > > So we need to clear MD_RECOVERY_WAIT before calling md_stop_writes. > >> >>> >>> And to resolve this deadlock, we can use this patch: >>> >>> --- a/drivers/md/dm-raid.c >>> +++ b/drivers/md/dm-raid.c >>> @@ -3796,8 +3796,10 @@ static void raid_postsuspend(struct dm_target *ti) >>> struct raid_set *rs = ti->private; >>> >>> if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) { >>> + if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery)) >>> + clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery); >> >> You must make sure md_do_sync() is called after this if sync_thread is >> already registered, and I don't understand yet how this is guranteed. :( > > md_stop_writes -> __md_stop_writes -> stop_sync_thread guarantee this. > > Best Regards > Xiao >> >> Thanks, >> Kuai >> >>> >>> Regards >>> Xiao >>>> >>>> if (mddev_is_clustered(mddev)) { >>>> -- >>>> 2.39.2 >>>> >>> >>> . >>> >> > > . >
diff --git a/drivers/md/md.c b/drivers/md/md.c index 6906d023f1d6..c65dfd156090 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8788,12 +8788,16 @@ void md_do_sync(struct md_thread *thread) int ret; /* just incase thread restarts... */ - if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) || - test_bit(MD_RECOVERY_WAIT, &mddev->recovery)) + if (test_bit(MD_RECOVERY_DONE, &mddev->recovery)) return; - if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */ + + if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) + goto skip; + + if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) || + !md_is_rdwr(mddev)) {/* never try to sync a read-only array */ set_bit(MD_RECOVERY_INTR, &mddev->recovery); - return; + goto skip; } if (mddev_is_clustered(mddev)) {