diff mbox series

[v5,03/14] md: make sure md_do_sync() will set MD_RECOVERY_DONE

Message ID 20240201092559.910982-4-yukuai1@huaweicloud.com (mailing list archive)
State New, archived
Headers show
Series dm-raid/md/raid: fix v6.7 regressions | expand

Commit Message

Yu Kuai Feb. 1, 2024, 9:25 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must
set MD_RECOVERY_DONE, so that follow up md_check_recovery() will
unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up
stop_sync_thread().

If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will
return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3d4
("md: fix stopping sync thread"), dm-raid switch from
md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread
from md_stop() and md_stop_writes(), causing the test
shell/lvconvert-raid-reshape.sh hang.

We shouldn't switch back to md_reap_sync_thread() because it's
problematic in the first place. Fix the problem by making sure
md_do_sync() will set MD_RECOVERY_DONE.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com/
Fixes: d5d885fd514f ("md: introduce new personality funciton start()")
Fixes: 5fd6c1dce06e ("[PATCH] md: allow checkpoint of recovery with version-1 superblock")
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Comments

Xiao Ni Feb. 18, 2024, 5:56 a.m. UTC | #1
On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must
> set MD_RECOVERY_DONE, so that follow up md_check_recovery() will
> unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up
> stop_sync_thread().
>
> If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will
> return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3d4
> ("md: fix stopping sync thread"), dm-raid switch from
> md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread
> from md_stop() and md_stop_writes(), causing the test
> shell/lvconvert-raid-reshape.sh hang.
>
> We shouldn't switch back to md_reap_sync_thread() because it's
> problematic in the first place. Fix the problem by making sure
> md_do_sync() will set MD_RECOVERY_DONE.
>
> Reported-by: Mikulas Patocka <mpatocka@redhat.com>
> Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com/
> Fixes: d5d885fd514f ("md: introduce new personality funciton start()")
> Fixes: 5fd6c1dce06e ("[PATCH] md: allow checkpoint of recovery with version-1 superblock")
> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/md.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 6906d023f1d6..c65dfd156090 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8788,12 +8788,16 @@ void md_do_sync(struct md_thread *thread)
>         int ret;
>
>         /* just incase thread restarts... */
> -       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
> -           test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
> +       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
>                 return;
> -       if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
> +
> +       if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
> +               goto skip;
> +
> +       if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
> +           !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
>                 set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -               return;
> +               goto skip;
>         }

Hi all

I have a question here. The codes above means if MD_RECOVERY_WAIT is
set, it sets MD_RECOVERY_INTR. If so, the sync thread can't happen.
But from the codes in md_start function:

                set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
                md_wakeup_thread(mddev->thread);
                ret = mddev->pers->start(mddev);
                clear_bit(MD_RECOVERY_WAIT, &mddev->recovery);
                md_wakeup_thread(mddev->sync_thread);

MD_RECOVERY_WAIT means "it'll run sync thread later not interrupt it".
I guess this patch can introduce a new bug for raid5 journal?

And to resolve this deadlock, we can use this patch:

--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3796,8 +3796,10 @@ static void raid_postsuspend(struct dm_target *ti)
        struct raid_set *rs = ti->private;

        if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
+               if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
+                       clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);

Regards
Xiao
>
>         if (mddev_is_clustered(mddev)) {
> --
> 2.39.2
>
Yu Kuai Feb. 18, 2024, 6:51 a.m. UTC | #2
Hi,

在 2024/02/18 13:56, Xiao Ni 写道:
> On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must
>> set MD_RECOVERY_DONE, so that follow up md_check_recovery() will
>> unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up
>> stop_sync_thread().
>>
>> If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will
>> return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3d4
>> ("md: fix stopping sync thread"), dm-raid switch from
>> md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread
>> from md_stop() and md_stop_writes(), causing the test
>> shell/lvconvert-raid-reshape.sh hang.
>>
>> We shouldn't switch back to md_reap_sync_thread() because it's
>> problematic in the first place. Fix the problem by making sure
>> md_do_sync() will set MD_RECOVERY_DONE.
>>
>> Reported-by: Mikulas Patocka <mpatocka@redhat.com>
>> Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com/
>> Fixes: d5d885fd514f ("md: introduce new personality funciton start()")
>> Fixes: 5fd6c1dce06e ("[PATCH] md: allow checkpoint of recovery with version-1 superblock")
>> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/md.c | 12 ++++++++----
>>   1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 6906d023f1d6..c65dfd156090 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -8788,12 +8788,16 @@ void md_do_sync(struct md_thread *thread)
>>          int ret;
>>
>>          /* just incase thread restarts... */
>> -       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
>> -           test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
>> +       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
>>                  return;
>> -       if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
>> +
>> +       if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
>> +               goto skip;
>> +
>> +       if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
>> +           !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
>>                  set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -               return;
>> +               goto skip;
>>          }
> 
> Hi all
> 
> I have a question here. The codes above means if MD_RECOVERY_WAIT is
> set, it sets MD_RECOVERY_INTR. If so, the sync thread can't happen.
> But from the codes in md_start function:
> 
>                  set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
>                  md_wakeup_thread(mddev->thread);
>                  ret = mddev->pers->start(mddev);
>                  clear_bit(MD_RECOVERY_WAIT, &mddev->recovery);
>                  md_wakeup_thread(mddev->sync_thread);
> 
> MD_RECOVERY_WAIT means "it'll run sync thread later not interrupt it".
> I guess this patch can introduce a new bug for raid5 journal?

I'm not sure what kind of problem you're talking about. After patch 4,
md_start_sync() should be the only place to register sync_thread, hence
md_start() should not see registered sync_thread. Perhaps
MD_RECOVERY_WAIT and md_wakeup_thread(mddev->sync_thread) can be removed
after patch 4?

> 
> And to resolve this deadlock, we can use this patch:
> 
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3796,8 +3796,10 @@ static void raid_postsuspend(struct dm_target *ti)
>          struct raid_set *rs = ti->private;
> 
>          if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
> +               if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
> +                       clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);

You must make sure md_do_sync() is called after this if sync_thread is 
already registered, and I don't understand yet how this is guranteed. :(

Thanks,
Kuai

> 
> Regards
> Xiao
>>
>>          if (mddev_is_clustered(mddev)) {
>> --
>> 2.39.2
>>
> 
> .
>
Xiao Ni Feb. 18, 2024, 8:41 a.m. UTC | #3
On Sun, Feb 18, 2024 at 2:51 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2024/02/18 13:56, Xiao Ni 写道:
> > On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> From: Yu Kuai <yukuai3@huawei.com>
> >>
> >> stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must
> >> set MD_RECOVERY_DONE, so that follow up md_check_recovery() will
> >> unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up
> >> stop_sync_thread().
> >>
> >> If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will
> >> return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3d4
> >> ("md: fix stopping sync thread"), dm-raid switch from
> >> md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread
> >> from md_stop() and md_stop_writes(), causing the test
> >> shell/lvconvert-raid-reshape.sh hang.
> >>
> >> We shouldn't switch back to md_reap_sync_thread() because it's
> >> problematic in the first place. Fix the problem by making sure
> >> md_do_sync() will set MD_RECOVERY_DONE.
> >>
> >> Reported-by: Mikulas Patocka <mpatocka@redhat.com>
> >> Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com/
> >> Fixes: d5d885fd514f ("md: introduce new personality funciton start()")
> >> Fixes: 5fd6c1dce06e ("[PATCH] md: allow checkpoint of recovery with version-1 superblock")
> >> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> >> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> >> ---
> >>   drivers/md/md.c | 12 ++++++++----
> >>   1 file changed, 8 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index 6906d023f1d6..c65dfd156090 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -8788,12 +8788,16 @@ void md_do_sync(struct md_thread *thread)
> >>          int ret;
> >>
> >>          /* just incase thread restarts... */
> >> -       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
> >> -           test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
> >> +       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
> >>                  return;
> >> -       if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
> >> +
> >> +       if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
> >> +               goto skip;
> >> +
> >> +       if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
> >> +           !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
> >>                  set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> >> -               return;
> >> +               goto skip;
> >>          }
> >
> > Hi all
> >
> > I have a question here. The codes above means if MD_RECOVERY_WAIT is
> > set, it sets MD_RECOVERY_INTR. If so, the sync thread can't happen.
> > But from the codes in md_start function:
> >
> >                  set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
> >                  md_wakeup_thread(mddev->thread);
> >                  ret = mddev->pers->start(mddev);
> >                  clear_bit(MD_RECOVERY_WAIT, &mddev->recovery);
> >                  md_wakeup_thread(mddev->sync_thread);
> >
> > MD_RECOVERY_WAIT means "it'll run sync thread later not interrupt it".
> > I guess this patch can introduce a new bug for raid5 journal?
>
> I'm not sure what kind of problem you're talking about. After patch 4,
> md_start_sync() should be the only place to register sync_thread, hence
> md_start() should not see registered sync_thread. Perhaps
> MD_RECOVERY_WAIT and md_wakeup_thread(mddev->sync_thread) can be removed
> after patch 4?

Hi Kuai

Before this patch, the process is:
1. set MD_RECOVERY_WAIT
2. start sync thread, sync thread can't run until MD_RECOVERY_WAIT is cleared
3. do something
4. clear MD_RECOVERY_WAIT
5. sync thread (md_do_sync) can run

After this patch, step2 returns directly because MD_RECOVERY_INTR is
set. By this patch, MD_RECOVERY_WAIT has the same meaning as
MD_RECOVERY_INTR.  So this patch breaks one logic.

MD_RECOVERY_WAIT is introduced by patch
d5d885fd514fcebc9da5503c88aa0112df7514ef (md: introduce new
personality funciton start()). Then dm raid uses it to delay sync
thread too.

Back to the deadlock which this patch tries to fix.
The original report is reshape is stuck and can be reproduced easily
by these commands:
modprobe brd rd_size=34816 rd_nr=5
vgcreate test_vg /dev/ram*
lvcreate --type raid5 -L 16M -n test_lv test_vg
lvconvert -y --stripes 4 /dev/test_vg/test_lv
vgremove test_vg -ff

The root cause is that dm raid stopped the sync thread directly
before. It works even MD_RECOVERY_WAIT is set. Now we stop sync thread
asynchronously. Because MD_RECOVERY_WAIT is set, when stopping dm
raid, it can't set MD_RECOVERY_DONE in md_do_sync. It's the reason
it's stuck at stop_sync_thread. So to fix this deadlock, dm raid
should clear MD_RECOVERY_WAIT before stopping the sync thread.

dm raid stop process:
1. dm_table_postsuspend_targets -> raid_postsuspend -> md_stop_writes.
2. dm_table_destroy -> raid_dtr

So we need to clear MD_RECOVERY_WAIT before calling md_stop_writes.

>
> >
> > And to resolve this deadlock, we can use this patch:
> >
> > --- a/drivers/md/dm-raid.c
> > +++ b/drivers/md/dm-raid.c
> > @@ -3796,8 +3796,10 @@ static void raid_postsuspend(struct dm_target *ti)
> >          struct raid_set *rs = ti->private;
> >
> >          if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
> > +               if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
> > +                       clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
>
> You must make sure md_do_sync() is called after this if sync_thread is
> already registered, and I don't understand yet how this is guranteed. :(

md_stop_writes -> __md_stop_writes -> stop_sync_thread guarantee this.

Best Regards
Xiao
>
> Thanks,
> Kuai
>
> >
> > Regards
> > Xiao
> >>
> >>          if (mddev_is_clustered(mddev)) {
> >> --
> >> 2.39.2
> >>
> >
> > .
> >
>
Yu Kuai Feb. 18, 2024, 8:59 a.m. UTC | #4
Hi,

在 2024/02/18 16:41, Xiao Ni 写道:
> On Sun, Feb 18, 2024 at 2:51 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> 在 2024/02/18 13:56, Xiao Ni 写道:
>>> On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>
>>>> stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must
>>>> set MD_RECOVERY_DONE, so that follow up md_check_recovery() will
>>>> unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up
>>>> stop_sync_thread().
>>>>
>>>> If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will
>>>> return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3d4
>>>> ("md: fix stopping sync thread"), dm-raid switch from
>>>> md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread
>>>> from md_stop() and md_stop_writes(), causing the test
>>>> shell/lvconvert-raid-reshape.sh hang.
>>>>
>>>> We shouldn't switch back to md_reap_sync_thread() because it's
>>>> problematic in the first place. Fix the problem by making sure
>>>> md_do_sync() will set MD_RECOVERY_DONE.
>>>>
>>>> Reported-by: Mikulas Patocka <mpatocka@redhat.com>
>>>> Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com/
>>>> Fixes: d5d885fd514f ("md: introduce new personality funciton start()")
>>>> Fixes: 5fd6c1dce06e ("[PATCH] md: allow checkpoint of recovery with version-1 superblock")
>>>> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>>> ---
>>>>    drivers/md/md.c | 12 ++++++++----
>>>>    1 file changed, 8 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>>> index 6906d023f1d6..c65dfd156090 100644
>>>> --- a/drivers/md/md.c
>>>> +++ b/drivers/md/md.c
>>>> @@ -8788,12 +8788,16 @@ void md_do_sync(struct md_thread *thread)
>>>>           int ret;
>>>>
>>>>           /* just incase thread restarts... */
>>>> -       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
>>>> -           test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
>>>> +       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
>>>>                   return;
>>>> -       if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
>>>> +
>>>> +       if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
>>>> +               goto skip;
>>>> +
>>>> +       if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
>>>> +           !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
>>>>                   set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>>> -               return;
>>>> +               goto skip;
>>>>           }
>>>
>>> Hi all
>>>
>>> I have a question here. The codes above means if MD_RECOVERY_WAIT is
>>> set, it sets MD_RECOVERY_INTR. If so, the sync thread can't happen.
>>> But from the codes in md_start function:
>>>
>>>                   set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
>>>                   md_wakeup_thread(mddev->thread);
>>>                   ret = mddev->pers->start(mddev);
>>>                   clear_bit(MD_RECOVERY_WAIT, &mddev->recovery);
>>>                   md_wakeup_thread(mddev->sync_thread);
>>>
>>> MD_RECOVERY_WAIT means "it'll run sync thread later not interrupt it".
>>> I guess this patch can introduce a new bug for raid5 journal?
>>
>> I'm not sure what kind of problem you're talking about. After patch 4,
>> md_start_sync() should be the only place to register sync_thread, hence
>> md_start() should not see registered sync_thread. Perhaps
>> MD_RECOVERY_WAIT and md_wakeup_thread(mddev->sync_thread) can be removed
>> after patch 4?
> 
> Hi Kuai
> 
> Before this patch, the process is:
> 1. set MD_RECOVERY_WAIT
> 2. start sync thread, sync thread can't run until MD_RECOVERY_WAIT is cleared

Do you take a look at patch 4 and patch 9? sync thread will not start
before step 4 now.
> 3. do something
> 4. clear MD_RECOVERY_WAIT
> 5. sync thread (md_do_sync) can run
> 
> After this patch, step2 returns directly because MD_RECOVERY_INTR is
> set. By this patch, MD_RECOVERY_WAIT has the same meaning as
> MD_RECOVERY_INTR.  So this patch breaks one logic.

And nothing is broke here.
> 
> MD_RECOVERY_WAIT is introduced by patch
> d5d885fd514fcebc9da5503c88aa0112df7514ef (md: introduce new
> personality funciton start()). Then dm raid uses it to delay sync
> thread too.
> 
> Back to the deadlock which this patch tries to fix.
> The original report is reshape is stuck and can be reproduced easily
> by these commands:
> modprobe brd rd_size=34816 rd_nr=5
> vgcreate test_vg /dev/ram*
> lvcreate --type raid5 -L 16M -n test_lv test_vg
> lvconvert -y --stripes 4 /dev/test_vg/test_lv
> vgremove test_vg -ff
> 

And can you still reporduce this problem after this patchset?

> The root cause is that dm raid stopped the sync thread directly
> before. It works even MD_RECOVERY_WAIT is set. Now we stop sync thread
> asynchronously. Because MD_RECOVERY_WAIT is set, when stopping dm
> raid, it can't set MD_RECOVERY_DONE in md_do_sync. It's the reason
> it's stuck at stop_sync_thread. So to fix this deadlock, dm raid
> should clear MD_RECOVERY_WAIT before stopping the sync thread.

Or are you saying that it's better to fix this problem this way? You
dind't explain that what's the problem to set MD_RECOVERY_DONE in
md_so_sync().
> 
> dm raid stop process:
> 1. dm_table_postsuspend_targets -> raid_postsuspend -> md_stop_writes.
> 2. dm_table_destroy -> raid_dtr
> 
> So we need to clear MD_RECOVERY_WAIT before calling md_stop_writes.
> 
>>
>>>
>>> And to resolve this deadlock, we can use this patch:
>>>
>>> --- a/drivers/md/dm-raid.c
>>> +++ b/drivers/md/dm-raid.c
>>> @@ -3796,8 +3796,10 @@ static void raid_postsuspend(struct dm_target *ti)
>>>           struct raid_set *rs = ti->private;
>>>
>>>           if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
>>> +               if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
>>> +                       clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
>>
>> You must make sure md_do_sync() is called after this if sync_thread is
>> already registered, and I don't understand yet how this is guranteed. :(
> 
> md_stop_writes -> __md_stop_writes -> stop_sync_thread guarantee this.
> 
> Best Regards
> Xiao
>>
>> Thanks,
>> Kuai
>>
>>>
>>> Regards
>>> Xiao
>>>>
>>>>           if (mddev_is_clustered(mddev)) {
>>>> --
>>>> 2.39.2
>>>>
>>>
>>> .
>>>
>>
> 
> .
>
diff mbox series

Patch

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 6906d023f1d6..c65dfd156090 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8788,12 +8788,16 @@  void md_do_sync(struct md_thread *thread)
 	int ret;
 
 	/* just incase thread restarts... */
-	if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
-	    test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
+	if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
 		return;
-	if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
+
+	if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+		goto skip;
+
+	if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
+	    !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
 		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-		return;
+		goto skip;
 	}
 
 	if (mddev_is_clustered(mddev)) {