[5/7] md: fix deadlock in shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh

Message ID ece2b06f-d647-6613-a534-ff4c9bec1142@redhat.com (mailing list archive)
State New, archived
Series MD fixes for the LVM2 testsuite

Commit Message

Mikulas Patocka Jan. 17, 2024, 6:21 p.m. UTC
This commit fixes a deadlock in the LVM2 test
shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh

When MD_RECOVERY_WAIT is set or when md_is_rdwr(mddev) is false, the
function md_do_sync would return without setting MD_RECOVERY_DONE. Thus,
stop_sync_thread would wait for the flag MD_RECOVERY_DONE indefinitely.

Also, md_wakeup_thread_directly does nothing if the thread is waiting in
md_thread on thread->wqueue (it wakes the thread up, the thread would
check THREAD_WAKEUP and go to sleep again without doing anything). So,
this commit introduces a call to md_wakeup_thread from
md_wakeup_thread_directly.

task:lvm             state:D stack:0     pid:46322 tgid:46322 ppid:46079  flags:0x00004002
Call Trace:
 <TASK>
 __schedule+0x228/0x570
 schedule+0x29/0xa0
 schedule_timeout+0x6a/0xd0
 ? timer_shutdown_sync+0x10/0x10
 stop_sync_thread+0x197/0x1c0 [md_mod]
 ? housekeeping_test_cpu+0x30/0x30
 ? table_deps+0x1b0/0x1b0 [dm_mod]
 __md_stop_writes+0x10/0xd0 [md_mod]
 md_stop_writes+0x18/0x30 [md_mod]
 raid_postsuspend+0x32/0x40 [dm_raid]
 dm_table_postsuspend_targets+0x34/0x50 [dm_mod]
 dm_suspend+0xc4/0xd0 [dm_mod]
 dev_suspend+0x186/0x2d0 [dm_mod]
 ? table_deps+0x1b0/0x1b0 [dm_mod]
 ctl_ioctl+0x2e1/0x570 [dm_mod]
 dm_ctl_ioctl+0x5/0x10 [dm_mod]
 __x64_sys_ioctl+0x85/0xa0
 do_syscall_64+0x5d/0x1a0
 entry_SYSCALL_64_after_hwframe+0x46/0x4e

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Cc: stable@vger.kernel.org	# v6.7

---
 drivers/md/md.c    |    8 +++++++-
 drivers/md/raid5.c |    4 ++++
 2 files changed, 11 insertions(+), 1 deletion(-)
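
The wakeup problem described in the commit message (a direct wake_up_process
on a thread sleeping on thread->wqueue has no effect unless THREAD_WAKEUP is
set) can be illustrated with a small userspace model. This is only a sketch:
it is not kernel code, every model_* name is hypothetical, and the wait loop
is reduced to a single flag check taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct md_thread; not the kernel structure. */
struct model_thread {
	bool wakeup_flag;	/* stands in for THREAD_WAKEUP */
	int runs;		/* how often the thread's handler ran */
};

/*
 * One pass of the wait loop after the task has been woken: the handler
 * runs only if the wakeup flag is set, otherwise the thread sleeps again.
 */
static bool model_md_thread_pass(struct model_thread *t)
{
	assert(t != NULL);
	if (!t->wakeup_flag)
		return false;	/* condition not met: back to sleep */
	t->wakeup_flag = false;
	t->runs++;
	return true;
}

/* Models a bare wake_up_process(): wakes the task, flag untouched. */
static void model_wakeup_directly(struct model_thread *t)
{
	model_md_thread_pass(t);
}

/* Models md_wakeup_thread(): set the flag first, then wake the task. */
static void model_md_wakeup_thread(struct model_thread *t)
{
	assert(t != NULL);
	t->wakeup_flag = true;
	model_md_thread_pass(t);
}
```

In this model, model_wakeup_directly() on an idle thread leaves runs at 0,
while model_md_wakeup_thread() makes the handler run once. Whether
md_wakeup_thread_directly should behave that way is a separate question,
and it is the one disputed in the review below.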

Comments

Song Liu Jan. 18, 2024, 1:12 a.m. UTC | #1
On Wed, Jan 17, 2024 at 10:21 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> This commit fixes a deadlock in the LVM2 test
> shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh

I can reproduce this issue on the 6.7 kernel. However, 4/7 and 5/7
(without 1/7-3/7) cannot fix it. I will run more tests.

Thanks,
Song

Yu Kuai Jan. 18, 2024, 1:51 a.m. UTC | #2
Hi,

On 2024/01/18 2:21, Mikulas Patocka wrote:
> This commit fixes a deadlock in the LVM2 test
> shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh
> 
> When MD_RECOVERY_WAIT is set or when md_is_rdwr(mddev) is false, the
> function md_do_sync would return without setting MD_RECOVERY_DONE. Thus,
> stop_sync_thread would wait for the flag MD_RECOVERY_DONE indefinitely.
> 
> Also, md_wakeup_thread_directly does nothing if the thread is waiting in
> md_thread on thread->wqueue (it wakes the thread up, the thread would
> check THREAD_WAKEUP and go to sleep again without doing anything). So,
> this commit introduces a call to md_wakeup_thread from
> md_wakeup_thread_directly.
> 
> task:lvm             state:D stack:0     pid:46322 tgid:46322 ppid:46079  flags:0x00004002
> Call Trace:
>   <TASK>
>   __schedule+0x228/0x570
>   schedule+0x29/0xa0
>   schedule_timeout+0x6a/0xd0
>   ? timer_shutdown_sync+0x10/0x10
>   stop_sync_thread+0x197/0x1c0 [md_mod]
>   ? housekeeping_test_cpu+0x30/0x30
>   ? table_deps+0x1b0/0x1b0 [dm_mod]
>   __md_stop_writes+0x10/0xd0 [md_mod]
>   md_stop_writes+0x18/0x30 [md_mod]
>   raid_postsuspend+0x32/0x40 [dm_raid]
>   dm_table_postsuspend_targets+0x34/0x50 [dm_mod]
>   dm_suspend+0xc4/0xd0 [dm_mod]
>   dev_suspend+0x186/0x2d0 [dm_mod]
>   ? table_deps+0x1b0/0x1b0 [dm_mod]
>   ctl_ioctl+0x2e1/0x570 [dm_mod]
>   dm_ctl_ioctl+0x5/0x10 [dm_mod]
>   __x64_sys_ioctl+0x85/0xa0
>   do_syscall_64+0x5d/0x1a0
>   entry_SYSCALL_64_after_hwframe+0x46/0x4e
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> Cc: stable@vger.kernel.org	# v6.7
> 
> ---
>   drivers/md/md.c    |    8 +++++++-
>   drivers/md/raid5.c |    4 ++++
>   2 files changed, 11 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/drivers/md/md.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/md.c
> +++ linux-2.6/drivers/md/md.c
> @@ -8029,6 +8029,8 @@ static void md_wakeup_thread_directly(st
>   	if (t)
>   		wake_up_process(t->tsk);
>   	rcu_read_unlock();
> +
> +	md_wakeup_thread(thread);

This is not correct. I have already explained (and it is also in the
comments) what md_wakeup_thread_directly() is supposed to do.
>   }
>   
>   void md_wakeup_thread(struct md_thread __rcu *thread)
> @@ -8777,10 +8779,14 @@ void md_do_sync(struct md_thread *thread
>   
>   	/* just incase thread restarts... */
>   	if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
> -	    test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
> +	    test_bit(MD_RECOVERY_WAIT, &mddev->recovery)) {
> +		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
> +			set_bit(MD_RECOVERY_DONE, &mddev->recovery);

If you set MD_RECOVERY_DONE here, sync_thread will be unregistered. I
don't think this is the expected behaviour. Only dm-raid is using this
flag, and rs_start_reshape() already explains that it wants
sync_thread to work later until the table gets reloaded.

>   		return;
> +	}
>   	if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
>   		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> +		set_bit(MD_RECOVERY_DONE, &mddev->recovery);

This change looks reasonable.

Thanks,
Kuai

>   		return;
>   	}
>   
> 

Patch

Index: linux-2.6/drivers/md/md.c
===================================================================
--- linux-2.6.orig/drivers/md/md.c
+++ linux-2.6/drivers/md/md.c
@@ -8029,6 +8029,8 @@  static void md_wakeup_thread_directly(st
 	if (t)
 		wake_up_process(t->tsk);
 	rcu_read_unlock();
+
+	md_wakeup_thread(thread);
 }
 
 void md_wakeup_thread(struct md_thread __rcu *thread)
@@ -8777,10 +8779,14 @@  void md_do_sync(struct md_thread *thread
 
 	/* just incase thread restarts... */
 	if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
-	    test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
+	    test_bit(MD_RECOVERY_WAIT, &mddev->recovery)) {
+		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+			set_bit(MD_RECOVERY_DONE, &mddev->recovery);
 		return;
+	}
 	if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
 		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
+		set_bit(MD_RECOVERY_DONE, &mddev->recovery);
 		return;
 	}
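
The effect of the two md_do_sync hunks can be sketched as a userspace model
of the early-return paths. This is only an illustration of the patch as
posted: the MODEL_* flags, struct model_mddev and both functions are
hypothetical stand-ins, and the waiter's condition is taken from the commit
message's description of stop_sync_thread, not from the kernel source. The
sketch takes no position on the review objection to the MD_RECOVERY_WAIT
hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the MD_RECOVERY_* bits; values are arbitrary. */
enum {
	MODEL_RECOVERY_INTR = 1u << 0,
	MODEL_RECOVERY_DONE = 1u << 1,
	MODEL_RECOVERY_WAIT = 1u << 2,
};

struct model_mddev {
	unsigned int recovery;	/* stands in for mddev->recovery */
	bool rdwr;		/* stands in for md_is_rdwr(mddev) */
};

/*
 * Models only the early-return checks at the top of md_do_sync().
 * `patched` selects whether the lines added by this patch apply.
 * Returns true if the function would return early.
 */
static bool model_md_do_sync_early(struct model_mddev *m, bool patched)
{
	assert(m != NULL);
	if (m->recovery & (MODEL_RECOVERY_DONE | MODEL_RECOVERY_WAIT)) {
		if (patched && (m->recovery & MODEL_RECOVERY_INTR))
			m->recovery |= MODEL_RECOVERY_DONE;
		return true;
	}
	if (!m->rdwr) {		/* never sync a read-only array */
		m->recovery |= MODEL_RECOVERY_INTR;
		if (patched)
			m->recovery |= MODEL_RECOVERY_DONE;
		return true;
	}
	return false;		/* the real function would start syncing */
}

/* Per the commit message, the waiter proceeds once DONE is set. */
static bool model_waiter_can_proceed(const struct model_mddev *m)
{
	assert(m != NULL);
	return (m->recovery & MODEL_RECOVERY_DONE) != 0;
}
```

With patched set to false, an array that has MODEL_RECOVERY_WAIT and
MODEL_RECOVERY_INTR set, or one that is not read-write, returns early
without MODEL_RECOVERY_DONE, so model_waiter_can_proceed() stays false.
With patched set to true, both cases set the flag and the waiter can
continue.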