[2/6] md: Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE"

Message ID: 20240229154941.99557-3-xni@redhat.com
State: Superseded, archived
Series: Fix dmraid regression bugs

Commit Message

Xiao Ni Feb. 29, 2024, 3:49 p.m. UTC
This reverts commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6.

The root cause is that MD_RECOVERY_WAIT isn't cleared when stopping raid.
The following patch 'Clear MD_RECOVERY_WAIT when stopping dmraid' fixes
this problem.

Signed-off-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/md.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)
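
For reference, here is a minimal sketch of the direction the follow-up fix
could take. This is an assumption for illustration only, not the actual patch
in this series: it guesses that the flag is cleared in md_stop(), the teardown
helper dm-raid calls, whereas the real patch may clear it elsewhere.

	/* Hedged sketch, not the actual follow-up patch. */
	void md_stop(struct mddev *mddev)
	{
		/* stop the array as md_stop() already does today */
		__md_stop_writes(mddev);
		__md_stop(mddev);

		/*
		 * Assumed addition: drop the wait flag left behind by
		 * dm-raid so a later md_do_sync() run is not skipped
		 * forever.
		 */
		clear_bit(MD_RECOVERY_WAIT, &mddev->recovery);
	}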

Comments

Song Liu Feb. 29, 2024, 10:53 p.m. UTC | #1
On Thu, Feb 29, 2024 at 7:50 AM Xiao Ni <xni@redhat.com> wrote:
>
> This reverts commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6.
>
> The root cause is that MD_RECOVERY_WAIT isn't cleared when stopping raid.
> The following patch 'Clear MD_RECOVERY_WAIT when stopping dmraid' fixes
> this problem.
>
> Signed-off-by: Xiao Ni <xni@redhat.com>

I think we still need 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6 or some
variation of it. Otherwise, we may hit the following deadlock. The test vm here
has 2 raid arrays: one raid5 with journal, and a raid1.

I pushed other patches in the set to the md-6.9-for-hch branch for
further tests.

Thanks,
Song


[  250.347646] INFO: task systemd-udevd:546 blocked for more than 122 seconds.
[  250.348443]       Not tainted 6.8.0-rc3+ #479
[  250.348912] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  250.349741] task:systemd-udevd   state:D stack:27136 pid:546
tgid:546   ppid:525    flags:0x00000000
[  250.350740] Call Trace:
[  250.351043]  <TASK>
[  250.351310]  __schedule+0x862/0x19b0
[  250.351770]  ? __pfx___schedule+0x10/0x10
[  250.352222]  ? lock_release+0x250/0x690
[  250.352657]  ? __pfx_lock_release+0x10/0x10
[  250.353128]  ? mark_held_locks+0x62/0x90
[  250.353604]  schedule+0x77/0x200
[  250.353976]  md_handle_request+0x1fe/0x650
[  250.354459]  ? __pfx_md_handle_request+0x10/0x10
[  250.354957]  ? bio_split_to_limits+0x131/0x150
[  250.355456]  ? __pfx_autoremove_wake_function+0x10/0x10
[  250.356031]  ? lock_is_held_type+0xda/0x130
[  250.356515]  __submit_bio+0x99/0xe0
[  250.356910]  submit_bio_noacct_nocheck+0x25a/0x570
[  250.357510]  ? __pfx_submit_bio_noacct_nocheck+0x10/0x10
[  250.358080]  ? __might_resched+0x274/0x350
[  250.358546]  ? submit_bio_noacct+0x1b7/0x6c0
[  250.359067]  mpage_readahead+0x25b/0x300
[  250.359507]  ? __pfx_mpage_readahead+0x10/0x10
[  250.359986]  ? __pfx___lock_acquire+0x10/0x10
[  250.360524]  ? __pfx_blkdev_get_block+0x10/0x10
[  250.361046]  ? __pfx_lock_release+0x10/0x10
[  250.361602]  ? __pfx___filemap_add_folio+0x10/0x10
[  250.362250]  ? lock_is_held_type+0xda/0x130
[  250.362785]  read_pages+0xfd/0x650
[  250.363173]  ? __pfx_read_pages+0x10/0x10
[  250.363685]  page_cache_ra_unbounded+0x1df/0x2d0
[  250.364228]  force_page_cache_ra+0x11e/0x150
[  250.364716]  filemap_get_pages+0x6f1/0xbb0
[  250.365218]  ? __pfx_filemap_get_pages+0x10/0x10
[  250.365735]  ? lock_is_held_type+0xda/0x130
[  250.366266]  filemap_read+0x216/0x6a0
[  250.366679]  ? __pfx_mark_lock+0x10/0x10
[  250.367132]  ? __pfx_ptep_set_access_flags+0x10/0x10
[  250.367765]  ? __pfx_filemap_read+0x10/0x10
[  250.368234]  ? __lock_acquire+0x959/0x3540
[  250.368756]  blkdev_read_iter+0xc0/0x230
[  250.369200]  vfs_read+0x38c/0x540
[  250.369581]  ? __pfx_vfs_read+0x10/0x10
[  250.370038]  ? __fget_light+0x96/0xd0
[  250.370469]  ksys_read+0xcb/0x170
[  250.370839]  ? __pfx_ksys_read+0x10/0x10
[  250.371320]  do_syscall_64+0x7a/0x1a0
[  250.371735]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  250.372367] RIP: 0033:0x7fcb590118b2
[  250.372865] RSP: 002b:00007ffcdd5f9c18 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[  250.373840] RAX: ffffffffffffffda RBX: 0000555885985010 RCX: 00007fcb590118b2
[  250.374641] RDX: 0000000000000040 RSI: 0000555885985038 RDI: 0000000000000011
[  250.375437] RBP: 000055588599fd40 R08: 0000555885985010 R09: 000055588596c010
[  250.376222] R10: 00007fcb58fbfbc0 R11: 0000000000000246 R12: 00000000804f0000
[  250.376974] R13: 0000000000000040 R14: 000055588599fd90 R15: 0000555885985028
[  250.377811]  </TASK>
[  250.378073] INFO: task mdadm:562 blocked for more than 122 seconds.
[  250.378753]       Not tainted 6.8.0-rc3+ #479
[  250.379237] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  250.380055] task:mdadm           state:D stack:25872 pid:562
tgid:562   ppid:543    flags:0x00004000
[  250.381071] Call Trace:
[  250.381369]  <TASK>
[  250.381625]  __schedule+0x862/0x19b0
[  250.382054]  ? __pfx___schedule+0x10/0x10
[  250.382502]  ? lock_release+0x250/0x690
[  250.382943]  ? __pfx_lock_release+0x10/0x10
[  250.383407]  ? mark_held_locks+0x24/0x90
[  250.383851]  ? lockdep_hardirqs_on+0x7d/0x100
[  250.384345]  ? preempt_count_sub+0x18/0xd0
[  250.384806]  ? _raw_spin_unlock_irqrestore+0x3f/0x60
[  250.385358]  schedule+0x77/0x200
[  250.385718]  md_ioctl+0x1750/0x1d60
[  250.386114]  ? __pfx_md_ioctl+0x10/0x10
[  250.386535]  ? _raw_spin_unlock_irqrestore+0x34/0x60
[  250.387063]  ? lockdep_hardirqs_on+0x7d/0x100
[  250.387567]  ? preempt_count_sub+0x18/0xd0
[  250.388024]  ? populate_seccomp_data+0x184/0x220
[  250.388522]  ? __pfx_autoremove_wake_function+0x10/0x10
[  250.389083]  ? __seccomp_filter+0x102/0x760
[  250.389553]  blkdev_ioctl+0x1f1/0x3c0
[  250.389956]  ? __pfx_blkdev_ioctl+0x10/0x10
[  250.390441]  __x64_sys_ioctl+0xc6/0x100
[  250.390880]  do_syscall_64+0x7a/0x1a0
[  250.391313]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  250.391877] RIP: 0033:0x7fd88eef362b
[  250.392290] RSP: 002b:00007fff8c298438 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  250.393098] RAX: ffffffffffffffda RBX: 000055e1b77a2300 RCX: 00007fd88eef362b
[  250.393896] RDX: 00007fff8c2985a8 RSI: 0000000040140921 RDI: 0000000000000004
[  250.394664] RBP: 0000000000000005 R08: 000000000000001e R09: 00007fff8c298197
[  250.395457] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  250.396223] R13: 000055e1b77a4c70 R14: 00007fff8c2984f8 R15: 000055e1b77a46d0
[  250.397050]  </TASK>
[  250.397357]
[  250.397357] Showing all locks held in the system:
[  250.398092] 1 lock held by khungtaskd/211:
[  250.398535]  #0: ffffffff87f6fea0 (rcu_read_lock){....}-{1:2}, at:
debug_show_all_locks+0x4d/0x230
[  250.399613] 1 lock held by systemd-journal/499:
[  250.400124] 1 lock held by systemd-udevd/546:
[  250.400616]  #0: ffff88801461d178
(mapping.invalidate_lock){.+.+}-{3:3}, at:
page_cache_ra_unbounded+0xa4/0x2d0
[  250.401701]
[  250.401882] =============================================
[  250.401882]
[  250.402618] Kernel panic - not syncing: hung_task: blocked tasks
[  250.403294] CPU: 2 PID: 211 Comm: khungtaskd Not tainted 6.8.0-rc3+ #479
[  250.404046] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  250.405264] Call Trace:
[  250.405537]  <TASK>
[  250.405776]  dump_stack_lvl+0x4a/0x80
[  250.406185]  panic+0x41c/0x460
[  250.406592]  ? __pfx_panic+0x10/0x10
[  250.407167]  ? lock_release+0x205/0x690
[  250.407713]  ? preempt_count_sub+0x18/0xd0
[  250.408273]  watchdog+0x9af/0x9b0
[  250.408673]  ? __pfx_watchdog+0x10/0x10
[  250.409097]  kthread+0x1b1/0x1f0
[  250.409476]  ? kthread+0xf6/0x1f0
[  250.409849]  ? __pfx_kthread+0x10/0x10
[  250.410276]  ret_from_fork+0x31/0x60
[  250.410704]  ? __pfx_kthread+0x10/0x10
[  250.411123]  ret_from_fork_asm+0x1b/0x30
[  250.411604]  </TASK>
[  250.412330] Kernel Offset: disabled
[  250.412802] ---[ end Kernel panic - not syncing: hung_task: blocked
tasks ]---
Song Liu Feb. 29, 2024, 11:45 p.m. UTC | #2
On Thu, Feb 29, 2024 at 2:53 PM Song Liu <song@kernel.org> wrote:
>
> On Thu, Feb 29, 2024 at 7:50 AM Xiao Ni <xni@redhat.com> wrote:
> >
> > This reverts commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6.
> >
> > The root cause is that MD_RECOVERY_WAIT isn't cleared when stopping raid.
> > The following patch 'Clear MD_RECOVERY_WAIT when stopping dmraid' fixes
> > this problem.
> >
> > Signed-off-by: Xiao Ni <xni@redhat.com>
>
> I think we still need 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6 or some
> variation of it. Otherwise, we may hit the following deadlock. The test vm here
> has 2 raid arrays: one raid5 with journal, and a raid1.
>
> I pushed other patches in the set to the md-6.9-for-hch branch for
> further tests.

Actually, it appears the md-6.9-for-hch branch still has this problem. Let me
test more.

Song
Xiao Ni March 1, 2024, 12:49 a.m. UTC | #3
On Fri, Mar 1, 2024 at 7:46 AM Song Liu <song@kernel.org> wrote:
>
> On Thu, Feb 29, 2024 at 2:53 PM Song Liu <song@kernel.org> wrote:
> >
> > On Thu, Feb 29, 2024 at 7:50 AM Xiao Ni <xni@redhat.com> wrote:
> > >
> > > This reverts commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6.
> > >
> > > The root cause is that MD_RECOVERY_WAIT isn't cleared when stopping raid.
> > > The following patch 'Clear MD_RECOVERY_WAIT when stopping dmraid' fixes
> > > this problem.
> > >
> > > Signed-off-by: Xiao Ni <xni@redhat.com>
> >
> > I think we still need 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6 or some
> > variation of it. Otherwise, we may hit the following deadlock. The test vm here
> > has 2 raid arrays: one raid5 with journal, and a raid1.
> >
> > I pushed other patches in the set to the md-6.9-for-hch branch for
> > further tests.
>
> Actually, it appears the md-6.9-for-hch branch still has this problem. Let me
> test more.
>
> Song
>

Hi Song

What are the commands you use for testing? Can you reproduce it with
the 6.6 kernel?

Regards
Xiao
Song Liu March 1, 2024, 1:11 a.m. UTC | #4
On Thu, Feb 29, 2024 at 4:49 PM Xiao Ni <xni@redhat.com> wrote:
>
> On Fri, Mar 1, 2024 at 7:46 AM Song Liu <song@kernel.org> wrote:
> >
> > On Thu, Feb 29, 2024 at 2:53 PM Song Liu <song@kernel.org> wrote:
> > >
> > > On Thu, Feb 29, 2024 at 7:50 AM Xiao Ni <xni@redhat.com> wrote:
> > > >
> > > > This reverts commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6.
> > > >
> > > > The root cause is that MD_RECOVERY_WAIT isn't cleared when stopping raid.
> > > > The following patch 'Clear MD_RECOVERY_WAIT when stopping dmraid' fixes
> > > > this problem.
> > > >
> > > > Signed-off-by: Xiao Ni <xni@redhat.com>
> > >
> > > I think we still need 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6 or some
> > > variation of it. Otherwise, we may hit the following deadlock. The test vm here
> > > has 2 raid arrays: one raid5 with journal, and a raid1.
> > >
> > > I pushed other patches in the set to the md-6.9-for-hch branch for
> > > further tests.
> >
> > Actually, it appears the md-6.9-for-hch branch still has this problem. Let me
> > test more.
> >
> > Song
> >
>
> Hi Song
>
> What are the commands you use for testing? Can you reproduce it with
> the 6.6 kernel?

The VM has these two arrays assembled automatically on boot. I can reproduce
the issue simply by rebooting the VM (which triggers a stop of both arrays). So
the repro is basically rebooting the VM in a loop via ssh.

For this branch,

https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-6.9-for-hch

which has 5 of the 6 patches in this set, I can reproduce the issue. The issue
doesn't happen on commit aee93ec0ec79, which is from before this set.

Song

Patch

diff --git a/drivers/md/md.c b/drivers/md/md.c
index db4743ba7f6c..6376b1aad4d9 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8792,16 +8792,12 @@  void md_do_sync(struct md_thread *thread)
 	int ret;
 
 	/* just incase thread restarts... */
-	if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
+	if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
+	    test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
 		return;
-
-	if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
-		goto skip;
-
-	if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
-	    !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
+	if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
 		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-		goto skip;
+		return;
 	}
 
 	if (mddev_is_clustered(mddev)) {
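
For context on why Song argues the reverted commit (or a variation of it) is
still needed: the plain 'return' statements restored above bypass the skip:
label at the end of md_do_sync(). Paraphrased very roughly from the surrounding
kernel code (details elided, not part of this patch), that tail looks like:

	 skip:
		/* superblock and reshape bookkeeping elided */
		set_bit(MD_RECOVERY_DONE, &mddev->recovery);
		...
		wake_up(&resync_wait);
		md_wakeup_thread(mddev->thread);

Roughly, the md thread only reaps the sync thread after MD_RECOVERY_DONE is
set, so an early return that never reaches this tail can leave a stop or
reshape waiting indefinitely; the follow-up patch in this series avoids that by
clearing MD_RECOVERY_WAIT when dmraid is stopped instead.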