[V2,0/4] Fix dmraid regression bugs

Message ID 20240301152128.13465-1-xni@redhat.com (mailing list archive)

Message

Xiao Ni March 1, 2024, 3:21 p.m. UTC
Hi all

This patch set tries to fix the dmraid regression problems we have faced
recently. It is based on Song's md-6.8 branch.

This patch set has four patches. The first two revert two earlier patches.
The third and the fourth resolve deadlock problems.

I have run the lvm2 regression tests 5 times. There are 4 failing cases:
shell/dmsetup-integrity-keys.sh
shell/lvresize-fs-crypt.sh
shell/pvck-dump.sh
shell/select-report.sh

lvconvert-raid-reshape.sh can also fail sometimes, but it fails on the 6.6
kernel too. So this series brings things back to the same state as the 6.6 kernel.

V2:
- It doesn't revert commit 82ec0ae59d02
  ("md: Make sure md_do_sync() will set MD_RECOVERY_DONE")
- It doesn't clear MD_RECOVERY_WAIT before stopping dmraid
- Re-write patch01 comment

Xiao Ni (4):
  md: Revert "md: Don't register sync_thread for reshape directly"
  md: Revert "md: Don't ignore suspended array in md_check_recovery()"
  md: Set MD_RECOVERY_FROZEN before stop sync thread
  md/raid5: Don't check crossing reshape when reshape hasn't started

 drivers/md/md.c     |  9 ++++----
 drivers/md/raid10.c | 16 ++++++++++++--
 drivers/md/raid5.c  | 51 ++++++++++++++++++++++++++++++++-------------
 3 files changed, 56 insertions(+), 20 deletions(-)

Comments

Song Liu March 1, 2024, 10:28 p.m. UTC | #1
On Fri, Mar 1, 2024 at 7:21 AM Xiao Ni <xni@redhat.com> wrote:
>
> Hi all
>
> This patch set tries to fix dmraid regression problems we face
> recently. This patch is based on song's md-6.8 branch.
>
> This patch set has four patches. The first two patches revert two patches.
> The third one and the fourth one resolve deadlock problems.
>
> I have run lvm2 regression test 5 times. There are 4 failed cases:
> shell/dmsetup-integrity-keys.sh
> shell/lvresize-fs-crypt.sh
> shell/pvck-dump.sh
> shell/select-report.sh
>
> And lvconvert-raid-reshape.sh can fail sometimes. But it fails in 6.6
> kernel too. So it can return back to the same state with 6.6 kernel.
>
> V2:
> It doesn't revert commit 82ec0ae59d02
> ("md: Make sure md_do_sync() will set MD_RECOVERY_DONE")
> It doesn't clear MD_RECOVERY_WAIT before stopping dmraid
> Re-write patch01 comment

Unfortunately, I am still seeing the same deadlock in the reboot tests
with two arrays. OTOH, Yu Kuai's version doesn't have this issue.
I think we will ship that patch set.

Thanks for your kind work on this issue.
Song

Xiao Ni March 2, 2024, 12:41 a.m. UTC | #2
On Sat, Mar 2, 2024 at 6:28 AM Song Liu <song@kernel.org> wrote:
>
> Unfortunately, I am still seeing the same deadlock in the reboot tests
> with two arrays. OTOH, Yu Kuai's version doesn't have this issue.
> I think we will ship that patch set.
>
> Thanks for your kind work on this issue.
> Song
>

It's a learning process for me :)
I'll run tests on top of Yu Kuai's patch set and report the results later.

Best Regards
Xiao