[0/6] Fix dmraid regression bugs

Message ID 20240229154941.99557-1-xni@redhat.com (mailing list archive)

Message

Xiao Ni Feb. 29, 2024, 3:49 p.m. UTC
Hi all

This patch set tries to fix the recent dmraid regression problems.
After talking with Kuai, who also sent a patch set to fix these
regressions, we decided to use a small patch set to fix them. This
patch set is based on Song's md-6.8 branch.

This patch set has six patches. The first three are reverts. The fourth
and the fifth resolve deadlock problems; together they resolve most of
the deadlocks. The last one fixes the raid5 reshape deadlock problem.

I have run the lvm2 regression tests. There are 4 failed cases:
shell/dmsetup-integrity-keys.sh
shell/lvresize-fs-crypt.sh
shell/pvck-dump.sh
shell/select-report.sh

lvconvert-raid-reshape.sh can also fail sometimes, but it fails on the
6.6 kernel too, so this series brings us back to the same state as the
6.6 kernel.

Xiao Ni (6):
  Revert "md: Don't register sync_thread for reshape directly"
  Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE"
  Revert "md: Don't ignore suspended array in md_check_recovery()"
  dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid
  md: Set MD_RECOVERY_FROZEN before stop sync thread
  md/raid5: Don't check crossing reshape when reshape hasn't started

 drivers/md/dm-raid.c |  2 ++
 drivers/md/md.c      | 22 +++++++++----------
 drivers/md/raid10.c  | 16 ++++++++++++--
 drivers/md/raid5.c   | 51 ++++++++++++++++++++++++++++++++------------
 4 files changed, 63 insertions(+), 28 deletions(-)

Comments

Christoph Hellwig Feb. 29, 2024, 7:39 p.m. UTC | #1
If I run this on the md/md-6.9-for-hch branch, all the hangs I was
previously seeing in the lvm2 test suite are gone.  Still a bunch of
failures, though:

### 427 tests: 284 passed, 127 skipped, 0 timed out, 3 warned, 13 failed

Song Liu Feb. 29, 2024, 7:45 p.m. UTC | #2
On Thu, Feb 29, 2024 at 11:39 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> If I run this on the md/md-6.9-for-hch branch, all the hangs I was
> previously seeing in the lvm2 test suite are gone.  Still a bunch of
> failures, though:
>
> ### 427 tests: 284 passed, 127 skipped, 0 timed out, 3 warned, 13 failed

Yes, this set fixes the issues we are seeing with lvm2 tests. However,
it triggers some other issue. I am looking into it.

Thanks,
Song

Yu Kuai March 1, 2024, 2:12 a.m. UTC | #3
Hi,

On 2024/02/29 23:49, Xiao Ni wrote:
> I have run the lvm2 regression tests. There are 4 failed cases:
> shell/dmsetup-integrity-keys.sh
> shell/lvresize-fs-crypt.sh
> shell/pvck-dump.sh
> shell/select-report.sh

You might need to run the test suite in a loop to make sure there are no
tests that will fail occasionally.

Thanks,
Kuai
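
A minimal sketch of the "run it in a loop" suggestion above, for surfacing tests that only fail occasionally. The helper names run_repeatedly and failure_rate, and the placeholder command, are assumptions for illustration only; they are not part of lvm2 or of this series. Substitute whatever command you actually use to run the lvm2 test (or the whole suite).

```python
import subprocess
import sys
from collections import Counter


def run_repeatedly(cmd, iterations):
    """Run `cmd` `iterations` times and count how often each exit code occurs."""
    results = Counter()
    for _ in range(iterations):
        # Output is discarded; only the exit status matters for this tally.
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        results[proc.returncode] += 1
    return results


def failure_rate(results):
    """Fraction of runs that exited with a non-zero status."""
    total = sum(results.values())
    if total == 0:
        return 0.0
    return (total - results.get(0, 0)) / total


if __name__ == "__main__":
    # Placeholder command (assumption): replace with the real test invocation.
    placeholder_cmd = [sys.executable, "-c", "import sys; sys.exit(0)"]
    tally = run_repeatedly(placeholder_cmd, 5)
    print("exit codes:", dict(tally))
    print("failure rate:", failure_rate(tally))
```

A test that passes in a single run but shows a non-zero failure rate over many iterations is the kind of intermittent failure this loop is meant to catch.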

Xiao Ni March 1, 2024, 2:22 a.m. UTC | #4
On Fri, Mar 1, 2024 at 10:12 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2024/02/29 23:49, Xiao Ni wrote:
> > I have run the lvm2 regression tests. There are 4 failed cases:
> > shell/dmsetup-integrity-keys.sh
> > shell/lvresize-fs-crypt.sh
> > shell/pvck-dump.sh
> > shell/select-report.sh
>
> You might need to run the test suite in a loop to make sure there are no
> tests that will fail occasionally.

I'll let the tests run today to check if there are more errors.

Regards
Xiao