[v1,00/15] Bug fixes for mdadm tests

Message ID 20220519191311.17119-1-logang@deltatee.com (mailing list archive)

Message

Logan Gunthorpe May 19, 2022, 7:12 p.m. UTC
Hi,

This series fixes all the kernel panics in the mdadm tests and some
related sparse issues. The first 10 patches refactor the raid5-cache
code so that the RCU usage of conf->log can be cleaned up, which is
done in patches 11 and 12, fixing some actual kernel NULL pointer
dereference crashes in the mdadm tests.
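
For illustration only, and not taken from the patches: a NULL pointer
dereference on a shared pointer such as conf->log can generally arise
when a reader tests the pointer and then reads it again after a
teardown path has cleared it. The userspace sketch below shows the
"load once, check, use the local copy" pattern. It uses C11 atomics as
a stand-in for the kernel's RCU primitives, and the names log_sketch,
conf_sketch, log_pending() and log_detach() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct r5l_log and struct r5conf. */
struct log_sketch {
	int pending;
};

struct conf_sketch {
	/* May be cleared to NULL by a teardown path at any time. */
	_Atomic(struct log_sketch *) log;
};

/*
 * Reader side: load the shared pointer exactly once into a local,
 * test the local, and only ever dereference the local.
 * Returns -1 when no log is attached.
 */
int log_pending(struct conf_sketch *conf)
{
	struct log_sketch *log;

	assert(conf != NULL);
	log = atomic_load_explicit(&conf->log, memory_order_acquire);
	if (!log)
		return -1;
	return log->pending;
}

/*
 * Teardown side: unpublish the pointer and hand the old value back.
 * This sketch has no grace period, so the caller must not free the
 * returned object while a reader may still be using it.
 */
struct log_sketch *log_detach(struct conf_sketch *conf)
{
	return atomic_exchange_explicit(&conf->log,
					(struct log_sketch *)NULL,
					memory_order_acq_rel);
}
```

In kernel code the reader would normally sit inside
rcu_read_lock()/rcu_read_unlock() and use rcu_dereference(), and the
teardown path would wait for readers (for example with
synchronize_rcu()) before freeing the object. Whether the patches use
exactly that arrangement is not stated in this cover letter.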

Patch 13 fixes some of the remaining sparse warnings that are just
missing __rcu annotations.

Patches 14 and 15 fix a couple of additional hangs in an mdadm test.

This series also originally included a patch[1] to fix the
mddev->private=NULL issue in raid0. That bug caused an mdadm test to
crash, but it seems Xiao beat me to the fix by a few days. Hopefully,
this work to improve mdadm tests will mean these types of bugs will
be caught much sooner, before merging.

This series will be followed by another series for mdadm which fixes
the segfaults and annotates some failing tests to make mdadm tests
runnable fairly reliably, but I'll wait for a stable hash for this
series to note the kernel version tested against. Following that,
v3 of my lock contention series will be sent with more confidence
of its correctness.

This series is based on the current md/md-next branch as of today
(6ad84d559b8c). A git branch is available here:

  https://github.com/sbates130272/linux-p2pmem md-bug

Thanks,

Logan

[1] https://github.com/sbates130272/linux-p2pmem/commit/5a538f9f48d77cba111773759256bbc3ccaaa74a

--

Logan Gunthorpe (15):
  md/raid5-log: Drop extern decorators for function prototypes
  md/raid5-cache: Refactor r5c_is_writeback() to take a struct r5conf
  md/raid5-cache: Refactor r5l_start() to take a struct r5conf
  md/raid5-cache: Refactor r5l_flush_stripe_to_raid() to take a struct
    r5conf
  md/raid5-cache: Refactor r5l_wake_reclaim() to take a struct r5conf
  md/raid5-cache: Refactor remaining functions to take a r5conf
  md/raid5-ppl: Drop unused argument from ppl_handle_flush_request()
  md/raid5-cache: Pass the log through to r5c_finish_cache_stripe()
  md/raid5-cache: Don't pass conf to r5c_calculate_new_cp()
  md/raid5-cache: Take struct r5l_log in
    r5c_log_required_to_flush_cache()
  md/raid5: Ensure array is suspended for calls to log_exit()
  md/raid5-cache: Add RCU protection to conf->log accesses
  md/raid5-cache: Annotate pslot with __rcu notation
  md: Ensure resync is reported after it starts
  md: Notify sysfs sync_completed in md_reap_sync_thread()

 drivers/md/md.c          |  13 ++-
 drivers/md/raid5-cache.c | 240 ++++++++++++++++++++++++---------------
 drivers/md/raid5-log.h   | 103 ++++++++---------
 drivers/md/raid5-ppl.c   |   2 +-
 drivers/md/raid5.c       |  50 ++++----
 drivers/md/raid5.h       |   2 +-
 6 files changed, 231 insertions(+), 179 deletions(-)


base-commit: 6ad84d559b8cbce9ab27a3a2658c438de867c98e
--
2.30.2

Comments

Song Liu May 23, 2022, 6:28 a.m. UTC | #1
On Thu, May 19, 2022 at 12:13 PM Logan Gunthorpe <logang@deltatee.com> wrote:
>
> Hi,
>
> This series fixes all the kernel panics in the mdadm tests and some
> related sparse issues. The first 10 patches refactor the raid5-cache
> code so that the RCU usage of conf->log can be cleaned up, which is
> done in patches 11 and 12, fixing some actual kernel NULL pointer
> dereference crashes in the mdadm tests.
>
> Patch 13 fixes some of the remaining sparse warnings that are just
> missing __rcu annotations.
>
> Patches 14 and 15 fix a couple of additional hangs in an mdadm test.
>
> This series also originally included a patch[1] to fix the
> mddev->private=NULL issue in raid0. That bug caused an mdadm test to
> crash, but it seems Xiao beat me to the fix by a few days. Hopefully,
> this work to improve mdadm tests will mean these types of bugs will
> be caught much sooner, before merging.

Thanks for the fix! The set looks good overall. Please address feedback
from Christoph and Donald. We should be able to ship it soon.

Thanks,
Song