[v3,00/10] cxl/mem: Fix shutdown order

Message ID 169657715790.1491153.3612164287133860191.stgit@dwillia2-xfh.jf.intel.com

Dan Williams Oct. 6, 2023, 7:25 a.m. UTC
Changes since v2 [1]:
- Fix @dev vs @host confusion in cxl_sanitize_setup_notifier()
  (Davidlohr)
- Fix hardirq vs threadirq context for taking mbox lock in the handler
  (Davidlohr)
- Switch a test to boolean notation (Jonathan)
- Clarify why cxl_sanitize_setup_notifier() takes @cxlmd, instead of
  @mds (Jonathan)
- Drop export of cxl_mem_sanitize()
- Make it more obvious where some setup functions take an @host
  parameter vs a device object / context parameter to operate on.
- Fix synchronization between sanitize and decoder commit
- Add cxl_test infrastructure to regression test these ABI paths

[1]: http://lore.kernel.org/r/169602896768.904193.11292185494339980455.stgit@dwillia2-xfh.jf.intel.com

---

While fixing a crash where the cxlmd->cxlds validity needed to be
maintained over the span of the memdev being unregistered, I made a note
to come back and clean up the sanitize notifier setup/shutdown
implementation. Jonathan went further and noticed that the fix needs
that rework first [2].

[2]: http://lore.kernel.org/r/20230929100316.00004546@Huawei.com

The special wrinkle of the sanitize notifier is that it interacts with
interrupts, which are enabled early in the flow, and it interacts with
memdev sysfs, which is not initialized until late in the flow.

After some other cleanups, a self-contained
cxl_sanitize_setup_notifier() is introduced to centralize that
incremental setup work, and leave cxl_memdev_shutdown() alone to
coordinate closing down the ioctl path relative to the unregister event
of the memdev.
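
To make the ordering concern concrete, here is a minimal userspace sketch,
assuming teardown actions registered against a host are run newest-first
(which is how devm unwinds). It is not the driver code: struct host,
host_add_action(), host_release(), model_probe() and the three teardown
callbacks are hypothetical names invented for this illustration, and the
registration order is only an approximation of "interrupts early, memdev
sysfs late".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ACTIONS 8
#define LOG_SIZE 128

/* Hypothetical userspace stand-in for a devm host; not a kernel type. */
struct host {
	void (*action[MAX_ACTIONS])(struct host *);
	size_t nr;
	char log[LOG_SIZE];	/* records the order teardown ran in */
};

/* Append a name to the comma-separated teardown log. */
void log_event(struct host *h, const char *what)
{
	size_t used = strlen(h->log);

	/* room for an optional comma, the name, and the terminating NUL */
	assert(used + strlen(what) + 2 <= LOG_SIZE);
	if (used)
		strcat(h->log, ",");
	strcat(h->log, what);
}

/* Register a teardown action; returns 0 on success, -1 if full. */
int host_add_action(struct host *h, void (*fn)(struct host *))
{
	if (h->nr == MAX_ACTIONS)
		return -1;
	h->action[h->nr++] = fn;
	return 0;
}

/* Run teardown actions newest-first, mirroring devm-style unwinding. */
void host_release(struct host *h)
{
	while (h->nr)
		h->action[--h->nr](h);
}

/* Hypothetical teardown callbacks; each only logs that it ran. */
void irq_teardown(struct host *h) { log_event(h, "irq"); }
void memdev_unregister(struct host *h) { log_event(h, "memdev"); }
void notifier_teardown(struct host *h) { log_event(h, "notifier"); }

/*
 * Assumed setup order for this model: interrupts early, memdev next,
 * and the notifier last because it depends on memdev sysfs.
 */
int model_probe(struct host *h)
{
	memset(h, 0, sizeof(*h));
	if (host_add_action(h, irq_teardown))
		return -1;
	if (host_add_action(h, memdev_unregister))
		return -1;
	return host_add_action(h, notifier_teardown);
}
```

In this model, calling model_probe() and then host_release() tears down the
notifier before the memdev and the memdev before the interrupt, so nothing
that depends on memdev sysfs outlives it.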

As I went to check out the notifier in cxl_test I realized that it
insta-crashes since it tries to read registers from a sysfs attribute.
It turns out that could be fixed as a side effect of fixing the race
between issuing the sanitize command and committing decoders.
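
The sanitize vs decoder commit race can be pictured with a small userspace
model, assuming both paths take one shared lock and each refuses to proceed
while the other is in effect. This is a sketch of the idea only, not the
locking the series implements: struct model_memdev and the model_*()
functions are hypothetical, and a pthread mutex stands in for whatever lock
the kernel paths actually share.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel symbols. */
struct model_memdev {
	pthread_mutex_t lock;	/* stand-in for the lock both paths share */
	int committed_decoders;	/* decoders currently mapping the media */
	bool sanitize_active;	/* a sanitize command is in flight */
};

/* Start sanitize only if no decoder is committed and none is running. */
int model_sanitize_start(struct model_memdev *md)
{
	int rc = 0;

	pthread_mutex_lock(&md->lock);
	if (md->committed_decoders > 0 || md->sanitize_active)
		rc = -EBUSY;
	else
		md->sanitize_active = true;
	pthread_mutex_unlock(&md->lock);
	return rc;
}

/* Mark the in-flight sanitize as complete. */
void model_sanitize_done(struct model_memdev *md)
{
	pthread_mutex_lock(&md->lock);
	md->sanitize_active = false;
	pthread_mutex_unlock(&md->lock);
}

/* Commit a decoder only if sanitize is not in flight. */
int model_decoder_commit(struct model_memdev *md)
{
	int rc = 0;

	pthread_mutex_lock(&md->lock);
	if (md->sanitize_active)
		rc = -EBUSY;
	else
		md->committed_decoders++;
	pthread_mutex_unlock(&md->lock);
	return rc;
}

/* Tear down one committed decoder. */
void model_decoder_reset(struct model_memdev *md)
{
	pthread_mutex_lock(&md->lock);
	assert(md->committed_decoders > 0);
	md->committed_decoders--;
	pthread_mutex_unlock(&md->lock);
}
```

Because the check and the state change happen under the same lock, neither
path can observe a stale view of the other, which is the property the race
fix needs.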

Given the new locking and the fallout from not having regression
coverage here, I went ahead and extended cxl_test to exercise these ABI
paths. Watch for the related cxl-cli set next.

---

Dan Williams (10):
      cxl/pci: Remove unnecessary device reference management in sanitize work
      cxl/pci: Cleanup 'sanitize' to always poll
      cxl/pci: Remove hardirq handler for cxl_request_irq()
      cxl/pci: Remove inconsistent usage of dev_err_probe()
      cxl/pci: Clarify devm host for memdev relative setup
      cxl/pci: Fix sanitize notifier setup
      cxl/memdev: Fix sanitize vs decoder setup locking
      cxl/mem: Fix shutdown order
      tools/testing/cxl: Make cxl_memdev_state available to other command emulation
      tools/testing/cxl: Add 'sanitize notifier' support


 drivers/cxl/core/core.h      |    1 
 drivers/cxl/core/hdm.c       |   19 +++++
 drivers/cxl/core/mbox.c      |   55 +++++++++++----
 drivers/cxl/core/memdev.c    |  157 ++++++++++++++++++------------------------
 drivers/cxl/core/port.c      |    6 ++
 drivers/cxl/core/region.c    |    6 --
 drivers/cxl/cxlmem.h         |   13 ++-
 drivers/cxl/pci.c            |   88 +++++++++++-------------
 tools/testing/cxl/test/mem.c |   78 +++++++++++++++++++--
 9 files changed, 256 insertions(+), 167 deletions(-)

base-commit: 6465e260f48790807eef06b583b38ca9789b6072