[v12,0/6] CXL Poison List Retrieval & Tracing

Message ID cover.1681159309.git.alison.schofield@intel.com

Alison Schofield April 10, 2023, 8:55 p.m. UTC
From: Alison Schofield <alison.schofield@intel.com>

Changes in v12:
- Pick up Reviewed-by Tags:
  2/6 cxl/trace: Add TRACE support for CXL media-error records (Ira)
  4/6 cxl/region: Provide region info to the cxl_poison trace event (Ira, DaveJ)
  6/6 tools/testing/cxl: Mock support for Get Poison List (DaveJ)
- Update sysfs doc to say logging happens when cxl_poison events are enabled.
- Remove errant blank line in cxl_memdev_visible() (Jonathan, DaveJ)
- Rename cxlds->poison.payload_out to poison.list_out
- Update commit message syntax (DaveJ)
- Shorten mailbox struct names (DaveJ)

Link to v11:
https://lore.kernel.org/linux-cxl/cover.1679888450.git.alison.schofield@intel.com/

Add support for retrieving device poison lists and logging the returned
error records as kernel trace events.

The handling of the poison list is guided by the CXL 3.0 Specification,
Section 8.2.9.8.4.1 [1].

Example trigger:
$ echo 1 > /sys/bus/cxl/devices/mem0/trigger_poison_list

Example Trace Events:

Poison found in a PMEM Region:
cxl_poison: memdev=mem0 host=cxl_mem.0 serial=0 trace_type=List region=region11 region_uuid=d96e67ec-76b0-406f-8c35-5b52630dcad1 hpa=0xf100000000 dpa=0x70000000 dpa_length=0x40 source=Injected flags= overflow_time=0

Poison found in a RAM Region:
cxl_poison: memdev=mem0 host=cxl_mem.0 serial=0 trace_type=List region=region2 region_uuid=00000000-0000-0000-0000-000000000000 hpa=0xf010000000 dpa=0x0 dpa_length=0x40 source=Injected flags= overflow_time=0

Poison found in an unmapped DPA resource:
cxl_poison: memdev=mem3 host=cxl_mem.3 serial=3 trace_type=List region= region_uuid=00000000-0000-0000-0000-000000000000 hpa=0xffffffffffffffff dpa=0x40000000 dpa_length=0x40 source=Injected flags= overflow_time=0
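
For anyone post-processing these events in userspace, the key=value layout
shown above can be split with a few lines of Python. This is only a sketch
based on the three example lines in this letter, not part of the series:
the function name parse_cxl_poison, the HEX_FIELDS set, and the
UNMAPPED_HPA constant are hypothetical names chosen for illustration, and
treating hpa=0xffffffffffffffff as "no HPA" is an assumption drawn from the
unmapped DPA example.

```python
import re

# Assumption: the all-ones HPA seen in the unmapped DPA example marks
# a record that has no host physical address.
UNMAPPED_HPA = 0xFFFFFFFFFFFFFFFF

# Fields that the example lines print as hex numbers.
HEX_FIELDS = {"hpa", "dpa", "dpa_length"}


def parse_cxl_poison(line):
    """Split one 'cxl_poison: key=value ...' line into a dict."""
    _, sep, body = line.partition("cxl_poison:")
    if not sep:
        raise ValueError("not a cxl_poison line")
    fields = {}
    # Values may be empty, as in 'region=' or 'flags='.
    for key, value in re.findall(r"(\w+)=(\S*)", body):
        fields[key] = int(value, 16) if key in HEX_FIELDS else value
    return fields


# The unmapped DPA example from this letter.
sample = (
    "cxl_poison: memdev=mem3 host=cxl_mem.3 serial=3 trace_type=List "
    "region= region_uuid=00000000-0000-0000-0000-000000000000 "
    "hpa=0xffffffffffffffff dpa=0x40000000 dpa_length=0x40 "
    "source=Injected flags= overflow_time=0"
)
record = parse_cxl_poison(sample)
is_mapped = record["hpa"] != UNMAPPED_HPA
```

Non-hex fields such as serial and overflow_time are left as strings here,
since their formatting beyond these examples is not shown.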

[1]: https://www.computeexpresslink.org/download-the-specification

Alison Schofield (6):
  cxl/mbox: Add GET_POISON_LIST mailbox command
  cxl/trace: Add TRACE support for CXL media-error records
  cxl/memdev: Add trigger_poison_list sysfs attribute
  cxl/region: Provide region info to the cxl_poison trace event
  cxl/trace: Add an HPA to cxl_poison trace events
  tools/testing/cxl: Mock support for Get Poison List

 Documentation/ABI/testing/sysfs-bus-cxl |  14 +++
 drivers/cxl/core/core.h                 |  15 ++++
 drivers/cxl/core/mbox.c                 |  74 ++++++++++++++++
 drivers/cxl/core/memdev.c               | 108 ++++++++++++++++++++++++
 drivers/cxl/core/region.c               |  63 ++++++++++++++
 drivers/cxl/core/trace.c                |  94 +++++++++++++++++++++
 drivers/cxl/core/trace.h                | 101 ++++++++++++++++++++++
 drivers/cxl/cxlmem.h                    |  72 +++++++++++++++-
 drivers/cxl/mem.c                       |  36 ++++++++
 drivers/cxl/pci.c                       |   4 +
 tools/testing/cxl/test/mem.c            |  42 +++++++++
 11 files changed, 622 insertions(+), 1 deletion(-)


base-commit: e686c32590f40bffc45f105c04c836ffad3e531a