[ndctl,0/3] Support poison list retrieval

Message ID cover.1668133294.git.alison.schofield@intel.com (mailing list archive)

Alison Schofield Nov. 11, 2022, 3:20 a.m. UTC
From: Alison Schofield <alison.schofield@intel.com>

Changes RFC->v1:
- Resync with DaveJ's v5 monitor patchset. [1]
  (It provides the event tracing functionality used here.)
- Resync with the kernel patchset adding poison list support. [2]
- Add cxl-get-poison.sh unit test to cxl test suite.
- JSON object naming cleanups, replace spaces with '_'.
- Use common event pid field to restrict events to this cxl list instance.
- Use json_object_get_int64() for addresses.
- Remove empty hpa fields. Add back with dpa->hpa translation.

[1] https://lore.kernel.org/linux-cxl/166803877747.145141.11853418648969334939.stgit@djiang5-desk3.ch.intel.com/
[2] https://lore.kernel.org/linux-cxl/cover.1668115235.git.alison.schofield@intel.com/

The first patch adds a libcxl API for triggering the read of a
poison list from a memory device. Users of that API will need to
trace the kernel events to collect the error records.

Patch 2 adds a PID filtering option to event tracing. Patch 3
collects and parses the poison list records, and patch 4 adds the
--media-errors option to cxl list. The last patch (5) adds a unit
test to the cxl test suite.
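
The PID filtering idea can be illustrated outside of ndctl. The following
is a minimal Python sketch, not the C code in cxl/event_trace.c: the event
dictionaries, the "pid" key name, and the filter_events_by_pid() helper are
all assumptions made for this example.

```python
import os


def filter_events_by_pid(events, pid=None):
    """Keep events whose 'pid' field equals pid; pid=None keeps everything.

    Hypothetical helper: it mirrors the idea of an optional pid check,
    not the actual ndctl implementation.
    """
    if pid is None:
        return list(events)
    return [ev for ev in events if ev.get("pid") == pid]


# Hand-written sample records; real trace events carry more fields.
my_pid = os.getpid()
events = [
    {"pid": my_pid, "dpa": 64, "length": 128},
    {"pid": my_pid + 1, "dpa": 0, "length": 64},
]

mine = filter_events_by_pid(events, my_pid)
print(len(mine), "event(s) belong to this process")
```

Making the check optional keeps existing callers, which want every event,
unaffected.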

Examples:
# cxl list -m mem2 --media-errors
[
  {
    "memdev":"mem2",
    "pmem_size":1073741824,
    "ram_size":0,
    "serial":2,
    "host":"cxl_mem.2",
    "media_errors":{
      "nr_media_errors":2,
      "media_error_records":[
        {
          "dpa":64,
          "length":128,
          "source":"Injected",
          "flags":"Overflow,",
          "overflow_time":1656711046
        },
        {
          "dpa":192,
          "length":192,
          "source":"Internal",
          "flags":"Overflow,",
          "overflow_time":1656711046
        }
      ]
    }
  }
]

# cxl list -r region5 --media-errors
[
  {
    "region":"region5",
    "resource":1035623989248,
    "size":2147483648,
    "interleave_ways":2,
    "interleave_granularity":4096,
    "decode_state":"commit",
    "media_errors":{
      "nr_media_errors":2,
      "media_error_records":[
        {
          "memdev":"mem2",
          "dpa":0,
          "length":64,
          "source":"Internal",
          "flags":"",
          "overflow_time":0
        },
        {
          "memdev":"mem5",
          "dpa":0,
          "length":256,
          "source":"Injected",
          "flags":"",
          "overflow_time":0
        }
      ]
    }
  }
]
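
A consumer of this output might total the reported lengths per memdev. The
sketch below is one possible way to do that in Python, assuming the JSON
shape shown in the examples above; the poison_length_by_memdev() helper is
hypothetical and the sample is a trimmed copy of the region5 listing.

```python
import json
from collections import defaultdict

# Trimmed copy of the region5 example; a real run would read the
# output of "cxl list -r region5 --media-errors" instead.
sample = """
[
  {
    "region":"region5",
    "media_errors":{
      "nr_media_errors":2,
      "media_error_records":[
        {"memdev":"mem2","dpa":0,"length":64,"source":"Internal"},
        {"memdev":"mem5","dpa":0,"length":256,"source":"Injected"}
      ]
    }
  }
]
"""


def poison_length_by_memdev(listing):
    """Sum the 'length' of each media error record, keyed by memdev name."""
    totals = defaultdict(int)
    for obj in listing:
        records = obj.get("media_errors", {}).get("media_error_records", [])
        for rec in records:
            # Region listings name the memdev per record; memdev listings
            # name it once on the enclosing object.
            name = rec.get("memdev", obj.get("memdev", "unknown"))
            totals[name] += rec["length"]
    return dict(totals)


totals = poison_length_by_memdev(json.loads(sample))
print(totals)
```

The fallback to the enclosing object's "memdev" lets the same helper handle
both the -m and -r forms of the listing.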

Alison Schofield (5):
  libcxl: add interfaces for GET_POISON_LIST mailbox commands
  cxl: add an optional pid check to event parsing
  cxl/list: collect and parse the poison list records
  cxl/list: add --media-errors option to cxl list
  test: add a cxl-get-poison test

 Documentation/cxl/cxl-list.txt |  64 ++++++++++++
 cxl/event_trace.c              |   5 +
 cxl/event_trace.h              |   1 +
 cxl/filter.c                   |   2 +
 cxl/filter.h                   |   1 +
 cxl/json.c                     | 185 +++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.c               |  44 ++++++++
 cxl/lib/libcxl.sym             |   6 ++
 cxl/libcxl.h                   |   2 +
 cxl/list.c                     |   2 +
 test/cxl-get-poison.sh         |  78 ++++++++++++++
 test/meson.build               |   2 +
 12 files changed, 392 insertions(+)
 create mode 100644 test/cxl-get-poison.sh