Message ID | cover.1709748564.git.alison.schofield@intel.com |
---|---|
Series | Support poison list retrieval |

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> Changes since v9:
> - Replace the multi-use 'name' var, with multiple descriptive
>   flavors: memdev_name, region_name, decoder_name (DaveJ)
> - Use a static string table for poison source lookup (DaveJ)
> - Rebased on latest pending
>
> Link to v9: https://lore.kernel.org/r/cover.1709253898.git.alison.schofield@intel.com/
>
>
> Add the option to add a memory device's poison list to the cxl-list
> json output. Offer the option by memdev and by region. Sample usage:
>
> # cxl list -m mem1 --media-errors
> [
>   {
>     "memdev":"mem1",
>     "pmem_size":1073741824,
>     "ram_size":1073741824,
>     "serial":1,
>     "numa_node":1,
>     "host":"cxl_mem.1",
>     "media_errors":[
>       {
>         "dpa":0,
>         "length":64,
>         "source":"Internal"
>       },
>       {
>         "decoder":"decoder10.0",
>         "hpa":1035355557888,
>         "dpa":1073741824,
>         "length":64,
>         "source":"External"
>       },
>       {
>         "decoder":"decoder10.0",
>         "hpa":1035355566080,
>         "dpa":1073745920,
>         "length":64,
>         "source":"Injected"
>       }
>     ]
>   }
> ]
>
> # cxl list -r region5 --media-errors
> [
>   {
>     "region":"region5",
>     "resource":1035355553792,
>     "size":2147483648,
>     "type":"pmem",
>     "interleave_ways":2,
>     "interleave_granularity":4096,
>     "decode_state":"commit",
>     "media_errors":[
>       {
>         "decoder":"decoder10.0",
>         "hpa":1035355557888,
>         "dpa":1073741824,
>         "length":64,

I notice that the ndctl --media-errors records are:

{ offset, length }

...it is not clear to me that "dpa" and "hpa" have much meaning to
userspace by default. Physical address information is privileged, so if
these records were { offset, length } tuples there is the possibility
that they can be provided to non-root.

"Offset" is region relative "hpa" when listing region media errors, and
"offset" is memdev relative "dpa" while listing memdev relative media
errors.

On Wed, Mar 06, 2024 at 03:03:40PM -0800, Dan Williams wrote:
> alison.schofield@ wrote:
[..]
> I notice that the ndctl --media-errors records are:
>
> { offset, length }
>
> ...it is not clear to me that "dpa" and "hpa" have much meaning to
> userspace by default. Physical address information is privileged, so if
> these records were { offset, length } tuples there is the possibility
> that they can be provided to non-root.
>
> "Offset" is region relative "hpa" when listing region media errors, and
> "offset" is memdev relative "dpa" while listing memdev relative media
> errors.

Done.

memdev relative dpa is just dpa right? Unless you are thinking
offset into a partition? I don't think so.

Alison Schofield wrote:
[..]
> > I notice that the ndctl --media-errors records are:
> >
> > { offset, length }
> >
> > ...it is not clear to me that "dpa" and "hpa" have much meaning to
> > userspace by default. Physical address information is privileged, so if
> > these records were { offset, length } tuples there is the possibility
> > that they can be provided to non-root.
> >
> > "Offset" is region relative "hpa" when listing region media errors, and
> > "offset" is memdev relative "dpa" while listing memdev relative media
> > errors.
>
> Done. memdev relative dpa is just dpa right? Unless you are thinking
> offset into a partition? I don't think so.

Right,

memdev offset == absolute device dpa
region offset == region base relative
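
The two offset definitions above can be illustrated with the numbers from the v10 sample output. This is only a sketch: `region_offset` and `memdev_offset` are hypothetical helper names, not ndctl or libcxl functions, and it assumes the region's "resource" field is the region base that region offsets are measured from.

```python
# Values copied from the v10 cover letter sample for region5.
# Assumption: "resource" is the region base HPA.
REGION5_BASE = 1035355553792


def region_offset(hpa: int, region_base: int) -> int:
    """Hypothetical helper: offset of an HPA relative to the region base."""
    if hpa < region_base:
        raise ValueError("hpa lies below the region base")
    return hpa - region_base


def memdev_offset(dpa: int) -> int:
    """Hypothetical helper: a memdev offset is the absolute device DPA."""
    return dpa


# The two "hpa" values reported for region5 in the sample output.
offsets = [region_offset(hpa, REGION5_BASE)
           for hpa in (1035355557888, 1035355566080)]
print(offsets)  # [4096, 12288]
```

Under that assumption the two region5 records would be reported at offsets 4096 and 12288, while the by-memdev records would keep their dpa values unchanged.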

On Wed, Mar 06, 2024 at 03:03:40PM -0800, Dan Williams wrote:
> alison.schofield@ wrote:
[..]
> > # cxl list -m mem1 --media-errors
> > [
> >   {
> >     "memdev":"mem1",
> >     "pmem_size":1073741824,
> >     "ram_size":1073741824,
> >     "serial":1,
> >     "numa_node":1,
> >     "host":"cxl_mem.1",
> >     "media_errors":[
> >       {
> >         "dpa":0,
> >         "length":64,
> >         "source":"Internal"
> >       },
> >       {
> >         "decoder":"decoder10.0",
> >         "hpa":1035355557888,
> >         "dpa":1073741824,
> >         "length":64,
> >         "source":"External"
> >       },
> >       {
> >         "decoder":"decoder10.0",
> >         "hpa":1035355566080,
> >         "dpa":1073745920,
> >         "length":64,
> >         "source":"Injected"
> >       }

Dan,

In cleaning up the man pages, I need to follow up on this offset, length
notation.

A default by memdev list now looks like this-
{
  "offset" :
  "length" :
  "source" :
}

Which means dropping the 'decoder' even if it can be discovered from the
trace event. Recall previously a region was listed if present, then we
changed that to a decoder is listed if present. Now, with no 'hpa' listing
I've dropped the decoder too. That leaves no hint in the by memdev listing
that this poison is in a region.

Decoders will only be included in the by region list:
{
  "decoder":
  "offset" :
  "length" :
  "source" :
}

OK ?

[..]
> I notice that the ndctl --media-errors records are:
>
> { offset, length }
>
> ...it is not clear to me that "dpa" and "hpa" have much meaning to
> userspace by default. Physical address information is privileged, so if
> these records were { offset, length } tuples there is the possibility
> that they can be provided to non-root.
>
> "Offset" is region relative "hpa" when listing region media errors, and
> "offset" is memdev relative "dpa" while listing memdev relative media
> errors.
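
For readers comparing the two proposed layouts, the field selection can be sketched as follows. `media_error_record` is a hypothetical helper written for illustration, not code from the patch set; it only encodes the rule stated above that a decoder appears in the by-region listing and never in the by-memdev listing.

```python
def media_error_record(listing, offset, length, source, decoder=None):
    """Hypothetical helper: build one record in the proposed layout.

    listing is "memdev" or "region". The decoder name is kept only for
    the by-region listing, matching the proposal in this message.
    """
    if listing not in ("memdev", "region"):
        raise ValueError(f"unknown listing type: {listing}")

    record = {}
    if listing == "region" and decoder is not None:
        record["decoder"] = decoder
    record["offset"] = offset
    record["length"] = length
    record["source"] = source
    return record


# Same poison record, seen from each listing. The region offset 4096 is
# the sample hpa minus the region5 "resource" value (an assumption about
# how the offset is derived).
by_memdev = media_error_record("memdev", 1073741824, 64, "External",
                               decoder="decoder10.0")
by_region = media_error_record("region", 4096, 64, "External",
                               decoder="decoder10.0")
print(by_memdev)
print(by_region)
```

The by-memdev record carries no hint that the poison sits inside a region, which is the trade-off raised in the message.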

From: Alison Schofield <alison.schofield@intel.com>

Changes since v9:
- Replace the multi-use 'name' var, with multiple descriptive
  flavors: memdev_name, region_name, decoder_name (DaveJ)
- Use a static string table for poison source lookup (DaveJ)
- Rebased on latest pending

Link to v9: https://lore.kernel.org/r/cover.1709253898.git.alison.schofield@intel.com/


Add the option to add a memory device's poison list to the cxl-list
json output. Offer the option by memdev and by region. Sample usage:

# cxl list -m mem1 --media-errors
[
  {
    "memdev":"mem1",
    "pmem_size":1073741824,
    "ram_size":1073741824,
    "serial":1,
    "numa_node":1,
    "host":"cxl_mem.1",
    "media_errors":[
      {
        "dpa":0,
        "length":64,
        "source":"Internal"
      },
      {
        "decoder":"decoder10.0",
        "hpa":1035355557888,
        "dpa":1073741824,
        "length":64,
        "source":"External"
      },
      {
        "decoder":"decoder10.0",
        "hpa":1035355566080,
        "dpa":1073745920,
        "length":64,
        "source":"Injected"
      }
    ]
  }
]

# cxl list -r region5 --media-errors
[
  {
    "region":"region5",
    "resource":1035355553792,
    "size":2147483648,
    "type":"pmem",
    "interleave_ways":2,
    "interleave_granularity":4096,
    "decode_state":"commit",
    "media_errors":[
      {
        "decoder":"decoder10.0",
        "hpa":1035355557888,
        "dpa":1073741824,
        "length":64,
        "source":"External"
      },
      {
        "decoder":"decoder8.1",
        "hpa":1035355566080,
        "dpa":1073745920,
        "length":64,
        "source":"Internal"
      }
    ]
  }
]

Alison Schofield (7):
  libcxl: add interfaces for GET_POISON_LIST mailbox commands
  cxl: add an optional pid check to event parsing
  cxl/event_trace: add a private context for private parsers
  cxl/event_trace: add helpers to retrieve tep fields by type
  cxl/list: collect and parse media_error records
  cxl/list: add --media-errors option to cxl list
  cxl/test: add cxl-poison.sh unit test

 Documentation/cxl/cxl-list.txt |  79 +++++++++-
 cxl/event_trace.c              |  82 ++++++++++-
 cxl/event_trace.h              |  14 +-
 cxl/filter.h                   |   3 +
 cxl/json.c                     | 257 +++++++++++++++++++++++++++++++++
 cxl/lib/libcxl.c               |  47 ++++++
 cxl/lib/libcxl.sym             |   2 +
 cxl/libcxl.h                   |   2 +
 cxl/list.c                     |   3 +
 test/cxl-poison.sh             | 137 ++++++++++++++++++
 test/meson.build               |   2 +
 11 files changed, 624 insertions(+), 4 deletions(-)
 create mode 100644 test/cxl-poison.sh

base-commit: e0d0680bd3e554bd5f211e989480c5a13a023b2d
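
As an illustration of how a script might consume this output, the sketch below tallies poisoned bytes per source from the by-memdev sample above. It is not part of the series: `poisoned_bytes_by_source` is a hypothetical helper, and the JSON is pasted from the cover letter rather than read from a live `cxl list` run. It reads only "length" and "source", the two fields that stay the same in the layouts discussed in this thread.

```python
import json
from collections import Counter

# By-memdev sample output copied from the v10 cover letter.
SAMPLE = """
[
  {
    "memdev":"mem1",
    "pmem_size":1073741824,
    "ram_size":1073741824,
    "serial":1,
    "numa_node":1,
    "host":"cxl_mem.1",
    "media_errors":[
      {"dpa":0, "length":64, "source":"Internal"},
      {"decoder":"decoder10.0", "hpa":1035355557888,
       "dpa":1073741824, "length":64, "source":"External"},
      {"decoder":"decoder10.0", "hpa":1035355566080,
       "dpa":1073745920, "length":64, "source":"Injected"}
    ]
  }
]
"""


def poisoned_bytes_by_source(listing):
    """Hypothetical helper: sum media error lengths per poison source."""
    totals = Counter()
    for obj in listing:
        # Objects listed without --media-errors have no such key.
        for err in obj.get("media_errors", []):
            totals[err["source"]] += err["length"]
    return dict(totals)


totals = poisoned_bytes_by_source(json.loads(SAMPLE))
print(totals)  # {'Internal': 64, 'External': 64, 'Injected': 64}
```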