
[RFC,v3,0/6] efi/cxl-cper: Report CPER CXL component events through trace events

Message ID 20230601-cxl-cper-v3-0-0189d61f7956@intel.com

Ira Weiny Nov. 1, 2023, 9:11 p.m. UTC
Series status/background
========================

This is another RFC version of processing the CXL CPER records through
the CXL trace mechanisms as Dan mentioned in [1].

This raises the CXL event structures to a core header and rearranges them
so that they can be shared efficiently, eliminating a memcpy that Smita
noticed.  Also, the BDF is used instead of the serial number.

NOTE: I'm still fuzzy on which fields in the CPER record are the correct
ones to use to find the BDF in the Linux code.  I would appreciate
someone double checking those.

The CPER code remains compile tested only.  The original event code
continues to pass cxl-test.

[1] https://lore.kernel.org/all/6528808cef2ba_780ef294c5@dwillia2-xfh.jf.intel.com.notmuch/

Cover letter
============

CXL Component Events, as defined by UEFI 2.10 Section N.2.14, wrap a
payload that is mostly a CXL event record in an EFI Common Platform
Error Record (CPER).  If a device is configured for firmware-first
handling, CXL event records are not sent directly to the host.

The CXL sub-system uniquely has DPA to HPA translation information.  It
also already properly decodes the event format.  Send the CXL CPER
records to the CXL sub-system for processing.

With CXL event logs, the device interrupts the host with events.  In the
EFI case, events are wrapped with device information which must be
matched with the memdev devices the CXL driver is tracking.

A number of alternatives were considered to match the memdev with the
CPER record.  The most robust was to find the PCI device via Bus,
Device, Function and match it to the memdev driver data.
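
As a rough illustration of the BDF matching described above, here is a
minimal userspace sketch.  It is not the code from this series: the
sketch_* structures, their field names, and the lookup table are
hypothetical stand-ins, and which CPER fields actually carry the BDF is
still an open question (see the NOTE above).  Only the devfn encoding
(5-bit device, 3-bit function) mirrors the kernel's PCI_DEVFN().

```c
#include <stddef.h>
#include <stdint.h>

/* Same bit layout as the kernel's PCI_DEVFN(): device in bits 7:3, function in bits 2:0. */
#define SKETCH_DEVFN(dev, fn) ((uint8_t)((((dev) & 0x1f) << 3) | ((fn) & 0x07)))

/* Hypothetical stand-in for the device ID fields carried in a CPER record. */
struct sketch_cper_dev_id {
	uint16_t segment;
	uint8_t bus;
	uint8_t device;
	uint8_t function;
};

/* Hypothetical stand-in for a memdev the CXL driver is tracking. */
struct sketch_memdev {
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
	const char *name;
};

/* Return the tracked memdev whose segment/bus/devfn match the record, or NULL. */
static const struct sketch_memdev *
sketch_find_memdev(const struct sketch_memdev *devs, size_t n,
		   const struct sketch_cper_dev_id *id)
{
	uint8_t devfn = SKETCH_DEVFN(id->device, id->function);

	for (size_t i = 0; i < n; i++) {
		if (devs[i].segment == id->segment &&
		    devs[i].bus == id->bus &&
		    devs[i].devfn == devfn)
			return &devs[i];
	}
	return NULL;
}
```

In the kernel, the table walk would presumably be replaced by a PCI core
lookup such as pci_get_domain_bus_and_slot() followed by a check of the
driver data; that is an assumption about the approach, not a description
of the patches.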

CPER records are identified with GUIDs while CXL event logs contain
UUIDs.  The UUID was previously printed for all events, but it is
redundant information which adds unnecessary complexity when
processing CPER data.  Remove the UUIDs from known events.  Restructure
the code so that the data is shared between CPER and event logs as
efficiently as possible.
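
To illustrate the idea of sharing one event layout between the two
sources, here is a minimal sketch of a union keyed by an event type
rather than a UUID.  All names and fields (sketch_event, phys_addr, and
so on) are hypothetical and far simpler than the real CXL event
records; the point is only that known events no longer need to carry a
UUID and that a consumer can take the record by pointer without copying.

```c
#include <stdint.h>

/* Hypothetical event type; the source (CPER GUID or event log UUID) maps to this. */
enum sketch_event_type {
	SKETCH_EVENT_GENERIC,
	SKETCH_EVENT_GEN_MEDIA,
	SKETCH_EVENT_DRAM,
};

/* Only the generic (unknown) event keeps its UUID in this sketch. */
struct sketch_event_generic {
	uint8_t uuid[16];
	uint8_t data[16];
};

struct sketch_event_gen_media {
	uint64_t phys_addr;
	uint8_t channel;
};

struct sketch_event_dram {
	uint64_t phys_addr;
	uint8_t rank;
};

/* One layout shared by both producers (CPER and event log). */
union sketch_event {
	struct sketch_event_generic generic;
	struct sketch_event_gen_media gen_media;
	struct sketch_event_dram dram;
};

/* Store the event's address in *addr and return 1, or return 0 if it has none. */
static int sketch_event_phys_addr(enum sketch_event_type type,
				  const union sketch_event *evt,
				  uint64_t *addr)
{
	switch (type) {
	case SKETCH_EVENT_GEN_MEDIA:
		*addr = evt->gen_media.phys_addr;
		return 1;
	case SKETCH_EVENT_DRAM:
		*addr = evt->dram.phys_addr;
		return 1;
	default:
		return 0;
	}
}
```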

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
Changes in RFC v3:
- djbw: Share structures between CPER/event logs
- Smita: use BDF to resolve the memdev
- djbw/Smita: various cleanups
- Link to v2: https://lore.kernel.org/r/20230601-cxl-cper-v2-0-314d9c36ab02@intel.com

---
Ira Weiny (6):
      cxl/trace: Remove uuid from event trace known events
      cxl/events: Promote CXL event structures to a core header
      cxl/events: Remove UUID from non-generic event structures
      cxl/events: Create a CXL event union
      firmware/efi: Process CXL Component Events
      cxl/memdev: Register for and process CPER events

 drivers/cxl/core/mbox.c         |  57 +++++++++-----
 drivers/cxl/core/trace.h        |  18 ++---
 drivers/cxl/cxlmem.h            |  96 ++---------------------
 drivers/cxl/pci.c               |  59 +++++++++++++-
 drivers/firmware/efi/cper.c     |  15 ++++
 drivers/firmware/efi/cper_cxl.c |  40 ++++++++++
 drivers/firmware/efi/cper_cxl.h |  29 +++++++
 include/linux/cxl-event.h       | 160 ++++++++++++++++++++++++++++++++++++++
 tools/testing/cxl/test/mem.c    | 166 +++++++++++++++++++++++-----------------
 9 files changed, 451 insertions(+), 189 deletions(-)
---
base-commit: 1c8b86a3799f7e5be903c3f49fcdaee29fd385b5
change-id: 20230601-cxl-cper-26ffc839c6c6

Best regards,