[v3,0/6] cxl/pci: Add support for RCH RAS error handling

Message ID 20230411180302.2678736-1-terry.bowman@amd.com

Bowman, Terry April 11, 2023, 6:02 p.m. UTC
This patchset adds error handling support for restricted CXL host (RCH)
downstream ports. This is necessary because RCH downstream ports are
implemented in root complex register blocks (RCRBs) and report protocol
errors through a root complex event collector (RCEC). The CXL driver does
not currently support this RCH error reporting flow; this patchset adds it.

The first patch discovers the RCH dport AER and RAS registers. These will
be mapped later and used in CXL driver error logging.
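
As an illustration of what register discovery can involve, the sketch below
walks a PCIe extended capability list to find the AER capability. It is a
user-space model only, assuming the registers are laid out like standard PCIe
extended configuration space; how the patch itself locates the registers is
not shown here, and read32() and find_ext_cap() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT_CAP_START   0x100u   /* extended capabilities start at 0x100 */
#define CFG_SPACE_SIZE  0x1000u  /* 4 KiB register snapshot */
#define EXT_CAP_ID_AER  0x0001u  /* Advanced Error Reporting capability ID */

/* Read a little-endian 32-bit word from a register snapshot. */
static uint32_t read32(const uint8_t *regs, size_t off)
{
	return (uint32_t)regs[off] |
	       (uint32_t)regs[off + 1] << 8 |
	       (uint32_t)regs[off + 2] << 16 |
	       (uint32_t)regs[off + 3] << 24;
}

/*
 * Hypothetical helper: return the offset of the extended capability with
 * the given ID, or 0 if it is not present.
 */
static uint16_t find_ext_cap(const uint8_t *regs, uint16_t cap_id)
{
	size_t off = EXT_CAP_START;
	/* Bound the walk so a malformed list cannot loop forever. */
	int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8;

	while (ttl-- > 0 && off >= EXT_CAP_START && off + 4 <= CFG_SPACE_SIZE) {
		uint32_t hdr = read32(regs, off);

		if (hdr == 0 || hdr == 0xffffffffu)
			break;                      /* empty or unreadable */
		if ((hdr & 0xffffu) == cap_id)      /* bits 15:0 hold the ID */
			return (uint16_t)off;
		off = (hdr >> 20) & 0xffcu;         /* bits 31:20: next offset */
	}
	return 0;
}
```

Calling find_ext_cap(regs, EXT_CAP_ID_AER) on a snapshot of the port's
registers yields the AER block offset, which could then be recorded for
later mapping.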

The second patch exports cper_mem_err_unpack() for use by modules. It is a
dependency for using the cper_print_aer() AER trace logging.

The third patch exports cper_print_aer(). cper_print_aer() is used for
CXL AER error logging because it provides a common format for logging
into dmesg.

The fourth patch maps the AER and RAS registers. This patch also adds the
RCH handler for logging downstream port AER and RAS information.

The fifth patch changes the AER port driver to forward RCH errors to
the RCiEP RCH handler.
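
The forwarding idea can be modelled in user space roughly as follows. This is
a sketch, not the driver code: the struct layouts, the is_cxl_mem flag and
rcec_forward_error() are hypothetical, and it assumes the RCEC cannot tell
which associated device's downstream port raised the error, so every CXL
RCiEP associated with the RCEC gets a chance to check its own registers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an RCiEP that may have an RCH error handler. */
struct rciep {
	bool is_cxl_mem;                       /* assumed: CXL memory device */
	void (*rch_handler)(struct rciep *dev);
	int errors_seen;                       /* bumped by the demo handler */
};

/* Hypothetical model of an RCEC and the RCiEPs associated with it. */
struct rcec {
	struct rciep **assoc;
	size_t nr_assoc;
};

/* Demo handler: a real one would read and log the dport AER/RAS state. */
static void demo_rch_handler(struct rciep *dev)
{
	dev->errors_seen++;
}

/*
 * Forward an error reported at the RCEC to every associated CXL RCiEP
 * that registered a handler. Returns the number of handlers invoked.
 */
static size_t rcec_forward_error(struct rcec *rcec)
{
	size_t called = 0;

	for (size_t i = 0; i < rcec->nr_assoc; i++) {
		struct rciep *dev = rcec->assoc[i];

		if (!dev->is_cxl_mem || !dev->rch_handler)
			continue;              /* not a CXL device: skip */
		dev->rch_handler(dev);
		called++;
	}
	return called;
}
```

Devices that are not CXL memory devices, or that have no handler, are simply
skipped in this model.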

The sixth patch enables internal AER errors for RCECs with CXL
RCiEPs. The CONFIG_PCIEAER_CXL kernel option is introduced to enable
this logic.
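
Unmasking internal errors amounts to clearing the internal-error bits in the
AER uncorrectable and correctable mask registers. The sketch below models
that on plain integers; the bit values match PCI_ERR_UNC_INTN and
PCI_ERR_COR_INTERNAL from the kernel's pci_regs.h, while struct aer_masks and
unmask_internal_errors() are hypothetical names. In the kernel this would
operate on config space and, per the description above, sit behind
CONFIG_PCIEAER_CXL.

```c
#include <stdint.h>

#define PCI_ERR_UNC_INTN      0x00400000u /* Uncorrectable Internal Error */
#define PCI_ERR_COR_INTERNAL  0x00004000u /* Corrected Internal Error */

/* Hypothetical in-memory copy of the two AER mask registers. */
struct aer_masks {
	uint32_t uncor_mask;  /* Uncorrectable Error Mask register */
	uint32_t cor_mask;    /* Correctable Error Mask register */
};

/*
 * A set mask bit suppresses reporting, so clearing the internal-error
 * bits enables it. All other mask bits are left unchanged.
 */
static void unmask_internal_errors(struct aer_masks *m)
{
	m->uncor_mask &= ~PCI_ERR_UNC_INTN;
	m->cor_mask &= ~PCI_ERR_COR_INTERNAL;
}
```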

 Changes in V3:
 - Correct base commit in cover letter.
 - Change hardcoded return 0 to NULL in regs.c.
 - Remove calls to pci_disable_pcie_error_reporting(pdev) and
   pci_enable_pcie_error_reporting(pdev) in mem.c.
 - Move RCEC interrupt unmask to PCIe port AER driver's probe.
   - Fixes missing PCIEAER and PCIEPORTBUS config option error.
 - Rename cxl_rcrb_setup() to cxl_setup_rcrb() in mem.c.
 - Update cper_mem_err_unpack() patch subject and description.

 Changes in V2:
 - Refactor RCH initialization into cxl_mem driver.
   - Includes RCH RAS and AER register discovery and mapping.
 - Add RCEC protocol error interrupt forwarding to CXL endpoint
   handler.
 - Change AER and RAS logging to use existing trace routines.
 - Enable RCEC AER internal errors.
 
Robert Richter (2):
  PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem
    dev handler
  PCI/AER: Unmask RCEC internal errors to enable RCH downstream port
    error handling

Terry Bowman (4):
  cxl/pci: Add RCH downstream port AER and RAS register discovery
  efi/cper: Export cper_mem_err_unpack() for use by modules
  PCI/AER: Export cper_print_aer() for use by modules
  cxl/pci: Add RCH downstream port error logging

 drivers/cxl/core/pci.c      | 126 +++++++++++++++++++++++++++++----
 drivers/cxl/core/regs.c     |  94 +++++++++++++++++++++----
 drivers/cxl/cxl.h           |  18 +++++
 drivers/cxl/mem.c           | 110 ++++++++++++++++++++++++++---
 drivers/firmware/efi/cper.c |   1 +
 drivers/pci/pcie/Kconfig    |   8 +++
 drivers/pci/pcie/aer.c      | 135 ++++++++++++++++++++++++++++++++++++
 7 files changed, 457 insertions(+), 35 deletions(-)

base-commit: ca712e47054678c5ce93a0e0f686353ad5561195