[v11,00/20] cxl/pci: Add support for RCH RAS error handling

Message ID 20230927154339.1600738-1-rrichter@amd.com

Message

Robert Richter Sept. 27, 2023, 3:43 p.m. UTC
This patchset enables CXL RCH error handling. This is necessary because
protocol error handling for RCH downstream ports is implemented differently
from that of other ports and is not currently supported. These patches
address the following:

   * Discovery and mapping of RCH downstream port AER registers.

   * AER portdrv changes to support CXL RCH protocol errors. 

   * Interrupt setup specific to RCH mode: enabling RCEC internal
     errors and disabling root port interrupts.

   * Logging RCH downstream port AER and RAS errors.
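
As a rough illustration of the interrupt setup item above, here is a
minimal userspace sketch. It is not code from this series. It models the
AER mask registers and the Root Error Command register as plain struct
fields, so struct aer_regs_model and both helper functions are hypothetical
names. Only the bit values are taken from the kernel's pci_regs.h
(PCI_ERR_UNC_INTN, PCI_ERR_COR_INTERNAL, PCI_ERR_ROOT_CMD_*_EN). In the
series the two steps apply to different devices (RCEC and RCH downstream
port); the model uses a single struct only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h. */
#define ERR_UNC_INTN             0x00400000u /* Uncorrectable Internal Error */
#define ERR_COR_INTERNAL         0x00004000u /* Corrected Internal Error */
#define ERR_ROOT_CMD_COR_EN      0x00000001u /* Correctable Err Reporting En */
#define ERR_ROOT_CMD_NONFATAL_EN 0x00000002u /* Non-Fatal Err Reporting En */
#define ERR_ROOT_CMD_FATAL_EN    0x00000004u /* Fatal Err Reporting En */

/* Hypothetical in-memory stand-in for a device's AER registers. */
struct aer_regs_model {
	uint32_t uncor_mask;   /* Uncorrectable Error Mask */
	uint32_t cor_mask;     /* Correctable Error Mask */
	uint32_t root_command; /* Root Error Command */
};

/* Clear the internal error mask bits so that these errors are reported. */
void model_unmask_internal_errors(struct aer_regs_model *regs)
{
	assert(regs != NULL);
	regs->uncor_mask &= ~ERR_UNC_INTN;
	regs->cor_mask &= ~ERR_COR_INTERNAL;
}

/* Clear the three interrupt enable bits in the Root Error Command register. */
void model_disable_root_ints(struct aer_regs_model *regs)
{
	assert(regs != NULL);
	regs->root_command &= ~(ERR_ROOT_CMD_COR_EN |
				ERR_ROOT_CMD_NONFATAL_EN |
				ERR_ROOT_CMD_FATAL_EN);
}
```

Both helpers are read-modify-write operations that leave all other bits
untouched. Real code would use config space or MMIO accessors instead of
direct struct access.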

Changes in v11:
  - Rebased onto cxl/fixes (c66650d29764)
  - Added: cxl/port: Fix release of RCD endpoints
  - Added: cxl/core/regs: Rename @dev to @host in struct cxl_register_map
  - Added: cxl/port: Fix @host confusion in cxl_dport_setup_regs()
  - Added: cxl/port: Rename @comp_map to @reg_map in struct cxl_register_map
  - Removed: cxl/regs: Prepare for multiple users of register mappings
  - Modified: cxl/hdm: Use stored Component Register mappings to map
    HDM decoder capability
    - Dan: rework to drop cxl_port_get_comp_map()
  - Added: cxl/pci: Introduce config option PCIEAER_CXL
  - Modified: cxl/pci: Add RCH downstream port AER register discovery
    - Moved AER discovery to devm_cxl_setup_parent_dport() called when
      memdev is probed
    - Fixed devm_cxl_iomap_block() release by fixing devm host
  - Modified: cxl/pci: Map RCH downstream AER registers for logging
    protocol errors
    - Reworded description
    - Moved register mappings to devm_cxl_setup_parent_dport() called
      when memdev is probed
  - Modified: cxl/pci: Disable root port interrupts in RCH mode
    - Call cxl_disable_rch_root_ints() in devm_cxl_setup_parent_dport()
      called when memdev is probed
    - Fixed resource release by fixing devm host
  - Reworded description of PCIEAER_CXL config option
  - Added: cxl/core/regs: Rework cxl_map_pmu_regs() to use map->dev for
    devm
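
Several v11 items above mention fixing the devm host. As background, here
is a small userspace model of the general idea behind device-managed
resources. It is a sketch under stated assumptions, not the kernel's devres
implementation and not code from this series: struct host_model,
host_add_action(), host_release_all() and unmap_model() are all
hypothetical names. The point it shows is that a release action runs when
the host it was registered against is torn down, so registering a mapping
against the wrong host releases it at the wrong time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 8

typedef void (*release_fn)(void *data);

/* Hypothetical stand-in for a device that owns managed resources. */
struct host_model {
	release_fn fn[MAX_ACTIONS];
	void *data[MAX_ACTIONS];
	size_t count;
};

/* Register a release action against a host; returns 0 on success. */
int host_add_action(struct host_model *host, release_fn fn, void *data)
{
	assert(host != NULL && fn != NULL);
	if (host->count >= MAX_ACTIONS)
		return -1;
	host->fn[host->count] = fn;
	host->data[host->count] = data;
	host->count++;
	return 0;
}

/* Run all actions of this host, most recently registered first. */
void host_release_all(struct host_model *host)
{
	assert(host != NULL);
	while (host->count > 0) {
		host->count--;
		host->fn[host->count](host->data[host->count]);
	}
}

/* Example release action: mark a modeled register mapping as unmapped. */
void unmap_model(void *data)
{
	*(int *)data = 0;
}
```

With two hosts, an action registered against one is unaffected when the
other is released, which is why the choice of host determines the lifetime
of a mapping.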

Changes in v10:
  - Updated cxl_setup_comp_regs() in patch#1 to include
    cxl_register_map::reg_type default value initialization.
  
Changes in v9:
  - Added: cxl/regs: Prepare for multiple users of register mappings
  - Updated use of cxl_map_component_regs() and cxl_map_device_regs()

Changes in V8:
  - Rebased onto: commit
    0c0df63177e3 ("Merge branch 'for-6.5/cxl-rch-eh' into for-6.5/cxl")
  - cxl/port: Pre-initialize component register mappings
    - Added patch to pre-initialize component register mappings.
  - cxl/pci: Remove Component Register base address from struct
    cxl_dev_state
    - Separated removal of Component Register base address in struct
      cxl_dev_state to not break functionality.
  - cxl/hdm: Use stored Component Register mappings to map HDM decoder
    capability
    - Implemented a less strict check in devm_cxl_setup_hdm(), be tolerant
    if HDM decoder registers are not implemented.
  - cxl/pci: Map RCH downstream AER registers for logging protocol errors
    - Fixed uninitialized access of map->dev in cxl_dport_map_regs().
  - PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem
    dev handler
    - Fix typo in patch description.
  - cxl/pci: Update CXL error logging to use RAS register address
    - Fix typo in patch description.
    
Changes in V7:
  - cxl: Updates for CXL Test to work with RCH
    - Removed Robert's DSO.
  - cxl/core/regs: Add @dev to cxl_register_map
    - Corrected typo in patch description.
  - PCI/AER: Unmask RCEC internal errors to enable RCH downstream port
    error handling.
    - Changed int variable to bool.
  - cxl/pci: Map RCH downstream AER registers for logging protocol errors
    - Corrected patch description.
  - cxl/pci: Add RCH downstream port AER register discovery
    - Reverted empty line removal.
  - cxl/port: Store the port's Component Register mappings in struct
    cxl_port
    - Update to use structure initialization in cxl_setup_comp_regs().
  - Remove first patch (already in the tree) and added patch 27/27.
    - Was a one-off error caused when merging branches during internal
    review.

Changes in V6:
  - Added patch for cxl test fixes: 'cxl: Update CXL Test to Work with
    RCH'. Patch from Dan.
  - Simplified: 'cxl/rch: Prepare for caching the MMIO mapped PCIe AER
    capability'. Patch from Dan.
  - Added patch: 'cxl: Rename 'uport' to 'uport_dev''
  - Updated patch: 'cxl: Rename member @dport of struct cxl_dport to
    @dport_dev'
  - Updated *map assignment to use structure init in 'cxl/core/regs: Add
    @dev to cxl_register_map'. Also fixed whitespace.
  - Removed extra whitespace in 'cxl/core/regs: Add @dev to
    cxl_register_map'
  - Updated patch subject: 'cxl/acpi: Move add_host_bridge_uport() after
    cxl_get_chbs()'
  - Changes to work with CXL test. 'cxl/acpi: Directly bind the CEDT
    detected CHBCR to the Host Bridge's port'
  - 'cxl/pci: Early setup RCH dport component registers from RCRB'
    - Removed parameter from cxl_rcrb_get_comp_regs().
    - Changed return value to EPROBE_DEFER for retry during ACPI
      initialization.
    - Changed map to use struct initialization.
  - Remove ENODEV check in 'cxl/port: Store the downstream port's
    Component Register mappings in struct cxl_dport'
  - 'cxl/port: Remove Component Register base address from struct
    cxl_dport'
    - Moved earlier with same removal for cxl_port.
  - cxl/pci: Add RCH downstream port AER register discovery
    - Flattened {request,release}_mem_region() and ioremap() into
      cxl_rcrb_to_aer().
    - Add check if OS is assigned AER handling before discovering AER.
  - Added CXL namespace import to cxl_core (drivers/cxl/core/port.c).
    Needed for using pci_print_aer(). In 'PCI/AER: Refactor
    cper_print_aer() for use by CXL driver module'.
  - cxl/pci: Map RCH downstream AER registers for logging protocol errors
    - Changed dport device used in devm_cxl_iomap_block() call to be
      port->dev.
    - Removed ENODEV check.
  - cxl/pci: Disable root port interrupts in RCH mode
    - Removed unnecessary 'rch' check.
    - Moved cxl_disable_rch_root_ints() into core/pci.c. 
    - Added OSC AER assignment check before accessing AER registers.
  - cxl/pci: Update CXL error logging to use RAS register address
    - Renamed function handlers.
  - cxl/pci: Add RCH downstream port error logging
    - Moved RCD check to caller.
    - Added put_device() after call to cxl_pci_find_port().
                                                          
Changes in V5:
  - Split 'cxl/rch: Prepare for logging RCH downstream port protocol
    errors' patch into 2 patches.
  - Added:
    cxl/core/regs: Rename phys_addr in cxl_map_component_regs()
    cxl/mem: Prepare for early RCH dport component register setup
  - Correct comments CXL3.0 to CXL 3.0.
  - Changed cxl_port_get_comp_map() to static.

Changes in V4:
  - Made port RAS register discovery common and called from
    __devm_cxl_add_dport().
  - Changed RCH AER register discovery to be called from
    __devm_cxl_add_dport().
  - Changed RAS and RCH AER register mapping to be called from
    __devm_cxl_add_dport().
  - Changed component register mapping to support all CXL component
    mapping, cxl_map_component_regs().
  - Added cxl_regs to 'struct cxl_dport' for providing RCH downstream port
    mapped registers used in error handler.
  - PCI/AER:
      - Improved description of PCIEAER_CXL option in Kconfig.
      - Renamed function to pci_aer_unmask_internal_errors(), added
        pcie_aer_is_native() check.
      - Improved comments and added spec refs.
      - Renamed functions to cxl_rch_handle_error*().
      - Modified cxl_rch_handle_error_iter() to only call the handler
        callbacks, this also simplifies refcounting of the pdev.
      - Refactored handle_error_source(), created pci_aer_handle_error().
      - Changed printk messages to pci_*() variants.
      - Added check for pcie_aer_is_native() to the RCEC.
      - Introduced function cxl_rch_enable_rcec().
      - Updated patch description ("PCI/AER: Forward RCH downstream
      port-detected errors to the CXL.mem dev handler").

Changes in V3:
  - Correct base commit in cover sheet.
  - Change hardcoded return 0 to NULL in regs.c.
  - Remove calls to pci_disable_pcie_error_reporting(pdev) and
    pci_enable_pcie_error_reporting(pdev) in mem.c.
  - Move RCEC interrupt unmask to PCIe port AER driver's probe.
    - Fixes missing PCIEAER and PCIEPORTBUS config option error.
  - Rename cxl_rcrb_setup() to cxl_setup_rcrb() in mem.c.
  - Update cper_mem_err_unpack() patch subject and description.

Changes in V2:
  - Refactor RCH initialization into cxl_mem driver.
    - Includes RCH RAS and AER register discovery and mapping.
  - Add RCEC protocol error interrupt forwarding to CXL endpoint
    handler.
  - Change AER and RAS logging to use existing trace routines.
  - Enable RCEC AER internal errors.

Dan Williams (1):
  cxl/port: Fix @host confusion in cxl_dport_setup_regs()

Robert Richter (13):
  cxl/port: Fix release of RCD endpoints
  cxl/core/regs: Rename @dev to @host in struct cxl_register_map
  cxl/port: Rename @comp_map to @reg_map in struct cxl_register_map
  cxl/port: Pre-initialize component register mappings
  cxl/pci: Store the endpoint's Component Register mappings in struct
    cxl_dev_state
  cxl/hdm: Use stored Component Register mappings to map HDM decoder
    capability
  cxl/pci: Remove Component Register base address from struct
    cxl_dev_state
  cxl/port: Remove Component Register base address from struct cxl_port
  cxl/pci: Introduce config option PCIEAER_CXL
  PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem
    dev handler
  PCI/AER: Unmask RCEC internal errors to enable RCH downstream port
    error handling
  cxl/core/regs: Rename phys_addr in cxl_map_component_regs()
  cxl/core/regs: Rework cxl_map_pmu_regs() to use map->dev for devm

Terry Bowman (6):
  cxl/pci: Add RCH downstream port AER register discovery
  PCI/AER: Refactor cper_print_aer() for use by CXL driver module
  cxl/pci: Update CXL error logging to use RAS register address
  cxl/pci: Map RCH downstream AER registers for logging protocol errors
  cxl/pci: Add RCH downstream port error logging
  cxl/pci: Disable root port interrupts in RCH mode

 drivers/cxl/core/core.h      |   1 +
 drivers/cxl/core/hdm.c       |  48 +++-----
 drivers/cxl/core/mbox.c      |   2 +
 drivers/cxl/core/pci.c       | 223 +++++++++++++++++++++++++++++++++--
 drivers/cxl/core/port.c      | 104 +++++++++++-----
 drivers/cxl/core/regs.c      |  72 ++++++++---
 drivers/cxl/cxl.h            |  34 ++++--
 drivers/cxl/cxlmem.h         |   4 +-
 drivers/cxl/mem.c            |   7 +-
 drivers/cxl/pci.c            |  14 +--
 drivers/pci/pcie/Kconfig     |   9 ++
 drivers/pci/pcie/aer.c       | 162 ++++++++++++++++++++++++-
 include/linux/aer.h          |   2 +-
 tools/testing/cxl/test/mem.c |   1 -
 14 files changed, 559 insertions(+), 124 deletions(-)


base-commit: c66650d29764e228eba40b7a59fdb70fa6567daa

Comments

Robert Richter Sept. 27, 2023, 4:04 p.m. UTC | #1
Dan,

On 27.09.23 17:43:19, Robert Richter wrote:

> Changes in v11:
>   - Rebased onto cxl/fixes (c66650d29764)
>   - Added: cxl/port: Fix release of RCD endpoints
>   - Added: cxl/core/regs: Rename @dev to @host in struct cxl_register_map
>   - Added: cxl/port: Fix @host confusion in cxl_dport_setup_regs()
>   - Added: cxl/port: Rename @comp_map to @reg_map in struct cxl_register_map
>   - Removed: cxl/regs: Prepare for multiple users of register mappings
>   - Modified: cxl/hdm: Use stored Component Register mappings to map
>     HDM decoder capability
>     - Dan: rework to drop cxl_port_get_comp_map()
>   - Added: cxl/pci: Introduce config option PCIEAER_CXL
>   - Modified: cxl/pci: Add RCH downstream port AER register discovery
>     - Moved AER discovery to devm_cxl_setup_parent_dport() called when
>       memdev is probed
>     - Fixed devm_cxl_iomap_block() release by fixing devm host
>   - Modified: cxl/pci: Map RCH downstream AER registers for logging
>     protocol errors
>     - Reworded description
>     - Moved register mappings to devm_cxl_setup_parent_dport() called
>      when memdev is probed
>   - Modified: cxl/pci: Disable root port interrupts in RCH mode
>     - Call cxl_disable_rch_root_ints() in devm_cxl_setup_parent_dport()
>       called when memdev is probed
>     - Fixed resource release by fixing devm host
>   - Reworded description of PCIEAER_CXL config option
>   - Added: cxl/core/regs: Rework cxl_map_pmu_regs() to use map->dev for
>     devm

For a v11 this is a major rework. Most of the dport setup is now in
devm_cxl_setup_parent_dport(), which is called very late from
cxl_mem_probe(). There are also additional patches with fixes and more
reworks. I saw one failure in the ndctl cxl test suite with qemu, but
decided to send the patches out anyway as a new baseline for review,
testing and debugging. Please bear with it: given the extent of the
changes, the code needs to mature a little.

Thanks,

-Robert