diff mbox series

[v3,5/6] PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler

Message ID 20230411180302.2678736-6-terry.bowman@amd.com (mailing list archive)
State Handled Elsewhere
Delegated to: Bjorn Helgaas
Headers show
Series None | expand

Commit Message

Bowman, Terry April 11, 2023, 6:03 p.m. UTC
From: Robert Richter <rrichter@amd.com>

In Restricted CXL Device (RCD) mode a CXL device is exposed as an
RCiEP, but CXL downstream and upstream ports are not enumerated and
not visible in the PCIe hierarchy. Protocol and link errors are sent
to an RCEC.

Restricted CXL host (RCH) downstream port-detected errors are signaled
as internal AER errors, either Uncorrectable Internal Error (UIE) or
Corrected Internal Errors (CIE). The error source is the id of the
RCEC. A CXL handler must then inspect the error status in various CXL
registers residing in the dport's component register space (CXL RAS
cap) or the dport's RCRB (AER ext cap). [1]

Errors showing up in the RCEC's error handler must be handled and
connected to the CXL subsystem. Implement this by forwarding the error
to all CXL devices below the RCEC. Since the entire CXL device is
controlled only using PCIe Configuration Space of device 0, Function
0, only pass it there [2]. These devices have the Memory Device class
code set (PCI_CLASS_MEMORY_CXL, 502h) and the existing cxl_pci driver
can implement the handler. In addition to errors directed to the CXL
endpoint device, the handler must also inspect the CXL downstream
port's CXL RAS and PCIe AER external capabilities that is connected to
the device.

Since CXL downstream port errors are signaled using internal errors,
the handler requires those errors to be unmasked. This is subject of a
follow-on patch.

The reason for choosing this implementation is that a CXL RCEC device
is bound to the AER port driver, but the driver does not allow it to
register a custom specific handler to support CXL. Connecting the RCEC
hard-wired with a CXL handler does not work, as the CXL subsystem
might not be present all the time. The alternative to add an
implementation to the portdrv to allow the registration of a custom
RCEC error handler isn't worth doing it as CXL would be its only user.
Instead, just check for an CXL RCEC and pass it down to the connected
CXL device's error handler. With this approach the code can entirely
be implemented in the PCIe AER driver and is independent of the CXL
subsystem. The CXL driver only provides the handler.

[1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
[2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices

Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-pci@vger.kernel.org
---
 drivers/pci/pcie/Kconfig |  8 ++++++
 drivers/pci/pcie/aer.c   | 61 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

Comments

Bjorn Helgaas April 12, 2023, 10:02 p.m. UTC | #1
On Tue, Apr 11, 2023 at 01:03:01PM -0500, Terry Bowman wrote:
> From: Robert Richter <rrichter@amd.com>
> 
> In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> RCiEP, but CXL downstream and upstream ports are not enumerated and
> not visible in the PCIe hierarchy. Protocol and link errors are sent
> to an RCEC.
> 
> Restricted CXL host (RCH) downstream port-detected errors are signaled
> as internal AER errors, either Uncorrectable Internal Error (UIE) or
> Corrected Internal Errors (CIE). The error source is the id of the
> RCEC. A CXL handler must then inspect the error status in various CXL
> registers residing in the dport's component register space (CXL RAS
> cap) or the dport's RCRB (AER ext cap). [1]
> 
> Errors showing up in the RCEC's error handler must be handled and
> connected to the CXL subsystem. Implement this by forwarding the error
> to all CXL devices below the RCEC. Since the entire CXL device is
> controlled only using PCIe Configuration Space of device 0, Function
> 0,

Capitalize "device" and "Function" the same way (also appears in
comment below).

> only pass it there [2]. These devices have the Memory Device class
> code set (PCI_CLASS_MEMORY_CXL, 502h) and the existing cxl_pci driver
> can implement the handler. In addition to errors directed to the CXL
> endpoint device, the handler must also inspect the CXL downstream
> port's CXL RAS and PCIe AER external capabilities that is connected to

"AER external capabilities" --  is that referring to the "AER
*Extended* capability"?  If so, we usually don't bother including the
"extended" part because it's usually not relevant.  But if you intended
"external", I don't know what it means.

> the device.
> 
> Since CXL downstream port errors are signaled using internal errors,
> the handler requires those errors to be unmasked. This is subject of a
> follow-on patch.
> 
> The reason for choosing this implementation is that a CXL RCEC device
> is bound to the AER port driver, but the driver does not allow it to
> register a custom specific handler to support CXL. Connecting the RCEC
> hard-wired with a CXL handler does not work, as the CXL subsystem
> might not be present all the time. The alternative to add an
> implementation to the portdrv to allow the registration of a custom
> RCEC error handler isn't worth doing it as CXL would be its only user.
> Instead, just check for an CXL RCEC and pass it down to the connected
> CXL device's error handler. With this approach the code can entirely
> be implemented in the PCIe AER driver and is independent of the CXL
> subsystem. The CXL driver only provides the handler.

Can you make this more concrete with an example topology so we can
work through how this all works?  Correct me when I go off the rails
here:

The current code uses pcie_walk_rcec() in this path, which basically
searches below a Root Port or RCEC for devices that have an AER error
status bit set, add them to the e_info[] list, and call
handle_error_source() for each one:

  aer_isr_one_error
    # get e_src from aer_fifo
    find_source_device(e_src)
      pcie_walk_rcec(find_device_iter)
        find_device_iter
          is_error_source
            # read PCI_ERR_COR_STATUS or PCI_ERR_UNCOR_STATUS
          if (error-source)
            add_error_device
              # add device to e_info[] list
    # now call handle_error_source for everything in e_info[]
    aer_process_err_devices
      for (i = 0; i < e_info->err_dev_num; i++)
        handle_error_source

IIUC, this patch basically says that an RCEC should have an AER error
status bit (UIE or CIE) set, but the devices "below" the RCEC will
not, so they won't get added to e_info[].

So we insert cxl_handle_error() in handle_error_source(), where it
gets called for the RCEC, and then it uses pcie_walk_rcec() again to
forcibly call handle_error_source() for *every* device "below" the
RCEC (even though they don't have AER error status bits set).

Then handle_error_source() ultimately calls the CXL driver err_handler
entry points (.cor_error_detected(), .error_detected(), etc), which
can look at the CXL-specific error status in the CXL RAS or RCRB or
whatever.

So this basically looks like a workaround for the fact that the AER
code only calls handle_error_source() when it finds AER error status,
and CXL doesn't *set* that AER error status.  There's not that much
code here, but it seems like a quite a bit of complexity in an area
that is already pretty complicated.

Here's another idea: the ACPI GHES code (ghes_handle_aer()) basically
receives a packet of error status from firmware and queues it for
recovery via pcie_do_recovery().  What if you had a CXL module that
knew how to look for the CXL error status, package it up similarly,
and queue it via aer_recover_queue()?

> [1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
> [2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices
> 
> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Cc: "Oliver O'Halloran" <oohall@gmail.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-pci@vger.kernel.org
> ---
>  drivers/pci/pcie/Kconfig |  8 ++++++
>  drivers/pci/pcie/aer.c   | 61 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 69 insertions(+)
> 
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 228652a59f27..b0dbd864d3a3 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -49,6 +49,14 @@ config PCIEAER_INJECT
>  	  gotten from:
>  	     https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
>  
> +config PCIEAER_CXL
> +	bool "PCI Express CXL RAS support"
> +	default y
> +	depends on PCIEAER && CXL_PCI
> +	help
> +	  This enables CXL error handling for Restricted CXL Hosts
> +	  (RCHs).
> +
>  #
>  # PCI Express ECRC
>  #
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 7a25b62d9e01..171a08fd8ebd 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -946,6 +946,65 @@ static bool find_source_device(struct pci_dev *parent,
>  	return true;
>  }
>  
> +#ifdef CONFIG_PCIEAER_CXL
> +
> +static bool is_cxl_mem_dev(struct pci_dev *dev)
> +{
> +	/*
> +	 * A CXL device is controlled only using PCIe Configuration
> +	 * Space of device 0, Function 0.
> +	 */
> +	if (dev->devfn != PCI_DEVFN(0, 0))
> +		return false;
> +
> +	/* Right now there is only a CXL.mem driver */
> +	if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> +		return false;
> +
> +	return true;
> +}
> +
> +static bool is_internal_error(struct aer_err_info *info)
> +{
> +	if (info->severity == AER_CORRECTABLE)
> +		return info->status & PCI_ERR_COR_INTERNAL;
> +
> +	return info->status & PCI_ERR_UNC_INTN;
> +}
> +
> +static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info);
> +
> +static int cxl_handle_error_iter(struct pci_dev *dev, void *data)
> +{
> +	struct aer_err_info *e_info = (struct aer_err_info *)data;
> +
> +	if (!is_cxl_mem_dev(dev))
> +		return 0;
> +
> +	/* pci_dev_put() in handle_error_source() */
> +	dev = pci_dev_get(dev);
> +	if (dev)
> +		handle_error_source(dev, e_info);
> +
> +	return 0;
> +}
> +
> +static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> +{
> +	/*
> +	 * CXL downstream port errors are signaled as RCEC internal
> +	 * errors. Forward them to all CXL devices below the RCEC.
> +	 */
> +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> +	    is_internal_error(info))
> +		pcie_walk_rcec(dev, cxl_handle_error_iter, info);
> +}
> +
> +#else
> +static inline void cxl_handle_error(struct pci_dev *dev,
> +				    struct aer_err_info *info) { }
> +#endif
> +
>  /**
>   * handle_error_source - handle logging error into an event log
>   * @dev: pointer to pci_dev data structure of error source device
> @@ -957,6 +1016,8 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
>  {
>  	int aer = dev->aer_cap;
>  
> +	cxl_handle_error(dev, info);
> +
>  	if (info->severity == AER_CORRECTABLE) {
>  		/*
>  		 * Correctable error does not need software intervention.
> -- 
> 2.34.1
>
Robert Richter April 13, 2023, 11:40 a.m. UTC | #2
Bjorn,

thanks for your detailed review.

On 12.04.23 17:02:33, Bjorn Helgaas wrote:
> On Tue, Apr 11, 2023 at 01:03:01PM -0500, Terry Bowman wrote:
> > From: Robert Richter <rrichter@amd.com>
> > 
> > In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> > RCiEP, but CXL downstream and upstream ports are not enumerated and
> > not visible in the PCIe hierarchy. Protocol and link errors are sent
> > to an RCEC.
> > 
> > Restricted CXL host (RCH) downstream port-detected errors are signaled
> > as internal AER errors, either Uncorrectable Internal Error (UIE) or
> > Corrected Internal Errors (CIE). The error source is the id of the
> > RCEC. A CXL handler must then inspect the error status in various CXL
> > registers residing in the dport's component register space (CXL RAS
> > cap) or the dport's RCRB (AER ext cap). [1]
> > 
> > Errors showing up in the RCEC's error handler must be handled and
> > connected to the CXL subsystem. Implement this by forwarding the error
> > to all CXL devices below the RCEC. Since the entire CXL device is
> > controlled only using PCIe Configuration Space of device 0, Function
> > 0,
> 
> Capitalize "device" and "Function" the same way (also appears in
> comment below).

Changed that.

> 
> > only pass it there [2]. These devices have the Memory Device class
> > code set (PCI_CLASS_MEMORY_CXL, 502h) and the existing cxl_pci driver
> > can implement the handler. In addition to errors directed to the CXL
> > endpoint device, the handler must also inspect the CXL downstream
> > port's CXL RAS and PCIe AER external capabilities that is connected to
> 
> "AER external capabilities" --  is that referring to the "AER
> *Extended* capability"?  If so, we usually don't bother including the
> "extended" part because it's usually not relevant.  But if you intended
> "external", I don't know what it means.

Right, "extended" is meant here, but I will drop it to also fit with
the 'CXL RAS capability'.

> 
> > the device.
> > 
> > Since CXL downstream port errors are signaled using internal errors,
> > the handler requires those errors to be unmasked. This is subject of a
> > follow-on patch.
> > 
> > The reason for choosing this implementation is that a CXL RCEC device
> > is bound to the AER port driver, but the driver does not allow it to
> > register a custom specific handler to support CXL. Connecting the RCEC
> > hard-wired with a CXL handler does not work, as the CXL subsystem
> > might not be present all the time. The alternative to add an
> > implementation to the portdrv to allow the registration of a custom
> > RCEC error handler isn't worth doing it as CXL would be its only user.
> > Instead, just check for an CXL RCEC and pass it down to the connected
> > CXL device's error handler. With this approach the code can entirely
> > be implemented in the PCIe AER driver and is independent of the CXL
> > subsystem. The CXL driver only provides the handler.
> 
> Can you make this more concrete with an example topology so we can
> work through how this all works?  Correct me when I go off the rails
> here:

Let's assume just a simple CXL RCH topology:

PCI hierarchy:

		-----------------
		| ACPI0016      |----------------	Host bridge (CXL host)
		| - CEDT        |		|
   -------------|   - RCRB base |		|
   |		-----------------		:
   |		     |
   |		     |
   |		------------------- 	---------
   |		| RCiEP       	  |.....| RCEC  |	Endpoint (CXL dev)
   |	--------| - BDF       	  |	| - BDF |
   |	|	| - PCIe AER  	  |	---------
   |	|	| - CXL dvsec     |
   |	|	|   (v2: reg loc) |
   |	|	|   - Comp regs   |
   |	|	|     - CXL RAS   |
   |	|	-------------------
   :	:
	
CXL hierarchy:

   :						:
   : 		------------------		|
   |		| CXL root port  |<--------------
   |		|     	   	 |	  
   |----------->| - dport RCRB   |<--------------
   |		|   - PCIe AER	 |		|
   |		|   - Comp regs	 |		|
   |		|     - CXL RAS  |		|
   |		------------------		|
   |	:					|
   |	|	------------------		|
   |	------->| CXL endpoint   |---------------
   |		| (v1: RCRB)	 |
   ------------>| - uport RCRB   |
   		|   - Comp regs	 |
		|     - CXL RAS  |
		------------------

Dport detected errors are reported using PCIe AER and CXL RAS caps in
the dports RCRB.

Uport detected errors are reported using RCiEP's PCIe AER cap and
either the uport's RCRB RAS cap or the RAS cap of the comp regs
located using CXL DVSEC register locator.

In all cases the RCEC is used with either the RCEC (dport errors) or
the RCiEP (uport errors) error source id (BDF: bus, dev, func).

> 
> The current code uses pcie_walk_rcec() in this path, which basically
> searches below a Root Port or RCEC for devices that have an AER error
> status bit set, add them to the e_info[] list, and call
> handle_error_source() for each one:

For reference, this series adds support to handle RCH downstream
port-detected errors as described in CXL 3.0, 12.2.1.1.

This flow looks correct to me, see comments inline.

> 
>   aer_isr_one_error
>     # get e_src from aer_fifo
>     find_source_device(e_src)

e_src is the RCEC.

>       pcie_walk_rcec(find_device_iter)
>         find_device_iter
>           is_error_source
>             # read PCI_ERR_COR_STATUS or PCI_ERR_UNCOR_STATUS

It is an internal error (CIE or UIE).

>           if (error-source)

An early version of the spec did not require the RCEC as an error
source. But this case is not handled with this series.

>             add_error_device
>               # add device to e_info[] list
>     # now call handle_error_source for everything in e_info[]
>     aer_process_err_devices
>       for (i = 0; i < e_info->err_dev_num; i++)
>         handle_error_source

handle_error_source() is called with the RCEC as pci_dev.

> 
> IIUC, this patch basically says that an RCEC should have an AER error
> status bit (UIE or CIE) set, but the devices "below" the RCEC will
> not, so they won't get added to e_info[].

An internal error of the RCEC indicates a CXL dport error.

> 
> So we insert cxl_handle_error() in handle_error_source(), where it
> gets called for the RCEC, and then it uses pcie_walk_rcec() again to
> forcibly call handle_error_source() for *every* device "below" the
> RCEC (even though they don't have AER error status bits set).

The CXL device contains the links to the dport's caps. Also, there can
be multiple RCs with CXL devs connected to it. So we must search for
all CXL devices now, determine the corresponding dport and inspect
both, PCIe AER and CXL RAS caps.

> 
> Then handle_error_source() ultimately calls the CXL driver err_handler
> entry points (.cor_error_detected(), .error_detected(), etc), which
> can look at the CXL-specific error status in the CXL RAS or RCRB or
> whatever.

The AER driver (portdrv) does not have the knowledge of CXL internals.
Thus the approach is to pass dport errors to the cxl_mem driver to
handle it there in addition to cxl mem dev errors.

> 
> So this basically looks like a workaround for the fact that the AER
> code only calls handle_error_source() when it finds AER error status,
> and CXL doesn't *set* that AER error status.  There's not that much
> code here, but it seems like a quite a bit of complexity in an area
> that is already pretty complicated.
> 
> Here's another idea: the ACPI GHES code (ghes_handle_aer()) basically
> receives a packet of error status from firmware and queues it for
> recovery via pcie_do_recovery().  What if you had a CXL module that
> knew how to look for the CXL error status, package it up similarly,
> and queue it via aer_recover_queue()?

The CXL module knows how and where to look for errors, but it does not
receive interrupts (for dport errors). The interrupts land in the
portdrv (the RCEC's pci driver) and the CXL module must be notified by
the portdrv. But the portdrv (AER driver) does not know the CXL module
nor it is always present (e.g. CXL bus must be enumerated first etc.).

aer_recover_queue() is interesting to report AER errors that has been
retrieved outside the PCIe hierarchy, in particular the dport AER cap
in the RCRB (see patch #4). We could collect all the data and just
send it to aer_recover_queue(). I think aer_recover_work_func() must
be extended to also handle corrected errors, otherwise the function is
already almost the same as handle_error_source().

But first, RCEC error notifications (RCEC AER interrupts) must be sent
to the CXL driver to look into the dport's RCRB.

-Robert

> 
> > [1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
> > [2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices
> > 
> > Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> > Cc: "Oliver O'Halloran" <oohall@gmail.com>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: linux-pci@vger.kernel.org
> > ---
> >  drivers/pci/pcie/Kconfig |  8 ++++++
> >  drivers/pci/pcie/aer.c   | 61 ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 69 insertions(+)
> > 
> > diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> > index 228652a59f27..b0dbd864d3a3 100644
> > --- a/drivers/pci/pcie/Kconfig
> > +++ b/drivers/pci/pcie/Kconfig
> > @@ -49,6 +49,14 @@ config PCIEAER_INJECT
> >  	  gotten from:
> >  	     https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
> >  
> > +config PCIEAER_CXL
> > +	bool "PCI Express CXL RAS support"
> > +	default y
> > +	depends on PCIEAER && CXL_PCI
> > +	help
> > +	  This enables CXL error handling for Restricted CXL Hosts
> > +	  (RCHs).
> > +
> >  #
> >  # PCI Express ECRC
> >  #
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index 7a25b62d9e01..171a08fd8ebd 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -946,6 +946,65 @@ static bool find_source_device(struct pci_dev *parent,
> >  	return true;
> >  }
> >  
> > +#ifdef CONFIG_PCIEAER_CXL
> > +
> > +static bool is_cxl_mem_dev(struct pci_dev *dev)
> > +{
> > +	/*
> > +	 * A CXL device is controlled only using PCIe Configuration
> > +	 * Space of device 0, Function 0.
> > +	 */
> > +	if (dev->devfn != PCI_DEVFN(0, 0))
> > +		return false;
> > +
> > +	/* Right now there is only a CXL.mem driver */
> > +	if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> > +static bool is_internal_error(struct aer_err_info *info)
> > +{
> > +	if (info->severity == AER_CORRECTABLE)
> > +		return info->status & PCI_ERR_COR_INTERNAL;
> > +
> > +	return info->status & PCI_ERR_UNC_INTN;
> > +}
> > +
> > +static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info);
> > +
> > +static int cxl_handle_error_iter(struct pci_dev *dev, void *data)
> > +{
> > +	struct aer_err_info *e_info = (struct aer_err_info *)data;
> > +
> > +	if (!is_cxl_mem_dev(dev))
> > +		return 0;
> > +
> > +	/* pci_dev_put() in handle_error_source() */
> > +	dev = pci_dev_get(dev);
> > +	if (dev)
> > +		handle_error_source(dev, e_info);
> > +
> > +	return 0;
> > +}
> > +
> > +static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> > +{
> > +	/*
> > +	 * CXL downstream port errors are signaled as RCEC internal
> > +	 * errors. Forward them to all CXL devices below the RCEC.
> > +	 */
> > +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> > +	    is_internal_error(info))
> > +		pcie_walk_rcec(dev, cxl_handle_error_iter, info);
> > +}
> > +
> > +#else
> > +static inline void cxl_handle_error(struct pci_dev *dev,
> > +				    struct aer_err_info *info) { }
> > +#endif
> > +
> >  /**
> >   * handle_error_source - handle logging error into an event log
> >   * @dev: pointer to pci_dev data structure of error source device
> > @@ -957,6 +1016,8 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
> >  {
> >  	int aer = dev->aer_cap;
> >  
> > +	cxl_handle_error(dev, info);
> > +
> >  	if (info->severity == AER_CORRECTABLE) {
> >  		/*
> >  		 * Correctable error does not need software intervention.
> > -- 
> > 2.34.1
> >
Jonathan Cameron April 14, 2023, 12:19 p.m. UTC | #3
On Tue, 11 Apr 2023 13:03:01 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> From: Robert Richter <rrichter@amd.com>
> 
> In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> RCiEP, but CXL downstream and upstream ports are not enumerated and
> not visible in the PCIe hierarchy. Protocol and link errors are sent
> to an RCEC.
> 
> Restricted CXL host (RCH) downstream port-detected errors are signaled
> as internal AER errors, either Uncorrectable Internal Error (UIE) or
> Corrected Internal Errors (CIE). The error source is the id of the
> RCEC. A CXL handler must then inspect the error status in various CXL
> registers residing in the dport's component register space (CXL RAS
> cap) or the dport's RCRB (AER ext cap). [1]
> 
> Errors showing up in the RCEC's error handler must be handled and
> connected to the CXL subsystem. Implement this by forwarding the error
> to all CXL devices below the RCEC. Since the entire CXL device is
> controlled only using PCIe Configuration Space of device 0, Function
> 0, only pass it there [2]. These devices have the Memory Device class
> code set (PCI_CLASS_MEMORY_CXL, 502h) and the existing cxl_pci driver
> can implement the handler.

This comment implies only class code compliant drivers.  Sure we don't
have drivers for anything else yet, but we should try to avoid saying
there won't be any (which I think above implies).

You have a comment in the code, but maybe relaxing the description above
to "currently support devices have..."

> In addition to errors directed to the CXL
> endpoint device, the handler must also inspect the CXL downstream
> port's CXL RAS and PCIe AER external capabilities that is connected to
> the device.
> 
> Since CXL downstream port errors are signaled using internal errors,
> the handler requires those errors to be unmasked. This is subject of a
> follow-on patch.
> 
> The reason for choosing this implementation is that a CXL RCEC device
> is bound to the AER port driver, but the driver does not allow it to
> register a custom specific handler to support CXL. Connecting the RCEC
> hard-wired with a CXL handler does not work, as the CXL subsystem
> might not be present all the time. The alternative to add an
> implementation to the portdrv to allow the registration of a custom
> RCEC error handler isn't worth doing it as CXL would be its only user.
> Instead, just check for an CXL RCEC and pass it down to the connected
> CXL device's error handler. With this approach the code can entirely
> be implemented in the PCIe AER driver and is independent of the CXL
> subsystem. The CXL driver only provides the handler.
> 
> [1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
> [2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices
> 
> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Cc: "Oliver O'Halloran" <oohall@gmail.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-pci@vger.kernel.org

Generally looks good to me.  A few trivial comments inline.

> ---
>  drivers/pci/pcie/Kconfig |  8 ++++++
>  drivers/pci/pcie/aer.c   | 61 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 69 insertions(+)
> 
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 228652a59f27..b0dbd864d3a3 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -49,6 +49,14 @@ config PCIEAER_INJECT
>  	  gotten from:
>  	     https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
>  
> +config PCIEAER_CXL
> +	bool "PCI Express CXL RAS support"

Description makes this sound too general. I'd mentioned restricted
hosts even in the menu option title.


> +	default y
> +	depends on PCIEAER && CXL_PCI
> +	help
> +	  This enables CXL error handling for Restricted CXL Hosts
> +	  (RCHs).

Spec term is probably fine in the title, but in the help I'd 
expand it as per the CXL 3.0 glossary to include
"CXL Host that is operating in RCD mode."
It might otherwise surprise people that this matters on their shiny
new CXL X.0 host (because they found an old CXL 1.1 card in a box
and decided to plug it in)

Do we actually need this protection at all?  It's a tiny amount of code
and I can't see anything immediately that requires the CXL_PCI dependency
other than it's a bit pointless if that isn't here.

> +
>  #
>  # PCI Express ECRC
>  #
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 7a25b62d9e01..171a08fd8ebd 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -946,6 +946,65 @@ static bool find_source_device(struct pci_dev *parent,
>  	return true;
>  }
>  
> +#ifdef CONFIG_PCIEAER_CXL
> +
> +static bool is_cxl_mem_dev(struct pci_dev *dev)
> +{
> +	/*
> +	 * A CXL device is controlled only using PCIe Configuration
> +	 * Space of device 0, Function 0.

That's not true in general.   Definitely true that CXL protocol
error reporting is controlled only using this Devfn, but
more generally there could be other stuff in later functions.
So perhaps make the comment more specific.

> +	 */
> +	if (dev->devfn != PCI_DEVFN(0, 0))
> +		return false;
> +
> +	/* Right now there is only a CXL.mem driver */
> +	if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> +		return false;
> +
> +	return true;
> +}
> +
> +static bool is_internal_error(struct aer_err_info *info)
> +{
> +	if (info->severity == AER_CORRECTABLE)
> +		return info->status & PCI_ERR_COR_INTERNAL;
> +
> +	return info->status & PCI_ERR_UNC_INTN;
> +}
> +
> +static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info);
> +
> +static int cxl_handle_error_iter(struct pci_dev *dev, void *data)
> +{
> +	struct aer_err_info *e_info = (struct aer_err_info *)data;
> +
> +	if (!is_cxl_mem_dev(dev))
> +		return 0;
> +
> +	/* pci_dev_put() in handle_error_source() */
> +	dev = pci_dev_get(dev);
> +	if (dev)
> +		handle_error_source(dev, e_info);
> +
> +	return 0;
> +}
> +
> +static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> +{
> +	/*
> +	 * CXL downstream port errors are signaled as RCEC internal

Make this comment more specific (to RCH I think).

> +	 * errors. Forward them to all CXL devices below the RCEC.
> +	 */
> +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> +	    is_internal_error(info))
> +		pcie_walk_rcec(dev, cxl_handle_error_iter, info);
> +}
> +
> +#else
> +static inline void cxl_handle_error(struct pci_dev *dev,
> +				    struct aer_err_info *info) { }
> +#endif
> +
>  /**
>   * handle_error_source - handle logging error into an event log
>   * @dev: pointer to pci_dev data structure of error source device
> @@ -957,6 +1016,8 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
>  {
>  	int aer = dev->aer_cap;
>  
> +	cxl_handle_error(dev, info);
> +
>  	if (info->severity == AER_CORRECTABLE) {
>  		/*
>  		 * Correctable error does not need software intervention.
Robert Richter April 14, 2023, 2:35 p.m. UTC | #4
On 14.04.23 13:19:50, Jonathan Cameron wrote:
> On Tue, 11 Apr 2023 13:03:01 -0500
> Terry Bowman <terry.bowman@amd.com> wrote:
> 
> > From: Robert Richter <rrichter@amd.com>
> > 
> > In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> > RCiEP, but CXL downstream and upstream ports are not enumerated and
> > not visible in the PCIe hierarchy. Protocol and link errors are sent
> > to an RCEC.
> > 
> > Restricted CXL host (RCH) downstream port-detected errors are signaled
> > as internal AER errors, either Uncorrectable Internal Error (UIE) or
> > Corrected Internal Errors (CIE). The error source is the id of the
> > RCEC. A CXL handler must then inspect the error status in various CXL
> > registers residing in the dport's component register space (CXL RAS
> > cap) or the dport's RCRB (AER ext cap). [1]
> > 
> > Errors showing up in the RCEC's error handler must be handled and
> > connected to the CXL subsystem. Implement this by forwarding the error
> > to all CXL devices below the RCEC. Since the entire CXL device is
> > controlled only using PCIe Configuration Space of device 0, Function
> > 0, only pass it there [2]. These devices have the Memory Device class
> > code set (PCI_CLASS_MEMORY_CXL, 502h) and the existing cxl_pci driver
> > can implement the handler.
> 
> This comment implies only class code compliant drivers.  Sure we don't
> have drivers for anything else yet, but we should try to avoid saying
> there won't be any (which I think above implies).
> 
> You have a comment in the code, but maybe relaxing the description above
> to "currently support devices have..."

It is used here to identify CXL memory devices and limit the
enablement to those. The spec requires this to be set for CXL mem devs
(see cxl 3.0, 8.1.12.2).

There could be other CXL devices (e.g. cache), but other drivers are
not yet implemented. That is what I am referring to. The check makes
sure there is actually a driver with a handler for it (cxl_pci).

> 
> > In addition to errors directed to the CXL
> > endpoint device, the handler must also inspect the CXL downstream
> > port's CXL RAS and PCIe AER external capabilities that is connected to
> > the device.
> > 
> > Since CXL downstream port errors are signaled using internal errors,
> > the handler requires those errors to be unmasked. This is subject of a
> > follow-on patch.
> > 
> > The reason for choosing this implementation is that a CXL RCEC device
> > is bound to the AER port driver, but the driver does not allow it to
> > register a custom specific handler to support CXL. Connecting the RCEC
> > hard-wired with a CXL handler does not work, as the CXL subsystem
> > might not be present all the time. The alternative to add an
> > implementation to the portdrv to allow the registration of a custom
> > RCEC error handler isn't worth doing it as CXL would be its only user.
> > Instead, just check for an CXL RCEC and pass it down to the connected
> > CXL device's error handler. With this approach the code can entirely
> > be implemented in the PCIe AER driver and is independent of the CXL
> > subsystem. The CXL driver only provides the handler.
> > 
> > [1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
> > [2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices
> > 
> > Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> > Cc: "Oliver O'Halloran" <oohall@gmail.com>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: linux-pci@vger.kernel.org
> 
> Generally looks good to me.  A few trivial comments inline.
> 
> > ---
> >  drivers/pci/pcie/Kconfig |  8 ++++++
> >  drivers/pci/pcie/aer.c   | 61 ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 69 insertions(+)
> > 
> > diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> > index 228652a59f27..b0dbd864d3a3 100644
> > --- a/drivers/pci/pcie/Kconfig
> > +++ b/drivers/pci/pcie/Kconfig
> > @@ -49,6 +49,14 @@ config PCIEAER_INJECT
> >  	  gotten from:
> >  	     https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
> >  
> > +config PCIEAER_CXL
> > +	bool "PCI Express CXL RAS support"
> 
> Description makes this sound too general. I'd mentioned restricted
> hosts even in the menu option title.
> 
> 
> > +	default y
> > +	depends on PCIEAER && CXL_PCI
> > +	help
> > +	  This enables CXL error handling for Restricted CXL Hosts
> > +	  (RCHs).
> 
> Spec term is probably fine in the title, but in the help I'd 
> expand it as per the CXL 3.0 glossary to include
> "CXL Host that is operating in RCD mode."
> It might otherwise surprise people that this matters on their shiny
> new CXL X.0 host (because they found an old CXL 1.1 card in a box
> and decided to plug it in)
> 
> Do we actually need this protection at all?  It's a tiny amount of code
> and I can't see anything immediately that requires the CXL_PCI dependency
> other than it's a bit pointless if that isn't here.
> 
> > +
> >  #
> >  # PCI Express ECRC
> >  #
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index 7a25b62d9e01..171a08fd8ebd 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -946,6 +946,65 @@ static bool find_source_device(struct pci_dev *parent,
> >  	return true;
> >  }
> >  
> > +#ifdef CONFIG_PCIEAER_CXL
> > +
> > +static bool is_cxl_mem_dev(struct pci_dev *dev)
> > +{
> > +	/*
> > +	 * A CXL device is controlled only using PCIe Configuration
> > +	 * Space of device 0, Function 0.
> 
> That's not true in general.   Definitely true that CXL protocol
> error reporting is controlled only using this Devfn, but
> more generally there could be other stuff in later functions.
> So perhaps make the comment more specific.

I actually mean CXL device in RCD mode here (seen as RCiEP in the PCI
hierarchy).

The spec says (cxl 3.0, 8.1.3):

"""
In either case [(RCD and non-RCD)], the capability, status, and
control fields in Device 0, Function 0 DVSEC control the CXL
functionality of the entire device.
"""

So dev 0, func 0 must contain a CXL PCIe DVSEC. Thus it is a CXL
device and able to handle CXL AER errors. The limitation to the first
device prevents the handler from being run multiple times for the same
event.


> 
> > +	 */
> > +	if (dev->devfn != PCI_DEVFN(0, 0))
> > +		return false;
> > +
> > +	/* Right now there is only a CXL.mem driver */
> > +	if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> > +static bool is_internal_error(struct aer_err_info *info)
> > +{
> > +	if (info->severity == AER_CORRECTABLE)
> > +		return info->status & PCI_ERR_COR_INTERNAL;
> > +
> > +	return info->status & PCI_ERR_UNC_INTN;
> > +}
> > +
> > +static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info);
> > +
> > +static int cxl_handle_error_iter(struct pci_dev *dev, void *data)
> > +{
> > +	struct aer_err_info *e_info = (struct aer_err_info *)data;
> > +
> > +	if (!is_cxl_mem_dev(dev))
> > +		return 0;
> > +
> > +	/* pci_dev_put() in handle_error_source() */
> > +	dev = pci_dev_get(dev);
> > +	if (dev)
> > +		handle_error_source(dev, e_info);
> > +
> > +	return 0;
> > +}
> > +
> > +static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> > +{
> > +	/*
> > +	 * CXL downstream port errors are signaled as RCEC internal
> 
> Make this comment more specific (to RCH I think).

Right, same here, this is restricted mode only.

Thanks for review.

-Robert


> 
> > +	 * errors. Forward them to all CXL devices below the RCEC.
> > +	 */
> > +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> > +	    is_internal_error(info))
> > +		pcie_walk_rcec(dev, cxl_handle_error_iter, info);
> > +}
> > +
> > +#else
> > +static inline void cxl_handle_error(struct pci_dev *dev,
> > +				    struct aer_err_info *info) { }
> > +#endif
> > +
> >  /**
> >   * handle_error_source - handle logging error into an event log
> >   * @dev: pointer to pci_dev data structure of error source device
> > @@ -957,6 +1016,8 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
> >  {
> >  	int aer = dev->aer_cap;
> >  
> > +	cxl_handle_error(dev, info);
> > +
> >  	if (info->severity == AER_CORRECTABLE) {
> >  		/*
> >  		 * Correctable error does not need software intervention.
>
Bjorn Helgaas April 14, 2023, 9:32 p.m. UTC | #5
On Thu, Apr 13, 2023 at 01:40:52PM +0200, Robert Richter wrote:
> On 12.04.23 17:02:33, Bjorn Helgaas wrote:
> > On Tue, Apr 11, 2023 at 01:03:01PM -0500, Terry Bowman wrote:
> > > From: Robert Richter <rrichter@amd.com>

> ...
> Let's assume just a simple CXL RCH topology:
> 
> PCI hierarchy:
> 
>               -----------------
>               | ACPI0016      |--------------       Host bridge (CXL host)
>               | - CEDT        |             |
>    -----------|   - RCRB base |             |
>    |          -----------------             :
>    |               |
>    |               |
>    |          -------------------     ---------
>    |          | RCiEP           |.....| RCEC  |     Endpoint (CXL dev)
>    |  --------| - BDF           |     | - BDF |
>    |  |       | - PCIe AER      |     ---------
>    |  |       | - CXL dvsec     |
>    |  |       |   (v2: reg loc) |
>    |  |       |   - Comp regs   |
>    |  |       |     - CXL RAS   |
>    |  |       -------------------
>    :  :
>       
> CXL hierarchy:
> 
>    :                                        :
>    :          ------------------            |
>    |          | CXL root port  |<------------
>    |          |                |        
>    |--------->| - dport RCRB   |<------------
>    |          |   - PCIe AER   |            |
>    |          |   - Comp regs  |            |
>    |          |     - CXL RAS  |            |
>    |          ------------------            |
>    |  :                                     |
>    |  |       ------------------            |
>    |  ------->| CXL endpoint   |-------------
>    |          | (v1: RCRB)     |
>    ---------->| - uport RCRB   |
>               |   - Comp regs  |
>               |     - CXL RAS  |
>               ------------------
> 
> Dport detected errors are reported using PCIe AER and CXL RAS caps in
> the dports RCRB.
> 
> Uport detected errors are reported using RCiEP's PCIe AER cap and
> either the uport's RCRB RAS cap or the RAS cap of the comp regs
> located using CXL DVSEC register locator.
> 
> In all cases the RCEC is used with either the RCEC (dport errors) or
> the RCiEP (uport errors) error source id (BDF: bus, dev, func).

I'm mostly interested in the PCI entities involved because that's all
aer.c can deal with.  For the above, I think the PCI core only knows
about these:

  00:00.0 RCEC  with AER, RCEC EA includes 00:01.0
  00:01.0 RCiEP with AER

aer_irq() would handle AER interrupts from 00:00.0.
cxl_handle_error() would be called for 00:00.0 and would call
handle_error_source() for everything below it (only 00:01.0 here).

> > The current code uses pcie_walk_rcec() in this path, which basically
> > searches below a Root Port or RCEC for devices that have an AER error
> > status bit set, add them to the e_info[] list, and call
> > handle_error_source() for each one:
> 
> For reference, this series adds support to handle RCH downstream
> port-detected errors as described in CXL 3.0, 12.2.1.1.
> 
> This flow looks correct to me, see comments inline.

We seem to be on the same page here, so I'll trim it out.

> ...
> > So we insert cxl_handle_error() in handle_error_source(), where it
> > gets called for the RCEC, and then it uses pcie_walk_rcec() again to
> > forcibly call handle_error_source() for *every* device "below" the
> > RCEC (even though they don't have AER error status bits set).
> 
> The CXL device contains the links to the dport's caps. Also, there can
> be multiple RCs with CXL devs connected to it. So we must search for
> all CXL devices now, determine the corresponding dport and inspect
> both, PCIe AER and CXL RAS caps.
> 
> > Then handle_error_source() ultimately calls the CXL driver err_handler
> > entry points (.cor_error_detected(), .error_detected(), etc), which
> > can look at the CXL-specific error status in the CXL RAS or RCRB or
> > whatever.
> 
> The AER driver (portdrv) does not have the knowledge of CXL internals.
> Thus the approach is to pass dport errors to the cxl_mem driver to
> handle it there in addition to cxl mem dev errors.
> 
> > So this basically looks like a workaround for the fact that the AER
> > code only calls handle_error_source() when it finds AER error status,
> > and CXL doesn't *set* that AER error status.  There's not that much
> > code here, but it seems like a quite a bit of complexity in an area
> > that is already pretty complicated.

My main point here (correct me if I got this wrong) is that:

  - A RCEC generates an AER interrupt

  - find_source_device() searches all devices below the RCEC and
    builds a list everything for which to call handle_error_source()

  - cxl_handle_error() *again* looks at all devices below the same
    RCEC and calls handle_error_source() for each one

So the main difference here is that the existing flow only calls
handle_error_source() when it finds an error logged in an AER status
register, while the new CXL flow calls handle_error_source() for
*every* device below the RCEC.

I think it's OK to do that, but the almost recursive structure and the
unusual reference counting make the overall AER flow much harder to
understand.

What if we changed is_error_source() to add every CXL.mem device it
finds to the e_info[] list, which I think could nicely encapsulate the
idea that "CXL devices have error state we don't know how to interpret
here"?  Would the existing loop in aer_process_err_devices() then do
what you need?

> > Here's another idea: the ACPI GHES code (ghes_handle_aer()) basically
> > receives a packet of error status from firmware and queues it for
> > recovery via pcie_do_recovery().  What if you had a CXL module that
> > knew how to look for the CXL error status, package it up similarly,
> > and queue it via aer_recover_queue()?
> 
> ...
> But first, RCEC error notifications (RCEC AER interrupts) must be sent
> to the CXL driver to look into the dport's RCRB.

Right.  I think it could be solvable to have aer_irq() call or wake a
CXL interface that has been registered.  But maybe changing
is_error_source() would be simpler.

Bjorn
Jonathan Cameron April 17, 2023, 4:54 p.m. UTC | #6
On Fri, 14 Apr 2023 16:35:05 +0200
Robert Richter <rrichter@amd.com> wrote:

> On 14.04.23 13:19:50, Jonathan Cameron wrote:
> > On Tue, 11 Apr 2023 13:03:01 -0500
> > Terry Bowman <terry.bowman@amd.com> wrote:
> >   
> > > From: Robert Richter <rrichter@amd.com>
> > > 
> > > In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> > > RCiEP, but CXL downstream and upstream ports are not enumerated and
> > > not visible in the PCIe hierarchy. Protocol and link errors are sent
> > > to an RCEC.
> > > 
> > > Restricted CXL host (RCH) downstream port-detected errors are signaled
> > > as internal AER errors, either Uncorrectable Internal Error (UIE) or
> > > Corrected Internal Errors (CIE). The error source is the id of the
> > > RCEC. A CXL handler must then inspect the error status in various CXL
> > > registers residing in the dport's component register space (CXL RAS
> > > cap) or the dport's RCRB (AER ext cap). [1]
> > > 
> > > Errors showing up in the RCEC's error handler must be handled and
> > > connected to the CXL subsystem. Implement this by forwarding the error
> > > to all CXL devices below the RCEC. Since the entire CXL device is
> > > controlled only using PCIe Configuration Space of device 0, Function
> > > 0, only pass it there [2]. These devices have the Memory Device class
> > > code set (PCI_CLASS_MEMORY_CXL, 502h) and the existing cxl_pci driver
> > > can implement the handler.  
> > 
> > This comment implies only class code compliant drivers.  Sure we don't
> > have drivers for anything else yet, but we should try to avoid saying
> > there won't be any (which I think above implies).
> > 
> > You have a comment in the code, but maybe relaxing the description above
> > to "currently support devices have..."  
> 
> It is used here to identify CXL memory devices and limit the
> enablement to those. The spec requires this to be set for CXL mem devs
> (see cxl 3.0, 8.1.12.2).
> 
> There could be other CXL devices (e.g. cache), but other drivers are
> not yet implemented. That is what I am referring to. The check makes
> sure there is actually a driver with a handler for it (cxl_pci).

Understood on intent. My worry is that the above can be read as a
statement on hardware restrictions, rathe than on what software currently
implements.  Meh. Minor point so I don't care that much!
Unlikely anyone will read the patch description after it merges anyway ;)

> 
> >   
> > > In addition to errors directed to the CXL
> > > endpoint device, the handler must also inspect the CXL downstream
> > > port's CXL RAS and PCIe AER external capabilities that is connected to
> > > the device.
> > > 
> > > Since CXL downstream port errors are signaled using internal errors,
> > > the handler requires those errors to be unmasked. This is subject of a
> > > follow-on patch.
> > > 
> > > The reason for choosing this implementation is that a CXL RCEC device
> > > is bound to the AER port driver, but the driver does not allow it to
> > > register a custom specific handler to support CXL. Connecting the RCEC
> > > hard-wired with a CXL handler does not work, as the CXL subsystem
> > > might not be present all the time. The alternative to add an
> > > implementation to the portdrv to allow the registration of a custom
> > > RCEC error handler isn't worth doing it as CXL would be its only user.
> > > Instead, just check for an CXL RCEC and pass it down to the connected
> > > CXL device's error handler. With this approach the code can entirely
> > > be implemented in the PCIe AER driver and is independent of the CXL
> > > subsystem. The CXL driver only provides the handler.
> > > 
> > > [1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
> > > [2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices
> > > 
> > > Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> > > Cc: "Oliver O'Halloran" <oohall@gmail.com>
> > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > Cc: linux-pci@vger.kernel.org  
> > 
> > Generally looks good to me.  A few trivial comments inline.
> >   
> > > ---
> > >  drivers/pci/pcie/Kconfig |  8 ++++++
> > >  drivers/pci/pcie/aer.c   | 61 ++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 69 insertions(+)
> > > 
> > > diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> > > index 228652a59f27..b0dbd864d3a3 100644
> > > --- a/drivers/pci/pcie/Kconfig
> > > +++ b/drivers/pci/pcie/Kconfig
> > > @@ -49,6 +49,14 @@ config PCIEAER_INJECT
> > >  	  gotten from:
> > >  	     https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
> > >  
> > > +config PCIEAER_CXL
> > > +	bool "PCI Express CXL RAS support"  
> > 
> > Description makes this sound too general. I'd mentioned restricted
> > hosts even in the menu option title.
> > 
> >   
> > > +	default y
> > > +	depends on PCIEAER && CXL_PCI
> > > +	help
> > > +	  This enables CXL error handling for Restricted CXL Hosts
> > > +	  (RCHs).  
> > 
> > Spec term is probably fine in the title, but in the help I'd 
> > expand it as per the CXL 3.0 glossary to include
> > "CXL Host that is operating in RCD mode."
> > It might otherwise surprise people that this matters on their shiny
> > new CXL X.0 host (because they found an old CXL 1.1 card in a box
> > and decided to plug it in)
> > 
> > Do we actually need this protection at all?  It's a tiny amount of code
> > and I can't see anything immediately that requires the CXL_PCI dependency
> > other than it's a bit pointless if that isn't here.
> >   
> > > +
> > >  #
> > >  # PCI Express ECRC
> > >  #
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index 7a25b62d9e01..171a08fd8ebd 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -946,6 +946,65 @@ static bool find_source_device(struct pci_dev *parent,
> > >  	return true;
> > >  }
> > >  
> > > +#ifdef CONFIG_PCIEAER_CXL
> > > +
> > > +static bool is_cxl_mem_dev(struct pci_dev *dev)
> > > +{
> > > +	/*
> > > +	 * A CXL device is controlled only using PCIe Configuration
> > > +	 * Space of device 0, Function 0.  
> > 
> > That's not true in general.   Definitely true that CXL protocol
> > error reporting is controlled only using this Devfn, but
> > more generally there could be other stuff in later functions.
> > So perhaps make the comment more specific.  
> 
> I actually mean CXL device in RCD mode here (seen as RCiEP in the PCI
> hierarchy).
> 
> The spec says (cxl 3.0, 8.1.3):
> 
> """
> In either case [(RCD and non-RCD)], the capability, status, and
> control fields in Device 0, Function 0 DVSEC control the CXL
> functionality of the entire device.

> """
> 
> So dev 0, func 0 must contain a CXL PCIe DVSEC. Thus it is a CXL
> device and able to handle CXL AER errors. The limitation to the first
> device prevents the handler from being run multiple times for the same
> event.

Fine with limitation.  Text says "device is controlled only using".
That is true for what you are controlling here, but other aspects of the
device are controlled via whatever interface they like.

Perhaps just quote the specification as you have done in your reply. Then it
is clear that we mean just these registers.


> 
> 
> >   
> > > +	 */
> > > +	if (dev->devfn != PCI_DEVFN(0, 0))
> > > +		return false;
> > > +
> > > +	/* Right now there is only a CXL.mem driver */
> > > +	if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> > > +		return false;
> > > +
> > > +	return true;
> > > +}
> > > +
> > > +static bool is_internal_error(struct aer_err_info *info)
> > > +{
> > > +	if (info->severity == AER_CORRECTABLE)
> > > +		return info->status & PCI_ERR_COR_INTERNAL;
> > > +
> > > +	return info->status & PCI_ERR_UNC_INTN;
> > > +}
> > > +
> > > +static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info);
> > > +
> > > +static int cxl_handle_error_iter(struct pci_dev *dev, void *data)
> > > +{
> > > +	struct aer_err_info *e_info = (struct aer_err_info *)data;
> > > +
> > > +	if (!is_cxl_mem_dev(dev))
> > > +		return 0;
> > > +
> > > +	/* pci_dev_put() in handle_error_source() */
> > > +	dev = pci_dev_get(dev);
> > > +	if (dev)
> > > +		handle_error_source(dev, e_info);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> > > +{
> > > +	/*
> > > +	 * CXL downstream port errors are signaled as RCEC internal  
> > 
> > Make this comment more specific (to RCH I think).  
> 
> Right, same here, this is restricted mode only.
> 
> Thanks for review.
> 
> -Robert
> 
> 
> >   
> > > +	 * errors. Forward them to all CXL devices below the RCEC.
> > > +	 */
> > > +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> > > +	    is_internal_error(info))
> > > +		pcie_walk_rcec(dev, cxl_handle_error_iter, info);
> > > +}
> > > +
> > > +#else
> > > +static inline void cxl_handle_error(struct pci_dev *dev,
> > > +				    struct aer_err_info *info) { }
> > > +#endif
> > > +
> > >  /**
> > >   * handle_error_source - handle logging error into an event log
> > >   * @dev: pointer to pci_dev data structure of error source device
> > > @@ -957,6 +1016,8 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
> > >  {
> > >  	int aer = dev->aer_cap;
> > >  
> > > +	cxl_handle_error(dev, info);
> > > +
> > >  	if (info->severity == AER_CORRECTABLE) {
> > >  		/*
> > >  		 * Correctable error does not need software intervention.  
> >
Robert Richter April 17, 2023, 8:36 p.m. UTC | #7
Hi Jonathan,

On 17.04.23 17:54:31, Jonathan Cameron wrote:
> On Fri, 14 Apr 2023 16:35:05 +0200
> Robert Richter <rrichter@amd.com> wrote:
> 
> > On 14.04.23 13:19:50, Jonathan Cameron wrote:
> > > On Tue, 11 Apr 2023 13:03:01 -0500
> > > Terry Bowman <terry.bowman@amd.com> wrote:
> > >   
> > > > From: Robert Richter <rrichter@amd.com>
> > > > 
> > > > In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> > > > RCiEP, but CXL downstream and upstream ports are not enumerated and
> > > > not visible in the PCIe hierarchy. Protocol and link errors are sent
> > > > to an RCEC.
> > > > 
> > > > Restricted CXL host (RCH) downstream port-detected errors are signaled
> > > > as internal AER errors, either Uncorrectable Internal Error (UIE) or
> > > > Corrected Internal Errors (CIE). The error source is the id of the
> > > > RCEC. A CXL handler must then inspect the error status in various CXL
> > > > registers residing in the dport's component register space (CXL RAS
> > > > cap) or the dport's RCRB (AER ext cap). [1]
> > > > 
> > > > Errors showing up in the RCEC's error handler must be handled and
> > > > connected to the CXL subsystem. Implement this by forwarding the error
> > > > to all CXL devices below the RCEC. Since the entire CXL device is
> > > > controlled only using PCIe Configuration Space of device 0, Function
> > > > 0, only pass it there [2]. These devices have the Memory Device class
> > > > code set (PCI_CLASS_MEMORY_CXL, 502h) and the existing cxl_pci driver
> > > > can implement the handler.  
> > > 
> > > This comment implies only class code compliant drivers.  Sure we don't
> > > have drivers for anything else yet, but we should try to avoid saying
> > > there won't be any (which I think above implies).
> > > 
> > > You have a comment in the code, but maybe relaxing the description above
> > > to "currently support devices have..."  
> > 
> > It is used here to identify CXL memory devices and limit the
> > enablement to those. The spec requires this to be set for CXL mem devs
> > (see cxl 3.0, 8.1.12.2).
> > 
> > There could be other CXL devices (e.g. cache), but other drivers are
> > not yet implemented. That is what I am referring to. The check makes
> > sure there is actually a driver with a handler for it (cxl_pci).
> 
> Understood on intent. My worry is that the above can be read as a
> statement on hardware restrictions, rathe than on what software currently
> implements.  Meh. Minor point so I don't care that much!
> Unlikely anyone will read the patch description after it merges anyway ;)

I have updated the description ...

> > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > > index 7a25b62d9e01..171a08fd8ebd 100644
> > > > --- a/drivers/pci/pcie/aer.c
> > > > +++ b/drivers/pci/pcie/aer.c
> > > > @@ -946,6 +946,65 @@ static bool find_source_device(struct pci_dev *parent,
> > > >  	return true;
> > > >  }
> > > >  
> > > > +#ifdef CONFIG_PCIEAER_CXL
> > > > +
> > > > +static bool is_cxl_mem_dev(struct pci_dev *dev)
> > > > +{
> > > > +	/*
> > > > +	 * A CXL device is controlled only using PCIe Configuration
> > > > +	 * Space of device 0, Function 0.  
> > > 
> > > That's not true in general.   Definitely true that CXL protocol
> > > error reporting is controlled only using this Devfn, but
> > > more generally there could be other stuff in later functions.
> > > So perhaps make the comment more specific.  
> > 
> > I actually mean CXL device in RCD mode here (seen as RCiEP in the PCI
> > hierarchy).
> > 
> > The spec says (cxl 3.0, 8.1.3):
> > 
> > """
> > In either case [(RCD and non-RCD)], the capability, status, and
> > control fields in Device 0, Function 0 DVSEC control the CXL
> > functionality of the entire device.
> 
> > """
> > 
> > So dev 0, func 0 must contain a CXL PCIe DVSEC. Thus it is a CXL
> > device and able to handle CXL AER errors. The limitation to the first
> > device prevents the handler from being run multiple times for the same
> > event.
> 
> Fine with limitation.  Text says "device is controlled only using".
> That is true for what you are controlling here, but other aspects of the
> device are controlled via whatever interface they like.
> 
> Perhaps just quote the specification as you have done in your reply. Then it
> is clear that we mean just these registers.

... and comments.

Thanks,

-Robert
Robert Richter April 17, 2023, 10 p.m. UTC | #8
On 14.04.23 16:32:54, Bjorn Helgaas wrote:
> On Thu, Apr 13, 2023 at 01:40:52PM +0200, Robert Richter wrote:
> > On 12.04.23 17:02:33, Bjorn Helgaas wrote:
> > > On Tue, Apr 11, 2023 at 01:03:01PM -0500, Terry Bowman wrote:
> > > > From: Robert Richter <rrichter@amd.com>
> 
> > ...
> > Let's assume just a simple CXL RCH topology:
> > 
> > PCI hierarchy:
> > 
> >               -----------------
> >               | ACPI0016      |--------------       Host bridge (CXL host)
> >               | - CEDT        |             |
> >    -----------|   - RCRB base |             |
> >    |          -----------------             :
> >    |               |
> >    |               |
> >    |          -------------------     ---------
> >    |          | RCiEP           |.....| RCEC  |     Endpoint (CXL dev)
> >    |  --------| - BDF           |     | - BDF |
> >    |  |       | - PCIe AER      |     ---------
> >    |  |       | - CXL dvsec     |
> >    |  |       |   (v2: reg loc) |
> >    |  |       |   - Comp regs   |
> >    |  |       |     - CXL RAS   |
> >    |  |       -------------------
> >    :  :
> >       
> > CXL hierarchy:
> > 
> >    :                                        :
> >    :          ------------------            |
> >    |          | CXL root port  |<------------
> >    |          |                |        
> >    |--------->| - dport RCRB   |<------------
> >    |          |   - PCIe AER   |            |
> >    |          |   - Comp regs  |            |
> >    |          |     - CXL RAS  |            |
> >    |          ------------------            |
> >    |  :                                     |
> >    |  |       ------------------            |
> >    |  ------->| CXL endpoint   |-------------
> >    |          | (v1: RCRB)     |
> >    ---------->| - uport RCRB   |
> >               |   - Comp regs  |
> >               |     - CXL RAS  |
> >               ------------------
> > 
> > Dport detected errors are reported using PCIe AER and CXL RAS caps in
> > the dports RCRB.
> > 
> > Uport detected errors are reported using RCiEP's PCIe AER cap and
> > either the uport's RCRB RAS cap or the RAS cap of the comp regs
> > located using CXL DVSEC register locator.
> > 
> > In all cases the RCEC is used with either the RCEC (dport errors) or
> > the RCiEP (uport errors) error source id (BDF: bus, dev, func).
> 
> I'm mostly interested in the PCI entities involved because that's all
> aer.c can deal with.  For the above, I think the PCI core only knows
> about these:
> 
>   00:00.0 RCEC  with AER, RCEC EA includes 00:01.0
>   00:01.0 RCiEP with AER
> 
> aer_irq() would handle AER interrupts from 00:00.0.
> cxl_handle_error() would be called for 00:00.0 and would call
> handle_error_source() for everything below it (only 00:01.0 here).
> 
> > > The current code uses pcie_walk_rcec() in this path, which basically
> > > searches below a Root Port or RCEC for devices that have an AER error
> > > status bit set, add them to the e_info[] list, and call
> > > handle_error_source() for each one:
> > 
> > For reference, this series adds support to handle RCH downstream
> > port-detected errors as described in CXL 3.0, 12.2.1.1.
> > 
> > This flow looks correct to me, see comments inline.
> 
> We seem to be on the same page here, so I'll trim it out.
> 
> > ...
> > > So we insert cxl_handle_error() in handle_error_source(), where it
> > > gets called for the RCEC, and then it uses pcie_walk_rcec() again to
> > > forcibly call handle_error_source() for *every* device "below" the
> > > RCEC (even though they don't have AER error status bits set).
> > 
> > The CXL device contains the links to the dport's caps. Also, there can
> > be multiple RCs with CXL devs connected to it. So we must search for
> > all CXL devices now, determine the corresponding dport and inspect
> > both, PCIe AER and CXL RAS caps.
> > 
> > > Then handle_error_source() ultimately calls the CXL driver err_handler
> > > entry points (.cor_error_detected(), .error_detected(), etc), which
> > > can look at the CXL-specific error status in the CXL RAS or RCRB or
> > > whatever.
> > 
> > The AER driver (portdrv) does not have the knowledge of CXL internals.
> > Thus the approach is to pass dport errors to the cxl_mem driver to
> > handle it there in addition to cxl mem dev errors.
> > 
> > > So this basically looks like a workaround for the fact that the AER
> > > code only calls handle_error_source() when it finds AER error status,
> > > and CXL doesn't *set* that AER error status.  There's not that much
> > > code here, but it seems like a quite a bit of complexity in an area
> > > that is already pretty complicated.
> 
> My main point here (correct me if I got this wrong) is that:
> 
>   - A RCEC generates an AER interrupt
> 
>   - find_source_device() searches all devices below the RCEC and
>     builds a list everything for which to call handle_error_source()

find_source_device() does not walk the RCEC if the error source is the
RCEC itself (note that find_device_iter() is called for the root/rcec
device first and exits early then).

> 
>   - cxl_handle_error() *again* looks at all devices below the same
>     RCEC and calls handle_error_source() for each one
> 
> So the main difference here is that the existing flow only calls
> handle_error_source() when it finds an error logged in an AER status
> register, while the new CXL flow calls handle_error_source() for
> *every* device below the RCEC.

That is limited as much as possible:

 * The RCEC walk to handle CXL dport errors is done only in case of
   internal errors, for an RCEC only (not a port) (check in
   cxl_handle_error()).

 * Internal errors are only enabled for RCECs connected to CXL devices
   (handles_cxl_errors()).

 * The handler is only called if it is a CXL memory device (class code
   set and zero devfn) (check in cxl_handle_error_iter()).

An optimization I see here is to convert some runtime checks to cached
values determined during device enumeration (CXL device list, RCEC is
associated with CXL devices). Some sort of RCEC-to-CXL-dev
association, similar to rcec->rcec_ea.

> 
> I think it's OK to do that, but the almost recursive structure and the
> unusual reference counting make the overall AER flow much harder to
> understand.
> 
> What if we changed is_error_source() to add every CXL.mem device it
> finds to the e_info[] list, which I think could nicely encapsulate the
> idea that "CXL devices have error state we don't know how to interpret
> here"?  Would the existing loop in aer_process_err_devices() then do
> what you need?

I did not want to mix this with devices determined by the Error Source
Identification Register. CXL device may not be the error source of an
error which may cause some unwanted side-effects. We must also touch
AER_MAX_MULTI_ERR_DEVICES then and how the dev list is implemented as
the max number of devices is unclear.

> 
> > > Here's another idea: the ACPI GHES code (ghes_handle_aer()) basically
> > > receives a packet of error status from firmware and queues it for
> > > recovery via pcie_do_recovery().  What if you had a CXL module that
> > > knew how to look for the CXL error status, package it up similarly,
> > > and queue it via aer_recover_queue()?
> > 
> > ...
> > But first, RCEC error notifications (RCEC AER interrupts) must be sent
> > to the CXL driver to look into the dport's RCRB.
> 
> Right.  I think it could be solvable to have aer_irq() call or wake a
> CXL interface that has been registered.  But maybe changing
> is_error_source() would be simpler.

I am going to see if is_error_source() can be used to also find CXL
devices. But my main concern here is to mix CXL devices with actual
devices identified by the Error Source ID.

Thanks,

-Robert
Dan Williams April 18, 2023, 1:01 a.m. UTC | #9
Terry Bowman wrote:
> From: Robert Richter <rrichter@amd.com>
> 
> In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> RCiEP, but CXL downstream and upstream ports are not enumerated and
> not visible in the PCIe hierarchy. Protocol and link errors are sent
> to an RCEC.
> 
> Restricted CXL host (RCH) downstream port-detected errors are signaled
> as internal AER errors, either Uncorrectable Internal Error (UIE) or
> Corrected Internal Errors (CIE). The error source is the id of the
> RCEC. A CXL handler must then inspect the error status in various CXL
> registers residing in the dport's component register space (CXL RAS
> cap) or the dport's RCRB (AER ext cap). [1]
> 
> Errors showing up in the RCEC's error handler must be handled and
> connected to the CXL subsystem. Implement this by forwarding the error
> to all CXL devices below the RCEC. Since the entire CXL device is
> controlled only using PCIe Configuration Space of device 0, Function
> 0, only pass it there [2]. These devices have the Memory Device class
> code set (PCI_CLASS_MEMORY_CXL, 502h) and the existing cxl_pci driver
> can implement the handler. In addition to errors directed to the CXL
> endpoint device, the handler must also inspect the CXL downstream
> port's CXL RAS and PCIe AER external capabilities that is connected to
> the device.
> 
> Since CXL downstream port errors are signaled using internal errors,
> the handler requires those errors to be unmasked. This is subject of a
> follow-on patch.
> 
> The reason for choosing this implementation is that a CXL RCEC device
> is bound to the AER port driver, but the driver does not allow it to
> register a custom specific handler to support CXL. Connecting the RCEC
> hard-wired with a CXL handler does not work, as the CXL subsystem
> might not be present all the time. The alternative to add an
> implementation to the portdrv to allow the registration of a custom
> RCEC error handler isn't worth doing it as CXL would be its only user.
> Instead, just check for an CXL RCEC and pass it down to the connected
> CXL device's error handler. With this approach the code can entirely
> be implemented in the PCIe AER driver and is independent of the CXL
> subsystem. The CXL driver only provides the handler.
> 
> [1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
> [2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices
> 
> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Cc: "Oliver O'Halloran" <oohall@gmail.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-pci@vger.kernel.org
> ---
>  drivers/pci/pcie/Kconfig |  8 ++++++
>  drivers/pci/pcie/aer.c   | 61 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 69 insertions(+)
> 
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 228652a59f27..b0dbd864d3a3 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -49,6 +49,14 @@ config PCIEAER_INJECT
>  	  gotten from:
>  	     https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
>  
> +config PCIEAER_CXL
> +	bool "PCI Express CXL RAS support"
> +	default y
> +	depends on PCIEAER && CXL_PCI
> +	help
> +	  This enables CXL error handling for Restricted CXL Hosts
> +	  (RCHs).
> +
>  #
>  # PCI Express ECRC
>  #
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 7a25b62d9e01..171a08fd8ebd 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -946,6 +946,65 @@ static bool find_source_device(struct pci_dev *parent,
>  	return true;
>  }
>  
> +#ifdef CONFIG_PCIEAER_CXL
> +
> +static bool is_cxl_mem_dev(struct pci_dev *dev)
> +{
> +	/*
> +	 * A CXL device is controlled only using PCIe Configuration
> +	 * Space of device 0, Function 0.
> +	 */
> +	if (dev->devfn != PCI_DEVFN(0, 0))
> +		return false;
> +
> +	/* Right now there is only a CXL.mem driver */
> +	if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> +		return false;
> +
> +	return true;
> +}

This part feels broken because most the errors of concern here are CXL
link generic and that can involve CXL.cache and CXL.mem errors on
devices that are not PCI_CLASS_MEMORY_CXL. This situation feels like it
wants formal acknowledgement in 'struct pci_dev' that CXL links ride on
top of PCIe links.

If it were not for RCRBs then the PCI core could just do:

    dvsec = pci_find_dvsec_capability(pdev, PCI_DVSEC_VENDOR_ID_CXL,
                                      CXL_DVSEC_FLEXBUS_PORT);

...at bus scan time to identify devices with active CXL links. RCRBs
unfortunately make it so the link presence can not be detected until a
CXL driver is loaded to read that DVSEC out of MMIO space.

However, I still think that looks like a CXL aware driver registering a
'struct cxl_link' (for lack of a better name) object with a
corresponding PCI device. That link can indicate whether this is an RCH
topology and whether it needs to do the RCEC walk, and that registration
event can flag the RCEC has having CXL link duties to attend to on AER
events.

I suspect 'struct cxl_link' can also be used if/when we get to
incoporating CXL Reset into PCI reset handling.

> +
> +static bool is_internal_error(struct aer_err_info *info)
> +{
> +	if (info->severity == AER_CORRECTABLE)
> +		return info->status & PCI_ERR_COR_INTERNAL;
> +
> +	return info->status & PCI_ERR_UNC_INTN;
> +}
> +
> +static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info);
> +
> +static int cxl_handle_error_iter(struct pci_dev *dev, void *data)
> +{
> +	struct aer_err_info *e_info = (struct aer_err_info *)data;
> +
> +	if (!is_cxl_mem_dev(dev))
> +		return 0;


I assume this also needs to reference the RDPAS if present?

CXL 3.0 9.17.1.5 RCEC Downstream Port Association Structure (RDPAS)

> +
> +	/* pci_dev_put() in handle_error_source() */
> +	dev = pci_dev_get(dev);
> +	if (dev)
> +		handle_error_source(dev, e_info);

I went looking but missed where does handle_error_source() synchronize
against driver ->remove()?

> +
> +	return 0;
> +}
> +
> +static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)

Naming suggestion...

Given that the VH topology does not require this scanning and
assoication step, lets call this cxl_rch_handle_error() to make it clear
this is only here to undo the awkwardness of CXL 1.1 platforms hiding
registers from typical PCI scanning. A reference to:

CXL 3.0 9.11.8 CXL Devices Attached to an RCH

...might be useful to a future reader that wonders why the CXL RCH case
is so complicated from an AER perspective.
Robert Richter April 19, 2023, 1:30 p.m. UTC | #10
Dan,

thanks for review, see comments inline.

On 17.04.23 18:01:41, Dan Williams wrote:
> Terry Bowman wrote:
> > From: Robert Richter <rrichter@amd.com>
> > 
> > In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> > RCiEP, but CXL downstream and upstream ports are not enumerated and
> > not visible in the PCIe hierarchy. Protocol and link errors are sent
> > to an RCEC.
> > 
> > Restricted CXL host (RCH) downstream port-detected errors are signaled
> > as internal AER errors, either Uncorrectable Internal Error (UIE) or
> > Corrected Internal Errors (CIE). The error source is the id of the
> > RCEC. A CXL handler must then inspect the error status in various CXL
> > registers residing in the dport's component register space (CXL RAS
> > cap) or the dport's RCRB (AER ext cap). [1]
> > 
> > Errors showing up in the RCEC's error handler must be handled and
> > connected to the CXL subsystem. Implement this by forwarding the error
> > to all CXL devices below the RCEC. Since the entire CXL device is
> > controlled only using PCIe Configuration Space of device 0, Function
> > 0, only pass it there [2]. These devices have the Memory Device class
> > code set (PCI_CLASS_MEMORY_CXL, 502h) and the existing cxl_pci driver
> > can implement the handler. In addition to errors directed to the CXL
> > endpoint device, the handler must also inspect the CXL downstream
> > port's CXL RAS and PCIe AER external capabilities that is connected to
> > the device.
> > 
> > Since CXL downstream port errors are signaled using internal errors,
> > the handler requires those errors to be unmasked. This is subject of a
> > follow-on patch.
> > 
> > The reason for choosing this implementation is that a CXL RCEC device
> > is bound to the AER port driver, but the driver does not allow it to
> > register a custom specific handler to support CXL. Connecting the RCEC
> > hard-wired with a CXL handler does not work, as the CXL subsystem
> > might not be present all the time. The alternative to add an
> > implementation to the portdrv to allow the registration of a custom
> > RCEC error handler isn't worth doing it as CXL would be its only user.
> > Instead, just check for an CXL RCEC and pass it down to the connected
> > CXL device's error handler. With this approach the code can entirely
> > be implemented in the PCIe AER driver and is independent of the CXL
> > subsystem. The CXL driver only provides the handler.
> > 
> > [1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
> > [2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices
> > 
> > Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> > Cc: "Oliver O'Halloran" <oohall@gmail.com>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: linux-pci@vger.kernel.org
> > ---
> >  drivers/pci/pcie/Kconfig |  8 ++++++
> >  drivers/pci/pcie/aer.c   | 61 ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 69 insertions(+)
> > 
> > diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> > index 228652a59f27..b0dbd864d3a3 100644
> > --- a/drivers/pci/pcie/Kconfig
> > +++ b/drivers/pci/pcie/Kconfig
> > @@ -49,6 +49,14 @@ config PCIEAER_INJECT
> >  	  gotten from:
> >  	     https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
> >  
> > +config PCIEAER_CXL
> > +	bool "PCI Express CXL RAS support"
> > +	default y
> > +	depends on PCIEAER && CXL_PCI
> > +	help
> > +	  This enables CXL error handling for Restricted CXL Hosts
> > +	  (RCHs).
> > +
> >  #
> >  # PCI Express ECRC
> >  #
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index 7a25b62d9e01..171a08fd8ebd 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -946,6 +946,65 @@ static bool find_source_device(struct pci_dev *parent,
> >  	return true;
> >  }
> >  
> > +#ifdef CONFIG_PCIEAER_CXL
> > +
> > +static bool is_cxl_mem_dev(struct pci_dev *dev)
> > +{
> > +	/*
> > +	 * A CXL device is controlled only using PCIe Configuration
> > +	 * Space of device 0, Function 0.
> > +	 */
> > +	if (dev->devfn != PCI_DEVFN(0, 0))
> > +		return false;
> > +
> > +	/* Right now there is only a CXL.mem driver */
> > +	if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
> > +		return false;
> > +
> > +	return true;
> > +}
> 
> This part feels broken because most the errors of concern here are CXL
> link generic and that can involve CXL.cache and CXL.mem errors on
> devices that are not PCI_CLASS_MEMORY_CXL. This situation feels like it
> wants formal acknowledgement in 'struct pci_dev' that CXL links ride on
> top of PCIe links.

There is already rcec->rcec_ea that holds the RCEC-to-endpoint
association. Determining if the RCiEP is a CXL dev is a small check
which is exactly what is_cxl_mem_dev() is for. I don't see a benefit
in holding the same information in an additional cxl_link structure.

And as you also said below, for RCRB handling a CXL driver is needed
which is why is_cxl_mem_dev() with the class check is used below.

> 
> If it were not for RCRBs then the PCI core could just do:
> 
>     dvsec = pci_find_dvsec_capability(pdev, PCI_DVSEC_VENDOR_ID_CXL,
>                                       CXL_DVSEC_FLEXBUS_PORT);
> 
> ...at bus scan time to identify devices with active CXL links. RCRBs
> unfortunately make it so the link presence can not be detected until a
> CXL driver is loaded to read that DVSEC out of MMIO space.

In a VH topology those errors can be directly handled in a pci driver
for CXL ports, if the portdrv handles that the check could be useful.
But this is not subject of this patch series.

> 
> However, I still think that looks like a CXL aware driver registering a
> 'struct cxl_link' (for lack of a better name) object with a
> corresponding PCI device. That link can indicate whether this is an RCH
> topology and whether it needs to do the RCEC walk, and that registration
> event can flag the RCEC has having CXL link duties to attend to on AER
> events.

For CXL awareness of the AER driver the simple checks from above could
be used, either called directly for the pci_dev (VH mode), or by
walking the RCEC. IMO, a 'struct cxl_link' and a function to register
it are not really needed here.

> 
> I suspect 'struct cxl_link' can also be used if/when we get to
> incoporating CXL Reset into PCI reset handling.
> 
> > +
> > +static bool is_internal_error(struct aer_err_info *info)
> > +{
> > +	if (info->severity == AER_CORRECTABLE)
> > +		return info->status & PCI_ERR_COR_INTERNAL;
> > +
> > +	return info->status & PCI_ERR_UNC_INTN;
> > +}
> > +
> > +static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info);
> > +
> > +static int cxl_handle_error_iter(struct pci_dev *dev, void *data)
> > +{
> > +	struct aer_err_info *e_info = (struct aer_err_info *)data;
> > +
> > +	if (!is_cxl_mem_dev(dev))
> > +		return 0;
> 
> 
> I assume this also needs to reference the RDPAS if present?

That is subject of a follow-on patch.

Here I see, why you may need a struct cxl_link. But that list must not
reside in the pci_dev, instead a CXL aware driver can look up a
self-maintained list of RDPAS mappings (RCEC-to-Downstream Port
assosiations) to decide whether to lookup the dport's AER and RAS
capablilities.

> 
> CXL 3.0 9.17.1.5 RCEC Downstream Port Association Structure (RDPAS)
> 
> > +
> > +	/* pci_dev_put() in handle_error_source() */
> > +	dev = pci_dev_get(dev);
> > +	if (dev)
> > +		handle_error_source(dev, e_info);
> 
> I went looking but missed where does handle_error_source() synchronize
> against driver ->remove()?

Right, the device_lock() is missing in handle_error_source() while
accessing pdrv and calling the handler. Will send a fix.

> 
> > +
> > +	return 0;
> > +}
> > +
> > +static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
> 
> Naming suggestion...
> 
> Given that the VH topology does not require this scanning and
> assoication step, lets call this cxl_rch_handle_error() to make it clear
> this is only here to undo the awkwardness of CXL 1.1 platforms hiding
> registers from typical PCI scanning. A reference to:
> 
> CXL 3.0 9.11.8 CXL Devices Attached to an RCH
> 
> ...might be useful to a future reader that wonders why the CXL RCH case
> is so complicated from an AER perspective.

Ok.

Thanks,

-Robert
Robert Richter April 19, 2023, 2:17 p.m. UTC | #11
Bjorn,

On 18.04.23 00:00:58, Robert Richter wrote:
> On 14.04.23 16:32:54, Bjorn Helgaas wrote:
> > On Thu, Apr 13, 2023 at 01:40:52PM +0200, Robert Richter wrote:
> > > On 12.04.23 17:02:33, Bjorn Helgaas wrote:
> > > > On Tue, Apr 11, 2023 at 01:03:01PM -0500, Terry Bowman wrote:

> > I'm mostly interested in the PCI entities involved because that's all
> > aer.c can deal with.  For the above, I think the PCI core only knows
> > about these:
> > 
> >   00:00.0 RCEC  with AER, RCEC EA includes 00:01.0
> >   00:01.0 RCiEP with AER
> > 
> > aer_irq() would handle AER interrupts from 00:00.0.
> > cxl_handle_error() would be called for 00:00.0 and would call
> > handle_error_source() for everything below it (only 00:01.0 here).
> > 
> > > > The current code uses pcie_walk_rcec() in this path, which basically
> > > > searches below a Root Port or RCEC for devices that have an AER error
> > > > status bit set, add them to the e_info[] list, and call
> > > > handle_error_source() for each one:
> > > 
> > > For reference, this series adds support to handle RCH downstream
> > > port-detected errors as described in CXL 3.0, 12.2.1.1.
> > > 
> > > This flow looks correct to me, see comments inline.
> > 
> > We seem to be on the same page here, so I'll trim it out.
> > 
> > > ...
> > > > So we insert cxl_handle_error() in handle_error_source(), where it
> > > > gets called for the RCEC, and then it uses pcie_walk_rcec() again to
> > > > forcibly call handle_error_source() for *every* device "below" the
> > > > RCEC (even though they don't have AER error status bits set).
> > > 
> > > The CXL device contains the links to the dport's caps. Also, there can
> > > be multiple RCs with CXL devs connected to it. So we must search for
> > > all CXL devices now, determine the corresponding dport and inspect
> > > both, PCIe AER and CXL RAS caps.
> > > 
> > > > Then handle_error_source() ultimately calls the CXL driver err_handler
> > > > entry points (.cor_error_detected(), .error_detected(), etc), which
> > > > can look at the CXL-specific error status in the CXL RAS or RCRB or
> > > > whatever.
> > > 
> > > The AER driver (portdrv) does not have the knowledge of CXL internals.
> > > Thus the approach is to pass dport errors to the cxl_mem driver to
> > > handle it there in addition to cxl mem dev errors.
> > > 
> > > > So this basically looks like a workaround for the fact that the AER
> > > > code only calls handle_error_source() when it finds AER error status,
> > > > and CXL doesn't *set* that AER error status.  There's not that much
> > > > code here, but it seems like a quite a bit of complexity in an area
> > > > that is already pretty complicated.
> > 
> > My main point here (correct me if I got this wrong) is that:
> > 
> >   - A RCEC generates an AER interrupt
> > 
> >   - find_source_device() searches all devices below the RCEC and
> >     builds a list everything for which to call handle_error_source()
> 
> find_source_device() does not walk the RCEC if the error source is the
> RCEC itself (note that find_device_iter() is called for the root/rcec
> device first and exits early then).
> 
> > 
> >   - cxl_handle_error() *again* looks at all devices below the same
> >     RCEC and calls handle_error_source() for each one
> > 
> > So the main difference here is that the existing flow only calls
> > handle_error_source() when it finds an error logged in an AER status
> > register, while the new CXL flow calls handle_error_source() for
> > *every* device below the RCEC.
> 
> That is limited as much as possible:
> 
>  * The RCEC walk to handle CXL dport errors is done only in case of
>    internal errors, for an RCEC only (not a port) (check in
>    cxl_handle_error()).
> 
>  * Internal errors are only enabled for RCECs connected to CXL devices
>    (handles_cxl_errors()).
> 
>  * The handler is only called if it is a CXL memory device (class code
>    set and zero devfn) (check in cxl_handle_error_iter()).
> 
> An optimization I see here is to convert some runtime checks to cached
> values determined during device enumeration (CXL device list, RCEC is
> associated with CXL devices). Some sort of RCEC-to-CXL-dev
> association, similar to rcec->rcec_ea.
> 
> > 
> > I think it's OK to do that, but the almost recursive structure and the
> > unusual reference counting make the overall AER flow much harder to
> > understand.
> > 
> > What if we changed is_error_source() to add every CXL.mem device it
> > finds to the e_info[] list, which I think could nicely encapsulate the
> > idea that "CXL devices have error state we don't know how to interpret
> > here"?  Would the existing loop in aer_process_err_devices() then do
> > what you need?
> 
> I did not want to mix this with devices determined by the Error Source
> Identification Register. CXL device may not be the error source of an
> error which may cause some unwanted side-effects. We must also touch
> AER_MAX_MULTI_ERR_DEVICES then and how the dev list is implemented as
> the max number of devices is unclear.
> 
> > 
> > > > Here's another idea: the ACPI GHES code (ghes_handle_aer()) basically
> > > > receives a packet of error status from firmware and queues it for
> > > > recovery via pcie_do_recovery().  What if you had a CXL module that
> > > > knew how to look for the CXL error status, package it up similarly,
> > > > and queue it via aer_recover_queue()?
> > > 
> > > ...
> > > But first, RCEC error notifications (RCEC AER interrupts) must be sent
> > > to the CXL driver to look into the dport's RCRB.
> > 
> > Right.  I think it could be solvable to have aer_irq() call or wake a
> > CXL interface that has been registered.  But maybe changing
> > is_error_source() would be simpler.
> 
> I am going to see if is_error_source() can be used to also find CXL
> devices. But my main concern here is to mix CXL devices with actual
> devices identified by the Error Source ID.

I have looked into reusing is_error_source() and modifying
find_source_device() to also add CXL devices (the RCiEPs) to the dev
list in e_info. The problem I see is that at AER level it is unknown
whether an error happened or not. The downstream port AER capability
also does not reside in a PCI config space header and thus is not
directly bound to a pci_dev. That means the endpoint's AER capability
in pci_dev is not the one we need, instead a CXL aware driver must
lookup the RCRB which contains the AER. Additional, the CXL RAS cap
must be inspected by that driver.

Assuming we add the RCiEP to the dev list the CXL endpoint will be
processed by aer_get_device_error_info(), aer_print_error() and
handle_error_source(). This is done for the endpoint device even if
the source is the dport. Also we need to check the error status of
both caps registers first. This will cause error reports and status
checks of devices not being the error source.

That said, I think the best option is still to delegate the error down
to a CXL handler and do the error status check, reporting and handling
of the CXL specifics there.

I see your point that esp. the pci_dev's refcount handling needs to be
improved. I will address that along with the other review comments in
a next version of this patch series. Let's then revisit this
discussion here?

Thanks,

-Robert
diff mbox series

Patch

diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 228652a59f27..b0dbd864d3a3 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -49,6 +49,14 @@  config PCIEAER_INJECT
 	  gotten from:
 	     https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
 
+config PCIEAER_CXL
+	bool "PCI Express CXL RAS support"
+	default y
+	depends on PCIEAER && CXL_PCI
+	help
+	  This enables CXL error handling for Restricted CXL Hosts
+	  (RCHs).
+
 #
 # PCI Express ECRC
 #
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 7a25b62d9e01..171a08fd8ebd 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -946,6 +946,65 @@  static bool find_source_device(struct pci_dev *parent,
 	return true;
 }
 
+#ifdef CONFIG_PCIEAER_CXL
+
+static bool is_cxl_mem_dev(struct pci_dev *dev)
+{
+	/*
+	 * A CXL device is controlled only using PCIe Configuration
+	 * Space of device 0, Function 0.
+	 */
+	if (dev->devfn != PCI_DEVFN(0, 0))
+		return false;
+
+	/* Right now there is only a CXL.mem driver */
+	if ((dev->class >> 8) != PCI_CLASS_MEMORY_CXL)
+		return false;
+
+	return true;
+}
+
+static bool is_internal_error(struct aer_err_info *info)
+{
+	if (info->severity == AER_CORRECTABLE)
+		return info->status & PCI_ERR_COR_INTERNAL;
+
+	return info->status & PCI_ERR_UNC_INTN;
+}
+
+static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info);
+
+static int cxl_handle_error_iter(struct pci_dev *dev, void *data)
+{
+	struct aer_err_info *e_info = (struct aer_err_info *)data;
+
+	if (!is_cxl_mem_dev(dev))
+		return 0;
+
+	/* pci_dev_put() in handle_error_source() */
+	dev = pci_dev_get(dev);
+	if (dev)
+		handle_error_source(dev, e_info);
+
+	return 0;
+}
+
+static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
+{
+	/*
+	 * CXL downstream port errors are signaled as RCEC internal
+	 * errors. Forward them to all CXL devices below the RCEC.
+	 */
+	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
+	    is_internal_error(info))
+		pcie_walk_rcec(dev, cxl_handle_error_iter, info);
+}
+
+#else
+static inline void cxl_handle_error(struct pci_dev *dev,
+				    struct aer_err_info *info) { }
+#endif
+
 /**
  * handle_error_source - handle logging error into an event log
  * @dev: pointer to pci_dev data structure of error source device
@@ -957,6 +1016,8 @@  static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
 {
 	int aer = dev->aer_cap;
 
+	cxl_handle_error(dev, info);
+
 	if (info->severity == AER_CORRECTABLE) {
 		/*
 		 * Correctable error does not need software intervention.