Message ID | 20240617200411.1426554-2-terry.bowman@amd.com |
---|---|
State | New |
Headers | show |
Series | Add RAS support for CXL root ports, CXL downstream switch ports, and CXL upstream switch ports | expand |
On Mon, 17 Jun 2024 15:04:03 -0500 Terry Bowman <terry.bowman@amd.com> wrote: > The AER service driver does not currently call a handler for AER > uncorrectable errors (UCE) detected in root ports or downstream > ports. This is not needed in most cases because common PCIe port > functionality is handled by portdrv service drivers. > > CXL root ports include CXL specific RAS registers that need logging > before starting do_recovery() in the UCE case. > > Update the AER service driver to call the UCE handler for root ports > and downstream ports. These PCIe port devices are bound to the portdrv > driver that includes a CE and UCE handler to be called. > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > --- > drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index 705893b5f7b0..a4db474b2be5 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > > + /* > + * PCIe ports may include functionality beyond the standard > + * extended port capabilities. This may present a need to log and > + * handle errors not addressed in this driver. Examples are CXL > + * root ports and CXL downstream switch ports using AER UIE to > + * indicate CXL UCE RAS protocol errors. > + */ > + if (type == PCI_EXP_TYPE_ROOT_PORT || > + type == PCI_EXP_TYPE_DOWNSTREAM) { > + struct pci_driver *pdrv = dev->driver; > + > + if (pdrv && pdrv->err_handler && > + pdrv->err_handler->error_detected) { > + const struct pci_error_handlers *err_handler; > + > + err_handler = pdrv->err_handler; > + status = err_handler->error_detected(dev, state); This status is going to get overridden by one of the pci_walk_bridge() calls. Should it be kept around and acted on, or dropped silently? (I'd guess no for silent!). > + } > + } > + > /* > * If the error was detected by a Root Port, Downstream Port, RCEC, > * or RCiEP, recovery runs on the device itself. For Ports, that
Terry Bowman wrote: > The AER service driver does not currently call a handler for AER > uncorrectable errors (UCE) detected in root ports or downstream > ports. This is not needed in most cases because common PCIe port > functionality is handled by portdrv service drivers. > > CXL root ports include CXL specific RAS registers that need logging > before starting do_recovery() in the UCE case. > > Update the AER service driver to call the UCE handler for root ports > and downstream ports. These PCIe port devices are bound to the portdrv > driver that includes a CE and UCE handler to be called. > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > --- > drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index 705893b5f7b0..a4db474b2be5 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > > + /* > + * PCIe ports may include functionality beyond the standard > + * extended port capabilities. This may present a need to log and > + * handle errors not addressed in this driver. Examples are CXL > + * root ports and CXL downstream switch ports using AER UIE to > + * indicate CXL UCE RAS protocol errors. > + */ > + if (type == PCI_EXP_TYPE_ROOT_PORT || > + type == PCI_EXP_TYPE_DOWNSTREAM) { > + struct pci_driver *pdrv = dev->driver; > + > + if (pdrv && pdrv->err_handler && > + pdrv->err_handler->error_detected) { > + const struct pci_error_handlers *err_handler; > + > + err_handler = pdrv->err_handler; > + status = err_handler->error_detected(dev, state); > + } > + } > + Would not a more appropriate place for this be pci_walk_bridge() where the ->subordinate == NULL and these type-check cases are unified?
Hi Jonathan, I added a response below. On 6/20/24 06:21, Jonathan Cameron wrote: > On Mon, 17 Jun 2024 15:04:03 -0500 > Terry Bowman <terry.bowman@amd.com> wrote: > >> The AER service driver does not currently call a handler for AER >> uncorrectable errors (UCE) detected in root ports or downstream >> ports. This is not needed in most cases because common PCIe port >> functionality is handled by portdrv service drivers. >> >> CXL root ports include CXL specific RAS registers that need logging >> before starting do_recovery() in the UCE case. >> >> Update the AER service driver to call the UCE handler for root ports >> and downstream ports. These PCIe port devices are bound to the portdrv >> driver that includes a CE and UCE handler to be called. >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: linux-pci@vger.kernel.org >> --- >> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ >> 1 file changed, 20 insertions(+) >> >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >> index 705893b5f7b0..a4db474b2be5 100644 >> --- a/drivers/pci/pcie/err.c >> +++ b/drivers/pci/pcie/err.c >> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; >> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); >> >> + /* >> + * PCIe ports may include functionality beyond the standard >> + * extended port capabilities. This may present a need to log and >> + * handle errors not addressed in this driver. Examples are CXL >> + * root ports and CXL downstream switch ports using AER UIE to >> + * indicate CXL UCE RAS protocol errors. >> + */ >> + if (type == PCI_EXP_TYPE_ROOT_PORT || >> + type == PCI_EXP_TYPE_DOWNSTREAM) { >> + struct pci_driver *pdrv = dev->driver; >> + >> + if (pdrv && pdrv->err_handler && >> + pdrv->err_handler->error_detected) { >> + const struct pci_error_handlers *err_handler; >> + >> + err_handler = pdrv->err_handler; >> + status = err_handler->error_detected(dev, state); > > This status is going to get overridden by one of the pci_walk_bridge() > calls. Should it be kept around and acted on, or dropped silently? > (I'd guess no for silent!). > It should be used later. According to PCI spec "The only method of recovering from an Uncorrectable Internal Error is reset or hardware replacement." I need to make certain that carries through below. Regards, Terry >> + } >> + } >> + >> /* >> * If the error was detected by a Root Port, Downstream Port, RCEC, >> * or RCiEP, recovery runs on the device itself. For Ports, that >
Hi Dan, I added a response below. On 6/21/24 14:17, Dan Williams wrote: > Terry Bowman wrote: >> The AER service driver does not currently call a handler for AER >> uncorrectable errors (UCE) detected in root ports or downstream >> ports. This is not needed in most cases because common PCIe port >> functionality is handled by portdrv service drivers. >> >> CXL root ports include CXL specific RAS registers that need logging >> before starting do_recovery() in the UCE case. >> >> Update the AER service driver to call the UCE handler for root ports >> and downstream ports. These PCIe port devices are bound to the portdrv >> driver that includes a CE and UCE handler to be called. >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: linux-pci@vger.kernel.org >> --- >> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ >> 1 file changed, 20 insertions(+) >> >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >> index 705893b5f7b0..a4db474b2be5 100644 >> --- a/drivers/pci/pcie/err.c >> +++ b/drivers/pci/pcie/err.c >> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; >> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); >> >> + /* >> + * PCIe ports may include functionality beyond the standard >> + * extended port capabilities. This may present a need to log and >> + * handle errors not addressed in this driver. Examples are CXL >> + * root ports and CXL downstream switch ports using AER UIE to >> + * indicate CXL UCE RAS protocol errors. >> + */ >> + if (type == PCI_EXP_TYPE_ROOT_PORT || >> + type == PCI_EXP_TYPE_DOWNSTREAM) { >> + struct pci_driver *pdrv = dev->driver; >> + >> + if (pdrv && pdrv->err_handler && >> + pdrv->err_handler->error_detected) { >> + const struct pci_error_handlers *err_handler; >> + >> + err_handler = pdrv->err_handler; >> + status = err_handler->error_detected(dev, state); >> + } >> + } >> + > > Would not a more appropriate place for this be pci_walk_bridge() where > the ->subordinate == NULL and these type-check cases are unified? It does. I can take a look at moving that. Regards, Terry
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 705893b5f7b0..a4db474b2be5 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); + /* + * PCIe ports may include functionality beyond the standard + * extended port capabilities. This may present a need to log and + * handle errors not addressed in this driver. Examples are CXL + * root ports and CXL downstream switch ports using AER UIE to + * indicate CXL UCE RAS protocol errors. + */ + if (type == PCI_EXP_TYPE_ROOT_PORT || + type == PCI_EXP_TYPE_DOWNSTREAM) { + struct pci_driver *pdrv = dev->driver; + + if (pdrv && pdrv->err_handler && + pdrv->err_handler->error_detected) { + const struct pci_error_handlers *err_handler; + + err_handler = pdrv->err_handler; + status = err_handler->error_detected(dev, state); + } + } + /* * If the error was detected by a Root Port, Downstream Port, RCEC, * or RCiEP, recovery runs on the device itself. For Ports, that
The AER service driver does not currently call a handler for AER uncorrectable errors (UCE) detected in root ports or downstream ports. This is not needed in most cases because common PCIe port functionality is handled by portdrv service drivers. CXL root ports include CXL specific RAS registers that need logging before starting do_recovery() in the UCE case. Update the AER service driver to call the UCE handler for root ports and downstream ports. These PCIe port devices are bound to the portdrv driver that includes a CE and UCE handler to be called. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org --- drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)