Message ID | 20250211192444.2292833-17-terry.bowman@amd.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Series | Enable CXL PCIe port protocol error handling and logging |
On Tue, Feb 11, 2025 at 01:24:43PM -0600, Terry Bowman wrote:
> The AER service driver enables PCIe Uncorrectable Internal Errors (UIE) and
> Correctable Internal Errors (CIE) for CXL Root Ports. The UIE and CIE are
> used in reporting CXL Protocol Errors. The same UIE/CIE enablement is
> needed for CXL Upstream Switch Ports and CXL Downstream Switch Ports
> in order to notify the associated Root Port and OS.[1]
>
> Export the AER service driver's pci_aer_unmask_internal_errors() function
> to CXL namespace.
>
> Remove the function's dependency on the CONFIG_PCIEAER_CXL kernel config
> because it is now an exported function.
>
> Call pci_aer_unmask_internal_errors() during RAS initialization in:
> cxl_uport_init_ras_reporting() and cxl_dport_init_ras_reporting().
>
> [1] PCIe Base Spec r6.2-1.0, 6.2.3.2.2 Masking Individual Errors
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

I'd say this is really a CXL-centric change, given that
pci_aer_unmask_internal_errors() is only used for CXL and it's
exported in the CXL namespace.  So I would use

  cxl/pci: ...

in the subject.
On 2/11/2025 2:25 PM, Bjorn Helgaas wrote:
> On Tue, Feb 11, 2025 at 01:24:43PM -0600, Terry Bowman wrote:
>> The AER service driver enables PCIe Uncorrectable Internal Errors (UIE) and
>> Correctable Internal Errors (CIE) for CXL Root Ports. The UIE and CIE are
>> used in reporting CXL Protocol Errors. The same UIE/CIE enablement is
>> needed for CXL Upstream Switch Ports and CXL Downstream Switch Ports
>> in order to notify the associated Root Port and OS.[1]
>>
>> Export the AER service driver's pci_aer_unmask_internal_errors() function
>> to CXL namespace.
>>
>> Remove the function's dependency on the CONFIG_PCIEAER_CXL kernel config
>> because it is now an exported function.
>>
>> Call pci_aer_unmask_internal_errors() during RAS initialization in:
>> cxl_uport_init_ras_reporting() and cxl_dport_init_ras_reporting().
>>
>> [1] PCIe Base Spec r6.2-1.0, 6.2.3.2.2 Masking Individual Errors
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>
> I'd say this is really a CXL-centric change, given that
> pci_aer_unmask_internal_errors() is only used for CXL and it's
> exported in the CXL namespace.  So I would use
>
>   cxl/pci: ...
>
> in the subject.

Yes, I'll change to cxl/pci. Thanks for reviewing.

Terry
On 2/11/25 12:24 PM, Terry Bowman wrote:
> The AER service driver enables PCIe Uncorrectable Internal Errors (UIE) and
> Correctable Internal Errors (CIE) for CXL Root Ports. The UIE and CIE are
> used in reporting CXL Protocol Errors. The same UIE/CIE enablement is
> needed for CXL Upstream Switch Ports and CXL Downstream Switch Ports
> in order to notify the associated Root Port and OS.[1]
>
> Export the AER service driver's pci_aer_unmask_internal_errors() function
> to CXL namespace.
>
> Remove the function's dependency on the CONFIG_PCIEAER_CXL kernel config
> because it is now an exported function.
>
> Call pci_aer_unmask_internal_errors() during RAS initialization in:
> cxl_uport_init_ras_reporting() and cxl_dport_init_ras_reporting().
>
> [1] PCIe Base Spec r6.2-1.0, 6.2.3.2.2 Masking Individual Errors
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Terry Bowman wrote:
> The AER service driver enables PCIe Uncorrectable Internal Errors (UIE) and
> Correctable Internal Errors (CIE) for CXL Root Ports. The UIE and CIE are
> used in reporting CXL Protocol Errors. The same UIE/CIE enablement is
> needed for CXL Upstream Switch Ports and CXL Downstream Switch Ports
> in order to notify the associated Root Port and OS.[1]
>
> Export the AER service driver's pci_aer_unmask_internal_errors() function
> to CXL namespace.
>
> Remove the function's dependency on the CONFIG_PCIEAER_CXL kernel config
> because it is now an exported function.
>
> Call pci_aer_unmask_internal_errors() during RAS initialization in:
> cxl_uport_init_ras_reporting() and cxl_dport_init_ras_reporting().
>
> [1] PCIe Base Spec r6.2-1.0, 6.2.3.2.2 Masking Individual Errors
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

I wonder if this should save+unmask and restore the prior state when
the cxl_port detaches from the port driver?

I guess we can wait to see if this causes problems since internal
errors should be more predictable / reliable on CXL devices compared to
generic PCIe devices where Linux never enabled internal errors
previously.
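For illustration only, the save+unmask and restore idea raised above could be sketched as follows. This is a user-space model, not kernel code: `struct fake_aer_regs`, `struct saved_internal_mask`, `save_and_unmask_internal()` and `restore_internal_mask()` are hypothetical names invented here, and a plain array stands in for the AER capability registers. Only the register offsets and bit values are taken from the kernel's pci_regs.h. An in-kernel version would presumably go through pci_read_config_dword()/pci_write_config_dword() and hook the restore into the port teardown path.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets and bits within the AER capability, values as in pci_regs.h. */
#define PCI_ERR_UNCOR_MASK   0x08        /* Uncorrectable Error Mask */
#define PCI_ERR_UNC_INTN     0x00400000u /* Uncorrectable Internal Error */
#define PCI_ERR_COR_MASK     0x14        /* Correctable Error Mask */
#define PCI_ERR_COR_INTERNAL 0x00004000u /* Corrected Internal Error */

#define AER_DWORDS 16

/* Hypothetical stand-in for one port's AER capability registers. */
struct fake_aer_regs {
	uint32_t reg[AER_DWORDS]; /* indexed by byte offset / 4 */
};

/* Hypothetical record of the two mask bits as they were before unmasking. */
struct saved_internal_mask {
	uint32_t uncor;
	uint32_t cor;
};

static uint32_t *aer_reg(struct fake_aer_regs *r, unsigned int off)
{
	assert(off % 4 == 0 && off / 4 < AER_DWORDS);
	return &r->reg[off / 4];
}

/* Record the current internal-error mask bits, then clear (unmask) them. */
struct saved_internal_mask save_and_unmask_internal(struct fake_aer_regs *r)
{
	struct saved_internal_mask saved;

	saved.uncor = *aer_reg(r, PCI_ERR_UNCOR_MASK) & PCI_ERR_UNC_INTN;
	saved.cor = *aer_reg(r, PCI_ERR_COR_MASK) & PCI_ERR_COR_INTERNAL;
	*aer_reg(r, PCI_ERR_UNCOR_MASK) &= ~PCI_ERR_UNC_INTN;
	*aer_reg(r, PCI_ERR_COR_MASK) &= ~PCI_ERR_COR_INTERNAL;
	return saved;
}

/* Put back only the two saved bits; every other mask bit is left alone. */
void restore_internal_mask(struct fake_aer_regs *r,
			   struct saved_internal_mask saved)
{
	uint32_t *uncor = aer_reg(r, PCI_ERR_UNCOR_MASK);
	uint32_t *cor = aer_reg(r, PCI_ERR_COR_MASK);

	*uncor = (*uncor & ~PCI_ERR_UNC_INTN) | saved.uncor;
	*cor = (*cor & ~PCI_ERR_COR_INTERNAL) | saved.cor;
}
```

Saving only the two internal-error bits, rather than the whole mask registers, means the restore step does not undo unrelated mask changes made while the port was attached.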
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 03ae21a944e0..36e686a31045 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -912,6 +912,7 @@ void cxl_uport_init_ras_reporting(struct cxl_port *port)

 	cxl_assign_port_error_handlers(pdev);
 	devm_add_action_or_reset(&port->dev, cxl_clear_port_error_handlers, pdev);
+	pci_aer_unmask_internal_errors(pdev);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_uport_init_ras_reporting, "CXL");

@@ -959,6 +960,7 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport)
 	cxl_assign_port_error_handlers(pdev);
 	devm_add_action_or_reset(&port->dev, cxl_clear_port_error_handlers, pdev);
 	put_device(&port->dev);
+	pci_aer_unmask_internal_errors(pdev);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, "CXL");

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index ee38db08d005..8e3a60411610 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -948,7 +948,7 @@ static bool find_source_device(struct pci_dev *parent,
  * Note: AER must be enabled and supported by the device which must be
  * checked in advance, e.g. with pcie_aer_is_native().
  */
-static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
+void pci_aer_unmask_internal_errors(struct pci_dev *dev)
 {
 	int aer = dev->aer_cap;
 	u32 mask;
@@ -961,6 +961,7 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
 	mask &= ~PCI_ERR_COR_INTERNAL;
 	pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
 }
+EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, "CXL");

 static bool is_cxl_mem_dev(struct pci_dev *dev)
 {
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 947b63091902..a54545796edc 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -61,5 +61,6 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
 int cper_severity_to_aer(int cper_severity);
 void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
 		       int severity, struct aer_capability_regs *aer_regs);
+void pci_aer_unmask_internal_errors(struct pci_dev *dev);
 #endif //_AER_H_
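
The comment kept in the aer.c hunk notes that AER must be enabled and supported, and that this must be checked in advance. As a rough user-space sketch of that precondition together with the read-modify-write the function performs: `struct model_dev`, `model_unmask_internal_errors()` and `model_try_unmask()` are hypothetical names, the `aer_native` flag merely models the result of a check such as pcie_aer_is_native(), and the uncorrectable half is assumed to mirror the correctable half visible in the hunk, since the diff context elides it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets and bits within the AER capability, values as in pci_regs.h. */
#define PCI_ERR_UNCOR_MASK   0x08        /* Uncorrectable Error Mask */
#define PCI_ERR_UNC_INTN     0x00400000u /* Uncorrectable Internal Error */
#define PCI_ERR_COR_MASK     0x14        /* Correctable Error Mask */
#define PCI_ERR_COR_INTERNAL 0x00004000u /* Corrected Internal Error */

#define CFG_SPACE_DWORDS 1024 /* 4 KiB of PCIe extended config space */

/* Hypothetical user-space stand-in for struct pci_dev. */
struct model_dev {
	uint16_t aer_cap; /* offset of the AER capability, 0 if absent */
	bool aer_native;  /* models the outcome of pcie_aer_is_native() */
	uint32_t cfg[CFG_SPACE_DWORDS];
};

static uint32_t *cfg_dword(struct model_dev *dev, unsigned int off)
{
	assert(off % 4 == 0 && off / 4 < CFG_SPACE_DWORDS);
	return &dev->cfg[off / 4];
}

/* Clear the internal-error bit in each mask register, leaving the rest. */
void model_unmask_internal_errors(struct model_dev *dev)
{
	unsigned int aer = dev->aer_cap;

	*cfg_dword(dev, aer + PCI_ERR_UNCOR_MASK) &= ~PCI_ERR_UNC_INTN;
	*cfg_dword(dev, aer + PCI_ERR_COR_MASK) &= ~PCI_ERR_COR_INTERNAL;
}

/* Hypothetical caller-side guard: skip devices without usable AER. */
bool model_try_unmask(struct model_dev *dev)
{
	if (!dev->aer_cap || !dev->aer_native)
		return false;

	model_unmask_internal_errors(dev);
	return true;
}
```

Whether the CXL callers in this patch perform such a check before calling pci_aer_unmask_internal_errors() is not visible in the hunks above, so the guard here is an assumption about how the documented precondition might be honored.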