Message ID | 20241211234002.3728674-16-terry.bowman@amd.com (mailing list archive) |
---|---|
State | New |
Series | Enable CXL PCIe Port protocol error handling and logging |
On 12/11/24 23:40, Terry Bowman wrote:
> [...]
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 861521872318..0fa1b1ed48c9 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -949,7 +949,6 @@ static bool is_internal_error(struct aer_err_info *info)
>  	return info->status & PCI_ERR_UNC_INTN;
>  }
>
> -#ifdef CONFIG_PCIEAER_CXL

This ifdef move puzzles me. I would expect to use it when the next
function is invoked instead of moving it here.

It seems weird to have such a config but code using those related
functions not aware of it.

> /**
>  * pci_aer_unmask_internal_errors - unmask internal errors
> [...]
On 12/12/24 09:44, Alejandro Lucero Palau wrote:
> On 12/11/24 23:40, Terry Bowman wrote:
>> [...]
>> @@ -973,7 +972,9 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
>>  	mask &= ~PCI_ERR_COR_INTERNAL;
>>  	pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
>>  }
>> +EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, CXL);

Forgot to mention all these exports are changing in 6.13 with the second
macro param being now a string, so just

EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, "CXL");

Not affected in the codebase linked to this patchset, but I hope it
helps you when getting weird errors with a newer kernel.

>> [...]
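The namespace change mentioned in this reply can be illustrated outside the kernel. The sketch below is a userspace mock, not the kernel's implementation: the `MOCK_*` macros and the accessor functions are hypothetical names invented for illustration. It assumes that the older macro stringified a bare namespace token while the 6.13 macro takes a string literal as-is, and shows what each spelling would yield under that assumption.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification, similar in spirit to the kernel's __stringify(). */
#define MOCK_STRINGIFY_1(x) #x
#define MOCK_STRINGIFY(x) MOCK_STRINGIFY_1(x)

/* Hypothetical mock of the older style: the namespace is a bare token
 * that the macro turns into a string. */
#define MOCK_EXPORT_NS_OLD(sym, ns) \
	static const char *const mock_ns_old_##sym = MOCK_STRINGIFY(ns)

/* Hypothetical mock of the 6.13 style: the namespace is already a
 * string literal and is used unchanged. */
#define MOCK_EXPORT_NS_NEW(sym, ns) \
	static const char *const mock_ns_new_##sym = ns

MOCK_EXPORT_NS_OLD(unmask, CXL);   /* old macro, old spelling */
MOCK_EXPORT_NS_NEW(unmask, "CXL"); /* new macro, new spelling */
MOCK_EXPORT_NS_OLD(mixed, "CXL");  /* new spelling fed to the old macro */

const char *mock_old_namespace(void)   { return mock_ns_old_unmask; }
const char *mock_new_namespace(void)   { return mock_ns_new_unmask; }
const char *mock_mixed_namespace(void) { return mock_ns_old_mixed; }
```

In this mock, the matching spellings both produce the namespace `CXL`, while the mixed case produces a string that contains literal quote characters. Passing a bare `CXL` token to the new-style mock would not compile at all, because `CXL` is not a declared identifier; that is one plausible shape for the "weird errors" the reply warns about.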
On 12/12/2024 4:44 AM, Alejandro Lucero Palau wrote:
> On 12/12/24 09:44, Alejandro Lucero Palau wrote:
>> On 12/11/24 23:40, Terry Bowman wrote:
>>> [...]
>>> +EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, CXL);
>
> Forgot to mention all these exports are changing in 6.13 with the second
> macro param being now a string, so just
>
> EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, "CXL");
>
> Not affected in the codebase linked to this patchset, but I hope it
> helps you when getting weird errors with a newer kernel.

Thanks for the heads-up.

- Terry

>>> [...]
On 12/12/2024 3:44 AM, Alejandro Lucero Palau wrote:
> On 12/11/24 23:40, Terry Bowman wrote:
>> [...]
>> @@ -949,7 +949,6 @@ static bool is_internal_error(struct aer_err_info *info)
>>  	return info->status & PCI_ERR_UNC_INTN;
>>  }
>>
>> -#ifdef CONFIG_PCIEAER_CXL
>
> This ifdef move puzzles me. I would expect to use it when the next
> function is invoked instead of moving it here.
>
> It seems weird to have such a config but code using those related
> functions not aware of it.

I was asked to remove the dependency on the Kconfig option (the ifdef)
because the function is now exported and used across multiple subsystems.
Because it's exported, the function's behavior needs to be consistent and
independent of a Kconfig option. I'll update the commit message with this
reasoning.

- Terry

>> [...]
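The two approaches under discussion can be contrasted in a small userspace sketch. Everything here is hypothetical: `MOCK_CONFIG_FEATURE` stands in for a Kconfig symbol, and the functions only count calls instead of touching hardware. The first function mirrors the direction described in the reply above (always built, same behavior for every caller); the second shows the conventional alternative of a no-op stub when the option is off, which is an assumption about what the reviewer had in mind, not something the thread states.

```c
#include <assert.h>

/* Hypothetical stand-in for a Kconfig symbol; build with
 * -DMOCK_CONFIG_FEATURE=1 to "enable" the option. */
#ifndef MOCK_CONFIG_FEATURE
#define MOCK_CONFIG_FEATURE 0
#endif

static int mock_unmask_calls;

/* Always compiled: callers in any subsystem get the same behavior
 * regardless of the option. */
void mock_unmask_always(void)
{
	mock_unmask_calls++;
}

/* Gated variant: does real work only when the option is enabled,
 * otherwise callers silently get a no-op. */
#if MOCK_CONFIG_FEATURE
void mock_unmask_gated(void)
{
	mock_unmask_calls++;
}
#else
static inline void mock_unmask_gated(void)
{
}
#endif

int mock_call_count(void)
{
	return mock_unmask_calls;
}
```

With the option left at its default of 0, a call to `mock_unmask_gated()` changes nothing, so a caller cannot tell from the call site whether the unmasking happened. That is the inconsistency the reply argues an exported function should avoid.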
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 9734a4c55b29..740ac5d8809f 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -886,6 +886,7 @@ void cxl_uport_init_ras_reporting(struct cxl_port *port)
 
 	cxl_assign_port_error_handlers(pdev);
 	devm_add_action_or_reset(port->uport_dev, cxl_clear_port_error_handlers, pdev);
+	pci_aer_unmask_internal_errors(pdev);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_uport_init_ras_reporting, CXL);
 
@@ -920,6 +921,7 @@ void cxl_dport_init_ras_reporting(struct cxl_dport *dport)
 
 	cxl_assign_port_error_handlers(pdev);
 	devm_add_action_or_reset(dport_dev, cxl_clear_port_error_handlers, pdev);
+	pci_aer_unmask_internal_errors(pdev);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, CXL);
 
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 861521872318..0fa1b1ed48c9 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -949,7 +949,6 @@ static bool is_internal_error(struct aer_err_info *info)
 	return info->status & PCI_ERR_UNC_INTN;
 }
 
-#ifdef CONFIG_PCIEAER_CXL
 /**
  * pci_aer_unmask_internal_errors - unmask internal errors
  * @dev: pointer to the pcie_dev data structure
@@ -960,7 +959,7 @@ static bool is_internal_error(struct aer_err_info *info)
  * Note: AER must be enabled and supported by the device which must be
  * checked in advance, e.g. with pcie_aer_is_native().
  */
-static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
+void pci_aer_unmask_internal_errors(struct pci_dev *dev)
 {
 	int aer = dev->aer_cap;
 	u32 mask;
@@ -973,7 +972,9 @@ static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
 	mask &= ~PCI_ERR_COR_INTERNAL;
 	pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
 }
+EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, CXL);
 
+#ifdef CONFIG_PCIEAER_CXL
 static bool is_cxl_mem_dev(struct pci_dev *dev)
 {
 	/*
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 4b97f38f3fcf..093293f9f12b 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -55,5 +55,6 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
 int cper_severity_to_aer(int cper_severity);
 void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
 		       int severity, struct aer_capability_regs *aer_regs);
+void pci_aer_unmask_internal_errors(struct pci_dev *dev);
 #endif //_AER_H_
The AER service driver enables PCIe Uncorrectable Internal Errors (UIE) and
Correctable Internal Errors (CIE) for CXL Root Ports and CXL RCECs. The
UIE and CIE are used in reporting CXL Protocol Errors. The same UIE/CIE
enablement is needed for CXL PCIe Upstream and Downstream Ports in order to
notify the associated Root Port and OS.[1]

Export the AER service driver's pci_aer_unmask_internal_errors() function
to the CXL namespace.

Remove the function's dependency on the CONFIG_PCIEAER_CXL kernel config
because it is now an exported function.

Call pci_aer_unmask_internal_errors() during RAS initialization in
cxl_uport_init_ras_reporting() and cxl_dport_init_ras_reporting().

[1] PCIe Base Spec r6.2-1.0, 6.2.3.2.2 Masking Individual Errors

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 drivers/cxl/core/pci.c | 2 ++
 drivers/pci/pcie/aer.c | 5 +++--
 include/linux/aer.h    | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)
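The unmasking described in the commit message amounts to a read-modify-write of two AER mask registers. The following is a minimal userspace sketch of that pattern, not the kernel function itself: `struct mock_dev`, `mock_read_dword()`, `mock_write_dword()` and `mock_unmask_internal_errors()` are hypothetical stand-ins for `struct pci_dev` and the PCI config accessors, and the device is modeled as a plain byte array. The register offsets and bit values are taken from the kernel's `pci_regs.h` as best understood; the uncorrectable half of the sequence is not visible in the hunks shown in this thread and is assumed to mirror the correctable half.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* AER register offsets (relative to the capability) and error bits. */
#define PCI_ERR_UNCOR_MASK   0x08        /* Uncorrectable Error Mask */
#define PCI_ERR_UNC_INTN     0x00400000u /* Uncorrectable Internal Error */
#define PCI_ERR_COR_MASK     0x14        /* Correctable Error Mask */
#define PCI_ERR_COR_INTERNAL 0x00004000u /* Corrected Internal Error */

/* Hypothetical stand-in for a device: only the AER capability bytes. */
struct mock_dev {
	uint8_t aer_regs[0x40];
};

uint32_t mock_read_dword(const struct mock_dev *dev, int off)
{
	uint32_t val;

	memcpy(&val, &dev->aer_regs[off], sizeof(val));
	return val;
}

void mock_write_dword(struct mock_dev *dev, int off, uint32_t val)
{
	memcpy(&dev->aer_regs[off], &val, sizeof(val));
}

/* Clear only the internal-error bit in each mask register, leaving
 * every other mask bit as it was found. */
void mock_unmask_internal_errors(struct mock_dev *dev)
{
	uint32_t mask;

	mask = mock_read_dword(dev, PCI_ERR_UNCOR_MASK);
	mask &= ~PCI_ERR_UNC_INTN;
	mock_write_dword(dev, PCI_ERR_UNCOR_MASK, mask);

	mask = mock_read_dword(dev, PCI_ERR_COR_MASK);
	mask &= ~PCI_ERR_COR_INTERNAL;
	mock_write_dword(dev, PCI_ERR_COR_MASK, mask);
}
```

Starting from a device whose mask registers are all ones, calling `mock_unmask_internal_errors()` clears bit 22 of the uncorrectable mask and bit 14 of the correctable mask and nothing else, which is why a read-modify-write is used rather than writing a fixed value.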