Message ID | 20250107143852.3692571-7-terry.bowman@amd.com |
---|---|
State | New |
Series | Enable CXL PCIe port protocol error handling and logging |
On Tue, 7 Jan 2025 08:38:42 -0600
Terry Bowman <terry.bowman@amd.com> wrote:

> The AER service driver's aer_get_device_error_info() function doesn't read
> uncorrectable (UCE) fatal error status from PCIe Upstream Port devices,
> including CXL Upstream Switch Ports. As a result, fatal errors are not
> logged or handled as needed for CXL PCIe Upstream Switch Port devices.
>
> Update the aer_get_device_error_info() function to read the UCE fatal
> status for all CXL PCIe devices. Make the change such that non-CXL devices
> are not affected.
>
> The fatal error status will be used in future patches implementing
> CXL PCIe Port uncorrectable error handling and logging.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>

This clashes with Shuai's series adding link healthy checks.
Maybe we can reuse that logic to incorporate the condition we
care about here?

> ---
>  drivers/pci/pcie/aer.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 62be599e3bee..79c828bdcb6d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1253,7 +1253,8 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
>  	} else if (type == PCI_EXP_TYPE_ROOT_PORT ||
>  		   type == PCI_EXP_TYPE_RC_EC ||
>  		   type == PCI_EXP_TYPE_DOWNSTREAM ||
> -		   info->severity == AER_NONFATAL) {
> +		   info->severity == AER_NONFATAL ||
> +		   (pcie_is_cxl(dev) && type == PCI_EXP_TYPE_UPSTREAM)) {
>
>  		/* Link is still healthy for IO reads */
>  		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,

Terry Bowman wrote:
> The AER service driver's aer_get_device_error_info() function doesn't read
> uncorrectable (UCE) fatal error status from PCIe Upstream Port devices,
> including CXL Upstream Switch Ports. As a result, fatal errors are not
> logged or handled as needed for CXL PCIe Upstream Switch Port devices.
>
> Update the aer_get_device_error_info() function to read the UCE fatal
> status for all CXL PCIe devices. Make the change such that non-CXL devices
> are not affected.
>
> The fatal error status will be used in future patches implementing
> CXL PCIe Port uncorrectable error handling and logging.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]

On 1/14/2025 5:32 AM, Jonathan Cameron wrote:
> On Tue, 7 Jan 2025 08:38:42 -0600
> Terry Bowman <terry.bowman@amd.com> wrote:
>
>> The AER service driver's aer_get_device_error_info() function doesn't read
>> uncorrectable (UCE) fatal error status from PCIe Upstream Port devices,
>> including CXL Upstream Switch Ports. As a result, fatal errors are not
>> logged or handled as needed for CXL PCIe Upstream Switch Port devices.
>>
>> Update the aer_get_device_error_info() function to read the UCE fatal
>> status for all CXL PCIe devices. Make the change such that non-CXL devices
>> are not affected.
>>
>> The fatal error status will be used in future patches implementing
>> CXL PCIe Port uncorrectable error handling and logging.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> This clashes with Shuai's series adding link healthy checks.
> Maybe we can reuse that logic to incorporate the condition we
> care about here?
>

I'll add changes to query Upstream Port link status. I'll borrow from Shuai's patch.

Regards,
Terry

>> ---
>>  drivers/pci/pcie/aer.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index 62be599e3bee..79c828bdcb6d 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -1253,7 +1253,8 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
>>  	} else if (type == PCI_EXP_TYPE_ROOT_PORT ||
>>  		   type == PCI_EXP_TYPE_RC_EC ||
>>  		   type == PCI_EXP_TYPE_DOWNSTREAM ||
>> -		   info->severity == AER_NONFATAL) {
>> +		   info->severity == AER_NONFATAL ||
>> +		   (pcie_is_cxl(dev) && type == PCI_EXP_TYPE_UPSTREAM)) {
>>
>>  		/* Link is still healthy for IO reads */
>>  		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 62be599e3bee..79c828bdcb6d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1253,7 +1253,8 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 	} else if (type == PCI_EXP_TYPE_ROOT_PORT ||
 		   type == PCI_EXP_TYPE_RC_EC ||
 		   type == PCI_EXP_TYPE_DOWNSTREAM ||
-		   info->severity == AER_NONFATAL) {
+		   info->severity == AER_NONFATAL ||
+		   (pcie_is_cxl(dev) && type == PCI_EXP_TYPE_UPSTREAM)) {
 
 		/* Link is still healthy for IO reads */
 		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,

The AER service driver's aer_get_device_error_info() function doesn't read
uncorrectable (UCE) fatal error status from PCIe Upstream Port devices,
including CXL Upstream Switch Ports. As a result, fatal errors are not
logged or handled as needed for CXL PCIe Upstream Switch Port devices.

Update the aer_get_device_error_info() function to read the UCE fatal
status for all CXL PCIe devices. Make the change such that non-CXL devices
are not affected.

The fatal error status will be used in future patches implementing
CXL PCIe Port uncorrectable error handling and logging.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 drivers/pci/pcie/aer.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
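
For readers who want to see which cases the changed condition affects, the
following is a minimal userspace sketch of the predicate from the hunk, not
kernel code. The enums, struct model_dev, and both function names are
hypothetical stand-ins for PCI_EXP_TYPE_*, AER_*, struct pci_dev, and
pcie_is_cxl(); the separate correctable-error branch that precedes this
else-if in aer_get_device_error_info() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the kernel's PCI_EXP_TYPE_* constants (assumption). */
enum model_port_type {
	MODEL_TYPE_ENDPOINT,
	MODEL_TYPE_ROOT_PORT,
	MODEL_TYPE_UPSTREAM,
	MODEL_TYPE_DOWNSTREAM,
	MODEL_TYPE_RC_EC,
};

/* Local stand-ins for AER_NONFATAL / AER_FATAL (assumption). */
enum model_severity {
	MODEL_AER_NONFATAL,
	MODEL_AER_FATAL,
};

/* Hypothetical reduced device description; this is not struct pci_dev. */
struct model_dev {
	enum model_port_type type;
	bool is_cxl;	/* stands in for pcie_is_cxl(dev) */
};

/* Condition as it reads on the '-' side of the hunk. */
static bool may_read_uncor_status_old(const struct model_dev *dev,
				      enum model_severity sev)
{
	assert(dev != NULL);
	return dev->type == MODEL_TYPE_ROOT_PORT ||
	       dev->type == MODEL_TYPE_RC_EC ||
	       dev->type == MODEL_TYPE_DOWNSTREAM ||
	       sev == MODEL_AER_NONFATAL;
}

/* Condition as it reads on the '+' side: adds CXL Upstream Ports. */
static bool may_read_uncor_status_new(const struct model_dev *dev,
				      enum model_severity sev)
{
	return may_read_uncor_status_old(dev, sev) ||
	       (dev->is_cxl && dev->type == MODEL_TYPE_UPSTREAM);
}
```

In this model, a fatal error on a CXL Upstream Port is the only case whose
result changes between the two predicates. A non-CXL Upstream Port is
unaffected, and so is a CXL device of a type the old condition did not
already cover, such as an endpoint.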