Message ID | 20241025210305.27499-2-terry.bowman@amd.com |
---|---|
State | New |
Series | Enable CXL PCIe port protocol error handling and logging |
On Fri, 25 Oct 2024 16:02:52 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> CXL.io provides PCIe like protocol error implementation, but CXL.io and
> PCIe have different handling requirements.
>
> The PCIe AER service driver may attempt recovering PCIe devices with
> uncorrectable errors while recovery is not used for CXL.io. Recovery is not
> used in the CXL.io recovery because of the potential for corruption on
> what can be system memory.
>
> Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
> Create handlers for correctable and uncorrectable CXL.io error
> handling.
>
> The CXL error handlers will be used in future patches adding CXL PCIe
> port protocol error handling.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Hi Jonathan,

Thank you for reviewing.

Regards,
Terry

On 10/30/2024 10:14 AM, Jonathan Cameron wrote:
> On Fri, 25 Oct 2024 16:02:52 -0500
> Terry Bowman <terry.bowman@amd.com> wrote:
>
>> CXL.io provides PCIe like protocol error implementation, but CXL.io and
>> PCIe have different handling requirements.
>>
>> The PCIe AER service driver may attempt recovering PCIe devices with
>> uncorrectable errors while recovery is not used for CXL.io. Recovery is not
>> used in the CXL.io recovery because of the potential for corruption on
>> what can be system memory.
>>
>> Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
>> Create handlers for correctable and uncorrectable CXL.io error
>> handling.
>>
>> The CXL error handlers will be used in future patches adding CXL PCIe
>> port protocol error handling.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
On 10/25/24 2:02 PM, Terry Bowman wrote:
> CXL.io provides PCIe like protocol error implementation, but CXL.io and
> PCIe have different handling requirements.
>
> The PCIe AER service driver may attempt recovering PCIe devices with
> uncorrectable errors while recovery is not used for CXL.io. Recovery is not
> used in the CXL.io recovery because of the potential for corruption on
> what can be system memory.
>
> Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
> Create handlers for correctable and uncorrectable CXL.io error
> handling.
>
> The CXL error handlers will be used in future patches adding CXL PCIe
> port protocol error handling.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  include/linux/pci.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 573b4c4c2be6..106ac83e3a7b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -886,6 +886,14 @@ struct pci_error_handlers {
>  	void (*cor_error_detected)(struct pci_dev *dev);
>  };
>
> +/* CXL bus error event callbacks */
> +struct cxl_error_handlers {
> +	/* CXL bus error detected on this device */
> +	bool (*error_detected)(struct pci_dev *dev);
> +
> +	/* Allow device driver to record more details of a correctable error */
> +	void (*cor_error_detected)(struct pci_dev *dev);
> +};
>
>  struct module;
>
> @@ -956,6 +964,7 @@ struct pci_driver {
>  	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
>  	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
>  	const struct pci_error_handlers *err_handler;
> +	const struct cxl_error_handlers *cxl_err_handler;
>  	const struct attribute_group **groups;
>  	const struct attribute_group **dev_groups;
>  	struct device_driver	driver;
On Fri, Oct 25, 2024 at 04:02:52PM -0500, Terry Bowman wrote:
> CXL.io provides PCIe like protocol error implementation, but CXL.io and
> PCIe have different handling requirements.
>
> The PCIe AER service driver may attempt recovering PCIe devices with
> uncorrectable errors while recovery is not used for CXL.io. Recovery is not
> used in the CXL.io recovery because of the potential for corruption on
> what can be system memory.
>
> Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
> Create handlers for correctable and uncorrectable CXL.io error
> handling.
>
> The CXL error handlers will be used in future patches adding CXL PCIe
> port protocol error handling.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---

Reviewed-by: Fan Ni <fan.ni@samsung.com>

>  include/linux/pci.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 573b4c4c2be6..106ac83e3a7b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -886,6 +886,14 @@ struct pci_error_handlers {
>  	void (*cor_error_detected)(struct pci_dev *dev);
>  };
>
> +/* CXL bus error event callbacks */
> +struct cxl_error_handlers {
> +	/* CXL bus error detected on this device */
> +	bool (*error_detected)(struct pci_dev *dev);
> +
> +	/* Allow device driver to record more details of a correctable error */
> +	void (*cor_error_detected)(struct pci_dev *dev);
> +};
>
>  struct module;
>
> @@ -956,6 +964,7 @@ struct pci_driver {
>  	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
>  	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
>  	const struct pci_error_handlers *err_handler;
> +	const struct cxl_error_handlers *cxl_err_handler;
>  	const struct attribute_group **groups;
>  	const struct attribute_group **dev_groups;
>  	struct device_driver	driver;
> --
> 2.34.1
>
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 573b4c4c2be6..106ac83e3a7b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -886,6 +886,14 @@ struct pci_error_handlers {
 	void (*cor_error_detected)(struct pci_dev *dev);
 };
 
+/* CXL bus error event callbacks */
+struct cxl_error_handlers {
+	/* CXL bus error detected on this device */
+	bool (*error_detected)(struct pci_dev *dev);
+
+	/* Allow device driver to record more details of a correctable error */
+	void (*cor_error_detected)(struct pci_dev *dev);
+};
 
 struct module;
 
@@ -956,6 +964,7 @@ struct pci_driver {
 	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
 	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
 	const struct pci_error_handlers *err_handler;
+	const struct cxl_error_handlers *cxl_err_handler;
 	const struct attribute_group **groups;
 	const struct attribute_group **dev_groups;
 	struct device_driver	driver;
CXL.io provides a PCIe-like protocol error implementation, but CXL.io and
PCIe have different handling requirements.

The PCIe AER service driver may attempt to recover PCIe devices with
uncorrectable errors, while recovery is not used for CXL.io. Recovery is
not used for CXL.io because of the potential for corruption of what can
be system memory.

Create pci_driver::cxl_err_handler, similar to pci_driver::err_handler.
Create handlers for correctable and uncorrectable CXL.io error
handling.

The CXL error handlers will be used in future patches adding CXL PCIe
port protocol error handling.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 include/linux/pci.h | 9 +++++++++
 1 file changed, 9 insertions(+)
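
For illustration only, here is a minimal userspace sketch of how a driver might populate the new callback table and how a caller might dispatch through it. It is not kernel code: `struct pci_dev` and `struct pci_driver` are trimmed stand-ins, and `demo_cxl_error_detected()`, `demo_cxl_cor_error_detected()`, `demo_driver`, `demo_dispatch_cxl_error()` and the `cor_count` field are hypothetical names. The patch does not say what the `bool` returned by `error_detected` means, so the sketch assumes `true` means the error must be treated as unrecoverable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Trimmed userspace stand-in for the kernel's struct pci_dev (hypothetical). */
struct pci_dev {
	const char *name;
	unsigned int cor_count;	/* hypothetical bookkeeping, not a kernel field */
};

/* Same shape as the struct added to include/linux/pci.h by the patch. */
struct cxl_error_handlers {
	/* CXL bus error detected on this device */
	bool (*error_detected)(struct pci_dev *dev);
	/* Allow device driver to record more details of a correctable error */
	void (*cor_error_detected)(struct pci_dev *dev);
};

/* Trimmed stand-in for struct pci_driver; only the new member is kept. */
struct pci_driver {
	const char *name;
	const struct cxl_error_handlers *cxl_err_handler;
};

/* Hypothetical uncorrectable-error callback. No recovery is attempted. */
static bool demo_cxl_error_detected(struct pci_dev *dev)
{
	fprintf(stderr, "%s: uncorrectable CXL.io error\n", dev->name);
	return true;	/* assumption: true == treat as unrecoverable */
}

/* Hypothetical correctable-error callback: only records the event. */
static void demo_cxl_cor_error_detected(struct pci_dev *dev)
{
	dev->cor_count++;
}

static const struct cxl_error_handlers demo_cxl_handlers = {
	.error_detected		= demo_cxl_error_detected,
	.cor_error_detected	= demo_cxl_cor_error_detected,
};

const struct pci_driver demo_driver = {
	.name			= "demo_cxl",
	.cxl_err_handler	= &demo_cxl_handlers,
};

/*
 * Hypothetical dispatcher: routes an error to the driver's CXL callbacks
 * and tolerates drivers that set no table or only some of the callbacks.
 * Returns the callback's verdict for uncorrectable errors, false otherwise.
 */
bool demo_dispatch_cxl_error(const struct pci_driver *drv,
			     struct pci_dev *dev, bool correctable)
{
	const struct cxl_error_handlers *h = drv ? drv->cxl_err_handler : NULL;

	if (!h)
		return false;

	if (correctable) {
		if (h->cor_error_detected)
			h->cor_error_detected(dev);
		return false;
	}

	return h->error_detected ? h->error_detected(dev) : false;
}
```

Keeping the CXL callbacks in their own table, separate from `err_handler`, means a driver that sets only `cxl_err_handler` does not opt in to the PCIe AER recovery flow, which matches the commit message's point that recovery is not used for CXL.io.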