Message ID | 20240617200411.1426554-8-terry.bowman@amd.com |
---|---|
State | New, archived |
Headers | show |
Series | Add RAS support for CXL root ports, CXL downstream switch ports, and CXL upstream switch ports | expand |
On Mon, 17 Jun 2024 15:04:09 -0500 Terry Bowman <terry.bowman@amd.com> wrote: > CXL root ports, CXL downstream switch ports, and CXL upstream switch > ports are bound to the PCIe port bus driver, portdrv. portdrv provides > an atomic notifier chain for reporting PCIe port device AER > correctable internal errors (CIE) and AER uncorrectable internal > errors (UIE). > > CXL PCIe port devices use AER CIE/UIE to report CXL RAS.[1] > > Add a cxl_pci atomic notification callback for handling the portdrv's > AER UIE/CIE notifications. > > Register the atomic notification callback in the cxl_pci module's > load. Unregister the callback in the cxl_pci driver's unload. > > Implement the callback to check if the device parameter is a valid > CXL PCIe port. If it is valid then make the notification callback call > __cxl_handle_cor_ras() or __cxl_handle_ras() depending on the AER > type. > > [1] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and > Upstream Switch Ports > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> Hi Terry, Some comments inline. Mostly this comes down to earlier question of whether this notifier should be registered per device or globally. I think I still prefer per device, but attaching the handler will be trickier and I'm guessing there may be some locking/lifetime issues doing that which are avoided by a global notifier. Jonathan > --- > drivers/cxl/core/core.h | 4 ++ > drivers/cxl/core/pci.c | 97 ++++++++++++++++++++++++++++++++++++++--- > drivers/cxl/core/port.c | 6 +-- > drivers/cxl/cxl.h | 5 +++ > drivers/cxl/cxlpci.h | 2 + > drivers/cxl/pci.c | 19 +++++++- > 6 files changed, 123 insertions(+), 10 deletions(-) > > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h > index bc5a95665aa0..69bef1db6ee0 100644 > --- a/drivers/cxl/core/core.h > +++ b/drivers/cxl/core/core.h > @@ -94,4 +94,8 @@ int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr, > enum access_coordinate_class access); > bool cxl_need_node_perf_attrs_update(int nid); > > +struct cxl_dport *find_dport(struct cxl_port *port, int id); > +struct cxl_port *find_cxl_port(struct device *dport_dev, > + struct cxl_dport **dport); > + > #endif /* __CXL_CORE_H__ */ > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c > index 59a317ab84bb..e630eccb733d 100644 > --- a/drivers/cxl/core/pci.c > +++ b/drivers/cxl/core/pci.c > @@ -689,7 +689,6 @@ EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL); > static void __cxl_handle_cor_ras(struct device *dev, > void __iomem *ras_base) > { > - struct cxl_memdev *cxlmd = to_cxl_memdev(dev); > void __iomem *addr; > u32 status; > > @@ -698,10 +697,17 @@ static void __cxl_handle_cor_ras(struct device *dev, > > addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET; > status = readl(addr); > - if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) { > - writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr); > + Blank line probably not wanted as we want to group the status check with the read (it's kind of an error check). > + if (!(status & CXL_RAS_CORRECTABLE_STATUS_MASK)) > + return; > + > + writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr); > + if (is_cxl_memdev(dev)) { > + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); > + > trace_cxl_aer_correctable_error(cxlmd, status); As below - don't bother with local cxlmd variable. > - } > + } else if (dev_is_pci(dev)) > + trace_cxl_port_aer_correctable_error(dev, status); > } > > static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds) > @@ -733,7 +739,6 @@ static void header_log_copy(void __iomem *ras_base, u32 *log) > static bool __cxl_handle_ras(struct device *dev, > void __iomem *ras_base) > { > - struct cxl_memdev *cxlmd = to_cxl_memdev(dev); > u32 hl[CXL_HEADERLOG_SIZE_U32]; > void __iomem *addr; > u32 status; > @@ -759,7 +764,13 @@ static bool __cxl_handle_ras(struct device *dev, > } > > header_log_copy(ras_base, hl); > - trace_cxl_aer_uncorrectable_error(cxlmd, status, fe, hl); > + if (is_cxl_memdev(dev)) { > + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); Just use this inline. trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), status, fe, hl); > + > + trace_cxl_aer_uncorrectable_error(cxlmd, status, fe, hl); > + } else if (dev_is_pci(dev)) > + trace_cxl_port_aer_uncorrectable_error(dev, status); As before, why no fe or hl? I'm sure I'm missing some spec subtlty but a comment would help me and others avoid that. > + > writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr); > > return true; > @@ -882,6 +893,80 @@ static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds, > return __cxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras); > } > > +static int match_uport(struct device *dev, void *data) > +{ > + struct device *uport_dev = (struct device *)data; > + struct cxl_port *port; > + > + if (!is_cxl_port(dev)) > + return 0; > + > + port = to_cxl_port(dev); > + > + return (port->uport_dev == uport_dev); () doesn't add much so I'd drop them. > +} > + > +static struct cxl_port *pci_to_cxl_uport(struct pci_dev *pdev) > +{ > + struct cxl_dport *dport; > + struct device *port_dev; > + struct cxl_port *port; > + > + port = find_cxl_port(pdev->dev.parent, &dport); > + if (!port) > + return NULL; > + put_device(&port->dev); I'm confused on the lifetimes. Doesn't it make more sense to hold this until after you've stopped using it? So move the put after device_find_child(...) > + > + port_dev = device_find_child(&port->dev, &pdev->dev, match_uport); > + if (!port_dev) > + return NULL; > + put_device(port_dev); > + > + port = to_cxl_port(port_dev); > + > + return port; return to_cxl_port(port_dev); > +} > + > +static void __iomem *cxl_pci_port_ras(struct pci_dev *pdev) > +{ > + void __iomem *ras_base = NULL; Don't initialize and... > + > + if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) || > + (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM)) { > + struct cxl_dport *dport; > + > + find_cxl_port(&pdev->dev, &dport); > + ras_base = dport ? dport->regs.ras : NULL; if (dport) return dport->regs.ras; > + } else if (pci_pcie_type(pdev) == PCI_EXP_TYPE_UPSTREAM) { > + struct cxl_port *port = pci_to_cxl_uport(pdev); > + > + ras_base = port ? port->regs.ras : NULL; if (port) return port->regs.ras; > + } return NULL; > + > + return ras_base; > +} > + > +int port_internal_err_cb(struct notifier_block *unused, > + unsigned long event, void *ptr) > +{ > + struct pci_dev *pdev = (struct pci_dev *)ptr; I think you can use this notifier for other types of device in future? If it's going to be global definitely want to check here that we actually have a CXL port of some type. It may be that via the various calls any non CXL device will result in a safe error. However that's not obvious, so an upfront check makes sense (or a per device notifier registration!) > + void __iomem *ras_base; > + > + if (!pdev) > + return 0; > + > + if (event == AER_CORRECTABLE) { > + ras_base = cxl_pci_port_ras(pdev); > + __cxl_handle_cor_ras(&pdev->dev, ras_base); as below - one line should be fine for this. > + } else if ((event == AER_FATAL) || (event == AER_NONFATAL)) { > + ras_base = cxl_pci_port_ras(pdev); > + __cxl_handle_ras(&pdev->dev, ras_base); __cxl_handle_ras(&pdev->dev, cxl_pci_port_ras(pdev)); > + } > + > + return 0; > +} > +EXPORT_SYMBOL_NS_GPL(port_internal_err_cb, CXL); > + > /* > * Copy the AER capability registers using 32 bit read accesses. > * This is necessary because RCRB AER capability is MMIO mapped. Clear the > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > index 887ed6e358fb..d0f95c965ab4 100644 > --- a/drivers/cxl/core/port.c > +++ b/drivers/cxl/core/port.c > @@ -1027,7 +1027,7 @@ void put_cxl_root(struct cxl_root *cxl_root) > } > EXPORT_SYMBOL_NS_GPL(put_cxl_root, CXL); > > -static struct cxl_dport *find_dport(struct cxl_port *port, int id) > +struct cxl_dport *find_dport(struct cxl_port *port, int id) > { > struct cxl_dport *dport; > unsigned long index; > @@ -1336,8 +1336,8 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx) > return NULL; > } > > -static struct cxl_port *find_cxl_port(struct device *dport_dev, > - struct cxl_dport **dport) > +struct cxl_port *find_cxl_port(struct device *dport_dev, > + struct cxl_dport **dport) > { > struct cxl_find_port_ctx ctx = { > .dport_dev = dport_dev, > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h > index 7cee678fdb75..04725344393b 100644 > --- a/drivers/cxl/cxl.h > +++ b/drivers/cxl/cxl.h > @@ -11,6 +11,7 @@ > #include <linux/log2.h> > #include <linux/node.h> > #include <linux/io.h> > +#include "../pci/pcie/portdrv.h" Why add the include? Maybe only needed in c files/ > > /** > * DOC: cxl objects > @@ -760,11 +761,15 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port, > #ifdef CONFIG_PCIEAER_CXL > void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport); > void cxl_setup_parent_uport(struct device *host, struct cxl_port *port); > +int port_internal_err_cb(struct notifier_block *unused, > + unsigned long event, void *ptr); > #else > static inline void cxl_setup_parent_dport(struct device *host, > struct cxl_dport *dport) { } > static inline void cxl_setup_parent_uport(struct device *host, > struct cxl_port *port) { } > +static inline int port_internal_err_cb(struct notifier_block *unused, > + unsigned long event, void *ptr) { return 0; } > #endif > > struct cxl_decoder *to_cxl_decoder(struct device *dev); > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h > index 93992a1c8eec..6044955e1c48 100644 > --- a/drivers/cxl/cxlpci.h > +++ b/drivers/cxl/cxlpci.h > @@ -130,4 +130,6 @@ void read_cdat_data(struct cxl_port *port); > void cxl_cor_error_detected(struct pci_dev *pdev); > pci_ers_result_t cxl_error_detected(struct pci_dev *pdev, > pci_channel_state_t state); > +int port_err_nb_cb(struct notifier_block *unused, > + unsigned long event, void *ptr); > #endif /* __CXL_PCI_H__ */ > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index 2ff361e756d6..f4183c5aea38 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -926,6 +926,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > return rc; > } > > +struct notifier_block port_internal_err_nb = { > + .notifier_call = port_internal_err_cb, > +}; > + > static const struct pci_device_id cxl_mem_pci_tbl[] = { > /* PCI class code for CXL.mem Type-3 Devices */ > { PCI_DEVICE_CLASS((PCI_CLASS_MEMORY_CXL << 8 | CXL_MEMORY_PROGIF), ~0)}, > @@ -974,6 +978,19 @@ static struct pci_driver cxl_pci_driver = { > }, > }; > > -module_pci_driver(cxl_pci_driver); > +static int __init cxl_pci_init(void) > +{ > + atomic_notifier_chain_register(&portdrv_aer_internal_err_chain, &port_internal_err_nb); Long line that you can easily break. > + return pci_register_driver(&cxl_pci_driver); > +} > +module_init(cxl_pci_init); > + > +static void __exit cxl_pci_exit(void) > +{ > + atomic_notifier_chain_unregister(&portdrv_aer_internal_err_chain, &port_internal_err_nb); Long line that you can easily break. > + pci_unregister_driver(&cxl_pci_driver); > +} > +module_exit(cxl_pci_exit); > + > MODULE_LICENSE("GPL v2"); > MODULE_IMPORT_NS(CXL);
Hi Jonathan, I added repsonses inline below. On 6/20/24 08:09, Jonathan Cameron wrote: > On Mon, 17 Jun 2024 15:04:09 -0500 > Terry Bowman <terry.bowman@amd.com> wrote: > >> CXL root ports, CXL downstream switch ports, and CXL upstream switch >> ports are bound to the PCIe port bus driver, portdrv. portdrv provides >> an atomic notifier chain for reporting PCIe port device AER >> correctable internal errors (CIE) and AER uncorrectable internal >> errors (UIE). >> >> CXL PCIe port devices use AER CIE/UIE to report CXL RAS.[1] >> >> Add a cxl_pci atomic notification callback for handling the portdrv's >> AER UIE/CIE notifications. >> >> Register the atomic notification callback in the cxl_pci module's >> load. Unregister the callback in the cxl_pci driver's unload. >> >> Implement the callback to check if the device parameter is a valid >> CXL PCIe port. If it is valid then make the notification callback call >> __cxl_handle_cor_ras() or __cxl_handle_ras() depending on the AER >> type. >> >> [1] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and >> Upstream Switch Ports >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Hi Terry, > > Some comments inline. Mostly this comes down to earlier question of whether > this notifier should be registered per device or globally. > I think I still prefer per device, but attaching the handler will be trickier > and I'm guessing there may be some locking/lifetime issues doing that which > are avoided by a global notifier. > > Jonathan > I agree on the per-device notifier. >> --- >> drivers/cxl/core/core.h | 4 ++ >> drivers/cxl/core/pci.c | 97 ++++++++++++++++++++++++++++++++++++++--- >> drivers/cxl/core/port.c | 6 +-- >> drivers/cxl/cxl.h | 5 +++ >> drivers/cxl/cxlpci.h | 2 + >> drivers/cxl/pci.c | 19 +++++++- >> 6 files changed, 123 insertions(+), 10 deletions(-) >> >> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h >> index bc5a95665aa0..69bef1db6ee0 100644 >> --- a/drivers/cxl/core/core.h >> +++ b/drivers/cxl/core/core.h >> @@ -94,4 +94,8 @@ int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr, >> enum access_coordinate_class access); >> bool cxl_need_node_perf_attrs_update(int nid); >> >> +struct cxl_dport *find_dport(struct cxl_port *port, int id); >> +struct cxl_port *find_cxl_port(struct device *dport_dev, >> + struct cxl_dport **dport); >> + >> #endif /* __CXL_CORE_H__ */ >> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c >> index 59a317ab84bb..e630eccb733d 100644 >> --- a/drivers/cxl/core/pci.c >> +++ b/drivers/cxl/core/pci.c >> @@ -689,7 +689,6 @@ EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL); >> static void __cxl_handle_cor_ras(struct device *dev, >> void __iomem *ras_base) >> { >> - struct cxl_memdev *cxlmd = to_cxl_memdev(dev); >> void __iomem *addr; >> u32 status; >> >> @@ -698,10 +697,17 @@ static void __cxl_handle_cor_ras(struct device *dev, >> >> addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET; >> status = readl(addr); >> - if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) { >> - writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr); >> + > > Blank line probably not wanted as we want to group the status > check with the read (it's kind of an error check). > Ok. >> + if (!(status & CXL_RAS_CORRECTABLE_STATUS_MASK)) >> + return; >> + >> + writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr); >> + if (is_cxl_memdev(dev)) { >> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); >> + >> trace_cxl_aer_correctable_error(cxlmd, status); > As below - don't bother with local cxlmd variable. > Ok. >> - } >> + } else if (dev_is_pci(dev)) >> + trace_cxl_port_aer_correctable_error(dev, status); >> } >> >> static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds) >> @@ -733,7 +739,6 @@ static void header_log_copy(void __iomem *ras_base, u32 *log) >> static bool __cxl_handle_ras(struct device *dev, >> void __iomem *ras_base) >> { >> - struct cxl_memdev *cxlmd = to_cxl_memdev(dev); >> u32 hl[CXL_HEADERLOG_SIZE_U32]; >> void __iomem *addr; >> u32 status; >> @@ -759,7 +764,13 @@ static bool __cxl_handle_ras(struct device *dev, >> } >> >> header_log_copy(ras_base, hl); >> - trace_cxl_aer_uncorrectable_error(cxlmd, status, fe, hl); >> + if (is_cxl_memdev(dev)) { >> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); > Just use this inline. > trace_cxl_aer_uncorrectable_error(to_cxl_memdev(dev), > status, fe, hl); > >> + >> + trace_cxl_aer_uncorrectable_error(cxlmd, status, fe, hl); >> + } else if (dev_is_pci(dev)) >> + trace_cxl_port_aer_uncorrectable_error(dev, status); > > As before, why no fe or hl? I'm sure I'm missing some spec subtlty > but a comment would help me and others avoid that. > Adding the fe and hl on the list to be added. No, you're spot on. >> + >> writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr); >> >> return true; >> @@ -882,6 +893,80 @@ static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds, >> return __cxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras); >> } >> >> +static int match_uport(struct device *dev, void *data) >> +{ >> + struct device *uport_dev = (struct device *)data; >> + struct cxl_port *port; >> + >> + if (!is_cxl_port(dev)) >> + return 0; >> + >> + port = to_cxl_port(dev); >> + >> + return (port->uport_dev == uport_dev); > () doesn't add much so I'd drop them. > >> +} >> + >> +static struct cxl_port *pci_to_cxl_uport(struct pci_dev *pdev) >> +{ >> + struct cxl_dport *dport; >> + struct device *port_dev; >> + struct cxl_port *port; >> + >> + port = find_cxl_port(pdev->dev.parent, &dport); >> + if (!port) >> + return NULL; >> + put_device(&port->dev); > I'm confused on the lifetimes. Doesn't it make more sense > to hold this until after you've stopped using it? So move the > put after device_find_child(...) > Ok. >> + >> + port_dev = device_find_child(&port->dev, &pdev->dev, match_uport); >> + if (!port_dev) >> + return NULL; >> + put_device(port_dev); >> + >> + port = to_cxl_port(port_dev); >> + >> + return port; > > return to_cxl_port(port_dev); > >> +} >> + >> +static void __iomem *cxl_pci_port_ras(struct pci_dev *pdev) >> +{ >> + void __iomem *ras_base = NULL; > Don't initialize and... There is possibility the incorrect PCI type is passed and this is intended to return NULL for these cases. Should ras_base not be preinitialized in that for the scenario? >> + >> + if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) || >> + (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM)) { >> + struct cxl_dport *dport; >> + >> + find_cxl_port(&pdev->dev, &dport); >> + ras_base = dport ? dport->regs.ras : NULL; > if (dport) > return dport->regs.ras; >> + } else if (pci_pcie_type(pdev) == PCI_EXP_TYPE_UPSTREAM) { >> + struct cxl_port *port = pci_to_cxl_uport(pdev); >> + >> + ras_base = port ? port->regs.ras : NULL; > if (port) > return port->regs.ras; >> + } > return NULL; >> + >> + return ras_base; >> +} >> + >> +int port_internal_err_cb(struct notifier_block *unused, >> + unsigned long event, void *ptr) >> +{ >> + struct pci_dev *pdev = (struct pci_dev *)ptr; > > I think you can use this notifier for other types of device in future? > If it's going to be global definitely want to check here that we > actually have a CXL port of some type. > > It may be that via the various calls any non CXL device > will result in a safe error. However that's not obvious, so an > upfront check makes sense (or a per device notifier registration!) > cxl_pci_port_ras() performs PCIe type check and sets RAS base to NULL if the type is not a port. >> + void __iomem *ras_base; >> + >> + if (!pdev) >> + return 0; >> + >> + if (event == AER_CORRECTABLE) { >> + ras_base = cxl_pci_port_ras(pdev); >> + __cxl_handle_cor_ras(&pdev->dev, ras_base); > > as below - one line should be fine for this. > >> + } else if ((event == AER_FATAL) || (event == AER_NONFATAL)) { >> + ras_base = cxl_pci_port_ras(pdev); >> + __cxl_handle_ras(&pdev->dev, ras_base); > > __cxl_handle_ras(&pdev->dev, cxl_pci_port_ras(pdev)); > >> + } >> + >> + return 0; >> +} >> +EXPORT_SYMBOL_NS_GPL(port_internal_err_cb, CXL); >> + >> /* >> * Copy the AER capability registers using 32 bit read accesses. >> * This is necessary because RCRB AER capability is MMIO mapped. Clear the >> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c >> index 887ed6e358fb..d0f95c965ab4 100644 >> --- a/drivers/cxl/core/port.c >> +++ b/drivers/cxl/core/port.c >> @@ -1027,7 +1027,7 @@ void put_cxl_root(struct cxl_root *cxl_root) >> } >> EXPORT_SYMBOL_NS_GPL(put_cxl_root, CXL); >> >> -static struct cxl_dport *find_dport(struct cxl_port *port, int id) >> +struct cxl_dport *find_dport(struct cxl_port *port, int id) >> { >> struct cxl_dport *dport; >> unsigned long index; >> @@ -1336,8 +1336,8 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx) >> return NULL; >> } >> >> -static struct cxl_port *find_cxl_port(struct device *dport_dev, >> - struct cxl_dport **dport) >> +struct cxl_port *find_cxl_port(struct device *dport_dev, >> + struct cxl_dport **dport) >> { >> struct cxl_find_port_ctx ctx = { >> .dport_dev = dport_dev, >> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h >> index 7cee678fdb75..04725344393b 100644 >> --- a/drivers/cxl/cxl.h >> +++ b/drivers/cxl/cxl.h >> @@ -11,6 +11,7 @@ >> #include <linux/log2.h> >> #include <linux/node.h> >> #include <linux/io.h> >> +#include "../pci/pcie/portdrv.h" > > Why add the include? Maybe only needed in c files/ > Ok >> >> /** >> * DOC: cxl objects >> @@ -760,11 +761,15 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port, >> #ifdef CONFIG_PCIEAER_CXL >> void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport); >> void cxl_setup_parent_uport(struct device *host, struct cxl_port *port); >> +int port_internal_err_cb(struct notifier_block *unused, >> + unsigned long event, void *ptr); >> #else >> static inline void cxl_setup_parent_dport(struct device *host, >> struct cxl_dport *dport) { } >> static inline void cxl_setup_parent_uport(struct device *host, >> struct cxl_port *port) { } >> +static inline int port_internal_err_cb(struct notifier_block *unused, >> + unsigned long event, void *ptr) { return 0; } >> #endif >> >> struct cxl_decoder *to_cxl_decoder(struct device *dev); >> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h >> index 93992a1c8eec..6044955e1c48 100644 >> --- a/drivers/cxl/cxlpci.h >> +++ b/drivers/cxl/cxlpci.h >> @@ -130,4 +130,6 @@ void read_cdat_data(struct cxl_port *port); >> void cxl_cor_error_detected(struct pci_dev *pdev); >> pci_ers_result_t cxl_error_detected(struct pci_dev *pdev, >> pci_channel_state_t state); >> +int port_err_nb_cb(struct notifier_block *unused, >> + unsigned long event, void *ptr); >> #endif /* __CXL_PCI_H__ */ >> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c >> index 2ff361e756d6..f4183c5aea38 100644 >> --- a/drivers/cxl/pci.c >> +++ b/drivers/cxl/pci.c >> @@ -926,6 +926,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) >> return rc; >> } >> >> +struct notifier_block port_internal_err_nb = { >> + .notifier_call = port_internal_err_cb, >> +}; >> + >> static const struct pci_device_id cxl_mem_pci_tbl[] = { >> /* PCI class code for CXL.mem Type-3 Devices */ >> { PCI_DEVICE_CLASS((PCI_CLASS_MEMORY_CXL << 8 | CXL_MEMORY_PROGIF), ~0)}, >> @@ -974,6 +978,19 @@ static struct pci_driver cxl_pci_driver = { >> }, >> }; >> >> -module_pci_driver(cxl_pci_driver); >> +static int __init cxl_pci_init(void) >> +{ >> + atomic_notifier_chain_register(&portdrv_aer_internal_err_chain, &port_internal_err_nb); > > Long line that you can easily break. > >> + return pci_register_driver(&cxl_pci_driver); >> +} >> +module_init(cxl_pci_init); >> + >> +static void __exit cxl_pci_exit(void) >> +{ >> + atomic_notifier_chain_unregister(&portdrv_aer_internal_err_chain, &port_internal_err_nb); > > Long line that you can easily break. > >> + pci_unregister_driver(&cxl_pci_driver); >> +} >> +module_exit(cxl_pci_exit); >> + >> MODULE_LICENSE("GPL v2"); >> MODULE_IMPORT_NS(CXL); >
On 6/18/2024 4:04 AM, Terry Bowman wrote: > CXL root ports, CXL downstream switch ports, and CXL upstream switch > ports are bound to the PCIe port bus driver, portdrv. portdrv provides > an atomic notifier chain for reporting PCIe port device AER > correctable internal errors (CIE) and AER uncorrectable internal > errors (UIE). > > CXL PCIe port devices use AER CIE/UIE to report CXL RAS.[1] > > Add a cxl_pci atomic notification callback for handling the portdrv's > AER UIE/CIE notifications. > > Register the atomic notification callback in the cxl_pci module's > load. Unregister the callback in the cxl_pci driver's unload. > > Implement the callback to check if the device parameter is a valid > CXL PCIe port. If it is valid then make the notification callback call > __cxl_handle_cor_ras() or __cxl_handle_ras() depending on the AER > type. > > [1] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and > Upstream Switch Ports > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > --- > drivers/cxl/core/core.h | 4 ++ > drivers/cxl/core/pci.c | 97 ++++++++++++++++++++++++++++++++++++++--- > drivers/cxl/core/port.c | 6 +-- > drivers/cxl/cxl.h | 5 +++ > drivers/cxl/cxlpci.h | 2 + > drivers/cxl/pci.c | 19 +++++++- > 6 files changed, 123 insertions(+), 10 deletions(-) > > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h > index bc5a95665aa0..69bef1db6ee0 100644 > --- a/drivers/cxl/core/core.h > +++ b/drivers/cxl/core/core.h > @@ -94,4 +94,8 @@ int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr, > enum access_coordinate_class access); > bool cxl_need_node_perf_attrs_update(int nid); > > +struct cxl_dport *find_dport(struct cxl_port *port, int id); > +struct cxl_port *find_cxl_port(struct device *dport_dev, > + struct cxl_dport **dport); > + > #endif /* __CXL_CORE_H__ */ > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c > index 59a317ab84bb..e630eccb733d 100644 > --- a/drivers/cxl/core/pci.c > +++ b/drivers/cxl/core/pci.c > @@ -689,7 +689,6 @@ EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL); > static void __cxl_handle_cor_ras(struct device *dev, > void __iomem *ras_base) > { > - struct cxl_memdev *cxlmd = to_cxl_memdev(dev); > void __iomem *addr; > u32 status; > > @@ -698,10 +697,17 @@ static void __cxl_handle_cor_ras(struct device *dev, > > addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET; > status = readl(addr); > - if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) { > - writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr); > + > + if (!(status & CXL_RAS_CORRECTABLE_STATUS_MASK)) > + return; > + > + writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr); > + if (is_cxl_memdev(dev)) { > + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); > + > trace_cxl_aer_correctable_error(cxlmd, status); > - } > + } else if (dev_is_pci(dev)) > + trace_cxl_port_aer_correctable_error(dev, status); > } > > static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds) > @@ -733,7 +739,6 @@ static void header_log_copy(void __iomem *ras_base, u32 *log) > static bool __cxl_handle_ras(struct device *dev, > void __iomem *ras_base) > { > - struct cxl_memdev *cxlmd = to_cxl_memdev(dev); > u32 hl[CXL_HEADERLOG_SIZE_U32]; > void __iomem *addr; > u32 status; > @@ -759,7 +764,13 @@ static bool __cxl_handle_ras(struct device *dev, > } > > header_log_copy(ras_base, hl); > - trace_cxl_aer_uncorrectable_error(cxlmd, status, fe, hl); > + if (is_cxl_memdev(dev)) { > + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); > + > + trace_cxl_aer_uncorrectable_error(cxlmd, status, fe, hl); > + } else if (dev_is_pci(dev)) > + trace_cxl_port_aer_uncorrectable_error(dev, status); > + > writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr); > > return true; > @@ -882,6 +893,80 @@ static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds, > return __cxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras); > } > > +static int match_uport(struct device *dev, void *data) > +{ > + struct device *uport_dev = (struct device *)data; > + struct cxl_port *port; > + > + if (!is_cxl_port(dev)) > + return 0; > + > + port = to_cxl_port(dev); > + > + return (port->uport_dev == uport_dev); > +} > + > +static struct cxl_port *pci_to_cxl_uport(struct pci_dev *pdev) > +{ > + struct cxl_dport *dport; > + struct device *port_dev; > + struct cxl_port *port; > + > + port = find_cxl_port(pdev->dev.parent, &dport); > + if (!port) > + return NULL; > + put_device(&port->dev); > + > + port_dev = device_find_child(&port->dev, &pdev->dev, match_uport); > + if (!port_dev) > + return NULL; seems like just a bus_find_device(&cxl_bus_type, NULL, &pdev->dev, match_uport) can replace these find_cxl_port() and device_find_child(). > + put_device(port_dev); > + > + port = to_cxl_port(port_dev); > + > + return port; > +} > + > +static void __iomem *cxl_pci_port_ras(struct pci_dev *pdev) > +{ > + void __iomem *ras_base = NULL; > + > + if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) || > + (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM)) { > + struct cxl_dport *dport; > + > + find_cxl_port(&pdev->dev, &dport); > + ras_base = dport ? dport->regs.ras : NULL; Need put_device(&port->dev) after find_cxl_port(), use scope-based resource management __free() here should be better. > + } else if (pci_pcie_type(pdev) == PCI_EXP_TYPE_UPSTREAM) { > + struct cxl_port *port = pci_to_cxl_uport(pdev); > + > + ras_base = port ? port->regs.ras : NULL; > + } > + > + return ras_base; > +} > + > +int port_internal_err_cb(struct notifier_block *unused, > + unsigned long event, void *ptr) > +{ > + struct pci_dev *pdev = (struct pci_dev *)ptr; > + void __iomem *ras_base; > + > + if (!pdev) > + return 0; > + > + if (event == AER_CORRECTABLE) { > + ras_base = cxl_pci_port_ras(pdev); > + __cxl_handle_cor_ras(&pdev->dev, ras_base); > + } else if ((event == AER_FATAL) || (event == AER_NONFATAL)) { > + ras_base = cxl_pci_port_ras(pdev); > + __cxl_handle_ras(&pdev->dev, ras_base); > + } > + > + return 0; > +} > +EXPORT_SYMBOL_NS_GPL(port_internal_err_cb, CXL); > + > /* > * Copy the AER capability registers using 32 bit read accesses. > * This is necessary because RCRB AER capability is MMIO mapped. Clear the > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > index 887ed6e358fb..d0f95c965ab4 100644 > --- a/drivers/cxl/core/port.c > +++ b/drivers/cxl/core/port.c > @@ -1027,7 +1027,7 @@ void put_cxl_root(struct cxl_root *cxl_root) > } > EXPORT_SYMBOL_NS_GPL(put_cxl_root, CXL); > > -static struct cxl_dport *find_dport(struct cxl_port *port, int id) > +struct cxl_dport *find_dport(struct cxl_port *port, int id) > { > struct cxl_dport *dport; > unsigned long index; > @@ -1336,8 +1336,8 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx) > return NULL; > } > > -static struct cxl_port *find_cxl_port(struct device *dport_dev, > - struct cxl_dport **dport) > +struct cxl_port *find_cxl_port(struct device *dport_dev, > + struct cxl_dport **dport) > { > struct cxl_find_port_ctx ctx = { > .dport_dev = dport_dev, > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h > index 7cee678fdb75..04725344393b 100644 > --- a/drivers/cxl/cxl.h > +++ b/drivers/cxl/cxl.h > @@ -11,6 +11,7 @@ > #include <linux/log2.h> > #include <linux/node.h> > #include <linux/io.h> > +#include "../pci/pcie/portdrv.h" > > /** > * DOC: cxl objects > @@ -760,11 +761,15 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port, > #ifdef CONFIG_PCIEAER_CXL > void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport); > void cxl_setup_parent_uport(struct device *host, struct cxl_port *port); > +int port_internal_err_cb(struct notifier_block *unused, > + unsigned long event, void *ptr); > #else > static inline void cxl_setup_parent_dport(struct device *host, > struct cxl_dport *dport) { } > static inline void cxl_setup_parent_uport(struct device *host, > struct cxl_port *port) { } > +static inline int port_internal_err_cb(struct notifier_block *unused, > + unsigned long event, void *ptr) { return 0; } > #endif > > struct cxl_decoder *to_cxl_decoder(struct device *dev); > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h > index 93992a1c8eec..6044955e1c48 100644 > --- a/drivers/cxl/cxlpci.h > +++ b/drivers/cxl/cxlpci.h > @@ -130,4 +130,6 @@ void read_cdat_data(struct cxl_port *port); > void cxl_cor_error_detected(struct pci_dev *pdev); > pci_ers_result_t cxl_error_detected(struct pci_dev *pdev, > pci_channel_state_t state); > +int port_err_nb_cb(struct notifier_block *unused, > + unsigned long event, void *ptr); > #endif /* __CXL_PCI_H__ */ > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index 2ff361e756d6..f4183c5aea38 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -926,6 +926,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > return rc; > } > > +struct notifier_block port_internal_err_nb = { > + .notifier_call = port_internal_err_cb, > +}; > + > static const struct pci_device_id cxl_mem_pci_tbl[] = { > /* PCI class code for CXL.mem Type-3 Devices */ > { PCI_DEVICE_CLASS((PCI_CLASS_MEMORY_CXL << 8 | CXL_MEMORY_PROGIF), ~0)}, > @@ -974,6 +978,19 @@ static struct pci_driver cxl_pci_driver = { > }, > }; > > -module_pci_driver(cxl_pci_driver); > +static int __init cxl_pci_init(void) > +{ > + atomic_notifier_chain_register(&portdrv_aer_internal_err_chain, &port_internal_err_nb); > + return pci_register_driver(&cxl_pci_driver); > +} > +module_init(cxl_pci_init); > + > +static void __exit cxl_pci_exit(void) > +{ > + atomic_notifier_chain_unregister(&portdrv_aer_internal_err_chain, &port_internal_err_nb); > + pci_unregister_driver(&cxl_pci_driver); > +} > +module_exit(cxl_pci_exit); > + > MODULE_LICENSE("GPL v2"); > MODULE_IMPORT_NS(CXL);
On 6/26/24 01:22, Li, Ming4 wrote: > On 6/18/2024 4:04 AM, Terry Bowman wrote: >> CXL root ports, CXL downstream switch ports, and CXL upstream switch >> ports are bound to the PCIe port bus driver, portdrv. portdrv provides >> an atomic notifier chain for reporting PCIe port device AER >> correctable internal errors (CIE) and AER uncorrectable internal >> errors (UIE). >> >> CXL PCIe port devices use AER CIE/UIE to report CXL RAS.[1] >> >> Add a cxl_pci atomic notification callback for handling the portdrv's >> AER UIE/CIE notifications. >> >> Register the atomic notification callback in the cxl_pci module's >> load. Unregister the callback in the cxl_pci driver's unload. >> >> Implement the callback to check if the device parameter is a valid >> CXL PCIe port. If it is valid then make the notification callback call >> __cxl_handle_cor_ras() or __cxl_handle_ras() depending on the AER >> type. >> >> [1] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and >> Upstream Switch Ports >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> --- >> drivers/cxl/core/core.h | 4 ++ >> drivers/cxl/core/pci.c | 97 ++++++++++++++++++++++++++++++++++++++--- >> drivers/cxl/core/port.c | 6 +-- >> drivers/cxl/cxl.h | 5 +++ >> drivers/cxl/cxlpci.h | 2 + >> drivers/cxl/pci.c | 19 +++++++- >> 6 files changed, 123 insertions(+), 10 deletions(-) >> >> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h >> index bc5a95665aa0..69bef1db6ee0 100644 >> --- a/drivers/cxl/core/core.h >> +++ b/drivers/cxl/core/core.h >> @@ -94,4 +94,8 @@ int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr, >> enum access_coordinate_class access); >> bool cxl_need_node_perf_attrs_update(int nid); >> >> +struct cxl_dport *find_dport(struct cxl_port *port, int id); >> +struct cxl_port *find_cxl_port(struct device *dport_dev, >> + struct cxl_dport **dport); >> + >> #endif /* __CXL_CORE_H__ */ >> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c >> index 59a317ab84bb..e630eccb733d 100644 >> --- a/drivers/cxl/core/pci.c >> +++ b/drivers/cxl/core/pci.c >> @@ -689,7 +689,6 @@ EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL); >> static void __cxl_handle_cor_ras(struct device *dev, >> void __iomem *ras_base) >> { >> - struct cxl_memdev *cxlmd = to_cxl_memdev(dev); >> void __iomem *addr; >> u32 status; >> >> @@ -698,10 +697,17 @@ static void __cxl_handle_cor_ras(struct device *dev, >> >> addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET; >> status = readl(addr); >> - if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) { >> - writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr); >> + >> + if (!(status & CXL_RAS_CORRECTABLE_STATUS_MASK)) >> + return; >> + >> + writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr); >> + if (is_cxl_memdev(dev)) { >> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); >> + >> trace_cxl_aer_correctable_error(cxlmd, status); >> - } >> + } else if (dev_is_pci(dev)) >> + trace_cxl_port_aer_correctable_error(dev, status); >> } >> >> static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds) >> @@ -733,7 +739,6 @@ static void header_log_copy(void __iomem *ras_base, u32 *log) >> static bool __cxl_handle_ras(struct device *dev, >> void __iomem *ras_base) >> { >> - struct cxl_memdev *cxlmd = to_cxl_memdev(dev); >> u32 hl[CXL_HEADERLOG_SIZE_U32]; >> void __iomem *addr; >> u32 status; >> @@ -759,7 +764,13 @@ static bool __cxl_handle_ras(struct device *dev, >> } >> >> header_log_copy(ras_base, hl); >> - trace_cxl_aer_uncorrectable_error(cxlmd, status, fe, hl); >> + if (is_cxl_memdev(dev)) { >> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); >> + >> + trace_cxl_aer_uncorrectable_error(cxlmd, status, fe, hl); >> + } else if (dev_is_pci(dev)) >> + trace_cxl_port_aer_uncorrectable_error(dev, status); >> + >> writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr); >> >> return true; >> @@ -882,6 +893,80 @@ static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds, >> return __cxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras); >> } >> >> +static int match_uport(struct device *dev, void *data) >> +{ >> + struct device *uport_dev = (struct device *)data; >> + struct cxl_port *port; >> + >> + if (!is_cxl_port(dev)) >> + return 0; >> + >> + port = to_cxl_port(dev); >> + >> + return (port->uport_dev == uport_dev); >> +} >> + >> +static struct cxl_port *pci_to_cxl_uport(struct pci_dev *pdev) >> +{ >> + struct cxl_dport *dport; >> + struct device *port_dev; >> + struct cxl_port *port; >> + >> + port = find_cxl_port(pdev->dev.parent, &dport); >> + if (!port) >> + return NULL; >> + put_device(&port->dev); >> + >> + port_dev = device_find_child(&port->dev, &pdev->dev, match_uport); >> + if (!port_dev) >> + return NULL; > > seems like just a bus_find_device(&cxl_bus_type, NULL, &pdev->dev, match_uport) can replace these find_cxl_port() and device_find_child(). > > That would be a good improvement/optimization. I'll look into making that change. >> + put_device(port_dev); >> + >> + port = to_cxl_port(port_dev); >> + >> + return port; >> +} >> + >> +static void __iomem *cxl_pci_port_ras(struct pci_dev *pdev) >> +{ >> + void __iomem *ras_base = NULL; >> + >> + if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) || >> + (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM)) { >> + struct cxl_dport *dport; >> + >> + find_cxl_port(&pdev->dev, &dport); >> + ras_base = dport ? dport->regs.ras : NULL; > > Need put_device(&port->dev) after find_cxl_port(), use scope-based resource management __free() here should be better. > > Thanks. Regards, Terry
> >> + > >> +static void __iomem *cxl_pci_port_ras(struct pci_dev *pdev) > >> +{ > >> + void __iomem *ras_base = NULL; > > Don't initialize and... > > There is possibility the incorrect PCI type is passed and this is intended to > return NULL for these cases. Should ras_base not be preinitialized in > that for the scenario? From a code point of view at least, nope - just return NULL directly give it's an error case. > > >> + > >> + if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) || > >> + (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM)) { > >> + struct cxl_dport *dport; > >> + > >> + find_cxl_port(&pdev->dev, &dport); > >> + ras_base = dport ? dport->regs.ras : NULL; > > if (dport) > > return dport->regs.ras; > >> + } else if (pci_pcie_type(pdev) == PCI_EXP_TYPE_UPSTREAM) { > >> + struct cxl_port *port = pci_to_cxl_uport(pdev); > >> + > >> + ras_base = port ? port->regs.ras : NULL; > > if (port) > > return port->regs.ras; > >> + } > > return NULL; This is why you don't need to set ras_base. If you get here it's always NULL. > >> + > >> + return ras_base; > >> +} > >> + > >> +int port_internal_err_cb(struct notifier_block *unused, > >> + unsigned long event, void *ptr) > >> +{ > >> + struct pci_dev *pdev = (struct pci_dev *)ptr; > > > > I think you can use this notifier for other types of device in future? > > If it's going to be global definitely want to check here that we > > actually have a CXL port of some type. > > > > It may be that via the various calls any non CXL device > > will result in a safe error. However that's not obvious, so an > > upfront check makes sense (or a per device notifier registration!) > > > > cxl_pci_port_ras() performs PCIe type check and sets RAS base to NULL if
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index bc5a95665aa0..69bef1db6ee0 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -94,4 +94,8 @@ int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr, enum access_coordinate_class access); bool cxl_need_node_perf_attrs_update(int nid); +struct cxl_dport *find_dport(struct cxl_port *port, int id); +struct cxl_port *find_cxl_port(struct device *dport_dev, + struct cxl_dport **dport); + #endif /* __CXL_CORE_H__ */ diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c index 59a317ab84bb..e630eccb733d 100644 --- a/drivers/cxl/core/pci.c +++ b/drivers/cxl/core/pci.c @@ -689,7 +689,6 @@ EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL); static void __cxl_handle_cor_ras(struct device *dev, void __iomem *ras_base) { - struct cxl_memdev *cxlmd = to_cxl_memdev(dev); void __iomem *addr; u32 status; @@ -698,10 +697,17 @@ static void __cxl_handle_cor_ras(struct device *dev, addr = ras_base + CXL_RAS_CORRECTABLE_STATUS_OFFSET; status = readl(addr); - if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) { - writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr); + + if (!(status & CXL_RAS_CORRECTABLE_STATUS_MASK)) + return; + + writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr); + if (is_cxl_memdev(dev)) { + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); + trace_cxl_aer_correctable_error(cxlmd, status); - } + } else if (dev_is_pci(dev)) + trace_cxl_port_aer_correctable_error(dev, status); } static void cxl_handle_endpoint_cor_ras(struct cxl_dev_state *cxlds) @@ -733,7 +739,6 @@ static void header_log_copy(void __iomem *ras_base, u32 *log) static bool __cxl_handle_ras(struct device *dev, void __iomem *ras_base) { - struct cxl_memdev *cxlmd = to_cxl_memdev(dev); u32 hl[CXL_HEADERLOG_SIZE_U32]; void __iomem *addr; u32 status; @@ -759,7 +764,13 @@ static bool __cxl_handle_ras(struct device *dev, } header_log_copy(ras_base, hl); - trace_cxl_aer_uncorrectable_error(cxlmd, status, fe, hl); + if (is_cxl_memdev(dev)) { + struct cxl_memdev *cxlmd = to_cxl_memdev(dev); + + trace_cxl_aer_uncorrectable_error(cxlmd, status, fe, hl); + } else if (dev_is_pci(dev)) + trace_cxl_port_aer_uncorrectable_error(dev, status); + writel(status & CXL_RAS_UNCORRECTABLE_STATUS_MASK, addr); return true; @@ -882,6 +893,80 @@ static bool cxl_handle_rdport_ras(struct cxl_dev_state *cxlds, return __cxl_handle_ras(&cxlds->cxlmd->dev, dport->regs.ras); } +static int match_uport(struct device *dev, void *data) +{ + struct device *uport_dev = (struct device *)data; + struct cxl_port *port; + + if (!is_cxl_port(dev)) + return 0; + + port = to_cxl_port(dev); + + return (port->uport_dev == uport_dev); +} + +static struct cxl_port *pci_to_cxl_uport(struct pci_dev *pdev) +{ + struct cxl_dport *dport; + struct device *port_dev; + struct cxl_port *port; + + port = find_cxl_port(pdev->dev.parent, &dport); + if (!port) + return NULL; + put_device(&port->dev); + + port_dev = device_find_child(&port->dev, &pdev->dev, match_uport); + if (!port_dev) + return NULL; + put_device(port_dev); + + port = to_cxl_port(port_dev); + + return port; +} + +static void __iomem *cxl_pci_port_ras(struct pci_dev *pdev) +{ + void __iomem *ras_base = NULL; + + if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) || + (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM)) { + struct cxl_dport *dport; + + find_cxl_port(&pdev->dev, &dport); + ras_base = dport ? dport->regs.ras : NULL; + } else if (pci_pcie_type(pdev) == PCI_EXP_TYPE_UPSTREAM) { + struct cxl_port *port = pci_to_cxl_uport(pdev); + + ras_base = port ? port->regs.ras : NULL; + } + + return ras_base; +} + +int port_internal_err_cb(struct notifier_block *unused, + unsigned long event, void *ptr) +{ + struct pci_dev *pdev = (struct pci_dev *)ptr; + void __iomem *ras_base; + + if (!pdev) + return 0; + + if (event == AER_CORRECTABLE) { + ras_base = cxl_pci_port_ras(pdev); + __cxl_handle_cor_ras(&pdev->dev, ras_base); + } else if ((event == AER_FATAL) || (event == AER_NONFATAL)) { + ras_base = cxl_pci_port_ras(pdev); + __cxl_handle_ras(&pdev->dev, ras_base); + } + + return 0; +} +EXPORT_SYMBOL_NS_GPL(port_internal_err_cb, CXL); + /* * Copy the AER capability registers using 32 bit read accesses. * This is necessary because RCRB AER capability is MMIO mapped. Clear the diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 887ed6e358fb..d0f95c965ab4 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -1027,7 +1027,7 @@ void put_cxl_root(struct cxl_root *cxl_root) } EXPORT_SYMBOL_NS_GPL(put_cxl_root, CXL); -static struct cxl_dport *find_dport(struct cxl_port *port, int id) +struct cxl_dport *find_dport(struct cxl_port *port, int id) { struct cxl_dport *dport; unsigned long index; @@ -1336,8 +1336,8 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx) return NULL; } -static struct cxl_port *find_cxl_port(struct device *dport_dev, - struct cxl_dport **dport) +struct cxl_port *find_cxl_port(struct device *dport_dev, + struct cxl_dport **dport) { struct cxl_find_port_ctx ctx = { .dport_dev = dport_dev, diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 7cee678fdb75..04725344393b 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -11,6 +11,7 @@ #include <linux/log2.h> #include <linux/node.h> #include <linux/io.h> +#include "../pci/pcie/portdrv.h" /** * DOC: cxl objects @@ -760,11 +761,15 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port, #ifdef CONFIG_PCIEAER_CXL void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport); void cxl_setup_parent_uport(struct device *host, struct cxl_port *port); +int port_internal_err_cb(struct notifier_block *unused, + unsigned long event, void *ptr); #else static inline void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport) { } static inline void cxl_setup_parent_uport(struct device *host, struct cxl_port *port) { } +static inline int port_internal_err_cb(struct notifier_block *unused, + unsigned long event, void *ptr) { return 0; } #endif struct cxl_decoder *to_cxl_decoder(struct device *dev); diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h index 93992a1c8eec..6044955e1c48 100644 --- a/drivers/cxl/cxlpci.h +++ b/drivers/cxl/cxlpci.h @@ -130,4 +130,6 @@ void read_cdat_data(struct cxl_port *port); void cxl_cor_error_detected(struct pci_dev *pdev); pci_ers_result_t cxl_error_detected(struct pci_dev *pdev, pci_channel_state_t state); +int port_err_nb_cb(struct notifier_block *unused, + unsigned long event, void *ptr); #endif /* __CXL_PCI_H__ */ diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 2ff361e756d6..f4183c5aea38 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -926,6 +926,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) return rc; } +struct notifier_block port_internal_err_nb = { + .notifier_call = port_internal_err_cb, +}; + static const struct pci_device_id cxl_mem_pci_tbl[] = { /* PCI class code for CXL.mem Type-3 Devices */ { PCI_DEVICE_CLASS((PCI_CLASS_MEMORY_CXL << 8 | CXL_MEMORY_PROGIF), ~0)}, @@ -974,6 +978,19 @@ static struct pci_driver cxl_pci_driver = { }, }; -module_pci_driver(cxl_pci_driver); +static int __init cxl_pci_init(void) +{ + atomic_notifier_chain_register(&portdrv_aer_internal_err_chain, &port_internal_err_nb); + return pci_register_driver(&cxl_pci_driver); +} +module_init(cxl_pci_init); + +static void __exit cxl_pci_exit(void) +{ + atomic_notifier_chain_unregister(&portdrv_aer_internal_err_chain, &port_internal_err_nb); + pci_unregister_driver(&cxl_pci_driver); +} +module_exit(cxl_pci_exit); + MODULE_LICENSE("GPL v2"); MODULE_IMPORT_NS(CXL);
CXL root ports, CXL downstream switch ports, and CXL upstream switch ports are bound to the PCIe port bus driver, portdrv. portdrv provides an atomic notifier chain for reporting PCIe port device AER correctable internal errors (CIE) and AER uncorrectable internal errors (UIE). CXL PCIe port devices use AER CIE/UIE to report CXL RAS.[1] Add a cxl_pci atomic notification callback for handling the portdrv's AER UIE/CIE notifications. Register the atomic notification callback in the cxl_pci module's load. Unregister the callback in the cxl_pci driver's unload. Implement the callback to check if the device parameter is a valid CXL PCIe port. If it is valid then make the notification callback call __cxl_handle_cor_ras() or __cxl_handle_ras() depending on the AER type. [1] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and Upstream Switch Ports Signed-off-by: Terry Bowman <terry.bowman@amd.com> --- drivers/cxl/core/core.h | 4 ++ drivers/cxl/core/pci.c | 97 ++++++++++++++++++++++++++++++++++++++--- drivers/cxl/core/port.c | 6 +-- drivers/cxl/cxl.h | 5 +++ drivers/cxl/cxlpci.h | 2 + drivers/cxl/pci.c | 19 +++++++- 6 files changed, 123 insertions(+), 10 deletions(-)