Message ID | 20230831170248.185078-5-terry.bowman@amd.com |
---|---|
State | New, archived |
Headers | show |
Series | None | expand |
Terry Bowman wrote: > The RCH root port contains root command AER registers that should not be > enabled.[1] Disable these to prevent root port interrupts. > > [1] CXL 3.0 - 12.2.1.1 RCH Downstream Port-detected Errors > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Signed-off-by: Robert Richter <rrichter@amd.com> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Reviewed-by: Dave Jiang <dave.jiang@intel.com> [..] > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > index 2a22a7ed4704..d195af72ed65 100644 > --- a/drivers/cxl/core/port.c > +++ b/drivers/cxl/core/port.c > @@ -1042,6 +1042,9 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev, > > cxl_dport_map_regs(dport); > > + if (dport->rch) > + cxl_disable_rch_root_ints(dport); > + Similar to the comment about cxl_dport_map_regs() not being appropriate in an enumeration routine, this also needs to move out of _add_dport. It occurs to me that it should also be undone on driver detach just like other device "enables".
Hi Dan, I added comments below. On 9/15/23 13:43, Dan Williams wrote: > Terry Bowman wrote: >> The RCH root port contains root command AER registers that should not be >> enabled.[1] Disable these to prevent root port interrupts. >> >> [1] CXL 3.0 - 12.2.1.1 RCH Downstream Port-detected Errors >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> Signed-off-by: Robert Richter <rrichter@amd.com> >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >> Reviewed-by: Dave Jiang <dave.jiang@intel.com> > [..] >> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c >> index 2a22a7ed4704..d195af72ed65 100644 >> --- a/drivers/cxl/core/port.c >> +++ b/drivers/cxl/core/port.c >> @@ -1042,6 +1042,9 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev, >> >> cxl_dport_map_regs(dport); >> >> + if (dport->rch) >> + cxl_disable_rch_root_ints(dport); >> + > > Similar to the comment about cxl_dport_map_regs() not being appropriate > in an enumeration routine, this also needs to move out of _add_dport. It > occurs to me that it should also be undone on driver detach just like > other device "enables". Ok. I will move out of enumeration. Per the 'undo' request: This is a RCH downstream port (dport) with PCIe root port capability. PCI spec states root port error reporting is disabled by default at powerup. And SW does *not* enable the root port errors because the RCH dport is *not* bound to a root port driver (missing BDF, etc). This mask is added to follow the CXL spec precisely and if the rest of the system behaves as expected should not be necessary. I don't believe masking should be 'undone' in driver detach or elsewhere. Adding the 'undo' masking would potentially introduce RCH dport root port interrupt reporting which is incorrect for the RCH/RCD mode. Only CXL components (device, uport, switch) may reside under the RCH dport and never want RCH dport reporting root port errors. RCEC reports the root complex errors in RCH/RCD mode. Regards, Terry
Terry Bowman wrote: > Hi Dan, > > I added comments below. > > On 9/15/23 13:43, Dan Williams wrote: > > Terry Bowman wrote: > >> The RCH root port contains root command AER registers that should not be > >> enabled.[1] Disable these to prevent root port interrupts. > >> > >> [1] CXL 3.0 - 12.2.1.1 RCH Downstream Port-detected Errors > >> > >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> > >> Signed-off-by: Robert Richter <rrichter@amd.com> > >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > >> Reviewed-by: Dave Jiang <dave.jiang@intel.com> > > [..] > >> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > >> index 2a22a7ed4704..d195af72ed65 100644 > >> --- a/drivers/cxl/core/port.c > >> +++ b/drivers/cxl/core/port.c > >> @@ -1042,6 +1042,9 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev, > >> > >> cxl_dport_map_regs(dport); > >> > >> + if (dport->rch) > >> + cxl_disable_rch_root_ints(dport); > >> + > > > > Similar to the comment about cxl_dport_map_regs() not being appropriate > > in an enumeration routine, this also needs to move out of _add_dport. It > > occurs to me that it should also be undone on driver detach just like > > other device "enables". > > Ok. I will move out of enumeration. > > Per the 'undo' request: This is a RCH downstream port (dport) with PCIe root port > capability. PCI spec states root port error reporting is disabled by default at > powerup. And SW does *not* enable the root port errors because the RCH dport is *not* > bound to a root port driver (missing BDF, etc). This mask is added to follow the > CXL spec precisely and if the rest of the system behaves as expected should not > be necessary. Ah, got it perhaps add a comment to sanity check that the hardware is in the per-spec state. Are you certain that even in firmware-first error handling it is safe for the driver to unconditionally disable these interrupts? > I don't believe masking should be 'undone' in driver detach or elsewhere. Adding > the 'undo' masking would potentially introduce RCH dport root port interrupt > reporting which is incorrect for the RCH/RCD mode. Only CXL components (device, > uport, switch) may reside under the RCH dport and never want RCH dport reporting > root port errors. RCEC reports the root complex errors in RCH/RCD mode. Ok, that also seems to suggest that even in the firmware-first case the driver should make sure they are off per-spec.
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h index f470ef5c0a6a..6b037030b936 100644 --- a/drivers/cxl/core/core.h +++ b/drivers/cxl/core/core.h @@ -87,4 +87,10 @@ enum cxl_poison_trace_type { CXL_POISON_TRACE_CLEAR, }; +#ifdef CONFIG_PCIEAER_CXL +void cxl_disable_rch_root_ints(struct cxl_dport *dport); +#else +static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }; +#endif + #endif /* __CXL_CORE_H__ */ diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c index 1c40270968b6..e306d3c9638b 100644 --- a/drivers/cxl/core/pci.c +++ b/drivers/cxl/core/pci.c @@ -819,6 +819,35 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) cxl_handle_rdport_ras(cxlds, dport); } +void cxl_disable_rch_root_ints(struct cxl_dport *dport) +{ + void __iomem *aer_base = dport->regs.dport_aer; + struct pci_host_bridge *bridge; + u32 aer_cmd_mask, aer_cmd; + + if (!aer_base) + return; + + bridge = to_pci_host_bridge(dport->dport_dev); + + /* + * Disable RCH root port command interrupts. + * CXL 3.0 12.2.1.1 - RCH Downstream Port-detected Errors + * + * This sequence may not be necessary. CXL spec states disabling + * the root cmd register's interrupts is required. But, PCI spec + * shows these are disabled by default on reset. + */ + if (bridge->native_cxl_error) { + aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN | + PCI_ERR_ROOT_CMD_NONFATAL_EN | + PCI_ERR_ROOT_CMD_FATAL_EN); + aer_cmd = readl(aer_base + PCI_ERR_ROOT_COMMAND); + aer_cmd &= ~aer_cmd_mask; + writel(aer_cmd, aer_base + PCI_ERR_ROOT_COMMAND); + } +} + #else static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { } #endif diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 2a22a7ed4704..d195af72ed65 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -1042,6 +1042,9 @@ __devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev, cxl_dport_map_regs(dport); + if (dport->rch) + cxl_disable_rch_root_ints(dport); + cond_cxl_root_lock(port); rc = add_dport(port, dport); cond_cxl_root_unlock(port);