Message ID | 20231102155232.1421261-1-terry.bowman@amd.com |
---|---|
State | Accepted |
Commit | b3741ac86c8e648709506102f7ab51905d50df43 |
Series | cxl/pci: Change CXL AER support check to use native AER |

On Thu, Nov 02, 2023 at 10:52:32AM -0500, Terry Bowman wrote:
> Native CXL protocol errors are delivered to the OS through AER
> reporting. The owner of AER owns CXL Protocol error management with
> respect to _OSC negotiation.[1] CXL device errors are handled by a
> separate interrupt with native control gated by _OSC control field
> 'CXL Memory Error Reporting Control'.
>
> The CXL driver incorrectly checks for 'CXL Memory Error Reporting
> Control' before accessing AER registers and caching RCH downport
> AER registers. Replace the current check in these 2 cases with
> native AER checks.

Hi Terry,

Does this have a user visible impact?

Alison

> --snip

Terry Bowman wrote:
> Native CXL protocol errors are delivered to the OS through AER
> reporting. The owner of AER owns CXL Protocol error management with
> respect to _OSC negotiation.[1] CXL device errors are handled by a
> separate interrupt with native control gated by _OSC control field
> 'CXL Memory Error Reporting Control'.
>
> The CXL driver incorrectly checks for 'CXL Memory Error Reporting
> Control' before accessing AER registers and caching RCH downport
> AER registers. Replace the current check in these 2 cases with
> native AER checks.
>
> [1] CXL 3.0 - 9.17.2 CXL _OSC, Table-9-26, Interpretation of CXL
> _OSC Support Fields, p.641

Makes sense, applied.

Alison Schofield wrote:
> On Thu, Nov 02, 2023 at 10:52:32AM -0500, Terry Bowman wrote:
> > Native CXL protocol errors are delivered to the OS through AER
> > reporting. The owner of AER owns CXL Protocol error management with
> > respect to _OSC negotiation.[1] CXL device errors are handled by a
> > separate interrupt with native control gated by _OSC control field
> > 'CXL Memory Error Reporting Control'.
> >
> > The CXL driver incorrectly checks for 'CXL Memory Error Reporting
> > Control' before accessing AER registers and caching RCH downport
> > AER registers. Replace the current check in these 2 cases with
> > native AER checks.
>
> Hi Terry, Does this have a user visible impact?

Saw this after I applied it. It is good feedback in general.

The reason I did not ask for this clarification was that this is fixing
brand new code and was just using the wrong flag, so I had the context.
A backporter will never need to make a judgement call about this patch.

The end user impact is that CXL protocol errors that could be handled by
AER will not be handled if Linux failed to negotiate memory error
handling. Memory errors are strictly related to memory-error-record
events, not protocol errors.
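
The end-user impact described above can be illustrated with a small user-space sketch. It is not the kernel code: `struct osc_result` and the two helper functions are hypothetical names introduced here, and only the field names `native_aer` and `native_cxl_error` come from the patch. The sketch assumes a platform where Linux was granted AER control but not 'CXL Memory Error Reporting Control'.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the _OSC results kept on the host bridge.
 * Only the two field names are taken from the patch.
 */
struct osc_result {
	bool native_aer;       /* OS owns AER, and so CXL protocol errors */
	bool native_cxl_error; /* OS owns CXL memory error reporting */
};

/* Gate used before the patch: keyed on memory error reporting control. */
static bool rch_aer_used_before(const struct osc_result *osc)
{
	return osc->native_cxl_error;
}

/* Gate used after the patch: keyed on AER ownership. */
static bool rch_aer_used_after(const struct osc_result *osc)
{
	return osc->native_aer;
}
```

With `native_aer` set and `native_cxl_error` clear, the first helper returns false and the second returns true, which matches the case where protocol errors that AER could handle were previously skipped.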

Dan Williams wrote:
> Alison Schofield wrote:
> > On Thu, Nov 02, 2023 at 10:52:32AM -0500, Terry Bowman wrote:
> > > Native CXL protocol errors are delivered to the OS through AER
> > > reporting. The owner of AER owns CXL Protocol error management with
> > > respect to _OSC negotiation.[1] CXL device errors are handled by a
> > > separate interrupt with native control gated by _OSC control field
> > > 'CXL Memory Error Reporting Control'.
> > >
> > > The CXL driver incorrectly checks for 'CXL Memory Error Reporting
> > > Control' before accessing AER registers and caching RCH downport
> > > AER registers. Replace the current check in these 2 cases with
> > > native AER checks.
> >
> > Hi Terry, Does this have a user visible impact?
>
> Saw this after I applied it. It is good feedback in general.
>
> The reason I did not ask for this clarification was that this is fixing
> brand new code and was just using the wrong flag, so I had the context.
> A backporter will never need to make a judgement call about this patch.
>
> The end user impact is that CXL protocol errors that could be handled by
> AER will not be handled if Linux failed to negotiate memory error
> handling. Memory errors are strictly related to memory-error-record
> events, not protocol errors.

However, to that point the "Fixes:" tag looks wrong, it should be:

f05fd10d138d cxl/pci: Add RCH downstream port AER register discovery

Hi Dan and Alison,

On 11/2/23 16:31, Dan Williams wrote:
> Dan Williams wrote:
>> Alison Schofield wrote:
>>> On Thu, Nov 02, 2023 at 10:52:32AM -0500, Terry Bowman wrote:
>>>> Native CXL protocol errors are delivered to the OS through AER
>>>> reporting. The owner of AER owns CXL Protocol error management with
>>>> respect to _OSC negotiation.[1] CXL device errors are handled by a
>>>> separate interrupt with native control gated by _OSC control field
>>>> 'CXL Memory Error Reporting Control'.
>>>>
>>>> The CXL driver incorrectly checks for 'CXL Memory Error Reporting
>>>> Control' before accessing AER registers and caching RCH downport
>>>> AER registers. Replace the current check in these 2 cases with
>>>> native AER checks.
>>>
>>> Hi Terry, Does this have a user visible impact?
>>
>> Saw this after I applied it. It is good feedback in general.
>>
>> The reason I did not ask for this clarification was that this is fixing
>> brand new code and was just using the wrong flag, so I had the context.
>> A backporter will never need to make a judgement call about this patch.
>>
>> The end user impact is that CXL protocol errors that could be handled by
>> AER will not be handled if Linux failed to negotiate memory error
>> handling. Memory errors are strictly related to memory-error-record
>> events, not protocol errors.
>

Right, end user impact is RCH error handling will require using native
memory error/event _OSC control in order for protocol errors to be logged.

> However, to that point the "Fixes:" tag looks wrong, it should be:
>
> f05fd10d138d cxl/pci: Add RCH downstream port AER register discovery

Correct, it is f05fd10d138d.

Regards,
Terry

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 01c441f2e25e..b29f6d09744b 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -812,7 +812,7 @@ static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
 	 * the root cmd register's interrupts is required. But, PCI spec
 	 * shows these are disabled by default on reset.
 	 */
-	if (bridge->native_cxl_error) {
+	if (bridge->native_aer) {
 		aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN |
 				PCI_ERR_ROOT_CMD_NONFATAL_EN |
 				PCI_ERR_ROOT_CMD_FATAL_EN);
@@ -828,7 +828,7 @@ void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport)
 	struct pci_host_bridge *host_bridge;
 
 	host_bridge = to_pci_host_bridge(dport_dev);
-	if (host_bridge->native_cxl_error)
+	if (host_bridge->native_aer)
 		dport->rcrb.aer_cap = cxl_rcrb_to_aer(dport_dev, dport->rcrb.base);
 
 	dport->reg_map.host = host;
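
For readers without the full file at hand, the first hunk can be approximated in user space as follows. This is a sketch under assumptions: the diff only shows the mask being built, so the step that clears those bits from the root command value is inferred from the surrounding comment about disabling root interrupts, and `disable_rch_root_ints()` is a hypothetical helper operating on a plain integer instead of a mapped register. The three bit values mirror the definitions in `include/uapi/linux/pci_regs.h`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_ROOT_CMD_COR_EN      0x00000001
#define PCI_ERR_ROOT_CMD_NONFATAL_EN 0x00000002
#define PCI_ERR_ROOT_CMD_FATAL_EN    0x00000004

/*
 * Hypothetical helper: returns the new root command value.
 * The register is only touched when the OS owns AER (native_aer),
 * which is the condition the patch switches to.
 */
static uint32_t disable_rch_root_ints(uint32_t root_cmd, bool native_aer)
{
	uint32_t aer_cmd_mask;

	if (!native_aer)
		return root_cmd; /* firmware owns AER: leave the value alone */

	aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN |
			PCI_ERR_ROOT_CMD_NONFATAL_EN |
			PCI_ERR_ROOT_CMD_FATAL_EN);

	/* Assumed step: clear the three interrupt enable bits. */
	return root_cmd & ~aer_cmd_mask;
}
```

Calling the helper with `native_aer` false returns the input unchanged, while with `native_aer` true only bits outside the three enable bits survive.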