cxl/pci: Change CXL AER support check to use native AER

Message ID 20231102155232.1421261-1-terry.bowman@amd.com
State Accepted
Commit b3741ac86c8e648709506102f7ab51905d50df43

Commit Message

Terry Bowman Nov. 2, 2023, 3:52 p.m. UTC
Native CXL protocol errors are delivered to the OS through AER
reporting. The owner of AER owns CXL Protocol error management with
respect to _OSC negotiation.[1] CXL device errors are handled by a
separate interrupt with native control gated by _OSC control field
'CXL Memory Error Reporting Control'.

The CXL driver incorrectly checks for 'CXL Memory Error Reporting
Control' before accessing AER registers and caching RCH downport
AER registers. Replace the current check in these 2 cases with
native AER checks.

[1] CXL 3.0 - 9.17.2 CXL _OSC, Table-9-26, Interpretation of CXL
_OSC Support Fields, p.641

Fixes: 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport")
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
---
 drivers/cxl/core/pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Alison Schofield Nov. 2, 2023, 8:19 p.m. UTC | #1
On Thu, Nov 02, 2023 at 10:52:32AM -0500, Terry Bowman wrote:
> Native CXL protocol errors are delivered to the OS through AER
> reporting. The owner of AER owns CXL Protocol error management with
> respect to _OSC negotiation.[1] CXL device errors are handled by a
> separate interrupt with native control gated by _OSC control field
> 'CXL Memory Error Reporting Control'.
> 
> The CXL driver incorrectly checks for 'CXL Memory Error Reporting
> Control' before accessing AER registers and caching RCH downport
> AER registers. Replace the current check in these 2 cases with
> native AER checks.

Hi Terry,  Does this have a user visible impact? 

Alison

>
--snip
Dan Williams Nov. 2, 2023, 9:09 p.m. UTC | #2
Terry Bowman wrote:
> Native CXL protocol errors are delivered to the OS through AER
> reporting. The owner of AER owns CXL Protocol error management with
> respect to _OSC negotiation.[1] CXL device errors are handled by a
> separate interrupt with native control gated by _OSC control field
> 'CXL Memory Error Reporting Control'.
> 
> The CXL driver incorrectly checks for 'CXL Memory Error Reporting
> Control' before accessing AER registers and caching RCH downport
> AER registers. Replace the current check in these 2 cases with
> native AER checks.
> 
> [1] CXL 3.0 - 9.17.2 CXL _OSC, Table-9-26, Interpretation of CXL
> _OSC Support Fields, p.641

Makes sense, applied.
Dan Williams Nov. 2, 2023, 9:13 p.m. UTC | #3
Alison Schofield wrote:
> On Thu, Nov 02, 2023 at 10:52:32AM -0500, Terry Bowman wrote:
> > Native CXL protocol errors are delivered to the OS through AER
> > reporting. The owner of AER owns CXL Protocol error management with
> > respect to _OSC negotiation.[1] CXL device errors are handled by a
> > separate interrupt with native control gated by _OSC control field
> > 'CXL Memory Error Reporting Control'.
> > 
> > The CXL driver incorrectly checks for 'CXL Memory Error Reporting
> > Control' before accessing AER registers and caching RCH downport
> > AER registers. Replace the current check in these 2 cases with
> > native AER checks.
> 
> Hi Terry,  Does this have a user visible impact? 

Saw this after I applied it. It is good feedback in general.

The reason I did not ask for this clarification was that this is fixing
brand new code and was just using the wrong flag, so I had the context.
A backporter will never need to make a judgement call about this patch.

The end user impact is that CXL protocol errors that could be handled by
AER will not be handled if Linux failed to negotiate memory error
handling. Memory errors are strictly related to memory-error-record
events, not protocol errors.
Dan Williams Nov. 2, 2023, 9:31 p.m. UTC | #4
Dan Williams wrote:
> Alison Schofield wrote:
> > On Thu, Nov 02, 2023 at 10:52:32AM -0500, Terry Bowman wrote:
> > > Native CXL protocol errors are delivered to the OS through AER
> > > reporting. The owner of AER owns CXL Protocol error management with
> > > respect to _OSC negotiation.[1] CXL device errors are handled by a
> > > separate interrupt with native control gated by _OSC control field
> > > 'CXL Memory Error Reporting Control'.
> > > 
> > > The CXL driver incorrectly checks for 'CXL Memory Error Reporting
> > > Control' before accessing AER registers and caching RCH downport
> > > AER registers. Replace the current check in these 2 cases with
> > > native AER checks.
> > 
> > Hi Terry,  Does this have a user visible impact? 
> 
> Saw this after I applied it. It is good feedback in general.
> 
> The reason I did not ask for this clarification was that this is fixing
> brand new code and was just using the wrong flag, so I had the context.
> A backporter will never need to make a judgement call about this patch.
> 
> The end user impact is that CXL protocol errors that could be handled by
> AER will not be handled if Linux failed to negotiate memory error
> handling. Memory errors are strictly related to memory-error-record
> events, not protocol errors.

However, to that point the "Fixes:" tag looks wrong, it should be:

f05fd10d138d cxl/pci: Add RCH downstream port AER register discovery
Terry Bowman Nov. 2, 2023, 11:24 p.m. UTC | #5
Hi Dan and Alison,

On 11/2/23 16:31, Dan Williams wrote:
> Dan Williams wrote:
>> Alison Schofield wrote:
>>> On Thu, Nov 02, 2023 at 10:52:32AM -0500, Terry Bowman wrote:
>>>> Native CXL protocol errors are delivered to the OS through AER
>>>> reporting. The owner of AER owns CXL Protocol error management with
>>>> respect to _OSC negotiation.[1] CXL device errors are handled by a
>>>> separate interrupt with native control gated by _OSC control field
>>>> 'CXL Memory Error Reporting Control'.
>>>>
>>>> The CXL driver incorrectly checks for 'CXL Memory Error Reporting
>>>> Control' before accessing AER registers and caching RCH downport
>>>> AER registers. Replace the current check in these 2 cases with
>>>> native AER checks.
>>>
>>> Hi Terry,  Does this have a user visible impact? 
>>
>> Saw this after I applied it. It is good feedback in general.
>>
>> The reason I did not ask for this clarification was that this is fixing
>> brand new code and was just using the wrong flag, so I had the context.
>> A backporter will never need to make a judgement call about this patch.
>>
>> The end user impact is that CXL protocol errors that could be handled by
>> AER will not be handled if Linux failed to negotiate memory error
>> handling. Memory errors are strictly related to memory-error-record
>> events, not protocol errors.
> 
Right, the end user impact is that RCH error handling requires native
memory error/event _OSC control in order for protocol errors to be logged.

> However, to that point the "Fixes:" tag looks wrong, it should be:
> 
> f05fd10d138d cxl/pci: Add RCH downstream port AER register discovery

Correct, it is f05fd10d138d.

Regards,
Terry
Patch

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 01c441f2e25e..b29f6d09744b 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -812,7 +812,7 @@  static void cxl_disable_rch_root_ints(struct cxl_dport *dport)
 	 * the root cmd register's interrupts is required. But, PCI spec
 	 * shows these are disabled by default on reset.
 	 */
-	if (bridge->native_cxl_error) {
+	if (bridge->native_aer) {
 		aer_cmd_mask = (PCI_ERR_ROOT_CMD_COR_EN |
 				PCI_ERR_ROOT_CMD_NONFATAL_EN |
 				PCI_ERR_ROOT_CMD_FATAL_EN);
@@ -828,7 +828,7 @@  void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport)
 	struct pci_host_bridge *host_bridge;
 
 	host_bridge = to_pci_host_bridge(dport_dev);
-	if (host_bridge->native_cxl_error)
+	if (host_bridge->native_aer)
 		dport->rcrb.aer_cap = cxl_rcrb_to_aer(dport_dev, dport->rcrb.base);
 
 	dport->reg_map.host = host;
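
To make the effect of the two hunks concrete, here is a minimal user-space sketch, not kernel code. The struct mock_host_bridge and the helpers old_rch_aer_allowed() and new_rch_aer_allowed() are hypothetical names used only for illustration; only the flag names native_aer and native_cxl_error come from the patch. The sketch assumes the two flags are granted independently during _OSC negotiation, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the two _OSC-derived flags the patch touches.
 * The real flags live in struct pci_host_bridge; this mock only models them.
 */
struct mock_host_bridge {
	unsigned int native_aer:1;       /* OS owns AER, hence CXL protocol errors */
	unsigned int native_cxl_error:1; /* OS owns CXL memory error reporting */
};

/* Check used before the patch: gated on the memory error reporting flag. */
static inline bool old_rch_aer_allowed(const struct mock_host_bridge *bridge)
{
	assert(bridge != NULL);
	return bridge->native_cxl_error;
}

/* Check used after the patch: gated on native AER ownership. */
static inline bool new_rch_aer_allowed(const struct mock_host_bridge *bridge)
{
	assert(bridge != NULL);
	return bridge->native_aer;
}
```

With native_aer set and native_cxl_error clear, the old check returns false, so the RCH downport AER registers would be neither cached nor touched, while the new check returns true. That is the case described in comment #3: protocol errors that AER could handle go unhandled when memory error handling was not negotiated.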