[V2,4/9] PCI/AER: Extend AER error handling to RCECs

Message ID 20200804194052.193272-5-sean.v.kelley@intel.com
State Superseded
Series
  • Add RCEC handling to PCI/AER

Commit Message

Sean V Kelley Aug. 4, 2020, 7:40 p.m. UTC
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Currently the kernel does not handle AER errors for Root Complex integrated
End Points (RCiEPs)[0]. These devices sit on a root bus within the Root Complex
(RC). AER handling is performed by a Root Complex Event Collector (RCEC) [1]
which is effectively a type of RCiEP on the same root bus.

For an RCEC (technically not a Bridge), error messages "received" from
associated RCiEPs must be enabled for "transmission" in order to cause a
System Error via the Root Control register or (when the Advanced Error
Reporting Capability is present) reporting via the Root Error Command
register and logging in the Root Error Status register and Error Source
Identification register.

In addition to the defined OS level handling of the reset flow for the
associated RCiEPs of an RCEC, it is possible to also have a firmware first
model. In that case there is no need to take any actions on the RCEC because
the firmware is responsible for them. This is true where APEI [2] is used
to report the AER errors via a GHES[v2] HEST entry [3] and relevant
AER CPER record [4] and Firmware First handling is in use.

We effectively end up with two different types of discovery for
purposes of handling AER errors:

1) Normal bus walk - we pass the downstream port above a bus to which
the device is attached and it walks everything below that point.

2) An RCiEP with no visible association with an RCEC as there is no need to
walk devices. In that case, the flow is to just call the callbacks for the actual
device.

A new walk function, similar to pci_bus_walk is provided that takes a pci_dev
instead of a bus. If that dev corresponds to a downstream port it will walk
the subordinate bus of that downstream port. If the dev does not then it
will call the function on that device alone.

[0] PCI Express Base Specification 5.0-1 1.3.2.3 Root Complex Integrated
    Endpoint Rules.
[1] PCI Express Base Specification 5.0-1 6.2 Error Signalling and Logging
[2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface (APEI)
[3] ACPI Specification 6.3 18.2.3.7 Generic Hardware Error Source
[4] UEFI Specification 2.8, N.2.7 PCI Express Error Section

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
---
 drivers/pci/pcie/err.c | 59 +++++++++++++++++++++++++++++++++---------
 1 file changed, 47 insertions(+), 12 deletions(-)

Comments

Bjorn Helgaas Aug. 7, 2020, 10:53 p.m. UTC | #1
On Tue, Aug 04, 2020 at 12:40:47PM -0700, Sean V Kelley wrote:
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> Currently the kernel does not handle AER errors for Root Complex integrated
> End Points (RCiEPs)[0]. These devices sit on a root bus within the Root Complex
> (RC). AER handling is performed by a Root Complex Event Collector (RCEC) [1]
> which is effectively a type of RCiEP on the same root bus.
> 
> For an RCEC (technically not a Bridge), error messages "received" from
> associated RCiEPs must be enabled for "transmission" in order to cause a
> System Error via the Root Control register or (when the Advanced Error
> Reporting Capability is present) reporting via the Root Error Command
> register and logging in the Root Error Status register and Error Source
> Identification register.
> 
> In addition to the defined OS level handling of the reset flow for the
> associated RCiEPs of an RCEC, it is possible to also have a firmware first
> model. In that case there is no need to take any actions on the RCEC because
> the firmware is responsible for them. This is true where APEI [2] is used
> to report the AER errors via a GHES[v2] HEST entry [3] and relevant
> AER CPER record [4] and Firmware First handling is in use.

I don't see anything in the patch that mentions "firmware first." Do
we need it in the commit log?  After
https://git.kernel.org/linus/708b20003624 ("PCI/AER: Remove
HEST/FIRMWARE_FIRST parsing for AER ownership"), I think we no longer 
know anything about firmware-first in the kernel.

> We effectively end up with two different types of discovery for
> purposes of handling AER errors:
> 
> 1) Normal bus walk - we pass the downstream port above a bus to which
> the device is attached and it walks everything below that point.
> 
> 2) An RCiEP with no visible association with an RCEC as there is no need to
> walk devices. In that case, the flow is to just call the callbacks for the actual
> device.
> 
> A new walk function, similar to pci_bus_walk is provided that takes a pci_dev
> instead of a bus. If that dev corresponds to a downstream port it will walk
> the subordinate bus of that downstream port. If the dev does not then it
> will call the function on that device alone.

Maybe mention the new function name here?

Add "()" after function names in commit logs and comments so they
don't look like English words.

Wrap commit logs so they fit in 75 columns, so they don't wrap when
"git log" indents them in a default 80 column window.  Yes, I know I
could use wider windows, but I'd still want *some* default so commits
don't just have random widths.

> [0] PCI Express Base Specification 5.0-1 1.3.2.3 Root Complex Integrated
>     Endpoint Rules.
> [1] PCI Express Base Specification 5.0-1 6.2 Error Signalling and Logging
> [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface (APEI)
> [3] ACPI Specification 6.3 18.2.3.7 Generic Hardware Error Source
> [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
> 
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
> ---
>  drivers/pci/pcie/err.c | 59 +++++++++++++++++++++++++++++++++---------
>  1 file changed, 47 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index c543f419d8f9..682302dfb55b 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -146,38 +146,69 @@ static int report_resume(struct pci_dev *dev, void *data)
>  	return 0;
>  }
>  
> +/**
> + * pci_walk_dev_affected - walk devices potentially AER affected
> + * @dev      device which may be an RCEC with associated RCiEPs,
> + *           an RCiEP associated with an RCEC, or a Port.

Does this mean that if dev is an RCEC, we call the callback for the
*RCEC* itself?  I would have thought we'd want to do that for the
associated *RCiEPs*?

> + * @cb       callback to be called for each device found
> + * @userdata arbitrary pointer to be passed to callback.
> + *
> + * If the device provided is a port, walk the subordinate bus,

This usage of "port" doesn't seem quite right.  "Port" includes root
ports, switch upstream ports, switch downstream ports, *and* the
upstream ports on endpoints.  The endpoint upstream ports obviously
don't have subordinate buses.  We typically use "bridge" as the
generic term for something with a subordinate bus.

> + * including any bridged devices on buses under this bus.
> + * Call the provided callback on each device found.
> + *
> + * If the device provided has no subordinate bus, call the provided
> + * callback on the device itself.
> + */
> +static void pci_walk_dev_affected(struct pci_dev *dev, int (*cb)(struct pci_dev *, void *),

I don't understand the "affected" reference in the function name.
This doesn't test anything to see whether devices are "affected".
Naming is the hardest part of programming :)

> +				  void *userdata)
> +{
> +	if (dev->subordinate) {
> +		pci_walk_bus(dev->subordinate, cb, userdata);
> +	} else {
> +		cb(dev, userdata);
> +	}

Typical Linux style omits {} for single-line if/else branches.
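
For illustration only, here is a minimal user-space sketch of the walk
helper written in that brace-less style. The sketch_* types and
functions are hypothetical stand-ins, not the kernel's struct pci_dev,
struct pci_bus or pci_walk_bus(), and the stand-in bus walk ignores the
callback's return value.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_dev;

/* Stand-in for struct pci_bus: just the devices directly on the bus. */
struct sketch_bus {
	struct sketch_dev **devices;
	size_t ndevices;
};

/* Stand-in for struct pci_dev: subordinate is non-NULL only for bridges. */
struct sketch_dev {
	struct sketch_bus *subordinate;
};

typedef int (*sketch_cb)(struct sketch_dev *, void *);

/*
 * Simplified stand-in for pci_walk_bus(): visit every device below @bus,
 * descending through bridges. The callback's return value is ignored here.
 */
static void sketch_walk_bus(struct sketch_bus *bus, sketch_cb cb,
			    void *userdata)
{
	size_t i;

	for (i = 0; i < bus->ndevices; i++) {
		struct sketch_dev *dev = bus->devices[i];

		cb(dev, userdata);
		if (dev->subordinate)
			sketch_walk_bus(dev->subordinate, cb, userdata);
	}
}

/*
 * If @dev has a subordinate bus, walk that bus; otherwise call the
 * callback on @dev itself. Single-statement branches carry no braces.
 */
static void sketch_walk_dev(struct sketch_dev *dev, sketch_cb cb,
			    void *userdata)
{
	assert(dev && cb);

	if (dev->subordinate)
		sketch_walk_bus(dev->subordinate, cb, userdata);
	else
		cb(dev, userdata);
}

/* Example callback: count the devices visited. */
static int sketch_count(struct sketch_dev *dev, void *userdata)
{
	(void)dev;
	(*(int *)userdata)++;
	return 0;
}
```

With a bridge that has two children, the callback runs on the two
children and not on the bridge; with a device that has no subordinate
bus, it runs once on that device.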

> +}
> +
>  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  			pci_channel_state_t state,
>  			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
>  {
>  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> -	struct pci_bus *bus;
>  
>  	/*
>  	 * Error recovery runs on all subordinates of the first downstream port.
>  	 * If the downstream port detected the error, it is cleared at the end.
> +	 * For RCiEPs we should reset just the RCiEP itself.
>  	 */
>  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END ||
> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC))
>  		dev = dev->bus->self;
> -	bus = dev->subordinate;
>  
>  	pci_dbg(dev, "broadcast error_detected message\n");
>  	if (state == pci_channel_io_frozen) {
> -		pci_walk_bus(bus, report_frozen_detected, &status);
> +		pci_walk_dev_affected(dev, report_frozen_detected, &status);
> +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
> +			pci_warn(dev, "link reset not possible for RCiEP\n");
> +			status = PCI_ERS_RESULT_NONE;
> +			goto failed;
> +		}
> +
>  		status = reset_link(dev);

reset_link() might be misnamed.  IIUC "dev" is a bridge, and the point
is really to reset any devices below "dev."  Whether we do that by
resetting link, DPC trigger, secondary bus reset, FLR, etc, is sort of
immaterial.  Some of those methods might be applicable for RCiEPs.

But you didn't add that name; I'm just trying to understand this
better.

>  		if (status != PCI_ERS_RESULT_RECOVERED) {
>  			pci_warn(dev, "link reset failed\n");
>  			goto failed;
>  		}
>  	} else {
> -		pci_walk_bus(bus, report_normal_detected, &status);
> +		pci_walk_dev_affected(dev, report_normal_detected, &status);
>  	}
>  
>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>  		status = PCI_ERS_RESULT_RECOVERED;
>  		pci_dbg(dev, "broadcast mmio_enabled message\n");
> -		pci_walk_bus(bus, report_mmio_enabled, &status);
> +		pci_walk_dev_affected(dev, report_mmio_enabled, &status);
>  	}
>  
>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
> @@ -188,18 +219,22 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  		 */
>  		status = PCI_ERS_RESULT_RECOVERED;
>  		pci_dbg(dev, "broadcast slot_reset message\n");
> -		pci_walk_bus(bus, report_slot_reset, &status);
> +		pci_walk_dev_affected(dev, report_slot_reset, &status);
>  	}
>  
>  	if (status != PCI_ERS_RESULT_RECOVERED)
>  		goto failed;
>  
>  	pci_dbg(dev, "broadcast resume message\n");
> -	pci_walk_bus(bus, report_resume, &status);
> -
> -	if (pcie_aer_is_native(dev))
> -		pcie_clear_device_status(dev);
> -	pci_aer_clear_nonfatal_status(dev);
> +	pci_walk_dev_affected(dev, report_resume, &status);
> +
> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)) {
> +		if (pcie_aer_is_native(dev))
> +			pcie_clear_device_status(dev);
> +		pci_aer_clear_nonfatal_status(dev);

This change (testing pci_pcie_type()) looks like it's not strictly
related to the rest of this patch and maybe should be split out into
its own patch?

> +	}
>  	pci_info(dev, "device recovery successful\n");
>  	return status;
>  
> -- 
> 2.27.0
>
Sean V Kelley Aug. 8, 2020, 12:55 a.m. UTC | #2
On 7 Aug 2020, at 15:53, Bjorn Helgaas wrote:

> On Tue, Aug 04, 2020 at 12:40:47PM -0700, Sean V Kelley wrote:
>> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>
>> Currently the kernel does not handle AER errors for Root Complex 
>> integrated
>> End Points (RCiEPs)[0]. These devices sit on a root bus within the 
>> Root Complex
>> (RC). AER handling is performed by a Root Complex Event Collector 
>> (RCEC) [1]
>> which is effectively a type of RCiEP on the same root bus.
>>
>> For an RCEC (technically not a Bridge), error messages "received" 
>> from
>> associated RCiEPs must be enabled for "transmission" in order to 
>> cause a
>> System Error via the Root Control register or (when the Advanced 
>> Error
>> Reporting Capability is present) reporting via the Root Error Command
>> register and logging in the Root Error Status register and Error 
>> Source
>> Identification register.
>>
>> In addition to the defined OS level handling of the reset flow for 
>> the
>> associated RCiEPs of an RCEC, it is possible to also have a firmware 
>> first
>> model. In that case there is no need to take any actions on the RCEC 
>> because
>> the firmware is responsible for them. This is true where APEI [2] is 
>> used
>> to report the AER errors via a GHES[v2] HEST entry [3] and relevant
>> AER CPER record [4] and Firmware First handling is in use.
>
> I don't see anything in the patch that mentions "firmware first." Do
> we need it in the commit log?  After
> https://git.kernel.org/linus/708b20003624 ("PCI/AER: Remove
> HEST/FIRMWARE_FIRST parsing for AER ownership"), I think we no longer
> know anything about firmware-first in the kernel.

I’ll let Jonathan reply here.

>
>> We effectively end up with two different types of discovery for
>> purposes of handling AER errors:
>>
>> 1) Normal bus walk - we pass the downstream port above a bus to which
>> the device is attached and it walks everything below that point.
>>
>> 2) An RCiEP with no visible association with an RCEC as there is no 
>> need to
>> walk devices. In that case, the flow is to just call the callbacks 
>> for the actual
>> device.
>>
>> A new walk function, similar to pci_bus_walk is provided that takes a 
>> pci_dev
>> instead of a bus. If that dev corresponds to a downstream port it 
>> will walk
>> the subordinate bus of that downstream port. If the dev does not then 
>> it
>> will call the function on that device alone.
>
> Maybe mention the new function name here?

Agree, I will mention it here.

>
> Add "()" after function names in commit logs and comments so they
> don't look like English words.

Will fix.

>
> Wrap commit logs so they fit in 75 columns, so they don't wrap when
> "git log" indents them in a default 80 column window.  Yes, I know I
> could use wider windows, but I'd still want *some* default so commits
> don't just have random widths.

Will fix.

>
>> [0] ACPI PCI Express Base Specification 5.0-1 1.3.2.3 Root Complex 
>> Integrated
>>     Endpoint Rules.
>> [1] ACPI PCI Express Base Specification 5.0-1 6.2 Error Signalling 
>> and Logging
>> [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface 
>> (APEI)
>> [3] ACPI Specification 6.3 18.2.3.7 Generic Hardware Error Source
>> [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
>>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
>> ---
>>  drivers/pci/pcie/err.c | 59 
>> +++++++++++++++++++++++++++++++++---------
>>  1 file changed, 47 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index c543f419d8f9..682302dfb55b 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -146,38 +146,69 @@ static int report_resume(struct pci_dev *dev, 
>> void *data)
>>  	return 0;
>>  }
>>
>> +/**
>> + * pci_walk_dev_affected - walk devices potentially AER affected
>> + * @dev      device which may be an RCEC with associated RCiEPs,
>> + *           an RCiEP associated with an RCEC, or a Port.
>
> Does this mean that if dev is an RCEC, we call the callback for the
> *RCEC* itself?  I would have thought we'd want to do that for the
> associated *RCiEPs*?

Yes, we do. The errors can come from either an RCEC or its respective 
RCiEPs. Both Root Port and RCEC can report error for themselves. So both 
an error Root Port device and an error RCEC device can be passed here 
for error handling. In fact, the bit corresponding to the device number 
of the RCEC is always set in the RCiEP Bitmap (section 7.9.2). And an 
RCEC must also follow all the rules for an RCiEP (section 1.3.4).
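
As a rough illustration of that bitmap lookup, here is a user-space
sketch rather than kernel code. The helper name is hypothetical, and it
assumes that bit N of the 32-bit RCiEP Bitmap corresponds to device
number N on the same bus as the RCEC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as the kernel's PCI_SLOT(): device number from devfn. */
#define SKETCH_PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)

/*
 * Hypothetical helper: true if the device at @devfn is covered by the
 * RCEC's 32-bit RCiEP association bitmap. Assumes bit N corresponds to
 * device number N on the RCEC's own bus.
 */
static bool sketch_rcec_covers(uint32_t bitmap, unsigned int devfn)
{
	unsigned int devnum;

	assert(devfn <= 0xff);	/* devfn is an 8-bit device/function pair */

	devnum = SKETCH_PCI_SLOT(devfn);
	return (bitmap & (UINT32_C(1) << devnum)) != 0;
}
```

Under that assumption, an RCEC at device number 0 would always see bit 0
set in its own bitmap, which is the point made above.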

I was wanting to do a test with an AER injection of the RCEC itself. But 
it looks like current aer_inject.c doesn’t support injecting an error 
to Root Ports or RCECs. Will need to take a look at it.

>
>> + * @cb       callback to be called for each device found
>> + * @userdata arbitrary pointer to be passed to callback.
>> + *
>> + * If the device provided is a port, walk the subordinate bus,
>
> This usage of "port" doesn't seem quite right.  "Port" includes root
> ports, switch upstream ports, switch downstream ports, *and* the
> upstream ports on endpoints.  The endpoint upstream ports obviously
> don't have subordinate buses.  We typically use "bridge" as the
> generic term for something with a subordinate bus.

Okay, that makes sense.  Will correct with “bridge”.

>
>> + * including any bridged devices on buses under this bus.
>> + * Call the provided callback on each device found.
>> + *
>> + * If the device provided has no subordinate bus, call the provided
>> + * callback on the device itself.
>> + */
>> +static void pci_walk_dev_affected(struct pci_dev *dev, int 
>> (*cb)(struct pci_dev *, void *),
>
> I don't understand the "affected" reference in the function name.
> This doesn't test anything to see whether devices are "affected".
> Naming is the hardest part of programming :)

In earlier discussion, Cameron had suggested pci_walk_aer_affected(). 
But I thought perhaps that the focus should be on the devices.  Perhaps 
a better description would be pci_walk_aer_devices() or something along 
those lines.  The original incarnation was pci_walk_below_dev().

I’m open to anything, really.

>
>> +				  void *userdata)
>> +{
>> +	if (dev->subordinate) {
>> +		pci_walk_bus(dev->subordinate, cb, userdata);
>> +	} else {
>> +		cb(dev, userdata);
>> +	}
>
> Typical Linux style omits {} for single-line if/else branches.

Will fix.

>
>> +}
>> +
>>  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>  			pci_channel_state_t state,
>>  			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
>>  {
>>  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
>> -	struct pci_bus *bus;
>>
>>  	/*
>>  	 * Error recovery runs on all subordinates of the first downstream 
>> port.
>>  	 * If the downstream port detected the error, it is cleared at the 
>> end.
>> +	 * For RCiEPs we should reset just the RCiEP itself.
>>  	 */
>>  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
>> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END ||
>> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC))
>>  		dev = dev->bus->self;
>> -	bus = dev->subordinate;
>>
>>  	pci_dbg(dev, "broadcast error_detected message\n");
>>  	if (state == pci_channel_io_frozen) {
>> -		pci_walk_bus(bus, report_frozen_detected, &status);
>> +		pci_walk_dev_affected(dev, report_frozen_detected, &status);
>> +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
>> +			pci_warn(dev, "link reset not possible for RCiEP\n");
>> +			status = PCI_ERS_RESULT_NONE;
>> +			goto failed;
>> +		}
>> +
>>  		status = reset_link(dev);
>
> reset_link() might be misnamed.  IIUC "dev" is a bridge, and the point
> is really to reset any devices below "dev."  Whether we do that by
> resetting link, DPC trigger, secondary bus reset, FLR, etc, is sort of
> immaterial.  Some of those methods might be applicable for RCiEPs.
>
> But you didn't add that name; I'm just trying to understand this
> better.

Yes, that’s a confusing term with the _link attached. It’s difficult 
to relate to the different resets that might be applicable. I was 
thinking about that when looking at the callback path via the 
“reset_link” of the RCiEP to the RCEC for the sole purpose of 
clearing the Root Port Error Status. It would be worth time to spend 
looking at better descriptive naming/methods.

>
>>  		if (status != PCI_ERS_RESULT_RECOVERED) {
>>  			pci_warn(dev, "link reset failed\n");
>>  			goto failed;
>>  		}
>>  	} else {
>> -		pci_walk_bus(bus, report_normal_detected, &status);
>> +		pci_walk_dev_affected(dev, report_normal_detected, &status);
>>  	}
>>
>>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>>  		status = PCI_ERS_RESULT_RECOVERED;
>>  		pci_dbg(dev, "broadcast mmio_enabled message\n");
>> -		pci_walk_bus(bus, report_mmio_enabled, &status);
>> +		pci_walk_dev_affected(dev, report_mmio_enabled, &status);
>>  	}
>>
>>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
>> @@ -188,18 +219,22 @@ pci_ers_result_t pcie_do_recovery(struct 
>> pci_dev *dev,
>>  		 */
>>  		status = PCI_ERS_RESULT_RECOVERED;
>>  		pci_dbg(dev, "broadcast slot_reset message\n");
>> -		pci_walk_bus(bus, report_slot_reset, &status);
>> +		pci_walk_dev_affected(dev, report_slot_reset, &status);
>>  	}
>>
>>  	if (status != PCI_ERS_RESULT_RECOVERED)
>>  		goto failed;
>>
>>  	pci_dbg(dev, "broadcast resume message\n");
>> -	pci_walk_bus(bus, report_resume, &status);
>> -
>> -	if (pcie_aer_is_native(dev))
>> -		pcie_clear_device_status(dev);
>> -	pci_aer_clear_nonfatal_status(dev);
>> +	pci_walk_dev_affected(dev, report_resume, &status);
>> +
>> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)) {
>> +		if (pcie_aer_is_native(dev))
>> +			pcie_clear_device_status(dev);
>> +		pci_aer_clear_nonfatal_status(dev);
>
> This change (testing pci_pcie_type()) looks like it's not strictly
> related to the rest of this patch and maybe should be split out into
> its own patch?

This change was also based on a commit (068c29a24) in the pci/next 
branch. The type testing was brought over from Jonathan’s original V2, 
but actually, it went full circle by adding the RC_EC type, because now 
it was no longer a no-op. There was an original concern about the need 
for those to be called on the RCEC from Jonathan’s RFC.

Thoughts Jonathan?

Thanks,

Sean

>
>> +	}
>>  	pci_info(dev, "device recovery successful\n");
>>  	return status;
>>
>> -- 
>> 2.27.0
>>
Jonathan Cameron Aug. 10, 2020, 9:32 a.m. UTC | #3
On Fri, 7 Aug 2020 17:55:17 -0700
Sean V Kelley <sean.v.kelley@intel.com> wrote:

> On 7 Aug 2020, at 15:53, Bjorn Helgaas wrote:
> 
> > On Tue, Aug 04, 2020 at 12:40:47PM -0700, Sean V Kelley wrote:  
> >> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >>
> >> Currently the kernel does not handle AER errors for Root Complex 
> >> integrated
> >> End Points (RCiEPs)[0]. These devices sit on a root bus within the 
> >> Root Complex
> >> (RC). AER handling is performed by a Root Complex Event Collector 
> >> (RCEC) [1]
> >> which is effectively a type of RCiEP on the same root bus.
> >>
> >> For an RCEC (technically not a Bridge), error messages "received" 
> >> from
> >> associated RCiEPs must be enabled for "transmission" in order to 
> >> cause a
> >> System Error via the Root Control register or (when the Advanced 
> >> Error
> >> Reporting Capability is present) reporting via the Root Error Command
> >> register and logging in the Root Error Status register and Error 
> >> Source
> >> Identification register.
> >>
> >> In addition to the defined OS level handling of the reset flow for 
> >> the
> >> associated RCiEPs of an RCEC, it is possible to also have a firmware 
> >> first
> >> model. In that case there is no need to take any actions on the RCEC 
> >> because
> >> the firmware is responsible for them. This is true where APEI [2] is 
> >> used
> >> to report the AER errors via a GHES[v2] HEST entry [3] and relevant
> >> AER CPER record [4] and Firmware First handling is in use.  
> >
> > I don't see anything in the patch that mentions "firmware first." Do
> > we need it in the commit log?  After
> > https://git.kernel.org/linus/708b20003624 ("PCI/AER: Remove
> > HEST/FIRMWARE_FIRST parsing for AER ownership"), I think we no longer
> > know anything about firmware-first in the kernel.  
> 
> I’ll let Jonathan reply here.

This is really a terminology question about what the distinction
between "Firmware First" and "non native handling" is.

Note that in ARM world, native handling tends to be referred to as
'kernel first' and firmware based handling as 'firmware first' 
but that isn't that relevant here other than perhaps explaining why I used
this terminology. e.g.
http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-116.pdf
The distinction is who receives the notification of the error from hardware
and hence who sees it 'first' - basically where does the interrupt go?

Anyhow, if we want to avoid confusion here, we can just use the phrase
"non native handling" which I think is unambiguous?


...
 
> >  
> >> + * including any bridged devices on buses under this bus.
> >> + * Call the provided callback on each device found.
> >> + *
> >> + * If the device provided has no subordinate bus, call the provided
> >> + * callback on the device itself.
> >> + */
> >> +static void pci_walk_dev_affected(struct pci_dev *dev, int 
> >> (*cb)(struct pci_dev *, void *),  
> >
> > I don't understand the "affected" reference in the function name.
> > This doesn't test anything to see whether devices are "affected".
> > Naming is the hardest part of programming :)  
> 
> In earlier discussion, Cameron had suggested pci_walk_aer_affected(). 
> But I thought perhaps that the focus should be on the devices.  Perhaps 
> a better description would be pci_walk_aer_devices() or something along 
> those lines.  The original incarnation was pci_walk_below_dev().
> 
> I’m open to anything, really.

Agreed. It is really hard to name this function and would be great to 
have a better option.

> >>  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >>  			pci_channel_state_t state,
> >>  			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
> >>  {
> >>  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> >> -	struct pci_bus *bus;
> >>
> >>  	/*
> >>  	 * Error recovery runs on all subordinates of the first downstream 
> >> port.
> >>  	 * If the downstream port detected the error, it is cleared at the 
> >> end.
> >> +	 * For RCiEPs we should reset just the RCiEP itself.
> >>  	 */
> >>  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> >> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> >> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> >> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END ||
> >> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC))
> >>  		dev = dev->bus->self;
> >> -	bus = dev->subordinate;
> >>
> >>  	pci_dbg(dev, "broadcast error_detected message\n");
> >>  	if (state == pci_channel_io_frozen) {
> >> -		pci_walk_bus(bus, report_frozen_detected, &status);
> >> +		pci_walk_dev_affected(dev, report_frozen_detected, &status);
> >> +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
> >> +			pci_warn(dev, "link reset not possible for RCiEP\n");
> >> +			status = PCI_ERS_RESULT_NONE;
> >> +			goto failed;
> >> +		}
> >> +
> >>  		status = reset_link(dev);  
> >
> > reset_link() might be misnamed.  IIUC "dev" is a bridge, and the point
> > is really to reset any devices below "dev."  Whether we do that by
> > resetting link, DPC trigger, secondary bus reset, FLR, etc, is sort of
> > immaterial.  Some of those methods might be applicable for RCiEPs.
> >
> > But you didn't add that name; I'm just trying to understand this
> > better.  
> 
> Yes, that’s a confusing term with the _link attached. It’s difficult 
> to relate to the different resets that might be applicable. I was 
> thinking about that when looking at the callback path via the 
> “reset_link” of the RCiEP to the RCEC for the sole purpose of 
> clearing the Root Port Error Status. It would be worth time to spend 
> looking at better descriptive naming/methods.

Agreed, this caused me some confusion as well so more descriptive
naming would be good.

> 
> >  
> >>  		if (status != PCI_ERS_RESULT_RECOVERED) {
> >>  			pci_warn(dev, "link reset failed\n");
> >>  			goto failed;
> >>  		}
> >>  	} else {
> >> -		pci_walk_bus(bus, report_normal_detected, &status);
> >> +		pci_walk_dev_affected(dev, report_normal_detected, &status);
> >>  	}
> >>
> >>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> >>  		status = PCI_ERS_RESULT_RECOVERED;
> >>  		pci_dbg(dev, "broadcast mmio_enabled message\n");
> >> -		pci_walk_bus(bus, report_mmio_enabled, &status);
> >> +		pci_walk_dev_affected(dev, report_mmio_enabled, &status);
> >>  	}
> >>
> >>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
> >> @@ -188,18 +219,22 @@ pci_ers_result_t pcie_do_recovery(struct 
> >> pci_dev *dev,
> >>  		 */
> >>  		status = PCI_ERS_RESULT_RECOVERED;
> >>  		pci_dbg(dev, "broadcast slot_reset message\n");
> >> -		pci_walk_bus(bus, report_slot_reset, &status);
> >> +		pci_walk_dev_affected(dev, report_slot_reset, &status);
> >>  	}
> >>
> >>  	if (status != PCI_ERS_RESULT_RECOVERED)
> >>  		goto failed;
> >>
> >>  	pci_dbg(dev, "broadcast resume message\n");
> >> -	pci_walk_bus(bus, report_resume, &status);
> >> -
> >> -	if (pcie_aer_is_native(dev))
> >> -		pcie_clear_device_status(dev);
> >> -	pci_aer_clear_nonfatal_status(dev);
> >> +	pci_walk_dev_affected(dev, report_resume, &status);
> >> +
> >> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> >> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> >> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)) {
> >> +		if (pcie_aer_is_native(dev))
> >> +			pcie_clear_device_status(dev);
> >> +		pci_aer_clear_nonfatal_status(dev);  
> >
> > This change (testing pci_pcie_type()) looks like it's not strictly
> > related to the rest of this patch and maybe should be split out into
> > its own patch?  
> 
> This change was also based on a commit (068c29a24) in the pci/next 
> branch. The type testing was brought over from Jonathan’s original V2, 
> but actually, it went full circle by adding the RC_EC type, because now 
> it was no longer a no-op. There was an original concern about the need 
> for those to be called on the RCEC from Jonathan’s RFC.

What this is doing is ensuring that we do not call these reset functions
if dev is an RCiEP.  This patch is introducing that possibility for the
first time.  Breaking it out to a precursor patch might be possible but
would seem a bit odd.  Perhaps we could invert the logic to check it
isn't PCI_EXP_TYPE_RC_END?  That seems less intuitive than a positive
check to me.  It might not be obvious to a future reader that we can't get
here with most of the other types.
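
To make the comparison concrete, here is a small user-space sketch of
the two forms of the check. The sketch_* helper names are hypothetical;
the type values are the ones from include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>

/* PCIe device/port type values as in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_TYPE_ENDPOINT	0x0
#define PCI_EXP_TYPE_ROOT_PORT	0x4
#define PCI_EXP_TYPE_DOWNSTREAM	0x6
#define PCI_EXP_TYPE_RC_END	0x9
#define PCI_EXP_TYPE_RC_EC	0xa

/* Positive form: list the types whose status is cleared after recovery. */
static bool sketch_clear_positive(int type)
{
	assert(type >= 0 && type <= 0xf);	/* 4-bit Device/Port Type field */

	return type == PCI_EXP_TYPE_ROOT_PORT ||
	       type == PCI_EXP_TYPE_DOWNSTREAM ||
	       type == PCI_EXP_TYPE_RC_EC;
}

/* Inverted form: clear for everything except an RCiEP. */
static bool sketch_clear_inverted(int type)
{
	assert(type >= 0 && type <= 0xf);

	return type != PCI_EXP_TYPE_RC_END;
}
```

The two forms agree for the four types that can reach this point and
differ only for types such as PCI_EXP_TYPE_ENDPOINT, which is why the
inverted form relies on the reader knowing those types cannot get here.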

Thanks,

Jonathan



> 
> Thoughts Jonathan?
> 
> Thanks,
> 
> Sean
> 
> >  
> >> +	}
> >>  	pci_info(dev, "device recovery successful\n");
> >>  	return status;
> >>
> >> -- 
> >> 2.27.0
> >>
Bjorn Helgaas Aug. 17, 2020, 10:24 p.m. UTC | #4
On Mon, Aug 10, 2020 at 10:32:52AM +0100, Jonathan Cameron wrote:
> On Fri, 7 Aug 2020 17:55:17 -0700
> Sean V Kelley <sean.v.kelley@intel.com> wrote:
> > On 7 Aug 2020, at 15:53, Bjorn Helgaas wrote:
> > > On Tue, Aug 04, 2020 at 12:40:47PM -0700, Sean V Kelley wrote:  

> > >>  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > >> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> > >> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> > >> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END ||
> > >> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC))
> > >>  		dev = dev->bus->self;

I'm not sure I understand this "if" statement.  Previously (with no
RCEC support), the possible ways I see to call pcie_do_recovery() are
with:

  AER native:   Root Port
  AER via APEI: Root Port or other PCIe device (ACPI v6.3, 18.3.2.5)
  DPC:          Root Port or Switch Downstream Port
  EDR:          Root Port or Switch Downstream Port

I *guess* the reason we have this "if" statement is for the AER/APEI
case?  And the effect is that even if AER/APEI gives us an Endpoint,
we back up and handle it as though we got it from the Downstream Port
above it, i.e., we reset the Endpoint along with any other children of
that Downstream Port?

Then, IIUC, your patches add this case:

  AER native:   Root Port or RCEC
  AER via APEI: Root Port, RCEC, or other PCIe device

Just noodling here, but I wonder if this would be more understandable
as something like:

  type = pci_pcie_type(dev);
  if (type == PCI_EXP_TYPE_ROOT_PORT ||
      type == PCI_EXP_TYPE_DOWNSTREAM ||
      type == PCI_EXP_TYPE_RC_EC)
    bridge = dev;
  else if (type == PCI_EXP_TYPE_RC_END)
    bridge = dev->rcec;
  else
    bridge = pci_upstream_bridge(dev);

and then we could do:

  if (type == PCI_EXP_TYPE_RC_END)
    flr_on_rciep(dev);
  else
    reset_link(bridge);

It's still awkward to have to deal with being supplied either
endpoints or bridges.  But I guess in the AER/APEI case, we aren't
allowed to touch the error registers so maybe we can't avoid the
awkwardness.
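
For anyone who wants to poke at the selection logic outside the kernel,
here is a minimal user-space sketch of the two snippets above.  It is
not kernel code: every mock_* name is a hypothetical stand-in (for
struct pci_dev, the PCI_EXP_TYPE_* constants, and the reset choice),
and the enum values are arbitrary.  It only models which device ends up
as "bridge" and which kind of reset would be chosen.

```c
#include <stddef.h>

/* Illustrative stand-ins for the PCI_EXP_TYPE_* port types (values arbitrary). */
enum mock_pcie_type {
	MOCK_TYPE_ENDPOINT,
	MOCK_TYPE_ROOT_PORT,
	MOCK_TYPE_DOWNSTREAM,
	MOCK_TYPE_RC_END,
	MOCK_TYPE_RC_EC,
};

/* Hypothetical miniature of struct pci_dev: only what the selection needs. */
struct mock_dev {
	enum mock_pcie_type type;
	struct mock_dev *upstream;	/* bridge above the device, if any */
	struct mock_dev *rcec;		/* associated RCEC, may be NULL */
};

/* Pick the device whose associated/subordinate devices get recovered. */
static struct mock_dev *mock_recovery_bridge(struct mock_dev *dev)
{
	switch (dev->type) {
	case MOCK_TYPE_ROOT_PORT:
	case MOCK_TYPE_DOWNSTREAM:
	case MOCK_TYPE_RC_EC:
		return dev;		/* already the reporting "bridge" */
	case MOCK_TYPE_RC_END:
		return dev->rcec;	/* NULL when no RCEC is visible */
	default:
		return dev->upstream;	/* back up to the port above */
	}
}

enum mock_reset {
	MOCK_RESET_FLR,			/* reset only the RCiEP itself */
	MOCK_RESET_SUBORDINATES,	/* reset everything below the bridge */
};

/* Choose the reset flavour based on the device that was handed in. */
static enum mock_reset mock_reset_method(const struct mock_dev *dev)
{
	if (dev->type == MOCK_TYPE_RC_END)
		return MOCK_RESET_FLR;
	return MOCK_RESET_SUBORDINATES;
}
```

Note that in this model an RCiEP without a visible RCEC yields a NULL
bridge, so any caller would have to cope with that case explicitly.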

> > >>  		status = reset_link(dev);  
> > >
> > > reset_link() might be misnamed.  IIUC "dev" is a bridge, and the point
> > > is really to reset any devices below "dev."  Whether we do that by
> > > resetting link, DPC trigger, secondary bus reset, FLR, etc, is sort of
> > > immaterial.  Some of those methods might be applicable for RCiEPs.
> > >
> > > But you didn't add that name; I'm just trying to understand this
> > > better.  
> > 
> > Yes, that’s a confusing term with the _link attached. It’s difficult 
> > to relate to the different resets that might be applicable. I was 
> > thinking about that when looking at the callback path via the 
> > “reset_link” of the RCiEP to the RCEC for the sole purpose of 
> > clearing the Root Port Error Status. It would be worth time to spend 
> > looking at better descriptive naming/methods.
> 
> Agreed, this caused me some confusion as well so more descriptive
> naming would be good.

Maybe something like reset_subordinate_devices()?  Then it's clear
that we pass a bridge and reset the devices *below* it.  It's not
quite as obvious for RCECs, since they aren't bridges and the RCiEPs
aren't actually *subordinates*, but maybe it's still suggestive of the
logical relationship?

Bjorn
Jonathan Cameron Aug. 18, 2020, 9:01 a.m. UTC | #5
On Mon, 17 Aug 2020 17:24:33 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Mon, Aug 10, 2020 at 10:32:52AM +0100, Jonathan Cameron wrote:
> > On Fri, 7 Aug 2020 17:55:17 -0700
> > Sean V Kelley <sean.v.kelley@intel.com> wrote:  
> > > On 7 Aug 2020, at 15:53, Bjorn Helgaas wrote:  
> > > > On Tue, Aug 04, 2020 at 12:40:47PM -0700, Sean V Kelley wrote:    
> 
> > > >>  	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > > >> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> > > >> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> > > >> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END ||
> > > >> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC))
> > > >>  		dev = dev->bus->self;  
> 
> I'm not sure I understand this "if" statement.  Previously (with no
> RCEC support), the possible ways I see to call pcie_do_recovery() are
> with:
> 
>   AER native:   Root Port
>   AER via APEI: Root Port or other PCIe device (ACPI v6.3, 18.3.2.5)
>   DPC:          Root Port or Switch Downstream Port
>   EDR:          Root Port or Switch Downstream Port
> 
> I *guess* the reason we have this "if" statement is for the AER/APEI
> case?  And the effect is that even if AER/APEI gives us an Endpoint,
> we back up and handle it as though we got it from the Downstream Port
> above it, i.e., we reset the Endpoint along with any other children of
> that Downstream Port?
> 
> Then, IIUC, your patches add this case:
> 
>   AER native:   Root Port or RCEC
>   AER via APEI: Root Port, RCEC, or other PCIe device
> 
> Just noodling here, but I wonder if this would be more understandable
> as something like:
> 
>   type = pci_pcie_type(dev);
>   if (type == PCI_EXP_TYPE_ROOT_PORT ||
>       type == PCI_EXP_TYPE_DOWNSTREAM ||
>       type == PCI_EXP_TYPE_RC_EC)
>     bridge = dev;
>   else if (type == PCI_EXP_TYPE_RC_END)
>     bridge = dev->rcec;
>   else
>     bridge = pci_upstream_bridge(dev);
> 
> and then we could do:
> 
>   if (type == PCI_EXP_TYPE_RC_END)
>     flr_on_rciep(dev);
>   else
>     reset_link(bridge);
> 
> It's still awkward to have to deal with being supplied either
> endpoints or bridges.  But I guess in the AER/APEI case, we aren't
> allowed to touch the error registers so maybe we can't avoid the
> awkwardness.

Agreed with your analysis with one exception. It isn't just that we
aren't allowed to touch the error registers, but also that they may
not even exist (i.e. there is no RCEC).

There are quite a lot of places where we have to then handle the
cases separately.  For an RC_END in the APEI case we don't
have to have an RCEC as we should never be touching it or
any of its registers.  We have platforms that do it this way
(obviously there is a hardware entity doing RCEC like stuff, but it is
not visible to the OS).

In these cases bridge == NULL, so we can't do a bus walk from it to
call the various desired resets on the RCiEP.  We could
do something like

pci_walk_affected(bridge, dev, report_frozen, &status);

and if bridge is NULL, perform the reset just on dev.

Would that be clearer?
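
To make the idea concrete, here is a rough user-space sketch of the
behaviour I have in mind.  It is only an illustration under stated
assumptions: struct mock_node, mock_walk_affected() and mock_count_cb()
are hypothetical names, the child list is a flat array rather than a
real bus hierarchy, and none of this is the eventual kernel interface.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device that may have devices "below" it. */
struct mock_node {
	struct mock_node **children;	/* devices handled via this bridge */
	size_t nr_children;
};

typedef int (*mock_walk_cb)(struct mock_node *node, void *userdata);

/*
 * If a bridge (or RCEC) is available, call cb on every device under it.
 * If bridge is NULL (e.g. an RCiEP with no visible RCEC), call cb on
 * dev alone.
 */
static void mock_walk_affected(struct mock_node *bridge, struct mock_node *dev,
			       mock_walk_cb cb, void *userdata)
{
	size_t i;

	if (!bridge) {
		cb(dev, userdata);
		return;
	}

	for (i = 0; i < bridge->nr_children; i++)
		cb(bridge->children[i], userdata);
}

/* Example callback: count how many devices were visited. */
static int mock_count_cb(struct mock_node *node, void *userdata)
{
	(void)node;
	(*(int *)userdata)++;
	return 0;
}
```

With a bridge that has two children the callback runs twice; with a
NULL bridge it runs once, on dev only.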

Thanks,

Jonathan

> 
> > > >>  		status = reset_link(dev);    
> > > >
> > > > reset_link() might be misnamed.  IIUC "dev" is a bridge, and the point
> > > > is really to reset any devices below "dev."  Whether we do that by
> > > > resetting link, DPC trigger, secondary bus reset, FLR, etc, is sort of
> > > > immaterial.  Some of those methods might be applicable for RCiEPs.
> > > >
> > > > But you didn't add that name; I'm just trying to understand this
> > > > better.    
> > > 
> > > Yes, that’s a confusing term with the _link attached. It’s difficult 
> > > to relate to the different resets that might be applicable. I was 
> > > thinking about that when looking at the callback path via the 
> > > “reset_link” of the RCiEP to the RCEC for the sole purpose of 
> > > clearing the Root Port Error Status. It would be worth time to spend 
> > > looking at better descriptive naming/methods.  
> > 
> > > Agreed, this caused me some confusion as well so more descriptive
> > naming would be good.  
> 
> Maybe something like reset_subordinate_devices()?  Then it's clear
> that we pass a bridge and reset the devices *below* it.  It's not
> quite as obvious for RCECs, since they aren't bridges and the RCiEPs
> aren't actually *subordinates*, but maybe it's still suggestive of the
> logical relationship?
> 
> Bjorn

Patch

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index c543f419d8f9..682302dfb55b 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -146,38 +146,69 @@  static int report_resume(struct pci_dev *dev, void *data)
 	return 0;
 }
 
+/**
+ * pci_walk_dev_affected - walk devices potentially AER affected
+ * @dev: device which may be an RCEC with associated RCiEPs,
+ *       an RCiEP associated with an RCEC, or a Port.
+ * @cb: callback to be called for each device found
+ * @userdata: arbitrary pointer to be passed to callback.
+ *
+ * If the device provided is a port, walk the subordinate bus,
+ * including any bridged devices on buses under this bus.
+ * Call the provided callback on each device found.
+ *
+ * If the device provided has no subordinate bus, call the provided
+ * callback on the device itself.
+ */
+static void pci_walk_dev_affected(struct pci_dev *dev, int (*cb)(struct pci_dev *, void *),
+				  void *userdata)
+{
+	if (dev->subordinate) {
+		pci_walk_bus(dev->subordinate, cb, userdata);
+	} else {
+		cb(dev, userdata);
+	}
+}
+
 pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 			pci_channel_state_t state,
 			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
 {
 	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
-	struct pci_bus *bus;
 
 	/*
 	 * Error recovery runs on all subordinates of the first downstream port.
 	 * If the downstream port detected the error, it is cleared at the end.
+	 * For RCiEPs we should reset just the RCiEP itself.
 	 */
 	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
-	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
+	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
+	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END ||
+	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC))
 		dev = dev->bus->self;
-	bus = dev->subordinate;
 
 	pci_dbg(dev, "broadcast error_detected message\n");
 	if (state == pci_channel_io_frozen) {
-		pci_walk_bus(bus, report_frozen_detected, &status);
+		pci_walk_dev_affected(dev, report_frozen_detected, &status);
+		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
+			pci_warn(dev, "link reset not possible for RCiEP\n");
+			status = PCI_ERS_RESULT_NONE;
+			goto failed;
+		}
+
 		status = reset_link(dev);
 		if (status != PCI_ERS_RESULT_RECOVERED) {
 			pci_warn(dev, "link reset failed\n");
 			goto failed;
 		}
 	} else {
-		pci_walk_bus(bus, report_normal_detected, &status);
+		pci_walk_dev_affected(dev, report_normal_detected, &status);
 	}
 
 	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
 		status = PCI_ERS_RESULT_RECOVERED;
 		pci_dbg(dev, "broadcast mmio_enabled message\n");
-		pci_walk_bus(bus, report_mmio_enabled, &status);
+		pci_walk_dev_affected(dev, report_mmio_enabled, &status);
 	}
 
 	if (status == PCI_ERS_RESULT_NEED_RESET) {
@@ -188,18 +219,22 @@  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 		 */
 		status = PCI_ERS_RESULT_RECOVERED;
 		pci_dbg(dev, "broadcast slot_reset message\n");
-		pci_walk_bus(bus, report_slot_reset, &status);
+		pci_walk_dev_affected(dev, report_slot_reset, &status);
 	}
 
 	if (status != PCI_ERS_RESULT_RECOVERED)
 		goto failed;
 
 	pci_dbg(dev, "broadcast resume message\n");
-	pci_walk_bus(bus, report_resume, &status);
-
-	if (pcie_aer_is_native(dev))
-		pcie_clear_device_status(dev);
-	pci_aer_clear_nonfatal_status(dev);
+	pci_walk_dev_affected(dev, report_resume, &status);
+
+	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+	    pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
+	    pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) {
+		if (pcie_aer_is_native(dev))
+			pcie_clear_device_status(dev);
+		pci_aer_clear_nonfatal_status(dev);
+	}
 	pci_info(dev, "device recovery successful\n");
 	return status;