diff mbox series

[v12,07/15] PCI/ERR: Use "bridge" for clarity in pcie_do_recovery()

Message ID 20201121001036.8560-8-sean.v.kelley@intel.com (mailing list archive)
State Accepted
Delegated to: Bjorn Helgaas
Headers show
Series Add RCEC handling to PCI/AER | expand

Commit Message

Kelley, Sean V Nov. 21, 2020, 12:10 a.m. UTC
pcie_do_recovery() may be called with "dev" being either a bridge (Root
Port or Switch Downstream Port) or an Endpoint.  The bulk of the function
deals with the bridge, so if we start with an Endpoint, we reset "dev" to
be the bridge leading to it.

For clarity, replace "dev" in the body of the function with "bridge".  No
functional change intended.

[bhelgaas: commit log, split pieces out so this is pure rename, also
replace "dev" with "bridge" in messages and pci_uevent_ers()]
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20201002184735.1229220-6-seanvk.dev@oregontracks.org
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 drivers/pci/pcie/err.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

Comments

Bjorn Helgaas Dec. 2, 2020, 11:18 p.m. UTC | #1
On Fri, Nov 20, 2020 at 04:10:28PM -0800, Sean V Kelley wrote:
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 7a5af873d8bc..46a5b84f8842 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -151,24 +151,27 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  		pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev))
>  {
>  	int type = pci_pcie_type(dev);
> -	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> +	struct pci_dev *bridge;
>  	struct pci_bus *bus;
> +	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
>  
>  	/*
> -	 * Error recovery runs on all subordinates of the first downstream port.
> -	 * If the downstream port detected the error, it is cleared at the end.
> +	 * Error recovery runs on all subordinates of the bridge.  If the
> +	 * bridge detected the error, it is cleared at the end.
>  	 */
>  	if (!(type == PCI_EXP_TYPE_ROOT_PORT ||
>  	      type == PCI_EXP_TYPE_DOWNSTREAM))
> -		dev = pci_upstream_bridge(dev);
> -	bus = dev->subordinate;
> +		bridge = pci_upstream_bridge(dev);
> +	else
> +		bridge = dev;

I think there's a bug here even before your series.  We started with:

  pcie_do_recovery(struct pci_dev *dev, ..., pci_ers_result_t (*reset_link)())
  {
    if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
	  pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
      dev = dev->bus->self;
    ...
    reset_link(dev);

so if we called pcie_do_recovery() with an Endpoint, we set "dev" to
the upstream bridge, either a Root Port or a Switch Downstream Port,
which we then pass on to reset_link().  For native AER and APEI,
that's aer_root_reset(), which assumes it gets a Root Port.

If we pass a Switch Downstream Port, aer_root_reset() writes to the
*switch port's* PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS, which
are reserved since it's not a Root Port or an RCEC.

The writes probably don't *break* anything since those registers are
reserved, but they also don't disable the interrupt or clear the Root
Error Status.

Bjorn
diff mbox series

Patch

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 7a5af873d8bc..46a5b84f8842 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -151,24 +151,27 @@  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 		pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev))
 {
 	int type = pci_pcie_type(dev);
-	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
+	struct pci_dev *bridge;
 	struct pci_bus *bus;
+	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
 
 	/*
-	 * Error recovery runs on all subordinates of the first downstream port.
-	 * If the downstream port detected the error, it is cleared at the end.
+	 * Error recovery runs on all subordinates of the bridge.  If the
+	 * bridge detected the error, it is cleared at the end.
 	 */
 	if (!(type == PCI_EXP_TYPE_ROOT_PORT ||
 	      type == PCI_EXP_TYPE_DOWNSTREAM))
-		dev = pci_upstream_bridge(dev);
-	bus = dev->subordinate;
+		bridge = pci_upstream_bridge(dev);
+	else
+		bridge = dev;
 
-	pci_dbg(dev, "broadcast error_detected message\n");
+	bus = bridge->subordinate;
+	pci_dbg(bridge, "broadcast error_detected message\n");
 	if (state == pci_channel_io_frozen) {
 		pci_walk_bus(bus, report_frozen_detected, &status);
-		status = reset_subordinates(dev);
+		status = reset_subordinates(bridge);
 		if (status != PCI_ERS_RESULT_RECOVERED) {
-			pci_warn(dev, "subordinate device reset failed\n");
+			pci_warn(bridge, "subordinate device reset failed\n");
 			goto failed;
 		}
 	} else {
@@ -177,7 +180,7 @@  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 
 	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
 		status = PCI_ERS_RESULT_RECOVERED;
-		pci_dbg(dev, "broadcast mmio_enabled message\n");
+		pci_dbg(bridge, "broadcast mmio_enabled message\n");
 		pci_walk_bus(bus, report_mmio_enabled, &status);
 	}
 
@@ -188,27 +191,27 @@  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 		 * drivers' slot_reset callbacks?
 		 */
 		status = PCI_ERS_RESULT_RECOVERED;
-		pci_dbg(dev, "broadcast slot_reset message\n");
+		pci_dbg(bridge, "broadcast slot_reset message\n");
 		pci_walk_bus(bus, report_slot_reset, &status);
 	}
 
 	if (status != PCI_ERS_RESULT_RECOVERED)
 		goto failed;
 
-	pci_dbg(dev, "broadcast resume message\n");
+	pci_dbg(bridge, "broadcast resume message\n");
 	pci_walk_bus(bus, report_resume, &status);
 
-	if (pcie_aer_is_native(dev))
-		pcie_clear_device_status(dev);
-	pci_aer_clear_nonfatal_status(dev);
-	pci_info(dev, "device recovery successful\n");
+	if (pcie_aer_is_native(bridge))
+		pcie_clear_device_status(bridge);
+	pci_aer_clear_nonfatal_status(bridge);
+	pci_info(bridge, "device recovery successful\n");
 	return status;
 
 failed:
-	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
+	pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);
 
 	/* TODO: Should kernel panic here? */
-	pci_info(dev, "device recovery failed\n");
+	pci_info(bridge, "device recovery failed\n");
 
 	return status;
 }