[V4,7/7] PCI: Handle link reset via hotplug if supported

Message ID: 1530214274-21139-7-git-send-email-okaya@codeaurora.org
State: Not Applicable, archived
Delegated to: Andy Gross

Commit Message

Sinan Kaya June 28, 2018, 7:31 p.m. UTC
If a bridge supports hotplug and observes a PCIe fatal error, the following
events happen:

1. The AER driver removes the devices from the PCI tree on a fatal error.
2. The AER driver brings down the link by issuing a secondary bus reset and
waits for the link to come up.
3. The hotplug driver observes a link down interrupt.
4. The hotplug driver tries to remove the devices and waits for the rescan
lock, but the devices have already been removed by the AER driver, which is
still waiting for the link to come back up.
5. The AER driver tries to re-enumerate the devices after polling for the
link state to go up.
6. The hotplug driver obtains the lock and tries to remove the devices again.

If the bridge is hotplug capable, hand the link reset over to the hotplug
driver so that the hotplug driver can mask link up/down interrupts while
performing the secondary bus reset.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/pci/hotplug/pciehp_core.c | 20 ++++++++++++++++++++
 drivers/pci/pcie/err.c            |  5 +++++
 2 files changed, 25 insertions(+)
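
The effect of masking link up/down notifications during the reset can be
illustrated with a small userspace model. This is only a sketch: struct
mock_ctrl and the mock_* functions are hypothetical names invented for the
illustration and do not exist in the kernel, and the real pciehp code works
on hardware registers rather than a flag.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a hotplug controller; not kernel code. */
struct mock_ctrl {
	bool link_irq_enabled;	/* link up/down notification enabled */
	bool link_up;		/* current link state */
	int link_events_seen;	/* link changes the hotplug side reacted to */
};

/* Change the link state; the hotplug side only notices when unmasked. */
void mock_set_link(struct mock_ctrl *ctrl, bool up)
{
	if (ctrl->link_up == up)
		return;
	ctrl->link_up = up;
	if (ctrl->link_irq_enabled)
		ctrl->link_events_seen++;
}

/* Reset issued by a party that knows nothing about hotplug. */
void mock_reset_unmasked(struct mock_ctrl *ctrl)
{
	mock_set_link(ctrl, false);	/* the reset drops the link */
	mock_set_link(ctrl, true);	/* the link comes back up */
}

/* Reset routed through the hotplug side, which masks link events first. */
void mock_reset_masked(struct mock_ctrl *ctrl)
{
	bool saved = ctrl->link_irq_enabled;

	ctrl->link_irq_enabled = false;
	mock_reset_unmasked(ctrl);
	ctrl->link_irq_enabled = saved;
}
```

In this model an unmasked reset produces two link events for the hotplug
side to react to (down, then up), while the masked reset produces none and
leaves the notification setting as it was.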

Comments

Andy Shevchenko June 29, 2018, 9:39 p.m. UTC | #1
On Thu, Jun 28, 2018 at 10:31 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> If a bridge supports hotplug and observes a PCIe fatal error, the following
> events happen:
>
> 1. The AER driver removes the devices from the PCI tree on a fatal error.
> 2. The AER driver brings down the link by issuing a secondary bus reset and
> waits for the link to come up.
> 3. The hotplug driver observes a link down interrupt.
> 4. The hotplug driver tries to remove the devices and waits for the rescan
> lock, but the devices have already been removed by the AER driver, which is
> still waiting for the link to come back up.
> 5. The AER driver tries to re-enumerate the devices after polling for the
> link state to go up.
> 6. The hotplug driver obtains the lock and tries to remove the devices again.
>
> If the bridge is hotplug capable, hand the link reset over to the hotplug
> driver so that the hotplug driver can mask link up/down interrupts while
> performing the secondary bus reset.

> +static pci_ers_result_t pciehp_reset_link(struct pci_dev *pdev)
> +{
> +       struct pcie_device *pciedev;
> +       struct controller *ctrl;
> +       struct device *devhp;
> +       struct slot *slot;
> +       int rc;
> +
> +       devhp = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_HP);
> +       pciedev = to_pcie_device(devhp);
> +       ctrl = get_service_data(pciedev);
> +       slot = ctrl->slot;
> +
> +       rc = reset_slot(slot->hotplug_slot, 0);
> +
> +       return !rc ? PCI_ERS_RESULT_RECOVERED : PCI_ERS_RESULT_DISCONNECT;

Would it be better to drop the negation?

return rc ? ..._DISCONNECT : ..._RECOVERED;

> +}
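
The two spellings of the return statement discussed above are equivalent for
every value of rc; only the readability differs. A minimal userspace sketch,
where enum mock_ers_result and both function names are hypothetical stand-ins
for the kernel's pci_ers_result_t values and are not part of the patch:

```c
/* Hypothetical stand-ins for the kernel's pci_ers_result_t values. */
enum mock_ers_result {
	MOCK_ERS_RESULT_DISCONNECT,
	MOCK_ERS_RESULT_RECOVERED,
};

/* Form used in the patch: negated condition, success branch first. */
enum mock_ers_result result_negated(int rc)
{
	return !rc ? MOCK_ERS_RESULT_RECOVERED : MOCK_ERS_RESULT_DISCONNECT;
}

/* Suggested form: test rc directly, error branch first. */
enum mock_ers_result result_direct(int rc)
{
	return rc ? MOCK_ERS_RESULT_DISCONNECT : MOCK_ERS_RESULT_RECOVERED;
}
```

Both functions map rc == 0 to the recovered result and any nonzero rc to the
disconnect result.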
Lukas Wunner July 1, 2018, 5:14 p.m. UTC | #2
On Thu, Jun 28, 2018 at 03:31:05PM -0400, Sinan Kaya wrote:
> +static pci_ers_result_t pciehp_reset_link(struct pci_dev *pdev)
> +{
> +	struct pcie_device *pciedev;
> +	struct controller *ctrl;
> +	struct device *devhp;
> +	struct slot *slot;
> +	int rc;
> +
> +	devhp = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_HP);
> +	pciedev = to_pcie_device(devhp);
> +	ctrl = get_service_data(pciedev);
> +	slot = ctrl->slot;
> +
> +	rc = reset_slot(slot->hotplug_slot, 0);
> +
> +	return !rc ? PCI_ERS_RESULT_RECOVERED : PCI_ERS_RESULT_DISCONNECT;
> +}

This looks like a bit of a detour.  There's a "struct pci_slot *slot"
pointer in struct pci_dev.  Any reason not to simply call:

	rc = reset_slot(pdev->slot->hotplug_slot, 0);

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sinan Kaya July 1, 2018, 5:24 p.m. UTC | #3
On 2018-07-01 13:14, Lukas Wunner wrote:
> On Thu, Jun 28, 2018 at 03:31:05PM -0400, Sinan Kaya wrote:
>> +static pci_ers_result_t pciehp_reset_link(struct pci_dev *pdev)
>> +{
>> +	struct pcie_device *pciedev;
>> +	struct controller *ctrl;
>> +	struct device *devhp;
>> +	struct slot *slot;
>> +	int rc;
>> +
>> +	devhp = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_HP);
>> +	pciedev = to_pcie_device(devhp);
>> +	ctrl = get_service_data(pciedev);
>> +	slot = ctrl->slot;
>> +
>> +	rc = reset_slot(slot->hotplug_slot, 0);
>> +
>> +	return !rc ? PCI_ERS_RESULT_RECOVERED : PCI_ERS_RESULT_DISCONNECT;
>> +}
> 
> This looks like a bit of a detour.  There's a "struct pci_slot *slot"
> pointer in struct pci_dev.  Any reason not to simply call:
> 
> 	rc = reset_slot(pdev->slot->hotplug_slot, 0);

pdev here is the bridge. Slot pointers are only valid for child devices.
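
A toy userspace model of this point, under the assumption stated in the
reply that a device's slot pointer describes the slot the device itself is
plugged into. All names below (mock_slot, mock_dev, mock_own_slot,
mock_managed_slot) are hypothetical and only mirror the idea; they are not
kernel structures or functions.

```c
#include <stddef.h>

struct mock_slot {
	int number;
};

struct mock_dev {
	/* Slot this device is plugged into, if any. */
	struct mock_slot *slot;
	/* Slot managed by this port's hotplug service, if any. */
	struct mock_slot *ctrl_slot;
};

/* What following dev->slot yields: the slot the device itself sits in. */
struct mock_slot *mock_own_slot(const struct mock_dev *dev)
{
	return dev->slot;
}

/* The route the patch takes: go through the port's hotplug service data. */
struct mock_slot *mock_managed_slot(const struct mock_dev *port)
{
	return port->ctrl_slot;
}
```

In this model the bridge has no slot of its own, so following its slot
pointer gives NULL, while the same slot is reachable both from a child
device and from the bridge's hotplug service data.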


> 
> Lukas
Patch

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 44a6a63..43a49cc 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -299,6 +299,24 @@  static int pciehp_resume(struct pcie_device *dev)
 }
 #endif /* PM */
 
+static pci_ers_result_t pciehp_reset_link(struct pci_dev *pdev)
+{
+	struct pcie_device *pciedev;
+	struct controller *ctrl;
+	struct device *devhp;
+	struct slot *slot;
+	int rc;
+
+	devhp = pcie_port_find_device(pdev, PCIE_PORT_SERVICE_HP);
+	pciedev = to_pcie_device(devhp);
+	ctrl = get_service_data(pciedev);
+	slot = ctrl->slot;
+
+	rc = reset_slot(slot->hotplug_slot, 0);
+
+	return !rc ? PCI_ERS_RESULT_RECOVERED : PCI_ERS_RESULT_DISCONNECT;
+}
+
 static struct pcie_port_service_driver hpdriver_portdrv = {
 	.name		= PCIE_MODULE_NAME,
 	.port_type	= PCIE_ANY_PORT,
@@ -311,6 +329,8 @@  static struct pcie_port_service_driver hpdriver_portdrv = {
 	.suspend	= pciehp_suspend,
 	.resume		= pciehp_resume,
 #endif	/* PM */
+
+	.reset_link	= pciehp_reset_link,
 };
 
 static int __init pcied_init(void)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index a3a26f1..0b66779 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -308,6 +308,11 @@  void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
 		pci_dev_put(pdev);
 	}
 
+	/* handle link reset via hotplug driver if supported */
+	if (dev->is_hotplug_bridge &&
+		pcie_port_find_device(dev, PCIE_PORT_SERVICE_HP))
+		service = PCIE_PORT_SERVICE_HP;
+
 	result = reset_link(udev, service);
 
 	if ((service == PCIE_PORT_SERVICE_AER) &&
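
The service selection added to pcie_do_fatal_recovery() above can be read as
a small decision function. The following userspace sketch restates it under
that reading; enum mock_service, struct mock_port and
mock_pick_reset_service() are hypothetical names, and has_hp_service merely
stands in for a non-NULL result from pcie_port_find_device().

```c
#include <stdbool.h>

/* Hypothetical service identifiers; the kernel uses PCIE_PORT_SERVICE_*. */
enum mock_service {
	MOCK_SERVICE_AER,
	MOCK_SERVICE_HP,
};

struct mock_port {
	bool is_hotplug_bridge;	/* mirrors dev->is_hotplug_bridge */
	bool has_hp_service;	/* a hotplug service device is bound */
};

/*
 * Prefer the hotplug service for the link reset when the port is a hotplug
 * bridge and a hotplug service device exists; otherwise keep the service
 * the caller asked for.
 */
enum mock_service mock_pick_reset_service(const struct mock_port *port,
					  enum mock_service requested)
{
	if (port->is_hotplug_bridge && port->has_hp_service)
		return MOCK_SERVICE_HP;
	return requested;
}
```

Both conditions have to hold: a hotplug capable bridge without a bound
hotplug service still gets its link reset from the requesting service.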