Message ID | 9f5ff00e1593d8d9a4b452398b98aa14d23fca11.1673769517.git.lukas@wunner.de (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI reset delay fixes | expand |
Hi Lucas, I do not understand why pci_bridge_wait_for_secondary_bus() can fix Intel's Ponte Vecchio HPC GPU after a DPC-induced Hot Reset. The func pci_bridge_wait_for_secondary_bus() also use pcie_wait_for_link_delay() which time depends on the max device delay time of one bus, for the GPU which bus only one device, I think the time is 100ms as the input parater in pcie_wait_for_link_delay(). pcie_wait_for_link() also wait fixed 100ms and then wait the device data link is ready. So another wait time is pci_dev_wait() in your patch? pci_dev_wait() to receive the CRS from the device to check the device whether is ready. Please help me understand which difference work. On 2023/1/15 16:20, Lukas Wunner wrote: > pci_bridge_wait_for_secondary_bus() is called after a Secondary Bus > Reset, but not after a DPC-induced Hot Reset. > > As a result, the delays prescribed by PCIe r6.0 sec 6.6.1 are not > observed and devices on the secondary bus may be accessed before > they're ready. > > One affected device is Intel's Ponte Vecchio HPC GPU. It comprises a > PCIe switch whose upstream port is not immediately ready after reset. > Because its config space is restored too early, it remains in > D0uninitialized, its subordinate devices remain inaccessible and DPC > recovery fails with messages such as: > > i915 0000:8c:00.0: can't change power state from D3cold to D0 (config space inaccessible) > intel_vsec 0000:8e:00.1: can't change power state from D3cold to D0 (config space inaccessible) > pcieport 0000:89:02.0: AER: device recovery failed > > Fix it. > > Tested-by: Ravi Kishore Koppuravuri<ravi.kishore.koppuravuri@intel.com> > Signed-off-by: Lukas Wunner<lukas@wunner.de> > Reviewed-by: Mika Westerberg<mika.westerberg@linux.intel.com> > Cc:stable@vger.kernel.org > --- > Changes v1 -> v2: > * Move PCIE_RESET_READY_POLL_MS macro below the newly introduced > PCI_RESET_WAIT from patch [2/3] and extend its code comment > * Mention errors seen on Ponte Vecchio in commit message (Bjorn) > * Avoid first person plural in commit message (Sathyanarayanan) > * Add Reviewed-by tag (Mika) > > drivers/pci/pci.c | 3 --- > drivers/pci/pci.h | 6 ++++++ > drivers/pci/pcie/dpc.c | 4 ++-- > 3 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 509f6b5c9e14..d31c21ea9688 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -167,9 +167,6 @@ static int __init pcie_port_pm_setup(char *str) > } > __setup("pcie_port_pm=", pcie_port_pm_setup); > > -/* Time to wait after a reset for device to become responsive */ > -#define PCIE_RESET_READY_POLL_MS 60000 > - > /** > * pci_bus_max_busnr - returns maximum PCI bus number of given bus' children > * @bus: pointer to PCI bus structure to search > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index ce1fc3a90b3f..8f5d4bd5b410 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -70,6 +70,12 @@ struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev, > * Reset (PCIe r6.0 sec 5.8). > */ > #define PCI_RESET_WAIT 1000 /* msec */ > +/* > + * Devices may extend the 1 sec period through Request Retry Status completions > + * (PCIe r6.0 sec 2.3.1). The spec does not provide an upper limit, but 60 sec > + * ought to be enough for any device to become responsive. > + */ > +#define PCIE_RESET_READY_POLL_MS 60000 /* msec */ > > void pci_update_current_state(struct pci_dev *dev, pci_power_t state); > void pci_refresh_power_state(struct pci_dev *dev); > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c > index f5ffea17c7f8..a5d7c69b764e 100644 > --- a/drivers/pci/pcie/dpc.c > +++ b/drivers/pci/pcie/dpc.c > @@ -170,8 +170,8 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev) > pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS, > PCI_EXP_DPC_STATUS_TRIGGER); > > - if (!pcie_wait_for_link(pdev, true)) { > - pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n"); > + if (pci_bridge_wait_for_secondary_bus(pdev, "DPC", > + PCIE_RESET_READY_POLL_MS)) { > clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags); > ret = PCI_ERS_RESULT_DISCONNECT; > } else { Thanks, Yang
On Sat, Feb 18, 2023 at 09:23:47PM +0800, Yang Su wrote: > I do not understand why pci_bridge_wait_for_secondary_bus() can fix > Intel's Ponte Vecchio HPC GPU after a DPC-induced Hot Reset. > > The func pci_bridge_wait_for_secondary_bus() also use > pcie_wait_for_link_delay() which time depends on the max device delay > time of one bus, for the GPU which bus only one device, I think the > time is 100ms as the input parater in pcie_wait_for_link_delay(). > > pcie_wait_for_link() also wait fixed 100ms and then wait the device data > link is ready. So another wait time is pci_dev_wait() in your patch? > pci_dev_wait() to receive the CRS from the device to check the device > whether is ready. > > Please help me understand which difference work. The crucial difference is the invocation of pci_dev_wait(), which waits up to 60 seconds for the device to come out of reset. The spec allows 1 second but that may be extended via CRS. Ponte Vecchio has been witnessed to take more than 4 seconds in some cases, hence the need to wait longer than 1 second. Thanks, Lukas
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 509f6b5c9e14..d31c21ea9688 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -167,9 +167,6 @@ static int __init pcie_port_pm_setup(char *str) } __setup("pcie_port_pm=", pcie_port_pm_setup); -/* Time to wait after a reset for device to become responsive */ -#define PCIE_RESET_READY_POLL_MS 60000 - /** * pci_bus_max_busnr - returns maximum PCI bus number of given bus' children * @bus: pointer to PCI bus structure to search diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index ce1fc3a90b3f..8f5d4bd5b410 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -70,6 +70,12 @@ struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev, * Reset (PCIe r6.0 sec 5.8). */ #define PCI_RESET_WAIT 1000 /* msec */ +/* + * Devices may extend the 1 sec period through Request Retry Status completions + * (PCIe r6.0 sec 2.3.1). The spec does not provide an upper limit, but 60 sec + * ought to be enough for any device to become responsive. + */ +#define PCIE_RESET_READY_POLL_MS 60000 /* msec */ void pci_update_current_state(struct pci_dev *dev, pci_power_t state); void pci_refresh_power_state(struct pci_dev *dev); diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index f5ffea17c7f8..a5d7c69b764e 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -170,8 +170,8 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev) pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS, PCI_EXP_DPC_STATUS_TRIGGER); - if (!pcie_wait_for_link(pdev, true)) { - pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n"); + if (pci_bridge_wait_for_secondary_bus(pdev, "DPC", + PCIE_RESET_READY_POLL_MS)) { clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags); ret = PCI_ERS_RESULT_DISCONNECT; } else {