diff mbox series

[v2,3/3] PCI/DPC: Await readiness of secondary bus after reset

Message ID 9f5ff00e1593d8d9a4b452398b98aa14d23fca11.1673769517.git.lukas@wunner.de (mailing list archive)
State Accepted
Delegated to: Bjorn Helgaas
Headers show
Series PCI reset delay fixes | expand

Commit Message

Lukas Wunner Jan. 15, 2023, 8:20 a.m. UTC
pci_bridge_wait_for_secondary_bus() is called after a Secondary Bus
Reset, but not after a DPC-induced Hot Reset.

As a result, the delays prescribed by PCIe r6.0 sec 6.6.1 are not
observed and devices on the secondary bus may be accessed before
they're ready.

One affected device is Intel's Ponte Vecchio HPC GPU.  It comprises a
PCIe switch whose upstream port is not immediately ready after reset.
Because its config space is restored too early, it remains in
D0uninitialized, its subordinate devices remain inaccessible and DPC
recovery fails with messages such as:

i915 0000:8c:00.0: can't change power state from D3cold to D0 (config space inaccessible)
intel_vsec 0000:8e:00.1: can't change power state from D3cold to D0 (config space inaccessible)
pcieport 0000:89:02.0: AER: device recovery failed

Fix it.

Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: stable@vger.kernel.org
---
Changes v1 -> v2:
 * Move PCIE_RESET_READY_POLL_MS macro below the newly introduced
   PCI_RESET_WAIT from patch [2/3] and extend its code comment
 * Mention errors seen on Ponte Vecchio in commit message (Bjorn)
 * Avoid first person plural in commit message (Sathyanarayanan)
 * Add Reviewed-by tag (Mika)

 drivers/pci/pci.c      | 3 ---
 drivers/pci/pci.h      | 6 ++++++
 drivers/pci/pcie/dpc.c | 4 ++--
 3 files changed, 8 insertions(+), 5 deletions(-)

Comments

Yang Su Feb. 18, 2023, 1:23 p.m. UTC | #1
Hi Lucas,

I do not understand why pci_bridge_wait_for_secondary_bus() can fix 
Intel's Ponte Vecchio HPC GPU

after a DPC-induced Hot Reset.


The func pci_bridge_wait_for_secondary_bus() also use 
pcie_wait_for_link_delay() which time depends on

the max device delay time of one bus, for the GPU which bus only one 
device, I think the time is 100ms as

the input parater in pcie_wait_for_link_delay().


pcie_wait_for_link() also wait fixed 100ms and then wait the device data 
link is ready. So another wait time

is pci_dev_wait() in your patch? pci_dev_wait() to receive the CRS from 
the device to check the device

whether is ready.


Please help me understand which difference work.


On 2023/1/15 16:20, Lukas Wunner wrote:
> pci_bridge_wait_for_secondary_bus() is called after a Secondary Bus
> Reset, but not after a DPC-induced Hot Reset.
>
> As a result, the delays prescribed by PCIe r6.0 sec 6.6.1 are not
> observed and devices on the secondary bus may be accessed before
> they're ready.
>
> One affected device is Intel's Ponte Vecchio HPC GPU.  It comprises a
> PCIe switch whose upstream port is not immediately ready after reset.
> Because its config space is restored too early, it remains in
> D0uninitialized, its subordinate devices remain inaccessible and DPC
> recovery fails with messages such as:
>
> i915 0000:8c:00.0: can't change power state from D3cold to D0 (config space inaccessible)
> intel_vsec 0000:8e:00.1: can't change power state from D3cold to D0 (config space inaccessible)
> pcieport 0000:89:02.0: AER: device recovery failed
>
> Fix it.
>
> Tested-by: Ravi Kishore Koppuravuri<ravi.kishore.koppuravuri@intel.com>
> Signed-off-by: Lukas Wunner<lukas@wunner.de>
> Reviewed-by: Mika Westerberg<mika.westerberg@linux.intel.com>
> Cc:stable@vger.kernel.org
> ---
> Changes v1 -> v2:
>   * Move PCIE_RESET_READY_POLL_MS macro below the newly introduced
>     PCI_RESET_WAIT from patch [2/3] and extend its code comment
>   * Mention errors seen on Ponte Vecchio in commit message (Bjorn)
>   * Avoid first person plural in commit message (Sathyanarayanan)
>   * Add Reviewed-by tag (Mika)
>
>   drivers/pci/pci.c      | 3 ---
>   drivers/pci/pci.h      | 6 ++++++
>   drivers/pci/pcie/dpc.c | 4 ++--
>   3 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 509f6b5c9e14..d31c21ea9688 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -167,9 +167,6 @@ static int __init pcie_port_pm_setup(char *str)
>   }
>   __setup("pcie_port_pm=", pcie_port_pm_setup);
>   
> -/* Time to wait after a reset for device to become responsive */
> -#define PCIE_RESET_READY_POLL_MS 60000
> -
>   /**
>    * pci_bus_max_busnr - returns maximum PCI bus number of given bus' children
>    * @bus: pointer to PCI bus structure to search
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index ce1fc3a90b3f..8f5d4bd5b410 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -70,6 +70,12 @@ struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev,
>    * Reset (PCIe r6.0 sec 5.8).
>    */
>   #define PCI_RESET_WAIT		1000	/* msec */
> +/*
> + * Devices may extend the 1 sec period through Request Retry Status completions
> + * (PCIe r6.0 sec 2.3.1).  The spec does not provide an upper limit, but 60 sec
> + * ought to be enough for any device to become responsive.
> + */
> +#define PCIE_RESET_READY_POLL_MS 60000	/* msec */
>   
>   void pci_update_current_state(struct pci_dev *dev, pci_power_t state);
>   void pci_refresh_power_state(struct pci_dev *dev);
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index f5ffea17c7f8..a5d7c69b764e 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -170,8 +170,8 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
>   	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
>   			      PCI_EXP_DPC_STATUS_TRIGGER);
>   
> -	if (!pcie_wait_for_link(pdev, true)) {
> -		pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n");
> +	if (pci_bridge_wait_for_secondary_bus(pdev, "DPC",
> +					      PCIE_RESET_READY_POLL_MS)) {
>   		clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
>   		ret = PCI_ERS_RESULT_DISCONNECT;
>   	} else {


Thanks,


Yang
Lukas Wunner Feb. 19, 2023, 5:12 a.m. UTC | #2
On Sat, Feb 18, 2023 at 09:23:47PM +0800, Yang Su wrote:
> I do not understand why pci_bridge_wait_for_secondary_bus() can fix
> Intel's Ponte Vecchio HPC GPU after a DPC-induced Hot Reset.
> 
> The func pci_bridge_wait_for_secondary_bus() also use
> pcie_wait_for_link_delay() which time depends on the max device delay
> time of one bus, for the GPU which bus only one device, I think the
> time is 100ms as the input parater in pcie_wait_for_link_delay().
> 
> pcie_wait_for_link() also wait fixed 100ms and then wait the device data
> link is ready. So another wait time is pci_dev_wait() in your patch?
> pci_dev_wait() to receive the CRS from the device to check the device
> whether is ready.
> 
> Please help me understand which difference work.

The crucial difference is the invocation of pci_dev_wait(), which waits
up to 60 seconds for the device to come out of reset.

The spec allows 1 second but that may be extended via CRS.  Ponte Vecchio
has been witnessed to take more than 4 seconds in some cases, hence the
need to wait longer than 1 second.

Thanks,

Lukas
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 509f6b5c9e14..d31c21ea9688 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -167,9 +167,6 @@  static int __init pcie_port_pm_setup(char *str)
 }
 __setup("pcie_port_pm=", pcie_port_pm_setup);
 
-/* Time to wait after a reset for device to become responsive */
-#define PCIE_RESET_READY_POLL_MS 60000
-
 /**
  * pci_bus_max_busnr - returns maximum PCI bus number of given bus' children
  * @bus: pointer to PCI bus structure to search
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index ce1fc3a90b3f..8f5d4bd5b410 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -70,6 +70,12 @@  struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev,
  * Reset (PCIe r6.0 sec 5.8).
  */
 #define PCI_RESET_WAIT		1000	/* msec */
+/*
+ * Devices may extend the 1 sec period through Request Retry Status completions
+ * (PCIe r6.0 sec 2.3.1).  The spec does not provide an upper limit, but 60 sec
+ * ought to be enough for any device to become responsive.
+ */
+#define PCIE_RESET_READY_POLL_MS 60000	/* msec */
 
 void pci_update_current_state(struct pci_dev *dev, pci_power_t state);
 void pci_refresh_power_state(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index f5ffea17c7f8..a5d7c69b764e 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -170,8 +170,8 @@  pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
 			      PCI_EXP_DPC_STATUS_TRIGGER);
 
-	if (!pcie_wait_for_link(pdev, true)) {
-		pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n");
+	if (pci_bridge_wait_for_secondary_bus(pdev, "DPC",
+					      PCIE_RESET_READY_POLL_MS)) {
 		clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
 		ret = PCI_ERS_RESULT_DISCONNECT;
 	} else {