[v4] PCI/PM: Shorten pci_bridge_wait_for_secondary_bus() wait time for slow links

Message ID 20230418072808.10431-1-mika.westerberg@linux.intel.com (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas

Commit Message

Mika Westerberg April 18, 2023, 7:28 a.m. UTC
With slow links (<= 5GT/s) active link reporting is not mandatory, so if
a device is disconnected during system sleep we might end up waiting for
it to respond for ~60s slowing down resume time. PCIe spec r6.0 sec
6.6.1 mandates that the system software must wait for at least 1s before
it can determine the device as brokine device so use the minimum
requirement for slow links. In addition if the port supports active link
reporting we check if the link is trained before starting to wait.

This should make system resume time faster for slow links as well while
still following the PCIe spec.

While there, move the PCI_RESET_WAIT constant into pci.c because it is
no longer used outside of that file.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
Hi all,

Here is the fourth version of the patch. Hopefully I did not miss
anything this time ;-)

This applies on top of

  https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h=reset

Changes since the previous version:
  
  * Added back the comment change that was missed
  * Use PCI_RESET_WAIT - delay for the slow link timeout
  * Check active link reporting support in the slow link path

The previous version of the patch can be found at:

  https://lore.kernel.org/linux-pci/20230413101642.8724-1-mika.westerberg@linux.intel.com/

 drivers/pci/pci.c | 44 ++++++++++++++++++++++++++++++++------------
 drivers/pci/pci.h |  7 -------
 2 files changed, 32 insertions(+), 19 deletions(-)

Comments

Lukas Wunner April 21, 2023, 8:51 p.m. UTC | #1
On Tue, Apr 18, 2023 at 10:28:08AM +0300, Mika Westerberg wrote:
> With slow links (<= 5GT/s) active link reporting is not mandatory, so if
> a device is disconnected during system sleep we might end up waiting for
> it to respond for ~60s slowing down resume time. PCIe spec r6.0 sec
> 6.6.1 mandates that the system software must wait for at least 1s before
> it can determine the device as brokine device so use the minimum
                                 ^^^^^^^
				 broken


> @@ -5027,14 +5032,29 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
>  	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
>  		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
>  		msleep(delay);
> -	} else {
> -		pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
> -			delay);
> -		if (!pcie_wait_for_link_delay(dev, true, delay)) {
> -			/* Did not train, no need to wait any further */
> -			pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n");
> -			return -ENOTTY;
> +
> +		/*
> +		 * If the port supports active link reporting we now check
> +		 * whether the link is active and if not bail out early with
> +		 * the assumption that the device is not present anymore.
> +		 */
> +		if (dev->link_active_reporting) {
> +			u16 status;
> +
> +			pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
> +			if (!(status & PCI_EXP_LNKSTA_DLLLA))
> +				return -ENOTTY;
>  		}
> +
> +		return pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay);
> +	}

So above in the Gen1/Gen2 case (<= 5 GT/s), a delay of 100 msec is afforded
and if the link isn't up by then, the function returns an error.

Doesn't that violate PCIe r6.0.1 sec 6.6.1 that states:

 "system software must allow at least 1.0 s following exit from a
  Conventional Reset of a device, before determining that the device
  is broken if it fails to return a Successful Completion status for
  a valid Configuration Request.  This period is independent of how
  quickly Link training completes."

I think what we can do here is:

		if (!pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay))
			return 0;

		if (!dev->link_active_reporting)
			return -ENOTTY;

		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
		if (!(status & PCI_EXP_LNKSTA_DLLLA))
			return -ENOTTY;

		return pci_dev_wait(child, reset_type,
				    PCIE_RESET_READY_POLL_MS - PCI_RESET_WAIT);

In other words, if link active reporting is unsupported, we can only
afford the 1 second prescribed by the spec and that's it.  If the
subordinate device is still inaccessible after that, reset recovery
failed.

If link active reporting is supported and the link is up, then we know
the device is accessible but may need more time.  In that case the
full 60 seconds are afforded.

Does that make sense?

Thanks,

Lukas

Mika Westerberg April 24, 2023, 6 a.m. UTC | #2
Hi,

On Fri, Apr 21, 2023 at 10:51:14PM +0200, Lukas Wunner wrote:
> On Tue, Apr 18, 2023 at 10:28:08AM +0300, Mika Westerberg wrote:
> > With slow links (<= 5GT/s) active link reporting is not mandatory, so if
> > a device is disconnected during system sleep we might end up waiting for
> > it to respond for ~60s slowing down resume time. PCIe spec r6.0 sec
> > 6.6.1 mandates that the system software must wait for at least 1s before
> > it can determine the device as brokine device so use the minimum
>                                  ^^^^^^^
> 				 broken
> 
> 
> > @@ -5027,14 +5032,29 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
> >  	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
> >  		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
> >  		msleep(delay);
> > -	} else {
> > -		pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
> > -			delay);
> > -		if (!pcie_wait_for_link_delay(dev, true, delay)) {
> > -			/* Did not train, no need to wait any further */
> > -			pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n");
> > -			return -ENOTTY;
> > +
> > +		/*
> > +		 * If the port supports active link reporting we now check
> > +		 * whether the link is active and if not bail out early with
> > +		 * the assumption that the device is not present anymore.
> > +		 */
> > +		if (dev->link_active_reporting) {
> > +			u16 status;
> > +
> > +			pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
> > +			if (!(status & PCI_EXP_LNKSTA_DLLLA))
> > +				return -ENOTTY;
> >  		}
> > +
> > +		return pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay);
> > +	}
> 
> So above in the Gen1/Gen2 case (<= 5 GT/s), a delay of 100 msec is afforded
> and if the link isn't up by then, the function returns an error.
> 
> Doesn't that violate PCIe r6.0.1 sec 6.6.1 that states:
> 
>  "system software must allow at least 1.0 s following exit from a
>   Conventional Reset of a device, before determining that the device
>   is broken if it fails to return a Successful Completion status for
>   a valid Configuration Request.  This period is independent of how
>   quickly Link training completes."

Yes, it does :( Missed that last sentence.

> I think what we can do here is:
> 
> 		if (!pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay))
> 			return 0;
> 
> 		if (!dev->link_active_reporting)
> 			return -ENOTTY;
> 
> 		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
> 		if (!(status & PCI_EXP_LNKSTA_DLLLA))
> 			return -ENOTTY;
> 
> 		return pci_dev_wait(child, reset_type,
> 				    PCIE_RESET_READY_POLL_MS - PCI_RESET_WAIT);
> 
> In other words, if link active reporting is unsupported, we can only
> afford the 1 second prescribed by the spec and that's it.  If the
> subordinate device is still inaccessible after that, reset recovery
> failed.
> 
> If link active reporting is supported and the link is up, then we know
> the device is accessible but may need more time.  In that case the
> full 60 seconds are afforded.
> 
> Does that make sense?

Yes, it does, thanks! I will send an updated version with this (and the
typo) fixed after the merge window closes.
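
To make the difference between the v4 slow-link path and the flow suggested in comment #1 easier to see, here is a minimal userspace sketch. It is not kernel code: struct sim_port, sim_link_active(), sim_dev_wait(), sim_wait_v4() and sim_wait_proposed() are hypothetical names invented for this illustration, pci_dev_wait() is reduced to a simple deadline check, and the 60000 msec value for PCIE_RESET_READY_POLL_MS is an assumption taken from the "full 60 seconds" wording above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define PCI_RESET_WAIT           1000	/* msec, value from the patch */
#define PCIE_RESET_READY_POLL_MS 60000	/* msec, assumed ("full 60 seconds") */

/* Hypothetical description of a port and the device below it. */
struct sim_port {
	bool link_active_reporting;	/* port can report DLLLA */
	int link_up_after_ms;		/* when DLLLA gets set, < 0 = never */
	int child_ready_after_ms;	/* when config reads succeed, < 0 = never */
};

/* Would DLLLA read as set now_ms after the reset? */
static bool sim_link_active(const struct sim_port *p, int now_ms)
{
	return p->link_up_after_ms >= 0 && p->link_up_after_ms <= now_ms;
}

/* Simplified stand-in for pci_dev_wait(): wait from now_ms for timeout_ms. */
static int sim_dev_wait(const struct sim_port *p, int now_ms, int timeout_ms)
{
	if (p->child_ready_after_ms >= 0 &&
	    p->child_ready_after_ms <= now_ms + timeout_ms)
		return 0;
	return -ENOTTY;
}

/* Slow-link path as posted in v4: link check right after the initial delay. */
static int sim_wait_v4(const struct sim_port *p, int delay_ms)
{
	assert(delay_ms >= 0 && delay_ms <= PCI_RESET_WAIT);

	if (p->link_active_reporting && !sim_link_active(p, delay_ms))
		return -ENOTTY;

	return sim_dev_wait(p, delay_ms, PCI_RESET_WAIT - delay_ms);
}

/* Slow-link path as suggested in comment #1: always allow 1 s first. */
static int sim_wait_proposed(const struct sim_port *p, int delay_ms)
{
	assert(delay_ms >= 0 && delay_ms <= PCI_RESET_WAIT);

	/* Remainder of the 1 s window, independent of link state. */
	if (!sim_dev_wait(p, delay_ms, PCI_RESET_WAIT - delay_ms))
		return 0;

	/* Without link reporting, 1 s is all that can be afforded. */
	if (!p->link_active_reporting)
		return -ENOTTY;

	/* Link still down after 1 s: treat the device as gone. */
	if (!sim_link_active(p, PCI_RESET_WAIT))
		return -ENOTTY;

	/* Link is up, so grant the rest of the long timeout. */
	return sim_dev_wait(p, PCI_RESET_WAIT,
			    PCIE_RESET_READY_POLL_MS - PCI_RESET_WAIT);
}
```

In this model, a port with link reporting whose link comes up 300 msec after reset and whose child answers at 500 msec fails in sim_wait_v4() with a 100 msec delay, because the link is checked before it has trained, while sim_wait_proposed() succeeds within the first second. A device that is never reachable fails in both variants after about one second instead of the long timeout.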

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0b4f3b08f780..a40234c9f7a4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -64,6 +64,13 @@  struct pci_pme_device {
 
 #define PME_TIMEOUT 1000 /* How long between PME checks */
 
+/*
+ * Following exit from Conventional Reset, devices must be ready within 1 sec
+ * (PCIe r6.0 sec 6.6.1).  A D3cold to D0 transition implies a Conventional
+ * Reset (PCIe r6.0 sec 5.8).
+ */
+#define PCI_RESET_WAIT 1000 /* msec */
+
 /*
  * Devices may extend the 1 sec period through Request Retry Status
  * completions (PCIe r6.0 sec 2.3.1).  The spec does not provide an upper
@@ -5012,11 +5019,9 @@  int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 	 *
 	 * However, 100 ms is the minimum and the PCIe spec says the
 	 * software must allow at least 1s before it can determine that the
-	 * device that did not respond is a broken device. There is
-	 * evidence that 100 ms is not always enough, for example certain
-	 * Titan Ridge xHCI controller does not always respond to
-	 * configuration requests if we only wait for 100 ms (see
-	 * https://bugzilla.kernel.org/show_bug.cgi?id=203885).
+	 * device that did not respond is a broken device. Also device can
+	 * take longer than that to respond if it indicates so through Request
+	 * Retry Status completions.
 	 *
 	 * Therefore we wait for 100 ms and check for the device presence
 	 * until the timeout expires.
@@ -5027,14 +5032,29 @@  int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
 	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
 		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
 		msleep(delay);
-	} else {
-		pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
-			delay);
-		if (!pcie_wait_for_link_delay(dev, true, delay)) {
-			/* Did not train, no need to wait any further */
-			pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n");
-			return -ENOTTY;
+
+		/*
+		 * If the port supports active link reporting we now check
+		 * whether the link is active and if not bail out early with
+		 * the assumption that the device is not present anymore.
+		 */
+		if (dev->link_active_reporting) {
+			u16 status;
+
+			pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &status);
+			if (!(status & PCI_EXP_LNKSTA_DLLLA))
+				return -ENOTTY;
 		}
+
+		return pci_dev_wait(child, reset_type, PCI_RESET_WAIT - delay);
+	}
+
+	pci_dbg(dev, "waiting %d ms for downstream link, after activation\n",
+		delay);
+	if (!pcie_wait_for_link_delay(dev, true, delay)) {
+		/* Did not train, no need to wait any further */
+		pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n");
+		return -ENOTTY;
 	}
 
 	return pci_dev_wait(child, reset_type,
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 022da58afb33..f2d3aeab91f4 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -64,13 +64,6 @@  struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev,
 #define PCI_PM_D3HOT_WAIT       10	/* msec */
 #define PCI_PM_D3COLD_WAIT      100	/* msec */
 
-/*
- * Following exit from Conventional Reset, devices must be ready within 1 sec
- * (PCIe r6.0 sec 6.6.1).  A D3cold to D0 transition implies a Conventional
- * Reset (PCIe r6.0 sec 5.8).
- */
-#define PCI_RESET_WAIT		1000	/* msec */
-
 void pci_update_current_state(struct pci_dev *dev, pci_power_t state);
 void pci_refresh_power_state(struct pci_dev *dev);
 int pci_power_up(struct pci_dev *dev);