
PCI: Do not use pcie_get_speed_cap() to determine when to start waiting

Message ID 20200416083245.73957-1-mika.westerberg@linux.intel.com (mailing list archive)
State Superseded, archived
Delegated to: Bjorn Helgaas
Series PCI: Do not use pcie_get_speed_cap() to determine when to start waiting

Commit Message

Mika Westerberg April 16, 2020, 8:32 a.m. UTC
Kai-Heng Feng reported that it takes a long time (>1s) to resume
Thunderbolt connected PCIe devices from both runtime suspend and system
sleep (s2idle).

On these PCIe downstream ports the Link Capabilities 2 register
(PCI_EXP_LNKCAP2) announces support for speeds > 5 GT/s, but the link is
then capped to 2.5 GT/s by the Link Control 2 (PCI_EXP_LNKCTL2)
register. This possibility was not considered in
pci_bridge_wait_for_secondary_bus(), so it ended up waiting for 1100 ms
because these ports do not support active link layer reporting either.

PCIe spec 5.0 section 6.6.1 mandates that we must wait a minimum of
100 ms before sending a configuration request to the device below the
port if the port does not support speeds > 5 GT/s; if it does, we first
need to wait for the data link layer to become active before waiting
for that 100 ms.

PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports
that support speeds > 5 GT/s must support active link layer reporting,
so instead of looking at the speed we can check the active link layer
reporting capability and decide how to wait based on that (the two go
hand in hand).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206837
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/pci.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
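
For reference, a minimal sketch of the check this patch ends up with in
pci_bridge_wait_for_secondary_bus() (simplified; `delay` is the 100 ms
minimum from PCIe 5.0 sec 6.6.1, and the full context is in the patch
quoted in the replies below):

    if (!dev->link_active_reporting) {
            /* Cannot observe link training; just wait the fixed delay */
            msleep(delay);
    } else {
            /*
             * Port reports Data Link Layer Link Active: wait for the link
             * to train (a pcie_wait_for_link()-style helper), then the
             * same delay applies.
             */
    }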

Comments

Bjorn Helgaas May 6, 2020, 10:42 p.m. UTC | #1
On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote:
> Kai-Heng Feng reported that it takes long time (>1s) to resume
> Thunderbolt connected PCIe devices from both runtime suspend and system
> sleep (s2idle).
> 
> These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2)
> announces support for speeds > 5 GT/s but it is then capped by the
> second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This
> possiblity was not considered in pci_bridge_wait_for_secondary_bus() so
> it ended up waiting for 1100 ms as these ports do not support active
> link layer reporting either.
> 
> PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms
> before sending configuration request to the device below, if the port
> does not support speeds > 5 GT/s, and if it does we first need to wait
> for the data link layer to become active before waiting for that 100 ms.
> 
> PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports
> that support speeds > 5 GT/s must support active link layer reporting so
> instead of looking for the speed we can check for the active link layer
> reporting capability and determine how to wait based on that (as they go
> hand in hand).

I can't quite tell what the defect is here.

I assume you're talking about this text from sec 6.6.1:

  - With a Downstream Port that does not support Link speeds greater
    than 5.0 GT/s, software must wait a minimum of 100 ms before
    sending a Configuration Request to the device immediately below
    that Port.

  - With a Downstream Port that supports Link speeds greater than 5.0
    GT/s, software must wait a minimum of 100 ms after Link training
    completes before sending a Configuration Request to the device
    immediately below that Port. Software can determine when Link
    training completes by polling the Data Link Layer Link Active bit
    or by setting up an associated interrupt (see Section 6.7.3.3 ).

I don't understand what Link Control 2 has to do with this.  The spec
talks about ports *supporting* certain link speeds, which sounds to me
like the Link Capabilities.  It doesn't say anything about the current
or target link speed.

> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206837
> Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/pci/pci.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 595fcf59843f..d9d9ff5b968e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4822,7 +4822,13 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
>  	if (!pcie_downstream_port(dev))
>  		return;
>  
> -	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
> +	/*
> +	 * The PCIe spec mandates that all downstream ports that support
> +	 * speeds greater than 5 GT/s must support data link layer active
> +	 * reporting, so we use that here to determine when the delay
> +	 * should be issued.
> +	 */
> +	if (!dev->link_active_reporting) {
>  		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
>  		msleep(delay);
>  	} else {
> -- 
> 2.25.1
>
Bjorn Helgaas May 6, 2020, 11:27 p.m. UTC | #2
On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote:
> Kai-Heng Feng reported that it takes long time (>1s) to resume
> Thunderbolt connected PCIe devices from both runtime suspend and system
> sleep (s2idle).
> 
> These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2)
> announces support for speeds > 5 GT/s but it is then capped by the
> second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s.

> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206837

Slight tangent: The bugzilla mentions that lspci doesn't decode
LNKCAP2: https://github.com/pciutils/pciutils/issues/38

Can you try the lspci patch below and see if it matches what you
expect?  It works for me, but I don't understand the kernel issue yet,
so I might be missing something.

commit e2bdd75bbaf6 ("lspci: Decode PCIe Link Capabilities 2")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Wed May 6 18:05:55 2020 -0500

    lspci: Decode PCIe Link Capabilities 2
    
    Decode Link Capabilities 2, which includes the Supported Link Speeds
    Vector.
    
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/lib/header.h b/lib/header.h
index bfdcc80..3332b32 100644
--- a/lib/header.h
+++ b/lib/header.h
@@ -901,6 +901,9 @@
 #define  PCI_EXP_DEV2_OBFF(x)		(((x) >> 13) & 3) /* OBFF enabled */
 #define PCI_EXP_DEVSTA2			0x2a	/* Device Status */
 #define PCI_EXP_LNKCAP2			0x2c	/* Link Capabilities */
+#define  PCI_EXP_LNKCAP2_SPEED(x)	(((x) >> 1) & 0x7f)
+#define  PCI_EXP_LNKCAP2_CROSSLINK	0x00000100 /* Crosslink Supported */
+#define  PCI_EXP_LNKCAP2_DRS		0x80000000 /* Device Readiness Status */
 #define PCI_EXP_LNKCTL2			0x30	/* Link Control */
 #define  PCI_EXP_LNKCTL2_SPEED(x)	((x) & 0xf) /* Target Link Speed */
 #define  PCI_EXP_LNKCTL2_CMPLNC		0x0010	/* Enter Compliance */
diff --git a/ls-caps.c b/ls-caps.c
index a6705eb..585b208 100644
--- a/ls-caps.c
+++ b/ls-caps.c
@@ -1151,6 +1151,29 @@ static void cap_express_dev2(struct device *d, int where, int type)
     }
 }
 
+static const char *cap_express_link2_speed_cap(int vector)
+{
+  /*
+   * Per PCIe r5.0, sec 8.2.1, a device must support 2.5GT/s and is not
+   * permitted to skip support for any data rates between 2.5GT/s and the
+   * highest supported rate.
+   */
+  if (vector & 0x60)
+    return "RsvdP";
+  else if (vector & 0x10)
+    return "2.5-32GT/s";
+  else if (vector & 0x08)
+    return "2.5-16GT/s";
+  else if (vector & 0x04)
+    return "2.5-8GT/s";
+  else if (vector & 0x02)
+    return "2.5-5GT/s";
+  else if (vector & 0x01)
+    return "2.5GT/s";
+
+  return "Unknown";
+}
+
 static const char *cap_express_link2_speed(int type)
 {
   switch (type)
@@ -1204,10 +1227,19 @@ static const char *cap_express_link2_transmargin(int type)
 
 static void cap_express_link2(struct device *d, int where, int type)
 {
+  u32 l;
   u16 w;
 
   if (!((type == PCI_EXP_TYPE_ENDPOINT || type == PCI_EXP_TYPE_LEG_END) &&
 	(d->dev->dev != 0 || d->dev->func != 0))) {
+    /* Link Capabilities 2 was reserved before PCIe r3.0 */
+    l = get_conf_long(d, where + PCI_EXP_LNKCAP2);
+    if (l) {
+      printf("\t\tLnkCap2: Supported Link Speeds: %s, Crosslink%c DRS%c\n",
+	  cap_express_link2_speed_cap(PCI_EXP_LNKCAP2_SPEED(l)),
+	  FLAG(l, PCI_EXP_LNKCAP2_CROSSLINK),
+	  FLAG(l, PCI_EXP_LNKCAP2_DRS));
+    }
     w = get_conf_word(d, where + PCI_EXP_LNKCTL2);
     printf("\t\tLnkCtl2: Target Link Speed: %s, EnterCompliance%c SpeedDis%c",
 	cap_express_link2_speed(PCI_EXP_LNKCTL2_SPEED(w)),
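
With that hunk applied, the added line is decoded roughly like this for one
of the ports discussed below (illustrative output matching the printf format
above, not copied from the bugzilla logs):

		LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- DRS-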
Mika Westerberg May 7, 2020, 11:45 a.m. UTC | #3
On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote:
> On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote:
> > Kai-Heng Feng reported that it takes long time (>1s) to resume
> > Thunderbolt connected PCIe devices from both runtime suspend and system
> > sleep (s2idle).
> > 
> > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2)
> > announces support for speeds > 5 GT/s but it is then capped by the
> > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This
> > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so
> > it ended up waiting for 1100 ms as these ports do not support active
> > link layer reporting either.
> > 
> > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms
> > before sending configuration request to the device below, if the port
> > does not support speeds > 5 GT/s, and if it does we first need to wait
> > for the data link layer to become active before waiting for that 100 ms.
> > 
> > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports
> > that support speeds > 5 GT/s must support active link layer reporting so
> > instead of looking for the speed we can check for the active link layer
> > reporting capability and determine how to wait based on that (as they go
> > hand in hand).
> 
> I can't quite tell what the defect is here.
> 
> I assume you're talking about this text from sec 6.6.1:
> 
>   - With a Downstream Port that does not support Link speeds greater
>     than 5.0 GT/s, software must wait a minimum of 100 ms before
>     sending a Configuration Request to the device immediately below
>     that Port.
> 
>   - With a Downstream Port that supports Link speeds greater than 5.0
>     GT/s, software must wait a minimum of 100 ms after Link training
>     completes before sending a Configuration Request to the device
>     immediately below that Port. Software can determine when Link
>     training completes by polling the Data Link Layer Link Active bit
>     or by setting up an associated interrupt (see Section 6.7.3.3 ).
> 
> I don't understand what Link Control 2 has to do with this.  The spec
> talks about ports *supporting* certain link speeds, which sounds to me
> like the Link Capabilities.  It doesn't say anything about the current
> or target link speed.

PCIe 5.0 page 764 recommends using the Supported Link Speeds Vector in
the Link Capabilities 2 register, and that's what pcie_get_speed_cap()
is doing.

However, we can avoid figuring out the speed altogether by checking
dev->link_active_reporting instead, because that needs to be implemented
by all Downstream Ports that support speeds > 5 GT/s (PCIe 5.0 page 735).
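
For reference, a rough sketch (illustrative only, not the exact kernel code;
the helper name below is hypothetical) of the two pieces of state being
compared here: the Supported Link Speeds Vector that pcie_get_speed_cap()
decodes from LNKCAP2, and the Data Link Layer Link Active Reporting Capable
bit in LNKCAP that enumeration latches into dev->link_active_reporting:

    /* Hypothetical helper, for illustration only */
    static void sketch_show_port_caps(struct pci_dev *dev)
    {
            u32 lnkcap, lnkcap2;

            /* Supported Link Speeds Vector, what pcie_get_speed_cap() decodes */
            pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
            if (lnkcap2 & (PCI_EXP_LNKCAP2_SLS_8_0GB | PCI_EXP_LNKCAP2_SLS_16_0GB))
                    pci_info(dev, "LNKCAP2 advertises > 5 GT/s\n");

            /*
             * Data Link Layer Link Active Reporting Capable; enumeration
             * latches this into dev->link_active_reporting.
             */
            pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
            pci_info(dev, "LLActRep%c\n",
                     (lnkcap & PCI_EXP_LNKCAP_DLLLARC) ? '+' : '-');
    }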
Bjorn Helgaas May 7, 2020, 12:24 p.m. UTC | #4
On Thu, May 7, 2020 at 6:45 AM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote:
> > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote:
> > > Kai-Heng Feng reported that it takes long time (>1s) to resume
> > > Thunderbolt connected PCIe devices from both runtime suspend and system
> > > sleep (s2idle).
> > >
> > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2)
> > > announces support for speeds > 5 GT/s but it is then capped by the
> > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This
> > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so
> > > it ended up waiting for 1100 ms as these ports do not support active
> > > link layer reporting either.
> > >
> > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms
> > > before sending configuration request to the device below, if the port
> > > does not support speeds > 5 GT/s, and if it does we first need to wait
> > > for the data link layer to become active before waiting for that 100 ms.
> > >
> > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports
> > > that support speeds > 5 GT/s must support active link layer reporting so
> > > instead of looking for the speed we can check for the active link layer
> > > reporting capability and determine how to wait based on that (as they go
> > > hand in hand).
> >
> > I can't quite tell what the defect is here.
> >
> > I assume you're talking about this text from sec 6.6.1:
> >
> >   - With a Downstream Port that does not support Link speeds greater
> >     than 5.0 GT/s, software must wait a minimum of 100 ms before
> >     sending a Configuration Request to the device immediately below
> >     that Port.
> >
> >   - With a Downstream Port that supports Link speeds greater than 5.0
> >     GT/s, software must wait a minimum of 100 ms after Link training
> >     completes before sending a Configuration Request to the device
> >     immediately below that Port. Software can determine when Link
> >     training completes by polling the Data Link Layer Link Active bit
> >     or by setting up an associated interrupt (see Section 6.7.3.3 ).
> >
> > I don't understand what Link Control 2 has to do with this.  The spec
> > talks about ports *supporting* certain link speeds, which sounds to me
> > like the Link Capabilities.  It doesn't say anything about the current
> > or target link speed.
>
> PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link
> Capabilities 2 register and that's what pcie_get_speed_cap() is doing.
>
> However, we can avoid figuring the speed altogether by checking the
> dev->link_active_reporting instead because that needs to be implemented
> by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735).

I understand that part.  But the code as-is matches sec 6.6.1, which
makes it easy to understand.  Checking link_active_reporting instead
makes it harder to understand because it makes it one step removed
from 6.6.1.  And link_active_reporting must be set for ports that
support > 5 GT/s, but it must *also* be set in some hotplug cases, so
that further complicates the connection between it and 6.6.1.

And apparently there's an actual defect, and I don't understand what
that is yet.  It sounds like we agree that pcie_get_speed_cap() is
doing the right thing.  Is there something in
pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1?

Bjorn
Mika Westerberg May 7, 2020, 12:35 p.m. UTC | #5
On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote:
> On Thu, May 7, 2020 at 6:45 AM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote:
> > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote:
> > > > Kai-Heng Feng reported that it takes long time (>1s) to resume
> > > > Thunderbolt connected PCIe devices from both runtime suspend and system
> > > > sleep (s2idle).
> > > >
> > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2)
> > > > announces support for speeds > 5 GT/s but it is then capped by the
> > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This
> > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so
> > > > it ended up waiting for 1100 ms as these ports do not support active
> > > > link layer reporting either.
> > > >
> > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms
> > > > before sending configuration request to the device below, if the port
> > > > does not support speeds > 5 GT/s, and if it does we first need to wait
> > > > for the data link layer to become active before waiting for that 100 ms.
> > > >
> > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports
> > > > that support speeds > 5 GT/s must support active link layer reporting so
> > > > instead of looking for the speed we can check for the active link layer
> > > > reporting capability and determine how to wait based on that (as they go
> > > > hand in hand).
> > >
> > > I can't quite tell what the defect is here.
> > >
> > > I assume you're talking about this text from sec 6.6.1:
> > >
> > >   - With a Downstream Port that does not support Link speeds greater
> > >     than 5.0 GT/s, software must wait a minimum of 100 ms before
> > >     sending a Configuration Request to the device immediately below
> > >     that Port.
> > >
> > >   - With a Downstream Port that supports Link speeds greater than 5.0
> > >     GT/s, software must wait a minimum of 100 ms after Link training
> > >     completes before sending a Configuration Request to the device
> > >     immediately below that Port. Software can determine when Link
> > >     training completes by polling the Data Link Layer Link Active bit
> > >     or by setting up an associated interrupt (see Section 6.7.3.3 ).
> > >
> > > I don't understand what Link Control 2 has to do with this.  The spec
> > > talks about ports *supporting* certain link speeds, which sounds to me
> > > like the Link Capabilities.  It doesn't say anything about the current
> > > or target link speed.
> >
> > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link
> > Capabilities 2 register and that's what pcie_get_speed_cap() is doing.
> >
> > However, we can avoid figuring the speed altogether by checking the
> > dev->link_active_reporting instead because that needs to be implemented
> > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735).
> 
> I understand that part.  But the code as-is matches sec 6.6.1, which
> makes it easy to understand.  Checking link_active_reporting instead
> makes it harder to understand because it makes it one step removed
> from 6.6.1.  And link_active_reporting must be set for ports that
> support > 5 GT/s, but it must *also* be set in some hotplug cases, so
> that further complicates the connection between it and 6.6.1.
> 
> And apparently there's an actual defect, and I don't understand what
> that is yet.  It sounds like we agree that pcie_get_speed_cap() is
> doing the right thing.  Is there something in
> pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1?

The defect is that some downstream PCIe ports on the system Kai-Heng is
using have a Supported Link Speeds Vector advertising > 5 GT/s but do
not support active link reporting, so the current code ends up waiting
1100 ms, slowing down resume time.

The Target Link Speed of these ports (in Link Control 2) caps the speed
to 2.5 GT/s.
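
For context, a simplified sketch of where the 1100 ms comes from: without
active link reporting the wait helper cannot observe link training, so it
falls back to a worst-case sleep (the shape below follows the
pcie_wait_for_link_delay() helper of that era; treat it as an approximation
rather than the exact code):

    /* Approximate shape of the fallback when training cannot be observed */
    if (!pdev->link_active_reporting) {
            /*
             * No Data Link Layer Link Active bit to poll, so assume the
             * worst case: 1000 ms for the link plus the 100 ms delay.
             */
            msleep(1100);
            return true;
    }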
Bjorn Helgaas May 7, 2020, 1:33 p.m. UTC | #6
On Thu, May 7, 2020 at 7:36 AM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote:
> > On Thu, May 7, 2020 at 6:45 AM Mika Westerberg
> > <mika.westerberg@linux.intel.com> wrote:
> > >
> > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote:
> > > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote:
> > > > > Kai-Heng Feng reported that it takes long time (>1s) to resume
> > > > > Thunderbolt connected PCIe devices from both runtime suspend and system
> > > > > sleep (s2idle).
> > > > >
> > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2)
> > > > > announces support for speeds > 5 GT/s but it is then capped by the
> > > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This
> > > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so
> > > > > it ended up waiting for 1100 ms as these ports do not support active
> > > > > link layer reporting either.
> > > > >
> > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms
> > > > > before sending configuration request to the device below, if the port
> > > > > does not support speeds > 5 GT/s, and if it does we first need to wait
> > > > > for the data link layer to become active before waiting for that 100 ms.
> > > > >
> > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports
> > > > > that support speeds > 5 GT/s must support active link layer reporting so
> > > > > instead of looking for the speed we can check for the active link layer
> > > > > reporting capability and determine how to wait based on that (as they go
> > > > > hand in hand).
> > > >
> > > > I can't quite tell what the defect is here.
> > > >
> > > > I assume you're talking about this text from sec 6.6.1:
> > > >
> > > >   - With a Downstream Port that does not support Link speeds greater
> > > >     than 5.0 GT/s, software must wait a minimum of 100 ms before
> > > >     sending a Configuration Request to the device immediately below
> > > >     that Port.
> > > >
> > > >   - With a Downstream Port that supports Link speeds greater than 5.0
> > > >     GT/s, software must wait a minimum of 100 ms after Link training
> > > >     completes before sending a Configuration Request to the device
> > > >     immediately below that Port. Software can determine when Link
> > > >     training completes by polling the Data Link Layer Link Active bit
> > > >     or by setting up an associated interrupt (see Section 6.7.3.3 ).
> > > >
> > > > I don't understand what Link Control 2 has to do with this.  The spec
> > > > talks about ports *supporting* certain link speeds, which sounds to me
> > > > like the Link Capabilities.  It doesn't say anything about the current
> > > > or target link speed.
> > >
> > > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link
> > > Capabilities 2 register and that's what pcie_get_speed_cap() is doing.
> > >
> > > However, we can avoid figuring the speed altogether by checking the
> > > dev->link_active_reporting instead because that needs to be implemented
> > > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735).
> >
> > I understand that part.  But the code as-is matches sec 6.6.1, which
> > makes it easy to understand.  Checking link_active_reporting instead
> > makes it harder to understand because it makes it one step removed
> > from 6.6.1.  And link_active_reporting must be set for ports that
> > support > 5 GT/s, but it must *also* be set in some hotplug cases, so
> > that further complicates the connection between it and 6.6.1.
> >
> > And apparently there's an actual defect, and I don't understand what
> > that is yet.  It sounds like we agree that pcie_get_speed_cap() is
> > doing the right thing.  Is there something in
> > pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1?
>
> The defect is that some downstream PCIe ports on a system Kai-Heng is
> using have Supported Link Speeds Vector with > 5GT/s and it does not
> support active link reporting so the currect code ends up waiting 1100 ms
> slowing down resume time.

That sounds like a hardware defect that should be worked around with a
quirk or something.  If we just restructure the code to avoid the
problem, we're likely to reintroduce it later because there's no hint
in the code about this problem.

> The Target Link Speed of this port (in Link Control 2) has the speed
> capped to 2.5 GT/s.

I saw you mentioned this before, but I don't understand yet why it's
relevant.  The spec says "supported,", not "current" or "target".
Mika Westerberg May 7, 2020, 1:46 p.m. UTC | #7
On Thu, May 07, 2020 at 08:33:27AM -0500, Bjorn Helgaas wrote:
> On Thu, May 7, 2020 at 7:36 AM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> > On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote:
> > > On Thu, May 7, 2020 at 6:45 AM Mika Westerberg
> > > <mika.westerberg@linux.intel.com> wrote:
> > > >
> > > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote:
> > > > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote:
> > > > > > Kai-Heng Feng reported that it takes long time (>1s) to resume
> > > > > > Thunderbolt connected PCIe devices from both runtime suspend and system
> > > > > > sleep (s2idle).
> > > > > >
> > > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2)
> > > > > > announces support for speeds > 5 GT/s but it is then capped by the
> > > > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This
> > > > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so
> > > > > > it ended up waiting for 1100 ms as these ports do not support active
> > > > > > link layer reporting either.
> > > > > >
> > > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms
> > > > > > before sending configuration request to the device below, if the port
> > > > > > does not support speeds > 5 GT/s, and if it does we first need to wait
> > > > > > for the data link layer to become active before waiting for that 100 ms.
> > > > > >
> > > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports
> > > > > > that support speeds > 5 GT/s must support active link layer reporting so
> > > > > > instead of looking for the speed we can check for the active link layer
> > > > > > reporting capability and determine how to wait based on that (as they go
> > > > > > hand in hand).
> > > > >
> > > > > I can't quite tell what the defect is here.
> > > > >
> > > > > I assume you're talking about this text from sec 6.6.1:
> > > > >
> > > > >   - With a Downstream Port that does not support Link speeds greater
> > > > >     than 5.0 GT/s, software must wait a minimum of 100 ms before
> > > > >     sending a Configuration Request to the device immediately below
> > > > >     that Port.
> > > > >
> > > > >   - With a Downstream Port that supports Link speeds greater than 5.0
> > > > >     GT/s, software must wait a minimum of 100 ms after Link training
> > > > >     completes before sending a Configuration Request to the device
> > > > >     immediately below that Port. Software can determine when Link
> > > > >     training completes by polling the Data Link Layer Link Active bit
> > > > >     or by setting up an associated interrupt (see Section 6.7.3.3 ).
> > > > >
> > > > > I don't understand what Link Control 2 has to do with this.  The spec
> > > > > talks about ports *supporting* certain link speeds, which sounds to me
> > > > > like the Link Capabilities.  It doesn't say anything about the current
> > > > > or target link speed.
> > > >
> > > > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link
> > > > Capabilities 2 register and that's what pcie_get_speed_cap() is doing.
> > > >
> > > > However, we can avoid figuring the speed altogether by checking the
> > > > dev->link_active_reporting instead because that needs to be implemented
> > > > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735).
> > >
> > > I understand that part.  But the code as-is matches sec 6.6.1, which
> > > makes it easy to understand.  Checking link_active_reporting instead
> > > makes it harder to understand because it makes it one step removed
> > > from 6.6.1.  And link_active_reporting must be set for ports that
> > > support > 5 GT/s, but it must *also* be set in some hotplug cases, so
> > > that further complicates the connection between it and 6.6.1.
> > >
> > > And apparently there's an actual defect, and I don't understand what
> > > that is yet.  It sounds like we agree that pcie_get_speed_cap() is
> > > doing the right thing.  Is there something in
> > > pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1?
> >
> > The defect is that some downstream PCIe ports on a system Kai-Heng is
> > using have Supported Link Speeds Vector with > 5GT/s and it does not
> > support active link reporting so the currect code ends up waiting 1100 ms
> > slowing down resume time.
> 
> That sounds like a hardware defect that should be worked around with a
> quirk or something.  If we just restructure the code to avoid the
> problem, we're likely to reintroduce it later because there's no hint
> in the code about this problem.

That's why I added the comment there to explain this.

Can you propose a patch along the lines you were thinking of that solves
this, so Kai-Heng can try it out?

> > The Target Link Speed of this port (in Link Control 2) has the speed
> > capped to 2.5 GT/s.
> 
> I saw you mentioned this before, but I don't understand yet why it's
> relevant.  The spec says "supported,", not "current" or "target".

Because my interpretation of this part of the spec is that the supported
speed only goes up to what is set as the target speed. In other words,
you can have 8 GT/s in the supported vector and the hardware/firmware
can force the target to be lower, so the link speed is always capped by
that.

I'm not talking about "current" here. Only "supported" and "target".
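
A minimal sketch of that interpretation, assuming one wanted to cap the
supported speed by the Target Link Speed in Link Control 2 (hypothetical
helper, for illustration only; min() works because the PCIE_SPEED_*
constants increase with speed):

    static enum pci_bus_speed sketch_effective_speed_cap(struct pci_dev *dev)
    {
            enum pci_bus_speed cap = pcie_get_speed_cap(dev);
            enum pci_bus_speed target;
            u16 lnkctl2;

            /* Decode the Target Link Speed field of Link Control 2 */
            pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
            switch (lnkctl2 & PCI_EXP_LNKCTL2_TLS) {
            case PCI_EXP_LNKCTL2_TLS_8_0GT:
                    target = PCIE_SPEED_8_0GT;
                    break;
            case PCI_EXP_LNKCTL2_TLS_5_0GT:
                    target = PCIE_SPEED_5_0GT;
                    break;
            default:
                    target = PCIE_SPEED_2_5GT;
                    break;
            }

            /* "Supported" capped by "target", per the interpretation above */
            return min(cap, target);
    }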
Bjorn Helgaas May 7, 2020, 5:11 p.m. UTC | #8
On Thu, May 07, 2020 at 04:46:27PM +0300, Mika Westerberg wrote:
> On Thu, May 07, 2020 at 08:33:27AM -0500, Bjorn Helgaas wrote:
> > On Thu, May 7, 2020 at 7:36 AM Mika Westerberg
> > <mika.westerberg@linux.intel.com> wrote:
> > >
> > > On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote:
> > > > On Thu, May 7, 2020 at 6:45 AM Mika Westerberg
> > > > <mika.westerberg@linux.intel.com> wrote:
> > > > >
> > > > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote:
> > > > > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote:
> > > > > > > Kai-Heng Feng reported that it takes long time (>1s) to resume
> > > > > > > Thunderbolt connected PCIe devices from both runtime suspend and system
> > > > > > > sleep (s2idle).
> > > > > > >
> > > > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2)
> > > > > > > announces support for speeds > 5 GT/s but it is then capped by the
> > > > > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This
> > > > > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so
> > > > > > > it ended up waiting for 1100 ms as these ports do not support active
> > > > > > > link layer reporting either.
> > > > > > >
> > > > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms
> > > > > > > before sending configuration request to the device below, if the port
> > > > > > > does not support speeds > 5 GT/s, and if it does we first need to wait
> > > > > > > for the data link layer to become active before waiting for that 100 ms.
> > > > > > >
> > > > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports
> > > > > > > that support speeds > 5 GT/s must support active link layer reporting so
> > > > > > > instead of looking for the speed we can check for the active link layer
> > > > > > > reporting capability and determine how to wait based on that (as they go
> > > > > > > hand in hand).
> > > > > >
> > > > > > I can't quite tell what the defect is here.
> > > > > >
> > > > > > I assume you're talking about this text from sec 6.6.1:
> > > > > >
> > > > > >   - With a Downstream Port that does not support Link speeds greater
> > > > > >     than 5.0 GT/s, software must wait a minimum of 100 ms before
> > > > > >     sending a Configuration Request to the device immediately below
> > > > > >     that Port.
> > > > > >
> > > > > >   - With a Downstream Port that supports Link speeds greater than 5.0
> > > > > >     GT/s, software must wait a minimum of 100 ms after Link training
> > > > > >     completes before sending a Configuration Request to the device
> > > > > >     immediately below that Port. Software can determine when Link
> > > > > >     training completes by polling the Data Link Layer Link Active bit
> > > > > >     or by setting up an associated interrupt (see Section 6.7.3.3 ).
> > > > > >
> > > > > > I don't understand what Link Control 2 has to do with this.  The spec
> > > > > > talks about ports *supporting* certain link speeds, which sounds to me
> > > > > > like the Link Capabilities.  It doesn't say anything about the current
> > > > > > or target link speed.
> > > > >
> > > > > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link
> > > > > Capabilities 2 register and that's what pcie_get_speed_cap() is doing.
> > > > >
> > > > > However, we can avoid figuring the speed altogether by checking the
> > > > > dev->link_active_reporting instead because that needs to be implemented
> > > > > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735).
> > > >
> > > > I understand that part.  But the code as-is matches sec 6.6.1, which
> > > > makes it easy to understand.  Checking link_active_reporting instead
> > > > makes it harder to understand because it makes it one step removed
> > > > from 6.6.1.  And link_active_reporting must be set for ports that
> > > > support > 5 GT/s, but it must *also* be set in some hotplug cases, so
> > > > that further complicates the connection between it and 6.6.1.
> > > >
> > > > And apparently there's an actual defect, and I don't understand what
> > > > that is yet.  It sounds like we agree that pcie_get_speed_cap() is
> > > > doing the right thing.  Is there something in
> > > > pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1?
> > >
> > > The defect is that some downstream PCIe ports on a system Kai-Heng is
> > > using have Supported Link Speeds Vector with > 5GT/s and it does not
> > > support active link reporting so the currect code ends up waiting 1100 ms
> > > slowing down resume time.

Ah.  From the lspci and dmesg instrumentation in the bugzilla, I
guess:

  04:00.0 Thunderbolt Downstream Port to [bus 05]
    LnkCap: Speed 2.5GT/s, LLActRep-
    LnkSta: Speed 2.5GT/s
    LnkCap2: Supported Link Speeds: 2.5-8GT/s
    LnkCtl2: Target Link Speed: 2.5GT/s
  04:02.0 Thunderbolt Downstream Port to [bus 39]
    LnkCap: Speed 2.5GT/s, LLActRep-
    LnkSta: Speed 2.5GT/s
    LnkCap2: Supported Link Speeds: 2.5-8GT/s
    LnkCtl2: Target Link Speed: 2.5GT/s

So per the Link Cap 2 Supported Link Speeds Vector, both devices
support up to 8GT/s, but neither one advertises Data Link Layer Link
Active Reporting Capable in Link Cap.

The Root Port to the NVIDIA GPU is similar; it advertises 8GT/s
support but not LLActRep:

  00:01.0 Root Port to [bus 01]
    LnkCap: Speed 8GT/s, LLActRep-
    LnkSta: Speed 8GT/s
    LnkCap2: Supported Link Speeds: 2.5-8GT/s
    LnkCtl2: Target Link Speed: 8GT/s

The fact that all three of these don't match what I expect makes me
wonder if I'm reading the spec wrong or lspci is decoding something
wrong.

For the Thunderbolt ports we could make the argument (as I think
you're suggesting) that the "supported link speed" is really the
minimum of the "Link Cap 2 Supported Link Speed" and the "Link Control
2 Target Link Speed".

But even that wouldn't explain why 00:01.0 doesn't advertise LLActRep+
when it is actually running at 8GT/s.

> > That sounds like a hardware defect that should be worked around with a
> > quirk or something.  If we just restructure the code to avoid the
> > problem, we're likely to reintroduce it later because there's no hint
> > in the code about this problem.
> 
> That's why I added the comment there to explain this.
> 
> Can you propose a patch following what you were thinking that solves
> this so Kai-Heng can try it out?
> 
> > > The Target Link Speed of this port (in Link Control 2) has the speed
> > > capped to 2.5 GT/s.
> > 
> > I saw you mentioned this before, but I don't understand yet why it's
> > relevant.  The spec says "supported,", not "current" or "target".
> 
> Because my interpretation of this part of the spec is that supported
> speed is supported up to what is set in target speed. In other words you
> can have 8GT/s there in the supported vector and the hardware/firmware
> can force the target to be lower therefore the link speed is always
> capped by that.
> 
> I'm not talking about "current" here. Only "supported" and "target".

Kai-Heng, would you mind attaching the "sudo lspci -vvxxxx" output to
the bugzilla?  I can use that to test my lspci patch.
Bjorn Helgaas May 8, 2020, 10:58 p.m. UTC | #9
On Thu, May 07, 2020 at 12:11:27PM -0500, Bjorn Helgaas wrote:
> On Thu, May 07, 2020 at 04:46:27PM +0300, Mika Westerberg wrote:
> > On Thu, May 07, 2020 at 08:33:27AM -0500, Bjorn Helgaas wrote:
> > > On Thu, May 7, 2020 at 7:36 AM Mika Westerberg
> > > <mika.westerberg@linux.intel.com> wrote:
> > > >
> > > > On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote:
> > > > > On Thu, May 7, 2020 at 6:45 AM Mika Westerberg
> > > > > <mika.westerberg@linux.intel.com> wrote:
> > > > > >
> > > > > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote:
> > > > > > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote:
> > > > > > > > Kai-Heng Feng reported that it takes long time (>1s) to resume
> > > > > > > > Thunderbolt connected PCIe devices from both runtime suspend and system
> > > > > > > > sleep (s2idle).
> > > > > > > >
> > > > > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2)
> > > > > > > > announces support for speeds > 5 GT/s but it is then capped by the
> > > > > > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This
> > > > > > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so
> > > > > > > > it ended up waiting for 1100 ms as these ports do not support active
> > > > > > > > link layer reporting either.
> > > > > > > >
> > > > > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms
> > > > > > > > before sending configuration request to the device below, if the port
> > > > > > > > does not support speeds > 5 GT/s, and if it does we first need to wait
> > > > > > > > for the data link layer to become active before waiting for that 100 ms.
> > > > > > > >
> > > > > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports
> > > > > > > > that support speeds > 5 GT/s must support active link layer reporting so
> > > > > > > > instead of looking for the speed we can check for the active link layer
> > > > > > > > reporting capability and determine how to wait based on that (as they go
> > > > > > > > hand in hand).
> > > > > > >
> > > > > > > I can't quite tell what the defect is here.
> > > > > > >
> > > > > > > I assume you're talking about this text from sec 6.6.1:
> > > > > > >
> > > > > > >   - With a Downstream Port that does not support Link speeds greater
> > > > > > >     than 5.0 GT/s, software must wait a minimum of 100 ms before
> > > > > > >     sending a Configuration Request to the device immediately below
> > > > > > >     that Port.
> > > > > > >
> > > > > > >   - With a Downstream Port that supports Link speeds greater than 5.0
> > > > > > >     GT/s, software must wait a minimum of 100 ms after Link training
> > > > > > >     completes before sending a Configuration Request to the device
> > > > > > >     immediately below that Port. Software can determine when Link
> > > > > > >     training completes by polling the Data Link Layer Link Active bit
> > > > > > >     or by setting up an associated interrupt (see Section 6.7.3.3 ).
> > > > > > >
> > > > > > > I don't understand what Link Control 2 has to do with this.  The spec
> > > > > > > talks about ports *supporting* certain link speeds, which sounds to me
> > > > > > > like the Link Capabilities.  It doesn't say anything about the current
> > > > > > > or target link speed.
> > > > > >
> > > > > > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link
> > > > > > Capabilities 2 register and that's what pcie_get_speed_cap() is doing.
> > > > > >
> > > > > > However, we can avoid figuring the speed altogether by checking the
> > > > > > dev->link_active_reporting instead because that needs to be implemented
> > > > > > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735).
> > > > >
> > > > > I understand that part.  But the code as-is matches sec 6.6.1, which
> > > > > makes it easy to understand.  Checking link_active_reporting instead
> > > > > makes it harder to understand because it makes it one step removed
> > > > > from 6.6.1.  And link_active_reporting must be set for ports that
> > > > > support > 5 GT/s, but it must *also* be set in some hotplug cases, so
> > > > > that further complicates the connection between it and 6.6.1.
> > > > >
> > > > > And apparently there's an actual defect, and I don't understand what
> > > > > that is yet.  It sounds like we agree that pcie_get_speed_cap() is
> > > > > doing the right thing.  Is there something in
> > > > > pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1?
> > > >
> > > > The defect is that some downstream PCIe ports on a system Kai-Heng is
> > > > using have Supported Link Speeds Vector with > 5GT/s and it does not
> > > > support active link reporting so the currect code ends up waiting 1100 ms
> > > > slowing down resume time.
> 
> Ah.  From the lspci and dmesg instrumentation in the bugzilla, I
> guess:
> 
>   04:00.0 Thunderbolt Downstream Port to [bus 05]
>     LnkCap: Speed 2.5GT/s, LLActRep-
>     LnkSta: Speed 2.5GT/s
>     LnkCap2: Supported Link Speeds: 2.5-8GT/s
>     LnkCtl2: Target Link Speed: 2.5GT/s
>   04:02.0 Thunderbolt Downstream Port to [bus 39]
>     LnkCap: Speed 2.5GT/s, LLActRep-
>     LnkSta: Speed 2.5GT/s
>     LnkCap2: Supported Link Speeds: 2.5-8GT/s
>     LnkCtl2: Target Link Speed: 2.5GT/s
> 
> So per the Link Cap 2 Supported Link Speeds Vector, both devices
> support up to 8GT/s, but neither one advertises Data Link Layer Link
> Active Reporting Capable in Link Cap.
> 
> The Root Port to the NVIDIA GPU is similar; it advertises 8GT/s
> support but not LLActRep:
> 
>   00:01.0 Root Port to [bus 01]
>     LnkCap: Speed 8GT/s, LLActRep-
>     LnkSta: Speed 8GT/s
>     LnkCap2: Supported Link Speeds: 2.5-8GT/s
>     LnkCtl2: Target Link Speed: 8GT/s
> 
> The fact that all three of these don't match what I expect makes me
> wonder if I'm reading the spec wrong or lspci is decoding something
> wrong.
> 
> For the Thunderbolt ports we could make the argument (as I think
> you're suggesting) that the "supported link speed" is really the
> minimum of the "Link Cap 2 Supported Link Speed" and the "Link Control
> 2 Target Link Speed".
> 
> But even that wouldn't explain why 00:01.0 doesn't advertise LLActRep+
> when it is actually running at 8GT/s.

FWIW, I posted a question about this to the PCI-SIG forum.  I don't
have high hopes because that's a really low-bandwidth channel.

> > > That sounds like a hardware defect that should be worked around with a
> > > quirk or something.  If we just restructure the code to avoid the
> > > problem, we're likely to reintroduce it later because there's no hint
> > > in the code about this problem.
> > 
> > That's why I added the comment there to explain this.
> > 
> > Can you propose a patch following what you were thinking that solves
> > this so Kai-Heng can try it out?

I think your patch actually makes a lot more *sense* than the language
in the spec does.  For the second rule:

  With a Downstream Port that supports Link speeds greater than 5.0
  GT/s, software must wait a minimum of 100 ms after Link training
  completes before sending a Configuration Request to the device
  immediately below that Port. Software can determine when Link
  training completes by polling the Data Link Layer Link Active bit or
  by setting up an associated interrupt (see Section 6.7.3.3).

we have to be able to tell when Link training completes, then wait
100ms.  For us to tell when training is complete, Data Link Layer Link
Active must be implemented, and the spec says it should be implemented
iff the Data Link Layer Link Active Reporting Capable bit is set.

The 6.6.1 language about "greater than 5.0 GT/s" is one step removed.
What we really care about is Data Link Layer Link Active, not the link
speed.  It seems like the spec would be much clearer if it said:

  With a Downstream Port that implements the Data Link Layer Link
  Active bit, software must wait a minimum of 100 ms after Link training
  completes ...

Bjorn
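
A minimal sketch, assuming the port does set the reporting capability, of
what "polling the Data Link Layer Link Active bit" amounts to (hypothetical
helper and timeout, for illustration only; not the kernel's actual wait
routine):

    static bool sketch_wait_link_active(struct pci_dev *dev, int timeout_ms)
    {
            u16 lnksta;

            for (; timeout_ms > 0; timeout_ms -= 10) {
                    pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
                    if (lnksta & PCI_EXP_LNKSTA_DLLLA)
                            return true;    /* trained; caller then waits 100 ms */
                    msleep(10);
            }
            return false;
    }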
Mika Westerberg May 11, 2020, 10:16 a.m. UTC | #10
On Fri, May 08, 2020 at 05:58:21PM -0500, Bjorn Helgaas wrote:
> On Thu, May 07, 2020 at 12:11:27PM -0500, Bjorn Helgaas wrote:
> > On Thu, May 07, 2020 at 04:46:27PM +0300, Mika Westerberg wrote:
> > > On Thu, May 07, 2020 at 08:33:27AM -0500, Bjorn Helgaas wrote:
> > > > On Thu, May 7, 2020 at 7:36 AM Mika Westerberg
> > > > <mika.westerberg@linux.intel.com> wrote:
> > > > >
> > > > > On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote:
> > > > > > On Thu, May 7, 2020 at 6:45 AM Mika Westerberg
> > > > > > <mika.westerberg@linux.intel.com> wrote:
> > > > > > >
> > > > > > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote:
> > > > > > > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote:
> > > > > > > > > Kai-Heng Feng reported that it takes long time (>1s) to resume
> > > > > > > > > Thunderbolt connected PCIe devices from both runtime suspend and system
> > > > > > > > > sleep (s2idle).
> > > > > > > > >
> > > > > > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2)
> > > > > > > > > announces support for speeds > 5 GT/s but it is then capped by the
> > > > > > > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This
> > > > > > > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so
> > > > > > > > > it ended up waiting for 1100 ms as these ports do not support active
> > > > > > > > > link layer reporting either.
> > > > > > > > >
> > > > > > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms
> > > > > > > > > before sending configuration request to the device below, if the port
> > > > > > > > > does not support speeds > 5 GT/s, and if it does we first need to wait
> > > > > > > > > for the data link layer to become active before waiting for that 100 ms.
> > > > > > > > >
> > > > > > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports
> > > > > > > > > that support speeds > 5 GT/s must support active link layer reporting so
> > > > > > > > > instead of looking for the speed we can check for the active link layer
> > > > > > > > > reporting capability and determine how to wait based on that (as they go
> > > > > > > > > hand in hand).
> > > > > > > >
> > > > > > > > I can't quite tell what the defect is here.
> > > > > > > >
> > > > > > > > I assume you're talking about this text from sec 6.6.1:
> > > > > > > >
> > > > > > > >   - With a Downstream Port that does not support Link speeds greater
> > > > > > > >     than 5.0 GT/s, software must wait a minimum of 100 ms before
> > > > > > > >     sending a Configuration Request to the device immediately below
> > > > > > > >     that Port.
> > > > > > > >
> > > > > > > >   - With a Downstream Port that supports Link speeds greater than 5.0
> > > > > > > >     GT/s, software must wait a minimum of 100 ms after Link training
> > > > > > > >     completes before sending a Configuration Request to the device
> > > > > > > >     immediately below that Port. Software can determine when Link
> > > > > > > >     training completes by polling the Data Link Layer Link Active bit
> > > > > > > >     or by setting up an associated interrupt (see Section 6.7.3.3 ).
> > > > > > > >
> > > > > > > > I don't understand what Link Control 2 has to do with this.  The spec
> > > > > > > > talks about ports *supporting* certain link speeds, which sounds to me
> > > > > > > > like the Link Capabilities.  It doesn't say anything about the current
> > > > > > > > or target link speed.
> > > > > > >
> > > > > > > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link
> > > > > > > Capabilities 2 register and that's what pcie_get_speed_cap() is doing.
> > > > > > >
> > > > > > > However, we can avoid figuring the speed altogether by checking the
> > > > > > > dev->link_active_reporting instead because that needs to be implemented
> > > > > > > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735).
> > > > > >
> > > > > > I understand that part.  But the code as-is matches sec 6.6.1, which
> > > > > > makes it easy to understand.  Checking link_active_reporting instead
> > > > > > makes it harder to understand because it makes it one step removed
> > > > > > from 6.6.1.  And link_active_reporting must be set for ports that
> > > > > > support > 5 GT/s, but it must *also* be set in some hotplug cases, so
> > > > > > that further complicates the connection between it and 6.6.1.
> > > > > >
> > > > > > And apparently there's an actual defect, and I don't understand what
> > > > > > that is yet.  It sounds like we agree that pcie_get_speed_cap() is
> > > > > > doing the right thing.  Is there something in
> > > > > > pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1?
> > > > >
> > > > > The defect is that some downstream PCIe ports on a system Kai-Heng is
> > > > > using have Supported Link Speeds Vector with > 5GT/s and it does not
> > > > > support active link reporting so the currect code ends up waiting 1100 ms
> > > > > slowing down resume time.
> > 
> > Ah.  From the lspci and dmesg instrumentation in the bugzilla, I
> > guess:
> > 
> >   04:00.0 Thunderbolt Downstream Port to [bus 05]
> >     LnkCap: Speed 2.5GT/s, LLActRep-
> >     LnkSta: Speed 2.5GT/s
> >     LnkCap2: Supported Link Speeds: 2.5-8GT/s
> >     LnkCtl2: Target Link Speed: 2.5GT/s
> >   04:02.0 Thunderbolt Downstream Port to [bus 39]
> >     LnkCap: Speed 2.5GT/s, LLActRep-
> >     LnkSta: Speed 2.5GT/s
> >     LnkCap2: Supported Link Speeds: 2.5-8GT/s
> >     LnkCtl2: Target Link Speed: 2.5GT/s
> > 
> > So per the Link Cap 2 Supported Link Speeds Vector, both devices
> > support up to 8GT/s, but neither one advertises Data Link Layer Link
> > Active Reporting Capable in Link Cap.
> > 
> > The Root Port to the NVIDIA GPU is similar; it advertises 8GT/s
> > support but not LLActRep:
> > 
> >   00:01.0 Root Port to [bus 01]
> >     LnkCap: Speed 8GT/s, LLActRep-
> >     LnkSta: Speed 8GT/s
> >     LnkCap2: Supported Link Speeds: 2.5-8GT/s
> >     LnkCtl2: Target Link Speed: 8GT/s
> > 
> > The fact that all three of these don't match what I expect makes me
> > wonder if I'm reading the spec wrong or lspci is decoding something
> > wrong.
> > 
> > For the Thunderbolt ports we could make the argument (as I think
> > you're suggesting) that the "supported link speed" is really the
> > minimum of the "Link Cap 2 Supported Link Speed" and the "Link Control
> > 2 Target Link Speed".
> > 
> > But even that wouldn't explain why 00:01.0 doesn't advertise LLActRep+
> > when it is actually running at 8GT/s.
> 
> FWIW, I posted a question about this to the PCI-SIG forum.  I don't
> have high hopes because that's a really low-bandwidth channel.

OK, thanks.

> > > > That sounds like a hardware defect that should be worked around with a
> > > > quirk or something.  If we just restructure the code to avoid the
> > > > problem, we're likely to reintroduce it later because there's no hint
> > > > in the code about this problem.
> > > 
> > > That's why I added the comment there to explain this.
> > > 
> > > Can you propose a patch following what you were thinking that solves
> > > this so Kai-Heng can try it out?
> 
> I think your patch actually makes a lot more *sense* than the language
> in the spec does.  For the second rule:
> 
>   With a Downstream Port that supports Link speeds greater than 5.0
>   GT/s, software must wait a minimum of 100 ms after Link training
>   completes before sending a Configuration Request to the device
>   immediately below that Port. Software can determine when Link
>   training completes by polling the Data Link Layer Link Active bit or
>   by setting up an associated interrupt (see Section 6.7.3.3).
> 
> we have to be able to tell when Link training completes, then wait
> 100ms.  For us to tell when training is complete, Data Link Layer Link
> Active must be implemented, and the spec says it should be implemented
> iff Data Link Layer Link Active Reporting Capable bit is set.
> 
> The 6.6.1 language about "greater than 5.0 GT/s" is one step removed.
> What we really care about is Data Link Layer Link Active, not the link
> speed.  It seems like the spec would be much clearer if it said:
> 
>   With a Downstream Port that implements the Data Link Layer Link
>   Active bit, software must wait a minimum of 100 ms after Link training
>   completes ...

Yes, I agree. That would make it more understandable.

Do you want me to make some changes to the patch, or are you fine with it as is?
Bjorn Helgaas May 12, 2020, 5:09 p.m. UTC | #11
On Mon, May 11, 2020 at 01:16:06PM +0300, Mika Westerberg wrote:
> On Fri, May 08, 2020 at 05:58:21PM -0500, Bjorn Helgaas wrote:
> > On Thu, May 07, 2020 at 12:11:27PM -0500, Bjorn Helgaas wrote:
> > > On Thu, May 07, 2020 at 04:46:27PM +0300, Mika Westerberg wrote:
> > > > On Thu, May 07, 2020 at 08:33:27AM -0500, Bjorn Helgaas wrote:
> > > > > On Thu, May 7, 2020 at 7:36 AM Mika Westerberg
> > > > > <mika.westerberg@linux.intel.com> wrote:
> > > > > >
> > > > > > On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote:
> > > > > > > On Thu, May 7, 2020 at 6:45 AM Mika Westerberg
> > > > > > > <mika.westerberg@linux.intel.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote:
> > > > > > > > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote:
> > > > > > > > > > Kai-Heng Feng reported that it takes long time (>1s) to resume
> > > > > > > > > > Thunderbolt connected PCIe devices from both runtime suspend and system
> > > > > > > > > > sleep (s2idle).
> > > > > > > > > >
> > > > > > > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2)
> > > > > > > > > > announces support for speeds > 5 GT/s but it is then capped by the
> > > > > > > > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This
> > > > > > > > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so
> > > > > > > > > > it ended up waiting for 1100 ms as these ports do not support active
> > > > > > > > > > link layer reporting either.
> > > > > > > > > >
> > > > > > > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms
> > > > > > > > > > before sending configuration request to the device below, if the port
> > > > > > > > > > does not support speeds > 5 GT/s, and if it does we first need to wait
> > > > > > > > > > for the data link layer to become active before waiting for that 100 ms.
> > > > > > > > > >
> > > > > > > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports
> > > > > > > > > > that support speeds > 5 GT/s must support active link layer reporting so
> > > > > > > > > > instead of looking for the speed we can check for the active link layer
> > > > > > > > > > reporting capability and determine how to wait based on that (as they go
> > > > > > > > > > hand in hand).
> > > > > > > > >
> > > > > > > > > I can't quite tell what the defect is here.
> > > > > > > > >
> > > > > > > > > I assume you're talking about this text from sec 6.6.1:
> > > > > > > > >
> > > > > > > > >   - With a Downstream Port that does not support Link speeds greater
> > > > > > > > >     than 5.0 GT/s, software must wait a minimum of 100 ms before
> > > > > > > > >     sending a Configuration Request to the device immediately below
> > > > > > > > >     that Port.
> > > > > > > > >
> > > > > > > > >   - With a Downstream Port that supports Link speeds greater than 5.0
> > > > > > > > >     GT/s, software must wait a minimum of 100 ms after Link training
> > > > > > > > >     completes before sending a Configuration Request to the device
> > > > > > > > >     immediately below that Port. Software can determine when Link
> > > > > > > > >     training completes by polling the Data Link Layer Link Active bit
> > > > > > > > >     or by setting up an associated interrupt (see Section 6.7.3.3 ).
> > > > > > > > >
> > > > > > > > > I don't understand what Link Control 2 has to do with this.  The spec
> > > > > > > > > talks about ports *supporting* certain link speeds, which sounds to me
> > > > > > > > > like the Link Capabilities.  It doesn't say anything about the current
> > > > > > > > > or target link speed.
> > > > > > > >
> > > > > > > > PCIe 5.0 page 764 recommends using the Supported Link Speeds Vector in the
> > > > > > > > Link Capabilities 2 register, and that's what pcie_get_speed_cap() is doing.
> > > > > > > >
> > > > > > > > However, we can avoid figuring out the speed altogether by checking
> > > > > > > > dev->link_active_reporting instead, because that needs to be implemented
> > > > > > > > by all Downstream Ports that support speeds > 5 GT/s (PCIe 5.0 page 735).
> > > > > > >
> > > > > > > I understand that part.  But the code as-is matches sec 6.6.1, which
> > > > > > > makes it easy to understand.  Checking link_active_reporting instead
> > > > > > > makes it harder to understand because it makes it one step removed
> > > > > > > from 6.6.1.  And link_active_reporting must be set for ports that
> > > > > > > support > 5 GT/s, but it must *also* be set in some hotplug cases, so
> > > > > > > that further complicates the connection between it and 6.6.1.
> > > > > > >
> > > > > > > And apparently there's an actual defect, and I don't understand what
> > > > > > > that is yet.  It sounds like we agree that pcie_get_speed_cap() is
> > > > > > > doing the right thing.  Is there something in
> > > > > > > pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1?
> > > > > >
> > > > > > The defect is that some downstream PCIe ports on a system Kai-Heng is
> > > > > > using have a Supported Link Speeds Vector advertising > 5 GT/s but do not
> > > > > > support active link layer reporting, so the current code ends up waiting
> > > > > > 1100 ms, slowing down resume time.
> > > 
> > > Ah.  From the lspci and dmesg instrumentation in the bugzilla, I
> > > guess:
> > > 
> > >   04:00.0 Thunderbolt Downstream Port to [bus 05]
> > >     LnkCap: Speed 2.5GT/s, LLActRep-
> > >     LnkSta: Speed 2.5GT/s
> > >     LnkCap2: Supported Link Speeds: 2.5-8GT/s
> > >     LnkCtl2: Target Link Speed: 2.5GT/s
> > >   04:02.0 Thunderbolt Downstream Port to [bus 39]
> > >     LnkCap: Speed 2.5GT/s, LLActRep-
> > >     LnkSta: Speed 2.5GT/s
> > >     LnkCap2: Supported Link Speeds: 2.5-8GT/s
> > >     LnkCtl2: Target Link Speed: 2.5GT/s
> > > 
> > > So per the Link Cap 2 Supported Link Speeds Vector, both devices
> > > support up to 8GT/s, but neither one advertises Data Link Layer Link
> > > Active Reporting Capable in Link Cap.
> > > 
> > > The Root Port to the NVIDIA GPU is similar; it advertises 8GT/s
> > > support but not LLActRep:
> > > 
> > >   00:01.0 Root Port to [bus 01]
> > >     LnkCap: Speed 8GT/s, LLActRep-
> > >     LnkSta: Speed 8GT/s
> > >     LnkCap2: Supported Link Speeds: 2.5-8GT/s
> > >     LnkCtl2: Target Link Speed: 8GT/s
> > > 
> > > The fact that all three of these don't match what I expect makes me
> > > wonder if I'm reading the spec wrong or lspci is decoding something
> > > wrong.
> > > 
> > > For the Thunderbolt ports we could make the argument (as I think
> > > you're suggesting) that the "supported link speed" is really the
> > > minimum of the "Link Cap 2 Supported Link Speed" and the "Link Control
> > > 2 Target Link Speed".
> > > 
> > > But even that wouldn't explain why 00:01.0 doesn't advertise LLActRep+
> > > when it is actually running at 8GT/s.
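> > > 
> > > Just to illustrate that reading (a hypothetical helper, not something in
> > > the patch; it clamps the Link Capabilities 2 speed by the Link Control 2
> > > target speed, using the field names from include/uapi/linux/pci_regs.h):
> > > 
> > >   /* Sketch: effective speed = min(supported speed, target speed) */
> > >   static enum pci_bus_speed pcie_effective_speed_cap(struct pci_dev *dev)
> > >   {
> > >           enum pci_bus_speed cap = pcie_get_speed_cap(dev);
> > >           u16 lnkctl2;
> > > 
> > >           pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
> > >           switch (lnkctl2 & PCI_EXP_LNKCTL2_TLS) {
> > >           case PCI_EXP_LNKCTL2_TLS_2_5GT:
> > >                   /* 2.5 GT/s is the lowest speed, so it always wins */
> > >                   return PCIE_SPEED_2_5GT;
> > >           case PCI_EXP_LNKCTL2_TLS_5_0GT:
> > >                   return cap > PCIE_SPEED_5_0GT ? PCIE_SPEED_5_0GT : cap;
> > >           default:
> > >                   return cap;
> > >           }
> > >   }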
> > 
> > FWIW, I posted a question about this to the PCI-SIG forum.  I don't
> > have high hopes because that's a really low-bandwidth channel.
> 
> OK, thanks.
> 
> > > > > That sounds like a hardware defect that should be worked around with a
> > > > > quirk or something.  If we just restructure the code to avoid the
> > > > > problem, we're likely to reintroduce it later because there's no hint
> > > > > in the code about this problem.
> > > > 
> > > > That's why I added the comment there to explain this.
> > > > 
> > > > Can you propose a patch following what you were thinking that solves
> > > > this so Kai-Heng can try it out?
> > 
> > I think your patch actually makes a lot more *sense* than the language
> > in the spec does.  For the second rule:
> > 
> >   With a Downstream Port that supports Link speeds greater than 5.0
> >   GT/s, software must wait a minimum of 100 ms after Link training
> >   completes before sending a Configuration Request to the device
> >   immediately below that Port. Software can determine when Link
> >   training completes by polling the Data Link Layer Link Active bit or
> >   by setting up an associated interrupt (see Section 6.7.3.3).
> > 
> > we have to be able to tell when Link training completes, then wait
> > 100ms.  For us to tell when training is complete, Data Link Layer Link
> > Active must be implemented, and the spec says it should be implemented
> > iff the Data Link Layer Link Active Reporting Capable bit is set.
> > 
> > The 6.6.1 language about "greater than 5.0 GT/s" is one step removed.
> > What we really care about is Data Link Layer Link Active, not the link
> > speed.  It seems like the spec would be much clearer if it said:
> > 
> >   With a Downstream Port that implements the Data Link Layer Link
> >   Active bit, software must wait a minimum of 100 ms after Link training
> >   completes ...
> 
> Yes, I agree. That would make it more understandable.
> 
> Do you want me to do some changes to the patch or you are fine with it?

I think that code would be more readable as something like:

  if (dev->link_active_reporting)
    wait_for_link_active(dev);
  msleep(delay);

Obviously we still want the printks and error/timeout checking, but
fundamentally we want to wait for the link to be active (if possible),
then wait the 100ms.  That structure isn't as clear as it could be
when the delay is buried inside pcie_wait_for_link_delay().

It's not completely trivial to restructure it like that because of
pcie_wait_for_link() and its callers.  But I think it's worth pushing
on it a little bit to see if it could be done reasonably.
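
Roughly, and only as a sketch (the pcie_wait_for_link_active() helper name
and the 1000 ms cap below are placeholders, not a concrete proposal):

  /* Hypothetical helper: poll Data Link Layer Link Active until set */
  static bool pcie_wait_for_link_active(struct pci_dev *pdev, int timeout_ms)
  {
          u16 lnksta;

          for (;;) {
                  pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
                  if (lnksta & PCI_EXP_LNKSTA_DLLLA)
                          return true;
                  if (timeout_ms <= 0)
                          return false;
                  msleep(10);
                  timeout_ms -= 10;
          }
  }

  /* ... and then in pci_bridge_wait_for_secondary_bus(): */
  if (dev->link_active_reporting) {
          pci_dbg(dev, "waiting for link to become active\n");
          if (!pcie_wait_for_link_active(dev, 1000))
                  pci_info(dev, "link did not become active, proceeding anyway\n");
  }
  pci_dbg(dev, "waiting %d ms before accessing the device\n", delay);
  msleep(delay);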

This does expose the fact that per spec, a hot-plug capable port that
supports only 5 GT/s must support link_active_reporting, but sec 6.6.1
says we *don't* need to wait for link training to complete before
waiting 100ms.  So maybe we end up waiting slightly more than
necessary.  Probably not worth worrying about?

Bjorn
Mika Westerberg May 14, 2020, 12:31 p.m. UTC | #12
On Tue, May 12, 2020 at 12:09:06PM -0500, Bjorn Helgaas wrote:
> I think that code would be more readable as something like:
> 
>   if (dev->link_active_reporting)
>     wait_for_link_active(dev);
>   msleep(delay);
> 
> Obviously we still want the printks and error/timeout checking, but
> fundamentally we want to wait for the link to be active (if possible),
> then wait the 100ms.  That structure isn't as clear as it could be
> when the delay is buried inside pcie_wait_for_link_delay().
> 
> It's not completely trivial to restructure it like that because of
> pcie_wait_for_link() and its callers.  But I think it's worth pushing
> on it a little bit to see if it could be done reasonably.

OK, I see what I can do to make it clearer.

> This does expose the fact that per spec, a hot-plug capable port that
> supports only 5 GT/s must support link_active_reporting, but sec 6.6.1
> says we *don't* need to wait for link training to complete before
> waiting 100ms.  So maybe we end up waiting slightly more than
> necessary.  Probably not worth worrying about?

I would say so. Better wait a bit too much than the opposite IMHO.
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 595fcf59843f..d9d9ff5b968e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4822,7 +4822,13 @@  void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
 	if (!pcie_downstream_port(dev))
 		return;
 
-	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
+	/*
+	 * The PCIe spec mandates that all downstream ports that support
+	 * speeds greater than 5 GT/s must support data link layer active
+	 * reporting, so we use that here to determine when the delay should
+	 * be issued.
+	 */
+	if (!dev->link_active_reporting) {
 		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
 		msleep(delay);
 	} else {
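
For reference, dev->link_active_reporting is cached at enumeration time from
the Data Link Layer Link Active Reporting Capable bit in the Link Capabilities
register; simplified, the probe-time logic amounts to roughly the following
(a sketch only, with the surrounding probe code omitted):

  	u32 lnkcap;

  	/* Cache whether the port reports Data Link Layer Link Active */
  	pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &lnkcap);
  	if (lnkcap & PCI_EXP_LNKCAP_DLLLARC)
  		pdev->link_active_reporting = 1;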