| Message ID | 20200416083245.73957-1-mika.westerberg@linux.intel.com (mailing list archive) |
|---|---|
| State | Superseded, archived |
| Delegated to: | Bjorn Helgaas |
| Series | PCI: Do not use pcie_get_speed_cap() to determine when to start waiting |
On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote:
> Kai-Heng Feng reported that it takes long time (>1s) to resume
> Thunderbolt connected PCIe devices from both runtime suspend and system
> sleep (s2idle).
>
> These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2)
> announces support for speeds > 5 GT/s but it is then capped by the
> second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This
> possiblity was not considered in pci_bridge_wait_for_secondary_bus() so
> it ended up waiting for 1100 ms as these ports do not support active
> link layer reporting either.
>
> PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms
> before sending configuration request to the device below, if the port
> does not support speeds > 5 GT/s, and if it does we first need to wait
> for the data link layer to become active before waiting for that 100 ms.
>
> PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports
> that support speeds > 5 GT/s must support active link layer reporting so
> instead of looking for the speed we can check for the active link layer
> reporting capability and determine how to wait based on that (as they go
> hand in hand).

I can't quite tell what the defect is here.

I assume you're talking about this text from sec 6.6.1:

  - With a Downstream Port that does not support Link speeds greater
    than 5.0 GT/s, software must wait a minimum of 100 ms before
    sending a Configuration Request to the device immediately below
    that Port.

  - With a Downstream Port that supports Link speeds greater than 5.0
    GT/s, software must wait a minimum of 100 ms after Link training
    completes before sending a Configuration Request to the device
    immediately below that Port. Software can determine when Link
    training completes by polling the Data Link Layer Link Active bit
    or by setting up an associated interrupt (see Section 6.7.3.3).

I don't understand what Link Control 2 has to do with this. The spec
talks about ports *supporting* certain link speeds, which sounds to me
like the Link Capabilities. It doesn't say anything about the current
or target link speed.

> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206837
> Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/pci/pci.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 595fcf59843f..d9d9ff5b968e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4822,7 +4822,13 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
>  	if (!pcie_downstream_port(dev))
>  		return;
>
> -	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
> +	/*
> +	 * Since PCIe spec mandates that all downstream ports that support
> +	 * speeds greater than 5 GT/s must support data link layer active
> +	 * reporting so we use that here to determine when the delay should
> +	 * be issued.
> +	 */
> +	if (!dev->link_active_reporting) {
>  		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
>  		msleep(delay);
>  	} else {
> --
> 2.25.1
>
On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote: > Kai-Heng Feng reported that it takes long time (>1s) to resume > Thunderbolt connected PCIe devices from both runtime suspend and system > sleep (s2idle). > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2) > announces support for speeds > 5 GT/s but it is then capped by the > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. > Link: https://bugzilla.kernel.org/show_bug.cgi?id=206837 Slight tangent: The bugzilla mentions that lspci doesn't decode LNKCAP2: https://github.com/pciutils/pciutils/issues/38 Can you try the lspci patch below and see if it matches what you expect? It works for me, but I don't understand the kernel issue yet, so I might be missing something. commit e2bdd75bbaf6 ("lspci: Decode PCIe Link Capabilities 2") Author: Bjorn Helgaas <bhelgaas@google.com> Date: Wed May 6 18:05:55 2020 -0500 lspci: Decode PCIe Link Capabilities 2 Decode Link Capabilities 2, which includes the Supported Link Speeds Vector. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> diff --git a/lib/header.h b/lib/header.h index bfdcc80..3332b32 100644 --- a/lib/header.h +++ b/lib/header.h @@ -901,6 +901,9 @@ #define PCI_EXP_DEV2_OBFF(x) (((x) >> 13) & 3) /* OBFF enabled */ #define PCI_EXP_DEVSTA2 0x2a /* Device Status */ #define PCI_EXP_LNKCAP2 0x2c /* Link Capabilities */ +#define PCI_EXP_LNKCAP2_SPEED(x) (((x) >> 1) & 0x7f) +#define PCI_EXP_LNKCAP2_CROSSLINK 0x00000100 /* Crosslink Supported */ +#define PCI_EXP_LNKCAP2_DRS 0x80000000 /* Device Readiness Status */ #define PCI_EXP_LNKCTL2 0x30 /* Link Control */ #define PCI_EXP_LNKCTL2_SPEED(x) ((x) & 0xf) /* Target Link Speed */ #define PCI_EXP_LNKCTL2_CMPLNC 0x0010 /* Enter Compliance */ diff --git a/ls-caps.c b/ls-caps.c index a6705eb..585b208 100644 --- a/ls-caps.c +++ b/ls-caps.c @@ -1151,6 +1151,29 @@ static void cap_express_dev2(struct device *d, int where, int type) } } +static const char *cap_express_link2_speed_cap(int vector) +{ + /* + * Per PCIe r5.0, sec 8.2.1, a device must support 2.5GT/s and is not + * permitted to skip support for any data rates between 2.5GT/s and the + * highest supported rate. + */ + if (vector & 0x60) + return "RsvdP"; + else if (vector & 0x10) + return "2.5-32GT/s"; + else if (vector & 0x08) + return "2.5-16GT/s"; + else if (vector & 0x04) + return "2.5-8GT/s"; + else if (vector & 0x02) + return "2.5-5GT/s"; + else if (vector & 0x01) + return "2.5GT/s"; + + return "Unknown"; +} + static const char *cap_express_link2_speed(int type) { switch (type) @@ -1204,10 +1227,19 @@ static const char *cap_express_link2_transmargin(int type) static void cap_express_link2(struct device *d, int where, int type) { + u32 l; u16 w; if (!((type == PCI_EXP_TYPE_ENDPOINT || type == PCI_EXP_TYPE_LEG_END) && (d->dev->dev != 0 || d->dev->func != 0))) { + /* Link Capabilities 2 was reserved before PCIe r3.0 */ + l = get_conf_long(d, where + PCI_EXP_LNKCAP2); + if (l) { + printf("\t\tLnkCap2: Supported Link Speeds: %s, Crosslink%c DRS%c\n", + cap_express_link2_speed_cap(PCI_EXP_LNKCAP2_SPEED(l)), + FLAG(l, PCI_EXP_LNKCAP2_CROSSLINK), + FLAG(l, PCI_EXP_LNKCAP2_DRS)); + } w = get_conf_word(d, where + PCI_EXP_LNKCTL2); printf("\t\tLnkCtl2: Target Link Speed: %s, EnterCompliance%c SpeedDis%c", cap_express_link2_speed(PCI_EXP_LNKCTL2_SPEED(w)),
On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote: > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote: > > Kai-Heng Feng reported that it takes long time (>1s) to resume > > Thunderbolt connected PCIe devices from both runtime suspend and system > > sleep (s2idle). > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2) > > announces support for speeds > 5 GT/s but it is then capped by the > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so > > it ended up waiting for 1100 ms as these ports do not support active > > link layer reporting either. > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms > > before sending configuration request to the device below, if the port > > does not support speeds > 5 GT/s, and if it does we first need to wait > > for the data link layer to become active before waiting for that 100 ms. > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports > > that support speeds > 5 GT/s must support active link layer reporting so > > instead of looking for the speed we can check for the active link layer > > reporting capability and determine how to wait based on that (as they go > > hand in hand). > > I can't quite tell what the defect is here. > > I assume you're talking about this text from sec 6.6.1: > > - With a Downstream Port that does not support Link speeds greater > than 5.0 GT/s, software must wait a minimum of 100 ms before > sending a Configuration Request to the device immediately below > that Port. > > - With a Downstream Port that supports Link speeds greater than 5.0 > GT/s, software must wait a minimum of 100 ms after Link training > completes before sending a Configuration Request to the device > immediately below that Port. Software can determine when Link > training completes by polling the Data Link Layer Link Active bit > or by setting up an associated interrupt (see Section 6.7.3.3 ). > > I don't understand what Link Control 2 has to do with this. The spec > talks about ports *supporting* certain link speeds, which sounds to me > like the Link Capabilities. It doesn't say anything about the current > or target link speed. PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link Capabilities 2 register and that's what pcie_get_speed_cap() is doing. However, we can avoid figuring the speed altogether by checking the dev->link_active_reporting instead because that needs to be implemented by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735).
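As background for this exchange, a simplified sketch of the LNKCAP2 check that pcie_get_speed_cap() performs; the real helper also falls back to the Max Link Speed field in LNKCAP when the Supported Link Speeds Vector reads as zero, and the function name here is only for illustration:

static enum pci_bus_speed lnkcap2_speed(struct pci_dev *dev)
{
	u32 lnkcap2;

	/* Supported Link Speeds Vector in Link Capabilities 2 */
	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
	if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_32_0GB)
		return PCIE_SPEED_32_0GT;
	if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_16_0GB)
		return PCIE_SPEED_16_0GT;
	if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_8_0GB)
		return PCIE_SPEED_8_0GT;
	if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_5_0GB)
		return PCIE_SPEED_5_0GT;
	if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_2_5GB)
		return PCIE_SPEED_2_5GT;
	return PCI_SPEED_UNKNOWN;
}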
On Thu, May 7, 2020 at 6:45 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote: > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote: > > > Kai-Heng Feng reported that it takes long time (>1s) to resume > > > Thunderbolt connected PCIe devices from both runtime suspend and system > > > sleep (s2idle). > > > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2) > > > announces support for speeds > 5 GT/s but it is then capped by the > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so > > > it ended up waiting for 1100 ms as these ports do not support active > > > link layer reporting either. > > > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms > > > before sending configuration request to the device below, if the port > > > does not support speeds > 5 GT/s, and if it does we first need to wait > > > for the data link layer to become active before waiting for that 100 ms. > > > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports > > > that support speeds > 5 GT/s must support active link layer reporting so > > > instead of looking for the speed we can check for the active link layer > > > reporting capability and determine how to wait based on that (as they go > > > hand in hand). > > > > I can't quite tell what the defect is here. > > > > I assume you're talking about this text from sec 6.6.1: > > > > - With a Downstream Port that does not support Link speeds greater > > than 5.0 GT/s, software must wait a minimum of 100 ms before > > sending a Configuration Request to the device immediately below > > that Port. > > > > - With a Downstream Port that supports Link speeds greater than 5.0 > > GT/s, software must wait a minimum of 100 ms after Link training > > completes before sending a Configuration Request to the device > > immediately below that Port. Software can determine when Link > > training completes by polling the Data Link Layer Link Active bit > > or by setting up an associated interrupt (see Section 6.7.3.3 ). > > > > I don't understand what Link Control 2 has to do with this. The spec > > talks about ports *supporting* certain link speeds, which sounds to me > > like the Link Capabilities. It doesn't say anything about the current > > or target link speed. > > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link > Capabilities 2 register and that's what pcie_get_speed_cap() is doing. > > However, we can avoid figuring the speed altogether by checking the > dev->link_active_reporting instead because that needs to be implemented > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735). I understand that part. But the code as-is matches sec 6.6.1, which makes it easy to understand. Checking link_active_reporting instead makes it harder to understand because it makes it one step removed from 6.6.1. And link_active_reporting must be set for ports that support > 5 GT/s, but it must *also* be set in some hotplug cases, so that further complicates the connection between it and 6.6.1. And apparently there's an actual defect, and I don't understand what that is yet. It sounds like we agree that pcie_get_speed_cap() is doing the right thing. Is there something in pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1? Bjorn
On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote: > On Thu, May 7, 2020 at 6:45 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote: > > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote: > > > > Kai-Heng Feng reported that it takes long time (>1s) to resume > > > > Thunderbolt connected PCIe devices from both runtime suspend and system > > > > sleep (s2idle). > > > > > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2) > > > > announces support for speeds > 5 GT/s but it is then capped by the > > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This > > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so > > > > it ended up waiting for 1100 ms as these ports do not support active > > > > link layer reporting either. > > > > > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms > > > > before sending configuration request to the device below, if the port > > > > does not support speeds > 5 GT/s, and if it does we first need to wait > > > > for the data link layer to become active before waiting for that 100 ms. > > > > > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports > > > > that support speeds > 5 GT/s must support active link layer reporting so > > > > instead of looking for the speed we can check for the active link layer > > > > reporting capability and determine how to wait based on that (as they go > > > > hand in hand). > > > > > > I can't quite tell what the defect is here. > > > > > > I assume you're talking about this text from sec 6.6.1: > > > > > > - With a Downstream Port that does not support Link speeds greater > > > than 5.0 GT/s, software must wait a minimum of 100 ms before > > > sending a Configuration Request to the device immediately below > > > that Port. > > > > > > - With a Downstream Port that supports Link speeds greater than 5.0 > > > GT/s, software must wait a minimum of 100 ms after Link training > > > completes before sending a Configuration Request to the device > > > immediately below that Port. Software can determine when Link > > > training completes by polling the Data Link Layer Link Active bit > > > or by setting up an associated interrupt (see Section 6.7.3.3 ). > > > > > > I don't understand what Link Control 2 has to do with this. The spec > > > talks about ports *supporting* certain link speeds, which sounds to me > > > like the Link Capabilities. It doesn't say anything about the current > > > or target link speed. > > > > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link > > Capabilities 2 register and that's what pcie_get_speed_cap() is doing. > > > > However, we can avoid figuring the speed altogether by checking the > > dev->link_active_reporting instead because that needs to be implemented > > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735). > > I understand that part. But the code as-is matches sec 6.6.1, which > makes it easy to understand. Checking link_active_reporting instead > makes it harder to understand because it makes it one step removed > from 6.6.1. And link_active_reporting must be set for ports that > support > 5 GT/s, but it must *also* be set in some hotplug cases, so > that further complicates the connection between it and 6.6.1. > > And apparently there's an actual defect, and I don't understand what > that is yet. 
> It sounds like we agree that pcie_get_speed_cap() is
> doing the right thing. Is there something in
> pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1?

The defect is that some downstream PCIe ports on the system Kai-Heng is
using have a Supported Link Speeds Vector advertising speeds > 5 GT/s but
do not support active link reporting, so the current code ends up waiting
1100 ms, slowing down resume time.

The Target Link Speed of these ports (in Link Control 2) is capped to
2.5 GT/s.
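To make the cap concrete, a minimal sketch (not part of the patch; the helper name is made up) of reading the Target Link Speed field Mika refers to, using the standard pci_regs.h definitions:

static void report_target_speed_cap(struct pci_dev *dev)
{
	u16 lnkctl2;

	/* Target Link Speed lives in the low bits of Link Control 2 */
	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
	if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT)
		pci_info(dev, "target link speed capped to 2.5 GT/s\n");
}

On the ports in question this reports the cap even though LNKCAP2 advertises speeds up to 8 GT/s.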
On Thu, May 7, 2020 at 7:36 AM Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote: > > On Thu, May 7, 2020 at 6:45 AM Mika Westerberg > > <mika.westerberg@linux.intel.com> wrote: > > > > > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote: > > > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote: > > > > > Kai-Heng Feng reported that it takes long time (>1s) to resume > > > > > Thunderbolt connected PCIe devices from both runtime suspend and system > > > > > sleep (s2idle). > > > > > > > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2) > > > > > announces support for speeds > 5 GT/s but it is then capped by the > > > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This > > > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so > > > > > it ended up waiting for 1100 ms as these ports do not support active > > > > > link layer reporting either. > > > > > > > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms > > > > > before sending configuration request to the device below, if the port > > > > > does not support speeds > 5 GT/s, and if it does we first need to wait > > > > > for the data link layer to become active before waiting for that 100 ms. > > > > > > > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports > > > > > that support speeds > 5 GT/s must support active link layer reporting so > > > > > instead of looking for the speed we can check for the active link layer > > > > > reporting capability and determine how to wait based on that (as they go > > > > > hand in hand). > > > > > > > > I can't quite tell what the defect is here. > > > > > > > > I assume you're talking about this text from sec 6.6.1: > > > > > > > > - With a Downstream Port that does not support Link speeds greater > > > > than 5.0 GT/s, software must wait a minimum of 100 ms before > > > > sending a Configuration Request to the device immediately below > > > > that Port. > > > > > > > > - With a Downstream Port that supports Link speeds greater than 5.0 > > > > GT/s, software must wait a minimum of 100 ms after Link training > > > > completes before sending a Configuration Request to the device > > > > immediately below that Port. Software can determine when Link > > > > training completes by polling the Data Link Layer Link Active bit > > > > or by setting up an associated interrupt (see Section 6.7.3.3 ). > > > > > > > > I don't understand what Link Control 2 has to do with this. The spec > > > > talks about ports *supporting* certain link speeds, which sounds to me > > > > like the Link Capabilities. It doesn't say anything about the current > > > > or target link speed. > > > > > > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link > > > Capabilities 2 register and that's what pcie_get_speed_cap() is doing. > > > > > > However, we can avoid figuring the speed altogether by checking the > > > dev->link_active_reporting instead because that needs to be implemented > > > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735). > > > > I understand that part. But the code as-is matches sec 6.6.1, which > > makes it easy to understand. Checking link_active_reporting instead > > makes it harder to understand because it makes it one step removed > > from 6.6.1. 
And link_active_reporting must be set for ports that > > support > 5 GT/s, but it must *also* be set in some hotplug cases, so > > that further complicates the connection between it and 6.6.1. > > > > And apparently there's an actual defect, and I don't understand what > > that is yet. It sounds like we agree that pcie_get_speed_cap() is > > doing the right thing. Is there something in > > pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1? > > The defect is that some downstream PCIe ports on a system Kai-Heng is > using have Supported Link Speeds Vector with > 5GT/s and it does not > support active link reporting so the currect code ends up waiting 1100 ms > slowing down resume time. That sounds like a hardware defect that should be worked around with a quirk or something. If we just restructure the code to avoid the problem, we're likely to reintroduce it later because there's no hint in the code about this problem. > The Target Link Speed of this port (in Link Control 2) has the speed > capped to 2.5 GT/s. I saw you mentioned this before, but I don't understand yet why it's relevant. The spec says "supported,", not "current" or "target".
On Thu, May 07, 2020 at 08:33:27AM -0500, Bjorn Helgaas wrote: > On Thu, May 7, 2020 at 7:36 AM Mika Westerberg > <mika.westerberg@linux.intel.com> wrote: > > > > On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote: > > > On Thu, May 7, 2020 at 6:45 AM Mika Westerberg > > > <mika.westerberg@linux.intel.com> wrote: > > > > > > > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote: > > > > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote: > > > > > > Kai-Heng Feng reported that it takes long time (>1s) to resume > > > > > > Thunderbolt connected PCIe devices from both runtime suspend and system > > > > > > sleep (s2idle). > > > > > > > > > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2) > > > > > > announces support for speeds > 5 GT/s but it is then capped by the > > > > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This > > > > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so > > > > > > it ended up waiting for 1100 ms as these ports do not support active > > > > > > link layer reporting either. > > > > > > > > > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms > > > > > > before sending configuration request to the device below, if the port > > > > > > does not support speeds > 5 GT/s, and if it does we first need to wait > > > > > > for the data link layer to become active before waiting for that 100 ms. > > > > > > > > > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports > > > > > > that support speeds > 5 GT/s must support active link layer reporting so > > > > > > instead of looking for the speed we can check for the active link layer > > > > > > reporting capability and determine how to wait based on that (as they go > > > > > > hand in hand). > > > > > > > > > > I can't quite tell what the defect is here. > > > > > > > > > > I assume you're talking about this text from sec 6.6.1: > > > > > > > > > > - With a Downstream Port that does not support Link speeds greater > > > > > than 5.0 GT/s, software must wait a minimum of 100 ms before > > > > > sending a Configuration Request to the device immediately below > > > > > that Port. > > > > > > > > > > - With a Downstream Port that supports Link speeds greater than 5.0 > > > > > GT/s, software must wait a minimum of 100 ms after Link training > > > > > completes before sending a Configuration Request to the device > > > > > immediately below that Port. Software can determine when Link > > > > > training completes by polling the Data Link Layer Link Active bit > > > > > or by setting up an associated interrupt (see Section 6.7.3.3 ). > > > > > > > > > > I don't understand what Link Control 2 has to do with this. The spec > > > > > talks about ports *supporting* certain link speeds, which sounds to me > > > > > like the Link Capabilities. It doesn't say anything about the current > > > > > or target link speed. > > > > > > > > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link > > > > Capabilities 2 register and that's what pcie_get_speed_cap() is doing. > > > > > > > > However, we can avoid figuring the speed altogether by checking the > > > > dev->link_active_reporting instead because that needs to be implemented > > > > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735). > > > > > > I understand that part. But the code as-is matches sec 6.6.1, which > > > makes it easy to understand. 
Checking link_active_reporting instead > > > makes it harder to understand because it makes it one step removed > > > from 6.6.1. And link_active_reporting must be set for ports that > > > support > 5 GT/s, but it must *also* be set in some hotplug cases, so > > > that further complicates the connection between it and 6.6.1. > > > > > > And apparently there's an actual defect, and I don't understand what > > > that is yet. It sounds like we agree that pcie_get_speed_cap() is > > > doing the right thing. Is there something in > > > pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1? > > > > The defect is that some downstream PCIe ports on a system Kai-Heng is > > using have Supported Link Speeds Vector with > 5GT/s and it does not > > support active link reporting so the currect code ends up waiting 1100 ms > > slowing down resume time. > > That sounds like a hardware defect that should be worked around with a > quirk or something. If we just restructure the code to avoid the > problem, we're likely to reintroduce it later because there's no hint > in the code about this problem. That's why I added the comment there to explain this. Can you propose a patch following what you were thinking that solves this so Kai-Heng can try it out? > > The Target Link Speed of this port (in Link Control 2) has the speed > > capped to 2.5 GT/s. > > I saw you mentioned this before, but I don't understand yet why it's > relevant. The spec says "supported,", not "current" or "target". Because my interpretation of this part of the spec is that supported speed is supported up to what is set in target speed. In other words you can have 8GT/s there in the supported vector and the hardware/firmware can force the target to be lower therefore the link speed is always capped by that. I'm not talking about "current" here. Only "supported" and "target".
On Thu, May 07, 2020 at 04:46:27PM +0300, Mika Westerberg wrote: > On Thu, May 07, 2020 at 08:33:27AM -0500, Bjorn Helgaas wrote: > > On Thu, May 7, 2020 at 7:36 AM Mika Westerberg > > <mika.westerberg@linux.intel.com> wrote: > > > > > > On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote: > > > > On Thu, May 7, 2020 at 6:45 AM Mika Westerberg > > > > <mika.westerberg@linux.intel.com> wrote: > > > > > > > > > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote: > > > > > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote: > > > > > > > Kai-Heng Feng reported that it takes long time (>1s) to resume > > > > > > > Thunderbolt connected PCIe devices from both runtime suspend and system > > > > > > > sleep (s2idle). > > > > > > > > > > > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2) > > > > > > > announces support for speeds > 5 GT/s but it is then capped by the > > > > > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This > > > > > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so > > > > > > > it ended up waiting for 1100 ms as these ports do not support active > > > > > > > link layer reporting either. > > > > > > > > > > > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms > > > > > > > before sending configuration request to the device below, if the port > > > > > > > does not support speeds > 5 GT/s, and if it does we first need to wait > > > > > > > for the data link layer to become active before waiting for that 100 ms. > > > > > > > > > > > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports > > > > > > > that support speeds > 5 GT/s must support active link layer reporting so > > > > > > > instead of looking for the speed we can check for the active link layer > > > > > > > reporting capability and determine how to wait based on that (as they go > > > > > > > hand in hand). > > > > > > > > > > > > I can't quite tell what the defect is here. > > > > > > > > > > > > I assume you're talking about this text from sec 6.6.1: > > > > > > > > > > > > - With a Downstream Port that does not support Link speeds greater > > > > > > than 5.0 GT/s, software must wait a minimum of 100 ms before > > > > > > sending a Configuration Request to the device immediately below > > > > > > that Port. > > > > > > > > > > > > - With a Downstream Port that supports Link speeds greater than 5.0 > > > > > > GT/s, software must wait a minimum of 100 ms after Link training > > > > > > completes before sending a Configuration Request to the device > > > > > > immediately below that Port. Software can determine when Link > > > > > > training completes by polling the Data Link Layer Link Active bit > > > > > > or by setting up an associated interrupt (see Section 6.7.3.3 ). > > > > > > > > > > > > I don't understand what Link Control 2 has to do with this. The spec > > > > > > talks about ports *supporting* certain link speeds, which sounds to me > > > > > > like the Link Capabilities. It doesn't say anything about the current > > > > > > or target link speed. > > > > > > > > > > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link > > > > > Capabilities 2 register and that's what pcie_get_speed_cap() is doing. 
> > > > > > > > > > However, we can avoid figuring the speed altogether by checking the > > > > > dev->link_active_reporting instead because that needs to be implemented > > > > > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735). > > > > > > > > I understand that part. But the code as-is matches sec 6.6.1, which > > > > makes it easy to understand. Checking link_active_reporting instead > > > > makes it harder to understand because it makes it one step removed > > > > from 6.6.1. And link_active_reporting must be set for ports that > > > > support > 5 GT/s, but it must *also* be set in some hotplug cases, so > > > > that further complicates the connection between it and 6.6.1. > > > > > > > > And apparently there's an actual defect, and I don't understand what > > > > that is yet. It sounds like we agree that pcie_get_speed_cap() is > > > > doing the right thing. Is there something in > > > > pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1? > > > > > > The defect is that some downstream PCIe ports on a system Kai-Heng is > > > using have Supported Link Speeds Vector with > 5GT/s and it does not > > > support active link reporting so the currect code ends up waiting 1100 ms > > > slowing down resume time. Ah. From the lspci and dmesg instrumentation in the bugzilla, I guess: 04:00.0 Thunderbolt Downstream Port to [bus 05] LnkCap: Speed 2.5GT/s, LLActRep- LnkSta: Speed 2.5GT/s LnkCap2: Supported Link Speeds: 2.5-8GT/s LnkCtl2: Target Link Speed: 2.5GT/s 04:02.0 Thunderbolt Downstream Port to [bus 39] LnkCap: Speed 2.5GT/s, LLActRep- LnkSta: Speed 2.5GT/s LnkCap2: Supported Link Speeds: 2.5-8GT/s LnkCtl2: Target Link Speed: 2.5GT/s So per the Link Cap 2 Supported Link Speeds Vector, both devices support up to 8GT/s, but neither one advertises Data Link Layer Link Active Reporting Capable in Link Cap. The Root Port to the NVIDIA GPU is similar; it advertises 8GT/s support but not LLActRep: 00:01.0 Root Port to [bus 01] LnkCap: Speed 8GT/s, LLActRep- LnkSta: Speed 8GT/s LnkCap2: Supported Link Speeds: 2.5-8GT/s LnkCtl2: Target Link Speed: 8GT/s The fact that all three of these don't match what I expect makes me wonder if I'm reading the spec wrong or lspci is decoding something wrong. For the Thunderbolt ports we could make the argument (as I think you're suggesting) that the "supported link speed" is really the minimum of the "Link Cap 2 Supported Link Speed" and the "Link Control 2 Target Link Speed". But even that wouldn't explain why 00:01.0 doesn't advertise LLActRep+ when it is actually running at 8GT/s. > > That sounds like a hardware defect that should be worked around with a > > quirk or something. If we just restructure the code to avoid the > > problem, we're likely to reintroduce it later because there's no hint > > in the code about this problem. > > That's why I added the comment there to explain this. > > Can you propose a patch following what you were thinking that solves > this so Kai-Heng can try it out? > > > > The Target Link Speed of this port (in Link Control 2) has the speed > > > capped to 2.5 GT/s. > > > > I saw you mentioned this before, but I don't understand yet why it's > > relevant. The spec says "supported,", not "current" or "target". > > Because my interpretation of this part of the spec is that supported > speed is supported up to what is set in target speed. 
In other words you > can have 8GT/s there in the supported vector and the hardware/firmware > can force the target to be lower therefore the link speed is always > capped by that. > > I'm not talking about "current" here. Only "supported" and "target". Kai-Heng, would you mind attaching the "sudo lspci -vvxxxx" output to the bugzilla? I can use that to test my lspci patch.
On Thu, May 07, 2020 at 12:11:27PM -0500, Bjorn Helgaas wrote: > On Thu, May 07, 2020 at 04:46:27PM +0300, Mika Westerberg wrote: > > On Thu, May 07, 2020 at 08:33:27AM -0500, Bjorn Helgaas wrote: > > > On Thu, May 7, 2020 at 7:36 AM Mika Westerberg > > > <mika.westerberg@linux.intel.com> wrote: > > > > > > > > On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote: > > > > > On Thu, May 7, 2020 at 6:45 AM Mika Westerberg > > > > > <mika.westerberg@linux.intel.com> wrote: > > > > > > > > > > > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote: > > > > > > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote: > > > > > > > > Kai-Heng Feng reported that it takes long time (>1s) to resume > > > > > > > > Thunderbolt connected PCIe devices from both runtime suspend and system > > > > > > > > sleep (s2idle). > > > > > > > > > > > > > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2) > > > > > > > > announces support for speeds > 5 GT/s but it is then capped by the > > > > > > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This > > > > > > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so > > > > > > > > it ended up waiting for 1100 ms as these ports do not support active > > > > > > > > link layer reporting either. > > > > > > > > > > > > > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms > > > > > > > > before sending configuration request to the device below, if the port > > > > > > > > does not support speeds > 5 GT/s, and if it does we first need to wait > > > > > > > > for the data link layer to become active before waiting for that 100 ms. > > > > > > > > > > > > > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports > > > > > > > > that support speeds > 5 GT/s must support active link layer reporting so > > > > > > > > instead of looking for the speed we can check for the active link layer > > > > > > > > reporting capability and determine how to wait based on that (as they go > > > > > > > > hand in hand). > > > > > > > > > > > > > > I can't quite tell what the defect is here. > > > > > > > > > > > > > > I assume you're talking about this text from sec 6.6.1: > > > > > > > > > > > > > > - With a Downstream Port that does not support Link speeds greater > > > > > > > than 5.0 GT/s, software must wait a minimum of 100 ms before > > > > > > > sending a Configuration Request to the device immediately below > > > > > > > that Port. > > > > > > > > > > > > > > - With a Downstream Port that supports Link speeds greater than 5.0 > > > > > > > GT/s, software must wait a minimum of 100 ms after Link training > > > > > > > completes before sending a Configuration Request to the device > > > > > > > immediately below that Port. Software can determine when Link > > > > > > > training completes by polling the Data Link Layer Link Active bit > > > > > > > or by setting up an associated interrupt (see Section 6.7.3.3 ). > > > > > > > > > > > > > > I don't understand what Link Control 2 has to do with this. The spec > > > > > > > talks about ports *supporting* certain link speeds, which sounds to me > > > > > > > like the Link Capabilities. It doesn't say anything about the current > > > > > > > or target link speed. > > > > > > > > > > > > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link > > > > > > Capabilities 2 register and that's what pcie_get_speed_cap() is doing. 
> > > > > > > > > > > > However, we can avoid figuring the speed altogether by checking the > > > > > > dev->link_active_reporting instead because that needs to be implemented > > > > > > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735). > > > > > > > > > > I understand that part. But the code as-is matches sec 6.6.1, which > > > > > makes it easy to understand. Checking link_active_reporting instead > > > > > makes it harder to understand because it makes it one step removed > > > > > from 6.6.1. And link_active_reporting must be set for ports that > > > > > support > 5 GT/s, but it must *also* be set in some hotplug cases, so > > > > > that further complicates the connection between it and 6.6.1. > > > > > > > > > > And apparently there's an actual defect, and I don't understand what > > > > > that is yet. It sounds like we agree that pcie_get_speed_cap() is > > > > > doing the right thing. Is there something in > > > > > pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1? > > > > > > > > The defect is that some downstream PCIe ports on a system Kai-Heng is > > > > using have Supported Link Speeds Vector with > 5GT/s and it does not > > > > support active link reporting so the currect code ends up waiting 1100 ms > > > > slowing down resume time. > > Ah. From the lspci and dmesg instrumentation in the bugzilla, I > guess: > > 04:00.0 Thunderbolt Downstream Port to [bus 05] > LnkCap: Speed 2.5GT/s, LLActRep- > LnkSta: Speed 2.5GT/s > LnkCap2: Supported Link Speeds: 2.5-8GT/s > LnkCtl2: Target Link Speed: 2.5GT/s > 04:02.0 Thunderbolt Downstream Port to [bus 39] > LnkCap: Speed 2.5GT/s, LLActRep- > LnkSta: Speed 2.5GT/s > LnkCap2: Supported Link Speeds: 2.5-8GT/s > LnkCtl2: Target Link Speed: 2.5GT/s > > So per the Link Cap 2 Supported Link Speeds Vector, both devices > support up to 8GT/s, but neither one advertises Data Link Layer Link > Active Reporting Capable in Link Cap. > > The Root Port to the NVIDIA GPU is similar; it advertises 8GT/s > support but not LLActRep: > > 00:01.0 Root Port to [bus 01] > LnkCap: Speed 8GT/s, LLActRep- > LnkSta: Speed 8GT/s > LnkCap2: Supported Link Speeds: 2.5-8GT/s > LnkCtl2: Target Link Speed: 8GT/s > > The fact that all three of these don't match what I expect makes me > wonder if I'm reading the spec wrong or lspci is decoding something > wrong. > > For the Thunderbolt ports we could make the argument (as I think > you're suggesting) that the "supported link speed" is really the > minimum of the "Link Cap 2 Supported Link Speed" and the "Link Control > 2 Target Link Speed". > > But even that wouldn't explain why 00:01.0 doesn't advertise LLActRep+ > when it is actually running at 8GT/s. FWIW, I posted a question about this to the PCI-SIG forum. I don't have high hopes because that's a really low-bandwidth channel. > > > That sounds like a hardware defect that should be worked around with a > > > quirk or something. If we just restructure the code to avoid the > > > problem, we're likely to reintroduce it later because there's no hint > > > in the code about this problem. > > > > That's why I added the comment there to explain this. > > > > Can you propose a patch following what you were thinking that solves > > this so Kai-Heng can try it out? I think your patch actually makes a lot more *sense* than the language in the spec does. 
For the second rule:

  With a Downstream Port that supports Link speeds greater than 5.0
  GT/s, software must wait a minimum of 100 ms after Link training
  completes before sending a Configuration Request to the device
  immediately below that Port. Software can determine when Link
  training completes by polling the Data Link Layer Link Active bit or
  by setting up an associated interrupt (see Section 6.7.3.3).

we have to be able to tell when Link training completes, then wait
100ms. For us to tell when training is complete, Data Link Layer Link
Active must be implemented, and the spec says it should be implemented
iff Data Link Layer Link Active Reporting Capable bit is set.

The 6.6.1 language about "greater than 5.0 GT/s" is one step removed.
What we really care about is Data Link Layer Link Active, not the link
speed. It seems like the spec would be much clearer if it said:

  With a Downstream Port that implements the Data Link Layer Link
  Active bit, software must wait a minimum of 100 ms after Link
  training completes ...

Bjorn
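For reference, the polling this rule implies can be sketched as below; the timeout and helper name are illustrative, and in the kernel this corresponds roughly to what pcie_wait_for_link() does:

static bool wait_for_link_active_sketch(struct pci_dev *dev)
{
	unsigned long timeout = jiffies + msecs_to_jiffies(1000);
	u16 lnksta;

	do {
		/* Data Link Layer Link Active bit in Link Status */
		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
		if (lnksta & PCI_EXP_LNKSTA_DLLLA)
			return true;
		msleep(10);
	} while (time_before(jiffies, timeout));

	return false;
}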
On Fri, May 08, 2020 at 05:58:21PM -0500, Bjorn Helgaas wrote: > On Thu, May 07, 2020 at 12:11:27PM -0500, Bjorn Helgaas wrote: > > On Thu, May 07, 2020 at 04:46:27PM +0300, Mika Westerberg wrote: > > > On Thu, May 07, 2020 at 08:33:27AM -0500, Bjorn Helgaas wrote: > > > > On Thu, May 7, 2020 at 7:36 AM Mika Westerberg > > > > <mika.westerberg@linux.intel.com> wrote: > > > > > > > > > > On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote: > > > > > > On Thu, May 7, 2020 at 6:45 AM Mika Westerberg > > > > > > <mika.westerberg@linux.intel.com> wrote: > > > > > > > > > > > > > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote: > > > > > > > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote: > > > > > > > > > Kai-Heng Feng reported that it takes long time (>1s) to resume > > > > > > > > > Thunderbolt connected PCIe devices from both runtime suspend and system > > > > > > > > > sleep (s2idle). > > > > > > > > > > > > > > > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2) > > > > > > > > > announces support for speeds > 5 GT/s but it is then capped by the > > > > > > > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This > > > > > > > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so > > > > > > > > > it ended up waiting for 1100 ms as these ports do not support active > > > > > > > > > link layer reporting either. > > > > > > > > > > > > > > > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms > > > > > > > > > before sending configuration request to the device below, if the port > > > > > > > > > does not support speeds > 5 GT/s, and if it does we first need to wait > > > > > > > > > for the data link layer to become active before waiting for that 100 ms. > > > > > > > > > > > > > > > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports > > > > > > > > > that support speeds > 5 GT/s must support active link layer reporting so > > > > > > > > > instead of looking for the speed we can check for the active link layer > > > > > > > > > reporting capability and determine how to wait based on that (as they go > > > > > > > > > hand in hand). > > > > > > > > > > > > > > > > I can't quite tell what the defect is here. > > > > > > > > > > > > > > > > I assume you're talking about this text from sec 6.6.1: > > > > > > > > > > > > > > > > - With a Downstream Port that does not support Link speeds greater > > > > > > > > than 5.0 GT/s, software must wait a minimum of 100 ms before > > > > > > > > sending a Configuration Request to the device immediately below > > > > > > > > that Port. > > > > > > > > > > > > > > > > - With a Downstream Port that supports Link speeds greater than 5.0 > > > > > > > > GT/s, software must wait a minimum of 100 ms after Link training > > > > > > > > completes before sending a Configuration Request to the device > > > > > > > > immediately below that Port. Software can determine when Link > > > > > > > > training completes by polling the Data Link Layer Link Active bit > > > > > > > > or by setting up an associated interrupt (see Section 6.7.3.3 ). > > > > > > > > > > > > > > > > I don't understand what Link Control 2 has to do with this. The spec > > > > > > > > talks about ports *supporting* certain link speeds, which sounds to me > > > > > > > > like the Link Capabilities. It doesn't say anything about the current > > > > > > > > or target link speed. 
> > > > > > > > > > > > > > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link > > > > > > > Capabilities 2 register and that's what pcie_get_speed_cap() is doing. > > > > > > > > > > > > > > However, we can avoid figuring the speed altogether by checking the > > > > > > > dev->link_active_reporting instead because that needs to be implemented > > > > > > > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735). > > > > > > > > > > > > I understand that part. But the code as-is matches sec 6.6.1, which > > > > > > makes it easy to understand. Checking link_active_reporting instead > > > > > > makes it harder to understand because it makes it one step removed > > > > > > from 6.6.1. And link_active_reporting must be set for ports that > > > > > > support > 5 GT/s, but it must *also* be set in some hotplug cases, so > > > > > > that further complicates the connection between it and 6.6.1. > > > > > > > > > > > > And apparently there's an actual defect, and I don't understand what > > > > > > that is yet. It sounds like we agree that pcie_get_speed_cap() is > > > > > > doing the right thing. Is there something in > > > > > > pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1? > > > > > > > > > > The defect is that some downstream PCIe ports on a system Kai-Heng is > > > > > using have Supported Link Speeds Vector with > 5GT/s and it does not > > > > > support active link reporting so the currect code ends up waiting 1100 ms > > > > > slowing down resume time. > > > > Ah. From the lspci and dmesg instrumentation in the bugzilla, I > > guess: > > > > 04:00.0 Thunderbolt Downstream Port to [bus 05] > > LnkCap: Speed 2.5GT/s, LLActRep- > > LnkSta: Speed 2.5GT/s > > LnkCap2: Supported Link Speeds: 2.5-8GT/s > > LnkCtl2: Target Link Speed: 2.5GT/s > > 04:02.0 Thunderbolt Downstream Port to [bus 39] > > LnkCap: Speed 2.5GT/s, LLActRep- > > LnkSta: Speed 2.5GT/s > > LnkCap2: Supported Link Speeds: 2.5-8GT/s > > LnkCtl2: Target Link Speed: 2.5GT/s > > > > So per the Link Cap 2 Supported Link Speeds Vector, both devices > > support up to 8GT/s, but neither one advertises Data Link Layer Link > > Active Reporting Capable in Link Cap. > > > > The Root Port to the NVIDIA GPU is similar; it advertises 8GT/s > > support but not LLActRep: > > > > 00:01.0 Root Port to [bus 01] > > LnkCap: Speed 8GT/s, LLActRep- > > LnkSta: Speed 8GT/s > > LnkCap2: Supported Link Speeds: 2.5-8GT/s > > LnkCtl2: Target Link Speed: 8GT/s > > > > The fact that all three of these don't match what I expect makes me > > wonder if I'm reading the spec wrong or lspci is decoding something > > wrong. > > > > For the Thunderbolt ports we could make the argument (as I think > > you're suggesting) that the "supported link speed" is really the > > minimum of the "Link Cap 2 Supported Link Speed" and the "Link Control > > 2 Target Link Speed". > > > > But even that wouldn't explain why 00:01.0 doesn't advertise LLActRep+ > > when it is actually running at 8GT/s. > > FWIW, I posted a question about this to the PCI-SIG forum. I don't > have high hopes because that's a really low-bandwidth channel. OK, thanks. > > > > That sounds like a hardware defect that should be worked around with a > > > > quirk or something. If we just restructure the code to avoid the > > > > problem, we're likely to reintroduce it later because there's no hint > > > > in the code about this problem. > > > > > > That's why I added the comment there to explain this. 
> > > > > > Can you propose a patch following what you were thinking that solves > > > this so Kai-Heng can try it out? > > I think your patch actually makes a lot more *sense* than the language > in the spec does. For the second rule: > > With a Downstream Port that supports Link speeds greater than 5.0 > GT/s, software must wait a minimum of 100 ms after Link training > completes before sending a Configuration Request to the device > immediately below that Port. Software can determine when Link > training completes by polling the Data Link Layer Link Active bit or > by setting up an associated interrupt (see Section 6.7.3.3). > > we have to be able to tell when Link training completes, then wait > 100ms. For us to tell when training is complete, Data Link Layer Link > Active must be implemented, and the spec says it should be implemented > iff Data Link Layer Link Active Reporting Capable bit is set. > > The 6.6.1 language about "greater than 5.0 GT/s" is one step removed. > What we really care about is Data Link Layer Link Active, not the link > speed. It seems like the spec would be much clearer if it said: > > With a Downstream Port that implements the Data Link Layer Link > Active bit, software must wait a minimum of 100 ms after Link training > completes ... Yes, I agree. That would make it more understandable. Do you want me to do some changes to the patch or you are fine with it?
On Mon, May 11, 2020 at 01:16:06PM +0300, Mika Westerberg wrote: > On Fri, May 08, 2020 at 05:58:21PM -0500, Bjorn Helgaas wrote: > > On Thu, May 07, 2020 at 12:11:27PM -0500, Bjorn Helgaas wrote: > > > On Thu, May 07, 2020 at 04:46:27PM +0300, Mika Westerberg wrote: > > > > On Thu, May 07, 2020 at 08:33:27AM -0500, Bjorn Helgaas wrote: > > > > > On Thu, May 7, 2020 at 7:36 AM Mika Westerberg > > > > > <mika.westerberg@linux.intel.com> wrote: > > > > > > > > > > > > On Thu, May 07, 2020 at 07:24:37AM -0500, Bjorn Helgaas wrote: > > > > > > > On Thu, May 7, 2020 at 6:45 AM Mika Westerberg > > > > > > > <mika.westerberg@linux.intel.com> wrote: > > > > > > > > > > > > > > > > On Wed, May 06, 2020 at 05:42:28PM -0500, Bjorn Helgaas wrote: > > > > > > > > > On Thu, Apr 16, 2020 at 11:32:45AM +0300, Mika Westerberg wrote: > > > > > > > > > > Kai-Heng Feng reported that it takes long time (>1s) to resume > > > > > > > > > > Thunderbolt connected PCIe devices from both runtime suspend and system > > > > > > > > > > sleep (s2idle). > > > > > > > > > > > > > > > > > > > > These PCIe downstream ports the second link capability (PCI_EXP_LNKCAP2) > > > > > > > > > > announces support for speeds > 5 GT/s but it is then capped by the > > > > > > > > > > second link control (PCI_EXP_LNKCTL2) register to 2.5 GT/s. This > > > > > > > > > > possiblity was not considered in pci_bridge_wait_for_secondary_bus() so > > > > > > > > > > it ended up waiting for 1100 ms as these ports do not support active > > > > > > > > > > link layer reporting either. > > > > > > > > > > > > > > > > > > > > PCIe spec 5.0 section 6.6.1 mandates that we must wait minimum of 100 ms > > > > > > > > > > before sending configuration request to the device below, if the port > > > > > > > > > > does not support speeds > 5 GT/s, and if it does we first need to wait > > > > > > > > > > for the data link layer to become active before waiting for that 100 ms. > > > > > > > > > > > > > > > > > > > > PCIe spec 5.0 section 7.5.3.6 further says that all downstream ports > > > > > > > > > > that support speeds > 5 GT/s must support active link layer reporting so > > > > > > > > > > instead of looking for the speed we can check for the active link layer > > > > > > > > > > reporting capability and determine how to wait based on that (as they go > > > > > > > > > > hand in hand). > > > > > > > > > > > > > > > > > > I can't quite tell what the defect is here. > > > > > > > > > > > > > > > > > > I assume you're talking about this text from sec 6.6.1: > > > > > > > > > > > > > > > > > > - With a Downstream Port that does not support Link speeds greater > > > > > > > > > than 5.0 GT/s, software must wait a minimum of 100 ms before > > > > > > > > > sending a Configuration Request to the device immediately below > > > > > > > > > that Port. > > > > > > > > > > > > > > > > > > - With a Downstream Port that supports Link speeds greater than 5.0 > > > > > > > > > GT/s, software must wait a minimum of 100 ms after Link training > > > > > > > > > completes before sending a Configuration Request to the device > > > > > > > > > immediately below that Port. Software can determine when Link > > > > > > > > > training completes by polling the Data Link Layer Link Active bit > > > > > > > > > or by setting up an associated interrupt (see Section 6.7.3.3 ). > > > > > > > > > > > > > > > > > > I don't understand what Link Control 2 has to do with this. 
The spec > > > > > > > > > talks about ports *supporting* certain link speeds, which sounds to me > > > > > > > > > like the Link Capabilities. It doesn't say anything about the current > > > > > > > > > or target link speed. > > > > > > > > > > > > > > > > PCIe 5.0 page 764 recommends using Supported Link Speeds Vector in Link > > > > > > > > Capabilities 2 register and that's what pcie_get_speed_cap() is doing. > > > > > > > > > > > > > > > > However, we can avoid figuring the speed altogether by checking the > > > > > > > > dev->link_active_reporting instead because that needs to be implemented > > > > > > > > by all Downstream Ports that supports speeds > 5 GT/s (PCIe 5.0 page 735). > > > > > > > > > > > > > > I understand that part. But the code as-is matches sec 6.6.1, which > > > > > > > makes it easy to understand. Checking link_active_reporting instead > > > > > > > makes it harder to understand because it makes it one step removed > > > > > > > from 6.6.1. And link_active_reporting must be set for ports that > > > > > > > support > 5 GT/s, but it must *also* be set in some hotplug cases, so > > > > > > > that further complicates the connection between it and 6.6.1. > > > > > > > > > > > > > > And apparently there's an actual defect, and I don't understand what > > > > > > > that is yet. It sounds like we agree that pcie_get_speed_cap() is > > > > > > > doing the right thing. Is there something in > > > > > > > pci_bridge_wait_for_secondary_bus() that doesn't match sec 6.6.1? > > > > > > > > > > > > The defect is that some downstream PCIe ports on a system Kai-Heng is > > > > > > using have Supported Link Speeds Vector with > 5GT/s and it does not > > > > > > support active link reporting so the currect code ends up waiting 1100 ms > > > > > > slowing down resume time. > > > > > > Ah. From the lspci and dmesg instrumentation in the bugzilla, I > > > guess: > > > > > > 04:00.0 Thunderbolt Downstream Port to [bus 05] > > > LnkCap: Speed 2.5GT/s, LLActRep- > > > LnkSta: Speed 2.5GT/s > > > LnkCap2: Supported Link Speeds: 2.5-8GT/s > > > LnkCtl2: Target Link Speed: 2.5GT/s > > > 04:02.0 Thunderbolt Downstream Port to [bus 39] > > > LnkCap: Speed 2.5GT/s, LLActRep- > > > LnkSta: Speed 2.5GT/s > > > LnkCap2: Supported Link Speeds: 2.5-8GT/s > > > LnkCtl2: Target Link Speed: 2.5GT/s > > > > > > So per the Link Cap 2 Supported Link Speeds Vector, both devices > > > support up to 8GT/s, but neither one advertises Data Link Layer Link > > > Active Reporting Capable in Link Cap. > > > > > > The Root Port to the NVIDIA GPU is similar; it advertises 8GT/s > > > support but not LLActRep: > > > > > > 00:01.0 Root Port to [bus 01] > > > LnkCap: Speed 8GT/s, LLActRep- > > > LnkSta: Speed 8GT/s > > > LnkCap2: Supported Link Speeds: 2.5-8GT/s > > > LnkCtl2: Target Link Speed: 8GT/s > > > > > > The fact that all three of these don't match what I expect makes me > > > wonder if I'm reading the spec wrong or lspci is decoding something > > > wrong. > > > > > > For the Thunderbolt ports we could make the argument (as I think > > > you're suggesting) that the "supported link speed" is really the > > > minimum of the "Link Cap 2 Supported Link Speed" and the "Link Control > > > 2 Target Link Speed". > > > > > > But even that wouldn't explain why 00:01.0 doesn't advertise LLActRep+ > > > when it is actually running at 8GT/s. > > > > FWIW, I posted a question about this to the PCI-SIG forum. I don't > > have high hopes because that's a really low-bandwidth channel. > > OK, thanks. 
> > > > > > That sounds like a hardware defect that should be worked around with a > > > > > quirk or something. If we just restructure the code to avoid the > > > > > problem, we're likely to reintroduce it later because there's no hint > > > > > in the code about this problem. > > > > > > > > That's why I added the comment there to explain this. > > > > > > > > Can you propose a patch following what you were thinking that solves > > > > this so Kai-Heng can try it out? > > > > I think your patch actually makes a lot more *sense* than the language > > in the spec does. For the second rule: > > > > With a Downstream Port that supports Link speeds greater than 5.0 > > GT/s, software must wait a minimum of 100 ms after Link training > > completes before sending a Configuration Request to the device > > immediately below that Port. Software can determine when Link > > training completes by polling the Data Link Layer Link Active bit or > > by setting up an associated interrupt (see Section 6.7.3.3). > > > > we have to be able to tell when Link training completes, then wait > > 100ms. For us to tell when training is complete, Data Link Layer Link > > Active must be implemented, and the spec says it should be implemented > > iff Data Link Layer Link Active Reporting Capable bit is set. > > > > The 6.6.1 language about "greater than 5.0 GT/s" is one step removed. > > What we really care about is Data Link Layer Link Active, not the link > > speed. It seems like the spec would be much clearer if it said: > > > > With a Downstream Port that implements the Data Link Layer Link > > Active bit, software must wait a minimum of 100 ms after Link training > > completes ... > > Yes, I agree. That would make it more understandable. > > Do you want me to do some changes to the patch or you are fine with it? I think that code would be more readable as something like: if (dev->link_active_reporting) wait_for_link_active(dev); msleep(delay); Obviously we still want the printks and error/timeout checking, but fundamentally we want to wait for the link to be active (if possible), then wait the 100ms. That structure isn't as clear as it could be when the delay is buried inside pcie_wait_for_link_delay(). It's not completely trivial to restructure it like that because of pcie_wait_for_link() and its callers. But I think it's worth pushing on it a little bit to see if it could be done reasonably. This does expose the fact that per spec, a hot-plug capable port that supports only 5 GT/s must support link_active_reporting, but sec 6.6.1 says we *don't* need to wait for link training to complete before waiting 100ms. So maybe we end up waiting slightly more than necessary. Probably not worth worrying about? Bjorn
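Put together with a polling helper like the sketch earlier in the thread, the structure Bjorn proposes would look roughly like this; the name is a placeholder and the printks, error handling, and pcie_wait_for_link() interaction are omitted:

static void bridge_wait_sketch(struct pci_dev *dev, int delay)
{
	/* wait for the link if the port can report it, then delay */
	if (dev->link_active_reporting)
		wait_for_link_active_sketch(dev);
	msleep(delay);
}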
On Tue, May 12, 2020 at 12:09:06PM -0500, Bjorn Helgaas wrote: > I think that code would be more readable as something like: > > if (dev->link_active_reporting) > wait_for_link_active(dev); > msleep(delay); > > Obviously we still want the printks and error/timeout checking, but > fundamentally we want to wait for the link to be active (if possible), > then wait the 100ms. That structure isn't as clear as it could be > when the delay is buried inside pcie_wait_for_link_delay(). > > It's not completely trivial to restructure it like that because of > pcie_wait_for_link() and its callers. But I think it's worth pushing > on it a little bit to see if it could be done reasonably. OK, I see what I can do to make it clearer. > This does expose the fact that per spec, a hot-plug capable port that > supports only 5 GT/s must support link_active_reporting, but sec 6.6.1 > says we *don't* need to wait for link training to complete before > waiting 100ms. So maybe we end up waiting slightly more than > necessary. Probably not worth worrying about? I would say so. Better wait a bit too much than the opposite IMHO.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 595fcf59843f..d9d9ff5b968e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4822,7 +4822,13 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
 	if (!pcie_downstream_port(dev))
 		return;
 
-	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
+	/*
+	 * Since PCIe spec mandates that all downstream ports that support
+	 * speeds greater than 5 GT/s must support data link layer active
+	 * reporting so we use that here to determine when the delay should
+	 * be issued.
+	 */
+	if (!dev->link_active_reporting) {
 		pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
 		msleep(delay);
 	} else {