
[for-linus,v3,1/2] PCI: Honor Max Link Speed when determining supported speeds

Message ID fe03941e3e1cc42fb9bf4395e302bff53ee2198b.1734428762.git.lukas@wunner.de (mailing list archive)
State Accepted
Delegated to: Krzysztof Wilczyński
Series [for-linus,v3,1/2] PCI: Honor Max Link Speed when determining supported speeds

Commit Message

Lukas Wunner Dec. 17, 2024, 9:51 a.m. UTC
The Supported Link Speeds Vector in the Link Capabilities 2 Register
indicates the *supported* link speeds.  The Max Link Speed field in the
Link Capabilities Register indicates the *maximum* of those speeds.

pcie_get_supported_speeds() neglects to honor the Max Link Speed field and
will thus incorrectly deem higher speeds as supported.  Fix it.

One user-visible issue addressed here is an incorrect value in the sysfs
attribute "max_link_speed".

But the main motivation is a boot hang reported by Niklas:  Intel JHL7540
"Titan Ridge 2018" Thunderbolt controllers support 2.5-8 GT/s speeds,
but indicate 2.5 GT/s as the maximum.  Ilpo recalls seeing this on more
devices.  It can be explained by the controller's Downstream Ports
supporting 8 GT/s if an Endpoint is attached, but limiting to 2.5 GT/s
if the port interfaces to a PCIe Adapter, in accordance with USB4 v2
sec 11.2.1:

   "This section defines the functionality of an Internal PCIe Port that
    interfaces to a PCIe Adapter. [...]
    The Logical sub-block shall update the PCIe configuration registers
    with the following characteristics: [...]
    Max Link Speed field in the Link Capabilities Register set to 0001b
    (data rate of 2.5 GT/s only).
    Note: These settings do not represent actual throughput. Throughput
    is implementation specific and based on the USB4 Fabric performance."

The present commit is not sufficient on its own to fix Niklas' boot hang,
but it is a prerequisite:  A subsequent commit will fix the boot hang by
enabling bandwidth control only if more than one speed is supported.

The GENMASK() macro used herein specifies 0 as lowest bit, even though
the Supported Link Speeds Vector ends at bit 1.  This is done on purpose
to avoid a GENMASK(0, 1) macro if Max Link Speed is zero.  That macro
would be invalid as the lowest bit is greater than the highest bit.
Ilpo has witnessed a zero Max Link Speed on Root Complex Integrated
Endpoints in particular, so it does occur in practice.

Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
Reported-by: Niklas Schnelle <niks@kernel.org>
Tested-by: Niklas Schnelle <niks@kernel.org>
Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/pci.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
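
For illustration, a minimal standalone sketch of the masking described
above, fed with the Titan Ridge values from the report.  The macro
definitions merely mirror the kernel's PCI_EXP_LNKCAP_SLS,
PCI_EXP_LNKCAP2_SLS and GENMASK() and are simplified for this sketch; it
is not kernel code:

#include <stdio.h>

#define LNKCAP_SLS	0x0000000fU	/* Max Link Speed, LNKCAP[3:0] */
#define LNKCAP2_SLS	0x000000feU	/* Supported Link Speeds Vector, LNKCAP2[7:1] */
#define GENMASK(h, l)	((~0U << (l)) & (~0U >> (31 - (h))))	/* simplified 32-bit variant */

int main(void)
{
	/* Downstream Port behind a PCIe Adapter: the Supported Link
	 * Speeds Vector advertises 2.5, 5 and 8 GT/s (bits 1-3) ...
	 */
	unsigned int lnkcap2 = 0x0000000e;

	/* ... but Max Link Speed is 0001b, i.e. 2.5 GT/s only */
	unsigned int lnkcap = 0x00000001;

	unsigned int speeds = lnkcap2 & LNKCAP2_SLS;

	/* The fix: drop any speed above Max Link Speed */
	speeds &= GENMASK(lnkcap & LNKCAP_SLS, 0);

	printf("supported speeds: %#x\n", speeds);	/* prints 0x2: 2.5 GT/s only */
	return 0;
}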

Comments

Ilpo Järvinen Dec. 17, 2024, 11:33 a.m. UTC | #1
On Tue, 17 Dec 2024, Lukas Wunner wrote:

> The Supported Link Speeds Vector in the Link Capabilities 2 Register
> indicates the *supported* link speeds.  The Max Link Speed field in the
> Link Capabilities Register indicates the *maximum* of those speeds.
> 
> pcie_get_supported_speeds() neglects to honor the Max Link Speed field and
> will thus incorrectly deem higher speeds as supported.  Fix it.
> 
> One user-visible issue addressed here is an incorrect value in the sysfs
> attribute "max_link_speed".
> 
> But the main motivation is a boot hang reported by Niklas:  Intel JHL7540
> "Titan Ridge 2018" Thunderbolt controllers supports 2.5-8 GT/s speeds,
> but indicate 2.5 GT/s as maximum.  Ilpo recalls seeing this on more
> devices.  It can be explained by the controller's Downstream Ports
> supporting 8 GT/s if an Endpoint is attached, but limiting to 2.5 GT/s
> if the port interfaces to a PCIe Adapter, in accordance with USB4 v2
> sec 11.2.1:
> 
>    "This section defines the functionality of an Internal PCIe Port that
>     interfaces to a PCIe Adapter. [...]
>     The Logical sub-block shall update the PCIe configuration registers
>     with the following characteristics: [...]
>     Max Link Speed field in the Link Capabilities Register set to 0001b
>     (data rate of 2.5 GT/s only).
>     Note: These settings do not represent actual throughput. Throughput
>     is implementation specific and based on the USB4 Fabric performance."
> 
> The present commit is not sufficient on its own to fix Niklas' boot hang,
> but it is a prerequisite:  A subsequent commit will fix the boot hang by
> enabling bandwidth control only if more than one speed is supported.
> 
> The GENMASK() macro used herein specifies 0 as lowest bit, even though
> the Supported Link Speeds Vector ends at bit 1.  This is done on purpose
> to avoid a GENMASK(0, 1) macro if Max Link Speed is zero.  That macro
> would be invalid as the lowest bit is greater than the highest bit.
> Ilpo has witnessed a zero Max Link Speed on Root Complex Integrated
> Endpoints in particular, so it does occur in practice.

Thanks for adding this extra information.

I'd also add a reference to r6.2 section 7.5.3, which states those registers
are required for RPs, Switch Ports, Bridges, and Endpoints _that are not
RCiEPs_.  My reading is that this implies they're not required of RCiEPs.

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

--
 i.

> Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
> Reported-by: Niklas Schnelle <niks@kernel.org>
> Tested-by: Niklas Schnelle <niks@kernel.org>
> Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org/
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>  drivers/pci/pci.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 35dc9f2..b730560 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
>  	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
>  	speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
>  
> +	/* Ignore speeds higher than Max Link Speed */
> +	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> +	speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
> +
>  	/* PCIe r3.0-compliant */
>  	if (speeds)
>  		return speeds;
>  
> -	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> -
>  	/* Synthesize from the Max Link Speed field */
>  	if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
>  		speeds = PCI_EXP_LNKCAP2_SLS_5_0GB | PCI_EXP_LNKCAP2_SLS_2_5GB;
>
Krzysztof Wilczyński Dec. 18, 2024, 11:43 p.m. UTC | #2
Hello,

> > One user-visible issue addressed here is an incorrect value in the sysfs
> > attribute "max_link_speed".
> > 
> > But the main motivation is a boot hang reported by Niklas:  Intel JHL7540
> > "Titan Ridge 2018" Thunderbolt controllers supports 2.5-8 GT/s speeds,
> > but indicate 2.5 GT/s as maximum.  Ilpo recalls seeing this on more
> > devices.  It can be explained by the controller's Downstream Ports
> > supporting 8 GT/s if an Endpoint is attached, but limiting to 2.5 GT/s
> > if the port interfaces to a PCIe Adapter, in accordance with USB4 v2
> > sec 11.2.1:
> > 
> >    "This section defines the functionality of an Internal PCIe Port that
> >     interfaces to a PCIe Adapter. [...]
> >     The Logical sub-block shall update the PCIe configuration registers
> >     with the following characteristics: [...]
> >     Max Link Speed field in the Link Capabilities Register set to 0001b
> >     (data rate of 2.5 GT/s only).
> >     Note: These settings do not represent actual throughput. Throughput
> >     is implementation specific and based on the USB4 Fabric performance."
> > 
> > The present commit is not sufficient on its own to fix Niklas' boot hang,
> > but it is a prerequisite:  A subsequent commit will fix the boot hang by
> > enabling bandwidth control only if more than one speed is supported.
> > 
> > The GENMASK() macro used herein specifies 0 as lowest bit, even though
> > the Supported Link Speeds Vector ends at bit 1.  This is done on purpose
> > to avoid a GENMASK(0, 1) macro if Max Link Speed is zero.  That macro
> > would be invalid as the lowest bit is greater than the highest bit.
> > Ilpo has witnessed a zero Max Link Speed on Root Complex Integrated
> > Endpoints in particular, so it does occur in practice.
> 
> Thanks for adding this extra information.
> 
> I'd also add reference to r6.2 section 7.5.3 which states those registers 
> are required for RPs, Switch Ports, Bridges, and Endpoints _that are not 
> RCiEPs_. My reading is that implies they're not required from RCiEPs.

Let me know how you would like to update the commit message.  I will do it
directly on the branch.

Thank you!

	Krzysztof
Lukas Wunner Dec. 19, 2024, 7:41 a.m. UTC | #3
On Thu, Dec 19, 2024 at 08:43:57AM +0900, Krzysztof Wilczyński wrote:
> > > The GENMASK() macro used herein specifies 0 as lowest bit, even though
> > > the Supported Link Speeds Vector ends at bit 1.  This is done on purpose
> > > to avoid a GENMASK(0, 1) macro if Max Link Speed is zero.  That macro
> > > would be invalid as the lowest bit is greater than the highest bit.
> > > Ilpo has witnessed a zero Max Link Speed on Root Complex Integrated
> > > Endpoints in particular, so it does occur in practice.
> > 
> > Thanks for adding this extra information.
> > 
> > I'd also add reference to r6.2 section 7.5.3 which states those registers 
> > are required for RPs, Switch Ports, Bridges, and Endpoints _that are not 
> > RCiEPs_. My reading is that implies they're not required from RCiEPs.
> 
> Let me know how you would like to update the commit message.  I will do it
> directly on the branch.

FWIW, I edited the commit message like this on my local branch:

-Endpoints in particular, so it does occur in practice.
+Endpoints in particular, so it does occur in practice.  (The Link
+Capabilities Register is optional on RCiEPs per PCIe r6.2 sec 7.5.3.)

In other words, I just added the sentence in parentheses.
But maybe Ilpo has another wording preference... :)

Thanks,

Lukas
Ilpo Järvinen Dec. 19, 2024, 11:05 a.m. UTC | #4
On Thu, 19 Dec 2024, Lukas Wunner wrote:

> On Thu, Dec 19, 2024 at 08:43:57AM +0900, Krzysztof Wilczyński wrote:
> > > > The GENMASK() macro used herein specifies 0 as lowest bit, even though
> > > > the Supported Link Speeds Vector ends at bit 1.  This is done on purpose
> > > > to avoid a GENMASK(0, 1) macro if Max Link Speed is zero.  That macro
> > > > would be invalid as the lowest bit is greater than the highest bit.
> > > > Ilpo has witnessed a zero Max Link Speed on Root Complex Integrated
> > > > Endpoints in particular, so it does occur in practice.
> > > 
> > > Thanks for adding this extra information.
> > > 
> > > I'd also add reference to r6.2 section 7.5.3 which states those registers 
> > > are required for RPs, Switch Ports, Bridges, and Endpoints _that are not 
> > > RCiEPs_. My reading is that implies they're not required from RCiEPs.
> > 
> > Let me know how you would like to update the commit message.  I will do it
> > directly on the branch.
> 
> FWIW, I edited the commit message like this on my local branch:
> 
> -Endpoints in particular, so it does occur in practice.
> +Endpoints in particular, so it does occur in practice.  (The Link
> +Capabilities Register is optional on RCiEPs per PCIe r6.2 sec 7.5.3.)
> 
> In other words, I just added the sentence in parentheses.
> But maybe Ilpo has another wording preference... :)

Your wording is a good summary of the real substance, which is the spec
itself. :-)
Krzysztof Wilczyński Dec. 19, 2024, 4:37 p.m. UTC | #5
Hello,

[...]
> > > > Thanks for adding this extra information.
> > > > 
> > > > I'd also add reference to r6.2 section 7.5.3 which states those registers 
> > > > are required for RPs, Switch Ports, Bridges, and Endpoints _that are not 
> > > > RCiEPs_. My reading is that implies they're not required from RCiEPs.
> > > 
> > > Let me know how you would like to update the commit message.  I will do it
> > > directly on the branch.
> > 
> > FWIW, I edited the commit message like this on my local branch:
> > 
> > -Endpoints in particular, so it does occur in practice.
> > +Endpoints in particular, so it does occur in practice.  (The Link
> > +Capabilities Register is optional on RCiEPs per PCIe r6.2 sec 7.5.3.)
> > 
> > In other words, I just added the sentence in parentheses.
> > But maybe Ilpo has another wording preference... :)
> 
> Your wording is good summary for the real substance that is the spec 
> itself. :-)

Updated.  Thank you both!

	Krzysztof
Lukas Wunner Dec. 20, 2024, 8:57 a.m. UTC | #6
On Thu, Dec 19, 2024 at 12:50:59PM -0500, Bjorn Helgaas wrote:
> On Thu, Dec 19, 2024, 11:37AM Krzysztof Wilczynski <kw@linux.com> wrote:
> > > > > > I'd also add reference to r6.2 section 7.5.3 which states those
> > > > > > registers are required for RPs, Switch Ports, Bridges, and
> > > > > > Endpoints _that are not RCiEPs_. My reading is that implies
> > > > > > they're not required from RCiEPs.
> 
> Don't have the spec with me, but I don't know what link-related registers
> would even mean for RCiEPs. Why would we look at them at all?

We don't:  pcie_capability_read_dword() checks whether the register
being read is actually implemented by the device:

pcie_capability_read_dword()
  pcie_capability_reg_implemented()
    pcie_cap_has_lnkctl()

And pcie_cap_has_lnkctl() returns false for PCI_EXP_TYPE_RC_END,
in which case pcie_capability_read_dword() just returns zero
without accessing Config Space.

Likewise accesses to PCI_EXP_LNKCAP2_SLS are short-circuited to zero
if the device only conforms to PCIe r1.1 or earlier and thus doesn't
implement the Link Capabilities 2 Register.  (Recognizable by
PCI_EXP_FLAGS_VERS being 1 instead of 2.)

So pcie_get_supported_speeds() returns zero for such devices and
that's the value assigned to dev->supported_speeds for RCiEPs on probe.
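
For reference, a condensed sketch of those two checks.  The helper names
below are invented for illustration and their bodies are simplified; the
real logic lives in drivers/pci/access.c:

#include <linux/pci.h>

/* Sketch: RCiEPs have no Link, hence no Link Capabilities/Control/Status */
static bool lnk_regs_implemented(struct pci_dev *dev)
{
	return pci_pcie_type(dev) != PCI_EXP_TYPE_RC_END;
}

/* Sketch: Link Capabilities 2 exists only from PCIe Capability version 2 on */
static bool lnkcap2_implemented(struct pci_dev *dev)
{
	return (dev->pcie_flags_reg & PCI_EXP_FLAGS_VERS) >= 2;
}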

Thanks,

Lukas

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 35dc9f2..b730560 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6240,12 +6240,14 @@ u8 pcie_get_supported_speeds(struct pci_dev *dev)
 	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
 	speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
 
+	/* Ignore speeds higher than Max Link Speed */
+	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+	speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 0);
+
 	/* PCIe r3.0-compliant */
 	if (speeds)
 		return speeds;
 
-	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
-
 	/* Synthesize from the Max Link Speed field */
 	if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
 		speeds = PCI_EXP_LNKCAP2_SLS_5_0GB | PCI_EXP_LNKCAP2_SLS_2_5GB;
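
As context for the follow-up the commit message refers to ("enabling
bandwidth control only if more than one speed is supported"), a
hypothetical sketch of such a gate on top of the masked result; the
function below is invented for illustration and is not the actual
follow-up patch:

#include <linux/bitops.h>
#include <linux/pci.h>

static int pcie_bwctrl_speed_check(struct pci_dev *port)
{
	u8 speeds = pcie_get_supported_speeds(port);

	/* A single supported speed leaves nothing to control */
	if (hweight8(speeds) <= 1)
		return -ENODEV;

	return 0;
}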