Message ID | 20241207-fix_bwctrl_thunderbolt-v1-1-b711f572a705@kernel.org
---|---
State | Superseded
Series | PCI/portdrv: Disable bwctrl service if port is fixed at 2.5 GT/s
On Sat, 2024-12-07 at 19:44 +0100, Niklas Schnelle wrote:
> Trying to enable bwctrl on a Thunderbolt port causes a boot hang on some
> systems, though the exact reason is not yet understood. As per the spec,
> Thunderbolt PCIe Downstream Ports have a fake Max Link Speed of 2.5 GT/s
> (USB4 v2 sec 11.2.1):
>
>   "Max Link Speed field in the Link Capabilities Register set to 0001b
>   (data rate of 2.5 GT/s only).
>   Note: These settings do not represent actual throughput. Throughput is
>   implementation specific and based on the USB4 Fabric performance."
>
> More generally, if 2.5 GT/s is the only supported link speed there is no
> point in throttling, as this is already the lowest possible PCIe speed.
> So don't advertise the capability, which stops bwctrl from being probed
> on these ports.
>
> Link: https://lore.kernel.org/linux-pci/Z1R4VNwCOlh9Sg9n@wunner.de/
> Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
> Tested-by: Niklas Schnelle <niks@kernel.org>
> Signed-off-by: Niklas Schnelle <niks@kernel.org>

Should probably add but forgot:

Suggested-by: Lukas Wunner <lukas@wunner.de>

> ---
> Note: This issue causes a boot hang on my personal workstation; see the
> Link for details.
> ---
>  drivers/pci/pcie/portdrv.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 5e10306b63081b1ddd13e0a545418e2a8610c14c..e5f80e4a11aad4ce60b2ce998b40ec9fda8c653d 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -270,7 +270,8 @@ static int get_port_device_capability(struct pci_dev *dev)
>  		u32 linkcap;
>
>  		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
> -		if (linkcap & PCI_EXP_LNKCAP_LBNC)
> +		if (linkcap & PCI_EXP_LNKCAP_LBNC &&
> +		    (linkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB)
>  			services |= PCIE_PORT_SERVICE_BWCTRL;
>  	}
>
> ---
> base-commit: 40384c840ea1944d7c5a392e8975ed088ecf0b37
> change-id: 20241207-fix_bwctrl_thunderbolt-bd1f96b3d98f
>
> Best regards,
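For reference, the register bits compared in the new condition are defined in include/uapi/linux/pci_regs.h; an abridged excerpt (comments as in the header):

    #define  PCI_EXP_LNKCAP_SLS       0x0000000f /* Supported Link Speeds */
    #define  PCI_EXP_LNKCAP_SLS_2_5GB 0x00000001 /* LNKCAP2 SLS Vector bit 0 */
    #define  PCI_EXP_LNKCAP_LBNC      0x00200000 /* Link Bandwidth Notification Capability */

So the added check skips BWCTRL whenever the Max Link Speed field reads 0001b (2.5 GT/s), which per PCIe r6.2 sec 8.2.1 implies 2.5 GT/s is the only supported rate.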
On Sat, 7 Dec 2024, Niklas Schnelle wrote:
> Trying to enable bwctrl on a Thunderbolt port causes a boot hang on some
> systems, though the exact reason is not yet understood. As per the spec,
> Thunderbolt PCIe Downstream Ports have a fake Max Link Speed of 2.5 GT/s
> (USB4 v2 sec 11.2.1):
>
>   "Max Link Speed field in the Link Capabilities Register set to 0001b
>   (data rate of 2.5 GT/s only).
>   Note: These settings do not represent actual throughput. Throughput is
>   implementation specific and based on the USB4 Fabric performance."
>
> More generally, if 2.5 GT/s is the only supported link speed there is no
> point in throttling, as this is already the lowest possible PCIe speed.
> So don't advertise the capability, which stops bwctrl from being probed
> on these ports.

Hi,

Thanks for tracking the reason down this far. Mika mentioned to me earlier that this Link Speed stuff is all made up on Thunderbolt, but I didn't make any changes back then because I thought bwctrl was not going to change the speed anyway in that case.

This seems a reasonable way to work around the Thunderbolt problem, and agreed, there's not much point in adding bwctrl for 2.5 GT/s-only ports.

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
On Sat, Dec 07, 2024 at 07:44:09PM +0100, Niklas Schnelle wrote:
> Trying to enable bwctrl on a Thunderbolt port causes a boot hang on some
> systems, though the exact reason is not yet understood. As per the spec,
> Thunderbolt PCIe Downstream Ports have a fake Max Link Speed of 2.5 GT/s
> (USB4 v2 sec 11.2.1):
>
>   "Max Link Speed field in the Link Capabilities Register set to 0001b
>   (data rate of 2.5 GT/s only).
>   Note: These settings do not represent actual throughput. Throughput is
>   implementation specific and based on the USB4 Fabric performance."
>
> More generally, if 2.5 GT/s is the only supported link speed there is no
> point in throttling, as this is already the lowest possible PCIe speed.
> So don't advertise the capability, which stops bwctrl from being probed
> on these ports.
>
> Link: https://lore.kernel.org/linux-pci/Z1R4VNwCOlh9Sg9n@wunner.de/
> Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
> Tested-by: Niklas Schnelle <niks@kernel.org>
> Signed-off-by: Niklas Schnelle <niks@kernel.org>

Makes sense to me,

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
On Sat, Dec 07, 2024 at 07:44:09PM +0100, Niklas Schnelle wrote:
> Trying to enable bwctrl on a Thunderbolt port causes a boot hang on some
> systems, though the exact reason is not yet understood.

Probably worth highlighting the discrete Thunderbolt chip which exhibits this issue, i.e. Intel JHL7540 (Titan Ridge).

> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -270,7 +270,8 @@ static int get_port_device_capability(struct pci_dev *dev)
>  		u32 linkcap;
>
>  		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
> -		if (linkcap & PCI_EXP_LNKCAP_LBNC)
> +		if (linkcap & PCI_EXP_LNKCAP_LBNC &&
> +		    (linkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB)
>  			services |= PCIE_PORT_SERVICE_BWCTRL;
>  	}

This is fine in principle because PCIe r6.2 sec 8.2.1 states:

   "A device must support 2.5 GT/s and is not permitted to skip support
   for any data rates between 2.5 GT/s and the highest supported rate."

However the Implementation Note at the end of PCIe r6.2 sec 7.5.3.18 cautions:

   "It is strongly encouraged that software primarily utilize the
   Supported Link Speeds Vector instead of the Max Link Speed field,
   so that software can determine the exact set of supported speeds
   on current and future hardware. This can avoid software being
   confused if a future specification defines Links that do not
   require support for all slower speeds."

First of all, the Supported Link Speeds field in the Link Capabilities register (which you're querying here) was renamed to Max Link Speed in PCIe r3.1, and a new Link Capabilities 2 register was added which contains a new Supported Link Speeds field. Software is supposed to query the latter if the device implements the Link Capabilities 2 register (see the other Implementation Note at the end of PCIe r6.2 sec 7.5.3.18).

Second, the above-quoted Implementation Note says that software should not rely on future spec versions to mandate that *all* link speeds (2.5 GT/s and all intermediate speeds up to the maximum supported speed) are supported.

Since v6.13-rc1, we cache the supported speeds in the "supported_speeds" field in struct pci_dev, taking care of the PCIe 3.0 versus later versions issue.

So to make this future-proof, what you could do is check whether only a *single* speed is supported (which could be something else than 2.5 GT/s if future spec versions allow that), i.e.:

-	if (linkcap & PCI_EXP_LNKCAP_LBNC)
+	if (linkcap & PCI_EXP_LNKCAP_LBNC &&
+	    hweight8(dev->supported_speeds) > 1)

...and optionally add a code comment, e.g.:

	/* Enable bandwidth control if more than one speed is supported. */

Thanks,

Lukas
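Put together, the suggestion above would make the relevant hunk in get_port_device_capability() read roughly as follows (a sketch only; the surrounding context is abbreviated):

	/* Enable bandwidth control if more than one speed is supported. */
	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
	if (linkcap & PCI_EXP_LNKCAP_LBNC &&
	    hweight8(dev->supported_speeds) > 1)
		services |= PCIE_PORT_SERVICE_BWCTRL;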
On Tue, 2024-12-10 at 11:05 +0100, Lukas Wunner wrote:
> On Sat, Dec 07, 2024 at 07:44:09PM +0100, Niklas Schnelle wrote:
> > Trying to enable bwctrl on a Thunderbolt port causes a boot hang on some
> > systems, though the exact reason is not yet understood.
>
> Probably worth highlighting the discrete Thunderbolt chip which exhibits
> this issue, i.e. Intel JHL7540 (Titan Ridge).

Agreed, will add for v2.

> > --- a/drivers/pci/pcie/portdrv.c
> > +++ b/drivers/pci/pcie/portdrv.c
> > @@ -270,7 +270,8 @@ static int get_port_device_capability(struct pci_dev *dev)
> >  		u32 linkcap;
> >
> >  		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
> > -		if (linkcap & PCI_EXP_LNKCAP_LBNC)
> > +		if (linkcap & PCI_EXP_LNKCAP_LBNC &&
> > +		    (linkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB)
> >  			services |= PCIE_PORT_SERVICE_BWCTRL;
> >  	}
>
> This is fine in principle because PCIe r6.2 sec 8.2.1 states:
>
>    "A device must support 2.5 GT/s and is not permitted to skip support
>    for any data rates between 2.5 GT/s and the highest supported rate."
>
> However the Implementation Note at the end of PCIe r6.2 sec 7.5.3.18
> cautions:
>
>    "It is strongly encouraged that software primarily utilize the
>    Supported Link Speeds Vector instead of the Max Link Speed field,
>    so that software can determine the exact set of supported speeds
>    on current and future hardware. This can avoid software being
>    confused if a future specification defines Links that do not
>    require support for all slower speeds."
>
> First of all, the Supported Link Speeds field in the Link Capabilities
> register (which you're querying here) was renamed to Max Link Speed in
> PCIe r3.1, and a new Link Capabilities 2 register was added which contains
> a new Supported Link Speeds field. Software is supposed to query the
> latter if the device implements the Link Capabilities 2 register
> (see the other Implementation Note at the end of PCIe r6.2 sec 7.5.3.18).

Would it maybe make sense to update the comment for PCI_EXP_LNKCAP_SLS in pci_regs.h to point out that in PCIe r3.1 and newer this is called the Max Link Speed field? This would certainly have helped me here.

> Second, the above-quoted Implementation Note says that software should
> not rely on future spec versions to mandate that *all* link speeds
> (2.5 GT/s and all intermediate speeds up to the maximum supported speed)
> are supported.
>
> Since v6.13-rc1, we cache the supported speeds in the "supported_speeds"
> field in struct pci_dev, taking care of the PCIe 3.0 versus later versions
> issue.
>
> So to make this future-proof, what you could do is check whether only a
> *single* speed is supported (which could be something else than 2.5 GT/s
> if future spec versions allow that), i.e.:
>
> -	if (linkcap & PCI_EXP_LNKCAP_LBNC)
> +	if (linkcap & PCI_EXP_LNKCAP_LBNC &&
> +	    hweight8(dev->supported_speeds) > 1)

This also makes sense to me in that the argument holds that if there is only one supported speed, bwctrl can't control it. That said, it is definitely more general than this patch.

Sadly, I tried it and in my case it doesn't work. Taking a closer look at lspci -vvv of the Thunderbolt port as well as a debug print reveals why:

07:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06) (prog-if 00 [Normal decode])
	...
	LnkCap:	Port #0, Speed 2.5GT/s, Width x4, ASPM L1, Exit Latency L1 <1us
		ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
	LnkCtl:	ASPM Disabled; LnkDisable- CommClk+
		ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
	LnkSta:	Speed 2.5GT/s, Width x4
		TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-
	...
	LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
	LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
		 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
		 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
	...

So it seems that on this Thunderbolt chip the LnkCap field says 2.5 GT/s only, as per the USB4 spec you quoted, but LnkCap2 is 0x0E, i.e. 2.5-8 GT/s.

I wonder if this is related to why the hang occurs. Could it be that bwctrl tries to enable speeds above 2.5 GT/s and that causes links to fail?

Thanks,
Niklas
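Niklas mentions a debug print; a minimal sketch of such a helper (hypothetical, not part of any patch in this thread) that would surface the LnkCap vs. LnkCap2 mismatch could look like this:

	/* Hypothetical debug helper, not from the patch: dump both the
	 * Max Link Speed from LNKCAP and the Supported Link Speeds Vector
	 * from LNKCAP2 so the 2.5 GT/s vs. 0x0e (2.5-8 GT/s) mismatch
	 * shows up in the log.
	 */
	static void dump_link_speed_caps(struct pci_dev *dev)
	{
		u32 lnkcap, lnkcap2;

		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
		pci_info(dev, "LNKCAP SLS %#x, LNKCAP2 SLS vector %#x\n",
			 lnkcap & PCI_EXP_LNKCAP_SLS,
			 (lnkcap2 & PCI_EXP_LNKCAP2_SLS) >> 1);
	}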
On Tue, Dec 10, 2024 at 09:45:18PM +0100, Niklas Schnelle wrote:
> On Tue, 2024-12-10 at 11:05 +0100, Lukas Wunner wrote:
> > First of all, the Supported Link Speeds field in the Link Capabilities
> > register (which you're querying here) was renamed to Max Link Speed in
> > PCIe r3.1, and a new Link Capabilities 2 register was added which contains
> > a new Supported Link Speeds field. Software is supposed to query the
> > latter if the device implements the Link Capabilities 2 register
> > (see the other Implementation Note at the end of PCIe r6.2 sec 7.5.3.18).
>
> Would it maybe make sense to update the comment for PCI_EXP_LNKCAP_SLS
> in pci_regs.h to point out that in PCIe r3.1 and newer this is called
> the Max Link Speed field? This would certainly have helped me here.

The macros for the individual speeds (e.g. PCI_EXP_LNKCAP_SLS_2_5GB) already have code comments which describe their new meaning.

I guess the reason why the code comment for PCI_EXP_LNKCAP_SLS wasn't updated is that it seeks to document the meaning of the "SLS" acronym (Supported Link Speeds).

But yes, amending that with something like...

	/* Max Link Speed (Supported Link Speeds before PCIe r3.1) */

...probably makes sense, so feel free to propose that in a separate patch.

> > So to make this future-proof, what you could do is check whether only a
> > *single* speed is supported (which could be something else than 2.5 GT/s
> > if future spec versions allow that), i.e.:
> >
> > -	if (linkcap & PCI_EXP_LNKCAP_LBNC)
> > +	if (linkcap & PCI_EXP_LNKCAP_LBNC &&
> > +	    hweight8(dev->supported_speeds) > 1)
>
> This also makes sense to me in that the argument holds that if there is
> only one supported speed, bwctrl can't control it. That said, it is
> definitely more general than this patch.
>
> Sadly, I tried it and in my case it doesn't work. Taking a closer look
> at lspci -vvv of the Thunderbolt port as well as a debug print reveals
> why:
>
> 07:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06) (prog-if 00 [Normal decode])
> 	...
> 	LnkCap:	Port #0, Speed 2.5GT/s, Width x4, ASPM L1, Exit Latency L1 <1us
> 		ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> 	LnkCtl:	ASPM Disabled; LnkDisable- CommClk+
> 		ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 	LnkSta:	Speed 2.5GT/s, Width x4
> 		TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> 	...
> 	LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
> 	LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
> 		 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 		 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
> 	...
>
> So it seems that on this Thunderbolt chip the LnkCap field
> says 2.5 GT/s only, as per the USB4 spec you quoted, but LnkCap2
> is 0x0E, i.e. 2.5-8 GT/s.
>
> I wonder if this is related to why the hang occurs. Could it be that
> bwctrl tries to enable speeds above 2.5 GT/s and that causes links to
> fail?

Ilpo knows this code better than I do but yes, that's plausible. The bandwidth controller doesn't change the speed by itself, it only monitors speed changes. But it does provide a pcie_set_target_speed() API which is called by the thermal driver as well as the pcie_failed_link_retrain() quirk. I suspect the latter is the culprit here. If that suspicion is correct, you should be seeing messages such as...

   "removing 2.5GT/s downstream link speed restriction"

...in dmesg, but I think you wrote that you're not getting any messages at all, right? Perhaps if you add "early_printk=efi" to the kernel command line you may see what's going on.

One idea in this case would be to modify pcie_get_supported_speeds() such that it filters out any speeds in the Link Capabilities 2 register which exceed the Max Link Speed in the Link Capabilities register. However the spec says that software should look at the Link Capabilities 2 register to determine supported speeds if that register is present. So I think we may not conform to the spec then.

The better option is thus probably to add a DECLARE_PCI_FIXUP_EARLY() quirk for Titan Ridge which sets the supported_speeds to just 2.5 GT/s. *If* you want to go with the future-proof option which checks that just one speed is supported.

Titan Ridge is an old chip. I'm not sure if newer discrete Thunderbolt controllers exhibit the same issue, but likely not.

Thanks,

Lukas
On Wed, 11 Dec 2024, Lukas Wunner wrote:
> On Tue, Dec 10, 2024 at 09:45:18PM +0100, Niklas Schnelle wrote:
> > I wonder if this is related to why the hang occurs. Could it be that
> > bwctrl tries to enable speeds above 2.5 GT/s and that causes links to
> > fail?
>
> Ilpo knows this code better than I do but yes, that's plausible.
> The bandwidth controller doesn't change the speed by itself,
> it only monitors speed changes. But it does provide a
> pcie_set_target_speed() API which is called by the thermal driver
> as well as the pcie_failed_link_retrain() quirk. I suspect the
> latter is the culprit here. If that suspicion is correct,
> you should be seeing messages such as...
>
>    "removing 2.5GT/s downstream link speed restriction"

That block is conditioned by pci_match_id() so I don't think it would execute.

The block that prints "retraining failed" is the one which attempts to restore the old Target Link Speed. TLS seems to also be set to 8GT/s as per the lspci from Niklas, which is another inconsistency in the config space. Although, I'd tend to think that if these trigger the quirk/retraining now, the kernel has done some retraining for these links before bwctrl was added, so perhaps that 8GT/s TLS is not going to be found to be the culprit.

What could be tried, though, is to not enable LBMIE at all, which is one clearly new thing that comes with bwctrl. Commenting out the calls to pcie_bwnotif_enable() should achieve that.

> One idea in this case would be to modify pcie_get_supported_speeds()
> such that it filters out any speeds in the Link Capabilities 2 register
> which exceed the Max Link Speed in the Link Capabilities register.
> However the spec says that software should look at the Link Capabilities 2
> register to determine supported speeds if that register is present.
> So I think we may not conform to the spec then.
>
> The better option is thus probably to add a DECLARE_PCI_FIXUP_EARLY()
> quirk for Titan Ridge which sets the supported_speeds to just 2.5 GT/s.
> *If* you want to go with the future-proof option which checks that
> just one speed is supported.
>
> Titan Ridge is an old chip. I'm not sure if newer discrete Thunderbolt
> controllers exhibit the same issue, but likely not.

I recall taking note of this inconsistency in some lspci dumps I have from Mika (but forgot it until now). So I'm afraid it might be more widespread than just TR.
On Wed, 11 Dec 2024 08:57:23 +0100 Lukas Wunner <lukas@wunner.de> wrote:
> One idea in this case would be to modify pcie_get_supported_speeds()
> such that it filters out any speeds in the Link Capabilities 2 register
> which exceed the Max Link Speed in the Link Capabilities register.
> However the spec says that software should look at the Link Capabilities 2
> register to determine supported speeds if that register is present.
> So I think we may not conform to the spec then.
>
> The better option is thus probably to add a DECLARE_PCI_FIXUP_EARLY()
> quirk for Titan Ridge which sets the supported_speeds to just 2.5 GT/s.
> *If* you want to go with the future-proof option which checks that
> just one speed is supported.

I'd definitely support going with the future-proof solution here if we can.

> Titan Ridge is an old chip. I'm not sure if newer discrete Thunderbolt
> controllers exhibit the same issue, but likely not.
>
> Thanks,
>
> Lukas
On Wed, Dec 11, 2024 at 03:07:38PM +0200, Ilpo Järvinen wrote: > I recall taking note of this inconsistency in some lspci dumps I've from > Mika (but forgot it until now). So I'm afraid it might be more widespread > than just TR. Oh you did? Interesting. After re-reading the spec I'm convinced now that we're doing this wrong and that we should honor the Max Link Speed instead of blindly deeming all set bits in the Link Capabilities 2 Register as supported speeds: https://lore.kernel.org/r/e3386d62a766be6d0ef7138a001dabfe563cdff8.1733991971.git.lukas@wunner.de/ @Niklas, could you test if this is sufficient to avoid the issue? Or do we still need to stop instantiating the bandwidth controller if more than one speed is supported? Thanks! Lukas
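The linked patch is not reproduced in this thread; a rough sketch of the idea it describes (mask off Link Capabilities 2 bits above the Max Link Speed), under the assumption that the helper looks roughly like the current pcie_get_supported_speeds():

	/* Sketch of the approach, not the linked patch itself: only treat
	 * LNKCAP2 bits up to the Max Link Speed from LNKCAP as supported.
	 */
	static u8 supported_speeds_capped(struct pci_dev *dev)
	{
		u32 lnkcap, lnkcap2;
		u8 speeds;

		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
		speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;

		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
		/* Drop any speeds above the Max Link Speed field */
		speeds &= GENMASK(lnkcap & PCI_EXP_LNKCAP_SLS, 1);

		return speeds;
	}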
On Thu, 2024-12-12 at 10:08 +0100, Lukas Wunner wrote:
> On Wed, Dec 11, 2024 at 03:07:38PM +0200, Ilpo Järvinen wrote:
> > I recall taking note of this inconsistency in some lspci dumps I have from
> > Mika (but forgot it until now). So I'm afraid it might be more widespread
> > than just TR.
>
> Oh you did? Interesting. After re-reading the spec I'm convinced now
> that we're doing this wrong and that we should honor the Max Link Speed
> instead of blindly deeming all set bits in the Link Capabilities 2
> Register as supported speeds:
>
> https://lore.kernel.org/r/e3386d62a766be6d0ef7138a001dabfe563cdff8.1733991971.git.lukas@wunner.de/
>
> @Niklas, could you test if this is sufficient to avoid the issue?
> Or do we still need to stop instantiating the bandwidth controller
> unless more than one speed is supported?
>
> Thanks!
>
> Lukas

Yes, I will test this but will only get to do so tonight (UTC +2).

If it's not sufficient, I think we could use the modified pcie_get_supported_speeds() to check if only one link speed is supported, right?

Thanks,
Niklas
On Thu, Dec 12, 2024 at 10:17:21AM +0100, Niklas Schnelle wrote:
> On Thu, 2024-12-12 at 10:08 +0100, Lukas Wunner wrote:
> > After re-reading the spec I'm convinced now
> > that we're doing this wrong and that we should honor the Max Link Speed
> > instead of blindly deeming all set bits in the Link Capabilities 2
> > Register as supported speeds:
> >
> > https://lore.kernel.org/r/e3386d62a766be6d0ef7138a001dabfe563cdff8.1733991971.git.lukas@wunner.de/
> >
> > @Niklas, could you test if this is sufficient to avoid the issue?
> > Or do we still need to stop instantiating the bandwidth controller
> > unless more than one speed is supported?
>
> Yes, I will test this but will only get to do so tonight (UTC +2).

Hey, no worries. We're not on the run!

> If it's not sufficient, I think we could use the modified
> pcie_get_supported_speeds() to check if only one link speed is
> supported, right?

pcie_get_supported_speeds() is used to fill in the supported_speeds field in struct pci_dev.

And that field is used in a number of places (exposure of the max link speed in sysfs, delay handling in pci_bridge_wait_for_secondary_bus(), link tuning in radeon/amdgpu drivers, etc).

So we can't use pcie_get_supported_speeds() to (exclusively) influence the behavior of the bandwidth controller. Instead, the solution is your patch for get_port_device_capability(), but future-proofed such that bwctrl is only instantiated if more than one link speed is supported.

Thanks!

Lukas
On Thu, 2024-12-12 at 13:32 +0100, Lukas Wunner wrote:
> On Thu, Dec 12, 2024 at 10:17:21AM +0100, Niklas Schnelle wrote:
> > If it's not sufficient, I think we could use the modified
> > pcie_get_supported_speeds() to check if only one link speed is
> > supported, right?
>
> pcie_get_supported_speeds() is used to fill in the supported_speeds
> field in struct pci_dev.
>
> And that field is used in a number of places (exposure of the max link
> speed in sysfs, delay handling in pci_bridge_wait_for_secondary_bus(),
> link tuning in radeon/amdgpu drivers, etc).
>
> So we can't use pcie_get_supported_speeds() to (exclusively) influence
> the behavior of the bandwidth controller. Instead, the solution is your
> patch for get_port_device_capability(), but future-proofed such that
> bwctrl is only instantiated if more than one link speed is supported.
>
> Thanks!
>
> Lukas

Yeah right, I was imprecise. I should have said that we can use the updated pcie_get_supported_speeds() via the now correct dev->supported_speeds. But first let's see if it alone already fixes things.

Thanks,
Niklas
On Thu, 2024-12-12 at 13:32 +0100, Lukas Wunner wrote:
> pcie_get_supported_speeds() is used to fill in the supported_speeds
> field in struct pci_dev.
>
> And that field is used in a number of places (exposure of the max link
> speed in sysfs, delay handling in pci_bridge_wait_for_secondary_bus(),
> link tuning in radeon/amdgpu drivers, etc).

Side question: if this is used in radeon/amdgpu, could detecting the Thunderbolt port's max link speed as 2.5 GT/s cause issues for external GPUs?
On Thu, Dec 12, 2024 at 09:40:04PM +0100, Niklas Schnelle wrote:
> On Thu, 2024-12-12 at 13:32 +0100, Lukas Wunner wrote:
> > pcie_get_supported_speeds() is used to fill in the supported_speeds
> > field in struct pci_dev.
> >
> > And that field is used in a number of places (exposure of the max link
> > speed in sysfs, delay handling in pci_bridge_wait_for_secondary_bus(),
> > link tuning in radeon/amdgpu drivers, etc).
>
> Side question: if this is used in radeon/amdgpu, could detecting the
> Thunderbolt port's max link speed as 2.5 GT/s cause issues for external
> GPUs?

I don't think so:

An attached Thunderbolt gadget (e.g. eGPU) is visible to the OS as a PCIe switch. A portion of the Switch Downstream Ports is used to attach Endpoints (e.g. GPU) and the remainder is used for tunneling, i.e. to extend the hierarchy further if multiple Thunderbolt devices are daisy-chained.

My expectation is that the Max Link Speed is 8 GT/s on those Downstream Ports leading to Endpoints and 2.5 GT/s on those Downstream Ports used for tunneling (to conform with the USB4/Thunderbolt spec). In other words, the Supported Link Speeds is the same on all of them, but Max Link Speed is reduced to 2.5 GT/s on so-called PCIe Adapters (in USB4/Thunderbolt terminology).

The PCIe Adapters encapsulate PCIe TLPs into Thunderbolt packets and send them over the Thunderbolt fabric, and similarly decapsulate TLPs received from the fabric.

There are some illustrations available here which explain the distinction between the two types of Downstream Ports:

https://developer.apple.com/library/archive/documentation/HardwareDrivers/Conceptual/ThunderboltDevGuide/Basics/Basics.html

I'm hoping Mika or Ilpo can verify the above information. I have lspci dumps here of MeteorLake-P and BarlowRidge host controllers, but without any attached USB4/Thunderbolt gadgets.

Thanks,

Lukas
On Fri, Dec 13, 2024 at 09:37:00AM +0100, Lukas Wunner wrote:
> On Thu, Dec 12, 2024 at 09:40:04PM +0100, Niklas Schnelle wrote:
> > Side question: if this is used in radeon/amdgpu, could detecting the
> > Thunderbolt port's max link speed as 2.5 GT/s cause issues for external
> > GPUs?
>
> I don't think so:
>
> An attached Thunderbolt gadget (e.g. eGPU) is visible to the OS as a
> PCIe switch. A portion of the Switch Downstream Ports is used to
> attach Endpoints (e.g. GPU) and the remainder is used for tunneling,
> i.e. to extend the hierarchy further if multiple Thunderbolt devices
> are daisy-chained.
>
> My expectation is that the Max Link Speed is 8 GT/s on those Downstream
> Ports leading to Endpoints and 2.5 GT/s on those Downstream Ports used
> for tunneling (to conform with the USB4/Thunderbolt spec). In other words,
> the Supported Link Speeds is the same on all of them, but Max Link Speed
> is reduced to 2.5 GT/s on so-called PCIe Adapters (in USB4/Thunderbolt
> terminology).
>
> The PCIe Adapters encapsulate PCIe TLPs into Thunderbolt packets and
> send them over the Thunderbolt fabric, and similarly decapsulate TLPs
> received from the fabric.
>
> There are some illustrations available here which explain the distinction
> between the two types of Downstream Ports:
>
> https://developer.apple.com/library/archive/documentation/HardwareDrivers/Conceptual/ThunderboltDevGuide/Basics/Basics.html
>
> I'm hoping Mika or Ilpo can verify the above information. I have
> lspci dumps here of MeteorLake-P and BarlowRidge host controllers,
> but without any attached USB4/Thunderbolt gadgets.

That's right. The bandwidth in the PCIe tunnel is dynamic, but the adapters announce Gen 1 x4 according to the USB4 spec. You can get up to 90% of the available TB/USB4 link bandwidth for PCIe depending on what is being tunneled.

Graphics drivers should not really look at these "virtual" PCIe links but instead at the native link if they need to.
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 5e10306b63081b1ddd13e0a545418e2a8610c14c..e5f80e4a11aad4ce60b2ce998b40ec9fda8c653d 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -270,7 +270,8 @@ static int get_port_device_capability(struct pci_dev *dev)
 		u32 linkcap;
 
 		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
-		if (linkcap & PCI_EXP_LNKCAP_LBNC)
+		if (linkcap & PCI_EXP_LNKCAP_LBNC &&
+		    (linkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB)
 			services |= PCIE_PORT_SERVICE_BWCTRL;
 	}