diff mbox series

[v3,2/4] PCI: Revert to the original speed after PCIe failed link retraining

Message ID alpine.DEB.2.21.2408251412590.30766@angie.orcam.me.uk (mailing list archive)
State Accepted
Delegated to: Krzysztof Wilczyński
Headers show
Series PCI: Rework error reporting with PCIe failed link retraining | expand

Commit Message

Maciej W. Rozycki Aug. 25, 2024, 1:47 p.m. UTC
When `pcie_failed_link_retrain' has failed to retrain the link by hand 
it leaves the link speed restricted to 2.5GT/s, which will then affect 
any device that has been plugged in later on, which may not suffer from 
the problem that caused the speed restriction to have been attempted.  
Consequently such a downstream device will suffer from an unnecessary 
communication throughput limitation and therefore performance loss.

Remove the speed restriction then and revert the Link Control 2 register 
to its original state if link retraining with the speed restriction in 
place has failed.  Retrain the link again afterwards so as to remove any 
residual state, waiting on LT rather than DLLLA to avoid an excessive 
delay and ignoring the result as this training is supposed to fail anyway.

Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Reported-by: Matthew W Carlis <mattc@purestorage.com>
Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/
Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: stable@vger.kernel.org # v6.5+
---
Changes from v2:

- Wait on LT rather than DLLLA with clean-up retraining with the speed 
  restriction lifted, so as to avoid an excessive delay as it's supposed 
  to fail anyway.

New change in v2.
---
 drivers/pci/quirks.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

linux-pcie-failed-link-retrain-fail-unclamp.diff

Comments

Ilpo Järvinen Aug. 26, 2024, 9:16 a.m. UTC | #1
On Sun, 25 Aug 2024, Maciej W. Rozycki wrote:

> When `pcie_failed_link_retrain' has failed to retrain the link by hand 
> it leaves the link speed restricted to 2.5GT/s, which will then affect 
> any device that has been plugged in later on, which may not suffer from 
> the problem that caused the speed restriction to have been attempted.  
> Consequently such a downstream device will suffer from an unnecessary 
> communication throughput limitation and therefore performance loss.
> 
> Remove the speed restriction then and revert the Link Control 2 register 
> to its original state if link retraining with the speed restriction in 
> place has failed.  Retrain the link again afterwards so as to remove any 
> residual state, waiting on LT rather than DLLLA to avoid an excessive 
> delay and ignoring the result as this training is supposed to fail anyway.
> 
> Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
> Reported-by: Matthew W Carlis <mattc@purestorage.com>
> Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/
> Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/
> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
> Cc: stable@vger.kernel.org # v6.5+
> ---
> Changes from v2:
> 
> - Wait on LT rather than DLLLA with clean-up retraining with the speed 
>   restriction lifted, so as to avoid an excessive delay as it's supposed 
>   to fail anyway.
> 
> New change in v2.
> ---
>  drivers/pci/quirks.c |   11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> linux-pcie-failed-link-retrain-fail-unclamp.diff
> Index: linux-macro/drivers/pci/quirks.c
> ===================================================================
> --- linux-macro.orig/drivers/pci/quirks.c
> +++ linux-macro/drivers/pci/quirks.c
> @@ -66,7 +66,7 @@
>   * apply this erratum workaround to any downstream ports as long as they
>   * support Link Active reporting and have the Link Control 2 register.
>   * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
> - * request a retrain and wait 200ms for the data link to go up.
> + * request a retrain and check the result.
>   *
>   * If this turns out successful and we know by the Vendor:Device ID it is
>   * safe to do so, then lift the restriction, letting the devices negotiate
> @@ -74,6 +74,10 @@
>   * firmware may have already arranged and lift it with ports that already
>   * report their data link being up.
>   *
> + * Otherwise revert the speed to the original setting and request a retrain
> + * again to remove any residual state, ignoring the result as it's supposed
> + * to fail anyway.
> + *
>   * Return TRUE if the link has been successfully retrained, otherwise FALSE.
>   */
>  bool pcie_failed_link_retrain(struct pci_dev *dev)
> @@ -92,6 +96,8 @@ bool pcie_failed_link_retrain(struct pci
>  	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
>  	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
>  	    PCI_EXP_LNKSTA_LBMS) {
> +		u16 oldlnkctl2 = lnkctl2;
> +
>  		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
>  
>  		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> @@ -100,6 +106,9 @@ bool pcie_failed_link_retrain(struct pci
>  
>  		if (pcie_retrain_link(dev, false)) {
>  			pci_info(dev, "retraining failed\n");
> +			pcie_capability_write_word(dev, PCI_EXP_LNKCTL2,
> +						   oldlnkctl2);
> +			pcie_retrain_link(dev, true);

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Ilpo Järvinen Sept. 9, 2024, 12:50 p.m. UTC | #2
Hi Bjorn,

You still seem to have the old version of this patch in enumeration 
branch.

Could you please consider replacing it with this v3 one that is slightly 
better (use_lt was changed to true because it makes more sense).

On Mon, 26 Aug 2024, Ilpo Järvinen wrote:
> On Sun, 25 Aug 2024, Maciej W. Rozycki wrote:
> 
> > When `pcie_failed_link_retrain' has failed to retrain the link by hand 
> > it leaves the link speed restricted to 2.5GT/s, which will then affect 
> > any device that has been plugged in later on, which may not suffer from 
> > the problem that caused the speed restriction to have been attempted.  
> > Consequently such a downstream device will suffer from an unnecessary 
> > communication throughput limitation and therefore performance loss.
> > 
> > Remove the speed restriction then and revert the Link Control 2 register 
> > to its original state if link retraining with the speed restriction in 
> > place has failed.  Retrain the link again afterwards so as to remove any 
> > residual state, waiting on LT rather than DLLLA to avoid an excessive 
> > delay and ignoring the result as this training is supposed to fail anyway.
> > 
> > Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
> > Reported-by: Matthew W Carlis <mattc@purestorage.com>
> > Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/
> > Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/
> > Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
> > Cc: stable@vger.kernel.org # v6.5+
> > ---
> > Changes from v2:
> > 
> > - Wait on LT rather than DLLLA with clean-up retraining with the speed 
> >   restriction lifted, so as to avoid an excessive delay as it's supposed 
> >   to fail anyway.
> > 
> > New change in v2.
> > ---
> >  drivers/pci/quirks.c |   11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > linux-pcie-failed-link-retrain-fail-unclamp.diff
> > Index: linux-macro/drivers/pci/quirks.c
> > ===================================================================
> > --- linux-macro.orig/drivers/pci/quirks.c
> > +++ linux-macro/drivers/pci/quirks.c
> > @@ -66,7 +66,7 @@
> >   * apply this erratum workaround to any downstream ports as long as they
> >   * support Link Active reporting and have the Link Control 2 register.
> >   * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
> > - * request a retrain and wait 200ms for the data link to go up.
> > + * request a retrain and check the result.
> >   *
> >   * If this turns out successful and we know by the Vendor:Device ID it is
> >   * safe to do so, then lift the restriction, letting the devices negotiate
> > @@ -74,6 +74,10 @@
> >   * firmware may have already arranged and lift it with ports that already
> >   * report their data link being up.
> >   *
> > + * Otherwise revert the speed to the original setting and request a retrain
> > + * again to remove any residual state, ignoring the result as it's supposed
> > + * to fail anyway.
> > + *
> >   * Return TRUE if the link has been successfully retrained, otherwise FALSE.
> >   */
> >  bool pcie_failed_link_retrain(struct pci_dev *dev)
> > @@ -92,6 +96,8 @@ bool pcie_failed_link_retrain(struct pci
> >  	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> >  	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
> >  	    PCI_EXP_LNKSTA_LBMS) {
> > +		u16 oldlnkctl2 = lnkctl2;
> > +
> >  		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
> >  
> >  		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> > @@ -100,6 +106,9 @@ bool pcie_failed_link_retrain(struct pci
> >  
> >  		if (pcie_retrain_link(dev, false)) {
> >  			pci_info(dev, "retraining failed\n");
> > +			pcie_capability_write_word(dev, PCI_EXP_LNKCTL2,
> > +						   oldlnkctl2);
> > +			pcie_retrain_link(dev, true);
> 
> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Krzysztof Wilczyński Sept. 9, 2024, 2:11 p.m. UTC | #3
Hello,

> You still seem to have the old version of this patch in enumeration 
> branch.
> 
> Could you please consider replacing it with this v3 one that is slightly 
> better (use_lt was changed to true because it makes more sense).

Done.  Should be up to date now.

	Krzysztof
diff mbox series

Patch

Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -66,7 +66,7 @@ 
  * apply this erratum workaround to any downstream ports as long as they
  * support Link Active reporting and have the Link Control 2 register.
  * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
- * request a retrain and wait 200ms for the data link to go up.
+ * request a retrain and check the result.
  *
  * If this turns out successful and we know by the Vendor:Device ID it is
  * safe to do so, then lift the restriction, letting the devices negotiate
@@ -74,6 +74,10 @@ 
  * firmware may have already arranged and lift it with ports that already
  * report their data link being up.
  *
+ * Otherwise revert the speed to the original setting and request a retrain
+ * again to remove any residual state, ignoring the result as it's supposed
+ * to fail anyway.
+ *
  * Return TRUE if the link has been successfully retrained, otherwise FALSE.
  */
 bool pcie_failed_link_retrain(struct pci_dev *dev)
@@ -92,6 +96,8 @@  bool pcie_failed_link_retrain(struct pci
 	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
 	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
 	    PCI_EXP_LNKSTA_LBMS) {
+		u16 oldlnkctl2 = lnkctl2;
+
 		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
 
 		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
@@ -100,6 +106,9 @@  bool pcie_failed_link_retrain(struct pci
 
 		if (pcie_retrain_link(dev, false)) {
 			pci_info(dev, "retraining failed\n");
+			pcie_capability_write_word(dev, PCI_EXP_LNKCTL2,
+						   oldlnkctl2);
+			pcie_retrain_link(dev, true);
 			return false;
 		}