diff mbox series

PCI: Disallow retraining link for Atheros QCA98xx chips on non-Gen1 PCIe bridges

Message ID 20210326124326.21163-1-pali@kernel.org (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series PCI: Disallow retraining link for Atheros QCA98xx chips on non-Gen1 PCIe bridges | expand

Commit Message

Pali Rohár March 26, 2021, 12:43 p.m. UTC
Atheros QCA9880 and QCA9890 chips do not behave after a bus reset and also
after retrain link when PCIe bridge is not in GEN1 mode at 2.5 GT/s speed.
The device will throw a Link Down error and config space is not accessible
again. Retrain link can be called only when using GEN1 PCIe bridge or when
PCIe bridge has forced link speed to 2.5 GT/s via PCI_EXP_LNKCTL2 register.

This issue was reproduced with more Compex WLE900VX cards (QCA9880 based)
on Armada 385 with pci-mvebu.c driver and also on Armada 3720 with
pci-aardvark.c driver. Also this issue was reproduced with some "noname"
card with QCA9890 WiFi chip on Armada 3720. All problematic cards with
these QCA chips have PCI device id 0x003c.

Tests showed that other WiFi cards based on AR93xx (PCI device id 0x0030)
and AR9287 (PCI device id 0x002e) chips do not have these problems.

To workaround this issue, this change introduces a new PCI quirk called
PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1 for PCI device id 0x003c.

When this quirk is set then kernel disallows triggering PCI_EXP_LNKCTL_RL
bit in config space of PCIe Bridge in case PCIe Bridge is capable of higher
speed than 2.5 GT/s and higher speed is already allowed. When PCIe Bridge
has accessible LNKCTL2 register then kernel tries to force target link
speed via PCI_EXP_LNKCTL2_TLS* bits to 2.5 GT/s. After this change it is
possible to trigger PCI_EXP_LNKCTL_RL bit without causing issues on
problematic Atheros QCA98xx cards.

Currently only PCIe ASPM kernel code triggers this PCI_EXP_LNKCTL_RL bit,
so quirk check is added only into pcie/aspm.c file.

Signed-off-by: Pali Rohár <pali@kernel.org>
Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Tested-by: Marek Behún <kabel@kernel.org>
Link: https://lore.kernel.org/linux-pci/87h7l8axqp.fsf@toke.dk/
Cc: stable@vger.kernel.org # c80851f6ce63a ("PCI: Add PCI_EXP_LNKCTL2_TLS* macros")
---
 drivers/pci/pcie/aspm.c | 43 +++++++++++++++++++++++++++++++++++++++++
 drivers/pci/quirks.c    | 15 +++++++++++++-
 include/linux/pci.h     |  2 ++
 3 files changed, 59 insertions(+), 1 deletion(-)

Comments

Toke Høiland-Jørgensen March 26, 2021, 6:13 p.m. UTC | #1
Pali Rohár <pali@kernel.org> writes:

> Atheros QCA9880 and QCA9890 chips do not behave after a bus reset and also
> after retrain link when PCIe bridge is not in GEN1 mode at 2.5 GT/s speed.
> The device will throw a Link Down error and config space is not accessible
> again. Retrain link can be called only when using GEN1 PCIe bridge or when
> PCIe bridge has forced link speed to 2.5 GT/s via PCI_EXP_LNKCTL2 register.
>
> This issue was reproduced with more Compex WLE900VX cards (QCA9880 based)
> on Armada 385 with pci-mvebu.c driver and also on Armada 3720 with
> pci-aardvark.c driver. Also this issue was reproduced with some "noname"
> card with QCA9890 WiFi chip on Armada 3720. All problematic cards with
> these QCA chips have PCI device id 0x003c.
>
> Tests showed that other WiFi cards based on AR93xx (PCI device id 0x0030)
> and AR9287 (PCI device id 0x002e) chips do not have these problems.
>
> To workaround this issue, this change introduces a new PCI quirk called
> PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1 for PCI device id 0x003c.
>
> When this quirk is set then kernel disallows triggering PCI_EXP_LNKCTL_RL
> bit in config space of PCIe Bridge in case PCIe Bridge is capable of higher
> speed than 2.5 GT/s and higher speed is already allowed. When PCIe Bridge
> has accessible LNKCTL2 register then kernel tries to force target link
> speed via PCI_EXP_LNKCTL2_TLS* bits to 2.5 GT/s. After this change it is
> possible to trigger PCI_EXP_LNKCTL_RL bit without causing issues on
> problematic Atheros QCA98xx cards.
>
> Currently only PCIe ASPM kernel code triggers this PCI_EXP_LNKCTL_RL bit,
> so quirk check is added only into pcie/aspm.c file.
>
> Signed-off-by: Pali Rohár <pali@kernel.org>
> Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Tested-by: Marek Behún <kabel@kernel.org>
> Link: https://lore.kernel.org/linux-pci/87h7l8axqp.fsf@toke.dk/
> Cc: stable@vger.kernel.org # c80851f6ce63a ("PCI: Add
> PCI_EXP_LNKCTL2_TLS* macros")

Thanks!

Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Krzysztof Wilczyński March 27, 2021, 12:14 a.m. UTC | #2
Hi Pali,

Thank you for sending the patch over!

[...]
> +static int pcie_change_tls_to_gen1(struct pci_dev *parent)

Just a nitpick, so feel free to ignore it.  I would just call the
variable "dev" as we pass a pointer to a particular device, but it does
not matter as much, so I am leaving this to you.

[...]
> +	if (ret == 0) {

You prefer this style over "if (!ret)"?  Just asking in the view of the
style that seem to be preferred in the code base at the moment.

> +		/* Verify that new value was really set */
> +		pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &reg16);
> +		if ((reg16 & PCI_EXP_LNKCTL2_TLS) != PCI_EXP_LNKCTL2_TLS_2_5GT)
> +			ret = -EINVAL;

I am wondering about this verification - did you have a case where the
device would not properly set its capability, or accept the write and do
nothing?

> +	if (ret != 0)

I think "if (ret)" would be fine to use here, unless you prefer being
more explicit.  See my question about style above.

>  static bool pcie_retrain_link(struct pcie_link_state *link)
>  {
>  	struct pci_dev *parent = link->pdev;
>  	unsigned long end_jiffies;
>  	u16 reg16;
> +	u32 reg32;
> +
> +		/* Check if link is capable of higher speed than 2.5 GT/s and needs quirk */
> +		pcie_capability_read_dword(parent, PCI_EXP_LNKCAP, &reg32);
> +		if ((reg32 & PCI_EXP_LNKCAP_SLS) > PCI_EXP_LNKCAP_SLS_2_5GB) {

I wonder if moving this check to pcie_change_tls_to_gen1() would make
more sense?  It would then make this function a little cleaner.  What do
you think?

[...]
> +static void quirk_no_bus_reset_and_no_retrain_link(struct pci_dev *dev)
> +{
> +	dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET | PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1;
> +}
[...]

I know that the style has been changed to allow 100 characters width and
that checkpatch.pl now also does not warn about line length, as per
commit bdc48fa11e46 ("checkpatch/coding-style: deprecate 80-column
warning"), but I think Bjorn still prefers 80 characters, thus this line
above might have to be aligned.

Krzysztof
Pali Rohár March 27, 2021, 1:29 p.m. UTC | #3
Hello!

On Saturday 27 March 2021 01:14:10 Krzysztof Wilczyński wrote:
> Hi Pali,
> 
> Thank you for sending the patch over!
> 
> [...]
> > +static int pcie_change_tls_to_gen1(struct pci_dev *parent)
> 
> Just a nitpick, so feel free to ignore it.  I would just call the
> variable "dev" as we pass a pointer to a particular device, but it does
> not matter as much, so I am leaving this to you.

I called it 'parent' because it is called 'parent' also in caller
function. Link consists of two devices, so 'dev' could be ambiguous.

> [...]
> > +	if (ret == 0) {
> 
> You prefer this style over "if (!ret)"?  Just asking in the view of the
> style that seem to be preferred in the code base at the moment.

I can change this to 'if (!ret)' if needed, no problem.

I use 'if (!val)' mostly for boolean and pointer variables. If variable
can contain more integer values then I lot of times I use '=='.

> > +		/* Verify that new value was really set */
> > +		pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &reg16);
> > +		if ((reg16 & PCI_EXP_LNKCTL2_TLS) != PCI_EXP_LNKCTL2_TLS_2_5GT)
> > +			ret = -EINVAL;
> 
> I am wondering about this verification - did you have a case where the
> device would not properly set its capability, or accept the write and do
> nothing?

I do not know any real device which is doing this thing.

But this issue can happen with kernel's PCIe emulated root bridge:
drivers/pci/pci-bridge-emul.c

Drivers which are using this emulated root bridge (and both pci-mvebu.c
and pci-aardvark.c are using it!) do not have to implement callback for
every read and every write operation of every register.

Note that both pci-mvebu.c and pci-aardvark.c currently does not
implement access to HW register PCI_EXP_LNKCTL2. So currently it is not
possible to set PCI_EXP_LNKCTL2_TLS_2_5GT. And above code correctly
fails and disallow ASPM code to retrain link.

I have some WIP patches which implement LNKCAP2, LNKCTL2 and LNKSTA2
read/write callbacks on emulated bridge for pci-mvebu.c and
pci-aardvark.c. And I have tested that with those WIP patches ASPM code
can correctly switch link to 2.5GT/s and enable ASPM.

> > +	if (ret != 0)
> 
> I think "if (ret)" would be fine to use here, unless you prefer being
> more explicit.  See my question about style above.
> 
> >  static bool pcie_retrain_link(struct pcie_link_state *link)
> >  {
> >  	struct pci_dev *parent = link->pdev;
> >  	unsigned long end_jiffies;
> >  	u16 reg16;
> > +	u32 reg32;
> > +
> > +		/* Check if link is capable of higher speed than 2.5 GT/s and needs quirk */
> > +		pcie_capability_read_dword(parent, PCI_EXP_LNKCAP, &reg32);
> > +		if ((reg32 & PCI_EXP_LNKCAP_SLS) > PCI_EXP_LNKCAP_SLS_2_5GB) {
> 
> I wonder if moving this check to pcie_change_tls_to_gen1() would make
> more sense?  It would then make this function a little cleaner.  What do
> you think?

Ok, no problem. But then function needs to be renamed. Any idea how
should be this function called?

> [...]
> > +static void quirk_no_bus_reset_and_no_retrain_link(struct pci_dev *dev)
> > +{
> > +	dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET | PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1;
> > +}
> [...]
> 
> I know that the style has been changed to allow 100 characters width and
> that checkpatch.pl now also does not warn about line length, as per
> commit bdc48fa11e46 ("checkpatch/coding-style: deprecate 80-column
> warning"), but I think Bjorn still prefers 80 characters, thus this line
> above might have to be aligned.

Ok! If needed I can reformat patch to 80 characters per line.
Marek Behún March 27, 2021, 2:42 p.m. UTC | #4
On Sat, 27 Mar 2021 14:29:59 +0100
Pali Rohár <pali@kernel.org> wrote:

> I can change this to 'if (!ret)' if needed, no problem.
> 
> I use 'if (!val)' mostly for boolean and pointer variables. If
> variable can contain more integer values then I lot of times I use
> '=='.

Comparing integer varibales with explicit literals is sensible, but
if a function returning integer returns 0 on success and negative value
on error, Linux kernel has a tradition of using just
  if (!ret)
or 
  if (ret)

Marek
Pali Rohár March 27, 2021, 5:33 p.m. UTC | #5
On Saturday 27 March 2021 15:42:13 Marek Behún wrote:
> On Sat, 27 Mar 2021 14:29:59 +0100
> Pali Rohár <pali@kernel.org> wrote:
> 
> > I can change this to 'if (!ret)' if needed, no problem.
> > 
> > I use 'if (!val)' mostly for boolean and pointer variables. If
> > variable can contain more integer values then I lot of times I use
> > '=='.
> 
> Comparing integer varibales with explicit literals is sensible, but
> if a function returning integer returns 0 on success and negative value
> on error, Linux kernel has a tradition of using just
>   if (!ret)
> or 
>   if (ret)
> 
> Marek

Ok, I will change it.
diff mbox series

Patch

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index ac0557a305af..ea5bdf6107f6 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -192,11 +192,54 @@  static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
 	link->clkpm_disable = blacklist ? 1 : 0;
 }
 
+static int pcie_change_tls_to_gen1(struct pci_dev *parent)
+{
+	u16 reg16;
+	u32 reg32;
+	int ret;
+
+	/* Check if link speed can be forced to 2.5 GT/s */
+	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &reg32);
+	if (!(reg32 & PCI_EXP_LNKCAP2_SLS_2_5GB)) {
+		pci_err(parent, "ASPM: Bridge does not support changing Link Speed to 2.5 GT/s\n");
+		return -EOPNOTSUPP;
+	}
+
+	/* Force link speed to 2.5 GT/s */
+	ret = pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
+						 PCI_EXP_LNKCTL2_TLS,
+						 PCI_EXP_LNKCTL2_TLS_2_5GT);
+	if (ret == 0) {
+		/* Verify that new value was really set */
+		pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &reg16);
+		if ((reg16 & PCI_EXP_LNKCTL2_TLS) != PCI_EXP_LNKCTL2_TLS_2_5GT)
+			ret = -EINVAL;
+	}
+
+	if (ret != 0)
+		pci_err(parent, "ASPM: Changing Target Link Speed to 2.5 GT/s failed: %d\n", ret);
+
+	return ret;
+}
+
 static bool pcie_retrain_link(struct pcie_link_state *link)
 {
 	struct pci_dev *parent = link->pdev;
 	unsigned long end_jiffies;
 	u16 reg16;
+	u32 reg32;
+
+	if (link->downstream->dev_flags & PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1) {
+		/* Check if link is capable of higher speed than 2.5 GT/s and needs quirk */
+		pcie_capability_read_dword(parent, PCI_EXP_LNKCAP, &reg32);
+		if ((reg32 & PCI_EXP_LNKCAP_SLS) > PCI_EXP_LNKCAP_SLS_2_5GB) {
+			if (pcie_change_tls_to_gen1(parent) != 0) {
+				pci_err(parent, "ASPM: Retrain Link at higher speed is disallowed by quirk\n");
+				return false;
+			}
+			pci_info(parent, "ASPM: Target Link Speed changed to 2.5 GT/s due to quirk\n");
+		}
+	}
 
 	pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &reg16);
 	reg16 |= PCI_EXP_LNKCTL_RL;
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e3ba9e..8ff690c7679d 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3558,6 +3558,11 @@  static void quirk_no_bus_reset(struct pci_dev *dev)
 	dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
 }
 
+static void quirk_no_bus_reset_and_no_retrain_link(struct pci_dev *dev)
+{
+	dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET | PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1;
+}
+
 /*
  * Some Atheros AR9xxx and QCA988x chips do not behave after a bus reset.
  * The device will throw a Link Down error on AER-capable systems and
@@ -3567,10 +3572,18 @@  static void quirk_no_bus_reset(struct pci_dev *dev)
  */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0030, quirk_no_bus_reset);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0032, quirk_no_bus_reset);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033, quirk_no_bus_reset);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0034, quirk_no_bus_reset);
 
+/*
+ * Atheros QCA9880 and QCA9890 chips do not behave after a bus reset and also
+ * after retrain link when PCIe bridge is not in GEN1 mode at 2.5 GT/s speed.
+ * The device will throw a Link Down error and config space is not accessible
+ * again. Retrain link can be called only when using GEN1 PCIe bridge or when
+ * PCIe bridge has forced link speed to 2.5 GT/s via PCI_EXP_LNKCTL2 register.
+ */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset_and_no_retrain_link);
+
 /*
  * Root port on some Cavium CN8xxx chips do not successfully complete a bus
  * reset when used with certain child devices.  After the reset, config
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 86c799c97b77..fdbf7254e4ab 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -227,6 +227,8 @@  enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
 	/* Don't use Relaxed Ordering for TLPs directed at this device */
 	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
+	/* Don't Retrain Link for device when bridge is not in GEN1 mode */
+	PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1 = (__force pci_dev_flags_t) (1 << 12),
 };
 
 enum pci_irq_reroute_variant {