
PCI: Disallow ASPM on ASMedia ASM1083/1085 PCIe-PCI bridge

Message ID 20200722021803.17958-1-hancockrwd@gmail.com
State Accepted, archived
Delegated to: Bjorn Helgaas
Series PCI: Disallow ASPM on ASMedia ASM1083/1085 PCIe-PCI bridge

Commit Message

Robert Hancock July 22, 2020, 2:18 a.m. UTC
Recently, ASPM handling was changed to no longer disable ASPM on all
PCIe-to-PCI bridges. Unfortunately, these ASMedia PCIe-to-PCI bridge
devices don't seem to function properly with ASPM enabled: they cause
the parent PCIe root port to report repeated AER timeout errors. In
addition to flooding the kernel log, this also causes the machine to
wake up immediately after suspend is initiated.

Fixes: 66ff14e59e8a ("PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges")
Cc: stable@vger.kernel.org
Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
---
 drivers/pci/quirks.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Bjorn Helgaas July 22, 2020, 5:40 p.m. UTC | #1
[+cc Puranjay]

On Tue, Jul 21, 2020 at 08:18:03PM -0600, Robert Hancock wrote:
> Recently ASPM handling was changed to no longer disable ASPM on all
> PCIe to PCI bridges. Unfortunately these ASMedia PCIe to PCI bridge
> devices don't seem to function properly with ASPM enabled, as they
> cause the parent PCIe root port to cause repeated AER timeout errors.
> In addition to flooding the kernel log, this also causes the machine
> to wake up immediately after suspend is initiated.

Hi Robert, thanks a lot for the report of this problem
(https://lore.kernel.org/r/CADLC3L1R2hssRjxHJv9yhdN_7-hGw58rXSfNp-FraZh0Tw+gRw@mail.gmail.com
and https://bugzilla.redhat.com/show_bug.cgi?id=1853960).

I'm pretty sure Linux ASPM support is missing some things.  This
problem might be a hardware problem where a quirk is the right
solution, but it could also be that it's a result of a Linux defect
that we should fix.

Could you collect the dmesg log and "sudo lspci -vvxxxx" output
somewhere (maybe a bugzilla.kernel.org issue)?  I want to figure out
whether L1 PM Substates are enabled on this link, and whether
that's configured correctly.
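For reference, the L1 PM Substates extended capability has a Control 1 register (offset 0x08 within the capability) whose low four bits enable the individual substates. The sketch below decodes a raw value of that register; the bit values mirror the PCI_L1SS_CTL1_* definitions in the kernel's pci_regs.h, while the helper names are hypothetical and obtaining the raw value (for example from an "lspci -xxxx" dump at the capability offset plus 0x08) is assumed rather than shown.

```c
#include <stdint.h>
#include <stdio.h>

/* Values mirror PCI_L1SS_CTL1_* in include/uapi/linux/pci_regs.h. */
#define L1SS_CTL1_PCIPM_L1_2 0x00000001u /* PCI-PM L1.2 Enable */
#define L1SS_CTL1_PCIPM_L1_1 0x00000002u /* PCI-PM L1.1 Enable */
#define L1SS_CTL1_ASPM_L1_2  0x00000004u /* ASPM L1.2 Enable */
#define L1SS_CTL1_ASPM_L1_1  0x00000008u /* ASPM L1.1 Enable */
#define L1SS_CTL1_L1SS_MASK  0x0000000fu /* all four enable bits */

/* Hypothetical helper: nonzero if any L1 PM substate is enabled. */
int l1ss_any_enabled(uint32_t ctl1)
{
	return (ctl1 & L1SS_CTL1_L1SS_MASK) != 0;
}

/* Hypothetical helper: print each enable bit, '+' if set, '-' if clear. */
void l1ss_print(uint32_t ctl1)
{
	printf("PCI-PM_L1.2%c PCI-PM_L1.1%c ASPM_L1.2%c ASPM_L1.1%c\n",
	       (ctl1 & L1SS_CTL1_PCIPM_L1_2) ? '+' : '-',
	       (ctl1 & L1SS_CTL1_PCIPM_L1_1) ? '+' : '-',
	       (ctl1 & L1SS_CTL1_ASPM_L1_2) ? '+' : '-',
	       (ctl1 & L1SS_CTL1_ASPM_L1_1) ? '+' : '-');
}
```

For example, l1ss_print(0x0000000a) prints "PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+". The upper bits of the register hold timing parameters, which is why the check masks them off.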

> Fixes: 66ff14e59e8a ("PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges")
> Cc: stable@vger.kernel.org
> Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
> ---
>  drivers/pci/quirks.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 812bfc32ecb8..e5713114f2ab 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -2330,6 +2330,19 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f1, quirk_disable_aspm_l0s);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f4, quirk_disable_aspm_l0s);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1508, quirk_disable_aspm_l0s);
>  
> +static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
> +{
> +	pci_info(dev, "Disabling ASPM L0s/L1\n");
> +	pci_disable_link_state(dev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1);
> +}
> +
> +/*
> + * ASM1083/1085 PCIe-PCI bridge devices cause AER timeout errors on the
> + * upstream PCIe root port when ASPM is enabled. At least L0s mode is affected,
> + * disable both L0s and L1 for now to be safe.
> + */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
> +
>  /*
>   * Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe Retrain
>   * Link bit cleared after starting the link retrain process to allow this
> -- 
> 2.26.2
>

Robert Hancock July 23, 2020, 12:46 a.m. UTC | #2
On Wed, Jul 22, 2020 at 11:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Puranjay]
>
> On Tue, Jul 21, 2020 at 08:18:03PM -0600, Robert Hancock wrote:
> > Recently ASPM handling was changed to no longer disable ASPM on all
> > PCIe to PCI bridges. Unfortunately these ASMedia PCIe to PCI bridge
> > devices don't seem to function properly with ASPM enabled, as they
> > cause the parent PCIe root port to cause repeated AER timeout errors.
> > In addition to flooding the kernel log, this also causes the machine
> > to wake up immediately after suspend is initiated.
>
> Hi Robert, thanks a lot for the report of this problem
> (https://lore.kernel.org/r/CADLC3L1R2hssRjxHJv9yhdN_7-hGw58rXSfNp-FraZh0Tw+gRw@mail.gmail.com
> and https://bugzilla.redhat.com/show_bug.cgi?id=1853960).
>
> I'm pretty sure Linux ASPM support is missing some things.  This
> problem might be a hardware problem where a quirk is the right
> solution, but it could also be that it's a result of a Linux defect
> that we should fix.
>
> Could you collect the dmesg log and "sudo lspci -vvxxxx" output
> somewhere (maybe a bugzilla.kernel.org issue)?  I want to figure out
> whether this L1 PM substates are enabled on this link, and whether
> that's configured correctly.

Created a Bugzilla entry and added dmesg and lspci output:
https://bugzilla.kernel.org/show_bug.cgi?id=208667

As I noted in that report, I subsequently found this page on ASMedia's
site: https://www.asmedia.com.tw/eng/e_show_products.php?cate_index=169&item=114
which indicates that the ASM1083 has "No PCIe ASPM support". It's
not clear why this problem doesn't occur on Windows, however: either
Windows is not enabling ASPM, ASPM somehow doesn't cause issues with
the PCIe link there, or it is causing issues and Windows just doesn't
notify the user in any way. I can try to check whether this bridge
device ends up with ASPM enabled under Windows 10 or not.

Bjorn Helgaas July 23, 2020, 1:04 a.m. UTC | #3
On Wed, Jul 22, 2020 at 06:46:06PM -0600, Robert Hancock wrote:
> On Wed, Jul 22, 2020 at 11:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, Jul 21, 2020 at 08:18:03PM -0600, Robert Hancock wrote:
> > > Recently ASPM handling was changed to no longer disable ASPM on all
> > > PCIe to PCI bridges. Unfortunately these ASMedia PCIe to PCI bridge
> > > devices don't seem to function properly with ASPM enabled, as they
> > > cause the parent PCIe root port to cause repeated AER timeout errors.
> > > In addition to flooding the kernel log, this also causes the machine
> > > to wake up immediately after suspend is initiated.
> >
> > Hi Robert, thanks a lot for the report of this problem
> > (https://lore.kernel.org/r/CADLC3L1R2hssRjxHJv9yhdN_7-hGw58rXSfNp-FraZh0Tw+gRw@mail.gmail.com
> > and https://bugzilla.redhat.com/show_bug.cgi?id=1853960).
> >
> > I'm pretty sure Linux ASPM support is missing some things.  This
> > problem might be a hardware problem where a quirk is the right
> > solution, but it could also be that it's a result of a Linux defect
> > that we should fix.
> >
> > Could you collect the dmesg log and "sudo lspci -vvxxxx" output
> > somewhere (maybe a bugzilla.kernel.org issue)?  I want to figure out
> > whether this L1 PM substates are enabled on this link, and whether
> > that's configured correctly.
> 
> Created a Bugzilla entry and added dmesg and lspci output:
> https://bugzilla.kernel.org/show_bug.cgi?id=208667
> 
> As I noted in that report, I subsequently found this page on ASMedia's
> site: https://www.asmedia.com.tw/eng/e_show_products.php?cate_index=169&item=114
> which indicates this ASM1083 device has "No PCIe ASPM support".

How nice.  According to your lspci, the device itself claims to
support ASPM:

  02:00.0 ... ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge
    LnkCap: ... ASPM L0s L1 ...

but the web page claims otherwise.  That would mean the device is
defective for claiming something that's not true.  Or possibly those
capability bits can be set by BIOS.
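The "ASPM L0s L1" that lspci shows under LnkCap comes from the ASPM Support field, bits 11:10 of the Link Capabilities register. A minimal sketch of that decoding follows; the mask values mirror PCI_EXP_LNKCAP_ASPMS, PCI_EXP_LNKCAP_ASPM_L0S and PCI_EXP_LNKCAP_ASPM_L1 from pci_regs.h, and the helper name is hypothetical. Note that it only reports what the device advertises, which is exactly the claim in question here.

```c
#include <stdint.h>

/* Values mirror PCI_EXP_LNKCAP_ASPMS and friends in pci_regs.h. */
#define LNKCAP_ASPMS    0x00000c00u /* ASPM Support field, bits 11:10 */
#define LNKCAP_ASPM_L0S 0x00000400u /* L0s supported */
#define LNKCAP_ASPM_L1  0x00000800u /* L1 supported */

/*
 * Hypothetical helper: describe the ASPM support a device advertises in
 * its Link Capabilities register. This reflects only the device's claim;
 * the claim may be wrong, or firmware may have set these bits.
 */
const char *lnkcap_aspm_support(uint32_t lnkcap)
{
	switch (lnkcap & LNKCAP_ASPMS) {
	case LNKCAP_ASPM_L0S:
		return "L0s";
	case LNKCAP_ASPM_L1:
		return "L1";
	case LNKCAP_ASPM_L0S | LNKCAP_ASPM_L1:
		return "L0s L1";
	default:
		return "none";
	}
}
```

Masking first matters because the same register also carries the maximum link speed and width in its low bits.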

> It's not clear why this problem isn't occurring on Windows however -
> either it is not enabling ASPM, somehow it doesn't cause issues with
> the PCIe link, or it is causing issues and just doesn't notify the
> user in any way. I can try and check if this bridge device is ending
> up with ASPM enabled under Windows 10 or not..

If Windows *does* manage to enable ASPM, that would be interesting.  I
don't know whether Windows has a similar quirk mechanism.  I suppose
they must have *some* way to work around defective devices.

Bjorn

Robert Hancock July 23, 2020, 1:41 a.m. UTC | #4
On Wed, Jul 22, 2020 at 7:04 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Jul 22, 2020 at 06:46:06PM -0600, Robert Hancock wrote:
> > On Wed, Jul 22, 2020 at 11:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Tue, Jul 21, 2020 at 08:18:03PM -0600, Robert Hancock wrote:
> > > > Recently ASPM handling was changed to no longer disable ASPM on all
> > > > PCIe to PCI bridges. Unfortunately these ASMedia PCIe to PCI bridge
> > > > devices don't seem to function properly with ASPM enabled, as they
> > > > cause the parent PCIe root port to cause repeated AER timeout errors.
> > > > In addition to flooding the kernel log, this also causes the machine
> > > > to wake up immediately after suspend is initiated.
> > >
> > > Hi Robert, thanks a lot for the report of this problem
> > > (https://lore.kernel.org/r/CADLC3L1R2hssRjxHJv9yhdN_7-hGw58rXSfNp-FraZh0Tw+gRw@mail.gmail.com
> > > and https://bugzilla.redhat.com/show_bug.cgi?id=1853960).
> > >
> > > I'm pretty sure Linux ASPM support is missing some things.  This
> > > problem might be a hardware problem where a quirk is the right
> > > solution, but it could also be that it's a result of a Linux defect
> > > that we should fix.
> > >
> > > Could you collect the dmesg log and "sudo lspci -vvxxxx" output
> > > somewhere (maybe a bugzilla.kernel.org issue)?  I want to figure out
> > > whether this L1 PM substates are enabled on this link, and whether
> > > that's configured correctly.
> >
> > Created a Bugzilla entry and added dmesg and lspci output:
> > https://bugzilla.kernel.org/show_bug.cgi?id=208667
> >
> > As I noted in that report, I subsequently found this page on ASMedia's
> > site: https://www.asmedia.com.tw/eng/e_show_products.php?cate_index=169&item=114
> > which indicates this ASM1083 device has "No PCIe ASPM support".
>
> How nice.  According to your lspci, the device itself claims to
> support ASPM:
>
>   02:00.0 ... ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge
>     LnkCap: ... ASPM L0s L1 ...
>
> but the web page claims otherwise.  That would mean the device is
> defective for claiming something that's not true.  Or possibly those
> capability bits can be set by BIOS.
>
> > It's not clear why this problem isn't occurring on Windows however -
> > either it is not enabling ASPM, somehow it doesn't cause issues with
> > the PCIe link, or it is causing issues and just doesn't notify the
> > user in any way. I can try and check if this bridge device is ending
> > up with ASPM enabled under Windows 10 or not..
>
> If Windows *does* manage to enable ASPM, that would be interesting.  I
> don't know whether Windows has a similar quirk mechanism.  I suppose
> they must have *some* way to work around defective devices.

As I posted in the Bugzilla report, based on lspci output it appears
that Windows does have ASPM L0s enabled for this bridge. However,
Windows appears to have exactly the same problem: correctable PCIe
error entries show up in the Windows system event log against the
root port the bridge is connected to. So I think this hardware is
simply broken with ASPM enabled.
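Whether ASPM is actually enabled (as opposed to advertised) is visible in the ASPM Control field, bits 1:0 of each port's Link Control register, which lspci reports under LnkCtl. The sketch below decodes that field and shows the register-level effect of turning off both states on one port. The mask values mirror PCI_EXP_LNKCTL_ASPMC, PCI_EXP_LNKCTL_ASPM_L0S and PCI_EXP_LNKCTL_ASPM_L1 from pci_regs.h; the helper names are hypothetical, and the kernel's pci_disable_link_state() does more than this (it records the policy and programs the link as a whole), which the sketch does not model.

```c
#include <stdint.h>

/* Values mirror PCI_EXP_LNKCTL_ASPMC and friends in pci_regs.h. */
#define LNKCTL_ASPMC    0x0003u /* ASPM Control field, bits 1:0 */
#define LNKCTL_ASPM_L0S 0x0001u /* L0s entry enabled */
#define LNKCTL_ASPM_L1  0x0002u /* L1 entry enabled */

/* Hypothetical helper: which ASPM states are enabled on one port. */
const char *lnkctl_aspm_enabled(uint16_t lnkctl)
{
	switch (lnkctl & LNKCTL_ASPMC) {
	case LNKCTL_ASPM_L0S:
		return "L0s";
	case LNKCTL_ASPM_L1:
		return "L1";
	case LNKCTL_ASPM_L0S | LNKCTL_ASPM_L1:
		return "L0s L1";
	default:
		return "disabled";
	}
}

/* Hypothetical helper: the Link Control value with both L0s and L1
 * entry disabled, leaving every other bit untouched. */
uint16_t lnkctl_clear_aspm(uint16_t lnkctl)
{
	return (uint16_t)(lnkctl & ~LNKCTL_ASPMC);
}
```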

Bjorn Helgaas July 29, 2020, 11:43 p.m. UTC | #5
On Tue, Jul 21, 2020 at 08:18:03PM -0600, Robert Hancock wrote:
> Recently ASPM handling was changed to no longer disable ASPM on all
> PCIe to PCI bridges. Unfortunately these ASMedia PCIe to PCI bridge
> devices don't seem to function properly with ASPM enabled, as they
> cause the parent PCIe root port to cause repeated AER timeout errors.
> In addition to flooding the kernel log, this also causes the machine
> to wake up immediately after suspend is initiated.
> 
> Fixes: 66ff14e59e8a ("PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges")
> Cc: stable@vger.kernel.org
> Signed-off-by: Robert Hancock <hancockrwd@gmail.com>

I applied this to for-linus for v5.8, since 66ff14e59e8a was merged
for v5.8.  Thanks very much for finding, debugging, and fixing this!

66ff14e59e8a wasn't marked for stable, so if it *was* backported to
stable kernels, I assume whatever process backported it will also
catch this because of the Fixes: tag.

> ---
>  drivers/pci/quirks.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 812bfc32ecb8..e5713114f2ab 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -2330,6 +2330,19 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f1, quirk_disable_aspm_l0s);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f4, quirk_disable_aspm_l0s);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1508, quirk_disable_aspm_l0s);
>  
> +static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
> +{
> +	pci_info(dev, "Disabling ASPM L0s/L1\n");
> +	pci_disable_link_state(dev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1);
> +}
> +
> +/*
> + * ASM1083/1085 PCIe-PCI bridge devices cause AER timeout errors on the
> + * upstream PCIe root port when ASPM is enabled. At least L0s mode is affected,
> + * disable both L0s and L1 for now to be safe.
> + */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
> +
>  /*
>   * Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe Retrain
>   * Link bit cleared after starting the link retrain process to allow this
> -- 
> 2.26.2
>

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 812bfc32ecb8..e5713114f2ab 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2330,6 +2330,19 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f1, quirk_disable_aspm_l0s);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f4, quirk_disable_aspm_l0s);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1508, quirk_disable_aspm_l0s);
 
+static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
+{
+	pci_info(dev, "Disabling ASPM L0s/L1\n");
+	pci_disable_link_state(dev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1);
+}
+
+/*
+ * ASM1083/1085 PCIe-PCI bridge devices cause AER timeout errors on the
+ * upstream PCIe root port when ASPM is enabled. At least L0s mode is affected,
+ * disable both L0s and L1 for now to be safe.
+ */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
+
 /*
  * Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe Retrain
  * Link bit cleared after starting the link retrain process to allow this
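To illustrate what the quirk above amounts to, here is a userspace model of the matching step: each DECLARE_PCI_FIXUP_FINAL() entry pairs a vendor/device ID with a hook, and the hook runs for every device whose IDs match. This is a simplified sketch, not the kernel's implementation (the real entries live in a linker section and also support class matching). The struct names, flag values and apply_final_fixups() are hypothetical; 0x1b21 is the value of PCI_VENDOR_ID_ASMEDIA.

```c
#include <stddef.h>
#include <stdint.h>

#define ANY_ID            0xffffu /* stand-in for (u16)PCI_ANY_ID */
#define VENDOR_ID_ASMEDIA 0x1b21u /* PCI_VENDOR_ID_ASMEDIA */
#define LINK_STATE_L0S    0x1u    /* hypothetical flag values, not */
#define LINK_STATE_L1     0x2u    /* the kernel's PCIE_LINK_STATE_* */

struct fake_dev {
	uint16_t vendor;
	uint16_t device;
	unsigned int aspm_disable; /* link states ruled out by quirks */
};

struct fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct fake_dev *dev);
};

/* Model of quirk_disable_aspm_l0s_l1(): record that L0s and L1 are off. */
static void quirk_disable_aspm_l0s_l1(struct fake_dev *dev)
{
	dev->aspm_disable |= LINK_STATE_L0S | LINK_STATE_L1;
}

/* One entry, mirroring the DECLARE_PCI_FIXUP_FINAL() line in the patch. */
static const struct fixup final_fixups[] = {
	{ VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1 },
};

/* Run every hook whose vendor/device IDs match (or are wildcards). */
void apply_final_fixups(struct fake_dev *dev)
{
	size_t n = sizeof(final_fixups) / sizeof(final_fixups[0]);

	for (size_t i = 0; i < n; i++) {
		const struct fixup *f = &final_fixups[i];

		if ((f->vendor == dev->vendor || f->vendor == ANY_ID) &&
		    (f->device == dev->device || f->device == ANY_ID))
			f->hook(dev);
	}
}

/* States that may still be enabled: what the device advertises,
 * minus anything a quirk ruled out. */
unsigned int aspm_allowed(const struct fake_dev *dev, unsigned int advertised)
{
	return advertised & ~dev->aspm_disable;
}
```

With this model, an ASM1083/1085 bridge (1b21:1080) that advertises L0s and L1 ends up with neither allowed, while other ASMedia devices are untouched. That mirrors the patch's choice of disabling both states "to be safe" even though only L0s was observed to fail.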