Message ID | 20200722021803.17958-1-hancockrwd@gmail.com (mailing list archive) |
---|---|
State | Accepted, archived |
Delegated to: | Bjorn Helgaas |
Series | PCI: Disallow ASPM on ASMedia ASM1083/1085 PCIe-PCI bridge |
[+cc Puranjay]

On Tue, Jul 21, 2020 at 08:18:03PM -0600, Robert Hancock wrote:
> Recently ASPM handling was changed to no longer disable ASPM on all
> PCIe to PCI bridges. Unfortunately these ASMedia PCIe to PCI bridge
> devices don't seem to function properly with ASPM enabled, as they
> cause the parent PCIe root port to cause repeated AER timeout errors.
> In addition to flooding the kernel log, this also causes the machine
> to wake up immediately after suspend is initiated.

Hi Robert, thanks a lot for the report of this problem
(https://lore.kernel.org/r/CADLC3L1R2hssRjxHJv9yhdN_7-hGw58rXSfNp-FraZh0Tw+gRw@mail.gmail.com
and https://bugzilla.redhat.com/show_bug.cgi?id=1853960).

I'm pretty sure Linux ASPM support is missing some things. This
problem might be a hardware problem where a quirk is the right
solution, but it could also be that it's a result of a Linux defect
that we should fix.

Could you collect the dmesg log and "sudo lspci -vvxxxx" output
somewhere (maybe a bugzilla.kernel.org issue)? I want to figure out
whether L1 PM substates are enabled on this link, and whether
that's configured correctly.

> Fixes: 66ff14e59e8a ("PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges")
> Cc: stable@vger.kernel.org
> Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
> ---
>  drivers/pci/quirks.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 812bfc32ecb8..e5713114f2ab 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -2330,6 +2330,19 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f1, quirk_disable_aspm_l0s);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f4, quirk_disable_aspm_l0s);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1508, quirk_disable_aspm_l0s);
>
> +static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
> +{
> +	pci_info(dev, "Disabling ASPM L0s/L1\n");
> +	pci_disable_link_state(dev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1);
> +}
> +
> +/*
> + * ASM1083/1085 PCIe-PCI bridge devices cause AER timeout errors on the
> + * upstream PCIe root port when ASPM is enabled. At least L0s mode is affected,
> + * disable both L0s and L1 for now to be safe.
> + */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
> +
>  /*
>   * Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe Retrain
>   * Link bit cleared after starting the link retrain process to allow this
> --
> 2.26.2
>
On Wed, Jul 22, 2020 at 11:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Puranjay]
>
> On Tue, Jul 21, 2020 at 08:18:03PM -0600, Robert Hancock wrote:
> > Recently ASPM handling was changed to no longer disable ASPM on all
> > PCIe to PCI bridges. Unfortunately these ASMedia PCIe to PCI bridge
> > devices don't seem to function properly with ASPM enabled, as they
> > cause the parent PCIe root port to cause repeated AER timeout errors.
> > In addition to flooding the kernel log, this also causes the machine
> > to wake up immediately after suspend is initiated.
>
> Hi Robert, thanks a lot for the report of this problem
> (https://lore.kernel.org/r/CADLC3L1R2hssRjxHJv9yhdN_7-hGw58rXSfNp-FraZh0Tw+gRw@mail.gmail.com
> and https://bugzilla.redhat.com/show_bug.cgi?id=1853960).
>
> I'm pretty sure Linux ASPM support is missing some things. This
> problem might be a hardware problem where a quirk is the right
> solution, but it could also be that it's a result of a Linux defect
> that we should fix.
>
> Could you collect the dmesg log and "sudo lspci -vvxxxx" output
> somewhere (maybe a bugzilla.kernel.org issue)? I want to figure out
> whether L1 PM substates are enabled on this link, and whether
> that's configured correctly.

Created a Bugzilla entry and added dmesg and lspci output:
https://bugzilla.kernel.org/show_bug.cgi?id=208667

As I noted in that report, I subsequently found this page on ASMedia's
site: https://www.asmedia.com.tw/eng/e_show_products.php?cate_index=169&item=114
which indicates this ASM1083 device has "No PCIe ASPM support".

It's not clear why this problem isn't occurring on Windows however -
either it is not enabling ASPM, somehow it doesn't cause issues with
the PCIe link, or it is causing issues and just doesn't notify the
user in any way. I can try and check if this bridge device is ending
up with ASPM enabled under Windows 10 or not..
On Wed, Jul 22, 2020 at 06:46:06PM -0600, Robert Hancock wrote:
> On Wed, Jul 22, 2020 at 11:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, Jul 21, 2020 at 08:18:03PM -0600, Robert Hancock wrote:
> > > Recently ASPM handling was changed to no longer disable ASPM on all
> > > PCIe to PCI bridges. Unfortunately these ASMedia PCIe to PCI bridge
> > > devices don't seem to function properly with ASPM enabled, as they
> > > cause the parent PCIe root port to cause repeated AER timeout errors.
> > > In addition to flooding the kernel log, this also causes the machine
> > > to wake up immediately after suspend is initiated.
> >
> > Hi Robert, thanks a lot for the report of this problem
> > (https://lore.kernel.org/r/CADLC3L1R2hssRjxHJv9yhdN_7-hGw58rXSfNp-FraZh0Tw+gRw@mail.gmail.com
> > and https://bugzilla.redhat.com/show_bug.cgi?id=1853960).
> >
> > I'm pretty sure Linux ASPM support is missing some things. This
> > problem might be a hardware problem where a quirk is the right
> > solution, but it could also be that it's a result of a Linux defect
> > that we should fix.
> >
> > Could you collect the dmesg log and "sudo lspci -vvxxxx" output
> > somewhere (maybe a bugzilla.kernel.org issue)? I want to figure out
> > whether L1 PM substates are enabled on this link, and whether
> > that's configured correctly.
>
> Created a Bugzilla entry and added dmesg and lspci output:
> https://bugzilla.kernel.org/show_bug.cgi?id=208667
>
> As I noted in that report, I subsequently found this page on ASMedia's
> site: https://www.asmedia.com.tw/eng/e_show_products.php?cate_index=169&item=114
> which indicates this ASM1083 device has "No PCIe ASPM support".

How nice. According to your lspci, the device itself claims to
support ASPM:

  02:00.0 ... ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge
    LnkCap: ... ASPM L0s L1 ...

but the web page claims otherwise. That would mean the device is
defective for claiming something that's not true. Or possibly those
capability bits can be set by BIOS.

> It's not clear why this problem isn't occurring on Windows however -
> either it is not enabling ASPM, somehow it doesn't cause issues with
> the PCIe link, or it is causing issues and just doesn't notify the
> user in any way. I can try and check if this bridge device is ending
> up with ASPM enabled under Windows 10 or not..

If Windows *does* manage to enable ASPM, that would be interesting. I
don't know whether Windows has a similar quirk mechanism. I suppose
they must have *some* way to work around defective devices.

Bjorn
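For readers following the LnkCap discussion above: the "ASPM L0s L1" text that lspci prints comes from the ASPM Support field, bits 11:10 of the PCIe Link Capabilities register. The sketch below decodes that field in plain userspace C. The bit values match the field layout in the PCIe specification, but `struct aspm_support` and `decode_lnkcap_aspm()` are hypothetical names made up for this illustration and are not kernel or pciutils APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASPM Support field: bits 11:10 of the PCIe Link Capabilities register. */
#define LNKCAP_ASPM_L0S (1u << 10)
#define LNKCAP_ASPM_L1  (1u << 11)

/* Hypothetical container for the decoded field; not a kernel type. */
struct aspm_support {
	bool l0s;
	bool l1;
};

/* Report which ASPM states a device advertises in a raw LnkCap value. */
static void decode_lnkcap_aspm(uint32_t lnkcap, struct aspm_support *out)
{
	assert(out != NULL);
	out->l0s = (lnkcap & LNKCAP_ASPM_L0S) != 0;
	out->l1 = (lnkcap & LNKCAP_ASPM_L1) != 0;
}
```

A device that advertises both states, as the ASM1083 does in the lspci output quoted above, would have both bits set. Note that this only shows what the device claims, which is exactly the point in dispute in this thread.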
On Wed, Jul 22, 2020 at 7:04 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Jul 22, 2020 at 06:46:06PM -0600, Robert Hancock wrote:
> > On Wed, Jul 22, 2020 at 11:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Tue, Jul 21, 2020 at 08:18:03PM -0600, Robert Hancock wrote:
> > > > Recently ASPM handling was changed to no longer disable ASPM on all
> > > > PCIe to PCI bridges. Unfortunately these ASMedia PCIe to PCI bridge
> > > > devices don't seem to function properly with ASPM enabled, as they
> > > > cause the parent PCIe root port to cause repeated AER timeout errors.
> > > > In addition to flooding the kernel log, this also causes the machine
> > > > to wake up immediately after suspend is initiated.
> > >
> > > Hi Robert, thanks a lot for the report of this problem
> > > (https://lore.kernel.org/r/CADLC3L1R2hssRjxHJv9yhdN_7-hGw58rXSfNp-FraZh0Tw+gRw@mail.gmail.com
> > > and https://bugzilla.redhat.com/show_bug.cgi?id=1853960).
> > >
> > > I'm pretty sure Linux ASPM support is missing some things. This
> > > problem might be a hardware problem where a quirk is the right
> > > solution, but it could also be that it's a result of a Linux defect
> > > that we should fix.
> > >
> > > Could you collect the dmesg log and "sudo lspci -vvxxxx" output
> > > somewhere (maybe a bugzilla.kernel.org issue)? I want to figure out
> > > whether L1 PM substates are enabled on this link, and whether
> > > that's configured correctly.
> >
> > Created a Bugzilla entry and added dmesg and lspci output:
> > https://bugzilla.kernel.org/show_bug.cgi?id=208667
> >
> > As I noted in that report, I subsequently found this page on ASMedia's
> > site: https://www.asmedia.com.tw/eng/e_show_products.php?cate_index=169&item=114
> > which indicates this ASM1083 device has "No PCIe ASPM support".
>
> How nice. According to your lspci, the device itself claims to
> support ASPM:
>
>   02:00.0 ... ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge
>     LnkCap: ... ASPM L0s L1 ...
>
> but the web page claims otherwise. That would mean the device is
> defective for claiming something that's not true. Or possibly those
> capability bits can be set by BIOS.
>
> > It's not clear why this problem isn't occurring on Windows however -
> > either it is not enabling ASPM, somehow it doesn't cause issues with
> > the PCIe link, or it is causing issues and just doesn't notify the
> > user in any way. I can try and check if this bridge device is ending
> > up with ASPM enabled under Windows 10 or not..
>
> If Windows *does* manage to enable ASPM, that would be interesting. I
> don't know whether Windows has a similar quirk mechanism. I suppose
> they must have *some* way to work around defective devices.

As I posted on the Bugzilla report, based on lspci output it appears
Windows does have ASPM L0s enabled for this bridge. However, it
appears to have the exact same problem: there are correctable PCIe
error entries showing up in the Windows system event log against the
root port the bridge is connected to. So I am thinking this hardware
is just broken with ASPM enabled.
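Whether an operating system has actually turned ASPM on for a link is visible in the ASPM Control field, bits 1:0 of the PCIe Link Control register (bit 0 for L0s entry, bit 1 for L1 entry). The sketch below checks and clears that field on a raw register value. It is a userspace illustration only: the helper names are hypothetical, and the kernel's real disable path does more than this (for example, it updates ASPM policy state and touches both ends of the link).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASPM Control field: bits 1:0 of the PCIe Link Control register. */
#define LNKCTL_ASPM_L0S  0x0001u
#define LNKCTL_ASPM_L1   0x0002u
#define LNKCTL_ASPM_MASK 0x0003u

/* Hypothetical helper: is L0s entry enabled in this LnkCtl value? */
static bool lnkctl_l0s_enabled(uint16_t lnkctl)
{
	return (lnkctl & LNKCTL_ASPM_L0S) != 0;
}

/* Hypothetical helper: is L1 entry enabled in this LnkCtl value? */
static bool lnkctl_l1_enabled(uint16_t lnkctl)
{
	return (lnkctl & LNKCTL_ASPM_L1) != 0;
}

/* Clear both ASPM enable bits in place, leaving all other bits untouched. */
static void lnkctl_disable_aspm(uint16_t *lnkctl)
{
	assert(lnkctl != NULL);
	*lnkctl = (uint16_t)(*lnkctl & ~LNKCTL_ASPM_MASK);
}
```

Reading this field from the bridge's LnkCtl under each operating system is one way to make the comparison described above, assuming a tool that can dump the raw configuration space is available.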
On Tue, Jul 21, 2020 at 08:18:03PM -0600, Robert Hancock wrote:
> Recently ASPM handling was changed to no longer disable ASPM on all
> PCIe to PCI bridges. Unfortunately these ASMedia PCIe to PCI bridge
> devices don't seem to function properly with ASPM enabled, as they
> cause the parent PCIe root port to cause repeated AER timeout errors.
> In addition to flooding the kernel log, this also causes the machine
> to wake up immediately after suspend is initiated.
>
> Fixes: 66ff14e59e8a ("PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges")
> Cc: stable@vger.kernel.org
> Signed-off-by: Robert Hancock <hancockrwd@gmail.com>

I applied this to for-linus for v5.8, since 66ff14e59e8a was merged
for v5.8. Thanks very much for finding, debugging, and fixing this!

66ff14e59e8a wasn't marked for stable, so if it *was* backported to
stable kernels, I assume whatever process backported it will also
catch this because of the Fixes: tag.

> ---
>  drivers/pci/quirks.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 812bfc32ecb8..e5713114f2ab 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -2330,6 +2330,19 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f1, quirk_disable_aspm_l0s);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f4, quirk_disable_aspm_l0s);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1508, quirk_disable_aspm_l0s);
>
> +static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
> +{
> +	pci_info(dev, "Disabling ASPM L0s/L1\n");
> +	pci_disable_link_state(dev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1);
> +}
> +
> +/*
> + * ASM1083/1085 PCIe-PCI bridge devices cause AER timeout errors on the
> + * upstream PCIe root port when ASPM is enabled. At least L0s mode is affected,
> + * disable both L0s and L1 for now to be safe.
> + */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
> +
>  /*
>   * Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe Retrain
>   * Link bit cleared after starting the link retrain process to allow this
> --
> 2.26.2
>
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 812bfc32ecb8..e5713114f2ab 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2330,6 +2330,19 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f1, quirk_disable_aspm_l0s);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f4, quirk_disable_aspm_l0s);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1508, quirk_disable_aspm_l0s);
 
+static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
+{
+	pci_info(dev, "Disabling ASPM L0s/L1\n");
+	pci_disable_link_state(dev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1);
+}
+
+/*
+ * ASM1083/1085 PCIe-PCI bridge devices cause AER timeout errors on the
+ * upstream PCIe root port when ASPM is enabled. At least L0s mode is affected,
+ * disable both L0s and L1 for now to be safe.
+ */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
+
 /*
  * Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe Retrain
  * Link bit cleared after starting the link retrain process to allow this
Recently ASPM handling was changed to no longer disable ASPM on all
PCIe to PCI bridges. Unfortunately these ASMedia PCIe to PCI bridge
devices don't seem to function properly with ASPM enabled, as they
cause the parent PCIe root port to cause repeated AER timeout errors.
In addition to flooding the kernel log, this also causes the machine
to wake up immediately after suspend is initiated.

Fixes: 66ff14e59e8a ("PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges")
Cc: stable@vger.kernel.org
Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
---
 drivers/pci/quirks.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
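To make the shape of the quirk easier to see outside the kernel tree, here is a minimal userspace model of the idea behind the patch: a table of (vendor, device, hook) entries, where a matching entry runs a hook that marks L0s and L1 as disallowed for that device. Everything in it is an assumption for illustration: `struct fake_dev`, `struct fixup`, `run_fixups()` and the `LINK_STATE_*` flag values are hypothetical stand-ins, not the kernel's `struct pci_dev`, `DECLARE_PCI_FIXUP_FINAL` machinery, or `PCIE_LINK_STATE_*` definitions. The IDs 0x1b21 (ASMedia) and 0x1080 are the ones the patch matches on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ID_ASMEDIA 0x1b21u	/* ASMedia's PCI vendor ID */

/* Illustrative flags only; the kernel's PCIE_LINK_STATE_* values may differ. */
#define LINK_STATE_L0S 0x1u
#define LINK_STATE_L1  0x2u

/* Hypothetical stand-in for struct pci_dev. */
struct fake_dev {
	uint16_t vendor;
	uint16_t device;
	unsigned int aspm_disabled;	/* states that must not be enabled */
};

/* Hypothetical table entry modelling one fixup declaration. */
struct fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct fake_dev *dev);
};

/* Model of the quirk: forbid both L0s and L1 on the matched device. */
static void quirk_disable_aspm_l0s_l1(struct fake_dev *dev)
{
	dev->aspm_disabled |= LINK_STATE_L0S | LINK_STATE_L1;
}

static const struct fixup fixups[] = {
	{ VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1 },
};

/* Run every hook whose vendor/device pair matches; return how many ran. */
static int run_fixups(struct fake_dev *dev)
{
	int ran = 0;
	size_t i;

	assert(dev != NULL);
	for (i = 0; i < sizeof(fixups) / sizeof(fixups[0]); i++) {
		if (fixups[i].vendor == dev->vendor &&
		    fixups[i].device == dev->device) {
			fixups[i].hook(dev);
			ran++;
		}
	}
	return ran;
}
```

In this model only the ASM1083/1085 entry is affected; other devices pass through `run_fixups()` unchanged, which mirrors the intent of keeping ASPM allowed on other PCIe-to-PCI bridges.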