
[2/2] PCI: vmd: Enable ASPM for mobile platforms

Message ID 20200930082455.25613-2-kai.heng.feng@canonical.com (mailing list archive)
State Not Applicable, archived
Delegated to: Bjorn Helgaas
Series [1/2] PCI/ASPM: Add helper to enable ASPM link

Commit Message

Kai-Heng Feng Sept. 30, 2020, 8:24 a.m. UTC
BIOS may not be able to program ASPM for links behind VMD, preventing the
Intel SoC from entering deeper power-saving states.

So enable ASPM for links behind VMD to increase battery life.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
 drivers/pci/controller/vmd.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas Oct. 2, 2020, 10:18 p.m. UTC | #1
On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:
> BIOS may not be able to program ASPM for links behind VMD, prevent Intel
> SoC from entering deeper power saving state.

It's not a question of BIOS not being *able* to configure ASPM.  I
think BIOS could do it, at least in principle, if it had a driver for
VMD.  Actually, it probably *does* include some sort of VMD code
because it sounds like BIOS can assign some Root Ports to appear
either as regular Root Ports or behind the VMD.

Since this issue is directly related to the unusual VMD topology, I
think it would be worth a quick recap here.  Maybe something like:

  VMD is a Root Complex Integrated Endpoint that acts as a host bridge
  to a secondary PCIe domain.  BIOS can reassign one or more Root
  Ports to appear within a VMD domain instead of the primary domain.

  However, BIOS may not enable ASPM for the hierarchies behind a VMD,
  ...

(This is based on the commit log from 185a383ada2e ("x86/PCI: Add
driver for Intel Volume Management Device (VMD)")).

But we still have the problem that CONFIG_PCIEASPM_DEFAULT=y means
"use the BIOS defaults", and this patch would make it so we use the
BIOS defaults *except* for things behind VMD.

  - Why should VMD be a special case?

  - How would we document such a special case?

  - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
    SoC power state problem?

  - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?

Link to previous discussion for the archives:
https://lore.kernel.org/r/49A36179-D336-4A5E-8B7A-A632833AE6B2@canonical.com

> So enable ASPM for links behind VMD to increase battery life.
> 
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
>  drivers/pci/controller/vmd.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index f69ef8c89f72..058fdef9c566 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -417,6 +417,22 @@ static int vmd_find_free_domain(void)
>  	return domain + 1;
>  }
>  
> +static const struct pci_device_id vmd_mobile_bridge_tbl[] = {
> +	{ PCI_VDEVICE(INTEL, 0x9a09) },
> +	{ PCI_VDEVICE(INTEL, 0xa0b0) },
> +	{ PCI_VDEVICE(INTEL, 0xa0bc) },
> +	{ }
> +};
> +
> +static int vmd_enable_aspm(struct device *dev, void *data)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +
> +	pci_enable_link_state(pdev, PCIE_LINK_STATE_ALL);
> +
> +	return 0;
> +}
> +
>  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  {
>  	struct pci_sysdata *sd = &vmd->sysdata;
> @@ -603,8 +619,12 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  	 * and will fail pcie_bus_configure_settings() early. It can instead be
>  	 * run on each of the real root ports.
>  	 */
> -	list_for_each_entry(child, &vmd->bus->children, node)
> +	list_for_each_entry(child, &vmd->bus->children, node) {
> +		if (pci_match_id(vmd_mobile_bridge_tbl, child->self))
> +			device_for_each_child(&child->self->dev, NULL, vmd_enable_aspm);

Wouldn't something like this be sufficient?

  list_for_each_entry(dev, &child->devices, bus_list)
    vmd_enable_aspm(dev);

>  		pcie_bus_configure_settings(child);
> +	}
>  
>  	pci_bus_add_devices(vmd->bus);
>  
> -- 
> 2.17.1
>
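In full context, that simplification might look like the sketch below. This
is untested, and it assumes vmd_enable_aspm() is reworked to take a
struct pci_dev * directly rather than the device_for_each_child() callback
signature:

  static void vmd_enable_aspm(struct pci_dev *pdev)
  {
          pci_enable_link_state(pdev, PCIE_LINK_STATE_ALL);
  }

  ...

          list_for_each_entry(child, &vmd->bus->children, node) {
                  if (pci_match_id(vmd_mobile_bridge_tbl, child->self)) {
                          struct pci_dev *dev;

                          /* walk this child bus's device list directly */
                          list_for_each_entry(dev, &child->devices, bus_list)
                                  vmd_enable_aspm(dev);
                  }

                  pcie_bus_configure_settings(child);
          }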
Kai-Heng Feng Oct. 5, 2020, 6:40 p.m. UTC | #2
Hi Bjorn,

> On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:
>> BIOS may not be able to program ASPM for links behind VMD, prevent Intel
>> SoC from entering deeper power saving state.
> 
> It's not a question of BIOS not being *able* to configure ASPM.  I
> think BIOS could do it, at least in principle, if it had a driver for
> VMD.  Actually, it probably *does* include some sort of VMD code
> because it sounds like BIOS can assign some Root Ports to appear
> either as regular Root Ports or behind the VMD.
> 
> Since this issue is directly related to the unusual VMD topology, I
> think it would be worth a quick recap here.  Maybe something like:
> 
>  VMD is a Root Complex Integrated Endpoint that acts as a host bridge
>  to a secondary PCIe domain.  BIOS can reassign one or more Root
>  Ports to appear within a VMD domain instead of the primary domain.
> 
>  However, BIOS may not enable ASPM for the hierarchies behind a VMD,
>  ...
> 
> (This is based on the commit log from 185a383ada2e ("x86/PCI: Add
> driver for Intel Volume Management Device (VMD)")).

Ok, will just copy that portion as-is if there's a v2 of this patch :)

> 
> But we still have the problem that CONFIG_PCIEASPM_DEFAULT=y means
> "use the BIOS defaults", and this patch would make it so we use the
> BIOS defaults *except* for things behind VMD.
> 
>  - Why should VMD be a special case?

Because BIOS doesn't handle ASPM for it, so it's up to software to do the job.
In the meantime we want other devices to still use the BIOS defaults, so as not to introduce any regressions.

> 
>  - How would we document such a special case?

I wonder whether other devices that add a PCIe domain have the same behavior?
Maybe it's not a special case at all...

I understand the end goal is to keep consistency for the entire ASPM logic. However, I can't think of any possible solution right now.

> 
>  - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
>    SoC power state problem?

Yes.

> 
>  - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?

This will break many systems, at least the 1st Gen Ryzen desktops and laptops.
BIOS doesn't enable ASPM on any of their PCIe links, and those systems immediately freeze once ASPM is enabled.

Kai-Heng

> 
> Link to previous discussion for the archives:
> https://lore.kernel.org/r/49A36179-D336-4A5E-8B7A-A632833AE6B2@canonical.com
> 
>> So enable ASPM for links behind VMD to increase battery life.
>> 
>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>> ---
>> drivers/pci/controller/vmd.c | 22 +++++++++++++++++++++-
>> 1 file changed, 21 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
>> index f69ef8c89f72..058fdef9c566 100644
>> --- a/drivers/pci/controller/vmd.c
>> +++ b/drivers/pci/controller/vmd.c
>> @@ -417,6 +417,22 @@ static int vmd_find_free_domain(void)
>> 	return domain + 1;
>> }
>> 
>> +static const struct pci_device_id vmd_mobile_bridge_tbl[] = {
>> +	{ PCI_VDEVICE(INTEL, 0x9a09) },
>> +	{ PCI_VDEVICE(INTEL, 0xa0b0) },
>> +	{ PCI_VDEVICE(INTEL, 0xa0bc) },
>> +	{ }
>> +};
>> +
>> +static int vmd_enable_aspm(struct device *dev, void *data)
>> +{
>> +	struct pci_dev *pdev = to_pci_dev(dev);
>> +
>> +	pci_enable_link_state(pdev, PCIE_LINK_STATE_ALL);
>> +
>> +	return 0;
>> +}
>> +
>> static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>> {
>> 	struct pci_sysdata *sd = &vmd->sysdata;
>> @@ -603,8 +619,12 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>> 	 * and will fail pcie_bus_configure_settings() early. It can instead be
>> 	 * run on each of the real root ports.
>> 	 */
>> -	list_for_each_entry(child, &vmd->bus->children, node)
>> +	list_for_each_entry(child, &vmd->bus->children, node) {
>> +		if (pci_match_id(vmd_mobile_bridge_tbl, child->self))
>> +			device_for_each_child(&child->self->dev, NULL, vmd_enable_aspm);
> 
> Wouldn't something like this be sufficient?
> 
>  list_for_each_entry(dev, &child->devices, bus_list)
>    vmd_enable_aspm(dev);
> 
>> 		pcie_bus_configure_settings(child);
>> +	}
>> 
>> 	pci_bus_add_devices(vmd->bus);
>> 
>> -- 
>> 2.17.1
Bjorn Helgaas Oct. 5, 2020, 7:19 p.m. UTC | #3
[+cc Ian, who's also working on an ASPM issue]

On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote:
> > On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:
> >> BIOS may not be able to program ASPM for links behind VMD, prevent Intel
> >> SoC from entering deeper power saving state.
> > 
> > It's not a question of BIOS not being *able* to configure ASPM.  I
> > think BIOS could do it, at least in principle, if it had a driver for
> > VMD.  Actually, it probably *does* include some sort of VMD code
> > because it sounds like BIOS can assign some Root Ports to appear
> > either as regular Root Ports or behind the VMD.
> > 
> > Since this issue is directly related to the unusual VMD topology, I
> > think it would be worth a quick recap here.  Maybe something like:
> > 
> >  VMD is a Root Complex Integrated Endpoint that acts as a host bridge
> >  to a secondary PCIe domain.  BIOS can reassign one or more Root
> >  Ports to appear within a VMD domain instead of the primary domain.
> > 
> >  However, BIOS may not enable ASPM for the hierarchies behind a VMD,
> >  ...
> > 
> > (This is based on the commit log from 185a383ada2e ("x86/PCI: Add
> > driver for Intel Volume Management Device (VMD)")).
> 
> Ok, will just copy the portion as-is if there's patch v2 :)
> 
> > But we still have the problem that CONFIG_PCIEASPM_DEFAULT=y means
> > "use the BIOS defaults", and this patch would make it so we use the
> > BIOS defaults *except* for things behind VMD.
> > 
> >  - Why should VMD be a special case?
> 
> Because BIOS doesn't handle ASPM for it so it's up to software to do
> the job.  In the meantime we want other devices still use the BIOS
> defaults to not introduce any regression.
> 
> >  - How would we document such a special case?
> 
> I wonder whether other devices that add PCIe domain have the same
> behavior?  Maybe it's not a special case at all...

What other devices are these?

> I understand the end goal is to keep consistency for the entire ASPM
> logic. However I can't think of any possible solution right now.
> 
> >  - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
> >    SoC power state problem?
> 
> Yes.
> 
> >  - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?
> 
> This will break many systems, at least for the 1st Gen Ryzen
> desktops and laptops.
>
> All PCIe ASPM are not enabled by BIOS, and those systems immediately
> freeze once ASPM is enabled.

That indicates a defect in the Linux ASPM code.  We should fix that.
It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system.

Are there bug reports for these?  The info we would need to start with
includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y).
If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that
might be interesting, too.  We'll likely need to add some
instrumentation and do some experimentation, but in principle, this
should be fixable.

Bjorn
Kai-Heng Feng Oct. 7, 2020, 4:26 a.m. UTC | #4
> On Oct 6, 2020, at 03:19, Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> [+cc Ian, who's also working on an ASPM issue]
> 
> On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote:
>>> On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@kernel.org> wrote:
>>> On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:
>>>> BIOS may not be able to program ASPM for links behind VMD, prevent Intel
>>>> SoC from entering deeper power saving state.
>>> 
>>> It's not a question of BIOS not being *able* to configure ASPM.  I
>>> think BIOS could do it, at least in principle, if it had a driver for
>>> VMD.  Actually, it probably *does* include some sort of VMD code
>>> because it sounds like BIOS can assign some Root Ports to appear
>>> either as regular Root Ports or behind the VMD.
>>> 
>>> Since this issue is directly related to the unusual VMD topology, I
>>> think it would be worth a quick recap here.  Maybe something like:
>>> 
>>> VMD is a Root Complex Integrated Endpoint that acts as a host bridge
>>> to a secondary PCIe domain.  BIOS can reassign one or more Root
>>> Ports to appear within a VMD domain instead of the primary domain.
>>> 
>>> However, BIOS may not enable ASPM for the hierarchies behind a VMD,
>>> ...
>>> 
>>> (This is based on the commit log from 185a383ada2e ("x86/PCI: Add
>>> driver for Intel Volume Management Device (VMD)")).
>> 
>> Ok, will just copy the portion as-is if there's patch v2 :)
>> 
>>> But we still have the problem that CONFIG_PCIEASPM_DEFAULT=y means
>>> "use the BIOS defaults", and this patch would make it so we use the
>>> BIOS defaults *except* for things behind VMD.
>>> 
>>> - Why should VMD be a special case?
>> 
>> Because BIOS doesn't handle ASPM for it so it's up to software to do
>> the job.  In the meantime we want other devices still use the BIOS
>> defaults to not introduce any regression.
>> 
>>> - How would we document such a special case?
>> 
>> I wonder whether other devices that add PCIe domain have the same
>> behavior?  Maybe it's not a special case at all...
> 
> What other devices are these?

Controllers which add a PCIe domain.

> 
>> I understand the end goal is to keep consistency for the entire ASPM
>> logic. However I can't think of any possible solution right now.
>> 
>>> - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
>>>   SoC power state problem?
>> 
>> Yes.
>> 
>>> - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?
>> 
>> This will break many systems, at least for the 1st Gen Ryzen
>> desktops and laptops.
>> 
>> All PCIe ASPM are not enabled by BIOS, and those systems immediately
>> freeze once ASPM is enabled.
> 
> That indicates a defect in the Linux ASPM code.  We should fix that.
> It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system.

On those systems ASPM is also not enabled on Windows. So I think ASPM is disabled for a reason.

> 
> Are there bug reports for these? The info we would need to start with
> includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y).
> If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that
> might be interesting, too.  We'll likely need to add some
> instrumentation and do some experimentation, but in principle, this
> should be fixable.

Doing this is asking users to use hardware settings that the ODM/OEM never tested, and I think the risk is really high.

Kai-Heng

> 
> Bjorn
Ian Kumlien Oct. 7, 2020, 9:28 a.m. UTC | #5
On Wed, Oct 7, 2020 at 6:26 AM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
>
>
> > On Oct 6, 2020, at 03:19, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+cc Ian, who's also working on an ASPM issue]
> >
> > On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote:
> >>> On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >>> On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:
> >>>> BIOS may not be able to program ASPM for links behind VMD, prevent Intel
> >>>> SoC from entering deeper power saving state.
> >>>
> >>> It's not a question of BIOS not being *able* to configure ASPM.  I
> >>> think BIOS could do it, at least in principle, if it had a driver for
> >>> VMD.  Actually, it probably *does* include some sort of VMD code
> >>> because it sounds like BIOS can assign some Root Ports to appear
> >>> either as regular Root Ports or behind the VMD.
> >>>
> >>> Since this issue is directly related to the unusual VMD topology, I
> >>> think it would be worth a quick recap here.  Maybe something like:
> >>>
> >>> VMD is a Root Complex Integrated Endpoint that acts as a host bridge
> >>> to a secondary PCIe domain.  BIOS can reassign one or more Root
> >>> Ports to appear within a VMD domain instead of the primary domain.
> >>>
> >>> However, BIOS may not enable ASPM for the hierarchies behind a VMD,
> >>> ...
> >>>
> >>> (This is based on the commit log from 185a383ada2e ("x86/PCI: Add
> >>> driver for Intel Volume Management Device (VMD)")).
> >>
> >> Ok, will just copy the portion as-is if there's patch v2 :)
> >>
> >>> But we still have the problem that CONFIG_PCIEASPM_DEFAULT=y means
> >>> "use the BIOS defaults", and this patch would make it so we use the
> >>> BIOS defaults *except* for things behind VMD.
> >>>
> >>> - Why should VMD be a special case?
> >>
> >> Because BIOS doesn't handle ASPM for it so it's up to software to do
> >> the job.  In the meantime we want other devices still use the BIOS
> >> defaults to not introduce any regression.
> >>
> >>> - How would we document such a special case?
> >>
> >> I wonder whether other devices that add PCIe domain have the same
> >> behavior?  Maybe it's not a special case at all...
> >
> > What other devices are these?
>
> Controllers which add PCIe domain.
>
> >
> >> I understand the end goal is to keep consistency for the entire ASPM
> >> logic. However I can't think of any possible solution right now.
> >>
> >>> - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
> >>>   SoC power state problem?
> >>
> >> Yes.
> >>
> >>> - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?
> >>
> >> This will break many systems, at least for the 1st Gen Ryzen
> >> desktops and laptops.
> >>
> >> All PCIe ASPM are not enabled by BIOS, and those systems immediately
> >> freeze once ASPM is enabled.
> >
> > That indicates a defect in the Linux ASPM code.  We should fix that.
> > It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system.
>
> On those systems ASPM are also not enabled on Windows. So I think ASPM are disabled for a reason.
>
> >
> > Are there bug reports for these? The info we would need to start with
> > includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y).
> > If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that
> > might be interesting, too.  We'll likely need to add some
> > instrumentation and do some experimentation, but in principle, this
> > should be fixable.
>
> Doing this is asking users to use hardware settings that ODM/OEM never tested, and I think the risk is really high.

They have to test it to comply with the PCIe specs? And what we're
currently doing is wrong...

This fixes the L1 behaviour in the kernel, could you test it?

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 253c30cc1967..893b37669087 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,

 static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 {
-       u32 latency, l1_switch_latency = 0;
+       u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
        struct aspm_latency *acceptable;
        struct pcie_link_state *link;

@@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
                if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
                    (link->latency_dw.l0s > acceptable->l0s))
                        link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+
                /*
                 * Check L1 latency.
-                * Every switch on the path to root complex need 1
-                * more microsecond for L1. Spec doesn't mention L0s.
+                *
+                * PCIe r5.0, sec 5.4.1.2.2 states:
+                * A Switch is required to initiate an L1 exit transition on its
+                * Upstream Port Link after no more than 1 μs from the beginning of an
+                * L1 exit transition on any of its Downstream Port Links.
                 *
                 * The exit latencies for L1 substates are not advertised
                 * by a device.  Since the spec also doesn't mention a way
@@ -469,11 +473,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
                 * L1 exit latencies advertised by a device include L1
                 * substate latencies (and hence do not do any check).
                 */
-               latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
-               if ((link->aspm_capable & ASPM_STATE_L1) &&
-                   (latency + l1_switch_latency > acceptable->l1))
-                       link->aspm_capable &= ~ASPM_STATE_L1;
-               l1_switch_latency += 1000;
+               if (link->aspm_capable & ASPM_STATE_L1) {
+                       latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
+                       l1_max_latency = max_t(u32, latency, l1_max_latency);
+                       if (l1_max_latency + l1_switch_latency > acceptable->l1)
+                               link->aspm_capable &= ~ASPM_STATE_L1;
+
+                       l1_switch_latency += 1000;
+               }

                link = link->parent;
        }
---
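
As a sanity check of the new accounting, with made-up numbers: an endpoint
that tolerates 64 us of L1 exit latency sitting behind one switch, so two
links advertising, say, 32 us and 8 us exit latency (values in ns, as in
aspm.c). A standalone sketch of just the arithmetic, not kernel code:

  #include <assert.h>

  int main(void)
  {
          unsigned int acceptable_l1 = 64000;       /* endpoint tolerates 64 us */
          unsigned int link_l1[] = { 32000, 8000 }; /* endpoint link, then root port link */
          unsigned int l1_max_latency = 0, l1_switch_latency = 0;
          int l1_capable = 1;

          for (int i = 0; i < 2; i++) {
                  /* track the worst single-link exit latency on the path */
                  if (link_l1[i] > l1_max_latency)
                          l1_max_latency = link_l1[i];
                  /* path max plus 1 us per switch already traversed */
                  if (l1_max_latency + l1_switch_latency > acceptable_l1)
                          l1_capable = 0;
                  l1_switch_latency += 1000;
          }

          assert(l1_capable); /* 32 us + 1 us of switch latency fits in 64 us */
          return 0;
  }

The point being that the endpoint sees the worst link's exit latency plus
1 us per intervening switch, rather than each link's latency checked in
isolation.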

If it doesn't, you could also look at the following L0s patch

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 893b37669087..15d64832a988 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,

 static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 {
-       u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
+       u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
+               l0s_latency_up = 0, l0s_latency_dw = 0;
        struct aspm_latency *acceptable;
        struct pcie_link_state *link;

@@ -448,14 +449,18 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)

        while (link) {
                /* Check upstream direction L0s latency */
-               if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
-                   (link->latency_up.l0s > acceptable->l0s))
-                       link->aspm_capable &= ~ASPM_STATE_L0S_UP;
+               if (link->aspm_capable & ASPM_STATE_L0S_UP) {
+                       l0s_latency_up += link->latency_up.l0s;
+                       if (l0s_latency_up > acceptable->l0s)
+                               link->aspm_capable &= ~ASPM_STATE_L0S_UP;
+               }

                /* Check downstream direction L0s latency */
-               if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
-                   (link->latency_dw.l0s > acceptable->l0s))
-                       link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+               if (link->aspm_capable & ASPM_STATE_L0S_DW) {
+                       l0s_latency_dw += link->latency_dw.l0s;
+                       if (l0s_latency_dw > acceptable->l0s)
+                               link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+               }

                /*
                 * Check L1 latency.
---
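
And the corresponding arithmetic for the L0s change, again with made-up
numbers: two links that each advertise 512 ns of upstream-direction L0s
exit latency, feeding an endpoint that accepts 1 us. Each link passes in
isolation, but the accumulated path latency does not:

  #include <assert.h>

  int main(void)
  {
          unsigned int acceptable_l0s = 1000;        /* endpoint tolerates 1 us */
          unsigned int link_l0s_up[] = { 512, 512 }; /* per-link upstream L0s exit */
          unsigned int l0s_latency_up = 0;
          int l0s_capable = 1;

          for (int i = 0; i < 2; i++) {
                  l0s_latency_up += link_l0s_up[i];  /* accumulate along the path */
                  if (l0s_latency_up > acceptable_l0s)
                          l0s_capable = 0;
          }

          assert(!l0s_capable); /* 512 + 512 = 1024 ns > 1000 ns, so L0s is disabled */
          return 0;
  }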

I can send them directly as well if you prefer (I hope the client
doesn't mangle them)

> Kai-Heng
>
> >
> > Bjorn
>
Bjorn Helgaas Oct. 7, 2020, 1:30 p.m. UTC | #6
On Wed, Oct 07, 2020 at 12:26:19PM +0800, Kai-Heng Feng wrote:
> > On Oct 6, 2020, at 03:19, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote:
> >>> On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >>> On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:

...
> >> I wonder whether other devices that add PCIe domain have the same
> >> behavior?  Maybe it's not a special case at all...
> > 
> > What other devices are these?
> 
> Controllers which add PCIe domain.

I was looking for specific examples, not just a restatement of what
you said before.  I'm just curious because there are a lot of
controllers I'm not familiar with, and I can't think of an example.

> >> I understand the end goal is to keep consistency for the entire ASPM
> >> logic. However I can't think of any possible solution right now.
> >> 
> >>> - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
> >>>   SoC power state problem?
> >> 
> >> Yes.
> >> 
> >>> - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?
> >> 
> >> This will break many systems, at least for the 1st Gen Ryzen
> >> desktops and laptops.
> >> 
> >> All PCIe ASPM are not enabled by BIOS, and those systems immediately
> >> freeze once ASPM is enabled.
> > 
> > That indicates a defect in the Linux ASPM code.  We should fix that.
> > It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system.
> 
> On those systems ASPM are also not enabled on Windows. So I think
> ASPM are disabled for a reason.

If the platform knows ASPM needs to be disabled, it should be using
ACPI_FADT_NO_ASPM or _OSC to prevent the OS from using it.  And if
CONFIG_PCIEASPM_POWERSAVE=y means Linux enables ASPM when it
shouldn't, that's a Linux bug that we need to fix.

> > Are there bug reports for these? The info we would need to start with
> > includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y).
> > If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that
> > might be interesting, too.  We'll likely need to add some
> > instrumentation and do some experimentation, but in principle, this
> > should be fixable.
> 
> Doing this is asking users to use hardware settings that ODM/OEM
> never tested, and I think the risk is really high.

What?  That's not what I said at all.  I'm asking for information
about these hangs so we can fix them.  I'm not suggesting that you
should switch to CONFIG_PCIEASPM_POWERSAVE=y for the distro.

Let's back up.  You said:

  CONFIG_PCIEASPM_POWERSAVE=y ... will break many systems, at least
  for the 1st Gen Ryzen desktops and laptops.

  All PCIe ASPM are not enabled by BIOS, and those systems immediately
  freeze once ASPM is enabled.

These system hangs might be caused by (1) some hardware issue that
causes a hang when ASPM is enabled even if it is configured correctly
or (2) Linux configuring ASPM incorrectly.

For case (1), the platform should be using ACPI_FADT_NO_ASPM or _OSC
to prevent the OS from enabling ASPM.  Linux should pay attention to
that even when CONFIG_PCIEASPM_POWERSAVE=y.

If the platform *should* use these mechanisms but doesn't, the
solution is a quirk, not the folklore that "we can't use
CONFIG_PCIEASPM_POWERSAVE=y because it breaks some systems."

For case (2), we should fix Linux so it configures ASPM correctly.

We cannot use the build-time CONFIG_PCIEASPM settings to avoid these
hangs.  We need to fix the Linux run-time code so the system operates
correctly no matter what CONFIG_PCIEASPM setting is used.

We have sysfs knobs to control ASPM (see 72ea91afbfb0 ("PCI/ASPM: Add
sysfs attributes for controlling ASPM link states")).  They can do the
same thing at run-time as CONFIG_PCIEASPM_POWERSAVE=y does at
build-time.  If those knobs cause hangs on 1st Gen Ryzen systems, we
need to fix that.

Bjorn
Ian Kumlien Oct. 7, 2020, 1:44 p.m. UTC | #7
On Wed, Oct 7, 2020 at 3:30 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Oct 07, 2020 at 12:26:19PM +0800, Kai-Heng Feng wrote:
> > > On Oct 6, 2020, at 03:19, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote:
> > >>> On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >>> On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:
>
> ...
> > >> I wonder whether other devices that add PCIe domain have the same
> > >> behavior?  Maybe it's not a special case at all...
> > >
> > > What other devices are these?
> >
> > Controllers which add PCIe domain.
>
> I was looking for specific examples, not just a restatement of what
> you said before.  I'm just curious because there are a lot of
> controllers I'm not familiar with, and I can't think of an example.
>
> > >> I understand the end goal is to keep consistency for the entire ASPM
> > >> logic. However I can't think of any possible solution right now.
> > >>
> > >>> - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
> > >>>   SoC power state problem?
> > >>
> > >> Yes.
> > >>
> > >>> - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?
> > >>
> > >> This will break many systems, at least for the 1st Gen Ryzen
> > >> desktops and laptops.
> > >>
> > >> All PCIe ASPM are not enabled by BIOS, and those systems immediately
> > >> freeze once ASPM is enabled.
> > >
> > > That indicates a defect in the Linux ASPM code.  We should fix that.
> > > It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system.
> >
> > On those systems ASPM are also not enabled on Windows. So I think
> > ASPM are disabled for a reason.
>
> If the platform knows ASPM needs to be disabled, it should be using
> ACPI_FADT_NO_ASPM or _OSC to prevent the OS from using it.  And if
> CONFIG_PCIEASPM_POWERSAVE=y means Linux enables ASPM when it
> shouldn't, that's a Linux bug that we need to fix.
>
> > > Are there bug reports for these? The info we would need to start with
> > > includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y).
> > > If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that
> > > might be interesting, too.  We'll likely need to add some
> > > instrumentation and do some experimentation, but in principle, this
> > > should be fixable.
> >
> > Doing this is asking users to use hardware settings that ODM/OEM
> > never tested, and I think the risk is really high.
>
> What?  That's not what I said at all.  I'm asking for information
> about these hangs so we can fix them.  I'm not suggesting that you
> should switch to CONFIG_PCIEASPM_POWERSAVE=y for the distro.
>
> Let's back up.  You said:
>
>   CONFIG_PCIEASPM_POWERSAVE=y ... will break many systems, at least
>   for the 1st Gen Ryzen desktops and laptops.
>
>   All PCIe ASPM are not enabled by BIOS, and those systems immediately
>   freeze once ASPM is enabled.
>
> These system hangs might be caused by (1) some hardware issue that
> causes a hang when ASPM is enabled even if it is configured correctly
> or (2) Linux configuring ASPM incorrectly.

Could this be:
1044 PCIe Controller May Hang on Entry Into Either L1.1 or L1.2 Power
Management Substate

Description
Under a highly specific and detailed set of internal timing conditions,
the PCIe controller may hang on entry into either the L1.1 or L1.2 power
management substate. This failure occurs when L1 power management
substate exit is triggered by a link partner asserting CLKREQ# prior to
the completion of the L1 power management substates entry protocol.

Potential Effect on System
The system may hang or reset.

Suggested Workaround
Disable the L1.1 and L1.2 power management substates. System software
may contain the workaround for this erratum.

Fix Planned
Yes

Link: https://www.amd.com/system/files/TechDocs/55449_Fam_17h_M_00h-0Fh_Rev_Guide.pdf
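
If that's the erratum in play, the usual shape of the software workaround
would be a fixup quirk, modeled on the existing quirk_disable_aspm_l0s()
in drivers/pci/quirks.c. A rough sketch only -- the device ID below is a
placeholder, the affected parts would have to come from the revision
guide, and it assumes a kernel where pci_disable_link_state() understands
the L1 substate flags:

  static void quirk_amd_disable_l1ss(struct pci_dev *dev)
  {
          pci_info(dev, "disabling ASPM L1.1/L1.2 (erratum 1044)\n");
          pci_disable_link_state(dev, PCIE_LINK_STATE_L1_1 |
                                      PCIE_LINK_STATE_L1_2);
  }
  /* 0x0000 is a placeholder, not a real device ID */
  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x0000, quirk_amd_disable_l1ss);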

> For case (1), the platform should be using ACPI_FADT_NO_ASPM or _OSC
> to prevent the OS from enabling ASPM.  Linux should pay attention to
> that even when CONFIG_PCIEASPM_POWERSAVE=y.
>
> If the platform *should* use these mechanisms but doesn't, the
> solution is a quirk, not the folklore that "we can't use
> CONFIG_PCIEASPM_POWERSAVE=y because it breaks some systems."
>
> For case (2), we should fix Linux so it configures ASPM correctly.
>
> We cannot use the build-time CONFIG_PCIEASPM settings to avoid these
> hangs.  We need to fix the Linux run-time code so the system operates
> correctly no matter what CONFIG_PCIEASPM setting is used.
>
> We have sysfs knobs to control ASPM (see 72ea91afbfb0 ("PCI/ASPM: Add
> sysfs attributes for controlling ASPM link states")).  They can do the
> same thing at run-time as CONFIG_PCIEASPM_POWERSAVE=y does at
> build-time.  If those knobs cause hangs on 1st Gen Ryzen systems, we
> need to fix that.
>
> Bjorn
Kai-Heng Feng Oct. 8, 2020, 4:19 a.m. UTC | #8
> On Oct 7, 2020, at 21:30, Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> On Wed, Oct 07, 2020 at 12:26:19PM +0800, Kai-Heng Feng wrote:
>>> On Oct 6, 2020, at 03:19, Bjorn Helgaas <helgaas@kernel.org> wrote:
>>> On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote:
>>>>> On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>>> On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:
> 
> ...
>>>> I wonder whether other devices that add PCIe domain have the same
>>>> behavior?  Maybe it's not a special case at all...
>>> 
>>> What other devices are these?
>> 
>> Controllers which add PCIe domain.
> 
> I was looking for specific examples, not just a restatement of what
> you said before.  I'm just curious because there are a lot of
> controllers I'm not familiar with, and I can't think of an example.
> 
>>>> I understand the end goal is to keep consistency for the entire ASPM
>>>> logic. However I can't think of any possible solution right now.
>>>> 
>>>>> - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
>>>>>  SoC power state problem?
>>>> 
>>>> Yes.
>>>> 
>>>>> - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?
>>>> 
>>>> This will break many systems, at least for the 1st Gen Ryzen
>>>> desktops and laptops.
>>>> 
>>>> All PCIe ASPM are not enabled by BIOS, and those systems immediately
>>>> freeze once ASPM is enabled.
>>> 
>>> That indicates a defect in the Linux ASPM code.  We should fix that.
>>> It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system.
>> 
>> On those systems ASPM are also not enabled on Windows. So I think
>> ASPM are disabled for a reason.
> 
> If the platform knows ASPM needs to be disabled, it should be using
> ACPI_FADT_NO_ASPM or _OSC to prevent the OS from using it.  And if
> CONFIG_PCIEASPM_POWERSAVE=y means Linux enables ASPM when it
> shouldn't, that's a Linux bug that we need to fix.

Yes, that's a bug which is fixed by Ian's new patch.

> 
>>> Are there bug reports for these? The info we would need to start with
>>> includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y).
>>> If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that
>>> might be interesting, too.  We'll likely need to add some
>>> instrumentation and do some experimentation, but in principle, this
>>> should be fixable.
>> 
>> Doing this is asking users to use hardware settings that ODM/OEM
>> never tested, and I think the risk is really high.
> 
> What?  That's not what I said at all.  I'm asking for information
> about these hangs so we can fix them.  I'm not suggesting that you
> should switch to CONFIG_PCIEASPM_POWERSAVE=y for the distro.

Ah, I thought your suggestion was switching to CONFIG_PCIEASPM_POWERSAVE=y, because I sensed you wanted to use that to cover the VMD ASPM case this patch tries to solve.

Do we have a conclusion on how to enable VMD ASPM with CONFIG_PCIEASPM_DEFAULT=y?

> 
> Let's back up.  You said:
> 
>  CONFIG_PCIEASPM_POWERSAVE=y ... will break many systems, at least
>  for the 1st Gen Ryzen desktops and laptops.
> 
>  All PCIe ASPM are not enabled by BIOS, and those systems immediately
>  freeze once ASPM is enabled.
> 
> These system hangs might be caused by (1) some hardware issue that
> causes a hang when ASPM is enabled even if it is configured correctly
> or (2) Linux configuring ASPM incorrectly.

It's (2) here.

> 
> For case (1), the platform should be using ACPI_FADT_NO_ASPM or _OSC
> to prevent the OS from enabling ASPM.  Linux should pay attention to
> that even when CONFIG_PCIEASPM_POWERSAVE=y.
> 
> If the platform *should* use these mechanisms but doesn't, the
> solution is a quirk, not the folklore that "we can't use
> CONFIG_PCIEASPM_POWERSAVE=y because it breaks some systems."

The platform in question doesn't prevent the OS from enabling ASPM.

> 
> For case (2), we should fix Linux so it configures ASPM correctly.
> 
> We cannot use the build-time CONFIG_PCIEASPM settings to avoid these
> hangs.  We need to fix the Linux run-time code so the system operates
> correctly no matter what CONFIG_PCIEASPM setting is used.
> 
> We have sysfs knobs to control ASPM (see 72ea91afbfb0 ("PCI/ASPM: Add
> sysfs attributes for controlling ASPM link states")).  They can do the
> same thing at run-time as CONFIG_PCIEASPM_POWERSAVE=y does at
> build-time.  If those knobs cause hangs on 1st Gen Ryzen systems, we
> need to fix that.

Ian's patch solves the issue, at least for the systems I have.

Kai-Heng

> 
> Bjorn
Ian Kumlien Oct. 9, 2020, 2:34 p.m. UTC | #9
On Thu, Oct 8, 2020 at 6:19 AM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
>
>
> > On Oct 7, 2020, at 21:30, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Wed, Oct 07, 2020 at 12:26:19PM +0800, Kai-Heng Feng wrote:
> >>> On Oct 6, 2020, at 03:19, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >>> On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote:
> >>>>> On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >>>>> On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:
> >
> > ...
> >>>> I wonder whether other devices that add PCIe domain have the same
> >>>> behavior?  Maybe it's not a special case at all...
> >>>
> >>> What other devices are these?
> >>
> >> Controllers which add PCIe domain.
> >
> > I was looking for specific examples, not just a restatement of what
> > you said before.  I'm just curious because there are a lot of
> > controllers I'm not familiar with, and I can't think of an example.
> >
> >>>> I understand the end goal is to keep consistency for the entire ASPM
> >>>> logic. However I can't think of any possible solution right now.
> >>>>
> >>>>> - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
> >>>>>  SoC power state problem?
> >>>>
> >>>> Yes.
> >>>>
> >>>>> - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?
> >>>>
> >>>> This will break many systems, at least for the 1st Gen Ryzen
> >>>> desktops and laptops.
> >>>>
> >>>> All PCIe ASPM are not enabled by BIOS, and those systems immediately
> >>>> freeze once ASPM is enabled.
> >>>
> >>> That indicates a defect in the Linux ASPM code.  We should fix that.
> >>> It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system.
> >>
> >> On those systems ASPM are also not enabled on Windows. So I think
> >> ASPM are disabled for a reason.
> >
> > If the platform knows ASPM needs to be disabled, it should be using
> > ACPI_FADT_NO_ASPM or _OSC to prevent the OS from using it.  And if
> > CONFIG_PCIEASPM_POWERSAVE=y means Linux enables ASPM when it
> > shouldn't, that's a Linux bug that we need to fix.
>
> Yes that's a bug which fixed by Ian's new patch.
>
> >
> >>> Are there bug reports for these? The info we would need to start with
> >>> includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y).
> >>> If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that
> >>> might be interesting, too.  We'll likely need to add some
> >>> instrumentation and do some experimentation, but in principle, this
> >>> should be fixable.
> >>
> >> Doing this is asking users to use hardware settings that ODM/OEM
> >> never tested, and I think the risk is really high.
> >
> > What?  That's not what I said at all.  I'm asking for information
> > about these hangs so we can fix them.  I'm not suggesting that you
> > should switch to CONFIG_PCIEASPM_POWERSAVE=y for the distro.
>
> Ah, I thought your suggestion is switching to CONFIG_PCIEASPM_POWERSAVE=y, because I sense you want to use that to cover the VMD ASPM this patch tries to solve.
>
> Do we have a conclusion how to enable VMD ASPM with CONFIG_PCIEASPM_DEFAULT=y?
>
> >
> > Let's back up.  You said:
> >
> >  CONFIG_PCIEASPM_POWERSAVE=y ... will break many systems, at least
> >  for the 1st Gen Ryzen desktops and laptops.
> >
> >  All PCIe ASPM are not enabled by BIOS, and those systems immediately
> >  freeze once ASPM is enabled.
> >
> > These system hangs might be caused by (1) some hardware issue that
> > causes a hang when ASPM is enabled even if it is configured correctly
> > or (2) Linux configuring ASPM incorrectly.
>
> It's (2) here.
>
> >
> > For case (1), the platform should be using ACPI_FADT_NO_ASPM or _OSC
> > to prevent the OS from enabling ASPM.  Linux should pay attention to
> > that even when CONFIG_PCIEASPM_POWERSAVE=y.
> >
> > If the platform *should* use these mechanisms but doesn't, the
> > solution is a quirk, not the folklore that "we can't use
> > CONFIG_PCIEASPM_POWERSAVE=y because it breaks some systems."
>
> The platform in question doesn't prevent OS from enabling ASPM.
>
> >
> > For case (2), we should fix Linux so it configures ASPM correctly.
> >
> > We cannot use the build-time CONFIG_PCIEASPM settings to avoid these
> > hangs.  We need to fix the Linux run-time code so the system operates
> > correctly no matter what CONFIG_PCIEASPM setting is used.
> >
> > We have sysfs knobs to control ASPM (see 72ea91afbfb0 ("PCI/ASPM: Add
> > sysfs attributes for controlling ASPM link states")).  They can do the
> > same thing at run-time as CONFIG_PCIEASPM_POWERSAVE=y does at
> > build-time.  If those knobs cause hangs on 1st Gen Ryzen systems, we
> > need to fix that.
>
> Ian's patch solves the issue, at least for the systems I have.

Could you add:
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 15d64832a988..cd9f2101f9a2 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -482,7 +482,12 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
                        latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
                        l1_max_latency = max_t(u32, latency, l1_max_latency);
                        if (l1_max_latency + l1_switch_latency > acceptable->l1)
+                       {
+                               pci_info(endpoint, "L1 latency exceeded - path: %i - max: %i\n", l1_switch_latency, l1_max_latency);
+                               pci_info(link->pdev, "Upstream device - %i\n", link->latency_up.l1);
+                               pci_info(link->downstream, "Downstream device - %i\n", link->latency_dw.l1);
                                link->aspm_capable &= ~ASPM_STATE_L1;
+                       }

                        l1_switch_latency += 1000;
                }

So we can see which device triggers which links to be disabled?
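
With made-up device addresses and latencies, the resulting dmesg lines
would look something like:

  pci 0000:02:00.0: L1 latency exceeded - path: 1000 - max: 32000
  pcieport 0000:00:1c.0: Upstream device - 32000
  pcieport 0000:01:00.0: Downstream device - 32000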

I think your use-case is much more important than mine - mine fixes
something as a side effect.

Also, please send me the lspci -vvv output, as well as lspci -PP -s
<device id> for the device IDs mentioned in dmesg with the patch
above applied ;)

> Kai-Heng
>
> >
> > Bjorn
>

Patch

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index f69ef8c89f72..058fdef9c566 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -417,6 +417,22 @@ static int vmd_find_free_domain(void)
 	return domain + 1;
 }
 
+static const struct pci_device_id vmd_mobile_bridge_tbl[] = {
+	{ PCI_VDEVICE(INTEL, 0x9a09) },
+	{ PCI_VDEVICE(INTEL, 0xa0b0) },
+	{ PCI_VDEVICE(INTEL, 0xa0bc) },
+	{ }
+};
+
+static int vmd_enable_aspm(struct device *dev, void *data)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	pci_enable_link_state(pdev, PCIE_LINK_STATE_ALL);
+
+	return 0;
+}
+
 static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 {
 	struct pci_sysdata *sd = &vmd->sysdata;
@@ -603,8 +619,12 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	 * and will fail pcie_bus_configure_settings() early. It can instead be
 	 * run on each of the real root ports.
 	 */
-	list_for_each_entry(child, &vmd->bus->children, node)
+	list_for_each_entry(child, &vmd->bus->children, node) {
+		if (pci_match_id(vmd_mobile_bridge_tbl, child->self))
+			device_for_each_child(&child->self->dev, NULL, vmd_enable_aspm);
+
 		pcie_bus_configure_settings(child);
+	}
 
 	pci_bus_add_devices(vmd->bus);