diff mbox series

[V4,2/2] PCI: vmd: Override ASPM on TGL/ADL VMD devices

Message ID 20211216055600.2425362-2-david.e.box@linux.intel.com (mailing list archive)
State Changes Requested
Delegated to: Lorenzo Pieralisi
Headers show
Series [V4,1/2] PCI/ASPM: Add ASPM BIOS override function | expand

Commit Message

David E. Box Dec. 16, 2021, 5:56 a.m. UTC
From: Michael Bottini <michael.a.bottini@linux.intel.com>

On Tiger Lake and Alder Lake platforms, VMD controllers do not have ASPM
enabled nor LTR values set by BIOS. This leads high power consumption on
these platforms when VMD is enabled as reported in bugzilla [1].  Enable
these features in the VMD driver using pcie_aspm_policy_override() to set
the ASPM policy for the root ports.

To do this, add an additional flag in VMD features to specify devices that
must have their respective policies overridden.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=213717

Signed-off-by: Michael Bottini <michael.a.bottini@linux.intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Tested-by: Adhitya Mohan <me@adhityamohan.in>
---
V4
 - Refactor vmd_enable_apsm() to exit early, making the lines shorter
   and more readable. Suggested by Christoph.
V3
 - No changes
V2
 - Use return status to print pci_info message if ASPM cannot be enabled.
 - Add missing static declaration, caught by lkp@intel.com

 drivers/pci/controller/vmd.c | 43 +++++++++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 3 deletions(-)

Comments

Bjorn Helgaas Dec. 16, 2021, 5:26 p.m. UTC | #1
[+cc Rajat for LTR max latency write]

On Wed, Dec 15, 2021 at 09:56:00PM -0800, David E. Box wrote:
> From: Michael Bottini <michael.a.bottini@linux.intel.com>
> 
> On Tiger Lake and Alder Lake platforms, VMD controllers do not have ASPM
> enabled nor LTR values set by BIOS. This leads high power consumption on
> these platforms when VMD is enabled as reported in bugzilla [1].  Enable
> these features in the VMD driver using pcie_aspm_policy_override() to set
> the ASPM policy for the root ports.

s/leads high/leads to high/

Does this depend on "Tiger Lake and Alder Lake platforms"
specifically, or does it depend on a BIOS design choice, i.e., "don't
configure ASPM or LTR for devices below a VMD"?

The subject says "override ASPM on VMD devices," but it looks like
this affects the ASPM configuration of devices *below* the VMD, not of
the VMD itself.

It looks like this only affects *NVMe* devices, since
vmd_enable_aspm() checks for PCI_CLASS_STORAGE_EXPRESS.  Why is that?
Is there something special about NVMe?  I'd think you would want to do
this for *all* devices below a VMD.

Since it only affects PCI_CLASS_STORAGE_EXPRESS devices, I don't think
it actually "sets ASPM policy for the root ports".  vmd_enable_aspm()
calls pcie_aspm_policy_override() on endpoints.  It's true that the
link ASPM state happens to be attached to the upstream end of the
link, but that's an ASPM implementation detail.

This all needs to be clear in the subject and commit log.

> To do this, add an additional flag in VMD features to specify devices that
> must have their respective policies overridden.

I'm not clear on why you want this to apply to only certain VMDs and
not others.  Do some BIOSes configure ASPM for devices below some
VMDs?

> [1] https://bugzilla.kernel.org/show_bug.cgi?id=213717
> 
> Signed-off-by: Michael Bottini <michael.a.bottini@linux.intel.com>
> Signed-off-by: David E. Box <david.e.box@linux.intel.com>
> Tested-by: Adhitya Mohan <me@adhityamohan.in>
> ---
> V4
>  - Refactor vmd_enable_apsm() to exit early, making the lines shorter
>    and more readable. Suggested by Christoph.
> V3
>  - No changes
> V2
>  - Use return status to print pci_info message if ASPM cannot be enabled.
>  - Add missing static declaration, caught by lkp@intel.com
> 
>  drivers/pci/controller/vmd.c | 43 +++++++++++++++++++++++++++++++++---
>  1 file changed, 40 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index a45e8e59d3d4..880afd450a14 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -20,6 +20,8 @@
>  
>  #include <asm/irqdomain.h>
>  
> +#include "../pci.h"
> +
>  #define VMD_CFGBAR	0
>  #define VMD_MEMBAR1	2
>  #define VMD_MEMBAR2	4
> @@ -67,6 +69,12 @@ enum vmd_features {
>  	 * interrupt handling.
>  	 */
>  	VMD_FEAT_CAN_BYPASS_MSI_REMAP		= (1 << 4),
> +
> +	/*
> +	 * Device must have ASPM policy overridden, as its default policy is
> +	 * incorrect.
> +	 */
> +	VMD_FEAT_QUIRK_OVERRIDE_ASPM		= (1 << 5),

I think you specifically want to *enable* some ASPM link states, not
just "override the default policy."  "Override" tells us nothing about
whether you are enabling or disabling ASPM.  Applies to subject line
as well.

>  };
>  
>  static DEFINE_IDA(vmd_instance_ida);
> @@ -661,6 +669,30 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>  	return 0;
>  }
>  
> +/*
> + * Override the BIOS ASPM policy and set the LTR value for PCI storage
> + * devices on the VMD bride.

I don't think there's any BIOS "policy" here.  At this point BIOS is
no longer involved at all, so all that's left is whatever ASPM config
the BIOS did or did not do.

Why only storage?

s/bride/bridge/

> + */
> +static int vmd_enable_aspm(struct pci_dev *pdev, void *userdata)
> +{
> +	int features = *(int *)userdata, pos;
> +
> +	if (!(features & VMD_FEAT_QUIRK_OVERRIDE_ASPM) ||
> +	    pdev->class != PCI_CLASS_STORAGE_EXPRESS)
> +		return 0;
> +
> +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR);
> +	if (!pos)
> +		return 0;
> +
> +	pci_write_config_word(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, 0x1003);
> +	pci_write_config_word(pdev, pos + PCI_LTR_MAX_NOSNOOP_LAT, 0x1003);

1) Where did this magic 0x1003 value come from?  Does that depend on
the VMD device?  The endpoint?  The circuit design?  The path between
endpoint and VMD?  What if there are switches in the path?

2) There exist broken devices where WORD config accesses don't work:
https://lore.kernel.org/all/20211208000948.487820-1-rajatja@google.com/

We might need a way to quirk config accesses to those devices, but we
don't have one yet.  So for now this needs to be a single DWORD write.

> +	if (pcie_aspm_policy_override(pdev))
> +		pci_info(pdev, "Unable of override ASPM policy\n");

s/Unable of/Unable to/

I think we might need a message about when we *do* override the
policy.  A note in dmesg might be useful for debugging.  I'm worried
about the LTR programming because I really don't understand how we
should be doing that.

> +	return 0;
> +}
> +
>  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  {
>  	struct pci_sysdata *sd = &vmd->sysdata;
> @@ -807,6 +839,8 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  	pci_scan_child_bus(vmd->bus);
>  	pci_assign_unassigned_bus_resources(vmd->bus);
>  
> +	pci_walk_bus(vmd->bus, vmd_enable_aspm, &features);

Do you support hotplug under VMD?  This will not happen for hot-added
devices.

>  	/*
>  	 * VMD root buses are virtual and don't return true on pci_is_pcie()
>  	 * and will fail pcie_bus_configure_settings() early. It can instead be
> @@ -948,15 +982,18 @@ static const struct pci_device_id vmd_ids[] = {
>  	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x467f),
>  		.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
>  				VMD_FEAT_HAS_BUS_RESTRICTIONS |
> -				VMD_FEAT_OFFSET_FIRST_VECTOR,},
> +				VMD_FEAT_OFFSET_FIRST_VECTOR |
> +				VMD_FEAT_QUIRK_OVERRIDE_ASPM,},
>  	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x4c3d),
>  		.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
>  				VMD_FEAT_HAS_BUS_RESTRICTIONS |
> -				VMD_FEAT_OFFSET_FIRST_VECTOR,},
> +				VMD_FEAT_OFFSET_FIRST_VECTOR |
> +				VMD_FEAT_QUIRK_OVERRIDE_ASPM,},
>  	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_9A0B),
>  		.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
>  				VMD_FEAT_HAS_BUS_RESTRICTIONS |
> -				VMD_FEAT_OFFSET_FIRST_VECTOR,},
> +				VMD_FEAT_OFFSET_FIRST_VECTOR |
> +				VMD_FEAT_QUIRK_OVERRIDE_ASPM,},
>  	{0,}
>  };
>  MODULE_DEVICE_TABLE(pci, vmd_ids);
> -- 
> 2.25.1
>
David E. Box Dec. 16, 2021, 9:24 p.m. UTC | #2
On Thu, 2021-12-16 at 11:26 -0600, Bjorn Helgaas wrote:
> [+cc Rajat for LTR max latency write]
> 
> On Wed, Dec 15, 2021 at 09:56:00PM -0800, David E. Box wrote:
> > From: Michael Bottini <michael.a.bottini@linux.intel.com>
> > 
> > On Tiger Lake and Alder Lake platforms, VMD controllers do not have ASPM
> > enabled nor LTR values set by BIOS. This leads high power consumption on
> > these platforms when VMD is enabled as reported in bugzilla [1].  Enable
> > these features in the VMD driver using pcie_aspm_policy_override() to set
> > the ASPM policy for the root ports.
> 
> s/leads high/leads to high/
> 
> Does this depend on "Tiger Lake and Alder Lake platforms"
> specifically, or does it depend on a BIOS design choice, i.e., "don't
> configure ASPM or LTR for devices below a VMD"?

It was a BIOS design choice to have this done by the OS driver for both Tiger Lake and Alder Lake.

> 
> The subject says "override ASPM on VMD devices," but it looks like
> this affects the ASPM configuration of devices *below* the VMD, not of
> the VMD itself.

Yes.

> 
> It looks like this only affects *NVMe* devices, since
> vmd_enable_aspm() checks for PCI_CLASS_STORAGE_EXPRESS.  Why is that?
> Is there something special about NVMe?  I'd think you would want to do
> this for *all* devices below a VMD.

We need this for all devices under the PCIe root ports below VMD. Those will only be NVMe device
AFAICS.

> 
> Since it only affects PCI_CLASS_STORAGE_EXPRESS devices, I don't think
> it actually "sets ASPM policy for the root ports".  vmd_enable_aspm()
> calls pcie_aspm_policy_override() on endpoints.  It's true that the
> link ASPM state happens to be attached to the upstream end of the
> link, but that's an ASPM implementation detail.
> 
> This all needs to be clear in the subject and commit log.

Ack

> 
> > To do this, add an additional flag in VMD features to specify devices that
> > must have their respective policies overridden.
> 
> I'm not clear on why you want this to apply to only certain VMDs and
> not others.  Do some BIOSes configure ASPM for devices below some
> VMDs?

Not currently. But the plan is for future devices to move back to having BIOS do the programming.

> 
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=213717
> > 
> > Signed-off-by: Michael Bottini <michael.a.bottini@linux.intel.com>
> > Signed-off-by: David E. Box <david.e.box@linux.intel.com>
> > Tested-by: Adhitya Mohan <me@adhityamohan.in>
> > ---
> > V4
> >  - Refactor vmd_enable_apsm() to exit early, making the lines shorter
> >    and more readable. Suggested by Christoph.
> > V3
> >  - No changes
> > V2
> >  - Use return status to print pci_info message if ASPM cannot be enabled.
> >  - Add missing static declaration, caught by lkp@intel.com
> > 
> >  drivers/pci/controller/vmd.c | 43 +++++++++++++++++++++++++++++++++---
> >  1 file changed, 40 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > index a45e8e59d3d4..880afd450a14 100644
> > --- a/drivers/pci/controller/vmd.c
> > +++ b/drivers/pci/controller/vmd.c
> > @@ -20,6 +20,8 @@
> >  
> >  #include <asm/irqdomain.h>
> >  
> > +#include "../pci.h"
> > +
> >  #define VMD_CFGBAR     0
> >  #define VMD_MEMBAR1    2
> >  #define VMD_MEMBAR2    4
> > @@ -67,6 +69,12 @@ enum vmd_features {
> >          * interrupt handling.
> >          */
> >         VMD_FEAT_CAN_BYPASS_MSI_REMAP           = (1 << 4),
> > +
> > +       /*
> > +        * Device must have ASPM policy overridden, as its default policy is
> > +        * incorrect.
> > +        */
> > +       VMD_FEAT_QUIRK_OVERRIDE_ASPM            = (1 << 5),
> 
> I think you specifically want to *enable* some ASPM link states, not
> just "override the default policy."  "Override" tells us nothing about
> whether you are enabling or disabling ASPM.  Applies to subject line
> as well.

Ack

> 
> >  };
> >  
> >  static DEFINE_IDA(vmd_instance_ida);
> > @@ -661,6 +669,30 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
> >         return 0;
> >  }
> >  
> > +/*
> > + * Override the BIOS ASPM policy and set the LTR value for PCI storage
> > + * devices on the VMD bride.
> 
> I don't think there's any BIOS "policy" here.  At this point BIOS is
> no longer involved at all, so all that's left is whatever ASPM config
> the BIOS did or did not do.
> 
> Why only storage?

Only storage devices will be on these root ports.

> 
> s/bride/bridge/
> 
> > + */
> > +static int vmd_enable_aspm(struct pci_dev *pdev, void *userdata)
> > +{
> > +       int features = *(int *)userdata, pos;
> > +
> > +       if (!(features & VMD_FEAT_QUIRK_OVERRIDE_ASPM) ||
> > +           pdev->class != PCI_CLASS_STORAGE_EXPRESS)
> > +               return 0;
> > +
> > +       pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR);
> > +       if (!pos)
> > +               return 0;
> > +
> > +       pci_write_config_word(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, 0x1003);
> > +       pci_write_config_word(pdev, pos + PCI_LTR_MAX_NOSNOOP_LAT, 0x1003);
> 
> 1) Where did this magic 0x1003 value come from?  Does that depend on
> the VMD device?  The endpoint?  The circuit design?  The path between
> endpoint and VMD?  What if there are switches in the path?

The number comes from the BIOS team. They are tied to the SoC. I don't believe there can be switches
in the path but Nirmal and Jonathan should know for sure. From what I've seen these root ports are
wired directly to M.2 slots on boards that are intended for storage devices.

> 
> 2) There exist broken devices where WORD config accesses don't work:
> https://lore.kernel.org/all/20211208000948.487820-1-rajatja@google.com/
> 
> We might need a way to quirk config accesses to those devices, but we
> don't have one yet.  So for now this needs to be a single DWORD write.

Ack

> 
> > +       if (pcie_aspm_policy_override(pdev))
> > +               pci_info(pdev, "Unable of override ASPM policy\n");
> 
> s/Unable of/Unable to/
> 
> I think we might need a message about when we *do* override the
> policy.  A note in dmesg might be useful for debugging.  I'm worried
> about the LTR programming because I really don't understand how we
> should be doing that.

Sure.

> 
> > +       return 0;
> > +}
> > +
> >  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> >  {
> >         struct pci_sysdata *sd = &vmd->sysdata;
> > @@ -807,6 +839,8 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> >         pci_scan_child_bus(vmd->bus);
> >         pci_assign_unassigned_bus_resources(vmd->bus);
> >  
> > +       pci_walk_bus(vmd->bus, vmd_enable_aspm, &features);
> 
> Do you support hotplug under VMD?  This will not happen for hot-added
> devices.

That I don't know. Nirmal, Jonathan?

David

> 
> >         /*
> >          * VMD root buses are virtual and don't return true on pci_is_pcie()
> >          * and will fail pcie_bus_configure_settings() early. It can instead be
> > @@ -948,15 +982,18 @@ static const struct pci_device_id vmd_ids[] = {
> >         {PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x467f),
> >                 .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
> >                                 VMD_FEAT_HAS_BUS_RESTRICTIONS |
> > -                               VMD_FEAT_OFFSET_FIRST_VECTOR,},
> > +                               VMD_FEAT_OFFSET_FIRST_VECTOR |
> > +                               VMD_FEAT_QUIRK_OVERRIDE_ASPM,},
> >         {PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x4c3d),
> >                 .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
> >                                 VMD_FEAT_HAS_BUS_RESTRICTIONS |
> > -                               VMD_FEAT_OFFSET_FIRST_VECTOR,},
> > +                               VMD_FEAT_OFFSET_FIRST_VECTOR |
> > +                               VMD_FEAT_QUIRK_OVERRIDE_ASPM,},
> >         {PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_9A0B),
> >                 .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
> >                                 VMD_FEAT_HAS_BUS_RESTRICTIONS |
> > -                               VMD_FEAT_OFFSET_FIRST_VECTOR,},
> > +                               VMD_FEAT_OFFSET_FIRST_VECTOR |
> > +                               VMD_FEAT_QUIRK_OVERRIDE_ASPM,},
> >         {0,}
> >  };
> >  MODULE_DEVICE_TABLE(pci, vmd_ids);
> > -- 
> > 2.25.1
> >
Bjorn Helgaas Dec. 20, 2021, 5:28 p.m. UTC | #3
On Thu, Dec 16, 2021 at 01:24:00PM -0800, David E. Box wrote:
> On Thu, 2021-12-16 at 11:26 -0600, Bjorn Helgaas wrote:
> > On Wed, Dec 15, 2021 at 09:56:00PM -0800, David E. Box wrote:
> > > From: Michael Bottini <michael.a.bottini@linux.intel.com>
> > > 
> > > On Tiger Lake and Alder Lake platforms, VMD controllers do not have ASPM
> > > enabled nor LTR values set by BIOS. This leads high power consumption on
> > > these platforms when VMD is enabled as reported in bugzilla [1].  Enable
> > > these features in the VMD driver using pcie_aspm_policy_override() to set
> > > the ASPM policy for the root ports.
> > > ...

> > > To do this, add an additional flag in VMD features to specify
> > > devices that must have their respective policies overridden.
> > 
> > I'm not clear on why you want this to apply to only certain VMDs
> > and not others.  Do some BIOSes configure ASPM for devices below
> > some VMDs?
> 
> Not currently. But the plan is for future devices to move back to
> having BIOS do the programming.

Since this is apparently a BIOS design choice, it seems wrong to base
the functionality on the Device ID instead of some signal that tells
us what the BIOS is doing.

> > > + * Override the BIOS ASPM policy and set the LTR value for PCI storage
> > > + * devices on the VMD bride.
> > 
> > I don't think there's any BIOS "policy" here.  At this point BIOS
> > is no longer involved at all, so all that's left is whatever ASPM
> > config the BIOS did or did not do.
> > 
> > Why only storage?
> 
> Only storage devices will be on these root ports.

How do you know this?  You say below that there's an M.2 slot, so
surely the slot could contain a non-storage device?  Couldn't somebody
build a platform with a VMD root port connected to a regular PCIe x4
slot?  Couldn't such a slot support hotplug?

It would be very unusual to hard-code topology knowledge like this
into the kernel, since plug-and-play has always been a major goal of
PCI.

> > > +static int vmd_enable_aspm(struct pci_dev *pdev, void *userdata)
> > > +{
> > > +       int features = *(int *)userdata, pos;
> > > +
> > > +       if (!(features & VMD_FEAT_QUIRK_OVERRIDE_ASPM) ||
> > > +           pdev->class != PCI_CLASS_STORAGE_EXPRESS)
> > > +               return 0;

> > > +       pci_write_config_word(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, 0x1003);
> > > +       pci_write_config_word(pdev, pos + PCI_LTR_MAX_NOSNOOP_LAT, 0x1003);
> > 
> > 1) Where did this magic 0x1003 value come from?  Does that depend
> > on the VMD device?  The endpoint?  The circuit design?  The path
> > between endpoint and VMD?  What if there are switches in the path?
> 
> The number comes from the BIOS team. They are tied to the SoC. I
> don't believe there can be switches in the path but Nirmal and
> Jonathan should know for sure. From what I've seen these root ports
> are wired directly to M.2 slots on boards that are intended for
> storage devices.

I guess you're saying that 0x1003 is determined by the SoC.  If so, I
think this value should be in your .driver_data (which would mean
converting it from a scalar to a struct, as many other drivers do).
The current code suggests that 0x1003 is the correct value for *all*
VMDs and all configurations.

I don't understand LTR well enough to be convinced that this static
value would be the correct value for all possible hierarchies and
devices that could appear below a VMD root port.  I would love to be
educated about this.

Bjorn
David E. Box Dec. 20, 2021, 11:06 p.m. UTC | #4
Hi Born,

So you know this patch comes from our platform power management team to address several VMD bugs
affecting power. The VMD driver developers are already cc'd but also adding Francisco from the VMD
team who can maybe help clarify.

On Mon, 2021-12-20 at 11:28 -0600, Bjorn Helgaas wrote:
> On Thu, Dec 16, 2021 at 01:24:00PM -0800, David E. Box wrote:
> > On Thu, 2021-12-16 at 11:26 -0600, Bjorn Helgaas wrote:
> > > On Wed, Dec 15, 2021 at 09:56:00PM -0800, David E. Box wrote:
> > > > From: Michael Bottini <michael.a.bottini@linux.intel.com>
> > > > 
> > > > On Tiger Lake and Alder Lake platforms, VMD controllers do not have ASPM
> > > > enabled nor LTR values set by BIOS. This leads high power consumption on
> > > > these platforms when VMD is enabled as reported in bugzilla [1].  Enable
> > > > these features in the VMD driver using pcie_aspm_policy_override() to set
> > > > the ASPM policy for the root ports.
> > > > ...
> 
> > > > To do this, add an additional flag in VMD features to specify
> > > > devices that must have their respective policies overridden.
> > > 
> > > I'm not clear on why you want this to apply to only certain VMDs
> > > and not others.  Do some BIOSes configure ASPM for devices below
> > > some VMDs?
> > 
> > Not currently. But the plan is for future devices to move back to
> > having BIOS do the programming.
> 
> Since this is apparently a BIOS design choice, it seems wrong to base
> the functionality on the Device ID instead of some signal that tells
> us what the BIOS is doing.

I don't know if there is another way to tell what the BIOS is _not_ doing. DID is tied to the SoC so
it's our best method of inferring this. We know the firmware team has said they will not support
programming these values for the mentioned platforms so there's no BIOS update coming to change
this. But to be sure, we can add a check first to confirm that these values have not been programmed
(ASPM is disabled and LTRs are 0).

> 
> > > > + * Override the BIOS ASPM policy and set the LTR value for PCI storage
> > > > + * devices on the VMD bride.
> > > 
> > > I don't think there's any BIOS "policy" here.  At this point BIOS
> > > is no longer involved at all, so all that's left is whatever ASPM
> > > config the BIOS did or did not do.
> > > 
> > > Why only storage?
> > 
> > Only storage devices will be on these root ports.
> 
> How do you know this?  You say below that there's an M.2 slot, so
> surely the slot could contain a non-storage device?  Couldn't somebody
> build a platform with a VMD root port connected to a regular PCIe x4
> slot?

In VMD mode the root ports under the VMD bridge are used specifically to manage storage devices. So
only storage devices should be attached to these ports.

>   Couldn't such a slot support hotplug?

The answer I just learned is yes. It may be possible to have a switch too. So both of those would
need to be addressed though I'm note quite sure how yet.

> 
> It would be very unusual to hard-code topology knowledge like this
> into the kernel, since plug-and-play has always been a major goal of
> PCI.
> 
> > > > +static int vmd_enable_aspm(struct pci_dev *pdev, void *userdata)
> > > > +{
> > > > +       int features = *(int *)userdata, pos;
> > > > +
> > > > +       if (!(features & VMD_FEAT_QUIRK_OVERRIDE_ASPM) ||
> > > > +           pdev->class != PCI_CLASS_STORAGE_EXPRESS)
> > > > +               return 0;
> 
> > > > +       pci_write_config_word(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, 0x1003);
> > > > +       pci_write_config_word(pdev, pos + PCI_LTR_MAX_NOSNOOP_LAT, 0x1003);
> > > 
> > > 1) Where did this magic 0x1003 value come from?  Does that depend
> > > on the VMD device?  The endpoint?  The circuit design?  The path
> > > between endpoint and VMD?  What if there are switches in the path?
> > 
> > The number comes from the BIOS team. They are tied to the SoC. I
> > don't believe there can be switches in the path but Nirmal and
> > Jonathan should know for sure. From what I've seen these root ports
> > are wired directly to M.2 slots on boards that are intended for
> > storage devices.
> 
> I guess you're saying that 0x1003 is determined by the SoC.  If so, I
> think this value should be in your .driver_data (which would mean
> converting it from a scalar to a struct, as many other drivers do).
> The current code suggests that 0x1003 is the correct value for *all*
> VMDs and all configurations.

Okay

> 
> I don't understand LTR well enough to be convinced that this static
> value would be the correct value for all possible hierarchies and
> devices that could appear below a VMD root port.  I would love to be
> educated about this.

I am not an LTR expert either, but LTRs are "conglomerated" per spec. Switches subtract their own
latency from the minimum LTR of the downstream ports and report this conglomerated value upstream.
So the value will decrease as needed to account for latency in the hierarchy.

PCI 5.0 Version 1.0, Section 6.18 (pg 611)

Thanks

David

> 
> Bjorn
Rajat Jain Dec. 21, 2021, 1:52 a.m. UTC | #5
Hello,

On Thu, Dec 16, 2021 at 1:24 PM David E. Box
<david.e.box@linux.intel.com> wrote:
>
> On Thu, 2021-12-16 at 11:26 -0600, Bjorn Helgaas wrote:
> > [+cc Rajat for LTR max latency write]

Thanks!

> >
> > On Wed, Dec 15, 2021 at 09:56:00PM -0800, David E. Box wrote:
> > > From: Michael Bottini <michael.a.bottini@linux.intel.com>
> > >
> > > On Tiger Lake and Alder Lake platforms, VMD controllers do not have ASPM
> > > enabled nor LTR values set by BIOS. This leads high power consumption on
> > > these platforms when VMD is enabled as reported in bugzilla [1].  Enable
> > > these features in the VMD driver using pcie_aspm_policy_override() to set
> > > the ASPM policy for the root ports.
> >
> > s/leads high/leads to high/
> >
> > Does this depend on "Tiger Lake and Alder Lake platforms"
> > specifically, or does it depend on a BIOS design choice, i.e., "don't
> > configure ASPM or LTR for devices below a VMD"?
>
> It was a BIOS design choice to have this done by the OS driver for both Tiger Lake and Alder Lake.
>
> >
> > The subject says "override ASPM on VMD devices," but it looks like
> > this affects the ASPM configuration of devices *below* the VMD, not of
> > the VMD itself.
>
> Yes.
>
> >
> > It looks like this only affects *NVMe* devices, since
> > vmd_enable_aspm() checks for PCI_CLASS_STORAGE_EXPRESS.  Why is that?
> > Is there something special about NVMe?  I'd think you would want to do
> > this for *all* devices below a VMD.
>
> We need this for all devices under the PCIe root ports below VMD. Those will only be NVMe device
> AFAICS.
>
> >
> > Since it only affects PCI_CLASS_STORAGE_EXPRESS devices, I don't think
> > it actually "sets ASPM policy for the root ports".  vmd_enable_aspm()
> > calls pcie_aspm_policy_override() on endpoints.  It's true that the
> > link ASPM state happens to be attached to the upstream end of the
> > link, but that's an ASPM implementation detail.
> >
> > This all needs to be clear in the subject and commit log.
>
> Ack
>
> >
> > > To do this, add an additional flag in VMD features to specify devices that
> > > must have their respective policies overridden.
> >
> > I'm not clear on why you want this to apply to only certain VMDs and
> > not others.  Do some BIOSes configure ASPM for devices below some
> > VMDs?
>
> Not currently. But the plan is for future devices to move back to having BIOS do the programming.
>
> >
> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=213717
> > >
> > > Signed-off-by: Michael Bottini <michael.a.bottini@linux.intel.com>
> > > Signed-off-by: David E. Box <david.e.box@linux.intel.com>
> > > Tested-by: Adhitya Mohan <me@adhityamohan.in>
> > > ---
> > > V4
> > >  - Refactor vmd_enable_apsm() to exit early, making the lines shorter
> > >    and more readable. Suggested by Christoph.
> > > V3
> > >  - No changes
> > > V2
> > >  - Use return status to print pci_info message if ASPM cannot be enabled.
> > >  - Add missing static declaration, caught by lkp@intel.com
> > >
> > >  drivers/pci/controller/vmd.c | 43 +++++++++++++++++++++++++++++++++---
> > >  1 file changed, 40 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > index a45e8e59d3d4..880afd450a14 100644
> > > --- a/drivers/pci/controller/vmd.c
> > > +++ b/drivers/pci/controller/vmd.c
> > > @@ -20,6 +20,8 @@
> > >
> > >  #include <asm/irqdomain.h>
> > >
> > > +#include "../pci.h"
> > > +
> > >  #define VMD_CFGBAR     0
> > >  #define VMD_MEMBAR1    2
> > >  #define VMD_MEMBAR2    4
> > > @@ -67,6 +69,12 @@ enum vmd_features {
> > >          * interrupt handling.
> > >          */
> > >         VMD_FEAT_CAN_BYPASS_MSI_REMAP           = (1 << 4),
> > > +
> > > +       /*
> > > +        * Device must have ASPM policy overridden, as its default policy is
> > > +        * incorrect.
> > > +        */
> > > +       VMD_FEAT_QUIRK_OVERRIDE_ASPM            = (1 << 5),
> >
> > I think you specifically want to *enable* some ASPM link states, not
> > just "override the default policy."  "Override" tells us nothing about
> > whether you are enabling or disabling ASPM.  Applies to subject line
> > as well.
>
> Ack
>
> >
> > >  };
> > >
> > >  static DEFINE_IDA(vmd_instance_ida);
> > > @@ -661,6 +669,30 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
> > >         return 0;
> > >  }
> > >
> > > +/*
> > > + * Override the BIOS ASPM policy and set the LTR value for PCI storage
> > > + * devices on the VMD bride.
> >
> > I don't think there's any BIOS "policy" here.  At this point BIOS is
> > no longer involved at all, so all that's left is whatever ASPM config
> > the BIOS did or did not do.
> >
> > Why only storage?
>
> Only storage devices will be on these root ports.
>
> >
> > s/bride/bridge/
> >
> > > + */
> > > +static int vmd_enable_aspm(struct pci_dev *pdev, void *userdata)
> > > +{
> > > +       int features = *(int *)userdata, pos;
> > > +
> > > +       if (!(features & VMD_FEAT_QUIRK_OVERRIDE_ASPM) ||
> > > +           pdev->class != PCI_CLASS_STORAGE_EXPRESS)
> > > +               return 0;
> > > +
> > > +       pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR);
> > > +       if (!pos)
> > > +               return 0;
> > > +
> > > +       pci_write_config_word(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, 0x1003);
> > > +       pci_write_config_word(pdev, pos + PCI_LTR_MAX_NOSNOOP_LAT, 0x1003);
> >
> > 1) Where did this magic 0x1003 value come from?  Does that depend on
> > the VMD device?  The endpoint?  The circuit design?  The path between
> > endpoint and VMD?  What if there are switches in the path?
>
> The number comes from the BIOS team. They are tied to the SoC. I don't believe there can be switches
> in the path but Nirmal and Jonathan should know for sure. From what I've seen these root ports are
> wired directly to M.2 slots on boards that are intended for storage devices.
>
> >
> > 2) There exist broken devices where WORD config accesses don't work:
> > https://lore.kernel.org/all/20211208000948.487820-1-rajatja@google.com/
> >
> > We might need a way to quirk config accesses to those devices, but we
> > don't have one yet.  So for now this needs to be a single DWORD write.
>
> Ack
>

Sideband: Yes, I'm trying to work out such a quirk mechanism, but for
now, yes, please do that as a single DWORD write access.

Thanks,

Rajat

> >
> > > +       if (pcie_aspm_policy_override(pdev))
> > > +               pci_info(pdev, "Unable of override ASPM policy\n");
> >
> > s/Unable of/Unable to/
> >
> > I think we might need a message about when we *do* override the
> > policy.  A note in dmesg might be useful for debugging.  I'm worried
> > about the LTR programming because I really don't understand how we
> > should be doing that.
>
> Sure.
>
> >
> > > +       return 0;
> > > +}
> > > +
> > >  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> > >  {
> > >         struct pci_sysdata *sd = &vmd->sysdata;
> > > @@ -807,6 +839,8 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> > >         pci_scan_child_bus(vmd->bus);
> > >         pci_assign_unassigned_bus_resources(vmd->bus);
> > >
> > > +       pci_walk_bus(vmd->bus, vmd_enable_aspm, &features);
> >
> > Do you support hotplug under VMD?  This will not happen for hot-added
> > devices.
>
> That I don't know. Nirmal, Jonathan?
>
> David
>
> >
> > >         /*
> > >          * VMD root buses are virtual and don't return true on pci_is_pcie()
> > >          * and will fail pcie_bus_configure_settings() early. It can instead be
> > > @@ -948,15 +982,18 @@ static const struct pci_device_id vmd_ids[] = {
> > >         {PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x467f),
> > >                 .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
> > >                                 VMD_FEAT_HAS_BUS_RESTRICTIONS |
> > > -                               VMD_FEAT_OFFSET_FIRST_VECTOR,},
> > > +                               VMD_FEAT_OFFSET_FIRST_VECTOR |
> > > +                               VMD_FEAT_QUIRK_OVERRIDE_ASPM,},
> > >         {PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x4c3d),
> > >                 .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
> > >                                 VMD_FEAT_HAS_BUS_RESTRICTIONS |
> > > -                               VMD_FEAT_OFFSET_FIRST_VECTOR,},
> > > +                               VMD_FEAT_OFFSET_FIRST_VECTOR |
> > > +                               VMD_FEAT_QUIRK_OVERRIDE_ASPM,},
> > >         {PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_9A0B),
> > >                 .driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
> > >                                 VMD_FEAT_HAS_BUS_RESTRICTIONS |
> > > -                               VMD_FEAT_OFFSET_FIRST_VECTOR,},
> > > +                               VMD_FEAT_OFFSET_FIRST_VECTOR |
> > > +                               VMD_FEAT_QUIRK_OVERRIDE_ASPM,},
> > >         {0,}
> > >  };
> > >  MODULE_DEVICE_TABLE(pci, vmd_ids);
> > > --
> > > 2.25.1
> > >
>
>
Rajat Jain Dec. 21, 2021, 2:14 a.m. UTC | #6
Hello,

On Mon, Dec 20, 2021 at 3:06 PM David E. Box
<david.e.box@linux.intel.com> wrote:
>
> Hi Born,
>
> So you know this patch comes from our platform power management team to address several VMD bugs
> affecting power. The VMD driver developers are already cc'd but also adding Francisco from the VMD
> team who can maybe help clarify.
>
> On Mon, 2021-12-20 at 11:28 -0600, Bjorn Helgaas wrote:
> > On Thu, Dec 16, 2021 at 01:24:00PM -0800, David E. Box wrote:
> > > On Thu, 2021-12-16 at 11:26 -0600, Bjorn Helgaas wrote:
> > > > On Wed, Dec 15, 2021 at 09:56:00PM -0800, David E. Box wrote:
> > > > > From: Michael Bottini <michael.a.bottini@linux.intel.com>
> > > > >
> > > > > On Tiger Lake and Alder Lake platforms, VMD controllers do not have ASPM
> > > > > enabled nor LTR values set by BIOS. This leads high power consumption on
> > > > > these platforms when VMD is enabled as reported in bugzilla [1].  Enable
> > > > > these features in the VMD driver using pcie_aspm_policy_override() to set
> > > > > the ASPM policy for the root ports.
> > > > > ...
> >
> > > > > To do this, add an additional flag in VMD features to specify
> > > > > devices that must have their respective policies overridden.
> > > >
> > > > I'm not clear on why you want this to apply to only certain VMDs
> > > > and not others.  Do some BIOSes configure ASPM for devices below
> > > > some VMDs?
> > >
> > > Not currently. But the plan is for future devices to move back to
> > > having BIOS do the programming.
> >
> > Since this is apparently a BIOS design choice, it seems wrong to base
> > the functionality on the Device ID instead of some signal that tells
> > us what the BIOS is doing.
>
> I don't know if there is another way to tell what the BIOS is _not_ doing. DID is tied to the SoC so
> it's our best method of inferring this. We know the firmware team has said they will not support
> programming these values for the mentioned platforms so there's no BIOS update coming to change
> this.

I'd like to mention here that ChromeOS also runs linux kernel on TGL
and ADL platforms, and we may run a different BIOS altogether, so yes,
we'd greatly appreciate if you do not assume BIOS setting from device
ID / SOC ID, if the setting is *not tied to device ID / SOC ID*.
Perhaps you use ACPI or device trees to convey what BIOS has or hasn't
configured?

> But to be sure, we can add a check first to confirm that these values have not been programmed
> (ASPM is disabled and LTRs are 0).

Yes, reading the ASPM configuration from hardware sounds OK to me.

>
> >
> > > > > + * Override the BIOS ASPM policy and set the LTR value for PCI storage
> > > > > + * devices on the VMD bride.
> > > >
> > > > I don't think there's any BIOS "policy" here.  At this point BIOS
> > > > is no longer involved at all, so all that's left is whatever ASPM
> > > > config the BIOS did or did not do.

Should the Policy override really take place if aspm_policy =
POLICY_DEFAULT which seems to suggest we leave it to whatever BIOS
left it at?

Perhaps for this override to take place, aspm_policy needs to be the
policy needs to be POLICY_POWERSAVE?

#define POLICY_DEFAULT 0        /* BIOS default setting */
#define POLICY_PERFORMANCE 1    /* high performance */
#define POLICY_POWERSAVE 2      /* high power saving */
#define POLICY_POWER_SUPERSAVE 3 /* possibly even more power saving */


> > > >
> > > > Why only storage?
> > >
> > > Only storage devices will be on these root ports.
> >
> > How do you know this?  You say below that there's an M.2 slot, so
> > surely the slot could contain a non-storage device?  Couldn't somebody
> > build a platform with a VMD root port connected to a regular PCIe x4
> > slot?
>
> In VMD mode the root ports under the VMD bridge are used specifically to manage storage devices. So
> only storage devices should be attached to these ports.

Bjorn: from https://www.intel.com/content/www/us/en/architecture-and-technology/intel-volume-management-device-overview.html
"Intel® Volume Management Device (Intel® VMD) is specifically designed
for enterprise-grade management of NVMe SSDs connected to Intel® Xeon®
CPUs."

I think what David means is that any other device other than an NVME
SSD showing up under VMD might be an error / bad device.


>
> >   Couldn't such a slot support hotplug?
>
> The answer I just learned is yes. It may be possible to have a switch too. So both of those would
> need to be addressed though I'm note quite sure how yet.
>
> >
> > It would be very unusual to hard-code topology knowledge like this
> > into the kernel, since plug-and-play has always been a major goal of
> > PCI.
> >
> > > > > +static int vmd_enable_aspm(struct pci_dev *pdev, void *userdata)
> > > > > +{
> > > > > +       int features = *(int *)userdata, pos;
> > > > > +
> > > > > +       if (!(features & VMD_FEAT_QUIRK_OVERRIDE_ASPM) ||
> > > > > +           pdev->class != PCI_CLASS_STORAGE_EXPRESS)
> > > > > +               return 0;
> >
> > > > > +       pci_write_config_word(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, 0x1003);
> > > > > +       pci_write_config_word(pdev, pos + PCI_LTR_MAX_NOSNOOP_LAT, 0x1003);
> > > >
> > > > 1) Where did this magic 0x1003 value come from?  Does that depend
> > > > on the VMD device?  The endpoint?  The circuit design?  The path
> > > > between endpoint and VMD?  What if there are switches in the path?
> > >
> > > The number comes from the BIOS team. They are tied to the SoC. I
> > > don't believe there can be switches in the path but Nirmal and
> > > Jonathan should know for sure. From what I've seen these root ports
> > > are wired directly to M.2 slots on boards that are intended for
> > > storage devices.
> >
> > I guess you're saying that 0x1003 is determined by the SoC.  If so, I
> > think this value should be in your .driver_data (which would mean
> > converting it from a scalar to a struct, as many other drivers do).
> > The current code suggests that 0x1003 is the correct value for *all*
> > VMDs and all configurations.
>
> Okay
>
> >
> > I don't understand LTR well enough to be convinced that this static
> > value would be the correct value for all possible hierarchies and
> > devices that could appear below a VMD root port.  I would love to be
> > educated about this.
>
> I am not an LTR expert either, but LTRs are "conglomerated" per spec. Switches subtract their own
> latency from the minimum LTR of the downstream ports and report this conglomerated value upstream.
> So the value will decrease as needed to account for latency in the hierarchy.
>
> PCI 5.0 Version 1.0, Section 6.18 (pg 611)

As described here and as much as I understand, I think these LTR
values (0x1003 here) depend on the platform topology. So what works
for David's platform may not work for another one that may have
Switched for e.g., or for perhaps for another NVME that is hotplugged
later (and has different characteristics than what is on board).
Technically tying the LTR values to a SOC or VID/DID sounds incorrect
to me, and instead should be tied to the platform. Ideally I *thiink*
they should come from ACPI or device tree. However, I wouldn't protest
putting them in driver_data for now given that there aren't any other
users. We may have to find a solution once there are more platforms
that begin to use this driver. I think this is a bigger problem that
needs to be addressed w.r.t. ASPM in general.

Thanks & Best Regards,

Rajat


>
> Thanks
>
> David
>
> >
> > Bjorn
>
>
David E. Box Dec. 21, 2021, 9:11 p.m. UTC | #7
Hi,

On Mon, 2021-12-20 at 18:14 -0800, Rajat Jain wrote:
> Hello,
> 
> On Mon, Dec 20, 2021 at 3:06 PM David E. Box
> <david.e.box@linux.intel.com> wrote:
> > Hi Born,
> > 
> > So you know this patch comes from our platform power management team to
> > address several VMD bugs
> > affecting power. The VMD driver developers are already cc'd but also adding
> > Francisco from the VMD
> > team who can maybe help clarify.
> > 
> > On Mon, 2021-12-20 at 11:28 -0600, Bjorn Helgaas wrote:
> > > On Thu, Dec 16, 2021 at 01:24:00PM -0800, David E. Box wrote:
> > > > On Thu, 2021-12-16 at 11:26 -0600, Bjorn Helgaas wrote:
> > > > > On Wed, Dec 15, 2021 at 09:56:00PM -0800, David E. Box wrote:
> > > > > > From: Michael Bottini <michael.a.bottini@linux.intel.com>
> > > > > > 
> > > > > > On Tiger Lake and Alder Lake platforms, VMD controllers do not have
> > > > > > ASPM
> > > > > > enabled nor LTR values set by BIOS. This leads high power
> > > > > > consumption on
> > > > > > these platforms when VMD is enabled as reported in bugzilla
> > > > > > [1].  Enable
> > > > > > these features in the VMD driver using pcie_aspm_policy_override()
> > > > > > to set
> > > > > > the ASPM policy for the root ports.
> > > > > > ...
> > > > > > To do this, add an additional flag in VMD features to specify
> > > > > > devices that must have their respective policies overridden.
> > > > > 
> > > > > I'm not clear on why you want this to apply to only certain VMDs
> > > > > and not others.  Do some BIOSes configure ASPM for devices below
> > > > > some VMDs?
> > > > 
> > > > Not currently. But the plan is for future devices to move back to
> > > > having BIOS do the programming.
> > > 
> > > Since this is apparently a BIOS design choice, it seems wrong to base
> > > the functionality on the Device ID instead of some signal that tells
> > > us what the BIOS is doing.
> > 
> > I don't know if there is another way to tell what the BIOS is _not_ doing.
> > DID is tied to the SoC so
> > it's our best method of inferring this. We know the firmware team has said
> > they will not support
> > programming these values for the mentioned platforms so there's no BIOS
> > update coming to change
> > this.
> 
> I'd like to mention here that ChromeOS also runs linux kernel on TGL
> and ADL platforms, and we may run a different BIOS altogether, so yes,
> we'd greatly appreciate if you do not assume BIOS setting from device
> ID / SOC ID, if the setting is *not tied to device ID / SOC ID*.
> Perhaps you use ACPI or device trees to convey what BIOS has or hasn't
> configured?

The reason why BIOS is not programming these values is because when VMD is
enabled the ports are not visible to BIOS. This would apply to any BIOS
including those used by ChromeOS.


And because BIOS doesn't see these ports ...

> 
> > But to be sure, we can add a check first to confirm that these values have
> > not been programmed
> > (ASPM is disabled and LTRs are 0).
> 
> Yes, reading the ASPM configuration from hardware sounds OK to me.
> 
> > > > > > + * Override the BIOS ASPM policy and set the LTR value for PCI
> > > > > > storage
> > > > > > + * devices on the VMD bride.
> > > > > 
> > > > > I don't think there's any BIOS "policy" here.  At this point BIOS
> > > > > is no longer involved at all, so all that's left is whatever ASPM
> > > > > config the BIOS did or did not do.
> 
> Should the Policy override really take place if aspm_policy =
> POLICY_DEFAULT which seems to suggest we leave it to whatever BIOS
> left it at?

... we cannot rely on POLICY_DEFAULT for their ASPM setting. This leaves the
devices with ASPM disabled.

That said, we do honor the PCI aspm_disabled flag which is tied to both the FADT
bit and module parameter. So we only enable it if OS control is allowed.

> 
> Perhaps for this override to take place, aspm_policy needs to be the
> policy needs to be POLICY_POWERSAVE?
> 
> #define POLICY_DEFAULT 0        /* BIOS default setting */
> #define POLICY_PERFORMANCE 1    /* high performance */
> #define POLICY_POWERSAVE 2      /* high power saving */
> #define POLICY_POWER_SUPERSAVE 3 /* possibly even more power saving */

We do have to set a default in place of the missing BIOS configuration. We chose
to do ASPM_STATE_ALL, allowing all ASPM settings supported by the endpoint.

> 
> 
> > > > > Why only storage?
> > > > 
> > > > Only storage devices will be on these root ports.
> > > 
> > > How do you know this?  You say below that there's an M.2 slot, so
> > > surely the slot could contain a non-storage device?  Couldn't somebody
> > > build a platform with a VMD root port connected to a regular PCIe x4
> > > slot?
> > 
> > In VMD mode the root ports under the VMD bridge are used specifically to
> > manage storage devices. So
> > only storage devices should be attached to these ports.
> 
> Bjorn: from 
> https://www.intel.com/content/www/us/en/architecture-and-technology/intel-volume-management-device-overview.html
> "Intel® Volume Management Device (Intel® VMD) is specifically designed
> for enterprise-grade management of NVMe SSDs connected to Intel® Xeon®
> CPUs."
> 
> I think what David means is that any other device other than an NVME
> SSD showing up under VMD might be an error / bad device.

I don't know if that would happen. It certainly would not be used as intended.

> 
> 
> > >   Couldn't such a slot support hotplug?
> > 
> > The answer I just learned is yes. It may be possible to have a switch too.
> > So both of those would
> > need to be addressed though I'm note quite sure how yet.
> > 
> > > It would be very unusual to hard-code topology knowledge like this
> > > into the kernel, since plug-and-play has always been a major goal of
> > > PCI.
> > > 
> > > > > > +static int vmd_enable_aspm(struct pci_dev *pdev, void *userdata)
> > > > > > +{
> > > > > > +       int features = *(int *)userdata, pos;
> > > > > > +
> > > > > > +       if (!(features & VMD_FEAT_QUIRK_OVERRIDE_ASPM) ||
> > > > > > +           pdev->class != PCI_CLASS_STORAGE_EXPRESS)
> > > > > > +               return 0;
> > > > > > +       pci_write_config_word(pdev, pos + PCI_LTR_MAX_SNOOP_LAT,
> > > > > > 0x1003);
> > > > > > +       pci_write_config_word(pdev, pos + PCI_LTR_MAX_NOSNOOP_LAT,
> > > > > > 0x1003);
> > > > > 
> > > > > 1) Where did this magic 0x1003 value come from?  Does that depend
> > > > > on the VMD device?  The endpoint?  The circuit design?  The path
> > > > > between endpoint and VMD?  What if there are switches in the path?
> > > > 
> > > > The number comes from the BIOS team. They are tied to the SoC. I
> > > > don't believe there can be switches in the path but Nirmal and
> > > > Jonathan should know for sure. From what I've seen these root ports
> > > > are wired directly to M.2 slots on boards that are intended for
> > > > storage devices.
> > > 
> > > I guess you're saying that 0x1003 is determined by the SoC.  If so, I
> > > think this value should be in your .driver_data (which would mean
> > > converting it from a scalar to a struct, as many other drivers do).
> > > The current code suggests that 0x1003 is the correct value for *all*
> > > VMDs and all configurations.
> > 
> > Okay
> > 
> > > I don't understand LTR well enough to be convinced that this static
> > > value would be the correct value for all possible hierarchies and
> > > devices that could appear below a VMD root port.  I would love to be
> > > educated about this.
> > 
> > I am not an LTR expert either, but LTRs are "conglomerated" per spec.
> > Switches subtract their own
> > latency from the minimum LTR of the downstream ports and report this
> > conglomerated value upstream.
> > So the value will decrease as needed to account for latency in the
> > hierarchy.
> > 
> > PCI 5.0 Version 1.0, Section 6.18 (pg 611)
> 
> As described here and as much as I understand, I think these LTR
> values (0x1003 here) depend on the platform topology. So what works
> for David's platform may not work for another one that may have
> Switched for e.g., or for perhaps for another NVME that is hotplugged
> later (and has different characteristics than what is on board).
> Technically tying the LTR values to a SOC or VID/DID sounds incorrect
> to me, and instead should be tied to the platform. Ideally I *thiink*
> they should come from ACPI or device tree. However, I wouldn't protest
> putting them in driver_data for now given that there aren't any other
> users. We may have to find a solution once there are more platforms
> that begin to use this driver. I think this is a bigger problem that
> needs to be addressed w.r.t. ASPM in general.

I'll seek more clarification from firmware team on the LTR. But again the fact
that switches can reduce the reported LTR seems to allow for at least some
decoupling of the endpoint LTR value from the topology.

David

> 
> Thanks & Best Regards,
> 
> Rajat
> 
> 
> > Thanks
> > 
> > David
> > 
> > > Bjorn
Christoph Hellwig Dec. 22, 2021, 7:50 a.m. UTC | #8
On Tue, Dec 21, 2021 at 01:11:26PM -0800, David E. Box wrote:
> The reason why BIOS is not programming these values is because when VMD is
> enabled the ports are not visible to BIOS. This would apply to any BIOS
> including those used by ChromeOS.
> And because BIOS doesn't see these ports ...

Isn't VMD enabled by the BIOS?  The bios should be able see them the
same way as Linux does.  But given that the whole point of VMD is to hide 
these ports from Windows it obviously doesn't.  For Linux and thus for
ChromeOS VMD is completely pointless, so give that the ChromeOS people
aren't as dumb as some people at Intel I'm pretty sure they won't enable
it and just use the ports as normal one and avoid this whole
self-inflicted pain.

If only Intel could give the OS a way to disable VMD at runtime and
discover the actual ports, then we would not have this whole nightmare
of having to support it.
diff mbox series

Patch

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index a45e8e59d3d4..880afd450a14 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -20,6 +20,8 @@ 
 
 #include <asm/irqdomain.h>
 
+#include "../pci.h"
+
 #define VMD_CFGBAR	0
 #define VMD_MEMBAR1	2
 #define VMD_MEMBAR2	4
@@ -67,6 +69,12 @@  enum vmd_features {
 	 * interrupt handling.
 	 */
 	VMD_FEAT_CAN_BYPASS_MSI_REMAP		= (1 << 4),
+
+	/*
+	 * Device must have ASPM policy overridden, as its default policy is
+	 * incorrect.
+	 */
+	VMD_FEAT_QUIRK_OVERRIDE_ASPM		= (1 << 5),
 };
 
 static DEFINE_IDA(vmd_instance_ida);
@@ -661,6 +669,30 @@  static int vmd_alloc_irqs(struct vmd_dev *vmd)
 	return 0;
 }
 
+/*
+ * Override the BIOS ASPM policy and set the LTR value for PCI storage
+ * devices on the VMD bride.
+ */
+static int vmd_enable_aspm(struct pci_dev *pdev, void *userdata)
+{
+	int features = *(int *)userdata, pos;
+
+	if (!(features & VMD_FEAT_QUIRK_OVERRIDE_ASPM) ||
+	    pdev->class != PCI_CLASS_STORAGE_EXPRESS)
+		return 0;
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR);
+	if (!pos)
+		return 0;
+
+	pci_write_config_word(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, 0x1003);
+	pci_write_config_word(pdev, pos + PCI_LTR_MAX_NOSNOOP_LAT, 0x1003);
+	if (pcie_aspm_policy_override(pdev))
+		pci_info(pdev, "Unable of override ASPM policy\n");
+
+	return 0;
+}
+
 static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 {
 	struct pci_sysdata *sd = &vmd->sysdata;
@@ -807,6 +839,8 @@  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	pci_scan_child_bus(vmd->bus);
 	pci_assign_unassigned_bus_resources(vmd->bus);
 
+	pci_walk_bus(vmd->bus, vmd_enable_aspm, &features);
+
 	/*
 	 * VMD root buses are virtual and don't return true on pci_is_pcie()
 	 * and will fail pcie_bus_configure_settings() early. It can instead be
@@ -948,15 +982,18 @@  static const struct pci_device_id vmd_ids[] = {
 	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x467f),
 		.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
 				VMD_FEAT_HAS_BUS_RESTRICTIONS |
-				VMD_FEAT_OFFSET_FIRST_VECTOR,},
+				VMD_FEAT_OFFSET_FIRST_VECTOR |
+				VMD_FEAT_QUIRK_OVERRIDE_ASPM,},
 	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x4c3d),
 		.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
 				VMD_FEAT_HAS_BUS_RESTRICTIONS |
-				VMD_FEAT_OFFSET_FIRST_VECTOR,},
+				VMD_FEAT_OFFSET_FIRST_VECTOR |
+				VMD_FEAT_QUIRK_OVERRIDE_ASPM,},
 	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_VMD_9A0B),
 		.driver_data = VMD_FEAT_HAS_MEMBAR_SHADOW_VSCAP |
 				VMD_FEAT_HAS_BUS_RESTRICTIONS |
-				VMD_FEAT_OFFSET_FIRST_VECTOR,},
+				VMD_FEAT_OFFSET_FIRST_VECTOR |
+				VMD_FEAT_QUIRK_OVERRIDE_ASPM,},
 	{0,}
 };
 MODULE_DEVICE_TABLE(pci, vmd_ids);