mbox series

[0/2] Add support for StorageD3Enable _DSD property

Message ID 20200428003214.3764-1-david.e.box@linux.intel.com (mailing list archive)
Headers show
Series Add support for StorageD3Enable _DSD property | expand

Message

David E. Box April 28, 2020, 12:32 a.m. UTC
NVMe storage power management during suspend-to-idle, particularly on
laptops, has been inconsistent with some devices working with D3 while
others must rely on NVMe APST in order for power savings to be realized.
Currently the default is to use APST unless quirked to do otherwise.
However newer platforms, like Intel Comet Lake systems, may require NVMe
drives to use D3 in order for the PCIe ports to be properly power managed.
To make it easier for drivers to choose, these platforms may supply a
special "StorageD3Enable" _DSD property under the root port that the device
is attached to. If supplied, the driver must use D3 in order for the
platform to realize the deepest power savings in suspend-to-idle.
    
The first patch adds the new _DSD GUID and fowards the property through the
pci/acpi layer to the pci device.

The second patch adds support for the property to the nvme driver.

David E. Box (2):
  pci: Add ACPI StorageD3Enable _DSD support
  drivers/nvme: Add support for ACPI StorageD3Enable property

 drivers/acpi/property.c |  3 +++
 drivers/nvme/host/pci.c |  7 ++++++
 drivers/pci/pci-acpi.c  | 47 +++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.c       |  6 ++++++
 drivers/pci/pci.h       |  4 ++++
 include/linux/pci.h     |  1 +
 6 files changed, 68 insertions(+)

Comments

Christoph Hellwig April 28, 2020, 5:13 a.m. UTC | #1
On Mon, Apr 27, 2020 at 05:32:12PM -0700, David E. Box wrote:
> NVMe storage power management during suspend-to-idle, particularly on
> laptops, has been inconsistent with some devices working with D3 while
> others must rely on NVMe APST in order for power savings to be realized.
> Currently the default is to use APST unless quirked to do otherwise.
> However newer platforms, like Intel Comet Lake systems, may require NVMe
> drives to use D3 in order for the PCIe ports to be properly power managed.
> To make it easier for drivers to choose, these platforms may supply a
> special "StorageD3Enable" _DSD property under the root port that the device
> is attached to. If supplied, the driver must use D3 in order for the
> platform to realize the deepest power savings in suspend-to-idle.
>     
> The first patch adds the new _DSD GUID and fowards the property through the
> pci/acpi layer to the pci device.
> 
> The second patch adds support for the property to the nvme driver.

I'm not sure who came up with the idea to put this into ACPI, but it
belongs into NVMe.  Please talk to the NVMe technical working group
instead of trying to overrules them in an unrelated group that doesn't
apply to all of PCIe.
David E. Box April 28, 2020, 2:09 p.m. UTC | #2
On Tue, 2020-04-28 at 07:13 +0200, Christoph Hellwig wrote:
> On Mon, Apr 27, 2020 at 05:32:12PM -0700, David E. Box wrote:
> > NVMe storage power management during suspend-to-idle, particularly
> > on
> > laptops, has been inconsistent with some devices working with D3
> > while
> > others must rely on NVMe APST in order for power savings to be
> > realized.
> > Currently the default is to use APST unless quirked to do
> > otherwise.
> > However newer platforms, like Intel Comet Lake systems, may require
> > NVMe
> > drives to use D3 in order for the PCIe ports to be properly power
> > managed.
> > To make it easier for drivers to choose, these platforms may supply
> > a
> > special "StorageD3Enable" _DSD property under the root port that
> > the device
> > is attached to. If supplied, the driver must use D3 in order for
> > the
> > platform to realize the deepest power savings in suspend-to-idle.
> >     
> > The first patch adds the new _DSD GUID and fowards the property
> > through the
> > pci/acpi layer to the pci device.
> > 
> > The second patch adds support for the property to the nvme driver.
> 
> I'm not sure who came up with the idea to put this into ACPI, but it
> belongs into NVMe.  Please talk to the NVMe technical working group
> instead of trying to overrules them in an unrelated group that
> doesn't
> apply to all of PCIe.

Agreed that this is not ideal since it does not apply to all of PCIe.
But as the property already exists on shipping systems, we need to be
able to read it in the NVMe driver and the patch is consitent with the
way properties under PCI ports are read.

David
Christoph Hellwig April 28, 2020, 2:22 p.m. UTC | #3
On Tue, Apr 28, 2020 at 07:09:59AM -0700, David E. Box wrote:
> > I'm not sure who came up with the idea to put this into ACPI, but it
> > belongs into NVMe.  Please talk to the NVMe technical working group
> > instead of trying to overrules them in an unrelated group that
> > doesn't
> > apply to all of PCIe.
> 
> Agreed that this is not ideal since it does not apply to all of PCIe.
> But as the property already exists on shipping systems, we need to be
> able to read it in the NVMe driver and the patch is consitent with the
> way properties under PCI ports are read.

The point is that it is not the BIOSes job do decide how Linux does
power management.  For example D3 has really horrible entry and exit
latencies in many cases, and will lead to higher power usage.
David E. Box April 28, 2020, 3:27 p.m. UTC | #4
On Tue, 2020-04-28 at 16:22 +0200, Christoph Hellwig wrote:
> On Tue, Apr 28, 2020 at 07:09:59AM -0700, David E. Box wrote:
> > > I'm not sure who came up with the idea to put this into ACPI, but
> > > it
> > > belongs into NVMe.  Please talk to the NVMe technical working
> > > group
> > > instead of trying to overrules them in an unrelated group that
> > > doesn't
> > > apply to all of PCIe.
> > 
> > Agreed that this is not ideal since it does not apply to all of
> > PCIe.
> > But as the property already exists on shipping systems, we need to
> > be
> > able to read it in the NVMe driver and the patch is consitent with
> > the
> > way properties under PCI ports are read.
> 
> The point is that it is not the BIOSes job do decide how Linux does
> power management.  For example D3 has really horrible entry and exit
> latencies in many cases, and will lead to higher power usage.

The platform can know which pm policies will save the most power. But
since the solution doesn't apply to all PCIe devices (despite BIOS
specifying it that way) I'll withdraw this patch. Thanks.

David
Dan Williams April 29, 2020, 5:20 a.m. UTC | #5
On Tue, 2020-04-28 at 08:27 -0700, David E. Box wrote:
> On Tue, 2020-04-28 at 16:22 +0200, Christoph Hellwig wrote:
> > On Tue, Apr 28, 2020 at 07:09:59AM -0700, David E. Box wrote:
> > > > I'm not sure who came up with the idea to put this into ACPI,
> > > > but
> > > > it
> > > > belongs into NVMe.  Please talk to the NVMe technical working
> > > > group
> > > > instead of trying to overrules them in an unrelated group that
> > > > doesn't
> > > > apply to all of PCIe.
> > > 
> > > Agreed that this is not ideal since it does not apply to all of
> > > PCIe.
> > > But as the property already exists on shipping systems, we need
> > > to
> > > be
> > > able to read it in the NVMe driver and the patch is consitent
> > > with
> > > the
> > > way properties under PCI ports are read.
> > 
> > The point is that it is not the BIOSes job do decide how Linux does
> > power management.  For example D3 has really horrible entry and
> > exit
> > latencies in many cases, and will lead to higher power usage.
> 
> The platform can know which pm policies will save the most power. But
> since the solution doesn't apply to all PCIe devices (despite BIOS
> specifying it that way) I'll withdraw this patch. Thanks.

Wait, why withdraw? In this case the platform is unfortunately
preventing the standard driver from making a proper determination. So
while I agree that it's not the BIOSes job, when the platform actively
prevents proper operation due to some ill conceived non-standard
platform property what is Linux left to do on these systems?

The *patch* is not trying to overrule NVME, and the best I can say is
that the Intel Linux team was not in the loop when this was being
decided between the platform BIOS implemenation and  whomever  thought
they could just publish random ACPI properties that impacted NVME
operation [1].

So now David is trying to get these platform unbroken because they are
already shipping with this b0rkage.

[1]: 
https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro
Keith Busch April 29, 2020, 3:10 p.m. UTC | #6
On Wed, Apr 29, 2020 at 05:20:09AM +0000, Williams, Dan J wrote:
> On Tue, 2020-04-28 at 08:27 -0700, David E. Box wrote:
> > On Tue, 2020-04-28 at 16:22 +0200, Christoph Hellwig wrote:
> > > On Tue, Apr 28, 2020 at 07:09:59AM -0700, David E. Box wrote:
> > > > > I'm not sure who came up with the idea to put this into ACPI,
> > > > > but
> > > > > it
> > > > > belongs into NVMe.  Please talk to the NVMe technical working
> > > > > group
> > > > > instead of trying to overrules them in an unrelated group that
> > > > > doesn't
> > > > > apply to all of PCIe.
> > > > 
> > > > Agreed that this is not ideal since it does not apply to all of
> > > > PCIe.
> > > > But as the property already exists on shipping systems, we need
> > > > to
> > > > be
> > > > able to read it in the NVMe driver and the patch is consitent
> > > > with
> > > > the
> > > > way properties under PCI ports are read.
> > > 
> > > The point is that it is not the BIOSes job do decide how Linux does
> > > power management.  For example D3 has really horrible entry and
> > > exit
> > > latencies in many cases, and will lead to higher power usage.
> > 
> > The platform can know which pm policies will save the most power. But
> > since the solution doesn't apply to all PCIe devices (despite BIOS
> > specifying it that way) I'll withdraw this patch. Thanks.
> 
> Wait, why withdraw? In this case the platform is unfortunately
> preventing the standard driver from making a proper determination. So
> while I agree that it's not the BIOSes job, when the platform actively
> prevents proper operation due to some ill conceived non-standard
> platform property what is Linux left to do on these systems?
> 
> The *patch* is not trying to overrule NVME, and the best I can say is
> that the Intel Linux team was not in the loop when this was being
> decided between the platform BIOS implemenation and  whomever  thought
> they could just publish random ACPI properties that impacted NVME
> operation [1].
> 
> So now David is trying to get these platform unbroken because they are
> already shipping with this b0rkage.

Rather than quirking all these cases, which I get the feeling there
are many more than we've currently got in our quirk list, perhaps it'd
be simpler to default to the simple suspend. AFAIK, the simple suspend
works for all platforms, though it may not realize the best power savings
and/or exit latency.
David E. Box April 29, 2020, 4:11 p.m. UTC | #7
On Wed, 2020-04-29 at 05:20 +0000, Williams, Dan J wrote:
> On Tue, 2020-04-28 at 08:27 -0700, David E. Box wrote:
> > On Tue, 2020-04-28 at 16:22 +0200, Christoph Hellwig wrote:
> > > On Tue, Apr 28, 2020 at 07:09:59AM -0700, David E. Box wrote:
> > > > > I'm not sure who came up with the idea to put this into ACPI,
> > > > > but
> > > > > it
> > > > > belongs into NVMe.  Please talk to the NVMe technical working
> > > > > group
> > > > > instead of trying to overrules them in an unrelated group
> > > > > that
> > > > > doesn't
> > > > > apply to all of PCIe.
> > > > 
> > > > Agreed that this is not ideal since it does not apply to all of
> > > > PCIe.
> > > > But as the property already exists on shipping systems, we need
> > > > to
> > > > be
> > > > able to read it in the NVMe driver and the patch is consitent
> > > > with
> > > > the
> > > > way properties under PCI ports are read.
> > > 
> > > The point is that it is not the BIOSes job do decide how Linux
> > > does
> > > power management.  For example D3 has really horrible entry and
> > > exit
> > > latencies in many cases, and will lead to higher power usage.
> > 
> > The platform can know which pm policies will save the most power.
> > But
> > since the solution doesn't apply to all PCIe devices (despite BIOS
> > specifying it that way) I'll withdraw this patch. Thanks.
> 
> Wait, why withdraw? In this case the platform is unfortunately
> preventing the standard driver from making a proper determination. So
> while I agree that it's not the BIOSes job, when the platform
> actively
> prevents proper operation due to some ill conceived non-standard
> platform property what is Linux left to do on these systems?
> 
> The *patch* is not trying to overrule NVME, and the best I can say is
> that the Intel Linux team was not in the loop when this was being
> decided between the platform BIOS implemenation
> and  whomever  thought
> they could just publish random ACPI properties that impacted NVME
> operation [1].
> 
> So now David is trying to get these platform unbroken because they
> are
> already shipping with this b0rkage.

Not drop completely. This patch copied the code used to read _DSD
properties under PCI root ports. But I agree that such properties
should apply to all devices on those ports and unfortuntely that's not
the case here. BIOS got it wrong. My thought in dropping this patch is
to rewrite it to read the property directly from the nvme driver. Not
the way it's typically done either but it would avoid a global change
in the pci core while allowing us to deal with the firmware we have.

David
Christoph Hellwig May 1, 2020, 1:10 p.m. UTC | #8
On Wed, Apr 29, 2020 at 05:20:09AM +0000, Williams, Dan J wrote:
> > The platform can know which pm policies will save the most power. But
> > since the solution doesn't apply to all PCIe devices (despite BIOS
> > specifying it that way) I'll withdraw this patch. Thanks.
> 
> Wait, why withdraw? In this case the platform is unfortunately
> preventing the standard driver from making a proper determination. So
> while I agree that it's not the BIOSes job, when the platform actively
> prevents proper operation due to some ill conceived non-standard
> platform property what is Linux left to do on these systems?
> 
> The *patch* is not trying to overrule NVME, and the best I can say is
> that the Intel Linux team was not in the loop when this was being
> decided between the platform BIOS implemenation and  whomever  thought
> they could just publish random ACPI properties that impacted NVME
> operation [1].
> 
> So now David is trying to get these platform unbroken because they are
> already shipping with this b0rkage.

So can we please clearly mark this as a quirk and warn in the kernel
log about a buggy BIOS?
Christoph Hellwig May 1, 2020, 1:12 p.m. UTC | #9
On Wed, Apr 29, 2020 at 09:11:13AM -0700, David E. Box wrote:
> Not drop completely. This patch copied the code used to read _DSD
> properties under PCI root ports. But I agree that such properties
> should apply to all devices on those ports and unfortuntely that's not
> the case here. BIOS got it wrong. My thought in dropping this patch is
> to rewrite it to read the property directly from the nvme driver. Not
> the way it's typically done either but it would avoid a global change
> in the pci core while allowing us to deal with the firmware we have.

I'd be happy to heave less of this crap in nvme actually.  But I'm really
pissed this shit got out in the wild.  It wasn't clear from the mail
that this is something already out there because the idiots coming up
with it just went ahead with it.  Please just update the commit logs
and implementation to clearly mark it as a workaround for buggys
systems, which just happen to at least be nice enough to tell us that
they are buggy as f^$k.
David E. Box May 1, 2020, 3:54 p.m. UTC | #10
On Fri, 2020-05-01 at 15:12 +0200, hch@lst.de wrote:
> On Wed, Apr 29, 2020 at 09:11:13AM -0700, David E. Box wrote:
> > Not drop completely. This patch copied the code used to read _DSD
> > properties under PCI root ports. But I agree that such properties
> > should apply to all devices on those ports and unfortuntely that's
> > not
> > the case here. BIOS got it wrong. My thought in dropping this patch
> > is
> > to rewrite it to read the property directly from the nvme driver.
> > Not
> > the way it's typically done either but it would avoid a global
> > change
> > in the pci core while allowing us to deal with the firmware we
> > have.
> 
> I'd be happy to heave less of this crap in nvme actually.  But I'm
> really
> pissed this shit got out in the wild.  It wasn't clear from the mail
> that this is something already out there because the idiots coming up
> with it just went ahead with it.  Please just update the commit logs
> and implementation to clearly mark it as a workaround for buggys
> systems, which just happen to at least be nice enough to tell us that
> they are buggy as f^$k.

Sure.
David Woodhouse May 18, 2020, 1:51 p.m. UTC | #11
On Wed, 2020-04-29 at 05:20 +0000, Williams, Dan J wrote:
> The *patch* is not trying to overrule NVME, and the best I can say is
> that the Intel Linux team was not in the loop when this was being
> decided between the platform BIOS implemenation and  whomever  thought
> they could just publish random ACPI properties that impacted NVME
> operation [1].
> 
> So now David is trying to get these platform unbroken because they are
> already shipping with this b0rkage.

This is what we have WARN_TAINT() for though, right? It can suitably
warn users when such breakage is detected in the platform.
Dan Williams May 18, 2020, 5:20 p.m. UTC | #12
On Mon, May 18, 2020 at 6:52 AM David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Wed, 2020-04-29 at 05:20 +0000, Williams, Dan J wrote:
> > The *patch* is not trying to overrule NVME, and the best I can say is
> > that the Intel Linux team was not in the loop when this was being
> > decided between the platform BIOS implemenation and  whomever  thought
> > they could just publish random ACPI properties that impacted NVME
> > operation [1].
> >
> > So now David is trying to get these platform unbroken because they are
> > already shipping with this b0rkage.
>
> This is what we have WARN_TAINT() for though, right? It can suitably
> warn users when such breakage is detected in the platform.
>

I see WARN_TAINT() as "BIOS implemented its specification wrong". This
case is BIOS "implemented a mechanism in the wrong specification".