diff mbox series

[v11,9/9] PCI: ACPI: Use device constraints to decide PCI target state fallback policy

Message ID 20230809185453.40916-10-mario.limonciello@amd.com (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series Fix wakeup problems on some AMD platforms | expand

Commit Message

Mario Limonciello Aug. 9, 2023, 6:54 p.m. UTC
Since commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
PCIe ports from modern machines (>=2015) are allowed to be put into D3 by
storing a value to the `bridge_d3` variable in the `struct pci_dev`
structure.

pci_power_manageable() uses this variable to indicate a PCIe port can
enter D3.
pci_pm_suspend_noirq() uses the return from pci_power_manageable() to
decide whether to try to put a device into its target state for a sleep
cycle via pci_prepare_to_sleep().

For devices that support D3, the target state is selected by this policy:
1. If platform_pci_power_manageable():
   Use platform_pci_choose_state()
2. If the device is armed for wakeup:
   Select the deepest D-state that supports a PME.
3. Else:
   Use D3hot.

Devices are considered power manageable by the platform when they have
one or more objects described in the table in section 7.3 of the ACPI 6.5
specification.

When devices are not considered power manageable; specs are ambiguous as
to what should happen.  In this situation Windows 11 leaves PCIe
ports in D0 while Linux puts them into D3 due to the above mentioned
commit.

In Windows systems that support Modern Standby specify hardware
pre-conditions for the SoC to achieve the lowest power state by device
constraints in a SOC specific "Power Engine Plugin" (PEP) [2] [3].
They can be marked as disabled or enabled and when enabled can specify
the minimum power state required for an ACPI device.

When it is ambiguous what should happen, adjust the logic for
pci_target_state() to check whether a device constraint is present
and enabled.
* If power manageable by ACPI use this to get to select target state
* If a device constraint is present but disabled then choose D0
* If a device constraint is present and enabled then use it
* If a device constraint is not present, then continue to existing
logic (if marked for wakeup use deepest state that PME works)
* If not marked for wakeup choose D3hot

Link: https://uefi.org/specs/ACPI/6.5/07_Power_and_Performance_Mgmt.html#device-power-management-objects [1]
Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/platform-design-for-modern-standby#low-power-core-silicon-cpu-soc-dram [2]
Link: https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf [3]
Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
Reported-by: Iain Lane <iain@orangesquash.org.uk>
Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
v10->v11:
 * Fix kernel kernel build robot errors and various sparse warnings
   related to new handling of pci_power_t.
 * Use the new helpers introduced in previous patches
---
 drivers/pci/pci-acpi.c | 12 ++++++++++++
 drivers/pci/pci.c      | 17 ++++++++++++++++-
 drivers/pci/pci.h      |  5 +++++
 3 files changed, 33 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas Aug. 15, 2023, 11:48 p.m. UTC | #1
On Wed, Aug 09, 2023 at 01:54:53PM -0500, Mario Limonciello wrote:
> Since commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> PCIe ports from modern machines (>=2015) are allowed to be put into D3 by
> storing a value to the `bridge_d3` variable in the `struct pci_dev`
> structure.
> 
> pci_power_manageable() uses this variable to indicate a PCIe port can
> enter D3.
> pci_pm_suspend_noirq() uses the return from pci_power_manageable() to
> decide whether to try to put a device into its target state for a sleep
> cycle via pci_prepare_to_sleep().
> 
> For devices that support D3, the target state is selected by this policy:
> 1. If platform_pci_power_manageable():
>    Use platform_pci_choose_state()
> 2. If the device is armed for wakeup:
>    Select the deepest D-state that supports a PME.
> 3. Else:
>    Use D3hot.
> 
> Devices are considered power manageable by the platform when they have
> one or more objects described in the table in section 7.3 of the ACPI 6.5
> specification.
> 
> When devices are not considered power manageable; specs are ambiguous as
> to what should happen.  In this situation Windows 11 leaves PCIe
> ports in D0 while Linux puts them into D3 due to the above mentioned
> commit.

Why would we not use the same policy as Windows 11?

> In Windows systems that support Modern Standby specify hardware
> pre-conditions for the SoC to achieve the lowest power state by device
> constraints in a SOC specific "Power Engine Plugin" (PEP) [2] [3].
> They can be marked as disabled or enabled and when enabled can specify
> the minimum power state required for an ACPI device.
> 
> When it is ambiguous what should happen, adjust the logic for
> pci_target_state() to check whether a device constraint is present
> and enabled.
> * If power manageable by ACPI use this to get to select target state
> * If a device constraint is present but disabled then choose D0
> * If a device constraint is present and enabled then use it
> * If a device constraint is not present, then continue to existing
> logic (if marked for wakeup use deepest state that PME works)
> * If not marked for wakeup choose D3hot

Apparently this is based on constraints from the _DSM method of a
PNP0D80 device (and Intel x86 and AMD have different _DSM methods)?

Isn't this something that needs to be part of the ACPI spec?  I don't
see PNP0D80 mentioned there.  Obviously we also have ARM64 and other
arches now using ACPI, so I'm not really comfortable with these bits
scrounged from Microsoft and Intel white papers.  That seems hard to
maintain in the future, and this is hard enough now.

> Link: https://uefi.org/specs/ACPI/6.5/07_Power_and_Performance_Mgmt.html#device-power-management-objects [1]
> Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/platform-design-for-modern-standby#low-power-core-silicon-cpu-soc-dram [2]
> Link: https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf [3]
> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> Reported-by: Iain Lane <iain@orangesquash.org.uk>
> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
> v10->v11:
>  * Fix kernel kernel build robot errors and various sparse warnings
>    related to new handling of pci_power_t.
>  * Use the new helpers introduced in previous patches
> ---
>  drivers/pci/pci-acpi.c | 12 ++++++++++++
>  drivers/pci/pci.c      | 17 ++++++++++++++++-
>  drivers/pci/pci.h      |  5 +++++
>  3 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index b5b65cdfa3b8b..9f418ec87b6a5 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -1081,6 +1081,18 @@ bool acpi_pci_bridge_d3(struct pci_dev *dev)
>  	return false;
>  }
>  
> +/**
> + * acpi_pci_device_constraint - return any PCI power state constraints
> + * @dev: PCI device to check
> + *
> + * Returns: Any valid constraint specified by platform for a device.
> + * Otherwise PCI_POWER_ERROR.
> + */
> +pci_power_t acpi_pci_device_constraint(struct pci_dev *dev)
> +{
> +	return map_acpi_to_pci(acpi_get_lps0_constraint(&dev->dev));
> +}
> +
>  static void acpi_pci_config_space_access(struct pci_dev *dev, bool enable)
>  {
>  	int val = enable ? ACPI_REG_CONNECT : ACPI_REG_DISCONNECT;
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 693f4ca90452b..567443726974b 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1082,6 +1082,14 @@ static inline bool platform_pci_bridge_d3(struct pci_dev *dev)
>  	return acpi_pci_bridge_d3(dev);
>  }
>  
> +static inline pci_power_t platform_get_constraint(struct pci_dev *dev)
> +{
> +	if (pci_use_mid_pm())
> +		return PCI_POWER_ERROR;
> +
> +	return acpi_pci_device_constraint(dev);
> +}
> +
>  /**
>   * pci_update_current_state - Read power state of given device and cache it
>   * @dev: PCI device to handle.
> @@ -2685,11 +2693,13 @@ static inline pci_power_t pci_get_wake_pme_state(struct pci_dev *dev)
>   */
>  static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
>  {
> +	pci_power_t state;
> +
>  	if (platform_pci_power_manageable(dev)) {
>  		/*
>  		 * Call the platform to find the target state for the device.
>  		 */
> -		pci_power_t state = platform_pci_choose_state(dev);
> +		state = platform_pci_choose_state(dev);
>  
>  		switch (state) {
>  		case PCI_POWER_ERROR:
> @@ -2715,6 +2725,11 @@ static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
>  	else if (!dev->pm_cap)
>  		return PCI_D0;
>  
> +	/* if platform indicates preferred state device constraint, use it */
> +	state = platform_get_constraint(dev);
> +	if (state != PCI_POWER_ERROR)
> +		return state;
> +
>  	if (wakeup && dev->pme_support)
>  		return pci_get_wake_pme_state(dev);
>  
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index a4c3974340576..410fca4b88837 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -707,6 +707,7 @@ void pci_set_acpi_fwnode(struct pci_dev *dev);
>  int pci_dev_acpi_reset(struct pci_dev *dev, bool probe);
>  bool acpi_pci_power_manageable(struct pci_dev *dev);
>  bool acpi_pci_bridge_d3(struct pci_dev *dev);
> +pci_power_t acpi_pci_device_constraint(struct pci_dev *dev);
>  int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state);
>  pci_power_t acpi_pci_get_power_state(struct pci_dev *dev);
>  void acpi_pci_refresh_power_state(struct pci_dev *dev);
> @@ -731,6 +732,10 @@ static inline bool acpi_pci_bridge_d3(struct pci_dev *dev)
>  {
>  	return false;
>  }
> +static inline pci_power_t acpi_pci_device_constraint(struct pci_dev *dev)
> +{
> +	return PCI_POWER_ERROR;
> +}
>  static inline int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
>  {
>  	return -ENODEV;
> -- 
> 2.34.1
>
Mario Limonciello Aug. 16, 2023, 12:57 p.m. UTC | #2
On 8/15/2023 6:48 PM, Bjorn Helgaas wrote:
> On Wed, Aug 09, 2023 at 01:54:53PM -0500, Mario Limonciello wrote:
>> Since commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>> PCIe ports from modern machines (>=2015) are allowed to be put into D3 by
>> storing a value to the `bridge_d3` variable in the `struct pci_dev`
>> structure.
>>
>> pci_power_manageable() uses this variable to indicate a PCIe port can
>> enter D3.
>> pci_pm_suspend_noirq() uses the return from pci_power_manageable() to
>> decide whether to try to put a device into its target state for a sleep
>> cycle via pci_prepare_to_sleep().
>>
>> For devices that support D3, the target state is selected by this policy:
>> 1. If platform_pci_power_manageable():
>>     Use platform_pci_choose_state()
>> 2. If the device is armed for wakeup:
>>     Select the deepest D-state that supports a PME.
>> 3. Else:
>>     Use D3hot.
>>
>> Devices are considered power manageable by the platform when they have
>> one or more objects described in the table in section 7.3 of the ACPI 6.5
>> specification.
>>
>> When devices are not considered power manageable; specs are ambiguous as
>> to what should happen.  In this situation Windows 11 leaves PCIe
>> ports in D0 while Linux puts them into D3 due to the above mentioned
>> commit.
> 
> Why would we not use the same policy as Windows 11?

That's what I'm aiming to do with my patch series.

> 
>> In Windows systems that support Modern Standby specify hardware
>> pre-conditions for the SoC to achieve the lowest power state by device
>> constraints in a SOC specific "Power Engine Plugin" (PEP) [2] [3].
>> They can be marked as disabled or enabled and when enabled can specify
>> the minimum power state required for an ACPI device.
>>
>> When it is ambiguous what should happen, adjust the logic for
>> pci_target_state() to check whether a device constraint is present
>> and enabled.
>> * If power manageable by ACPI use this to get to select target state
>> * If a device constraint is present but disabled then choose D0
>> * If a device constraint is present and enabled then use it
>> * If a device constraint is not present, then continue to existing
>> logic (if marked for wakeup use deepest state that PME works)
>> * If not marked for wakeup choose D3hot
> 
> Apparently this is based on constraints from the _DSM method of a
> PNP0D80 device (and Intel x86 and AMD have different _DSM methods)?
> 
> Isn't this something that needs to be part of the ACPI spec?  
 > I don't
 > see PNP0D80 mentioned there.

So PNP0D80 is a "Windows-compatible system power management controller"
See 
https://learn.microsoft.com/en-us/windows-hardware/drivers/bringup/device-management-namespace-objects#compatible-id-_cid

> Obviously we also have ARM64 and other
> arches now using ACPI

Let me quote a few select sections of [4] that I want to draw attention to:

"Devices that are integrated into the SoC can be power-managed through 
the Windows Power Framework (PoFx). These framework-integrated devices 
are power-managed by PoFx through a SoC-specific power engine plug-in 
(microPEP) that knows the specifics of the SoC's power and clock controls.
...
For peripheral devices that aren't integrated into the SoC, Windows uses 
ACPI Device Power Management.
...
It is possible to, and some device stacks do, use ACPI Device Power 
Management alone, or in combination with the microPEP for on-SoC device 
power management.
"

[4] 
https://learn.microsoft.com/en-us/windows-hardware/drivers/bringup/device-power-management#device-power-management-in-windows

In Linux we call the device that supports PNP0D80 the "LPS0 device", but 
in Windows they call it uPEP.  In Windows the SOC vendor provides this 
driver and it supports the _DSM calls for the vendor.

As part of going into Modern Standby on Windows that uPEP driver 
orchestrates the state of the devices mentioned by constraints.

I'd like to think that's exactly what this patch is doing; it's giving 
the ability to select the target state for SOC specified constraints to 
the LPS0 driver.

Looking at the intertwined function calls again, I wonder if maybe it's 
better to move the constraint checking into pci_prepare_to_sleep().

That makes it explicitly clear that LPS0 constraints shouldn't be used 
for anything other than suspend rather than also having implications 
into runtime PM.

As for your comment for ARM64, I think if they use Windows uPEP 
constraints we can add an PNP0D80 compatible LPS0 driver for them too 
just the same.

 > so I'm not really comfortable with these bits
 > scrounged from Microsoft and Intel white papers.  That seems hard to
 > maintain in the future, and this is hard enough now.

Remember this all comes back to a PCI root port in the SOC being put 
into an unexpected D-state over suspend.  The device is not ACPI power 
manageable so *that behavior* is up to the OSPM and is not grounded in 
any ACPI or PCI spec.

TBH I think the Microsoft documentation is the best we're going to get 
here for this case.  To be compatible with UEFI firmware designed for 
Windows machines you need to adopt some Windows-isms.

If this was coreboot, I would tell the coreboot developers to mark this
root port as ACPI power manageable and adjust the _SxD and _SxW objects 
so that this root port couldn't get into D3.

>> Link: https://uefi.org/specs/ACPI/6.5/07_Power_and_Performance_Mgmt.html#device-power-management-objects [1]
>> Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/platform-design-for-modern-standby#low-power-core-silicon-cpu-soc-dram [2]
>> Link: https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf [3]
>> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>> Reported-by: Iain Lane <iain@orangesquash.org.uk>
>> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>> ---
>> v10->v11:
>>   * Fix kernel kernel build robot errors and various sparse warnings
>>     related to new handling of pci_power_t.
>>   * Use the new helpers introduced in previous patches
>> ---
>>   drivers/pci/pci-acpi.c | 12 ++++++++++++
>>   drivers/pci/pci.c      | 17 ++++++++++++++++-
>>   drivers/pci/pci.h      |  5 +++++
>>   3 files changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
>> index b5b65cdfa3b8b..9f418ec87b6a5 100644
>> --- a/drivers/pci/pci-acpi.c
>> +++ b/drivers/pci/pci-acpi.c
>> @@ -1081,6 +1081,18 @@ bool acpi_pci_bridge_d3(struct pci_dev *dev)
>>   	return false;
>>   }
>>   
>> +/**
>> + * acpi_pci_device_constraint - return any PCI power state constraints
>> + * @dev: PCI device to check
>> + *
>> + * Returns: Any valid constraint specified by platform for a device.
>> + * Otherwise PCI_POWER_ERROR.
>> + */
>> +pci_power_t acpi_pci_device_constraint(struct pci_dev *dev)
>> +{
>> +	return map_acpi_to_pci(acpi_get_lps0_constraint(&dev->dev));
>> +}
>> +
>>   static void acpi_pci_config_space_access(struct pci_dev *dev, bool enable)
>>   {
>>   	int val = enable ? ACPI_REG_CONNECT : ACPI_REG_DISCONNECT;
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 693f4ca90452b..567443726974b 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -1082,6 +1082,14 @@ static inline bool platform_pci_bridge_d3(struct pci_dev *dev)
>>   	return acpi_pci_bridge_d3(dev);
>>   }
>>   
>> +static inline pci_power_t platform_get_constraint(struct pci_dev *dev)
>> +{
>> +	if (pci_use_mid_pm())
>> +		return PCI_POWER_ERROR;
>> +
>> +	return acpi_pci_device_constraint(dev);
>> +}
>> +
>>   /**
>>    * pci_update_current_state - Read power state of given device and cache it
>>    * @dev: PCI device to handle.
>> @@ -2685,11 +2693,13 @@ static inline pci_power_t pci_get_wake_pme_state(struct pci_dev *dev)
>>    */
>>   static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
>>   {
>> +	pci_power_t state;
>> +
>>   	if (platform_pci_power_manageable(dev)) {
>>   		/*
>>   		 * Call the platform to find the target state for the device.
>>   		 */
>> -		pci_power_t state = platform_pci_choose_state(dev);
>> +		state = platform_pci_choose_state(dev);
>>   
>>   		switch (state) {
>>   		case PCI_POWER_ERROR:
>> @@ -2715,6 +2725,11 @@ static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
>>   	else if (!dev->pm_cap)
>>   		return PCI_D0;
>>   
>> +	/* if platform indicates preferred state device constraint, use it */
>> +	state = platform_get_constraint(dev);
>> +	if (state != PCI_POWER_ERROR)
>> +		return state;
>> +
>>   	if (wakeup && dev->pme_support)
>>   		return pci_get_wake_pme_state(dev);
>>   
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index a4c3974340576..410fca4b88837 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -707,6 +707,7 @@ void pci_set_acpi_fwnode(struct pci_dev *dev);
>>   int pci_dev_acpi_reset(struct pci_dev *dev, bool probe);
>>   bool acpi_pci_power_manageable(struct pci_dev *dev);
>>   bool acpi_pci_bridge_d3(struct pci_dev *dev);
>> +pci_power_t acpi_pci_device_constraint(struct pci_dev *dev);
>>   int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state);
>>   pci_power_t acpi_pci_get_power_state(struct pci_dev *dev);
>>   void acpi_pci_refresh_power_state(struct pci_dev *dev);
>> @@ -731,6 +732,10 @@ static inline bool acpi_pci_bridge_d3(struct pci_dev *dev)
>>   {
>>   	return false;
>>   }
>> +static inline pci_power_t acpi_pci_device_constraint(struct pci_dev *dev)
>> +{
>> +	return PCI_POWER_ERROR;
>> +}
>>   static inline int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
>>   {
>>   	return -ENODEV;
>> -- 
>> 2.34.1
>>
Bjorn Helgaas Aug. 16, 2023, 10:38 p.m. UTC | #3
[I see that you just posted a v12 that doesn't touch drivers/pci at
all.  I haven't looked at it yet, so maybe my questions/comments below
are no longer relevant.]

On Wed, Aug 16, 2023 at 07:57:52AM -0500, Limonciello, Mario wrote:
> On 8/15/2023 6:48 PM, Bjorn Helgaas wrote:
> > On Wed, Aug 09, 2023 at 01:54:53PM -0500, Mario Limonciello wrote:
> > > Since commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> > > PCIe ports from modern machines (>=2015) are allowed to be put into D3 by
> > > storing a value to the `bridge_d3` variable in the `struct pci_dev`
> > > structure.
> > > 
> > > pci_power_manageable() uses this variable to indicate a PCIe port can
> > > enter D3.
> > > pci_pm_suspend_noirq() uses the return from pci_power_manageable() to
> > > decide whether to try to put a device into its target state for a sleep
> > > cycle via pci_prepare_to_sleep().
> > > 
> > > For devices that support D3, the target state is selected by this policy:
> > > 1. If platform_pci_power_manageable():
> > >     Use platform_pci_choose_state()
> > > 2. If the device is armed for wakeup:
> > >     Select the deepest D-state that supports a PME.
> > > 3. Else:
> > >     Use D3hot.
> > > 
> > > Devices are considered power manageable by the platform when they have
> > > one or more objects described in the table in section 7.3 of the ACPI 6.5
> > > specification.
> > > 
> > > When devices are not considered power manageable; specs are ambiguous as
> > > to what should happen.  In this situation Windows 11 leaves PCIe
> > > ports in D0 while Linux puts them into D3 due to the above mentioned
> > > commit.
> > 
> > Why would we not use the same policy as Windows 11?
> 
> That's what I'm aiming to do with my patch series.

OK, help me out because I think I have a hint of how this works, but
I'm still really confused.  Here's the sort of commit log I envision
(but I know it's all wrong, so help me out):

  Iain reported that the Lenovo Thinkpad Z13 suspends but activity on
  plugged-in USB devices will not cause a resume.  The resume failed
  because the Root Port leading to the USB adapter was in D3hot, and
  ... maybe firmware can't identify the wakeup source?  I dunno?

  The Z13 supplies an ACPI PNP0D80 System Power Management Controller
  with a "Get Device Constraints" _DSM method that says the Root Port
  should not be put into a lower power state than D0 ??

[This is one thing that seems completely wrong.  The _DSM spec says
the device must be in the specified or DEEPER D-state to achieve the
lowest power platform idle state.  That seems backwards.]

  For power-manageable devices (those with _PS0 or _PR0, per ACPI
  r6.5, sec 7.3), platform_pci_choose_state() selects the low-power
  state based on the device's _SxW, _SxD, and _PRW methods.

  For non-power manageable devices (those without _PS0 or _PR0), ACPI
  doesn't help decide the low-power state.

[This confuses me too.  Even here pci_set_power_state() uses
pci_platform_power_transition(), i.e., ACPI, but I guess this only
uses _OFF/_ON and doesn't count as "power managed"?

Maybe that makes sense since _OFF/_ON are part of sec 7.2 ("Declaring
a Power Resource Object", which is separate from *7.3* ("Device Power
Management Objects") where _PR0, _PS0, _PRW, _SxD, _SxW, etc are.]

  However, extensions like the ACPI PNP0D80 System Power Management
  Controller device can provide constraints on the allowable low-power
  states.

  Add platform_get_constraint() so pci_target_state() can discover
  constraints from these extensions and avoid choosing a deeper
  low-power state than the platform allows.

[Again, this *seems* backwards to how that _DSM is documented.]

> > > In Windows systems that support Modern Standby specify hardware
> > > pre-conditions for the SoC to achieve the lowest power state by device
> > > constraints in a SOC specific "Power Engine Plugin" (PEP) [2] [3].
> > > They can be marked as disabled or enabled and when enabled can specify
> > > the minimum power state required for an ACPI device.
> > > 
> > > When it is ambiguous what should happen, adjust the logic for
> > > pci_target_state() to check whether a device constraint is present
> > > and enabled.
> > > * If power manageable by ACPI use this to get to select target state
> > > * If a device constraint is present but disabled then choose D0
> > > * If a device constraint is present and enabled then use it
> > > * If a device constraint is not present, then continue to existing
> > > logic (if marked for wakeup use deepest state that PME works)
> > > * If not marked for wakeup choose D3hot
> > 
> > Apparently this is based on constraints from the _DSM method of a
> > PNP0D80 device (and Intel x86 and AMD have different _DSM methods)?
> > 
> > Isn't this something that needs to be part of the ACPI spec?  I
> > don't see PNP0D80 mentioned there.
> 
> So PNP0D80 is a "Windows-compatible system power management controller"
> See https://learn.microsoft.com/en-us/windows-hardware/drivers/bringup/device-management-namespace-objects#compatible-id-_cid
> 
> > Obviously we also have ARM64 and other arches now using ACPI
> 
> Let me quote a few select sections of [4] that I want to draw attention to:
> 
> "Devices that are integrated into the SoC can be power-managed through the
> Windows Power Framework (PoFx). These framework-integrated devices are
> power-managed by PoFx through a SoC-specific power engine plug-in (microPEP)
> that knows the specifics of the SoC's power and clock controls.
> ...
> For peripheral devices that aren't integrated into the SoC, Windows uses
> ACPI Device Power Management.
> ...
> It is possible to, and some device stacks do, use ACPI Device Power
> Management alone, or in combination with the microPEP for on-SoC device
> power management.
> "
> 
> [4] https://learn.microsoft.com/en-us/windows-hardware/drivers/bringup/device-power-management#device-power-management-in-windows
> 
> In Linux we call the device that supports PNP0D80 the "LPS0 device", but in
> Windows they call it uPEP.  In Windows the SOC vendor provides this driver
> and it supports the _DSM calls for the vendor.
> 
> As part of going into Modern Standby on Windows that uPEP driver
> orchestrates the state of the devices mentioned by constraints.
> 
> I'd like to think that's exactly what this patch is doing; it's giving the
> ability to select the target state for SOC specified constraints to the LPS0
> driver.
> 
> Looking at the intertwined function calls again, I wonder if maybe it's
> better to move the constraint checking into pci_prepare_to_sleep().
> 
> That makes it explicitly clear that LPS0 constraints shouldn't be used for
> anything other than suspend rather than also having implications into
> runtime PM.
> 
> As for your comment for ARM64, I think if they use Windows uPEP constraints
> we can add an PNP0D80 compatible LPS0 driver for them too just the same.
> 
> > so I'm not really comfortable with these bits scrounged from
> > Microsoft and Intel white papers.  That seems hard to maintain in
> > the future, and this is hard enough now.
> 
> Remember this all comes back to a PCI root port in the SOC being put
> into an unexpected D-state over suspend.  The device is not ACPI
> power manageable so *that behavior* is up to the OSPM and is not
> grounded in any ACPI or PCI spec.
> 
> TBH I think the Microsoft documentation is the best we're going to
> get here for this case.  To be compatible with UEFI firmware
> designed for Windows machines you need to adopt some Windows-isms.
> 
> If this was coreboot, I would tell the coreboot developers to mark
> this root port as ACPI power manageable and adjust the _SxD and _SxW
> objects so that this root port couldn't get into D3.
> 
> > > Link: https://uefi.org/specs/ACPI/6.5/07_Power_and_Performance_Mgmt.html#device-power-management-objects [1]
> > > Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/platform-design-for-modern-standby#low-power-core-silicon-cpu-soc-dram [2]
> > > Link: https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf [3]

This covers the Intel PNP0D80 _DSM.  But
lpi_device_get_constraints_amd() implies that there's a similar but
different AMD version?  Is there a published spec for the AMD _DSM?

> > > Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> > > Reported-by: Iain Lane <iain@orangesquash.org.uk>
> > > Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
> > > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> > > ---
> > > v10->v11:
> > >   * Fix kernel kernel build robot errors and various sparse warnings
> > >     related to new handling of pci_power_t.
> > >   * Use the new helpers introduced in previous patches
> > > ---
> > >   drivers/pci/pci-acpi.c | 12 ++++++++++++
> > >   drivers/pci/pci.c      | 17 ++++++++++++++++-
> > >   drivers/pci/pci.h      |  5 +++++
> > >   3 files changed, 33 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > > index b5b65cdfa3b8b..9f418ec87b6a5 100644
> > > --- a/drivers/pci/pci-acpi.c
> > > +++ b/drivers/pci/pci-acpi.c
> > > @@ -1081,6 +1081,18 @@ bool acpi_pci_bridge_d3(struct pci_dev *dev)
> > >   	return false;
> > >   }
> > > +/**
> > > + * acpi_pci_device_constraint - return any PCI power state constraints
> > > + * @dev: PCI device to check
> > > + *
> > > + * Returns: Any valid constraint specified by platform for a device.
> > > + * Otherwise PCI_POWER_ERROR.
> > > + */
> > > +pci_power_t acpi_pci_device_constraint(struct pci_dev *dev)
> > > +{
> > > +	return map_acpi_to_pci(acpi_get_lps0_constraint(&dev->dev));

The non-x86 acpi_get_lps0_constraint() stub returns -ENODEV, which
doesn't seem quite right because map_acpi_to_pci() assumes it gets
ACPI_STATE_D0, etc.

I think it happens to work out fine because if it gets something
that's not ACPI_STATE_*, map_acpi_to_pci() returns PCI_POWER_ERROR,
but that's unholy knowledge of the ACPI_STATE_* values.

> > > +}
> > > +
> > >   static void acpi_pci_config_space_access(struct pci_dev *dev, bool enable)
> > >   {
> > >   	int val = enable ? ACPI_REG_CONNECT : ACPI_REG_DISCONNECT;
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 693f4ca90452b..567443726974b 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -1082,6 +1082,14 @@ static inline bool platform_pci_bridge_d3(struct pci_dev *dev)
> > >   	return acpi_pci_bridge_d3(dev);
> > >   }
> > > +static inline pci_power_t platform_get_constraint(struct pci_dev *dev)
> > > +{
> > > +	if (pci_use_mid_pm())
> > > +		return PCI_POWER_ERROR;
> > > +
> > > +	return acpi_pci_device_constraint(dev);
> > > +}
> > > +
> > >   /**
> > >    * pci_update_current_state - Read power state of given device and cache it
> > >    * @dev: PCI device to handle.
> > > @@ -2685,11 +2693,13 @@ static inline pci_power_t pci_get_wake_pme_state(struct pci_dev *dev)
> > >    */
> > >   static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
> > >   {
> > > +	pci_power_t state;
> > > +
> > >   	if (platform_pci_power_manageable(dev)) {
> > >   		/*
> > >   		 * Call the platform to find the target state for the device.
> > >   		 */
> > > -		pci_power_t state = platform_pci_choose_state(dev);
> > > +		state = platform_pci_choose_state(dev);
> > >   		switch (state) {
> > >   		case PCI_POWER_ERROR:
> > > @@ -2715,6 +2725,11 @@ static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
> > >   	else if (!dev->pm_cap)
> > >   		return PCI_D0;
> > > +	/* if platform indicates preferred state device constraint, use it */
> > > +	state = platform_get_constraint(dev);
> > > +	if (state != PCI_POWER_ERROR)
> > > +		return state;
> > > +
> > >   	if (wakeup && dev->pme_support)
> > >   		return pci_get_wake_pme_state(dev);
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index a4c3974340576..410fca4b88837 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -707,6 +707,7 @@ void pci_set_acpi_fwnode(struct pci_dev *dev);
> > >   int pci_dev_acpi_reset(struct pci_dev *dev, bool probe);
> > >   bool acpi_pci_power_manageable(struct pci_dev *dev);
> > >   bool acpi_pci_bridge_d3(struct pci_dev *dev);
> > > +pci_power_t acpi_pci_device_constraint(struct pci_dev *dev);
> > >   int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state);
> > >   pci_power_t acpi_pci_get_power_state(struct pci_dev *dev);
> > >   void acpi_pci_refresh_power_state(struct pci_dev *dev);
> > > @@ -731,6 +732,10 @@ static inline bool acpi_pci_bridge_d3(struct pci_dev *dev)
> > >   {
> > >   	return false;
> > >   }
> > > +static inline pci_power_t acpi_pci_device_constraint(struct pci_dev *dev)
> > > +{
> > > +	return PCI_POWER_ERROR;
> > > +}
> > >   static inline int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
> > >   {
> > >   	return -ENODEV;
> > > -- 
> > > 2.34.1
> > >
Mario Limonciello Aug. 17, 2023, 1:26 a.m. UTC | #4
On 8/16/2023 5:38 PM, Bjorn Helgaas wrote:
> [I see that you just posted a v12 that doesn't touch drivers/pci at
> all.  I haven't looked at it yet, so maybe my questions/comments below
> are no longer relevant.]

I'm not married to either approach but I think that you'll like the v12 
approach better.

Let me try to answer your questions anyway though because I think 
they're still applicable for understanding of this issue.

> 
> On Wed, Aug 16, 2023 at 07:57:52AM -0500, Limonciello, Mario wrote:
>> On 8/15/2023 6:48 PM, Bjorn Helgaas wrote:
>>> On Wed, Aug 09, 2023 at 01:54:53PM -0500, Mario Limonciello wrote:
>>>> Since commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>> PCIe ports from modern machines (>=2015) are allowed to be put into D3 by
>>>> storing a value to the `bridge_d3` variable in the `struct pci_dev`
>>>> structure.
>>>>
>>>> pci_power_manageable() uses this variable to indicate a PCIe port can
>>>> enter D3.
>>>> pci_pm_suspend_noirq() uses the return from pci_power_manageable() to
>>>> decide whether to try to put a device into its target state for a sleep
>>>> cycle via pci_prepare_to_sleep().
>>>>
>>>> For devices that support D3, the target state is selected by this policy:
>>>> 1. If platform_pci_power_manageable():
>>>>      Use platform_pci_choose_state()
>>>> 2. If the device is armed for wakeup:
>>>>      Select the deepest D-state that supports a PME.
>>>> 3. Else:
>>>>      Use D3hot.
>>>>
>>>> Devices are considered power manageable by the platform when they have
>>>> one or more objects described in the table in section 7.3 of the ACPI 6.5
>>>> specification.
>>>>
>>>> When devices are not considered power manageable; specs are ambiguous as
>>>> to what should happen.  In this situation Windows 11 leaves PCIe
>>>> ports in D0 while Linux puts them into D3 due to the above mentioned
>>>> commit.
>>>
>>> Why would we not use the same policy as Windows 11?
>>
>> That's what I'm aiming to do with my patch series.
> 
> OK, help me out because I think I have a hint of how this works, but
> I'm still really confused.  Here's the sort of commit log I envision
> (but I know it's all wrong, so help me out):

I was intentionally trying to leave the actual problem out of the commit 
from your earlier feedback and just put it in the cover letter.

But if it's better to keep in the commit message I'll return those details.

> 
>    Iain reported that the Lenovo Thinkpad Z13 suspends but activity on
>    plugged-in USB devices will not cause a resume.  The resume failed
>    because the Root Port leading to the USB adapter was in D3hot, and
>    ... maybe firmware can't identify the wakeup source?  I dunno?
> 

I would just say due to the platform behavior.

>    The Z13 supplies an ACPI PNP0D80 System Power Management Controller
>    with a "Get Device Constraints" _DSM method that says the Root Port
>    should not be put into a lower power state than D0 ??
> 
> [This is one thing that seems completely wrong.  The _DSM spec says
> the device must be in the specified or DEEPER D-state to achieve the
> lowest power platform idle state.  That seems backwards.]
> 

Yeah I see the confusion here and this is why it all ties back the 
policy that Linux uses for deciding if something can use D3hot or not.

The constraint is "not enabled" which means the OS shouldn't be putting 
the device it into a low power state.

Windows without ACPI and without the uPEP constraints saying what to do 
just leaves the device alone at suspend.  I believe that's why we get D0 
in Windows.

But Linux decides that anything >=2015 it should support D3hot, and 
anything that isn't ACPI power manageable should use D3hot.

We can't just rip that out because there are a number of Intel platforms 
that don't ship with Windows and rely upon this specific behavior to get 
the SoC into a low power state.


So what I do instead is look at the constraint not being enabled and use 
that to decide to return it to D0 at suspend.


>    For power-manageable devices (those with _PS0 or _PR0, per ACPI
>    r6.5, sec 7.3), platform_pci_choose_state() selects the low-power
>    state based on the device's _SxW, _SxD, and _PRW methods.
> 
>    For non-power manageable devices (those without _PS0 or _PR0), ACPI
>    doesn't help decide the low-power state.
> 
> [This confuses me too.  Even here pci_set_power_state() uses
> pci_platform_power_transition(), i.e., ACPI, but I guess this only
> uses _OFF/_ON and doesn't count as "power managed"?
> 
> Maybe that makes sense since _OFF/_ON are part of sec 7.2 ("Declaring
> a Power Resource Object", which is separate from *7.3* ("Device Power
> Management Objects") where _PR0, _PS0, _PRW, _SxD, _SxW, etc are.]
> 
>    However, extensions like the ACPI PNP0D80 System Power Management
>    Controller device can provide constraints on the allowable low-power
>    states.
> 
>    Add platform_get_constraint() so pci_target_state() can discover
>    constraints from these extensions and avoid choosing a deeper
>    low-power state than the platform allows.
> 
> [Again, this *seems* backwards to how that _DSM is documented.]
> 
>>>> In Windows systems that support Modern Standby specify hardware
>>>> pre-conditions for the SoC to achieve the lowest power state by device
>>>> constraints in a SOC specific "Power Engine Plugin" (PEP) [2] [3].
>>>> They can be marked as disabled or enabled and when enabled can specify
>>>> the minimum power state required for an ACPI device.
>>>>
>>>> When it is ambiguous what should happen, adjust the logic for
>>>> pci_target_state() to check whether a device constraint is present
>>>> and enabled.
>>>> * If power manageable by ACPI use this to get to select target state
>>>> * If a device constraint is present but disabled then choose D0
>>>> * If a device constraint is present and enabled then use it
>>>> * If a device constraint is not present, then continue to existing
>>>> logic (if marked for wakeup use deepest state that PME works)
>>>> * If not marked for wakeup choose D3hot
>>>
>>> Apparently this is based on constraints from the _DSM method of a
>>> PNP0D80 device (and Intel x86 and AMD have different _DSM methods)?
>>>
>>> Isn't this something that needs to be part of the ACPI spec?  I
>>> don't see PNP0D80 mentioned there.
>>
>> So PNP0D80 is a "Windows-compatible system power management controller"
>> See https://learn.microsoft.com/en-us/windows-hardware/drivers/bringup/device-management-namespace-objects#compatible-id-_cid
>>
>>> Obviously we also have ARM64 and other arches now using ACPI
>>
>> Let me quote a few select sections of [4] that I want to draw attention to:
>>
>> "Devices that are integrated into the SoC can be power-managed through the
>> Windows Power Framework (PoFx). These framework-integrated devices are
>> power-managed by PoFx through a SoC-specific power engine plug-in (microPEP)
>> that knows the specifics of the SoC's power and clock controls.
>> ...
>> For peripheral devices that aren't integrated into the SoC, Windows uses
>> ACPI Device Power Management.
>> ...
>> It is possible to, and some device stacks do, use ACPI Device Power
>> Management alone, or in combination with the microPEP for on-SoC device
>> power management.
>> "
>>
>> [4] https://learn.microsoft.com/en-us/windows-hardware/drivers/bringup/device-power-management#device-power-management-in-windows
>>
>> In Linux we call the device that supports PNP0D80 the "LPS0 device", but in
>> Windows they call it uPEP.  In Windows the SOC vendor provides this driver
>> and it supports the _DSM calls for the vendor.
>>
>> As part of going into Modern Standby on Windows that uPEP driver
>> orchestrates the state of the devices mentioned by constraints.
>>
>> I'd like to think that's exactly what this patch is doing; it's giving the
>> ability to select the target state for SOC specified constraints to the LPS0
>> driver.
>>
>> Looking at the intertwined function calls again, I wonder if maybe it's
>> better to move the constraint checking into pci_prepare_to_sleep().
>>
>> That makes it explicitly clear that LPS0 constraints shouldn't be used for
>> anything other than suspend rather than also having implications into
>> runtime PM.
>>
>> As for your comment for ARM64, I think if they use Windows uPEP constraints
>> we can add an PNP0D80 compatible LPS0 driver for them too just the same.
>>
>>> so I'm not really comfortable with these bits scrounged from
>>> Microsoft and Intel white papers.  That seems hard to maintain in
>>> the future, and this is hard enough now.
>>
>> Remember this all comes back to a PCI root port in the SOC being put
>> into an unexpected D-state over suspend.  The device is not ACPI
>> power manageable so *that behavior* is up to the OSPM and is not
>> grounded in any ACPI or PCI spec.
>>
>> TBH I think the Microsoft documentation is the best we're going to
>> get here for this case.  To be compatible with UEFI firmware
>> designed for Windows machines you need to adopt some Windows-isms.
>>
>> If this was coreboot, I would tell the coreboot developers to mark
>> this root port as ACPI power manageable and adjust the _SxD and _SxW
>> objects so that this root port couldn't get into D3.
>>
>>>> Link: https://uefi.org/specs/ACPI/6.5/07_Power_and_Performance_Mgmt.html#device-power-management-objects [1]
>>>> Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/platform-design-for-modern-standby#low-power-core-silicon-cpu-soc-dram [2]
>>>> Link: https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf [3]
> 
> This covers the Intel PNP0D80 _DSM.  But
> lpi_device_get_constraints_amd() implies that there's a similar but
> different AMD version?  Is there a published spec for the AMD _DSM?

AMD's _DSM doesn't have a public facing document today.

I'll try to get that solved, but for now referencing the Intel and 
Microsoft ones conveys the necessary message about how this stuff works.

> 
>>>> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>> Reported-by: Iain Lane <iain@orangesquash.org.uk>
>>>> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121
>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>>> ---
>>>> v10->v11:
>>>>    * Fix kernel kernel build robot errors and various sparse warnings
>>>>      related to new handling of pci_power_t.
>>>>    * Use the new helpers introduced in previous patches
>>>> ---
>>>>    drivers/pci/pci-acpi.c | 12 ++++++++++++
>>>>    drivers/pci/pci.c      | 17 ++++++++++++++++-
>>>>    drivers/pci/pci.h      |  5 +++++
>>>>    3 files changed, 33 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
>>>> index b5b65cdfa3b8b..9f418ec87b6a5 100644
>>>> --- a/drivers/pci/pci-acpi.c
>>>> +++ b/drivers/pci/pci-acpi.c
>>>> @@ -1081,6 +1081,18 @@ bool acpi_pci_bridge_d3(struct pci_dev *dev)
>>>>    	return false;
>>>>    }
>>>> +/**
>>>> + * acpi_pci_device_constraint - return any PCI power state constraints
>>>> + * @dev: PCI device to check
>>>> + *
>>>> + * Returns: Any valid constraint specified by platform for a device.
>>>> + * Otherwise PCI_POWER_ERROR.
>>>> + */
>>>> +pci_power_t acpi_pci_device_constraint(struct pci_dev *dev)
>>>> +{
>>>> +	return map_acpi_to_pci(acpi_get_lps0_constraint(&dev->dev));
> 
> The non-x86 acpi_get_lps0_constraint() stub returns -ENODEV, which
> doesn't seem quite right because map_acpi_to_pci() assumes it gets
> ACPI_STATE_D0, etc.
> 
> I think it happens to work out fine because if it gets something
> that's not ACPI_STATE_*, map_acpi_to_pci() returns PCI_POWER_ERROR,
> but that's unholy knowledge of the ACPI_STATE_* values.

If the consensus is to back to patching pci.c/pci-acpi.c in a future 
revision I'll change this a bit.

Yes I was relying upon PCI_POWER_ERROR for non-x86/non-s2idle cases.

> 
>>>> +}
>>>> +
>>>>    static void acpi_pci_config_space_access(struct pci_dev *dev, bool enable)
>>>>    {
>>>>    	int val = enable ? ACPI_REG_CONNECT : ACPI_REG_DISCONNECT;
>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>>> index 693f4ca90452b..567443726974b 100644
>>>> --- a/drivers/pci/pci.c
>>>> +++ b/drivers/pci/pci.c
>>>> @@ -1082,6 +1082,14 @@ static inline bool platform_pci_bridge_d3(struct pci_dev *dev)
>>>>    	return acpi_pci_bridge_d3(dev);
>>>>    }
>>>> +static inline pci_power_t platform_get_constraint(struct pci_dev *dev)
>>>> +{
>>>> +	if (pci_use_mid_pm())
>>>> +		return PCI_POWER_ERROR;
>>>> +
>>>> +	return acpi_pci_device_constraint(dev);
>>>> +}
>>>> +
>>>>    /**
>>>>     * pci_update_current_state - Read power state of given device and cache it
>>>>     * @dev: PCI device to handle.
>>>> @@ -2685,11 +2693,13 @@ static inline pci_power_t pci_get_wake_pme_state(struct pci_dev *dev)
>>>>     */
>>>>    static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
>>>>    {
>>>> +	pci_power_t state;
>>>> +
>>>>    	if (platform_pci_power_manageable(dev)) {
>>>>    		/*
>>>>    		 * Call the platform to find the target state for the device.
>>>>    		 */
>>>> -		pci_power_t state = platform_pci_choose_state(dev);
>>>> +		state = platform_pci_choose_state(dev);
>>>>    		switch (state) {
>>>>    		case PCI_POWER_ERROR:
>>>> @@ -2715,6 +2725,11 @@ static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
>>>>    	else if (!dev->pm_cap)
>>>>    		return PCI_D0;
>>>> +	/* if platform indicates preferred state device constraint, use it */
>>>> +	state = platform_get_constraint(dev);
>>>> +	if (state != PCI_POWER_ERROR)
>>>> +		return state;
>>>> +
>>>>    	if (wakeup && dev->pme_support)
>>>>    		return pci_get_wake_pme_state(dev);
>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>> index a4c3974340576..410fca4b88837 100644
>>>> --- a/drivers/pci/pci.h
>>>> +++ b/drivers/pci/pci.h
>>>> @@ -707,6 +707,7 @@ void pci_set_acpi_fwnode(struct pci_dev *dev);
>>>>    int pci_dev_acpi_reset(struct pci_dev *dev, bool probe);
>>>>    bool acpi_pci_power_manageable(struct pci_dev *dev);
>>>>    bool acpi_pci_bridge_d3(struct pci_dev *dev);
>>>> +pci_power_t acpi_pci_device_constraint(struct pci_dev *dev);
>>>>    int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state);
>>>>    pci_power_t acpi_pci_get_power_state(struct pci_dev *dev);
>>>>    void acpi_pci_refresh_power_state(struct pci_dev *dev);
>>>> @@ -731,6 +732,10 @@ static inline bool acpi_pci_bridge_d3(struct pci_dev *dev)
>>>>    {
>>>>    	return false;
>>>>    }
>>>> +static inline pci_power_t acpi_pci_device_constraint(struct pci_dev *dev)
>>>> +{
>>>> +	return PCI_POWER_ERROR;
>>>> +}
>>>>    static inline int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
>>>>    {
>>>>    	return -ENODEV;
>>>> -- 
>>>> 2.34.1
>>>>
Rafael J. Wysocki Aug. 17, 2023, 12:13 p.m. UTC | #5
On Thu, Aug 17, 2023 at 3:26 AM Limonciello, Mario
<mario.limonciello@amd.com> wrote:
>
>
>
> On 8/16/2023 5:38 PM, Bjorn Helgaas wrote:
> > [I see that you just posted a v12 that doesn't touch drivers/pci at
> > all.  I haven't looked at it yet, so maybe my questions/comments below
> > are no longer relevant.]
>
> I'm not married to either approach but I think that you'll like the v12
> approach better.
>
> Let me try to answer your questions anyway though because I think
> they're still applicable for understanding of this issue.
>
> >
> > On Wed, Aug 16, 2023 at 07:57:52AM -0500, Limonciello, Mario wrote:
> >> On 8/15/2023 6:48 PM, Bjorn Helgaas wrote:
> >>> On Wed, Aug 09, 2023 at 01:54:53PM -0500, Mario Limonciello wrote:
> >>>> Since commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
> >>>> PCIe ports from modern machines (>=2015) are allowed to be put into D3 by
> >>>> storing a value to the `bridge_d3` variable in the `struct pci_dev`
> >>>> structure.
> >>>>
> >>>> pci_power_manageable() uses this variable to indicate a PCIe port can
> >>>> enter D3.
> >>>> pci_pm_suspend_noirq() uses the return from pci_power_manageable() to
> >>>> decide whether to try to put a device into its target state for a sleep
> >>>> cycle via pci_prepare_to_sleep().
> >>>>
> >>>> For devices that support D3, the target state is selected by this policy:
> >>>> 1. If platform_pci_power_manageable():
> >>>>      Use platform_pci_choose_state()
> >>>> 2. If the device is armed for wakeup:
> >>>>      Select the deepest D-state that supports a PME.
> >>>> 3. Else:
> >>>>      Use D3hot.
> >>>>
> >>>> Devices are considered power manageable by the platform when they have
> >>>> one or more objects described in the table in section 7.3 of the ACPI 6.5
> >>>> specification.
> >>>>
> >>>> When devices are not considered power manageable; specs are ambiguous as
> >>>> to what should happen.  In this situation Windows 11 leaves PCIe
> >>>> ports in D0 while Linux puts them into D3 due to the above mentioned
> >>>> commit.
> >>>
> >>> Why would we not use the same policy as Windows 11?
> >>
> >> That's what I'm aiming to do with my patch series.
> >
> > OK, help me out because I think I have a hint of how this works, but
> > I'm still really confused.  Here's the sort of commit log I envision
> > (but I know it's all wrong, so help me out):
>
> I was intentionally trying to leave the actual problem out of the commit
> from your earlier feedback and just put it in the cover letter.
>
> But if it's better to keep in the commit message I'll return those details.

It is.

If you make a change in order to address a specific problem, that
problem needs to be described in the changelog of the patch making
that change.

Anything else is more or less confusing IMO.
Mario Limonciello Aug. 17, 2023, 6:31 p.m. UTC | #6
>> I was intentionally trying to leave the actual problem out of the commit
>> from your earlier feedback and just put it in the cover letter.
>>
>> But if it's better to keep in the commit message I'll return those details.
> 
> It is.
> 
> If you make a change in order to address a specific problem, that
> problem needs to be described in the changelog of the patch making
> that change.
> 
> Anything else is more or less confusing IMO.

Thanks. I'll write up a new commit message for v13 once I have some 
feedback on the approach of 'v11' vs the 'approach of 'v12'.
diff mbox series

Patch

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index b5b65cdfa3b8b..9f418ec87b6a5 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -1081,6 +1081,18 @@  bool acpi_pci_bridge_d3(struct pci_dev *dev)
 	return false;
 }
 
+/**
+ * acpi_pci_device_constraint - return any PCI power state constraints
+ * @dev: PCI device to check
+ *
+ * Returns: Any valid constraint specified by platform for a device.
+ * Otherwise PCI_POWER_ERROR.
+ */
+pci_power_t acpi_pci_device_constraint(struct pci_dev *dev)
+{
+	return map_acpi_to_pci(acpi_get_lps0_constraint(&dev->dev));
+}
+
 static void acpi_pci_config_space_access(struct pci_dev *dev, bool enable)
 {
 	int val = enable ? ACPI_REG_CONNECT : ACPI_REG_DISCONNECT;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 693f4ca90452b..567443726974b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1082,6 +1082,14 @@  static inline bool platform_pci_bridge_d3(struct pci_dev *dev)
 	return acpi_pci_bridge_d3(dev);
 }
 
+static inline pci_power_t platform_get_constraint(struct pci_dev *dev)
+{
+	if (pci_use_mid_pm())
+		return PCI_POWER_ERROR;
+
+	return acpi_pci_device_constraint(dev);
+}
+
 /**
  * pci_update_current_state - Read power state of given device and cache it
  * @dev: PCI device to handle.
@@ -2685,11 +2693,13 @@  static inline pci_power_t pci_get_wake_pme_state(struct pci_dev *dev)
  */
 static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
 {
+	pci_power_t state;
+
 	if (platform_pci_power_manageable(dev)) {
 		/*
 		 * Call the platform to find the target state for the device.
 		 */
-		pci_power_t state = platform_pci_choose_state(dev);
+		state = platform_pci_choose_state(dev);
 
 		switch (state) {
 		case PCI_POWER_ERROR:
@@ -2715,6 +2725,11 @@  static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
 	else if (!dev->pm_cap)
 		return PCI_D0;
 
+	/* if platform indicates preferred state device constraint, use it */
+	state = platform_get_constraint(dev);
+	if (state != PCI_POWER_ERROR)
+		return state;
+
 	if (wakeup && dev->pme_support)
 		return pci_get_wake_pme_state(dev);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index a4c3974340576..410fca4b88837 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -707,6 +707,7 @@  void pci_set_acpi_fwnode(struct pci_dev *dev);
 int pci_dev_acpi_reset(struct pci_dev *dev, bool probe);
 bool acpi_pci_power_manageable(struct pci_dev *dev);
 bool acpi_pci_bridge_d3(struct pci_dev *dev);
+pci_power_t acpi_pci_device_constraint(struct pci_dev *dev);
 int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state);
 pci_power_t acpi_pci_get_power_state(struct pci_dev *dev);
 void acpi_pci_refresh_power_state(struct pci_dev *dev);
@@ -731,6 +732,10 @@  static inline bool acpi_pci_bridge_d3(struct pci_dev *dev)
 {
 	return false;
 }
+static inline pci_power_t acpi_pci_device_constraint(struct pci_dev *dev)
+{
+	return PCI_POWER_ERROR;
+}
 static inline int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
 {
 	return -ENODEV;