diff mbox series

PCI: Fix the ACPI power state during runtime resume

Message ID 20220124122107.12148-1-abhsahu@nvidia.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show
Series PCI: Fix the ACPI power state during runtime resume | expand

Commit Message

Abhishek Sahu Jan. 24, 2022, 12:21 p.m. UTC
Consider the following sequence during PCI device runtime
suspend/resume:

1. PCI device goes into runtime suspended state. The PCI state
   will be changed to PCI_D0 and then pci_platform_power_transition()
   will be called which changes the ACPI state to ACPI_STATE_D3_HOT.

2. Parent bridge goes into runtime suspended state. If parent
   bridge supports D3cold, then it will change the power state of all its
   children to D3cold state and the power will be removed.

3. During wake-up time, the bridge will be runtime resumed first
   and pci_power_up() will be called for the bridge. Now, the power
   supply will be resumed.

4. pci_resume_bus() will be called which will internally invoke
   pci_restore_standard_config(). pci_update_current_state()
   will read PCI_PM_CTRL register and the current_state will be
   updated to D0.

In the above process, at step 4, the ACPI device state will still be
ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being
invoked. We need call the pci_platform_power_transition() with state
D0 to change the ACPI state to ACPI_STATE_D0.

This patch calls pci_power_up() if current power state is D0 inside
pci_restore_standard_config(). This pci_power_up() will change the
ACPI state to ACPI_STATE_D0.

Following are the steps to confirm:

Enable the debug prints in acpi_pci_set_power_state()

0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device

Before:

0000:01:00.0: power state changed by ACPI to D3hot
0000:00:01.0: power state changed by ACPI to D3cold
0000:00:01.0: power state changed by ACPI to D0

After:

0000:01:00.0: power state changed by ACPI to D3hot
0000:00:01.0: power state changed by ACPI to D3cold
0000:00:01.0: power state changed by ACPI to D0
0000:01:00.0: power state changed by ACPI to D0

So with this patch, the PCI device ACPI state is also being
changed to D0.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
---
 drivers/pci/pci-driver.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Comments

Bjorn Helgaas Feb. 4, 2022, 11:32 p.m. UTC | #1
[+cc Rafael, hoping for your review :)

Wonder if we should add something like this to MAINTAINERS so you get
cc'd on power-related things:

diff --git a/MAINTAINERS b/MAINTAINERS
index ea3e6c914384..3d9a211cad5d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15422,6 +15422,7 @@ F:	include/linux/pm.h
 F:	include/linux/pm_*
 F:	include/linux/powercap.h
 F:	kernel/configs/nopm.config
+K:	pci_[a-z_]*power[a-z_]*\(
 
 DYNAMIC THERMAL POWER MANAGEMENT (DTPM)
 M:	Daniel Lezcano <daniel.lezcano@kernel.org>
]

On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote:
> Consider the following sequence during PCI device runtime
> suspend/resume:
> 
> 1. PCI device goes into runtime suspended state. The PCI state
>    will be changed to PCI_D0 and then pci_platform_power_transition()
>    will be called which changes the ACPI state to ACPI_STATE_D3_HOT.
> 
> 2. Parent bridge goes into runtime suspended state. If parent
>    bridge supports D3cold, then it will change the power state of all its
>    children to D3cold state and the power will be removed.
> 
> 3. During wake-up time, the bridge will be runtime resumed first
>    and pci_power_up() will be called for the bridge. Now, the power
>    supply will be resumed.
> 
> 4. pci_resume_bus() will be called which will internally invoke
>    pci_restore_standard_config(). pci_update_current_state()
>    will read PCI_PM_CTRL register and the current_state will be
>    updated to D0.
> 
> In the above process, at step 4, the ACPI device state will still be
> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being
> invoked. We need call the pci_platform_power_transition() with state
> D0 to change the ACPI state to ACPI_STATE_D0.
> 
> This patch calls pci_power_up() if current power state is D0 inside
> pci_restore_standard_config(). This pci_power_up() will change the
> ACPI state to ACPI_STATE_D0.
> 
> Following are the steps to confirm:
> 
> Enable the debug prints in acpi_pci_set_power_state()
> 
> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device
> 
> Before:
> 
> 0000:01:00.0: power state changed by ACPI to D3hot
> 0000:00:01.0: power state changed by ACPI to D3cold
> 0000:00:01.0: power state changed by ACPI to D0
> 
> After:
> 
> 0000:01:00.0: power state changed by ACPI to D3hot
> 0000:00:01.0: power state changed by ACPI to D3cold
> 0000:00:01.0: power state changed by ACPI to D0
> 0000:01:00.0: power state changed by ACPI to D0
> 
> So with this patch, the PCI device ACPI state is also being
> changed to D0.
> 
> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
> ---
>  drivers/pci/pci-driver.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 588588cfda48..64e0cca12f16 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev)
>   */
>  static int pci_restore_standard_config(struct pci_dev *pci_dev)
>  {
> +	int error = 0;
>  	pci_update_current_state(pci_dev, PCI_UNKNOWN);
>  
>  	if (pci_dev->current_state != PCI_D0) {
> -		int error = pci_set_power_state(pci_dev, PCI_D0);
> -		if (error)
> -			return error;
> +		error = pci_set_power_state(pci_dev, PCI_D0);
> +	} else {
> +		/*
> +		 * The platform power state can still be non-D0, so this is
> +		 * required to change the platform power state to D0.
> +		 */
> +		error = pci_power_up(pci_dev);
>  	}
>  
> +	if (error)
> +		return error;
> +
>  	pci_restore_state(pci_dev);
>  	pci_pme_restore(pci_dev);
>  	return 0;
> -- 
> 2.17.1
>
Rafael J. Wysocki Feb. 7, 2022, 6:58 p.m. UTC | #2
On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote:
> [+cc Rafael, hoping for your review :)

+Mika

> Wonder if we should add something like this to MAINTAINERS so you get
> cc'd on power-related things:
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ea3e6c914384..3d9a211cad5d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -15422,6 +15422,7 @@ F:	include/linux/pm.h
>  F:	include/linux/pm_*
>  F:	include/linux/powercap.h
>  F:	kernel/configs/nopm.config
> +K:	pci_[a-z_]*power[a-z_]*\(

It seems so, but generally PM patches should be CCed to linux-pm anyway.

>  
>  DYNAMIC THERMAL POWER MANAGEMENT (DTPM)
>  M:	Daniel Lezcano <daniel.lezcano@kernel.org>
> ]
> 
> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote:
> > Consider the following sequence during PCI device runtime
> > suspend/resume:
> > 
> > 1. PCI device goes into runtime suspended state. The PCI state
> >    will be changed to PCI_D0 and then pci_platform_power_transition()
> >    will be called which changes the ACPI state to ACPI_STATE_D3_HOT.

You mean PCI_D3hot I suppose?

> > 2. Parent bridge goes into runtime suspended state. If parent
> >    bridge supports D3cold, then it will change the power state of all its
> >    children to D3cold state and the power will be removed.
> > 
> > 3. During wake-up time, the bridge will be runtime resumed first
> >    and pci_power_up() will be called for the bridge. Now, the power
> >    supply will be resumed.
> > 
> > 4. pci_resume_bus() will be called which will internally invoke
> >    pci_restore_standard_config(). pci_update_current_state()
> >    will read PCI_PM_CTRL register and the current_state will be
> >    updated to D0.
> > 
> > In the above process, at step 4, the ACPI device state will still be
> > ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being
> > invoked.

I'm not quite following.

I'm assuming that this description applies to the endpoint device that was
previously put into D3_hot.

Since its current state is D3_hot, it is not D0 (in particular) and the
pci_set_power_state() in pci_restore_standard_config() should put int into
D0 proper, including the platform firmware part.

> > We need call the pci_platform_power_transition() with state
> > D0 to change the ACPI state to ACPI_STATE_D0.
> > 
> > This patch calls pci_power_up() if current power state is D0 inside
> > pci_restore_standard_config(). This pci_power_up() will change the
> > ACPI state to ACPI_STATE_D0.
> > 
> > Following are the steps to confirm:
> > 
> > Enable the debug prints in acpi_pci_set_power_state()
> > 
> > 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device
> > 
> > Before:
> > 
> > 0000:01:00.0: power state changed by ACPI to D3hot
> > 0000:00:01.0: power state changed by ACPI to D3cold
> > 0000:00:01.0: power state changed by ACPI to D0
> > 
> > After:
> > 
> > 0000:01:00.0: power state changed by ACPI to D3hot
> > 0000:00:01.0: power state changed by ACPI to D3cold
> > 0000:00:01.0: power state changed by ACPI to D0
> > 0000:01:00.0: power state changed by ACPI to D0
> > 
> > So with this patch, the PCI device ACPI state is also being
> > changed to D0.
> > 
> > Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
> > ---
> >  drivers/pci/pci-driver.c | 14 +++++++++++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 588588cfda48..64e0cca12f16 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev)
> >   */
> >  static int pci_restore_standard_config(struct pci_dev *pci_dev)
> >  {
> > +	int error = 0;
> >  	pci_update_current_state(pci_dev, PCI_UNKNOWN);
> >  
> >  	if (pci_dev->current_state != PCI_D0) {
> > -		int error = pci_set_power_state(pci_dev, PCI_D0);
> > -		if (error)
> > -			return error;
> > +		error = pci_set_power_state(pci_dev, PCI_D0);
> > +	} else {
> > +		/*
> > +		 * The platform power state can still be non-D0, so this is
> > +		 * required to change the platform power state to D0.
> > +		 */

This really isn't expected to happen.

If the device's power state has been changed to D3hot by ACPI, it is not in D0.

It looks like the state tracking is not working here.

> > +		error = pci_power_up(pci_dev);
> >  	}
> >  
> > +	if (error)
> > +		return error;
> > +
> >  	pci_restore_state(pci_dev);
> >  	pci_pme_restore(pci_dev);
> >  	return 0;
>
Bjorn Helgaas Feb. 7, 2022, 8:13 p.m. UTC | #3
On Mon, Feb 07, 2022 at 07:58:13PM +0100, Rafael J. Wysocki wrote:
> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote:
> > Wonder if we should add something like this to MAINTAINERS so you get
> > cc'd on power-related things:
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index ea3e6c914384..3d9a211cad5d 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -15422,6 +15422,7 @@ F:	include/linux/pm.h
> >  F:	include/linux/pm_*
> >  F:	include/linux/powercap.h
> >  F:	kernel/configs/nopm.config
> > +K:	pci_[a-z_]*power[a-z_]*\(
> 
> It seems so, but generally PM patches should be CCed to linux-pm anyway.

Definitely.  It's just that running get_maintainer.pl on this patch
currently shows:

  Bjorn Helgaas <bhelgaas@google.com> (supporter:PCI SUBSYSTEM)
  linux-pci@vger.kernel.org (open list:PCI SUBSYSTEM)
  linux-kernel@vger.kernel.org (open list)

so it's not as helpful as it could be.  The MAINTAINERS patch above
would change it to this:

  Bjorn Helgaas <bhelgaas@google.com> (supporter:PCI SUBSYSTEM)
  "Rafael J. Wysocki" <rafael@kernel.org> (supporter:POWER MANAGEMENT CORE)
  linux-pci@vger.kernel.org (open list:PCI SUBSYSTEM)
  linux-kernel@vger.kernel.org (open list)
  linux-pm@vger.kernel.org (open list:POWER MANAGEMENT CORE)

Bjorn
Abhishek Sahu Feb. 8, 2022, 10:30 a.m. UTC | #4
On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote:
> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote:
>> [+cc Rafael, hoping for your review :)
> 
> +Mika
> 
>> Wonder if we should add something like this to MAINTAINERS so you get
>> cc'd on power-related things:
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index ea3e6c914384..3d9a211cad5d 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -15422,6 +15422,7 @@ F:    include/linux/pm.h
>>  F:   include/linux/pm_*
>>  F:   include/linux/powercap.h
>>  F:   kernel/configs/nopm.config
>> +K:   pci_[a-z_]*power[a-z_]*\(
> 
> It seems so, but generally PM patches should be CCed to linux-pm anyway.
> 
>>
>>  DYNAMIC THERMAL POWER MANAGEMENT (DTPM)
>>  M:   Daniel Lezcano <daniel.lezcano@kernel.org>
>> ]
>>
>> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote:
>>> Consider the following sequence during PCI device runtime
>>> suspend/resume:
>>>
>>> 1. PCI device goes into runtime suspended state. The PCI state
>>>    will be changed to PCI_D0 and then pci_platform_power_transition()
>>>    will be called which changes the ACPI state to ACPI_STATE_D3_HOT.
> 
> You mean PCI_D3hot I suppose?
>

 Yes. It should be PCI_D3hot here. 

>>> 2. Parent bridge goes into runtime suspended state. If parent
>>>    bridge supports D3cold, then it will change the power state of all its
>>>    children to D3cold state and the power will be removed.
>>>
>>> 3. During wake-up time, the bridge will be runtime resumed first
>>>    and pci_power_up() will be called for the bridge. Now, the power
>>>    supply will be resumed.
>>>
>>> 4. pci_resume_bus() will be called which will internally invoke
>>>    pci_restore_standard_config(). pci_update_current_state()
>>>    will read PCI_PM_CTRL register and the current_state will be
>>>    updated to D0.
>>>
>>> In the above process, at step 4, the ACPI device state will still be
>>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being
>>> invoked.
> 
> I'm not quite following.
> 
> I'm assuming that this description applies to the endpoint device that was
> previously put into D3_hot.
> 

 Yes. This is applicable for endpoint devices which was previously put
 into D3hot.

> Since its current state is D3_hot, it is not D0 (in particular) and the
> pci_set_power_state() in pci_restore_standard_config() should put int into
> D0 proper, including the platform firmware part.
> 

 The pci_restore_standard_config() for endpoint devices are being called
 internally during wake-up of upstream bridge. 

 pci_power_up(struct pci_dev *dev)
 {
      ...
      if (dev->runtime_d3cold) {
        /*
         * When powering on a bridge from D3cold, the whole hierarchy
         * may be powered on into D0uninitialized state, resume them to
         * give them a chance to suspend again
         */
        pci_resume_bus(dev->subordinate);
    }
    ...
 }

 For the upstream bridge, the above code will trigger the wake-up of
 endpoint devices and then following code will be executed for the
 endpoint devices:

 pci_update_current_state(struct pci_dev *dev, pci_power_t state)
 {
    if (platform_pci_get_power_state(dev) == PCI_D3cold ||
        !pci_device_is_present(dev)) {
        dev->current_state = PCI_D3cold;
    } else if (dev->pm_cap) {
        u16 pmcsr;

        pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
        dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
    } else {
        dev->current_state = state;
    }
 }

 In the above code, the current_state will be set to D0 for the
 endpoint devices since it will go into second block where
 it will read the PM_CTRL register.
 
>>> We need call the pci_platform_power_transition() with state
>>> D0 to change the ACPI state to ACPI_STATE_D0.
>>>
>>> This patch calls pci_power_up() if current power state is D0 inside
>>> pci_restore_standard_config(). This pci_power_up() will change the
>>> ACPI state to ACPI_STATE_D0.
>>>
>>> Following are the steps to confirm:
>>>
>>> Enable the debug prints in acpi_pci_set_power_state()
>>>
>>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device
>>>
>>> Before:
>>>
>>> 0000:01:00.0: power state changed by ACPI to D3hot
>>> 0000:00:01.0: power state changed by ACPI to D3cold
>>> 0000:00:01.0: power state changed by ACPI to D0
>>>
>>> After:
>>>
>>> 0000:01:00.0: power state changed by ACPI to D3hot
>>> 0000:00:01.0: power state changed by ACPI to D3cold
>>> 0000:00:01.0: power state changed by ACPI to D0
>>> 0000:01:00.0: power state changed by ACPI to D0
>>>
>>> So with this patch, the PCI device ACPI state is also being
>>> changed to D0.
>>>
>>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
>>> ---
>>>  drivers/pci/pci-driver.c | 14 +++++++++++---
>>>  1 file changed, 11 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>>> index 588588cfda48..64e0cca12f16 100644
>>> --- a/drivers/pci/pci-driver.c
>>> +++ b/drivers/pci/pci-driver.c
>>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev)
>>>   */
>>>  static int pci_restore_standard_config(struct pci_dev *pci_dev)
>>>  {
>>> +   int error = 0;
>>>     pci_update_current_state(pci_dev, PCI_UNKNOWN);
>>>
>>>     if (pci_dev->current_state != PCI_D0) {
>>> -           int error = pci_set_power_state(pci_dev, PCI_D0);
>>> -           if (error)
>>> -                   return error;
>>> +           error = pci_set_power_state(pci_dev, PCI_D0);
>>> +   } else {
>>> +           /*
>>> +            * The platform power state can still be non-D0, so this is
>>> +            * required to change the platform power state to D0.
>>> +            */
> 
> This really isn't expected to happen.
> 
> If the device's power state has been changed to D3hot by ACPI, it is not in D0.
> 
> It looks like the state tracking is not working here.
> 

 The state setting to D0 is happening due to the current logic present in
 pci_update_current_state(). If we can fix the logic in
 pci_update_current_state() to detect this condition and return state D3hot,
 then it should also fix the issue. 

 Thanks,
 Abhishek

>>> +           error = pci_power_up(pci_dev);
>>>     }
>>>
>>> +   if (error)
>>> +           return error;
>>> +
>>>     pci_restore_state(pci_dev);
>>>     pci_pme_restore(pci_dev);
>>>     return 0;
>>
> 
> 
> 
>
Abhishek Sahu April 5, 2022, 4:36 p.m. UTC | #5
On 2/8/2022 4:00 PM, Abhishek Sahu wrote:
> On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote:
>> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote:
>>> [+cc Rafael, hoping for your review :)
>>
>> +Mika
>>
>>> Wonder if we should add something like this to MAINTAINERS so you get
>>> cc'd on power-related things:
>>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index ea3e6c914384..3d9a211cad5d 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -15422,6 +15422,7 @@ F:    include/linux/pm.h
>>>  F:   include/linux/pm_*
>>>  F:   include/linux/powercap.h
>>>  F:   kernel/configs/nopm.config
>>> +K:   pci_[a-z_]*power[a-z_]*\(
>>
>> It seems so, but generally PM patches should be CCed to linux-pm anyway.
>>
>>>
>>>  DYNAMIC THERMAL POWER MANAGEMENT (DTPM)
>>>  M:   Daniel Lezcano <daniel.lezcano@kernel.org>
>>> ]
>>>
>>> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote:
>>>> Consider the following sequence during PCI device runtime
>>>> suspend/resume:
>>>>
>>>> 1. PCI device goes into runtime suspended state. The PCI state
>>>>    will be changed to PCI_D0 and then pci_platform_power_transition()
>>>>    will be called which changes the ACPI state to ACPI_STATE_D3_HOT.
>>
>> You mean PCI_D3hot I suppose?
>>
> 
>  Yes. It should be PCI_D3hot here. 
> 
>>>> 2. Parent bridge goes into runtime suspended state. If parent
>>>>    bridge supports D3cold, then it will change the power state of all its
>>>>    children to D3cold state and the power will be removed.
>>>>
>>>> 3. During wake-up time, the bridge will be runtime resumed first
>>>>    and pci_power_up() will be called for the bridge. Now, the power
>>>>    supply will be resumed.
>>>>
>>>> 4. pci_resume_bus() will be called which will internally invoke
>>>>    pci_restore_standard_config(). pci_update_current_state()
>>>>    will read PCI_PM_CTRL register and the current_state will be
>>>>    updated to D0.
>>>>
>>>> In the above process, at step 4, the ACPI device state will still be
>>>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being
>>>> invoked.
>>
>> I'm not quite following.
>>
>> I'm assuming that this description applies to the endpoint device that was
>> previously put into D3_hot.
>>
> 
>  Yes. This is applicable for endpoint devices which was previously put
>  into D3hot.
> 
>> Since its current state is D3_hot, it is not D0 (in particular) and the
>> pci_set_power_state() in pci_restore_standard_config() should put int into
>> D0 proper, including the platform firmware part.
>>
> 
>  The pci_restore_standard_config() for endpoint devices are being called
>  internally during wake-up of upstream bridge. 
> 
>  pci_power_up(struct pci_dev *dev)
>  {
>       ...
>       if (dev->runtime_d3cold) {
>         /*
>          * When powering on a bridge from D3cold, the whole hierarchy
>          * may be powered on into D0uninitialized state, resume them to
>          * give them a chance to suspend again
>          */
>         pci_resume_bus(dev->subordinate);
>     }
>     ...
>  }
> 
>  For the upstream bridge, the above code will trigger the wake-up of
>  endpoint devices and then following code will be executed for the
>  endpoint devices:
> 
>  pci_update_current_state(struct pci_dev *dev, pci_power_t state)
>  {
>     if (platform_pci_get_power_state(dev) == PCI_D3cold ||
>         !pci_device_is_present(dev)) {
>         dev->current_state = PCI_D3cold;
>     } else if (dev->pm_cap) {
>         u16 pmcsr;
> 
>         pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>         dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
>     } else {
>         dev->current_state = state;
>     }
>  }
> 
>  In the above code, the current_state will be set to D0 for the
>  endpoint devices since it will go into second block where
>  it will read the PM_CTRL register.
>  
>>>> We need call the pci_platform_power_transition() with state
>>>> D0 to change the ACPI state to ACPI_STATE_D0.
>>>>
>>>> This patch calls pci_power_up() if current power state is D0 inside
>>>> pci_restore_standard_config(). This pci_power_up() will change the
>>>> ACPI state to ACPI_STATE_D0.
>>>>
>>>> Following are the steps to confirm:
>>>>
>>>> Enable the debug prints in acpi_pci_set_power_state()
>>>>
>>>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device
>>>>
>>>> Before:
>>>>
>>>> 0000:01:00.0: power state changed by ACPI to D3hot
>>>> 0000:00:01.0: power state changed by ACPI to D3cold
>>>> 0000:00:01.0: power state changed by ACPI to D0
>>>>
>>>> After:
>>>>
>>>> 0000:01:00.0: power state changed by ACPI to D3hot
>>>> 0000:00:01.0: power state changed by ACPI to D3cold
>>>> 0000:00:01.0: power state changed by ACPI to D0
>>>> 0000:01:00.0: power state changed by ACPI to D0
>>>>
>>>> So with this patch, the PCI device ACPI state is also being
>>>> changed to D0.
>>>>
>>>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
>>>> ---
>>>>  drivers/pci/pci-driver.c | 14 +++++++++++---
>>>>  1 file changed, 11 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>>>> index 588588cfda48..64e0cca12f16 100644
>>>> --- a/drivers/pci/pci-driver.c
>>>> +++ b/drivers/pci/pci-driver.c
>>>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev)
>>>>   */
>>>>  static int pci_restore_standard_config(struct pci_dev *pci_dev)
>>>>  {
>>>> +   int error = 0;
>>>>     pci_update_current_state(pci_dev, PCI_UNKNOWN);
>>>>
>>>>     if (pci_dev->current_state != PCI_D0) {
>>>> -           int error = pci_set_power_state(pci_dev, PCI_D0);
>>>> -           if (error)
>>>> -                   return error;
>>>> +           error = pci_set_power_state(pci_dev, PCI_D0);
>>>> +   } else {
>>>> +           /*
>>>> +            * The platform power state can still be non-D0, so this is
>>>> +            * required to change the platform power state to D0.
>>>> +            */
>>
>> This really isn't expected to happen.
>>
>> If the device's power state has been changed to D3hot by ACPI, it is not in D0.
>>
>> It looks like the state tracking is not working here.
>>
> 
>  The state setting to D0 is happening due to the current logic present in
>  pci_update_current_state(). If we can fix the logic in
>  pci_update_current_state() to detect this condition and return state D3hot,
>  then it should also fix the issue. 
> 
>  Thanks,
>  Abhishek
> 

 Hi Rafael/Mika,

 Could you please help regarding the correct way to fix this issue.
 I can update the patch accordingly.
 
 Thanks,
 Abhishek

>>>> +           error = pci_power_up(pci_dev);
>>>>     }
>>>>
>>>> +   if (error)
>>>> +           return error;
>>>> +
>>>>     pci_restore_state(pci_dev);
>>>>     pci_pme_restore(pci_dev);
>>>>     return 0;
>>>
>>
>>
>>
>>
>
Rafael J. Wysocki April 5, 2022, 4:50 p.m. UTC | #6
On Tuesday, April 5, 2022 6:36:34 PM CEST Abhishek Sahu wrote:
> On 2/8/2022 4:00 PM, Abhishek Sahu wrote:
> > On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote:
> >> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote:
> >>> [+cc Rafael, hoping for your review :)
> >>
> >> +Mika
> >>
> >>> Wonder if we should add something like this to MAINTAINERS so you get
> >>> cc'd on power-related things:
> >>>
> >>> diff --git a/MAINTAINERS b/MAINTAINERS
> >>> index ea3e6c914384..3d9a211cad5d 100644
> >>> --- a/MAINTAINERS
> >>> +++ b/MAINTAINERS
> >>> @@ -15422,6 +15422,7 @@ F:    include/linux/pm.h
> >>>  F:   include/linux/pm_*
> >>>  F:   include/linux/powercap.h
> >>>  F:   kernel/configs/nopm.config
> >>> +K:   pci_[a-z_]*power[a-z_]*\(
> >>
> >> It seems so, but generally PM patches should be CCed to linux-pm anyway.
> >>
> >>>
> >>>  DYNAMIC THERMAL POWER MANAGEMENT (DTPM)
> >>>  M:   Daniel Lezcano <daniel.lezcano@kernel.org>
> >>> ]
> >>>
> >>> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote:
> >>>> Consider the following sequence during PCI device runtime
> >>>> suspend/resume:
> >>>>
> >>>> 1. PCI device goes into runtime suspended state. The PCI state
> >>>>    will be changed to PCI_D0 and then pci_platform_power_transition()
> >>>>    will be called which changes the ACPI state to ACPI_STATE_D3_HOT.
> >>
> >> You mean PCI_D3hot I suppose?
> >>
> > 
> >  Yes. It should be PCI_D3hot here. 
> > 
> >>>> 2. Parent bridge goes into runtime suspended state. If parent
> >>>>    bridge supports D3cold, then it will change the power state of all its
> >>>>    children to D3cold state and the power will be removed.
> >>>>
> >>>> 3. During wake-up time, the bridge will be runtime resumed first
> >>>>    and pci_power_up() will be called for the bridge. Now, the power
> >>>>    supply will be resumed.
> >>>>
> >>>> 4. pci_resume_bus() will be called which will internally invoke
> >>>>    pci_restore_standard_config(). pci_update_current_state()
> >>>>    will read PCI_PM_CTRL register and the current_state will be
> >>>>    updated to D0.
> >>>>
> >>>> In the above process, at step 4, the ACPI device state will still be
> >>>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being
> >>>> invoked.
> >>
> >> I'm not quite following.
> >>
> >> I'm assuming that this description applies to the endpoint device that was
> >> previously put into D3_hot.
> >>
> > 
> >  Yes. This is applicable for endpoint devices which was previously put
> >  into D3hot.
> > 
> >> Since its current state is D3_hot, it is not D0 (in particular) and the
> >> pci_set_power_state() in pci_restore_standard_config() should put int into
> >> D0 proper, including the platform firmware part.
> >>
> > 
> >  The pci_restore_standard_config() for endpoint devices are being called
> >  internally during wake-up of upstream bridge. 
> > 
> >  pci_power_up(struct pci_dev *dev)
> >  {
> >       ...
> >       if (dev->runtime_d3cold) {
> >         /*
> >          * When powering on a bridge from D3cold, the whole hierarchy
> >          * may be powered on into D0uninitialized state, resume them to
> >          * give them a chance to suspend again
> >          */
> >         pci_resume_bus(dev->subordinate);
> >     }
> >     ...
> >  }
> > 
> >  For the upstream bridge, the above code will trigger the wake-up of
> >  endpoint devices and then following code will be executed for the
> >  endpoint devices:
> > 
> >  pci_update_current_state(struct pci_dev *dev, pci_power_t state)
> >  {
> >     if (platform_pci_get_power_state(dev) == PCI_D3cold ||
> >         !pci_device_is_present(dev)) {
> >         dev->current_state = PCI_D3cold;
> >     } else if (dev->pm_cap) {
> >         u16 pmcsr;
> > 
> >         pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> >         dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
> >     } else {
> >         dev->current_state = state;
> >     }
> >  }
> > 
> >  In the above code, the current_state will be set to D0 for the
> >  endpoint devices since it will go into second block where
> >  it will read the PM_CTRL register.
> >  
> >>>> We need call the pci_platform_power_transition() with state
> >>>> D0 to change the ACPI state to ACPI_STATE_D0.
> >>>>
> >>>> This patch calls pci_power_up() if current power state is D0 inside
> >>>> pci_restore_standard_config(). This pci_power_up() will change the
> >>>> ACPI state to ACPI_STATE_D0.
> >>>>
> >>>> Following are the steps to confirm:
> >>>>
> >>>> Enable the debug prints in acpi_pci_set_power_state()
> >>>>
> >>>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device
> >>>>
> >>>> Before:
> >>>>
> >>>> 0000:01:00.0: power state changed by ACPI to D3hot
> >>>> 0000:00:01.0: power state changed by ACPI to D3cold
> >>>> 0000:00:01.0: power state changed by ACPI to D0
> >>>>
> >>>> After:
> >>>>
> >>>> 0000:01:00.0: power state changed by ACPI to D3hot
> >>>> 0000:00:01.0: power state changed by ACPI to D3cold
> >>>> 0000:00:01.0: power state changed by ACPI to D0
> >>>> 0000:01:00.0: power state changed by ACPI to D0
> >>>>
> >>>> So with this patch, the PCI device ACPI state is also being
> >>>> changed to D0.
> >>>>
> >>>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
> >>>> ---
> >>>>  drivers/pci/pci-driver.c | 14 +++++++++++---
> >>>>  1 file changed, 11 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> >>>> index 588588cfda48..64e0cca12f16 100644
> >>>> --- a/drivers/pci/pci-driver.c
> >>>> +++ b/drivers/pci/pci-driver.c
> >>>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev)
> >>>>   */
> >>>>  static int pci_restore_standard_config(struct pci_dev *pci_dev)
> >>>>  {
> >>>> +   int error = 0;
> >>>>     pci_update_current_state(pci_dev, PCI_UNKNOWN);
> >>>>
> >>>>     if (pci_dev->current_state != PCI_D0) {
> >>>> -           int error = pci_set_power_state(pci_dev, PCI_D0);
> >>>> -           if (error)
> >>>> -                   return error;
> >>>> +           error = pci_set_power_state(pci_dev, PCI_D0);
> >>>> +   } else {
> >>>> +           /*
> >>>> +            * The platform power state can still be non-D0, so this is
> >>>> +            * required to change the platform power state to D0.
> >>>> +            */
> >>
> >> This really isn't expected to happen.
> >>
> >> If the device's power state has been changed to D3hot by ACPI, it is not in D0.
> >>
> >> It looks like the state tracking is not working here.
> >>
> > 
> >  The state setting to D0 is happening due to the current logic present in
> >  pci_update_current_state(). If we can fix the logic in
> >  pci_update_current_state() to detect this condition and return state D3hot,
> >  then it should also fix the issue. 
> > 
> >  Thanks,
> >  Abhishek
> > 
> 
>  Hi Rafael/Mika,
> 
>  Could you please help regarding the correct way to fix this issue.
>  I can update the patch accordingly.

I think you can try one of the patches posted recently:

https://patchwork.kernel.org/project/linux-pm/patch/3623886.MHq7AAxBmi@kreacher/

Thanks!
Abhishek Sahu April 6, 2022, 5:32 a.m. UTC | #7
On 4/5/2022 10:20 PM, Rafael J. Wysocki wrote:
> On Tuesday, April 5, 2022 6:36:34 PM CEST Abhishek Sahu wrote:
>> On 2/8/2022 4:00 PM, Abhishek Sahu wrote:
>>> On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote:
>>>> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote:
>>>>> [+cc Rafael, hoping for your review :)
>>>>
>>>> +Mika
>>>>
>>>>> Wonder if we should add something like this to MAINTAINERS so you get
>>>>> cc'd on power-related things:
>>>>>
>>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>>> index ea3e6c914384..3d9a211cad5d 100644
>>>>> --- a/MAINTAINERS
>>>>> +++ b/MAINTAINERS
>>>>> @@ -15422,6 +15422,7 @@ F:    include/linux/pm.h
>>>>>  F:   include/linux/pm_*
>>>>>  F:   include/linux/powercap.h
>>>>>  F:   kernel/configs/nopm.config
>>>>> +K:   pci_[a-z_]*power[a-z_]*\(
>>>>
>>>> It seems so, but generally PM patches should be CCed to linux-pm anyway.
>>>>
>>>>>
>>>>>  DYNAMIC THERMAL POWER MANAGEMENT (DTPM)
>>>>>  M:   Daniel Lezcano <daniel.lezcano@kernel.org>
>>>>> ]
>>>>>
>>>>> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote:
>>>>>> Consider the following sequence during PCI device runtime
>>>>>> suspend/resume:
>>>>>>
>>>>>> 1. PCI device goes into runtime suspended state. The PCI state
>>>>>>    will be changed to PCI_D0 and then pci_platform_power_transition()
>>>>>>    will be called which changes the ACPI state to ACPI_STATE_D3_HOT.
>>>>
>>>> You mean PCI_D3hot I suppose?
>>>>
>>>
>>>  Yes. It should be PCI_D3hot here.
>>>
>>>>>> 2. Parent bridge goes into runtime suspended state. If parent
>>>>>>    bridge supports D3cold, then it will change the power state of all its
>>>>>>    children to D3cold state and the power will be removed.
>>>>>>
>>>>>> 3. During wake-up time, the bridge will be runtime resumed first
>>>>>>    and pci_power_up() will be called for the bridge. Now, the power
>>>>>>    supply will be resumed.
>>>>>>
>>>>>> 4. pci_resume_bus() will be called which will internally invoke
>>>>>>    pci_restore_standard_config(). pci_update_current_state()
>>>>>>    will read PCI_PM_CTRL register and the current_state will be
>>>>>>    updated to D0.
>>>>>>
>>>>>> In the above process, at step 4, the ACPI device state will still be
>>>>>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being
>>>>>> invoked.
>>>>
>>>> I'm not quite following.
>>>>
>>>> I'm assuming that this description applies to the endpoint device that was
>>>> previously put into D3_hot.
>>>>
>>>
>>>  Yes. This is applicable for endpoint devices which was previously put
>>>  into D3hot.
>>>
>>>> Since its current state is D3_hot, it is not D0 (in particular) and the
>>>> pci_set_power_state() in pci_restore_standard_config() should put int into
>>>> D0 proper, including the platform firmware part.
>>>>
>>>
>>>  The pci_restore_standard_config() for endpoint devices are being called
>>>  internally during wake-up of upstream bridge.
>>>
>>>  pci_power_up(struct pci_dev *dev)
>>>  {
>>>       ...
>>>       if (dev->runtime_d3cold) {
>>>         /*
>>>          * When powering on a bridge from D3cold, the whole hierarchy
>>>          * may be powered on into D0uninitialized state, resume them to
>>>          * give them a chance to suspend again
>>>          */
>>>         pci_resume_bus(dev->subordinate);
>>>     }
>>>     ...
>>>  }
>>>
>>>  For the upstream bridge, the above code will trigger the wake-up of
>>>  endpoint devices and then following code will be executed for the
>>>  endpoint devices:
>>>
>>>  pci_update_current_state(struct pci_dev *dev, pci_power_t state)
>>>  {
>>>     if (platform_pci_get_power_state(dev) == PCI_D3cold ||
>>>         !pci_device_is_present(dev)) {
>>>         dev->current_state = PCI_D3cold;
>>>     } else if (dev->pm_cap) {
>>>         u16 pmcsr;
>>>
>>>         pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>>>         dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
>>>     } else {
>>>         dev->current_state = state;
>>>     }
>>>  }
>>>
>>>  In the above code, the current_state will be set to D0 for the
>>>  endpoint devices since it will go into second block where
>>>  it will read the PM_CTRL register.
>>>
>>>>>> We need call the pci_platform_power_transition() with state
>>>>>> D0 to change the ACPI state to ACPI_STATE_D0.
>>>>>>
>>>>>> This patch calls pci_power_up() if current power state is D0 inside
>>>>>> pci_restore_standard_config(). This pci_power_up() will change the
>>>>>> ACPI state to ACPI_STATE_D0.
>>>>>>
>>>>>> Following are the steps to confirm:
>>>>>>
>>>>>> Enable the debug prints in acpi_pci_set_power_state()
>>>>>>
>>>>>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device
>>>>>>
>>>>>> Before:
>>>>>>
>>>>>> 0000:01:00.0: power state changed by ACPI to D3hot
>>>>>> 0000:00:01.0: power state changed by ACPI to D3cold
>>>>>> 0000:00:01.0: power state changed by ACPI to D0
>>>>>>
>>>>>> After:
>>>>>>
>>>>>> 0000:01:00.0: power state changed by ACPI to D3hot
>>>>>> 0000:00:01.0: power state changed by ACPI to D3cold
>>>>>> 0000:00:01.0: power state changed by ACPI to D0
>>>>>> 0000:01:00.0: power state changed by ACPI to D0
>>>>>>
>>>>>> So with this patch, the PCI device ACPI state is also being
>>>>>> changed to D0.
>>>>>>
>>>>>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
>>>>>> ---
>>>>>>  drivers/pci/pci-driver.c | 14 +++++++++++---
>>>>>>  1 file changed, 11 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>>>>>> index 588588cfda48..64e0cca12f16 100644
>>>>>> --- a/drivers/pci/pci-driver.c
>>>>>> +++ b/drivers/pci/pci-driver.c
>>>>>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev)
>>>>>>   */
>>>>>>  static int pci_restore_standard_config(struct pci_dev *pci_dev)
>>>>>>  {
>>>>>> +   int error = 0;
>>>>>>     pci_update_current_state(pci_dev, PCI_UNKNOWN);
>>>>>>
>>>>>>     if (pci_dev->current_state != PCI_D0) {
>>>>>> -           int error = pci_set_power_state(pci_dev, PCI_D0);
>>>>>> -           if (error)
>>>>>> -                   return error;
>>>>>> +           error = pci_set_power_state(pci_dev, PCI_D0);
>>>>>> +   } else {
>>>>>> +           /*
>>>>>> +            * The platform power state can still be non-D0, so this is
>>>>>> +            * required to change the platform power state to D0.
>>>>>> +            */
>>>>
>>>> This really isn't expected to happen.
>>>>
>>>> If the device's power state has been changed to D3hot by ACPI, it is not in D0.
>>>>
>>>> It looks like the state tracking is not working here.
>>>>
>>>
>>>  The state setting to D0 is happening due to the current logic present in
>>>  pci_update_current_state(). If we can fix the logic in
>>>  pci_update_current_state() to detect this condition and return state D3hot,
>>>  then it should also fix the issue.
>>>
>>>  Thanks,
>>>  Abhishek
>>>
>>
>>  Hi Rafael/Mika,
>>
>>  Could you please help regarding the correct way to fix this issue.
>>  I can update the patch accordingly.
> 
> I think you can try one of the patches posted recently:
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-pm%2Fpatch%2F3623886.MHq7AAxBmi%40kreacher%2F&amp;data=04%7C01%7Cabhsahu%40nvidia.com%7Cae4c8574f5a44973514a08da172471d6%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637847743178405297%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=aasJ79EICVnlJQ4EbXA2AtZFW0qnRsMkHEZRI8mnDI8%3D&amp;reserved=0
> 
> Thanks!
> 
> 
> 

 Thanks Rafael.
 I have applied both the changes and still the issue which I mentioned is happening.

 Following are the prints:

 0000:01:00.0: power state changed by ACPI to D3hot
 0000:00:01.0: power state changed by ACPI to D3cold
 0000:00:01.0: power state changed by ACPI to D0

 So ACPI state change is still not happening for PCI endpoint devices.

 Also, the I checked the code and the pci_power_up() will not be called
 for endpoint devices. For endpoints, pci_restore_standard_config() will
 be called first where the current state will come as D0.

 Thanks,
 Abhishek
Rafael J. Wysocki April 6, 2022, 12:06 p.m. UTC | #8
On Wednesday, April 6, 2022 7:32:45 AM CEST Abhishek Sahu wrote:
> On 4/5/2022 10:20 PM, Rafael J. Wysocki wrote:
> > On Tuesday, April 5, 2022 6:36:34 PM CEST Abhishek Sahu wrote:
> >> On 2/8/2022 4:00 PM, Abhishek Sahu wrote:
> >>> On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote:
> >>>> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote:
> >>>>> [+cc Rafael, hoping for your review :)
> >>>>
> >>>> +Mika
> >>>>
> >>>>> Wonder if we should add something like this to MAINTAINERS so you get
> >>>>> cc'd on power-related things:
> >>>>>
> >>>>> diff --git a/MAINTAINERS b/MAINTAINERS
> >>>>> index ea3e6c914384..3d9a211cad5d 100644
> >>>>> --- a/MAINTAINERS
> >>>>> +++ b/MAINTAINERS
> >>>>> @@ -15422,6 +15422,7 @@ F:    include/linux/pm.h
> >>>>>  F:   include/linux/pm_*
> >>>>>  F:   include/linux/powercap.h
> >>>>>  F:   kernel/configs/nopm.config
> >>>>> +K:   pci_[a-z_]*power[a-z_]*\(
> >>>>
> >>>> It seems so, but generally PM patches should be CCed to linux-pm anyway.
> >>>>
> >>>>>
> >>>>>  DYNAMIC THERMAL POWER MANAGEMENT (DTPM)
> >>>>>  M:   Daniel Lezcano <daniel.lezcano@kernel.org>
> >>>>> ]
> >>>>>
> >>>>> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote:
> >>>>>> Consider the following sequence during PCI device runtime
> >>>>>> suspend/resume:
> >>>>>>
> >>>>>> 1. PCI device goes into runtime suspended state. The PCI state
> >>>>>>    will be changed to PCI_D0 and then pci_platform_power_transition()
> >>>>>>    will be called which changes the ACPI state to ACPI_STATE_D3_HOT.
> >>>>
> >>>> You mean PCI_D3hot I suppose?
> >>>>
> >>>
> >>>  Yes. It should be PCI_D3hot here.
> >>>
> >>>>>> 2. Parent bridge goes into runtime suspended state. If parent
> >>>>>>    bridge supports D3cold, then it will change the power state of all its
> >>>>>>    children to D3cold state and the power will be removed.
> >>>>>>
> >>>>>> 3. During wake-up time, the bridge will be runtime resumed first
> >>>>>>    and pci_power_up() will be called for the bridge. Now, the power
> >>>>>>    supply will be resumed.
> >>>>>>
> >>>>>> 4. pci_resume_bus() will be called which will internally invoke
> >>>>>>    pci_restore_standard_config(). pci_update_current_state()
> >>>>>>    will read PCI_PM_CTRL register and the current_state will be
> >>>>>>    updated to D0.
> >>>>>>
> >>>>>> In the above process, at step 4, the ACPI device state will still be
> >>>>>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being
> >>>>>> invoked.
> >>>>
> >>>> I'm not quite following.
> >>>>
> >>>> I'm assuming that this description applies to the endpoint device that was
> >>>> previously put into D3_hot.
> >>>>
> >>>
> >>>  Yes. This is applicable for endpoint devices which was previously put
> >>>  into D3hot.
> >>>
> >>>> Since its current state is D3_hot, it is not D0 (in particular) and the
> >>>> pci_set_power_state() in pci_restore_standard_config() should put int into
> >>>> D0 proper, including the platform firmware part.
> >>>>
> >>>
> >>>  The pci_restore_standard_config() for endpoint devices are being called
> >>>  internally during wake-up of upstream bridge.
> >>>
> >>>  pci_power_up(struct pci_dev *dev)
> >>>  {
> >>>       ...
> >>>       if (dev->runtime_d3cold) {
> >>>         /*
> >>>          * When powering on a bridge from D3cold, the whole hierarchy
> >>>          * may be powered on into D0uninitialized state, resume them to
> >>>          * give them a chance to suspend again
> >>>          */
> >>>         pci_resume_bus(dev->subordinate);
> >>>     }
> >>>     ...
> >>>  }
> >>>
> >>>  For the upstream bridge, the above code will trigger the wake-up of
> >>>  endpoint devices and then following code will be executed for the
> >>>  endpoint devices:
> >>>
> >>>  pci_update_current_state(struct pci_dev *dev, pci_power_t state)
> >>>  {
> >>>     if (platform_pci_get_power_state(dev) == PCI_D3cold ||
> >>>         !pci_device_is_present(dev)) {
> >>>         dev->current_state = PCI_D3cold;
> >>>     } else if (dev->pm_cap) {
> >>>         u16 pmcsr;
> >>>
> >>>         pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> >>>         dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
> >>>     } else {
> >>>         dev->current_state = state;
> >>>     }
> >>>  }
> >>>
> >>>  In the above code, the current_state will be set to D0 for the
> >>>  endpoint devices since it will go into second block where
> >>>  it will read the PM_CTRL register.
> >>>
> >>>>>> We need call the pci_platform_power_transition() with state
> >>>>>> D0 to change the ACPI state to ACPI_STATE_D0.
> >>>>>>
> >>>>>> This patch calls pci_power_up() if current power state is D0 inside
> >>>>>> pci_restore_standard_config(). This pci_power_up() will change the
> >>>>>> ACPI state to ACPI_STATE_D0.
> >>>>>>
> >>>>>> Following are the steps to confirm:
> >>>>>>
> >>>>>> Enable the debug prints in acpi_pci_set_power_state()
> >>>>>>
> >>>>>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device
> >>>>>>
> >>>>>> Before:
> >>>>>>
> >>>>>> 0000:01:00.0: power state changed by ACPI to D3hot
> >>>>>> 0000:00:01.0: power state changed by ACPI to D3cold
> >>>>>> 0000:00:01.0: power state changed by ACPI to D0
> >>>>>>
> >>>>>> After:
> >>>>>>
> >>>>>> 0000:01:00.0: power state changed by ACPI to D3hot
> >>>>>> 0000:00:01.0: power state changed by ACPI to D3cold
> >>>>>> 0000:00:01.0: power state changed by ACPI to D0
> >>>>>> 0000:01:00.0: power state changed by ACPI to D0
> >>>>>>
> >>>>>> So with this patch, the PCI device ACPI state is also being
> >>>>>> changed to D0.
> >>>>>>
> >>>>>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
> >>>>>> ---
> >>>>>>  drivers/pci/pci-driver.c | 14 +++++++++++---
> >>>>>>  1 file changed, 11 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> >>>>>> index 588588cfda48..64e0cca12f16 100644
> >>>>>> --- a/drivers/pci/pci-driver.c
> >>>>>> +++ b/drivers/pci/pci-driver.c
> >>>>>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev)
> >>>>>>   */
> >>>>>>  static int pci_restore_standard_config(struct pci_dev *pci_dev)
> >>>>>>  {
> >>>>>> +   int error = 0;
> >>>>>>     pci_update_current_state(pci_dev, PCI_UNKNOWN);
> >>>>>>
> >>>>>>     if (pci_dev->current_state != PCI_D0) {
> >>>>>> -           int error = pci_set_power_state(pci_dev, PCI_D0);
> >>>>>> -           if (error)
> >>>>>> -                   return error;
> >>>>>> +           error = pci_set_power_state(pci_dev, PCI_D0);
> >>>>>> +   } else {
> >>>>>> +           /*
> >>>>>> +            * The platform power state can still be non-D0, so this is
> >>>>>> +            * required to change the platform power state to D0.
> >>>>>> +            */
> >>>>
> >>>> This really isn't expected to happen.
> >>>>
> >>>> If the device's power state has been changed to D3hot by ACPI, it is not in D0.
> >>>>
> >>>> It looks like the state tracking is not working here.
> >>>>
> >>>
> >>>  The state setting to D0 is happening due to the current logic present in
> >>>  pci_update_current_state(). If we can fix the logic in
> >>>  pci_update_current_state() to detect this condition and return state D3hot,
> >>>  then it should also fix the issue.
> >>>
> >>>  Thanks,
> >>>  Abhishek
> >>>
> >>
> >>  Hi Rafael/Mika,
> >>
> >>  Could you please help regarding the correct way to fix this issue.
> >>  I can update the patch accordingly.
> > 
> > I think you can try one of the patches posted recently:
> > 
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-pm%2Fpatch%2F3623886.MHq7AAxBmi%40kreacher%2F&amp;data=04%7C01%7Cabhsahu%40nvidia.com%7Cae4c8574f5a44973514a08da172471d6%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637847743178405297%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=aasJ79EICVnlJQ4EbXA2AtZFW0qnRsMkHEZRI8mnDI8%3D&amp;reserved=0
> > 
> > Thanks!
> > 
> > 
> > 
> 
>  Thanks Rafael.
>  I have applied both the changes and still the issue which I mentioned is happening.
> 
>  Following are the prints:
> 
>  0000:01:00.0: power state changed by ACPI to D3hot
>  0000:00:01.0: power state changed by ACPI to D3cold
>  0000:00:01.0: power state changed by ACPI to D0
> 
>  So ACPI state change is still not happening for PCI endpoint devices.
> 
>  Also, the I checked the code and the pci_power_up() will not be called
>  for endpoint devices. For endpoints, pci_restore_standard_config() will
>  be called first where the current state will come as D0.

OK, I see.

The problem is that if the PCI device goes to D0 because of the bridge power-up,
it's ACPI companion's power state may not follow, which means that we really
want to do a full power-up in there.

Please test the appended patch with the patch from

https://patchwork.kernel.org/project/linux-pm/patch/3623886.MHq7AAxBmi@kreacher/

still applied.

---
 drivers/pci/pci-driver.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -1312,7 +1312,7 @@ static int pci_pm_runtime_resume(struct
 	 * to a driver because although we left it in D0, it may have gone to
 	 * D3cold when the bridge above it runtime suspended.
 	 */
-	pci_restore_standard_config(pci_dev);
+	pci_pm_default_resume_early(pci_dev);
 
 	if (!pci_dev->driver)
 		return 0;
Abhishek Sahu April 6, 2022, 6:43 p.m. UTC | #9
On 4/6/2022 5:36 PM, Rafael J. Wysocki wrote:
> On Wednesday, April 6, 2022 7:32:45 AM CEST Abhishek Sahu wrote:
>> On 4/5/2022 10:20 PM, Rafael J. Wysocki wrote:
>>> On Tuesday, April 5, 2022 6:36:34 PM CEST Abhishek Sahu wrote:
>>>> On 2/8/2022 4:00 PM, Abhishek Sahu wrote:
>>>>> On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote:
>>>>>> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote:
>>>>>>> [+cc Rafael, hoping for your review :)
>>>>>>
>>>>>> +Mika
>>>>>>
>>>>>>> Wonder if we should add something like this to MAINTAINERS so you get
>>>>>>> cc'd on power-related things:
>>>>>>>
>>>>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>>>>> index ea3e6c914384..3d9a211cad5d 100644
>>>>>>> --- a/MAINTAINERS
>>>>>>> +++ b/MAINTAINERS
>>>>>>> @@ -15422,6 +15422,7 @@ F:    include/linux/pm.h
>>>>>>>  F:   include/linux/pm_*
>>>>>>>  F:   include/linux/powercap.h
>>>>>>>  F:   kernel/configs/nopm.config
>>>>>>> +K:   pci_[a-z_]*power[a-z_]*\(
>>>>>>
>>>>>> It seems so, but generally PM patches should be CCed to linux-pm anyway.
>>>>>>
>>>>>>>
>>>>>>>  DYNAMIC THERMAL POWER MANAGEMENT (DTPM)
>>>>>>>  M:   Daniel Lezcano <daniel.lezcano@kernel.org>
>>>>>>> ]
>>>>>>>
>>>>>>> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote:
>>>>>>>> Consider the following sequence during PCI device runtime
>>>>>>>> suspend/resume:
>>>>>>>>
>>>>>>>> 1. PCI device goes into runtime suspended state. The PCI state
>>>>>>>>    will be changed to PCI_D0 and then pci_platform_power_transition()
>>>>>>>>    will be called which changes the ACPI state to ACPI_STATE_D3_HOT.
>>>>>>
>>>>>> You mean PCI_D3hot I suppose?
>>>>>>
>>>>>
>>>>>  Yes. It should be PCI_D3hot here.
>>>>>
>>>>>>>> 2. Parent bridge goes into runtime suspended state. If parent
>>>>>>>>    bridge supports D3cold, then it will change the power state of all its
>>>>>>>>    children to D3cold state and the power will be removed.
>>>>>>>>
>>>>>>>> 3. During wake-up time, the bridge will be runtime resumed first
>>>>>>>>    and pci_power_up() will be called for the bridge. Now, the power
>>>>>>>>    supply will be resumed.
>>>>>>>>
>>>>>>>> 4. pci_resume_bus() will be called which will internally invoke
>>>>>>>>    pci_restore_standard_config(). pci_update_current_state()
>>>>>>>>    will read PCI_PM_CTRL register and the current_state will be
>>>>>>>>    updated to D0.
>>>>>>>>
>>>>>>>> In the above process, at step 4, the ACPI device state will still be
>>>>>>>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being
>>>>>>>> invoked.
>>>>>>
>>>>>> I'm not quite following.
>>>>>>
>>>>>> I'm assuming that this description applies to the endpoint device that was
>>>>>> previously put into D3_hot.
>>>>>>
>>>>>
>>>>>  Yes. This is applicable for endpoint devices which was previously put
>>>>>  into D3hot.
>>>>>
>>>>>> Since its current state is D3_hot, it is not D0 (in particular) and the
>>>>>> pci_set_power_state() in pci_restore_standard_config() should put int into
>>>>>> D0 proper, including the platform firmware part.
>>>>>>
>>>>>
>>>>>  The pci_restore_standard_config() for endpoint devices are being called
>>>>>  internally during wake-up of upstream bridge.
>>>>>
>>>>>  pci_power_up(struct pci_dev *dev)
>>>>>  {
>>>>>       ...
>>>>>       if (dev->runtime_d3cold) {
>>>>>         /*
>>>>>          * When powering on a bridge from D3cold, the whole hierarchy
>>>>>          * may be powered on into D0uninitialized state, resume them to
>>>>>          * give them a chance to suspend again
>>>>>          */
>>>>>         pci_resume_bus(dev->subordinate);
>>>>>     }
>>>>>     ...
>>>>>  }
>>>>>
>>>>>  For the upstream bridge, the above code will trigger the wake-up of
>>>>>  endpoint devices and then following code will be executed for the
>>>>>  endpoint devices:
>>>>>
>>>>>  pci_update_current_state(struct pci_dev *dev, pci_power_t state)
>>>>>  {
>>>>>     if (platform_pci_get_power_state(dev) == PCI_D3cold ||
>>>>>         !pci_device_is_present(dev)) {
>>>>>         dev->current_state = PCI_D3cold;
>>>>>     } else if (dev->pm_cap) {
>>>>>         u16 pmcsr;
>>>>>
>>>>>         pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>>>>>         dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
>>>>>     } else {
>>>>>         dev->current_state = state;
>>>>>     }
>>>>>  }
>>>>>
>>>>>  In the above code, the current_state will be set to D0 for the
>>>>>  endpoint devices since it will go into second block where
>>>>>  it will read the PM_CTRL register.
>>>>>
>>>>>>>> We need call the pci_platform_power_transition() with state
>>>>>>>> D0 to change the ACPI state to ACPI_STATE_D0.
>>>>>>>>
>>>>>>>> This patch calls pci_power_up() if current power state is D0 inside
>>>>>>>> pci_restore_standard_config(). This pci_power_up() will change the
>>>>>>>> ACPI state to ACPI_STATE_D0.
>>>>>>>>
>>>>>>>> Following are the steps to confirm:
>>>>>>>>
>>>>>>>> Enable the debug prints in acpi_pci_set_power_state()
>>>>>>>>
>>>>>>>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device
>>>>>>>>
>>>>>>>> Before:
>>>>>>>>
>>>>>>>> 0000:01:00.0: power state changed by ACPI to D3hot
>>>>>>>> 0000:00:01.0: power state changed by ACPI to D3cold
>>>>>>>> 0000:00:01.0: power state changed by ACPI to D0
>>>>>>>>
>>>>>>>> After:
>>>>>>>>
>>>>>>>> 0000:01:00.0: power state changed by ACPI to D3hot
>>>>>>>> 0000:00:01.0: power state changed by ACPI to D3cold
>>>>>>>> 0000:00:01.0: power state changed by ACPI to D0
>>>>>>>> 0000:01:00.0: power state changed by ACPI to D0
>>>>>>>>
>>>>>>>> So with this patch, the PCI device ACPI state is also being
>>>>>>>> changed to D0.
>>>>>>>>
>>>>>>>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
>>>>>>>> ---
>>>>>>>>  drivers/pci/pci-driver.c | 14 +++++++++++---
>>>>>>>>  1 file changed, 11 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>>>>>>>> index 588588cfda48..64e0cca12f16 100644
>>>>>>>> --- a/drivers/pci/pci-driver.c
>>>>>>>> +++ b/drivers/pci/pci-driver.c
>>>>>>>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev)
>>>>>>>>   */
>>>>>>>>  static int pci_restore_standard_config(struct pci_dev *pci_dev)
>>>>>>>>  {
>>>>>>>> +   int error = 0;
>>>>>>>>     pci_update_current_state(pci_dev, PCI_UNKNOWN);
>>>>>>>>
>>>>>>>>     if (pci_dev->current_state != PCI_D0) {
>>>>>>>> -           int error = pci_set_power_state(pci_dev, PCI_D0);
>>>>>>>> -           if (error)
>>>>>>>> -                   return error;
>>>>>>>> +           error = pci_set_power_state(pci_dev, PCI_D0);
>>>>>>>> +   } else {
>>>>>>>> +           /*
>>>>>>>> +            * The platform power state can still be non-D0, so this is
>>>>>>>> +            * required to change the platform power state to D0.
>>>>>>>> +            */
>>>>>>
>>>>>> This really isn't expected to happen.
>>>>>>
>>>>>> If the device's power state has been changed to D3hot by ACPI, it is not in D0.
>>>>>>
>>>>>> It looks like the state tracking is not working here.
>>>>>>
>>>>>
>>>>>  The state setting to D0 is happening due to the current logic present in
>>>>>  pci_update_current_state(). If we can fix the logic in
>>>>>  pci_update_current_state() to detect this condition and return state D3hot,
>>>>>  then it should also fix the issue.
>>>>>
>>>>>  Thanks,
>>>>>  Abhishek
>>>>>
>>>>
>>>>  Hi Rafael/Mika,
>>>>
>>>>  Could you please help regarding the correct way to fix this issue.
>>>>  I can update the patch accordingly.
>>>
>>> I think you can try one of the patches posted recently:
>>>
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-pm%2Fpatch%2F3623886.MHq7AAxBmi%40kreacher%2F&amp;data=04%7C01%7Cabhsahu%40nvidia.com%7C87470c4f30f14871bcc708da17c5e913%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637848436478123578%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=USJs87Fmm0Vcjm1YWszZCBTpcNQED3InOuOdGgE3f88%3D&amp;reserved=0
>>>
>>> Thanks!
>>>
>>>
>>>
>>
>>  Thanks Rafael.
>>  I have applied both the changes and still the issue which I mentioned is happening.
>>
>>  Following are the prints:
>>
>>  0000:01:00.0: power state changed by ACPI to D3hot
>>  0000:00:01.0: power state changed by ACPI to D3cold
>>  0000:00:01.0: power state changed by ACPI to D0
>>
>>  So ACPI state change is still not happening for PCI endpoint devices.
>>
>>  Also, the I checked the code and the pci_power_up() will not be called
>>  for endpoint devices. For endpoints, pci_restore_standard_config() will
>>  be called first where the current state will come as D0.
> 
> OK, I see.
> 
> The problem is that if the PCI device goes to D0 because of the bridge power-up,
> it's ACPI companion's power state may not follow, which means that we really
> want to do a full power-up in there.
> 
> Please test the appended patch with the patch from
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-pm%2Fpatch%2F3623886.MHq7AAxBmi%40kreacher%2F&amp;data=04%7C01%7Cabhsahu%40nvidia.com%7C87470c4f30f14871bcc708da17c5e913%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637848436478123578%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=USJs87Fmm0Vcjm1YWszZCBTpcNQED3InOuOdGgE3f88%3D&amp;reserved=0
> 
> still applied.
> 
> ---
>  drivers/pci/pci-driver.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -1312,7 +1312,7 @@ static int pci_pm_runtime_resume(struct
>          * to a driver because although we left it in D0, it may have gone to
>          * D3cold when the bridge above it runtime suspended.
>          */
> -       pci_restore_standard_config(pci_dev);
> +       pci_pm_default_resume_early(pci_dev);
> 
>         if (!pci_dev->driver)
>                 return 0;
> 

 Thanks Rafael.
 With the above code change, the issue is getting fixed and the
 PCI endpoint power state is also changing to D0.

 0000:01:00.0: power state changed by ACPI to D3hot
 0000:00:01.0: power state changed by ACPI to D3cold
 0000:00:01.0: power state changed by ACPI to D0
 0000:01:00.0: power state changed by ACPI to D0

 Regards,
 Abhishek
Rafael J. Wysocki April 6, 2022, 6:49 p.m. UTC | #10
On Wednesday, April 6, 2022 8:43:19 PM CEST Abhishek Sahu wrote:
> On 4/6/2022 5:36 PM, Rafael J. Wysocki wrote:
> > On Wednesday, April 6, 2022 7:32:45 AM CEST Abhishek Sahu wrote:
> >> On 4/5/2022 10:20 PM, Rafael J. Wysocki wrote:
> >>> On Tuesday, April 5, 2022 6:36:34 PM CEST Abhishek Sahu wrote:
> >>>> On 2/8/2022 4:00 PM, Abhishek Sahu wrote:
> >>>>> On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote:
> >>>>>> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote:
> >>>>>>> [+cc Rafael, hoping for your review :)
> >>>>>>
> >>>>>> +Mika
> >>>>>>
> >>>>>>> Wonder if we should add something like this to MAINTAINERS so you get
> >>>>>>> cc'd on power-related things:
> >>>>>>>
> >>>>>>> diff --git a/MAINTAINERS b/MAINTAINERS
> >>>>>>> index ea3e6c914384..3d9a211cad5d 100644
> >>>>>>> --- a/MAINTAINERS
> >>>>>>> +++ b/MAINTAINERS
> >>>>>>> @@ -15422,6 +15422,7 @@ F:    include/linux/pm.h
> >>>>>>>  F:   include/linux/pm_*
> >>>>>>>  F:   include/linux/powercap.h
> >>>>>>>  F:   kernel/configs/nopm.config
> >>>>>>> +K:   pci_[a-z_]*power[a-z_]*\(
> >>>>>>
> >>>>>> It seems so, but generally PM patches should be CCed to linux-pm anyway.
> >>>>>>
> >>>>>>>
> >>>>>>>  DYNAMIC THERMAL POWER MANAGEMENT (DTPM)
> >>>>>>>  M:   Daniel Lezcano <daniel.lezcano@kernel.org>
> >>>>>>> ]
> >>>>>>>
> >>>>>>> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote:
> >>>>>>>> Consider the following sequence during PCI device runtime
> >>>>>>>> suspend/resume:
> >>>>>>>>
> >>>>>>>> 1. PCI device goes into runtime suspended state. The PCI state
> >>>>>>>>    will be changed to PCI_D0 and then pci_platform_power_transition()
> >>>>>>>>    will be called which changes the ACPI state to ACPI_STATE_D3_HOT.
> >>>>>>
> >>>>>> You mean PCI_D3hot I suppose?
> >>>>>>
> >>>>>
> >>>>>  Yes. It should be PCI_D3hot here.
> >>>>>
> >>>>>>>> 2. Parent bridge goes into runtime suspended state. If parent
> >>>>>>>>    bridge supports D3cold, then it will change the power state of all its
> >>>>>>>>    children to D3cold state and the power will be removed.
> >>>>>>>>
> >>>>>>>> 3. During wake-up time, the bridge will be runtime resumed first
> >>>>>>>>    and pci_power_up() will be called for the bridge. Now, the power
> >>>>>>>>    supply will be resumed.
> >>>>>>>>
> >>>>>>>> 4. pci_resume_bus() will be called which will internally invoke
> >>>>>>>>    pci_restore_standard_config(). pci_update_current_state()
> >>>>>>>>    will read PCI_PM_CTRL register and the current_state will be
> >>>>>>>>    updated to D0.
> >>>>>>>>
> >>>>>>>> In the above process, at step 4, the ACPI device state will still be
> >>>>>>>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being
> >>>>>>>> invoked.
> >>>>>>
> >>>>>> I'm not quite following.
> >>>>>>
> >>>>>> I'm assuming that this description applies to the endpoint device that was
> >>>>>> previously put into D3_hot.
> >>>>>>
> >>>>>
> >>>>>  Yes. This is applicable for endpoint devices which was previously put
> >>>>>  into D3hot.
> >>>>>
> >>>>>> Since its current state is D3_hot, it is not D0 (in particular) and the
> >>>>>> pci_set_power_state() in pci_restore_standard_config() should put int into
> >>>>>> D0 proper, including the platform firmware part.
> >>>>>>
> >>>>>
> >>>>>  The pci_restore_standard_config() for endpoint devices are being called
> >>>>>  internally during wake-up of upstream bridge.
> >>>>>
> >>>>>  pci_power_up(struct pci_dev *dev)
> >>>>>  {
> >>>>>       ...
> >>>>>       if (dev->runtime_d3cold) {
> >>>>>         /*
> >>>>>          * When powering on a bridge from D3cold, the whole hierarchy
> >>>>>          * may be powered on into D0uninitialized state, resume them to
> >>>>>          * give them a chance to suspend again
> >>>>>          */
> >>>>>         pci_resume_bus(dev->subordinate);
> >>>>>     }
> >>>>>     ...
> >>>>>  }
> >>>>>
> >>>>>  For the upstream bridge, the above code will trigger the wake-up of
> >>>>>  endpoint devices and then following code will be executed for the
> >>>>>  endpoint devices:
> >>>>>
> >>>>>  pci_update_current_state(struct pci_dev *dev, pci_power_t state)
> >>>>>  {
> >>>>>     if (platform_pci_get_power_state(dev) == PCI_D3cold ||
> >>>>>         !pci_device_is_present(dev)) {
> >>>>>         dev->current_state = PCI_D3cold;
> >>>>>     } else if (dev->pm_cap) {
> >>>>>         u16 pmcsr;
> >>>>>
> >>>>>         pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> >>>>>         dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
> >>>>>     } else {
> >>>>>         dev->current_state = state;
> >>>>>     }
> >>>>>  }
> >>>>>
> >>>>>  In the above code, the current_state will be set to D0 for the
> >>>>>  endpoint devices since it will go into second block where
> >>>>>  it will read the PM_CTRL register.
> >>>>>
> >>>>>>>> We need call the pci_platform_power_transition() with state
> >>>>>>>> D0 to change the ACPI state to ACPI_STATE_D0.
> >>>>>>>>
> >>>>>>>> This patch calls pci_power_up() if current power state is D0 inside
> >>>>>>>> pci_restore_standard_config(). This pci_power_up() will change the
> >>>>>>>> ACPI state to ACPI_STATE_D0.
> >>>>>>>>
> >>>>>>>> Following are the steps to confirm:
> >>>>>>>>
> >>>>>>>> Enable the debug prints in acpi_pci_set_power_state()
> >>>>>>>>
> >>>>>>>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device
> >>>>>>>>
> >>>>>>>> Before:
> >>>>>>>>
> >>>>>>>> 0000:01:00.0: power state changed by ACPI to D3hot
> >>>>>>>> 0000:00:01.0: power state changed by ACPI to D3cold
> >>>>>>>> 0000:00:01.0: power state changed by ACPI to D0
> >>>>>>>>
> >>>>>>>> After:
> >>>>>>>>
> >>>>>>>> 0000:01:00.0: power state changed by ACPI to D3hot
> >>>>>>>> 0000:00:01.0: power state changed by ACPI to D3cold
> >>>>>>>> 0000:00:01.0: power state changed by ACPI to D0
> >>>>>>>> 0000:01:00.0: power state changed by ACPI to D0
> >>>>>>>>
> >>>>>>>> So with this patch, the PCI device ACPI state is also being
> >>>>>>>> changed to D0.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
> >>>>>>>> ---
> >>>>>>>>  drivers/pci/pci-driver.c | 14 +++++++++++---
> >>>>>>>>  1 file changed, 11 insertions(+), 3 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> >>>>>>>> index 588588cfda48..64e0cca12f16 100644
> >>>>>>>> --- a/drivers/pci/pci-driver.c
> >>>>>>>> +++ b/drivers/pci/pci-driver.c
> >>>>>>>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev)
> >>>>>>>>   */
> >>>>>>>>  static int pci_restore_standard_config(struct pci_dev *pci_dev)
> >>>>>>>>  {
> >>>>>>>> +   int error = 0;
> >>>>>>>>     pci_update_current_state(pci_dev, PCI_UNKNOWN);
> >>>>>>>>
> >>>>>>>>     if (pci_dev->current_state != PCI_D0) {
> >>>>>>>> -           int error = pci_set_power_state(pci_dev, PCI_D0);
> >>>>>>>> -           if (error)
> >>>>>>>> -                   return error;
> >>>>>>>> +           error = pci_set_power_state(pci_dev, PCI_D0);
> >>>>>>>> +   } else {
> >>>>>>>> +           /*
> >>>>>>>> +            * The platform power state can still be non-D0, so this is
> >>>>>>>> +            * required to change the platform power state to D0.
> >>>>>>>> +            */
> >>>>>>
> >>>>>> This really isn't expected to happen.
> >>>>>>
> >>>>>> If the device's power state has been changed to D3hot by ACPI, it is not in D0.
> >>>>>>
> >>>>>> It looks like the state tracking is not working here.
> >>>>>>
> >>>>>
> >>>>>  The state setting to D0 is happening due to the current logic present in
> >>>>>  pci_update_current_state(). If we can fix the logic in
> >>>>>  pci_update_current_state() to detect this condition and return state D3hot,
> >>>>>  then it should also fix the issue.
> >>>>>
> >>>>>  Thanks,
> >>>>>  Abhishek
> >>>>>
> >>>>
> >>>>  Hi Rafael/Mika,
> >>>>
> >>>>  Could you please help regarding the correct way to fix this issue.
> >>>>  I can update the patch accordingly.
> >>>
> >>> I think you can try one of the patches posted recently:
> >>>
> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-pm%2Fpatch%2F3623886.MHq7AAxBmi%40kreacher%2F&amp;data=04%7C01%7Cabhsahu%40nvidia.com%7C87470c4f30f14871bcc708da17c5e913%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637848436478123578%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=USJs87Fmm0Vcjm1YWszZCBTpcNQED3InOuOdGgE3f88%3D&amp;reserved=0
> >>>
> >>> Thanks!
> >>>
> >>>
> >>>
> >>
> >>  Thanks Rafael.
> >>  I have applied both the changes and still the issue which I mentioned is happening.
> >>
> >>  Following are the prints:
> >>
> >>  0000:01:00.0: power state changed by ACPI to D3hot
> >>  0000:00:01.0: power state changed by ACPI to D3cold
> >>  0000:00:01.0: power state changed by ACPI to D0
> >>
> >>  So ACPI state change is still not happening for PCI endpoint devices.
> >>
> >>  Also, the I checked the code and the pci_power_up() will not be called
> >>  for endpoint devices. For endpoints, pci_restore_standard_config() will
> >>  be called first where the current state will come as D0.
> > 
> > OK, I see.
> > 
> > The problem is that if the PCI device goes to D0 because of the bridge power-up,
> > it's ACPI companion's power state may not follow, which means that we really
> > want to do a full power-up in there.
> > 
> > Please test the appended patch with the patch from
> > 
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-pm%2Fpatch%2F3623886.MHq7AAxBmi%40kreacher%2F&amp;data=04%7C01%7Cabhsahu%40nvidia.com%7C87470c4f30f14871bcc708da17c5e913%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637848436478123578%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=USJs87Fmm0Vcjm1YWszZCBTpcNQED3InOuOdGgE3f88%3D&amp;reserved=0
> > 
> > still applied.
> > 
> > ---
> >  drivers/pci/pci-driver.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > Index: linux-pm/drivers/pci/pci-driver.c
> > ===================================================================
> > --- linux-pm.orig/drivers/pci/pci-driver.c
> > +++ linux-pm/drivers/pci/pci-driver.c
> > @@ -1312,7 +1312,7 @@ static int pci_pm_runtime_resume(struct
> >          * to a driver because although we left it in D0, it may have gone to
> >          * D3cold when the bridge above it runtime suspended.
> >          */
> > -       pci_restore_standard_config(pci_dev);
> > +       pci_pm_default_resume_early(pci_dev);
> > 
> >         if (!pci_dev->driver)
> >                 return 0;
> > 
> 
>  Thanks Rafael.
>  With the above code change, the issue is getting fixed and the
>  PCI endpoint power state is also changing to D0.
> 
>  0000:01:00.0: power state changed by ACPI to D3hot
>  0000:00:01.0: power state changed by ACPI to D3cold
>  0000:00:01.0: power state changed by ACPI to D0
>  0000:01:00.0: power state changed by ACPI to D0

Thanks for testing!
diff mbox series

Patch

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 588588cfda48..64e0cca12f16 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -521,14 +521,22 @@  static void pci_device_shutdown(struct device *dev)
  */
 static int pci_restore_standard_config(struct pci_dev *pci_dev)
 {
+	int error = 0;
 	pci_update_current_state(pci_dev, PCI_UNKNOWN);
 
 	if (pci_dev->current_state != PCI_D0) {
-		int error = pci_set_power_state(pci_dev, PCI_D0);
-		if (error)
-			return error;
+		error = pci_set_power_state(pci_dev, PCI_D0);
+	} else {
+		/*
+		 * The platform power state can still be non-D0, so this is
+		 * required to change the platform power state to D0.
+		 */
+		error = pci_power_up(pci_dev);
 	}
 
+	if (error)
+		return error;
+
 	pci_restore_state(pci_dev);
 	pci_pme_restore(pci_dev);
 	return 0;