diff mbox series

[v3,4/9] PCI/PM: Rework changing power states of PCI devices

Message ID 3687697.kQq0lBPeGt@kreacher (mailing list archive)
State Accepted
Commit 5bffe4c611f567d83c579378c54b13c21fd0fb98
Delegated to: Bjorn Helgaas
Headers show
Series PCI/PM: Improvements related to device transitions into D0 | expand

Commit Message

Rafael J. Wysocki April 14, 2022, 1:11 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

There are some issues related to changing power states of PCI
devices, mostly related to carrying out unnecessary actions in some
places, and the code is generally hard to follow.

 1. pci_power_up() has two callers, pci_set_power_state() and
    pci_pm_default_resume_early().  The latter updates the current
    power state of the device right after calling pci_power_up()
    and it restores the entire config space of the device right
    after that, so pci_power_up() itself need not read the
    PCI_PM_CTRL register or restore the BARs after programming the
    device into D0 in that case.
 
 2. It is generally hard to get a clear view of the pci_power_up()
    code flow, especially in some corner cases, due to all of the
    involved PCI_PM_CTRL register reads and writes occurring in
    pci_platform_power_transition() and in pci_raw_set_power_state(),
    some of which are redundant.

 3. The transitions from low-power states to D0 and the other way
    around are unnecessarily tangled in pci_raw_set_power_state()
    which causes it to use a redundant local variable and makes it
    rather hard to follow.

To address the above shortcomings, make the following changes:

 a. Remove the code handling transitions into D0
    from pci_raw_set_power_state() and rename it as
    pci_set_low_power_state().

 b. Add the code handling transitions into D0 directly
    to pci_power_up() and to a new wrapper function
    pci_set_full_power_state() calling it internally that is
    only used in pci_set_power_state().

 c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
    and make it work in the same way for transitions from any
    low-power states (transitions from D1 and D2 are handled
    slightly differently before the change).

 d. Put the restoration of the BARs and the PCI_PM_CTRL
    register read confirming the power state change into
    pci_set_full_power_state() to avoid doing that in
    pci_pm_default_resume_early() unnecessarily.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---

v2 -> v3:
   * Make pci_power_up() check dev->pm_cap.
   * Rephrase a comment in pci_power_up() (Mika).
   * Add kerneldoc comment to pci_set_full_power_state() (Mika).
   * Add R-by from Mika.

v1 -> v2:
   * Do not add a redundant check to pci_set_low_power_state().

---
 drivers/pci/pci.c |  175 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 122 insertions(+), 53 deletions(-)

Comments

Nathan Chancellor May 3, 2022, 5:59 p.m. UTC | #1
Hi Rafael,

On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> There are some issues related to changing power states of PCI
> devices, mostly related to carrying out unnecessary actions in some
> places, and the code is generally hard to follow.
> 
>  1. pci_power_up() has two callers, pci_set_power_state() and
>     pci_pm_default_resume_early().  The latter updates the current
>     power state of the device right after calling pci_power_up()
>     and it restores the entire config space of the device right
>     after that, so pci_power_up() itself need not read the
>     PCI_PM_CTRL register or restore the BARs after programming the
>     device into D0 in that case.
>  
>  2. It is generally hard to get a clear view of the pci_power_up()
>     code flow, especially in some corner cases, due to all of the
>     involved PCI_PM_CTRL register reads and writes occurring in
>     pci_platform_power_transition() and in pci_raw_set_power_state(),
>     some of which are redundant.
> 
>  3. The transitions from low-power states to D0 and the other way
>     around are unnecessarily tangled in pci_raw_set_power_state()
>     which causes it to use a redundant local variable and makes it
>     rather hard to follow.
> 
> To address the above shortcomings, make the following changes:
> 
>  a. Remove the code handling transitions into D0
>     from pci_raw_set_power_state() and rename it as
>     pci_set_low_power_state().
> 
>  b. Add the code handling transitions into D0 directly
>     to pci_power_up() and to a new wrapper function
>     pci_set_full_power_state() calling it internally that is
>     only used in pci_set_power_state().
> 
>  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
>     and make it work in the same way for transitions from any
>     low-power states (transitions from D1 and D2 are handled
>     slightly differently before the change).
> 
>  d. Put the restoration of the BARs and the PCI_PM_CTRL
>     register read confirming the power state change into
>     pci_set_full_power_state() to avoid doing that in
>     pci_pm_default_resume_early() unnecessarily.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
states of PCI devices") causes my AMD-based system to fail to fully
boot. As far as I can tell, this might be NVMe related, which might make
getting a full log difficult, as journalctl won't have anywhere to save
it. I see:

nvme nvme0: I/O 8 QID 0 timeout, completion polled

then shortly afterwards:

nvme nvme0: I/O 24 QID 0 timeout, completion polled
nvme nvme0: missing or invalid SUBNQN field

then I am dropped into an emergency shell.

This is a log from the previous commit, which may give some hints about
the configuration of this particular system.

https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log

If there is any additional debugging information I can provide or
patches I can try, please let me know!

Cheers,
Nathan
Rafael J. Wysocki May 4, 2022, 12:59 p.m. UTC | #2
On Tue, May 3, 2022 at 7:59 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> Hi Rafael,
>
> On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > There are some issues related to changing power states of PCI
> > devices, mostly related to carrying out unnecessary actions in some
> > places, and the code is generally hard to follow.
> >
> >  1. pci_power_up() has two callers, pci_set_power_state() and
> >     pci_pm_default_resume_early().  The latter updates the current
> >     power state of the device right after calling pci_power_up()
> >     and it restores the entire config space of the device right
> >     after that, so pci_power_up() itself need not read the
> >     PCI_PM_CTRL register or restore the BARs after programming the
> >     device into D0 in that case.
> >
> >  2. It is generally hard to get a clear view of the pci_power_up()
> >     code flow, especially in some corner cases, due to all of the
> >     involved PCI_PM_CTRL register reads and writes occurring in
> >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> >     some of which are redundant.
> >
> >  3. The transitions from low-power states to D0 and the other way
> >     around are unnecessarily tangled in pci_raw_set_power_state()
> >     which causes it to use a redundant local variable and makes it
> >     rather hard to follow.
> >
> > To address the above shortcomings, make the following changes:
> >
> >  a. Remove the code handling transitions into D0
> >     from pci_raw_set_power_state() and rename it as
> >     pci_set_low_power_state().
> >
> >  b. Add the code handling transitions into D0 directly
> >     to pci_power_up() and to a new wrapper function
> >     pci_set_full_power_state() calling it internally that is
> >     only used in pci_set_power_state().
> >
> >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> >     and make it work in the same way for transitions from any
> >     low-power states (transitions from D1 and D2 are handled
> >     slightly differently before the change).
> >
> >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> >     register read confirming the power state change into
> >     pci_set_full_power_state() to avoid doing that in
> >     pci_pm_default_resume_early() unnecessarily.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>
> This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> states of PCI devices") causes my AMD-based system to fail to fully
> boot. As far as I can tell, this might be NVMe related, which might make
> getting a full log difficult, as journalctl won't have anywhere to save
> it. I see:
>
> nvme nvme0: I/O 8 QID 0 timeout, completion polled
>
> then shortly afterwards:
>
> nvme nvme0: I/O 24 QID 0 timeout, completion polled
> nvme nvme0: missing or invalid SUBNQN field
>
> then I am dropped into an emergency shell.

Thanks for the report!

> This is a log from the previous commit, which may give some hints about
> the configuration of this particular system.
>
> https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log
>
> If there is any additional debugging information I can provide or
> patches I can try, please let me know!

Please see what happens if the "if (dev->current_state == PCI_D0)"
check and the following "return 0" statement in pci_power_up() are
commented out.
Anders Roxell May 4, 2022, 3:54 p.m. UTC | #3
On 2022-05-04 14:59, Rafael J. Wysocki wrote:
> On Tue, May 3, 2022 at 7:59 PM Nathan Chancellor <nathan@kernel.org> wrote:
> >
> > Hi Rafael,
> >
> > On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > There are some issues related to changing power states of PCI
> > > devices, mostly related to carrying out unnecessary actions in some
> > > places, and the code is generally hard to follow.
> > >
> > >  1. pci_power_up() has two callers, pci_set_power_state() and
> > >     pci_pm_default_resume_early().  The latter updates the current
> > >     power state of the device right after calling pci_power_up()
> > >     and it restores the entire config space of the device right
> > >     after that, so pci_power_up() itself need not read the
> > >     PCI_PM_CTRL register or restore the BARs after programming the
> > >     device into D0 in that case.
> > >
> > >  2. It is generally hard to get a clear view of the pci_power_up()
> > >     code flow, especially in some corner cases, due to all of the
> > >     involved PCI_PM_CTRL register reads and writes occurring in
> > >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> > >     some of which are redundant.
> > >
> > >  3. The transitions from low-power states to D0 and the other way
> > >     around are unnecessarily tangled in pci_raw_set_power_state()
> > >     which causes it to use a redundant local variable and makes it
> > >     rather hard to follow.
> > >
> > > To address the above shortcomings, make the following changes:
> > >
> > >  a. Remove the code handling transitions into D0
> > >     from pci_raw_set_power_state() and rename it as
> > >     pci_set_low_power_state().
> > >
> > >  b. Add the code handling transitions into D0 directly
> > >     to pci_power_up() and to a new wrapper function
> > >     pci_set_full_power_state() calling it internally that is
> > >     only used in pci_set_power_state().
> > >
> > >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> > >     and make it work in the same way for transitions from any
> > >     low-power states (transitions from D1 and D2 are handled
> > >     slightly differently before the change).
> > >
> > >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> > >     register read confirming the power state change into
> > >     pci_set_full_power_state() to avoid doing that in
> > >     pci_pm_default_resume_early() unnecessarily.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> >
> > This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> > states of PCI devices") causes my AMD-based system to fail to fully
> > boot. As far as I can tell, this might be NVMe related, which might make
> > getting a full log difficult, as journalctl won't have anywhere to save
> > it. I see:
> >
> > nvme nvme0: I/O 8 QID 0 timeout, completion polled
> >
> > then shortly afterwards:
> >
> > nvme nvme0: I/O 24 QID 0 timeout, completion polled
> > nvme nvme0: missing or invalid SUBNQN field
> >
> > then I am dropped into an emergency shell.
> 
> Thanks for the report!
> 
> > This is a log from the previous commit, which may give some hints about
> > the configuration of this particular system.
> >
> > https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log
> >
> > If there is any additional debugging information I can provide or
> > patches I can try, please let me know!
> 
> Please see what happens if the "if (dev->current_state == PCI_D0)"
> check and the following "return 0" statement in pci_power_up() are
> commented out.

I've built an arm64 allmodconfig kernel on linux-next tag next-20220503, and tried to boot it.
This is the boot error I see [1].
I bisected down to this patch [2]

When I revert the following patches [3] the kernel boots fine.
I also tried next-20220504 and I saw the same issue.

Cheers,
Anders
[1] http://ix.io/3WT3
[2] http://ix.io/3WXT
[3] http://ix.io/3WXU
Nathan Chancellor May 4, 2022, 4:36 p.m. UTC | #4
On Wed, May 04, 2022 at 02:59:17PM +0200, Rafael J. Wysocki wrote:
> On Tue, May 3, 2022 at 7:59 PM Nathan Chancellor <nathan@kernel.org> wrote:
> >
> > Hi Rafael,
> >
> > On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > There are some issues related to changing power states of PCI
> > > devices, mostly related to carrying out unnecessary actions in some
> > > places, and the code is generally hard to follow.
> > >
> > >  1. pci_power_up() has two callers, pci_set_power_state() and
> > >     pci_pm_default_resume_early().  The latter updates the current
> > >     power state of the device right after calling pci_power_up()
> > >     and it restores the entire config space of the device right
> > >     after that, so pci_power_up() itself need not read the
> > >     PCI_PM_CTRL register or restore the BARs after programming the
> > >     device into D0 in that case.
> > >
> > >  2. It is generally hard to get a clear view of the pci_power_up()
> > >     code flow, especially in some corner cases, due to all of the
> > >     involved PCI_PM_CTRL register reads and writes occurring in
> > >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> > >     some of which are redundant.
> > >
> > >  3. The transitions from low-power states to D0 and the other way
> > >     around are unnecessarily tangled in pci_raw_set_power_state()
> > >     which causes it to use a redundant local variable and makes it
> > >     rather hard to follow.
> > >
> > > To address the above shortcomings, make the following changes:
> > >
> > >  a. Remove the code handling transitions into D0
> > >     from pci_raw_set_power_state() and rename it as
> > >     pci_set_low_power_state().
> > >
> > >  b. Add the code handling transitions into D0 directly
> > >     to pci_power_up() and to a new wrapper function
> > >     pci_set_full_power_state() calling it internally that is
> > >     only used in pci_set_power_state().
> > >
> > >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> > >     and make it work in the same way for transitions from any
> > >     low-power states (transitions from D1 and D2 are handled
> > >     slightly differently before the change).
> > >
> > >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> > >     register read confirming the power state change into
> > >     pci_set_full_power_state() to avoid doing that in
> > >     pci_pm_default_resume_early() unnecessarily.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> >
> > This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> > states of PCI devices") causes my AMD-based system to fail to fully
> > boot. As far as I can tell, this might be NVMe related, which might make
> > getting a full log difficult, as journalctl won't have anywhere to save
> > it. I see:
> >
> > nvme nvme0: I/O 8 QID 0 timeout, completion polled
> >
> > then shortly afterwards:
> >
> > nvme nvme0: I/O 24 QID 0 timeout, completion polled
> > nvme nvme0: missing or invalid SUBNQN field
> >
> > then I am dropped into an emergency shell.
> 
> Thanks for the report!
> 
> > This is a log from the previous commit, which may give some hints about
> > the configuration of this particular system.
> >
> > https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log
> >
> > If there is any additional debugging information I can provide or
> > patches I can try, please let me know!
> 
> Please see what happens if the "if (dev->current_state == PCI_D0)"
> check and the following "return 0" statement in pci_power_up() are
> commented out.

If I understand you correctly, this? Unfortunately, that does not help.

Cheers,
Nathan

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1e22dc5187e7..9f7a463107f3 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1235,8 +1235,10 @@ int pci_power_up(struct pci_dev *dev)
 	}
 
 	/* There's nothing more to do if current_state is D0 at this point. */
+#if 0
 	if (dev->current_state == PCI_D0)
 		return 0;
+#endif
 
 	/*
 	 * Program the device into PCI_D0 by forcing the entire word to 0 (this
Bjorn Helgaas May 4, 2022, 4:54 p.m. UTC | #5
[+cc Anders]

On Tue, May 03, 2022 at 10:59:43AM -0700, Nathan Chancellor wrote:
> On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > There are some issues related to changing power states of PCI
> > devices, mostly related to carrying out unnecessary actions in some
> > places, and the code is generally hard to follow.
> > 
> >  1. pci_power_up() has two callers, pci_set_power_state() and
> >     pci_pm_default_resume_early().  The latter updates the current
> >     power state of the device right after calling pci_power_up()
> >     and it restores the entire config space of the device right
> >     after that, so pci_power_up() itself need not read the
> >     PCI_PM_CTRL register or restore the BARs after programming the
> >     device into D0 in that case.
> >  
> >  2. It is generally hard to get a clear view of the pci_power_up()
> >     code flow, especially in some corner cases, due to all of the
> >     involved PCI_PM_CTRL register reads and writes occurring in
> >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> >     some of which are redundant.
> > 
> >  3. The transitions from low-power states to D0 and the other way
> >     around are unnecessarily tangled in pci_raw_set_power_state()
> >     which causes it to use a redundant local variable and makes it
> >     rather hard to follow.
> > 
> > To address the above shortcomings, make the following changes:
> > 
> >  a. Remove the code handling transitions into D0
> >     from pci_raw_set_power_state() and rename it as
> >     pci_set_low_power_state().
> > 
> >  b. Add the code handling transitions into D0 directly
> >     to pci_power_up() and to a new wrapper function
> >     pci_set_full_power_state() calling it internally that is
> >     only used in pci_set_power_state().
> > 
> >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> >     and make it work in the same way for transitions from any
> >     low-power states (transitions from D1 and D2 are handled
> >     slightly differently before the change).
> > 
> >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> >     register read confirming the power state change into
> >     pci_set_full_power_state() to avoid doing that in
> >     pci_pm_default_resume_early() unnecessarily.
> > 
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> 
> This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> states of PCI devices") causes my AMD-based system to fail to fully
> boot.

I dropped 5bffe4c611f5 and subsequent pci/pm patches temporarily while
this gets worked out.

Bjorn
Rafael J. Wysocki May 4, 2022, 5:15 p.m. UTC | #6
On Wed, May 4, 2022 at 7:03 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Anders]
>
> On Tue, May 03, 2022 at 10:59:43AM -0700, Nathan Chancellor wrote:
> > On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > There are some issues related to changing power states of PCI
> > > devices, mostly related to carrying out unnecessary actions in some
> > > places, and the code is generally hard to follow.
> > >
> > >  1. pci_power_up() has two callers, pci_set_power_state() and
> > >     pci_pm_default_resume_early().  The latter updates the current
> > >     power state of the device right after calling pci_power_up()
> > >     and it restores the entire config space of the device right
> > >     after that, so pci_power_up() itself need not read the
> > >     PCI_PM_CTRL register or restore the BARs after programming the
> > >     device into D0 in that case.
> > >
> > >  2. It is generally hard to get a clear view of the pci_power_up()
> > >     code flow, especially in some corner cases, due to all of the
> > >     involved PCI_PM_CTRL register reads and writes occurring in
> > >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> > >     some of which are redundant.
> > >
> > >  3. The transitions from low-power states to D0 and the other way
> > >     around are unnecessarily tangled in pci_raw_set_power_state()
> > >     which causes it to use a redundant local variable and makes it
> > >     rather hard to follow.
> > >
> > > To address the above shortcomings, make the following changes:
> > >
> > >  a. Remove the code handling transitions into D0
> > >     from pci_raw_set_power_state() and rename it as
> > >     pci_set_low_power_state().
> > >
> > >  b. Add the code handling transitions into D0 directly
> > >     to pci_power_up() and to a new wrapper function
> > >     pci_set_full_power_state() calling it internally that is
> > >     only used in pci_set_power_state().
> > >
> > >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> > >     and make it work in the same way for transitions from any
> > >     low-power states (transitions from D1 and D2 are handled
> > >     slightly differently before the change).
> > >
> > >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> > >     register read confirming the power state change into
> > >     pci_set_full_power_state() to avoid doing that in
> > >     pci_pm_default_resume_early() unnecessarily.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> >
> > This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> > states of PCI devices") causes my AMD-based system to fail to fully
> > boot.
>
> I dropped 5bffe4c611f5 and subsequent pci/pm patches temporarily while
> this gets worked out.

OK

It looks like I missed something subtle that triggers on a subset of
systems only.
Rafael J. Wysocki May 4, 2022, 6 p.m. UTC | #7
On Wednesday, May 4, 2022 6:36:00 PM CEST Nathan Chancellor wrote:
> On Wed, May 04, 2022 at 02:59:17PM +0200, Rafael J. Wysocki wrote:
> > On Tue, May 3, 2022 at 7:59 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > >
> > > Hi Rafael,
> > >
> > > On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > There are some issues related to changing power states of PCI
> > > > devices, mostly related to carrying out unnecessary actions in some
> > > > places, and the code is generally hard to follow.
> > > >
> > > >  1. pci_power_up() has two callers, pci_set_power_state() and
> > > >     pci_pm_default_resume_early().  The latter updates the current
> > > >     power state of the device right after calling pci_power_up()
> > > >     and it restores the entire config space of the device right
> > > >     after that, so pci_power_up() itself need not read the
> > > >     PCI_PM_CTRL register or restore the BARs after programming the
> > > >     device into D0 in that case.
> > > >
> > > >  2. It is generally hard to get a clear view of the pci_power_up()
> > > >     code flow, especially in some corner cases, due to all of the
> > > >     involved PCI_PM_CTRL register reads and writes occurring in
> > > >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> > > >     some of which are redundant.
> > > >
> > > >  3. The transitions from low-power states to D0 and the other way
> > > >     around are unnecessarily tangled in pci_raw_set_power_state()
> > > >     which causes it to use a redundant local variable and makes it
> > > >     rather hard to follow.
> > > >
> > > > To address the above shortcomings, make the following changes:
> > > >
> > > >  a. Remove the code handling transitions into D0
> > > >     from pci_raw_set_power_state() and rename it as
> > > >     pci_set_low_power_state().
> > > >
> > > >  b. Add the code handling transitions into D0 directly
> > > >     to pci_power_up() and to a new wrapper function
> > > >     pci_set_full_power_state() calling it internally that is
> > > >     only used in pci_set_power_state().
> > > >
> > > >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> > > >     and make it work in the same way for transitions from any
> > > >     low-power states (transitions from D1 and D2 are handled
> > > >     slightly differently before the change).
> > > >
> > > >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> > > >     register read confirming the power state change into
> > > >     pci_set_full_power_state() to avoid doing that in
> > > >     pci_pm_default_resume_early() unnecessarily.
> > > >
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > >
> > > This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> > > states of PCI devices") causes my AMD-based system to fail to fully
> > > boot. As far as I can tell, this might be NVMe related, which might make
> > > getting a full log difficult, as journalctl won't have anywhere to save
> > > it. I see:
> > >
> > > nvme nvme0: I/O 8 QID 0 timeout, completion polled
> > >
> > > then shortly afterwards:
> > >
> > > nvme nvme0: I/O 24 QID 0 timeout, completion polled
> > > nvme nvme0: missing or invalid SUBNQN field
> > >
> > > then I am dropped into an emergency shell.
> > 
> > Thanks for the report!
> > 
> > > This is a log from the previous commit, which may give some hints about
> > > the configuration of this particular system.
> > >
> > > https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log
> > >
> > > If there is any additional debugging information I can provide or
> > > patches I can try, please let me know!
> > 
> > Please see what happens if the "if (dev->current_state == PCI_D0)"
> > check and the following "return 0" statement in pci_power_up() are
> > commented out.
> 
> If I understand you correctly, this? Unfortunately, that does not help.

Thanks for testing.

Please check if the patch below makes any difference.

---
 drivers/pci/pci.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1245,7 +1245,7 @@ int pci_power_up(struct pci_dev *dev)
 
 	/* There's nothing more to do if current_state is D0 at this point. */
 	if (dev->current_state == PCI_D0)
-		return 0;
+		goto done;
 
 	/*
 	 * Program the device into PCI_D0 by forcing the entire word to 0 (this
@@ -1260,6 +1260,11 @@ int pci_power_up(struct pci_dev *dev)
 		udelay(PCI_PM_D2_DELAY);
 
 	dev->current_state = PCI_D0;
+
+done:
+	if (dev->bus->self)
+		pcie_aspm_pm_state_change(dev->bus->self);
+
 	return 1;
 
 fail:
@@ -1339,9 +1344,6 @@ static int pci_set_full_power_state(stru
 		pci_restore_bars(dev);
 	}
 
-	if (dev->bus->self)
-		pcie_aspm_pm_state_change(dev->bus->self);
-
 	return 0;
 }
Nathan Chancellor May 4, 2022, 7:35 p.m. UTC | #8
On Wed, May 04, 2022 at 08:00:33PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, May 4, 2022 6:36:00 PM CEST Nathan Chancellor wrote:
> > On Wed, May 04, 2022 at 02:59:17PM +0200, Rafael J. Wysocki wrote:
> > > On Tue, May 3, 2022 at 7:59 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > > >
> > > > Hi Rafael,
> > > >
> > > > On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > >
> > > > > There are some issues related to changing power states of PCI
> > > > > devices, mostly related to carrying out unnecessary actions in some
> > > > > places, and the code is generally hard to follow.
> > > > >
> > > > >  1. pci_power_up() has two callers, pci_set_power_state() and
> > > > >     pci_pm_default_resume_early().  The latter updates the current
> > > > >     power state of the device right after calling pci_power_up()
> > > > >     and it restores the entire config space of the device right
> > > > >     after that, so pci_power_up() itself need not read the
> > > > >     PCI_PM_CTRL register or restore the BARs after programming the
> > > > >     device into D0 in that case.
> > > > >
> > > > >  2. It is generally hard to get a clear view of the pci_power_up()
> > > > >     code flow, especially in some corner cases, due to all of the
> > > > >     involved PCI_PM_CTRL register reads and writes occurring in
> > > > >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> > > > >     some of which are redundant.
> > > > >
> > > > >  3. The transitions from low-power states to D0 and the other way
> > > > >     around are unnecessarily tangled in pci_raw_set_power_state()
> > > > >     which causes it to use a redundant local variable and makes it
> > > > >     rather hard to follow.
> > > > >
> > > > > To address the above shortcomings, make the following changes:
> > > > >
> > > > >  a. Remove the code handling transitions into D0
> > > > >     from pci_raw_set_power_state() and rename it as
> > > > >     pci_set_low_power_state().
> > > > >
> > > > >  b. Add the code handling transitions into D0 directly
> > > > >     to pci_power_up() and to a new wrapper function
> > > > >     pci_set_full_power_state() calling it internally that is
> > > > >     only used in pci_set_power_state().
> > > > >
> > > > >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> > > > >     and make it work in the same way for transitions from any
> > > > >     low-power states (transitions from D1 and D2 are handled
> > > > >     slightly differently before the change).
> > > > >
> > > > >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> > > > >     register read confirming the power state change into
> > > > >     pci_set_full_power_state() to avoid doing that in
> > > > >     pci_pm_default_resume_early() unnecessarily.
> > > > >
> > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > >
> > > > This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> > > > states of PCI devices") causes my AMD-based system to fail to fully
> > > > boot. As far as I can tell, this might be NVMe related, which might make
> > > > getting a full log difficult, as journalctl won't have anywhere to save
> > > > it. I see:
> > > >
> > > > nvme nvme0: I/O 8 QID 0 timeout, completion polled
> > > >
> > > > then shortly afterwards:
> > > >
> > > > nvme nvme0: I/O 24 QID 0 timeout, completion polled
> > > > nvme nvme0: missing or invalid SUBNQN field
> > > >
> > > > then I am dropped into an emergency shell.
> > > 
> > > Thanks for the report!
> > > 
> > > > This is a log from the previous commit, which may give some hints about
> > > > the configuration of this particular system.
> > > >
> > > > https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log
> > > >
> > > > If there is any additional debugging information I can provide or
> > > > patches I can try, please let me know!
> > > 
> > > Please see what happens if the "if (dev->current_state == PCI_D0)"
> > > check and the following "return 0" statement in pci_power_up() are
> > > commented out.
> > 
> > If I understand you correctly, this? Unfortunately, that does not help.
> 
> Thanks for testing.
> 
> Please check if the patch below makes any difference.

Unfortunately, there is still no difference. Even worse, I thought I
might be able to get some information from the emergency shell but I
don't think the HID driver is loaded yet so my keyboard does not work. I
am not sure of how to get any further information from the problematic
kernel; if anyone has any ideas, I am happy to test them! I am more than
happy to continue to test patches or provide information, I just don't
want to be a waste of time :)

Cheers,
Nathan

> ---
>  drivers/pci/pci.c |   10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/pci/pci.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci.c
> +++ linux-pm/drivers/pci/pci.c
> @@ -1245,7 +1245,7 @@ int pci_power_up(struct pci_dev *dev)
>  
>  	/* There's nothing more to do if current_state is D0 at this point. */
>  	if (dev->current_state == PCI_D0)
> -		return 0;
> +		goto done;
>  
>  	/*
>  	 * Program the device into PCI_D0 by forcing the entire word to 0 (this
> @@ -1260,6 +1260,11 @@ int pci_power_up(struct pci_dev *dev)
>  		udelay(PCI_PM_D2_DELAY);
>  
>  	dev->current_state = PCI_D0;
> +
> +done:
> +	if (dev->bus->self)
> +		pcie_aspm_pm_state_change(dev->bus->self);
> +
>  	return 1;
>  
>  fail:
> @@ -1339,9 +1344,6 @@ static int pci_set_full_power_state(stru
>  		pci_restore_bars(dev);
>  	}
>  
> -	if (dev->bus->self)
> -		pcie_aspm_pm_state_change(dev->bus->self);
> -
>  	return 0;
>  }
>  
> 
> 
>
Rafael J. Wysocki May 5, 2022, 11:58 a.m. UTC | #9
On Wed, May 4, 2022 at 9:35 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Wed, May 04, 2022 at 08:00:33PM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, May 4, 2022 6:36:00 PM CEST Nathan Chancellor wrote:
> > > On Wed, May 04, 2022 at 02:59:17PM +0200, Rafael J. Wysocki wrote:
> > > > On Tue, May 3, 2022 at 7:59 PM Nathan Chancellor <nathan@kernel.org> wrote:
> > > > >
> > > > > Hi Rafael,
> > > > >
> > > > > On Thu, Apr 14, 2022 at 03:11:21PM +0200, Rafael J. Wysocki wrote:
> > > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > >
> > > > > > There are some issues related to changing power states of PCI
> > > > > > devices, mostly related to carrying out unnecessary actions in some
> > > > > > places, and the code is generally hard to follow.
> > > > > >
> > > > > >  1. pci_power_up() has two callers, pci_set_power_state() and
> > > > > >     pci_pm_default_resume_early().  The latter updates the current
> > > > > >     power state of the device right after calling pci_power_up()
> > > > > >     and it restores the entire config space of the device right
> > > > > >     after that, so pci_power_up() itself need not read the
> > > > > >     PCI_PM_CTRL register or restore the BARs after programming the
> > > > > >     device into D0 in that case.
> > > > > >
> > > > > >  2. It is generally hard to get a clear view of the pci_power_up()
> > > > > >     code flow, especially in some corner cases, due to all of the
> > > > > >     involved PCI_PM_CTRL register reads and writes occurring in
> > > > > >     pci_platform_power_transition() and in pci_raw_set_power_state(),
> > > > > >     some of which are redundant.
> > > > > >
> > > > > >  3. The transitions from low-power states to D0 and the other way
> > > > > >     around are unnecessarily tangled in pci_raw_set_power_state()
> > > > > >     which causes it to use a redundant local variable and makes it
> > > > > >     rather hard to follow.
> > > > > >
> > > > > > To address the above shortcomings, make the following changes:
> > > > > >
> > > > > >  a. Remove the code handling transitions into D0
> > > > > >     from pci_raw_set_power_state() and rename it as
> > > > > >     pci_set_low_power_state().
> > > > > >
> > > > > >  b. Add the code handling transitions into D0 directly
> > > > > >     to pci_power_up() and to a new wrapper function
> > > > > >     pci_set_full_power_state() calling it internally that is
> > > > > >     only used in pci_set_power_state().
> > > > > >
> > > > > >  c. Make pci_power_up() avoid redundant PCI_PM_CTRL register reads
> > > > > >     and make it work in the same way for transitions from any
> > > > > >     low-power states (transitions from D1 and D2 are handled
> > > > > >     slightly differently before the change).
> > > > > >
> > > > > >  d. Put the restoration of the BARs and the PCI_PM_CTRL
> > > > > >     register read confirming the power state change into
> > > > > >     pci_set_full_power_state() to avoid doing that in
> > > > > >     pci_pm_default_resume_early() unnecessarily.
> > > > > >
> > > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > > >
> > > > > This change as commit 5bffe4c611f5 ("PCI/PM: Rework changing power
> > > > > states of PCI devices") causes my AMD-based system to fail to fully
> > > > > boot. As far as I can tell, this might be NVMe related, which might make
> > > > > getting a full log difficult, as journalctl won't have anywhere to save
> > > > > it. I see:
> > > > >
> > > > > nvme nvme0: I/O 8 QID 0 timeout, completion polled
> > > > >
> > > > > then shortly afterwards:
> > > > >
> > > > > nvme nvme0: I/O 24 QID 0 timeout, completion polled
> > > > > nvme nvme0: missing or invalid SUBNQN field
> > > > >
> > > > > then I am dropped into an emergency shell.
> > > >
> > > > Thanks for the report!
> > > >
> > > > > This is a log from the previous commit, which may give some hints about
> > > > > the configuration of this particular system.
> > > > >
> > > > > https://gist.github.com/nathanchance/8a56f0939410cb187896e904c72e41e7/raw/b47b2620bdd32d43c7a3b209fcfd9e3d4668f058/good-boot.log
> > > > >
> > > > > If there is any additional debugging information I can provide or
> > > > > patches I can try, please let me know!
> > > >
> > > > Please see what happens if the "if (dev->current_state == PCI_D0)"
> > > > check and the following "return 0" statement in pci_power_up() are
> > > > commented out.
> > >
> > > If I understand you correctly, this? Unfortunately, that does not help.
> >
> > Thanks for testing.
> >
> > Please check if the patch below makes any difference.
>
> Unfortunately, there is still no difference. Even worse, I thought I
> might be able to get some information from the emergency shell but I
> don't think the HID driver is loaded yet so my keyboard does not work. I
> am not sure of how to get any further information from the problematic
> kernel; if anyone has any ideas, I am happy to test them! I am more than
> happy to continue to test patches or provide information, I just don't
> want to be a waste of time :)

It's not a waste of time if you run tests I ask for.

Anyway, I'm going to change the approach, because we're looking for a
subtle change in behavior that breaks your system and there are quite
a few of these in the problematic patch.

I'll post a new series of patches to replace the commits dropped by
Bjorn later today.

Thanks!
diff mbox series

Patch

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -1068,10 +1068,9 @@  static inline bool platform_pci_bridge_d
 }
 
 /**
- * pci_raw_set_power_state - Use PCI PM registers to set the power state of
- *			     given PCI device
+ * pci_set_low_power_state - Program the given device into a low-power state
  * @dev: PCI device to handle.
- * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
+ * @state: PCI power state (D1, D2, D3hot) to put the device into.
  *
  * RETURN VALUE:
  * -EINVAL if the requested state is invalid.
@@ -1080,10 +1079,9 @@  static inline bool platform_pci_bridge_d
  * 0 if device already is in the requested state.
  * 0 if device's power state has been successfully changed.
  */
-static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
+static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
 {
 	u16 pmcsr;
-	bool need_restore = false;
 
 	/* Check if we're already there */
 	if (dev->current_state == state)
@@ -1092,7 +1090,7 @@  static int pci_raw_set_power_state(struc
 	if (!dev->pm_cap)
 		return -EIO;
 
-	if (state < PCI_D0 || state > PCI_D3hot)
+	if (state < PCI_D1 || state > PCI_D3hot)
 		return -EINVAL;
 
 	/*
@@ -1101,8 +1099,7 @@  static int pci_raw_set_power_state(struc
 	 * we can go from D1 to D3, but we can't go directly from D3 to D1;
 	 * we'd have to go from D3 to D0, then to D1.
 	 */
-	if (state != PCI_D0 && dev->current_state <= PCI_D3cold
-	    && dev->current_state > state) {
+	if (dev->current_state <= PCI_D3cold && dev->current_state > state) {
 		pci_err(dev, "invalid power transition (from %s to %s)\n",
 			pci_power_name(dev->current_state),
 			pci_power_name(state));
@@ -1122,29 +1119,8 @@  static int pci_raw_set_power_state(struc
 		return -EIO;
 	}
 
-	/*
-	 * If we're (effectively) in D3, force entire word to 0.
-	 * This doesn't affect PME_Status, disables PME_En, and
-	 * sets PowerState to 0.
-	 */
-	switch (dev->current_state) {
-	case PCI_D0:
-	case PCI_D1:
-	case PCI_D2:
-		pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
-		pmcsr |= state;
-		break;
-	case PCI_D3hot:
-	case PCI_D3cold:
-	case PCI_UNKNOWN: /* Boot-up */
-		if ((pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3hot
-		 && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
-			need_restore = true;
-		fallthrough;	/* force to D0 */
-	default:
-		pmcsr = 0;
-		break;
-	}
+	pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
+	pmcsr |= state;
 
 	/* Enter specified state */
 	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
@@ -1153,9 +1129,9 @@  static int pci_raw_set_power_state(struc
 	 * Mandatory power management transition delays; see PCI PM 1.1
 	 * 5.6.1 table 18
 	 */
-	if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
+	if (state == PCI_D3hot)
 		pci_dev_d3_sleep(dev);
-	else if (state == PCI_D2 || dev->current_state == PCI_D2)
+	else if (state == PCI_D2)
 		udelay(PCI_PM_D2_DELAY);
 
 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
@@ -1165,22 +1141,6 @@  static int pci_raw_set_power_state(struc
 			 pci_power_name(dev->current_state),
 			 pci_power_name(state));
 
-	/*
-	 * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
-	 * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
-	 * from D3hot to D0 _may_ perform an internal reset, thereby
-	 * going to "D0 Uninitialized" rather than "D0 Initialized".
-	 * For example, at least some versions of the 3c905B and the
-	 * 3c556B exhibit this behaviour.
-	 *
-	 * At least some laptop BIOSen (e.g. the Thinkpad T21) leave
-	 * devices in a D3hot state at boot.  Consequently, we need to
-	 * restore at least the BARs so that the device will be
-	 * accessible to its driver.
-	 */
-	if (need_restore)
-		pci_restore_bars(dev);
-
 	if (dev->bus->self)
 		pcie_aspm_pm_state_change(dev->bus->self);
 
@@ -1312,8 +1272,63 @@  static int pci_dev_wait(struct pci_dev *
  */
 int pci_power_up(struct pci_dev *dev)
 {
-	pci_platform_power_transition(dev, PCI_D0);
-	return pci_raw_set_power_state(dev, PCI_D0);
+	int ret;
+
+	ret = pci_platform_power_transition(dev, PCI_D0);
+	if (ret) {
+		u16 pmcsr;
+
+		/*
+		 * If native PM is not supported, pass the error returned by
+		 * pci_platform_power_transition() to the caller.
+		 */
+		if (!dev->pm_cap)
+			return ret;
+
+		/*
+		 * Since pci_platform_power_transition() has returned an error,
+		 * the PCI_PM_CTRL register has not been read by it and the
+		 * current power state of the device is unknown.  Read the
+		 * PCI_PM_CTRL register now and bail out if that fails.
+		 */
+		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+		if (PCI_POSSIBLE_ERROR(pmcsr)) {
+			dev->current_state = PCI_D3cold;
+			goto fail;
+		}
+		dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
+	} else if (dev->current_state == PCI_D3cold) {
+		/*
+		 * Since current_state is PCI_D3cold here, the power state seen
+		 * by the platform is still D3cold or the PCI_PM_CTRL register
+		 * read in pci_update_current_state() has failed, so assume the
+		 * device to be inaccessible.
+		 */
+		goto fail;
+	}
+
+	/* There's nothing more to do if current_state is D0 at this point. */
+	if (dev->current_state == PCI_D0)
+		return 0;
+
+	/*
+	 * Program the device into PCI_D0 by forcing the entire word to 0 (this
+	 * doesn't affect PME_Status, disables PME_En, and sets PowerState to 0)
+	 * and wait for the prescribed amount of time.  Assume success.
+	 */
+	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0);
+
+	if (dev->current_state == PCI_D3hot)
+		pci_dev_d3_sleep(dev);
+	else if (dev->current_state == PCI_D2)
+		udelay(PCI_PM_D2_DELAY);
+
+	dev->current_state = PCI_D0;
+	return 0;
+
+fail:
+	pci_err(dev, "Unable to change power state to D0, device inaccessible\n");
+	return -ENODEV;
 }
 
 /**
@@ -1341,6 +1356,60 @@  void pci_bus_set_current_state(struct pc
 }
 
 /**
+ * pci_set_full_power_state - Put a PCI device into D0 and update its state
+ * @dev: PCI device to power up
+ *
+ * Call pci_power_up() to put @dev into D0, read from its PCI_PM_CTRL register
+ * to confirm the state change, restore its BARs if they might be lost and
+ * reconfigure ASPM in acordance with the new power state.
+ *
+ * If pci_restore_state() is going to be called right after a power state change
+ * to D0, it is more efficient to use pci_power_up() directly instead of this
+ * function.
+ */
+static int pci_set_full_power_state(struct pci_dev *dev)
+{
+	pci_power_t old_state = dev->current_state;
+	u16 pmcsr;
+	int ret;
+
+	ret = pci_power_up(dev);
+	if (ret)
+		return ret;
+
+	if (!dev->pm_cap)
+		return 0;
+
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+
+	dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
+	if (dev->current_state != PCI_D0) {
+		pci_info_ratelimited(dev, "Refused to change power state from %s to D0\n",
+				     pci_power_name(dev->current_state));
+	} else if (old_state >= PCI_D3hot && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET)) {
+		/*
+		 * According to section 5.4.1 of the "PCI BUS POWER MANAGEMENT
+		 * INTERFACE SPECIFICATION, REV. 1.2", a device transitioning
+		 * from D3hot to D0 _may_ perform an internal reset, thereby
+		 * going to "D0 Uninitialized" rather than "D0 Initialized". For
+		 * example, at least some versions of the 3c905B and the 3c556B
+		 * exhibit this behaviour.
+		 *
+		 * At least some laptop BIOSen (e.g. the Thinkpad T21) leave
+		 * devices in a D3hot state at boot. Consequently, we need to
+		 * restore at least the BARs so that the device will be
+		 * accessible to its driver.
+		 */
+		pci_restore_bars(dev);
+	}
+
+	if (dev->bus->self)
+		pcie_aspm_pm_state_change(dev->bus->self);
+
+	return 0;
+}
+
+/**
  * pci_set_power_state - Set the power state of a PCI device
  * @dev: PCI device to handle.
  * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
@@ -1381,7 +1450,7 @@  int pci_set_power_state(struct pci_dev *
 		return 0;
 
 	if (state == PCI_D0)
-		return pci_power_up(dev);
+		return pci_set_full_power_state(dev);
 
 	/*
 	 * This device is quirked not to be put into D3, so don't put it in
@@ -1394,7 +1463,7 @@  int pci_set_power_state(struct pci_dev *
 	 * To put device in D3cold, we put device into D3hot in native
 	 * way, then put device into D3cold with platform ops
 	 */
-	error = pci_raw_set_power_state(dev, state > PCI_D3hot ?
+	error = pci_set_low_power_state(dev, state > PCI_D3hot ?
 					PCI_D3hot : state);
 
 	if (pci_platform_power_transition(dev, state))