diff mbox series

[v2] PCI: PM: Add special case handling for PCIe device wakeup

Message ID 3149540.aeNJFYEL58@kreacher (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series [v2] PCI: PM: Add special case handling for PCIe device wakeup | expand

Commit Message

Rafael J. Wysocki July 28, 2021, 5:25 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Some PCIe devices only support PME (Power Management Event) from
D3cold.  One example is the ASMedia xHCI controller:

 11:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller (prog-if 30 [XHCI])
   ...
   Capabilities: [78] Power Management version 3
       Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
       Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

In those cases, if the device is expected to generate wakeup events
from its final power state, pci_target_state() returns D0, which
prevents the PCIe hierarchy above the device from entering any
low-power states too, but the device cannot signal PME from D0
either.  However, if the device were allowed to go into D3hot, its
parent PCIe port and its ancestors would also be able to go into D3
and if any of them goes into D3cold, the device would end up in
D3cold too (as per the PCI PM spec v1.2, Table 6-1), in which case
it would be able to signal PME.

This means that the system could be put into a lower-power
configuration while meeting the requirement to enable the device to
generate PME from the final state (which is not the case if the
device stays in D0 along with the entire hierarchy above it).

In order to avoid missing that opportunity, extend pci_pme_capable()
to return 'true' in the special case when the target state is D3hot
and the device can only signal PME from D3cold and update
pci_target_state() to return the current target state if
pci_pme_capable() returns 'true' for it.

This change can be regarded as a pci_target_state() fix, because that
function should ignore its 'wakeup' argument if signaling PME from
any power states shallower than the current candidate one (including
D0) is not supported.

Link: https://lore.kernel.org/linux-pm/20210617123653.58640-1-mika.westerberg@linux.intel.com
Fixes: 666ff6f83e1d ("PCI/PM: Avoid using device_may_wakeup() for runtime PM")
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reported-by: Utkarsh H Patel <utkarsh.h.patel@intel.com>
Reported-by: Koba Ko <koba.ko@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

Hi Mika,

IMO it is better to address the case in which the device cannot signal PME from
D0 (as well as from any power states shallower than D3cold), which appears to be
the one at hand, to start with and then, if need be, take care of the case in
which signaling PME from both D0 and D3cold is supported separately.

If the device cannot signal PME from D0, then there is no point returning D0
from pci_target_state() just because 'wakeup' is set, so this is a bug, and
then enabling PME in case the device ends up in D3cold can be regarded as "best
effort" (we cannot guarantee that this will always happen, but then it is the
only way to enable it to signal wakeup anyway).

The case when signaling PME from D0 and D3cold (if we need to worry about it at
all) is more controversial, because in that case leaving the device in D0 really
allows it to signal wakeup, while putting it into D3hot in hope that it will end
up in D3cold may not work.

Cheers!

-> v2:
   * Instead of checking the direct parent of the device in question, which
     doesn't work, check if the device can signal PME from any power states
     shallower than D3cold, including D0.  If not, return 'true' from
     pci_pme_capable() for D3hot as a way to indicate that the device should
     be able to signal wakeup from the final state conditional on getting
     one of its ancestors into D3cold.

---
 drivers/pci/pci.c |   34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

Comments

Mika Westerberg July 29, 2021, 1:23 p.m. UTC | #1
Hi Rafael,

On Wed, Jul 28, 2021 at 07:25:04PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Some PCIe devices only support PME (Power Management Event) from
> D3cold.  One example is the ASMedia xHCI controller:
> 
>  11:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller (prog-if 30 [XHCI])
>    ...
>    Capabilities: [78] Power Management version 3
>        Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
>        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> 
> In those cases, if the device is expected to generate wakeup events
> from its final power state, pci_target_state() returns D0, which
> prevents the PCIe hierarchy above the device from entering any
> low-power states too, but the device cannot signal PME from D0
> either.  However, if the device were allowed to go into D3hot, its
> parent PCIe port and its ancestors would also be able to go into D3
> and if any of them goes into D3cold, the device would end up in
> D3cold too (as per the PCI PM spec v1.2, Table 6-1), in which case
> it would be able to signal PME.
> 
> This means that the system could be put into a lower-power
> configuration while meeting the requirement to enable the device to
> generate PME from the final state (which is not the case if the
> device stays in D0 along with the entire hierarchy above it).
> 
> In order to avoid missing that opportunity, extend pci_pme_capable()
> to return 'true' in the special case when the target state is D3hot
> and the device can only signal PME from D3cold and update
> pci_target_state() to return the current target state if
> pci_pme_capable() returns 'true' for it.
> 
> This change can be regarded as a pci_target_state() fix, because that
> function should ignore its 'wakeup' argument if signaling PME from
> any power states shallower than the current candidate one (including
> D0) is not supported.
> 
> Link: https://lore.kernel.org/linux-pm/20210617123653.58640-1-mika.westerberg@linux.intel.com
> Fixes: 666ff6f83e1d ("PCI/PM: Avoid using device_may_wakeup() for runtime PM")
> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Reported-by: Utkarsh H Patel <utkarsh.h.patel@intel.com>
> Reported-by: Koba Ko <koba.ko@canonical.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Tried this now and it fixes the issue! Also checked with another device
that actually supports PME from other states than D3cold and it also
works (as expected).

Feel free to add my,

Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>

Thanks!
Rafael J. Wysocki July 29, 2021, 1:44 p.m. UTC | #2
Hi Mika,

On Thu, Jul 29, 2021 at 3:23 PM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> Hi Rafael,
>
> On Wed, Jul 28, 2021 at 07:25:04PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Some PCIe devices only support PME (Power Management Event) from
> > D3cold.  One example is the ASMedia xHCI controller:
> >
> >  11:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller (prog-if 30 [XHCI])
> >    ...
> >    Capabilities: [78] Power Management version 3
> >        Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
> >        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >
> > In those cases, if the device is expected to generate wakeup events
> > from its final power state, pci_target_state() returns D0, which
> > prevents the PCIe hierarchy above the device from entering any
> > low-power states too, but the device cannot signal PME from D0
> > either.  However, if the device were allowed to go into D3hot, its
> > parent PCIe port and its ancestors would also be able to go into D3
> > and if any of them goes into D3cold, the device would end up in
> > D3cold too (as per the PCI PM spec v1.2, Table 6-1), in which case
> > it would be able to signal PME.
> >
> > This means that the system could be put into a lower-power
> > configuration while meeting the requirement to enable the device to
> > generate PME from the final state (which is not the case if the
> > device stays in D0 along with the entire hierarchy above it).
> >
> > In order to avoid missing that opportunity, extend pci_pme_capable()
> > to return 'true' in the special case when the target state is D3hot
> > and the device can only signal PME from D3cold and update
> > pci_target_state() to return the current target state if
> > pci_pme_capable() returns 'true' for it.
> >
> > This change can be regarded as a pci_target_state() fix, because that
> > function should ignore its 'wakeup' argument if signaling PME from
> > any power states shallower than the current candidate one (including
> > D0) is not supported.
> >
> > Link: https://lore.kernel.org/linux-pm/20210617123653.58640-1-mika.westerberg@linux.intel.com
> > Fixes: 666ff6f83e1d ("PCI/PM: Avoid using device_may_wakeup() for runtime PM")
> > Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > Reported-by: Utkarsh H Patel <utkarsh.h.patel@intel.com>
> > Reported-by: Koba Ko <koba.ko@canonical.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Tried this now and it fixes the issue! Also checked with another device
> that actually supports PME from other states than D3cold and it also
> works (as expected).
>
> Feel free to add my,
>
> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>

Thank you!

However, after giving some more consideration to this, I've realized
that it may still not be the cleanest way to address the issue at
hand.

Namely, there are two things we want to happen: (1) pci_target_state()
should return the original candidate target state if 'wakeup' is true,
but the device cannot signal PME from any states shallower than the
original one (including D0) and (2) __pci_enable_wake() should call
pci_pme_active() for devices that support signaling PME from D3cold
regardless of whether or not they support signaling PME from the power
state passed to it (because a device may end up in D3cold as a result
of a transition of the hierarchy above it and PME should be signaled
then).  They are only related to each other because the PME signaling
capability is taken into account in both places, but in fact (1) has
nothing to do with D3cold (and making it effectively depend on whether
or not the device can signal PME from D3cold is somewhat artificial)
and (2) has nothing to do with what the target power state of the
device is.  And so each of these two things should be covered by a
separate patch.

Let me post an alternative patch series in accordance with the above.
diff mbox series

Patch

Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -2301,7 +2300,21 @@  bool pci_pme_capable(struct pci_dev *dev
 	if (!dev->pm_cap)
 		return false;
 
-	return !!(dev->pme_support & (1 << state));
+	if (dev->pme_support & (1 << state))
+		return true;
+
+	/*
+	 * Special case: The target state is D3hot and the device (which is a
+	 * PCIe one) only supports signaling PME from D3cold.  In that case, if
+	 * the device is allowed to go into D3hot, its ancestor PCIe port may go
+	 * into D3cold which will cause the device to end up in D3cold too
+	 * (along the lines of the PCI PM spec v1.2, Table 6-1), so it will be
+	 * able to signal PME from the final state.  It will not be able to
+	 * signal PME if left in D0, however.
+	 */
+	return state == PCI_D3hot && pci_is_pcie(dev) &&
+		(dev->pme_support & (1 << PCI_D3cold)) &&
+		!(dev->pme_support ^ (1 << PCI_D3cold));
 }
 EXPORT_SYMBOL(pci_pme_capable);
 
@@ -2595,17 +2608,12 @@  static pci_power_t pci_target_state(stru
 	if (dev->current_state == PCI_D3cold)
 		target_state = PCI_D3cold;
 
-	if (wakeup) {
-		/*
-		 * Find the deepest state from which the device can generate
-		 * PME#.
-		 */
-		if (dev->pme_support) {
-			while (target_state
-			      && !(dev->pme_support & (1 << target_state)))
-				target_state--;
-		}
-	}
+	if (!wakeup || !dev->pme_support || pci_pme_capable(dev, target_state))
+		return target_state;
+
+	/* Find the deepest state from which the device can generate PME#. */
+	while (target_state && !(dev->pme_support & (1 << target_state)))
+		target_state--;
 
 	return target_state;
 }