Message ID | 9f0713b23baf7c6139a4b476cd9637d96e18cc95.1463134232.git.lukas@wunner.de (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
On Friday, May 13, 2016 01:15:31 PM Lukas Wunner wrote:
> Since commit aae4518b3124 ("PM / sleep: Mechanism to avoid resuming
> runtime-suspended devices unnecessarily"), we no longer wake up devices
> which are already runtime suspended upon entering system sleep
> ("direct-complete").
>
> However commit 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might
> have been reset by firmware") changed this to mandatorily runtime resume
> such devices after the system is woken. The motivation was to ensure
> that devices do not remain in a reset-power-on state after system
> resume, potentially preventing deep SoC-wide low-power states from being
> entered on idle.
>
> This is counter-productive for devices of which we know that the
> mandatory runtime resume is unnecessary. Thunderbolt on the Mac is a
> case in point: Runtime resume not just powers up the controller, but
> multiple adjacent chips, including a 15V boost converter, multiplexers
> and an eeprom. Gratuitously powering this up after every system sleep
> burns a not insignificant amount of energy and needlessly strains the
> hardware.
>
> Perhaps it would have been better to carry out the mandatory runtime
> resume only for those devices that actually need it, but at least we
> should allow an opt-out.
>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

I don't like this patch and especially adding a new dev_pm_ops flag to
work around something that you're seeing as an issue in the generic ops.

It is sort of like saying "the generic ops don't work for me, so modify
them as well as struct dev_pm_ops", but maybe it's better to change the
PCI bus type to do something different from calling the generic function?

Or you can add a ->complete callback to your driver that will clear
power.direct_complete for the device in question.

> ---
>  drivers/base/power/generic_ops.c | 3 ++-
>  include/linux/pm.h               | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/power/generic_ops.c b/drivers/base/power/generic_ops.c
> index 07c3c4a..6e88f55 100644
> --- a/drivers/base/power/generic_ops.c
> +++ b/drivers/base/power/generic_ops.c
> @@ -316,7 +316,8 @@ void pm_complete_with_resume_check(struct device *dev)
>  	 * the sleep state it is going out of and it has never been resumed till
>  	 * now, resume it in case the firmware powered it up.
>  	 */
> -	if (dev->power.direct_complete && pm_resume_via_firmware())
> +	if (dev->power.direct_complete && pm_resume_via_firmware() &&
> +	    !dev->power.direct_complete_noresume)
>  		pm_request_resume(dev);
>  }
>  EXPORT_SYMBOL_GPL(pm_complete_with_resume_check);
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index 6a5d654..023de94 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -596,6 +596,7 @@ struct dev_pm_info {
>  	unsigned int		use_autosuspend:1;
>  	unsigned int		timer_autosuspends:1;
>  	unsigned int		memalloc_noio:1;
> +	unsigned int		direct_complete_noresume:1;
>  	enum rpm_request	request;
>  	enum rpm_status		runtime_status;
>  	int			runtime_error;
>
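The two options in the message above (the opt-out flag from the patch, and a driver ->complete callback that clears power.direct_complete) can be compared in a small userspace model. This is only a sketch and not kernel code: every model_* name is hypothetical, the firmware check is reduced to a boolean argument, and it assumes the driver's ->complete hook runs before the resume check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the few power-management bits discussed here. */
struct model_pm_info {
	bool direct_complete;          /* owned by the PM core in the kernel */
	bool direct_complete_noresume; /* opt-out flag proposed by the patch */
	bool resume_requested;         /* records a modelled pm_request_resume() */
};

struct model_device {
	struct model_pm_info power;
	/* Optional driver hook; assumed to run before the resume check. */
	void (*complete)(struct model_device *dev);
};

/* Models the patched condition in pm_complete_with_resume_check(). */
static void model_complete_with_resume_check(struct model_device *dev,
					     bool resumed_via_firmware)
{
	assert(dev != NULL);
	if (dev->power.direct_complete && resumed_via_firmware &&
	    !dev->power.direct_complete_noresume)
		dev->power.resume_requested = true;
}

/* Models the suggested alternative: the driver clears direct_complete. */
static void model_driver_complete(struct model_device *dev)
{
	assert(dev != NULL);
	dev->power.direct_complete = false;
}

/* Runs the complete phase for one device; returns true if it gets resumed. */
static bool model_dpm_complete_one(struct model_device *dev,
				   bool resumed_via_firmware)
{
	assert(dev != NULL);
	dev->power.resume_requested = false;
	if (dev->complete)
		dev->complete(dev);
	model_complete_with_resume_check(dev, resumed_via_firmware);
	return dev->power.resume_requested;
}
```

In this model both approaches suppress the resume request for a device that went through direct-complete. The difference is where the decision lives: in a new flag consulted by the generic helper, or in the driver's own callback.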
On Mon, Jul 18, 2016 at 03:18:25PM +0200, Rafael J. Wysocki wrote:
> On Friday, May 13, 2016 01:15:31 PM Lukas Wunner wrote:
> > Since commit aae4518b3124 ("PM / sleep: Mechanism to avoid resuming
> > runtime-suspended devices unnecessarily"), we no longer wake up devices
> > which are already runtime suspended upon entering system sleep
> > ("direct-complete").
> >
> > However commit 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might
> > have been reset by firmware") changed this to mandatorily runtime resume
> > such devices after the system is woken. The motivation was to ensure
> > that devices do not remain in a reset-power-on state after system
> > resume, potentially preventing deep SoC-wide low-power states from being
> > entered on idle.
> >
> > This is counter-productive for devices of which we know that the
> > mandatory runtime resume is unnecessary. Thunderbolt on the Mac is a
> > case in point: Runtime resume not just powers up the controller, but
> > multiple adjacent chips, including a 15V boost converter, multiplexers
> > and an eeprom. Gratuitously powering this up after every system sleep
> > burns a not insignificant amount of energy and needlessly strains the
> > hardware.
> >
> > Perhaps it would have been better to carry out the mandatory runtime
> > resume only for those devices that actually need it, but at least we
> > should allow an opt-out.
> >
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Alan Stern <stern@rowland.harvard.edu>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
>
> I don't like this patch and especially adding a new dev_pm_ops flag to
> work around something that you're seeing as an issue in the generic ops.
>
> It is sort of like saying "the generic ops don't work for me, so modify
> them as well as struct dev_pm_ops", but maybe it's better to change the
> PCI bus type to do something different from calling the generic function?
>
> Or you can add a ->complete callback to your driver that will clear
> power.direct_complete for the device in question.

First of all, the direct_complete flag is marked "Owned by the PM core"
in include/linux/pm.h. So I would have expected that a driver is not
supposed to fudge it.

Second, yes it's possible to make it work by clearing direct_complete
in the ->complete callback, but there's a catch: The device tree is
traversed bottom-up in dpm_complete(). Recall that a Thunderbolt
controller consists of multiple devices and that power control is
governed by its top-most device (upstream bridge). But because we're
going bottom-up, clearing the direct_complete flag must be done by the
bottom-most device (NHI)! So I've got all the power management stuff
nicely separated in functions executed for the upstream bridge, but a
small portion needs to be executed for the NHI. That's ugly.

Normally the device hierarchy is traversed bottom-up during suspend
and top-down during resume. However ->prepare and ->complete do it
the other way round. In the case of ->prepare, this is even documented
in Documentation/power/devices.txt but the reason thereof is not.
Could you explain this please?

Third, I'm irritated by your question "maybe it's better to change the
PCI bus type to do something different from calling the generic
function". What should that be? Under which circumstances can we leave
a PCI device asleep after direct-complete? I'm generally irritated by
commit 58a1fbbb2ee8, it's a significant change to mandatorily wake all
devices, it wastes a not insignificant amount of energy, yet the
reasoning in the commit message sounds vague and handwavy ("There is a
concern [...] devices that are most likely to be affected"). Are there
clear indications for or against a device requiring a resume? E.g. the
commit message names SoCs, perhaps those can be recognized by having
child devices of certain types?

Thanks,

Lukas

> > ---
> >  drivers/base/power/generic_ops.c | 3 ++-
> >  include/linux/pm.h               | 1 +
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/power/generic_ops.c b/drivers/base/power/generic_ops.c
> > index 07c3c4a..6e88f55 100644
> > --- a/drivers/base/power/generic_ops.c
> > +++ b/drivers/base/power/generic_ops.c
> > @@ -316,7 +316,8 @@ void pm_complete_with_resume_check(struct device *dev)
> >  	 * the sleep state it is going out of and it has never been resumed till
> >  	 * now, resume it in case the firmware powered it up.
> >  	 */
> > -	if (dev->power.direct_complete && pm_resume_via_firmware())
> > +	if (dev->power.direct_complete && pm_resume_via_firmware() &&
> > +	    !dev->power.direct_complete_noresume)
> >  		pm_request_resume(dev);
> >  }
> >  EXPORT_SYMBOL_GPL(pm_complete_with_resume_check);
> > diff --git a/include/linux/pm.h b/include/linux/pm.h
> > index 6a5d654..023de94 100644
> > --- a/include/linux/pm.h
> > +++ b/include/linux/pm.h
> > @@ -596,6 +596,7 @@ struct dev_pm_info {
> >  	unsigned int		use_autosuspend:1;
> >  	unsigned int		timer_autosuspends:1;
> >  	unsigned int		memalloc_noio:1;
> > +	unsigned int		direct_complete_noresume:1;
> >  	enum rpm_request	request;
> >  	enum rpm_status		runtime_status;
> >  	int			runtime_error;
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 7 Aug 2016, Lukas Wunner wrote:

> Normally the device hierarchy is traversed bottom-up during suspend
> and top-down during resume. However ->prepare and ->complete do it
> the other way round. In the case of ->prepare, this is even documented
> in Documentation/power/devices.txt but the reason thereof is not.
> Could you explain this please?

The purpose of ->prepare is to tell drivers that a system sleep is
beginning and accordingly they should stop registering new children.
This is necessary for the PM core to be able to traverse the entire
device tree safely; we want to avoid races where a new child is added
below a device concurrently with that device being suspended. (Or if
you want to be more precise, races in which a new child is added below
a device while the PM core is acquiring the device's lock just prior to
invoking its ->suspend callback.)

Telling drivers to stop registering new children below a device has to
be done top-down, because if it were done bottom-up then it would be
subject to the same race described above. Doing it top-down avoids
problems; if a device registers new children while the PM core is
acquiring its lock prior to invoking ->prepare, it doesn't matter. The
new children will be handled later, right along with the existing ones.

Alan Stern
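The two traversal orders discussed in this exchange can be pictured with a small userspace sketch. It is an illustration only: the model_* types and functions are hypothetical, and the recursion stands in for the PM core's ordered device lists rather than reproducing them.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_CHILDREN 4
#define MODEL_MAX_LOG 16

/* Hypothetical stand-in for a device in the hierarchy. */
struct model_node {
	const char *name;
	struct model_node *children[MODEL_MAX_CHILDREN];
	size_t nr_children;
};

/* Records the order in which callbacks would be invoked. */
struct model_log {
	const char *names[MODEL_MAX_LOG];
	size_t len;
};

static void model_log_add(struct model_log *log, const char *name)
{
	assert(log != NULL && log->len < MODEL_MAX_LOG);
	log->names[log->len++] = name;
}

/* Top-down: a parent is visited before its children (as for ->prepare). */
static void model_walk_top_down(const struct model_node *node,
				struct model_log *log)
{
	assert(node != NULL);
	model_log_add(log, node->name);
	for (size_t i = 0; i < node->nr_children; i++)
		model_walk_top_down(node->children[i], log);
}

/* Bottom-up: children are visited before their parent (as for ->suspend). */
static void model_walk_bottom_up(const struct model_node *node,
				 struct model_log *log)
{
	assert(node != NULL);
	for (size_t i = 0; i < node->nr_children; i++)
		model_walk_bottom_up(node->children[i], log);
	model_log_add(log, node->name);
}
```

In the top-down walk a parent has already been told to stop registering children by the time its subtree is visited, which is the property described above for ->prepare. In the bottom-up walk the child's callback runs first, which matches the ordering complaint raised about ->complete.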
On Sun, Aug 07, 2016 at 11:33:17AM -0400, Alan Stern wrote:
> On Sun, 7 Aug 2016, Lukas Wunner wrote:
>
> > Normally the device hierarchy is traversed bottom-up during suspend
> > and top-down during resume. However ->prepare and ->complete do it
> > the other way round. In the case of ->prepare, this is even documented
> > in Documentation/power/devices.txt but the reason thereof is not.
> > Could you explain this please?
>
> The purpose of ->prepare is to tell drivers that a system sleep is
> beginning and accordingly they should stop registering new children.
> This is necessary for the PM core to be able to traverse the entire
> device tree safely; we want to avoid races where a new child is added
> below a device concurrently with that device being suspended. (Or if
> you want to be more precise, races in which a new child is added below
> a device while the PM core is acquiring the device's lock just prior to
> invoking its ->suspend callback.)
>
> Telling drivers to stop registering new children below a device has to
> be done top-down, because if it were done bottom-up then it would be
> subject to the same race described above. Doing it top-down avoids
> problems; if a device registers new children while the PM core is
> acquiring its lock prior to invoking ->prepare, it doesn't matter. The
> new children will be handled later, right along with the existing ones.

Thank you for explaining the motivation to carry out ->prepare top-down.
However my problem is really that ->complete is carried out bottom-up.
What's the motivation for that? Merely to mirror the behaviour of
->prepare? Would it be possible to change it to top-down? Note that
re-enablement of device addition is already allowed in ->resume,
which is called top-down.

By the way, neither the PCI nor USB bus-level ->prepare callbacks perform
any action that would stop device addition. Same for the pciehp driver
(we don't even have a ->prepare callback defined for PCIe port services).
So it *is* possible to hotplug PCI devices after ->prepare.

Best regards,

Lukas
On Fri, 12 Aug 2016, Lukas Wunner wrote:

> Thank you for explaining the motivation to carry out ->prepare top-down.
> However my problem is really that ->complete is carried out bottom-up.
> What's the motivation for that? Merely to mirror the behaviour of
> ->prepare? Would it be possible to change it to top-down? Note that
> re-enablement of device addition is already allowed in ->resume,
> which is called top-down.

I'm not aware of any particular reason why making ->complete run
top-down wouldn't work. Of course, if you did then the environment at
the start of the ->complete callback wouldn't be the same as it was at
the end of the ->prepare callback.

I think originally the idea was just to mirror ->prepare. Perhaps
Rafael will remember something that has escaped me.

> By the way, neither the PCI nor USB bus-level ->prepare callbacks perform
> any action that would stop device addition. Same for the pciehp driver
> (we don't even have a ->prepare callback defined for PCIe port services).
> So it *is* possible to hotplug PCI devices after ->prepare.

I don't know about PCI (although what you describe sounds like a bug).

USB relies on a freezable workqueue for adding child devices, so it
stops adding children even before the prepare phase begins.

Alan Stern
On Friday, August 12, 2016 01:30:04 PM Alan Stern wrote:
> On Fri, 12 Aug 2016, Lukas Wunner wrote:
>
> > Thank you for explaining the motivation to carry out ->prepare top-down.
> > However my problem is really that ->complete is carried out bottom-up.
> > What's the motivation for that? Merely to mirror the behaviour of
> > ->prepare? Would it be possible to change it to top-down? Note that
> > re-enablement of device addition is already allowed in ->resume,
> > which is called top-down.
>
> I'm not aware of any particular reason why making ->complete run
> top-down wouldn't work. Of course, if you did then the environment at
> the start of the ->complete callback wouldn't be the same as it was at
> the end of the ->prepare callback.
>
> I think originally the idea was just to mirror ->prepare. Perhaps
> Rafael will remember something that has escaped me.

Nothing specific off the top of my head.

> > By the way, neither the PCI nor USB bus-level ->prepare callbacks perform
> > any action that would stop device addition. Same for the pciehp driver
> > (we don't even have a ->prepare callback defined for PCIe port services).
> > So it *is* possible to hotplug PCI devices after ->prepare.

Not via ACPI, though. The ACPI core blocks all hotplug events at the
beginning of the suspend sequence and releases them at the end of device
resume.

> I don't know about PCI (although what you describe sounds like a bug).
>
> USB relies on a freezable workqueue for adding child devices, so it
> stops adding children even before the prepare phase begins.

Right.

Thanks,
Rafael
diff --git a/drivers/base/power/generic_ops.c b/drivers/base/power/generic_ops.c
index 07c3c4a..6e88f55 100644
--- a/drivers/base/power/generic_ops.c
+++ b/drivers/base/power/generic_ops.c
@@ -316,7 +316,8 @@ void pm_complete_with_resume_check(struct device *dev)
 	 * the sleep state it is going out of and it has never been resumed till
 	 * now, resume it in case the firmware powered it up.
 	 */
-	if (dev->power.direct_complete && pm_resume_via_firmware())
+	if (dev->power.direct_complete && pm_resume_via_firmware() &&
+	    !dev->power.direct_complete_noresume)
 		pm_request_resume(dev);
 }
 EXPORT_SYMBOL_GPL(pm_complete_with_resume_check);
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 6a5d654..023de94 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -596,6 +596,7 @@ struct dev_pm_info {
 	unsigned int		use_autosuspend:1;
 	unsigned int		timer_autosuspends:1;
 	unsigned int		memalloc_noio:1;
+	unsigned int		direct_complete_noresume:1;
 	enum rpm_request	request;
 	enum rpm_status		runtime_status;
 	int			runtime_error;
Since commit aae4518b3124 ("PM / sleep: Mechanism to avoid resuming
runtime-suspended devices unnecessarily"), we no longer wake up devices
which are already runtime suspended upon entering system sleep
("direct-complete").

However commit 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might
have been reset by firmware") changed this to mandatorily runtime resume
such devices after the system is woken. The motivation was to ensure
that devices do not remain in a reset-power-on state after system
resume, potentially preventing deep SoC-wide low-power states from being
entered on idle.

This is counter-productive for devices of which we know that the
mandatory runtime resume is unnecessary. Thunderbolt on the Mac is a
case in point: Runtime resume not just powers up the controller, but
multiple adjacent chips, including a 15V boost converter, multiplexers
and an eeprom. Gratuitously powering this up after every system sleep
burns a not insignificant amount of energy and needlessly strains the
hardware.

Perhaps it would have been better to carry out the mandatory runtime
resume only for those devices that actually need it, but at least we
should allow an opt-out.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/base/power/generic_ops.c | 3 ++-
 include/linux/pm.h               | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)