diff mbox

[RFC] PM / Core: suspend_again cb for syscore_ops

Message ID 1303288106-2965-1-git-send-email-myungjoo.ham@samsung.com (mailing list archive)
State Rejected, archived
Headers show

Commit Message

MyungJoo Ham April 20, 2011, 8:28 a.m. UTC
A system or a device may need to control suspend/wakeup events. It may
want to wakeup the system after a predefined amount of time or at a
predefined event decided while entering suspend for polling or delayed
work. Then, it may want to enter suspend again if its predefined wakeup
condition is the only wakeup reason and there is no outstanding events;
thus, it does not wakeup the userspace unnecessary and keeps suspended
as long as possible (saving the power).

Enabling a system to wakeup after a specified time can be easily
achieved by using RTC. However, to enter suspend again immediately
without invoking userland, we need additional features in the
suspend framework.

Such need comes from:

1. Monitoring a critical device status without interrupts that can
wakeup the system. (in-suspend polling)
 An example is ambient temperature monitoring that needs to shut down
the system or a specific device function if it is too hot or cold. The
temperature of a specific device may be needed to be monitored as well;
e.g., a charger monitors battery temperature in order to stop charging
if overheated.

2. Execute critical "delayed work" at suspend.
 A driver or a system/board may have a delayed work (or any similar
things) that it wants to execute at the requested time.
 For example, some chargers want to check the battery voltage some
time (e.g., 30 seconds) after the battery is fully charged and the
charger stops. Then, the charger restarts charging if the voltage has
dropped more than a threshold, which is smaller than "restart-charger"
voltage, which is a threshold to restart charging regardless of the
time passed.

This patch allows a system or a device to provide "suspend_again"
callback with syscore_ops. With suspend_again callbacks registered,
the suspend framework (kernel/power/suspend.c) tries to enter suspend
again if conditions are met.

The system enters the suspend again if and only if all of the following
conditions are met:
1. None of suspend_again ops returned "I want to stop suspend"
(suspend_again returns SUSPEND_AGAIN_STOP).
2. At least one of suspend_again ops returned "I want to suspend again"
(suspend_again returns SUSPEND_AGAIN_CONTINUE)

suspend_again ops may return "I do not care. This wakeup is not related
with me." (SUSPEND_AGAIN_NC, which is 0).

Use SUSPEND_AGAIN_STOP in order to override other devices'
SUSPEND_AGAIN_CONTINUE and to wakeup fully. For devices that poll
sensors during suspend may need this if any outstanding status is found.
For conventional suspend wakeup sources, SUSPEND_AGAIN_STOP may be used
to override SUSPEND_AGAIN devices.

Anyway, the following features may need to be added later:
1. An API to allow devices to express next desired wakeup-time. Then,
the framework will combine them and setup RTC alarm accordingly and
save/restore previously registered RTC alarms.
2. Create a method to declare a specific instance of delayed-work is to
be executed in suspend by waking up the system in the middle of
suspend. Then, let the framework handle those "critical" delayed-work
in suspend.
3. If a device says SUSPEND_AGAIN_CONTINUE and there is another wakeup
source pending (e.g., power button) without suspend_again ops, the
system will enter suspend again. In such a case, the system should not
suspend again. We may need to see if irqs that are enabled by
set_irq_wake() (and not related to suspend_ops devices)
are pending at syscore_suspend_again(). Maybe we need to add
something like "set_irq_wake_with_suspend_again" so that IRQs with
suspend_again ops implemented are ignored for the
"override-suspend-again-continue" checking.

For the initial release, I have set the point of "suspend-again" after
suspend_ops->end(). However, I'm not so sure about where to set the
suspend-again point. Because in-suspend polling, which may require
I/O with other devices, is supposed to be executed at suspend-again ops,
the suspend-again point is configured to be as late as possible in
suspend_devices_and_enter(). In order to reduce the number of devices
waked up, we may need to set the suspend-again point ealier.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 drivers/base/syscore.c      |   36 +++++++++++++++++++++++++++
 include/linux/syscore_ops.h |    7 +++++
 kernel/power/suspend.c      |   57 +++++++++++++++++++++++-------------------
 3 files changed, 74 insertions(+), 26 deletions(-)

Comments

Pavel Machek April 20, 2011, 10:36 a.m. UTC | #1
Hi!

> A system or a device may need to control suspend/wakeup events. It may
> want to wakeup the system after a predefined amount of time or at a
> predefined event decided while entering suspend for polling or delayed
> work. Then, it may want to enter suspend again if its predefined wakeup
> condition is the only wakeup reason and there is no outstanding events;
> thus, it does not wakeup the userspace unnecessary and keeps suspended
> as long as possible (saving the power).

Nice. Zaurus needs it, exactly due to this reason:

> 2. Execute critical "delayed work" at suspend.
>  A driver or a system/board may have a delayed work (or any similar
> things) that it wants to execute at the requested time.
>  For example, some chargers want to check the battery voltage some
> time (e.g., 30 seconds) after the battery is fully charged and the
> charger stops. Then, the charger restarts charging if the voltage has
> dropped more than a threshold, which is smaller than "restart-charger"
> voltage, which is a threshold to restart charging regardless of the
> time passed.

Yep, and zaurus also needs it to stop charging.

> The system enters the suspend again if and only if all of the following
> conditions are met:
> 1. None of suspend_again ops returned "I want to stop suspend"
> (suspend_again returns SUSPEND_AGAIN_STOP).
> 2. At least one of suspend_again ops returned "I want to suspend again"
> (suspend_again returns SUSPEND_AGAIN_CONTINUE)

Better naming  could be SUSPEND_AGAIN and VETO_RESUSPEND or
SUSPEND_AGAIN_VETO .

> Anyway, the following features may need to be added later:
> 1. An API to allow devices to express next desired wakeup-time. Then,
> the framework will combine them and setup RTC alarm accordingly and
> save/restore previously registered RTC alarms.

Yep.

									Pavel
Rafael Wysocki April 20, 2011, 8:28 p.m. UTC | #2
On Wednesday, April 20, 2011, MyungJoo Ham wrote:
> A system or a device may need to control suspend/wakeup events. It may
> want to wakeup the system after a predefined amount of time or at a
> predefined event decided while entering suspend for polling or delayed
> work. Then, it may want to enter suspend again if its predefined wakeup
> condition is the only wakeup reason and there is no outstanding events;
> thus, it does not wakeup the userspace unnecessary and keeps suspended
> as long as possible (saving the power).
> 
> Enabling a system to wakeup after a specified time can be easily
> achieved by using RTC. However, to enter suspend again immediately
> without invoking userland, we need additional features in the
> suspend framework.
> 
> Such need comes from:
> 
> 1. Monitoring a critical device status without interrupts that can
> wakeup the system. (in-suspend polling)
>  An example is ambient temperature monitoring that needs to shut down
> the system or a specific device function if it is too hot or cold. The
> temperature of a specific device may be needed to be monitored as well;
> e.g., a charger monitors battery temperature in order to stop charging
> if overheated.
> 
> 2. Execute critical "delayed work" at suspend.
>  A driver or a system/board may have a delayed work (or any similar
> things) that it wants to execute at the requested time.
>  For example, some chargers want to check the battery voltage some
> time (e.g., 30 seconds) after the battery is fully charged and the
> charger stops. Then, the charger restarts charging if the voltage has
> dropped more than a threshold, which is smaller than "restart-charger"
> voltage, which is a threshold to restart charging regardless of the
> time passed.
> 
> This patch allows a system or a device to provide "suspend_again"
> callback with syscore_ops. With suspend_again callbacks registered,
> the suspend framework (kernel/power/suspend.c) tries to enter suspend
> again if conditions are met.

syscore_ops are defined to be executed on one CPU and with interrupts off.
I don't think your new callback matches this definition.

> The system enters the suspend again if and only if all of the following
> conditions are met:
> 1. None of suspend_again ops returned "I want to stop suspend"
> (suspend_again returns SUSPEND_AGAIN_STOP).
> 2. At least one of suspend_again ops returned "I want to suspend again"
> (suspend_again returns SUSPEND_AGAIN_CONTINUE)
> 
> suspend_again ops may return "I do not care. This wakeup is not related
> with me." (SUSPEND_AGAIN_NC, which is 0).
> 
> Use SUSPEND_AGAIN_STOP in order to override other devices'
> SUSPEND_AGAIN_CONTINUE and to wakeup fully. For devices that poll
> sensors during suspend may need this if any outstanding status is found.
> For conventional suspend wakeup sources, SUSPEND_AGAIN_STOP may be used
> to override SUSPEND_AGAIN devices.
> 
> Anyway, the following features may need to be added later:
> 1. An API to allow devices to express next desired wakeup-time. Then,
> the framework will combine them and setup RTC alarm accordingly and
> save/restore previously registered RTC alarms.
> 2. Create a method to declare a specific instance of delayed-work is to
> be executed in suspend by waking up the system in the middle of
> suspend. Then, let the framework handle those "critical" delayed-work
> in suspend.
> 3. If a device says SUSPEND_AGAIN_CONTINUE and there is another wakeup
> source pending (e.g., power button) without suspend_again ops, the
> system will enter suspend again. In such a case, the system should not
> suspend again. We may need to see if irqs that are enabled by
> set_irq_wake() (and not related to suspend_ops devices)
> are pending at syscore_suspend_again(). Maybe we need to add
> something like "set_irq_wake_with_suspend_again" so that IRQs with
> suspend_again ops implemented are ignored for the
> "override-suspend-again-continue" checking.
> 
> For the initial release, I have set the point of "suspend-again" after
> suspend_ops->end(). However, I'm not so sure about where to set the
> suspend-again point. Because in-suspend polling, which may require
> I/O with other devices, is supposed to be executed at suspend-again ops,
> the suspend-again point is configured to be as late as possible in
> suspend_devices_and_enter(). In order to reduce the number of devices
> waked up, we may need to set the suspend-again point ealier.
> 
> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

The idea seems to be sane, but I don't like the implementation.

First off, why do you thing it's a good thing to put the callback into
struct syscore_ops?

> ---
>  drivers/base/syscore.c      |   36 +++++++++++++++++++++++++++
>  include/linux/syscore_ops.h |    7 +++++
>  kernel/power/suspend.c      |   57 +++++++++++++++++++++++-------------------
>  3 files changed, 74 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/base/syscore.c b/drivers/base/syscore.c
> index 90af294..1a7e08d 100644
> --- a/drivers/base/syscore.c
> +++ b/drivers/base/syscore.c
> @@ -95,6 +95,42 @@ void syscore_resume(void)
>  				"Interrupts enabled after %pF\n", ops->resume);
>  		}
>  }
> +
> +/**
> + * syscore_suspend_again - Execute all the registeres system core suspend_again
> + *			callbacks. If at least one returns
> + *			SUSPEND_AGAIN_CONTINUE and no one returns
> + *			SUSPEND_AGAIN_STOP, syscore_suspend_again let the system
> + *			enter suspend again.
> + */

That causes the ->suspend_again() callbacks to be quite complicated.  I'm not
sure this is necessary in general.

> +bool syscore_suspend_again(void)
> +{
> +	struct syscore_ops *ops;
> +	enum suspend_again_cond condition = SUSPEND_AGAIN_NC;
> +
> +	list_for_each_entry(ops, &syscore_ops_list, node)
> +		if (ops->suspend_again) {
> +			switch (ops->suspend_again()) {
> +			case SUSPEND_AGAIN_NC:
> +				break;
> +			case SUSPEND_AGAIN_CONTINUE:
> +				if (condition == SUSPEND_AGAIN_NC)
> +					condition = SUSPEND_AGAIN_CONTINUE;
> +				break;
> +			case SUSPEND_AGAIN_STOP:
> +				condition = SUSPEND_AGAIN_STOP;
> +				break;
> +			default:
> +				pr_warn("PM: incorrect return from %pF\n",
> +					ops->suspend_again);
> +			}
> +		}
> +
> +	if (condition == SUSPEND_AGAIN_CONTINUE)
> +		return true;
> +
> +	return false;
> +}
>  #endif /* CONFIG_PM_SLEEP */
>  
>  /**
> diff --git a/include/linux/syscore_ops.h b/include/linux/syscore_ops.h
> index 27b3b0b..bf9bc4e 100644
> --- a/include/linux/syscore_ops.h
> +++ b/include/linux/syscore_ops.h
> @@ -11,10 +11,16 @@
>  
>  #include <linux/list.h>
>  
> +enum suspend_again_cond {
> +	SUSPEND_AGAIN_NC = 0, /* Do Not Care */
> +	SUSPEND_AGAIN_CONTINUE, /* Start or keep the again */
> +	SUSPEND_AGAIN_STOP, /* Stop or do not start. Override CONTINUE */
> +};
>  struct syscore_ops {
>  	struct list_head node;
>  	int (*suspend)(void);
>  	void (*resume)(void);
> +	enum suspend_again_cond (*suspend_again)(void);
>  	void (*shutdown)(void);
>  };
>  
> @@ -23,6 +29,7 @@ extern void unregister_syscore_ops(struct syscore_ops *ops);
>  #ifdef CONFIG_PM_SLEEP
>  extern int syscore_suspend(void);
>  extern void syscore_resume(void);
> +extern bool syscore_suspend_again(void);
>  #endif
>  extern void syscore_shutdown(void);
>  
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 2814c32..aa6a3d1 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -202,43 +202,48 @@ static int suspend_enter(suspend_state_t state)
>  int suspend_devices_and_enter(suspend_state_t state)
>  {
>  	int error;
> +	bool recover = false;
>  
>  	if (!suspend_ops)
>  		return -ENOSYS;
>  
> -	trace_machine_suspend(state);
> -	if (suspend_ops->begin) {
> -		error = suspend_ops->begin(state);
> -		if (error)
> -			goto Close;
> -	}
> -	suspend_console();
> -	pm_restrict_gfp_mask();
> -	suspend_test_start();
> -	error = dpm_suspend_start(PMSG_SUSPEND);
> -	if (error) {
> -		printk(KERN_ERR "PM: Some devices failed to suspend\n");
> -		goto Recover_platform;
> -	}
> -	suspend_test_finish("suspend devices");
> -	if (suspend_test(TEST_DEVICES))
> -		goto Recover_platform;
> +	do {
> +		trace_machine_suspend(state);
> +		if (suspend_ops->begin) {
> +			error = suspend_ops->begin(state);
> +			if (error)
> +				goto Close;

All of those goto statements from the inside of the loop don't really look
good.  Isn't there any way to avoid them?

> +		}
> +		suspend_console();
> +		pm_restrict_gfp_mask();
> +		suspend_test_start();
> +		error = dpm_suspend_start(PMSG_SUSPEND);
> +		if (error) {
> +			printk(KERN_ERR "PM: Some devices failed to suspend\n");
> +			goto Recover_platform;
> +		}
> +		suspend_test_finish("suspend devices");
> +		if (suspend_test(TEST_DEVICES))
> +			goto Recover_platform;
>  
> -	suspend_enter(state);
> +		error = suspend_enter(state);
>  
>   Resume_devices:
> -	suspend_test_start();
> -	dpm_resume_end(PMSG_RESUME);
> -	suspend_test_finish("resume devices");
> -	pm_restore_gfp_mask();
> -	resume_console();
> +		suspend_test_start();
> +		dpm_resume_end(PMSG_RESUME);
> +		suspend_test_finish("resume devices");
> +		pm_restore_gfp_mask();
> +		resume_console();
>   Close:
> -	if (suspend_ops->end)
> -		suspend_ops->end();
> -	trace_machine_suspend(PWR_EVENT_EXIT);
> +		if (suspend_ops->end)
> +			suspend_ops->end();
> +		trace_machine_suspend(PWR_EVENT_EXIT);
> +	} while (syscore_suspend_again() && !error && !recover);

Why exactly do you think that the next automatic suspend should occur at
this particular point?

> +
>  	return error;
>  
>   Recover_platform:
> +	recover = true;
>  	if (suspend_ops->recover)
>  		suspend_ops->recover();
>  	goto Resume_devices;
> 

To summarize:
* I don't think struct syscore_ops is the right place for the new callback.
* The necessity to take care of the possibly many different return values
  with different meanings have a potential to introduce unnecessary
  complications into the subsystems implementing the new callback.
* It isn't particularly clear to me why the new suspend should occur at the
  point you want it to.

Moreover, what's wrong with thawing user space processes and _then_
automatically suspending again?  Why do you want to do that from the inside
of the kernel?

Rafael
MyungJoo Ham April 21, 2011, 7:03 a.m. UTC | #3
2011/4/21 Rafael J. Wysocki <rjw@sisk.pl>:
> On Wednesday, April 20, 2011, MyungJoo Ham wrote:
>> A system or a device may need to control suspend/wakeup events. It may
>> want to wakeup the system after a predefined amount of time or at a
>> predefined event decided while entering suspend for polling or delayed
>> work. Then, it may want to enter suspend again if its predefined wakeup
>> condition is the only wakeup reason and there is no outstanding events;
>> thus, it does not wakeup the userspace unnecessary and keeps suspended
>> as long as possible (saving the power).
>>
>> Enabling a system to wakeup after a specified time can be easily
>> achieved by using RTC. However, to enter suspend again immediately
>> without invoking userland, we need additional features in the
>> suspend framework.
>>
>> Such need comes from:
>>
>> 1. Monitoring a critical device status without interrupts that can
>> wakeup the system. (in-suspend polling)
>>  An example is ambient temperature monitoring that needs to shut down
>> the system or a specific device function if it is too hot or cold. The
>> temperature of a specific device may be needed to be monitored as well;
>> e.g., a charger monitors battery temperature in order to stop charging
>> if overheated.
>>
>> 2. Execute critical "delayed work" at suspend.
>>  A driver or a system/board may have a delayed work (or any similar
>> things) that it wants to execute at the requested time.
>>  For example, some chargers want to check the battery voltage some
>> time (e.g., 30 seconds) after the battery is fully charged and the
>> charger stops. Then, the charger restarts charging if the voltage has
>> dropped more than a threshold, which is smaller than "restart-charger"
>> voltage, which is a threshold to restart charging regardless of the
>> time passed.
>>
>> This patch allows a system or a device to provide "suspend_again"
>> callback with syscore_ops. With suspend_again callbacks registered,
>> the suspend framework (kernel/power/suspend.c) tries to enter suspend
>> again if conditions are met.
>
> syscore_ops are defined to be executed on one CPU and with interrupts off.
> I don't think your new callback matches this definition.

Yes, this "suspend-again" is not meant to be that close to suspended
state and is placed near "end()".
Do you think it would be better fitted if it is included as device
driver's pm_ops along with other "suspend", "suspend_noirq", and such?

>
>> The system enters the suspend again if and only if all of the following
>> conditions are met:
>> 1. None of suspend_again ops returned "I want to stop suspend"
>> (suspend_again returns SUSPEND_AGAIN_STOP).
>> 2. At least one of suspend_again ops returned "I want to suspend again"
>> (suspend_again returns SUSPEND_AGAIN_CONTINUE)
>>
>> suspend_again ops may return "I do not care. This wakeup is not related
>> with me." (SUSPEND_AGAIN_NC, which is 0).
>>
>> Use SUSPEND_AGAIN_STOP in order to override other devices'
>> SUSPEND_AGAIN_CONTINUE and to wakeup fully. For devices that poll
>> sensors during suspend may need this if any outstanding status is found.
>> For conventional suspend wakeup sources, SUSPEND_AGAIN_STOP may be used
>> to override SUSPEND_AGAIN devices.
>>
>> Anyway, the following features may need to be added later:
>> 1. An API to allow devices to express next desired wakeup-time. Then,
>> the framework will combine them and setup RTC alarm accordingly and
>> save/restore previously registered RTC alarms.
>> 2. Create a method to declare a specific instance of delayed-work is to
>> be executed in suspend by waking up the system in the middle of
>> suspend. Then, let the framework handle those "critical" delayed-work
>> in suspend.
>> 3. If a device says SUSPEND_AGAIN_CONTINUE and there is another wakeup
>> source pending (e.g., power button) without suspend_again ops, the
>> system will enter suspend again. In such a case, the system should not
>> suspend again. We may need to see if irqs that are enabled by
>> set_irq_wake() (and not related to suspend_ops devices)
>> are pending at syscore_suspend_again(). Maybe we need to add
>> something like "set_irq_wake_with_suspend_again" so that IRQs with
>> suspend_again ops implemented are ignored for the
>> "override-suspend-again-continue" checking.
>>
>> For the initial release, I have set the point of "suspend-again" after
>> suspend_ops->end(). However, I'm not so sure about where to set the
>> suspend-again point. Because in-suspend polling, which may require
>> I/O with other devices, is supposed to be executed at suspend-again ops,
>> the suspend-again point is configured to be as late as possible in
>> suspend_devices_and_enter(). In order to reduce the number of devices
>> waked up, we may need to set the suspend-again point ealier.
>>
>> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>
> The idea seems to be sane, but I don't like the implementation.
>
> First off, why do you thing it's a good thing to put the callback into
> struct syscore_ops?

I put the callback to syscore_ops as it appeared that the
suspend_again callback may be required by a device, a sysdev, or a
platform/machine (that calls suspend_set_ops() ). In other words, a
specific device or sysdev may want to use the feature while the board
itself may want it.

However, I think it's good to have the callback at pm_dev_ops for
devices and let subsystems handle it. In such a case, starting only
with platform_device would be fine.

>
>> ---
>>  drivers/base/syscore.c      |   36 +++++++++++++++++++++++++++
>>  include/linux/syscore_ops.h |    7 +++++
>>  kernel/power/suspend.c      |   57 +++++++++++++++++++++++-------------------
>>  3 files changed, 74 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/base/syscore.c b/drivers/base/syscore.c
>> index 90af294..1a7e08d 100644
>> --- a/drivers/base/syscore.c
>> +++ b/drivers/base/syscore.c
>> @@ -95,6 +95,42 @@ void syscore_resume(void)
>>                               "Interrupts enabled after %pF\n", ops->resume);
>>               }
>>  }
>> +
>> +/**
>> + * syscore_suspend_again - Execute all the registeres system core suspend_again
>> + *                   callbacks. If at least one returns
>> + *                   SUSPEND_AGAIN_CONTINUE and no one returns
>> + *                   SUSPEND_AGAIN_STOP, syscore_suspend_again let the system
>> + *                   enter suspend again.
>> + */
>
> That causes the ->suspend_again() callbacks to be quite complicated.  I'm not
> sure this is necessary in general.

For "suspend-again" feature, the callback has the ability to return
the following 3,
"I want to suspend again" (CONTINUE),
"I want to resume." or "There is an exception or an error." (STOP);
overriding CONTINUE
"I do not care." (NC); default.

That comment is about how these conditions are met in the view of
syscore framework side.

>
[]
>
> All of those goto statements from the inside of the loop don't really look
> good.  Isn't there any way to avoid them?

Fine. I will reform suspend_devices_and_enter's structure in the next revision.

>
[]
>> +             if (suspend_ops->end)
>> +                     suspend_ops->end();
>> +             trace_machine_suspend(PWR_EVENT_EXIT);
>> +     } while (syscore_suspend_again() && !error && !recover);
>
> Why exactly do you think that the next automatic suspend should occur at
> this particular point?

The intention was to allow most devices to use the feature.

Because any actions (e.g., sensor polling, executing delayed work,
...) that are intended to use/require automatic suspend will be
executed at the syscore_suspend_again, we have to guarantee that the
required devices are available at that point except for userland
stuff.

It makes to set the point after "dpm_resume_end(PMSG_RESUME);".

Another consideration was that if we keep it looping without console
enabled, we may have too many logs accumulated while the console is
disabled; thus, the point is further pushed lower to
"resume_console()".

Putting the point after "trace_machine_suspend(PWR_EVENT_EXIT)" was to
allow tracer to track each instance of suspend_again.

>
[]
>>
>
> To summarize:
> * I don't think struct syscore_ops is the right place for the new callback.

Then, what about dev_pm_ops (although, then, it will depend on subsystems)?

> * The necessity to take care of the possibly many different return values
>  with different meanings have a potential to introduce unnecessary
>  complications into the subsystems implementing the new callback.

Aren't three values (CONTINUE / STOP / NotCare) simple enough?
I thought of two values at first (CONTINUE / STOP); however, in that
way, if no one has an error or failure (requiring urgent wake up),
we'd have infinite loop of suspend.

Or maybe we can implement in this way,
if there is a pending interrupt of IRQ with set_irq_wake on unless the
IRQ is specified to be handled by one of suspend_again callbacks, it
is assumed to have STOP equivalent return and we allow only "CONTINUE"
and "STOP" (or simply return 0 to continue, return -ERRNO to stop)

> * It isn't particularly clear to me why the new suspend should occur at the
>  point you want it to.

Please read the above.

>
> Moreover, what's wrong with thawing user space processes and _then_
> automatically suspending again?  Why do you want to do that from the inside
> of the kernel?

If we let user space processes to decide whether to enter suspend
again or not, every user process will wake up and user processes, not
device drivers should take the responsibility to monitor the status
and handle device driver specific tasks (some of delayed work of
device drivers mentioned earlier).

When it is for simple sensor polling and executing critical (but
simple) kernel delayed work,
a) waking up the whole user processes is too heavy (for power
consumption and suspend/wakeup latency),
b) some of kernel device drivers are depending on user processes
(using user process's features at device driver), and
c) user processes expect that the system wakes up when it is now free
to move on, not wanting to sleep again. (waked up by "power button
press", "user defined alarm", "some important messages from
kernel/device", and such)

>
> Rafael
>

Thank you.


Cheers!

- MyungJoo
Rafael Wysocki April 26, 2011, 11:47 a.m. UTC | #4
Hi,

If I saw it correctly, patches [2-3/3] only added the generic routine to
the platform and i2c bus types, right?

Now, this one is better than the previous one IMO, but (1) it also should
cover hibernation (I'd don't see a reason why suspend-to-RAM should be a
special case here) and (2) I don't really think that thawing user space is
"too heavy" (in fact it's much lighter weight than resuming all devices
that your approach doesn't prevent from happening, so for example on my
desktop/notebook machines I woulnd't mind at all if user space were
thawed after all of the devices had been resumed).

On Tuesday, April 26, 2011, MyungJoo Ham wrote:
> A system or a device may need to control suspend/wakeup events. It may
> want to wakeup the system after a predefined amount of time or at a
> predefined event decided while entering suspend for polling or delayed
> work. Then, it may want to enter suspend again if its predefined wakeup
> condition is the only wakeup reason and there is no outstanding events;
> thus, it does not wakeup the userspace unnecessary and keeps suspended
> as long as possible (saving the power).

I'm not really sure if kernel subsystems should decide when to suspend,
because that's a matter of policy and as such I think it should belong to
user space.  The kernel subsystems should be concerned with _how_ to
suspend, not _when_ to do that.

For example, even if your wakeup signal is the only one and it _seems_ it's
better to suspend again, it actually may be better to wake up and let user
space decide.

Moreover, I'm not sure if kernel subsystems (including drivers) should really
decide when to generate wakeup signals in the first place.  Generally,
user space decides what devices should wake up the system from sleep and the
kernel should follow.  So, there shouldn't be any wakeup signals enabled
beyond what user space has requested.

> Enabling a system to wakeup after a specified time can be easily
> achieved by using RTC. However, to enter suspend again immediately
> without invoking userland, we need additional features in the
> suspend framework.
> 
> Such need comes from:
> 
> 1. Monitoring a critical device status without interrupts that can
> wakeup the system. (in-suspend polling)
>  An example is ambient temperature monitoring that needs to shut down
> the system or a specific device function if it is too hot or cold. The
> temperature of a specific device may be needed to be monitored as well;
> e.g., a charger monitors battery temperature in order to stop charging
> if overheated.

I'd say in those cases (ie. if polling is necessary to prevent some
disaster from happening) you shouldn't suspend at all and use runtime PM
instead (I know that's hard with Android, but please not that Android is
not using the mainline kernel).

> 2. Execute critical "delayed work" at suspend.
>  A driver or a system/board may have a delayed work (or any similar
> things) that it wants to execute at the requested time.
>  For example, some chargers want to check the battery voltage some
> time (e.g., 30 seconds) after the battery is fully charged and the
> charger has stopped. Then, the charger restarts charging if the voltage has
> dropped more than a threshold, which is smaller than "restart-charger"
> voltage, which is a threshold to restart charging regardless of the
> time passed.

Again, don't suspend if you have a usage case like this.

> This patch allows a device to provide "suspend_again" callback with
> struct dev_pm_ops in driver. With suspend_again callbacks registered
> along with supporting subsystems, the suspend framework
> (kernel/power/suspend.c) tries to enter suspend again if conditions are met.
> In general, in order to support suspend-again, subsystems simply need to
> return the value returned by the suspend_again of device as the
> pm_generic_suspend_again does.
> 
> The system enters the suspend again if and only if all of the following
> conditions are met:
> 1. None of suspend_again ops returned "I want to stop suspend" by
> returning a negative number.
> 2. At least one of suspend_again ops returned "I want to suspend again"
> by returning a positive number.
> 
> suspend_again ops may return "I do not care. This wakeup is not related
> with me." by providing zero.
> 
> suspend_again ops may return a negative number in order to override
> other devices' "suspend again" request and to wakeup fully. Devices
> that poll sensors during suspend may need this if any outstanding status
> that requires to notify the userland is found. Conventional suspend
> wakeup sources may also need this to override SUSPEND_AGAIN devices.
> (we may need to lookup every IRQ registered with set_irq_wake and
> register IRQS that are related with suspend_ops separatedly. this is
> not implemented yet anyway and discusses below with item 3.)
> 
> Anyway, the following features may need to be added later:
> 
> 1. An API to allow devices to express next desired wakeup-time. Then,
> the framework will combine them and setup RTC alarm accordingly and
> save/restore previously registered RTC alarms.
> => For this, rtc_timer_init/start/cancel might work for any rtc that
> has wakeup enabled. We need to express whether the rtc_timer is going to
> be used for suspend_again so that it would not fully wakeup the system.
> => Then, we need a bogus platform-device of rtc to see if the
> current wakeup event is whether a) rtc-induced and b) suspend-again
> related rtc event. This may be implemented at drivers/rtc/interface.c?
> 
> 2. Create a method to declare a specific instance of delayed-work is to
> be executed in suspend by waking up the system in the middle of
> suspend. Then, let the framework handle those "critical" delayed-work
> in suspend.
> => If the workitem 1 with identification method is done, this will
> probably be an easy feature to add. (virtually already implemented with 1.)
> 
> 3. If a device says "suspend again, >0" and there is another wakeup
> source pending (e.g., power button) without suspend_again ops, the
> system will enter suspend again. In such a case, the system should not
> suspend again. We may need to see if irqs that are enabled by
> set_irq_wake() (and not related to suspend_ops devices)
> are pending at syscore_suspend_again(). Maybe we need to add
> something like "set_irq_wake_with_suspend_again" so that IRQs with
> suspend_again ops implemented are ignored for the
> "override-suspend-again-continue" checking.
> => For wakeup-enabled IRQs, such behavior of fully-waking up the system
> (by returning a negative number at suspend_again) should be default and
> should not require any modification; the change should be transparent to
> the conventional wakeup sources. Thus, we should modify IRQ core to
> do this by default and add an option not to return < 0 optionally
> (selected by suspend-again users). The implementation of 1 and 2 should
> include this in their code as well.
> 
> In this patch, the suspend-again loops at "suspend_devices_and_enter"
> because at that point,
> 1. every device is resume and are ready to be used.
> 2. userspace is still not invoked (transparent to it)
> 3. console is accessible as it is outside of suspend/resume_console.
> 4. it is outside of trace_machine_suspend(); thus, we can have seperated
> traces for each instance of suspend_again.
> 
> For the RFC release, I have set the point of "suspend-again" after
> suspend_ops->end(). However, I'm not so sure about where to set the
> suspend-again point. Because in-suspend polling, which may require
> I/O with other devices, is supposed to be executed at suspend-again ops,
> the suspend-again point is configured to be as late as possible in
> suspend_devices_and_enter(). In order to reduce the number of devices
> waked up, we may need to set the suspend-again point ealier.
> 
> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> 
> --
> Changes from v1:
> 	- moved from syscore to dev_pm_ops
> 	- added generic ops for subsystems.
> 	- cleaned up suspend_again code at kernel/power/suspend.
> ---
>  drivers/base/power/generic_ops.c |   17 ++++++++++
>  drivers/base/power/main.c        |   61 ++++++++++++++++++++++++++++++++++++++
>  include/linux/pm.h               |   17 ++++++++++
>  kernel/power/suspend.c           |    4 ++-
>  4 files changed, 98 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/base/power/generic_ops.c b/drivers/base/power/generic_ops.c
> index 42f97f9..0976452 100644
> --- a/drivers/base/power/generic_ops.c
> +++ b/drivers/base/power/generic_ops.c
> @@ -213,6 +213,22 @@ int pm_generic_restore(struct device *dev)
>  	return __pm_generic_resume(dev, PM_EVENT_RESTORE);
>  }
>  EXPORT_SYMBOL_GPL(pm_generic_restore);
> +
> +/**
> + * pm_generic_suspend_again - Generic suspend-again callback for subsystems.
> + * @dev: Device to check for suspend-again request.
> + */
> +int pm_generic_suspend_again(struct device *dev)
> +{
> +	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +
> +	if (!pm || !pm->suspend_again)
> +		return 0;
> +
> +	return pm->suspend_again(dev);
> +}
> +EXPORT_SYMBOL_GPL(pm_generic_suspend_again);
> +
>  #endif /* CONFIG_PM_SLEEP */
>  
>  struct dev_pm_ops generic_subsys_pm_ops = {
> @@ -223,6 +239,7 @@ struct dev_pm_ops generic_subsys_pm_ops = {
>  	.thaw = pm_generic_thaw,
>  	.poweroff = pm_generic_poweroff,
>  	.restore = pm_generic_restore,
> +	.suspend_again = pm_generic_suspend_again,
>  #endif
>  #ifdef CONFIG_PM_RUNTIME
>  	.runtime_suspend = pm_generic_runtime_suspend,
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 052dc53..a5b659f 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1082,3 +1082,64 @@ int device_pm_wait_for_dev(struct device *subordinate, struct device *dev)
>  	return async_error;
>  }
>  EXPORT_SYMBOL_GPL(device_pm_wait_for_dev);
> +
> +/**
> + * dpm_suspend_again - Let devices poll or run code while the system is kept
> + *	suspended in the view of userland.
> + *
> + * Execute all "suspend_again" callbacks of devices and return non-zero value
> + * if all of the following conditions are met:
> + * 1) There is a device that wants to suspend again (returned > 0)
> + * 2) There is no device that wants to wake up (returned < 0)
> + */
> +int dpm_suspend_again(void)
> +{
> +	int suspend_again = 0;
> +	int wakeup = 0;
> +	struct device *dev;
> +
> +	mutex_lock(&dpm_list_mtx);
> +
> +	list_for_each_entry(dev, &dpm_list, power.entry) {
> +		const struct dev_pm_ops *ops = NULL;
> +		int result = 0;
> +
> +		get_device(dev);
> +		mutex_unlock(&dpm_list_mtx);
> +		device_lock(dev);
> +

Please remember of power domains (that's being worked on at the moment).

> +		if (dev->type && dev->type->pm) {
> +			ops = dev->type->pm;
> +			dev_dbg(dev, "type suspend_again ");
> +		} else if (dev->class && dev->class->pm) {
> +			ops = dev->class->pm;
> +			dev_dbg(dev, "class suspend_again ");
> +		} else if (dev->bus && dev->bus->pm) {
> +			ops = dev->bus->pm;
> +			dev_dbg(dev, "suspend_again ");
> +		}
> +		if (ops && ops->suspend_again)
> +			result = ops->suspend_again(dev);
> +
> +		device_unlock(dev);
> +		mutex_lock(&dpm_list_mtx);
> +		put_device(dev);
> +
> +		if (result > 0)
> +			suspend_again++;
> +		if (result < 0)
> +			wakeup++;
> +	}
> +	mutex_unlock(&dpm_list_mtx);
> +
> +	pr_debug("%d devices wants to suspend again. "
> +		 "%d devices wants to wakeup.\n", suspend_again, wakeup);
> +
> +	if (suspend_again && !wakeup) {
> +		pr_info("Suspending Again.\n");
> +		return 1;
> +	}
> +
> +	pr_info("No devices wants to suspend again or a device wants to wakeup.\n");
> +	return 0;
> +}
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index 512e091..d3a9c67 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -145,6 +145,15 @@ typedef struct pm_message {
>   *	actions required for resuming the device that need interrupts to be
>   *	disabled
>   *
> + * @suspend_again: Tell the system whether the device wants to either
> + *	suspend again (return > 0), wake up (return < 0), or not-care
> + *	(return = 0). If a device wants to poll sensors or execute some code
> + *	during suspended, suspend_again callback is the place assuming that
> + *	periodic-wakeup or alarm-wakeup is already setup. Note that if a
> + *	device returns "wakeup" with an under-zero value, it overrides
> + *	other devices' "suspend again" return values. This allows to
> + *	execute some codes while being kept suspended in the view of userland.
> + *

Since, as I said, I think that should cover hibernation too, I'd change the
name to, say, sleep_again().

>   * @freeze_noirq: Complete the operations of ->freeze() by carrying out any
>   *	actions required for freezing the device that need interrupts to be
>   *	disabled
> @@ -212,6 +221,7 @@ struct dev_pm_ops {
>  	int (*restore)(struct device *dev);
>  	int (*suspend_noirq)(struct device *dev);
>  	int (*resume_noirq)(struct device *dev);
> +	int (*suspend_again)(struct device *dev);
>  	int (*freeze_noirq)(struct device *dev);
>  	int (*thaw_noirq)(struct device *dev);
>  	int (*poweroff_noirq)(struct device *dev);
> @@ -544,6 +554,7 @@ extern void dpm_resume_end(pm_message_t state);
>  extern void device_pm_unlock(void);
>  extern int dpm_suspend_noirq(pm_message_t state);
>  extern int dpm_suspend_start(pm_message_t state);
> +extern int dpm_suspend_again(void);
>  
>  extern void __suspend_report_result(const char *function, void *fn, int ret);
>  
> @@ -563,6 +574,11 @@ static inline int dpm_suspend_start(pm_message_t state)
>  	return 0;
>  }
>  
> +static int dpm_suspend_again(void)
> +{
> +	return 0;
> +}
> +
>  #define suspend_report_result(fn, ret)		do {} while (0)
>  
>  static inline int device_pm_wait_for_dev(struct device *a, struct device *b)
> @@ -585,5 +601,6 @@ extern int pm_generic_freeze(struct device *dev);
>  extern int pm_generic_thaw(struct device *dev);
>  extern int pm_generic_restore(struct device *dev);
>  extern int pm_generic_poweroff(struct device *dev);
> +extern int pm_generic_suspend_again(struct device *dev);
>  
>  #endif /* _LINUX_PM_H */
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 08515b4..b5a84a4 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -291,7 +291,9 @@ int enter_state(suspend_state_t state)
>  		goto Finish;
>  
>  	pr_debug("PM: Entering %s sleep\n", pm_states[state]);
> -	error = suspend_devices_and_enter(state);
> +	do {
> +		error = suspend_devices_and_enter(state);
> +	} while (!error && dpm_suspend_again());
>  
>   Finish:
>  	pr_debug("PM: Finishing wakeup.\n");
> 

To conclude, I'm not sure about the approach.  In particular, I'm not sure
if the benefit is worth the effort and the resulting complications (ie. the
possibility of having to deal with wakeup signals not requested by user
space) seem to be a bit too far reaching.

Greg, what do you think?

Thanks,
Rafael
Greg Kroah-Hartman April 26, 2011, 1:17 p.m. UTC | #5
On Tue, Apr 26, 2011 at 01:47:21PM +0200, Rafael J. Wysocki wrote:
> Moreover, I'm not sure if kernel subsystems (including drivers) should really
> decide when to generate wakeup signals in the first place.  Generally,
> user space decides what devices should wake up the system from sleep and the
> kernel should follow.  So, there shouldn't be any wakeup signals enabled
> beyond what user space has requested.

The RTC wakeup signals are ok though, right?  Userspace is the one
asking for that from what I can tell.

> To conclude, I'm not sure about the approach.  In particular, I'm not sure
> if the benefit is worth the effort and the resulting complications (ie. the
> possibility of having to deal with wakeup signals not requested by user
> space) seem to be a bit too far reaching.
> 
> Greg, what do you think?

I agree with you in that I don't think that this type of feature is
valid at the moment.

I don't understand why our current situation doesn't work, what are we
lacking that is needed for these systems that we have not seen before?

What is the root problem that this is trying to solve?

thanks,

greg k-h
Pavel Machek April 26, 2011, 8:35 p.m. UTC | #6
Hi!

> If I saw it correctly, patches [2-3/3] only added the generic routine to
> the platform and i2c bus types, right?
> 
> Now, this one is better than the previous one IMO, but (1) it also should
> cover hibernation (I'd don't see a reason why suspend-to-RAM should be a
> special case here) and (2) I don't really think that thawing user
> space is

Suspend-to-RAM really is special here... at least for zaurus-like
machine.

In hibernation, you are "powered down"; that means suspend to RAM with
bootloader active (operating system is off). You do not need to do any
kind of maintenance -- bootloader takes care of battery charging and
protection.

> "too heavy" (in fact it's much lighter weight than resuming all devices
> that your approach doesn't prevent from happening, so for example on my
> desktop/notebook machines I woulnd't mind at all if user space were
> thawed after all of the devices had been resumed).

Well, it would be behavior change for the user. I told the zaurus to
go s2ram, I do not expect it to wake up after 5 minutes because it
needed to check battery status.

I'd have to modify my userland to retry suspend again and again and
again...

> > @@ -145,6 +145,15 @@ typedef struct pm_message {
> >   *	actions required for resuming the device that need interrupts to be
> >   *	disabled
> >   *
> > + * @suspend_again: Tell the system whether the device wants to either
> > + *	suspend again (return > 0), wake up (return < 0), or not-care
> > + *	(return = 0). If a device wants to poll sensors or execute some code
> > + *	during suspended, suspend_again callback is the place assuming that
> > + *	periodic-wakeup or alarm-wakeup is already setup. Note that if a
> > + *	device returns "wakeup" with an under-zero value, it overrides
> > + *	other devices' "suspend again" return values. This allows to
> > + *	execute some codes while being kept suspended in the view of userland.
> > + *
> 
> Since, as I said, I think that should cover hibernation too, I'd change the
> name to, say, sleep_again().

I'm not sure if we need to cover hibernation. Do you know any machine
that needs this for hibernation case?

> > @@ -291,7 +291,9 @@ int enter_state(suspend_state_t state)
> >  		goto Finish;
> >  
> >  	pr_debug("PM: Entering %s sleep\n", pm_states[state]);
> > -	error = suspend_devices_and_enter(state);
> > +	do {
> > +		error = suspend_devices_and_enter(state);
> > +	} while (!error && dpm_suspend_again());
> >  
> >   Finish:
> >  	pr_debug("PM: Finishing wakeup.\n");
> > 
> 
> To conclude, I'm not sure about the approach.  In particular, I'm not sure
> if the benefit is worth the effort and the resulting complications (ie. the
> possibility of having to deal with wakeup signals not requested by user
> space) seem to be a bit too far reaching.

We already have platform-specific hacks to do exactly this at least on
Zaurus. Moving it to common code means that hacks are not duplicated..
									Pavel
Pavel Machek April 26, 2011, 8:38 p.m. UTC | #7
Hi!

> > To conclude, I'm not sure about the approach.  In particular, I'm not sure
> > if the benefit is worth the effort and the resulting complications (ie. the
> > possibility of having to deal with wakeup signals not requested by user
> > space) seem to be a bit too far reaching.
> > 
> > Greg, what do you think?
> 
> I agree with you in that I don't think that this type of feature is
> valid at the moment.

Our current "solution" is low level suspend code on Zaurus directly
looking at charger state and doing the "wakeup or not" decision by hand. 

> I don't understand why our current situation doesn't work, what are we
> lacking that is needed for these systems that we have not seen
> before?

It works, but it is ugly; and it seems samsung now needs similar
hacks.

> What is the root problem that this is trying to solve?

It is trying to fix machines that need to run periodic kernel tasks
even when user asked them to sleep. Zaurus needs to periodicaly wake
up to be able to charge battery in s2ram state, for example.
								Pavel
Rafael Wysocki April 26, 2011, 8:47 p.m. UTC | #8
On Tuesday, April 26, 2011, Pavel Machek wrote:
> Hi!
> 
> > If I saw it correctly, patches [2-3/3] only added the generic routine to
> > the platform and i2c bus types, right?
> > 
> > Now, this one is better than the previous one IMO, but (1) it also should
> > cover hibernation (I'd don't see a reason why suspend-to-RAM should be a
> > special case here) and (2) I don't really think that thawing user
> > space is
> 
> Suspend-to-RAM really is special here... at least for zaurus-like
> machine.
> 
> In hibernation, you are "powered down"; that means suspend to RAM with
> bootloader active (operating system is off). You do not need to do any
> kind of maintenance -- bootloader takes care of battery charging and
> protection.
> 
> > "too heavy" (in fact it's much lighter weight than resuming all devices
> > that your approach doesn't prevent from happening, so for example on my
> > desktop/notebook machines I woulnd't mind at all if user space were
> > thawed after all of the devices had been resumed).
> 
> Well, it would be behavior change for the user. I told the zaurus to
> go s2ram, I do not expect it to wake up after 5 minutes because it
> needed to check battery status.
> 
> I'd have to modify my userland to retry suspend again and again and
> again...

And that's exactly what should be done.  Have a user space process controlling
that, because avoiding to thaw user space doesn't buy us almost anything.

Now, I know that it's probably easier to modify the kernel than to write
a user space tool for that, test it and so on, but "easier" is not necessarily
"better".

> > > @@ -145,6 +145,15 @@ typedef struct pm_message {
> > >   *	actions required for resuming the device that need interrupts to be
> > >   *	disabled
> > >   *
> > > + * @suspend_again: Tell the system whether the device wants to either
> > > + *	suspend again (return > 0), wake up (return < 0), or not-care
> > > + *	(return = 0). If a device wants to poll sensors or execute some code
> > > + *	during suspended, suspend_again callback is the place assuming that
> > > + *	periodic-wakeup or alarm-wakeup is already setup. Note that if a
> > > + *	device returns "wakeup" with an under-zero value, it overrides
> > > + *	other devices' "suspend again" return values. This allows to
> > > + *	execute some codes while being kept suspended in the view of userland.
> > > + *
> > 
> > Since, as I said, I think that should cover hibernation too, I'd change the
> > name to, say, sleep_again().
> 
> I'm not sure if we need to cover hibernation. Do you know any machine
> that needs this for hibernation case?

Yes, any machine that "needs" it while suspended.  What's the difference,
after all?  The only difference is that there's an image on permanent storage
in the hibernation case.  You still can overheat a battery when charging it
while hibernated, right?

> > > @@ -291,7 +291,9 @@ int enter_state(suspend_state_t state)
> > >  		goto Finish;
> > >  
> > >  	pr_debug("PM: Entering %s sleep\n", pm_states[state]);
> > > -	error = suspend_devices_and_enter(state);
> > > +	do {
> > > +		error = suspend_devices_and_enter(state);
> > > +	} while (!error && dpm_suspend_again());
> > >  
> > >   Finish:
> > >  	pr_debug("PM: Finishing wakeup.\n");
> > > 
> > 
> > To conclude, I'm not sure about the approach.  In particular, I'm not sure
> > if the benefit is worth the effort and the resulting complications (ie. the
> > possibility of having to deal with wakeup signals not requested by user
> > space) seem to be a bit too far reaching.
> 
> We already have platform-specific hacks to do exactly this at least on
> Zaurus. Moving it to common code means that hacks are not duplicated..

Well, good to know they are there, but I'm still not sure what to do about
that.  At the moment I feel like having too little information to really
decide, so perhaps please point me to the code you're talking about for
starters.

Thanks,
Rafael
Rafael Wysocki April 26, 2011, 8:49 p.m. UTC | #9
On Tuesday, April 26, 2011, Pavel Machek wrote:
> Hi!
> 
> > > To conclude, I'm not sure about the approach.  In particular, I'm not sure
> > > if the benefit is worth the effort and the resulting complications (ie. the
> > > possibility of having to deal with wakeup signals not requested by user
> > > space) seem to be a bit too far reaching.
> > > 
> > > Greg, what do you think?
> > 
> > I agree with you in that I don't think that this type of feature is
> > valid at the moment.
> 
> Our current "solution" is low level suspend code on Zaurus directly
> looking at charger state and doing the "wakeup or not" decision by hand. 
> 
> > I don't understand why our current situation doesn't work, what are we
> > lacking that is needed for these systems that we have not seen
> > before?
> 
> It works, but it is ugly; and it seems samsung now needs similar
> hacks.
> 
> > What is the root problem that this is trying to solve?
> 
> It is trying to fix machines that need to run periodic kernel tasks
> even when user asked them to sleep. Zaurus needs to periodicaly wake
> up to be able to charge battery in s2ram state, for example.

Well, if you wake up periodically, it's not S2RAM any more, I'd say.

Thanks,
Rafael
Greg Kroah-Hartman April 26, 2011, 8:57 p.m. UTC | #10
On Tue, Apr 26, 2011 at 10:38:53PM +0200, Pavel Machek wrote:
> Hi!
> 
> > > To conclude, I'm not sure about the approach.  In particular, I'm not sure
> > > if the benefit is worth the effort and the resulting complications (ie. the
> > > possibility of having to deal with wakeup signals not requested by user
> > > space) seem to be a bit too far reaching.
> > > 
> > > Greg, what do you think?
> > 
> > I agree with you in that I don't think that this type of feature is
> > valid at the moment.
> 
> Our current "solution" is low level suspend code on Zaurus directly
> looking at charger state and doing the "wakeup or not" decision by hand. 
> 
> > I don't understand why our current situation doesn't work, what are we
> > lacking that is needed for these systems that we have not seen
> > before?
> 
> It works, but it is ugly; and it seems samsung now needs similar
> hacks.
> 
> > What is the root problem that this is trying to solve?
> 
> It is trying to fix machines that need to run periodic kernel tasks
> even when user asked them to sleep. Zaurus needs to periodicaly wake
> up to be able to charge battery in s2ram state, for example.

That sounds like a bug in userspace you are trying to work around.  Why
not solve it there first?

thanks,

greg k-h
Pavel Machek April 26, 2011, 9:06 p.m. UTC | #11
Hi!

> > > "too heavy" (in fact it's much lighter weight than resuming all devices
> > > that your approach doesn't prevent from happening, so for example on my
> > > desktop/notebook machines I woulnd't mind at all if user space were
> > > thawed after all of the devices had been resumed).
> > 
> > Well, it would be behavior change for the user. I told the zaurus to
> > go s2ram, I do not expect it to wake up after 5 minutes because it
> > needed to check battery status.
> > 
> > I'd have to modify my userland to retry suspend again and again and
> > again...
> 
> And that's exactly what should be done.  Have a user space process controlling
> that, because avoiding to thaw user space doesn't buy us almost
> anything.

That makes Zaurus implement different user-kernel interface than PC
class machine, because of hardware quirk.

> Now, I know that it's probably easier to modify the kernel than to write
> a user space tool for that, test it and so on, but "easier" is not necessarily
> "better".

It is easier, allows us to keep same user-kernel interface on PC and
Zaurus, and is compatible with 2.6.38.

Heck, I'm used to typing "echo mem > /sys/power/state". I should not
have to learn different interface just because Zaurus does not have
proper hardware charger.

> > I'm not sure if we need to cover hibernation. Do you know any machine
> > that needs this for hibernation case?
> 
> Yes, any machine that "needs" it while suspended.  What's the difference,
> after all?  The only difference is that there's an image on permanent storage
> in the hibernation case.  You still can overheat a battery when charging it
> while hibernated, right?

No, you can not; not on Zaurus.

It can not really power down; it is always sleeping. s2ram is sleep in
operating system, hibernation or poweroff is sleep in bootloader.

Bootloader takes care of battery in that case.

> > > To conclude, I'm not sure about the approach.  In particular, I'm not sure
> > > if the benefit is worth the effort and the resulting complications (ie. the
> > > possibility of having to deal with wakeup signals not requested by user
> > > space) seem to be a bit too far reaching.
> > 
> > We already have platform-specific hacks to do exactly this at least on
> > Zaurus. Moving it to common code means that hacks are not duplicated..
> 
> Well, good to know they are there, but I'm still not sure what to do about
> that.  At the moment I feel like having too little information to really
> decide, so perhaps please point me to the code you're talking about for
> starters.

Ok, see the spitz_should_wakeup() function in arch/arm/mach-pxa/* and
should_wakeup() usage.
									Pavel
Pavel Machek April 26, 2011, 9:11 p.m. UTC | #12
Hi!

> > > What is the root problem that this is trying to solve?
> > 
> > It is trying to fix machines that need to run periodic kernel tasks
> > even when user asked them to sleep. Zaurus needs to periodicaly wake
> > up to be able to charge battery in s2ram state, for example.
> 
> Well, if you wake up periodically, it's not S2RAM any more, I'd say.

It certainly looks a lot like s2ram to the user.

Now, your PC wakes up periodically in s2ram, too; but it is keyboard
controller that wakes up, not CPU, so you do not know.

On Zaurus, you do not know, either; nothing visible happens and
userspace is shielded from these details.

Should user really have to know if battery charger is implemented on
keyboard controller or on main cpu before he selects which interface
to use?
									Pavel
Rafael Wysocki April 26, 2011, 9:36 p.m. UTC | #13
On Tuesday, April 26, 2011, Pavel Machek wrote:
> Hi!
> 
> > > > What is the root problem that this is trying to solve?
> > > 
> > > It is trying to fix machines that need to run periodic kernel tasks
> > > even when user asked them to sleep. Zaurus needs to periodicaly wake
> > > up to be able to charge battery in s2ram state, for example.
> > 
> > Well, if you wake up periodically, it's not S2RAM any more, I'd say.
> 
> It certainly looks a lot like s2ram to the user.
> 
> Now, your PC wakes up periodically in s2ram, too; but it is keyboard
> controller that wakes up, not CPU, so you do not know.

I seriously doubt it's a keyboard controller in contemporary machines.

Yes, there are parts of them that don't actually sleep, but that's not
under OS control.

> On Zaurus, you do not know, either; nothing visible happens and
> userspace is shielded from these details.

If you wake up device drivers, it _is_ visible.  Maybe not to user
space, but that's a semantic difference.

> Should user really have to know if battery charger is implemented on
> keyboard controller or on main cpu before he selects which interface
> to use?

Please don't confuse user space with the user.  You can resume all the
way to user space without turning the backlight on and the user will not
notice, right?

Rafael
Rafael Wysocki April 26, 2011, 9:46 p.m. UTC | #14
On Tuesday, April 26, 2011, Pavel Machek wrote:
> Hi!
> 
> > > > "too heavy" (in fact it's much lighter weight than resuming all devices
> > > > that your approach doesn't prevent from happening, so for example on my
> > > > desktop/notebook machines I woulnd't mind at all if user space were
> > > > thawed after all of the devices had been resumed).
> > > 
> > > Well, it would be behavior change for the user. I told the zaurus to
> > > go s2ram, I do not expect it to wake up after 5 minutes because it
> > > needed to check battery status.
> > > 
> > > I'd have to modify my userland to retry suspend again and again and
> > > again...
> > 
> > And that's exactly what should be done.  Have a user space process controlling
> > that, because avoiding to thaw user space doesn't buy us almost
> > anything.
> 
> That makes Zaurus implement different user-kernel interface than PC
> class machine, because of hardware quirk.

Let me say that again: If Zaurus needs to resume everything except for
user space periodically to monitor the battery charger, I'm not sure if our
suspend interface is the right one for it in the first place.

It seriously looks like a workaround for the lack of appropriately implemented
runtime PM, just like the Android's opportunistic suspend.

> > Now, I know that it's probably easier to modify the kernel than to write
> > a user space tool for that, test it and so on, but "easier" is not necessarily
> > "better".
> 
> It is easier, allows us to keep same user-kernel interface on PC and
> Zaurus, and is compatible with 2.6.38.
> 
> Heck, I'm used to typing "echo mem > /sys/power/state". I should not
> have to learn different interface just because Zaurus does not have
> proper hardware charger.

No, this interface should not be used on Zaurus at all.  It's not mean for
that and while you can hack it to kind of work, it still is hacking rather
than designing things.

> > > I'm not sure if we need to cover hibernation. Do you know any machine
> > > that needs this for hibernation case?
> > 
> > Yes, any machine that "needs" it while suspended.  What's the difference,
> > after all?  The only difference is that there's an image on permanent storage
> > in the hibernation case.  You still can overheat a battery when charging it
> > while hibernated, right?
> 
> No, you can not; not on Zaurus.
> 
> It can not really power down; it is always sleeping. s2ram is sleep in
> operating system,

Which is not the meaning of S2RAM I use.

> hibernation or poweroff is sleep in bootloader.
> 
> Bootloader takes care of battery in that case.

So the difference is that we let someone else worry.  Cool. :-)

> > > > To conclude, I'm not sure about the approach.  In particular, I'm not sure
> > > > if the benefit is worth the effort and the resulting complications (ie. the
> > > > possibility of having to deal with wakeup signals not requested by user
> > > > space) seem to be a bit too far reaching.
> > > 
> > > We already have platform-specific hacks to do exactly this at least on
> > > Zaurus. Moving it to common code means that hacks are not duplicated..
> > 
> > Well, good to know they are there, but I'm still not sure what to do about
> > that.  At the moment I feel like having too little information to really
> > decide, so perhaps please point me to the code you're talking about for
> > starters.
> 
> Ok, see the spitz_should_wakeup() function in arch/arm/mach-pxa/* and
> should_wakeup() usage.

OK, I will.

Thanks,
Rafael
Pavel Machek April 26, 2011, 10:06 p.m. UTC | #15
Hi!

> > It certainly looks a lot like s2ram to the user.
> > 
> > Now, your PC wakes up periodically in s2ram, too; but it is keyboard
> > controller that wakes up, not CPU, so you do not know.
> 
> I seriously doubt it's a keyboard controller in contemporary machines.

> Yes, there are parts of them that don't actually sleep, but that's not
> under OS control.

Ok, so it is called "embedded controller" these days. But you got the
idea.

> > On Zaurus, you do not know, either; nothing visible happens and
> > userspace is shielded from these details.
> 
> If you wake up device drivers, it _is_ visible.  Maybe not to user
> space, but that's a semantic difference.

Just to clarify; not _all_ device drivers need to be woken up. Just
SPI + charger code is enough... and I guess it would be prefered
because spinning up disk just to toggle charger would be nasty.

> > Should user really have to know if battery charger is implemented on
> > keyboard controller or on main cpu before he selects which interface
> > to use?
> 
> Please don't confuse user space with the user.  You can resume all the
> way to user space without turning the backlight on and the user will not
> notice, right?

The user has to type "echo mem > /sys/power/state" :-). And you better
turn him the backlight on, typing blind is bad.

									Pavel
Pavel Machek April 26, 2011, 10:10 p.m. UTC | #16
Hi!

> > > And that's exactly what should be done.  Have a user space process controlling
> > > that, because avoiding to thaw user space doesn't buy us almost
> > > anything.
> > 
> > That makes Zaurus implement different user-kernel interface than PC
> > class machine, because of hardware quirk.
> 
> Let me say that again: If Zaurus needs to resume everything except for
> user space periodically to monitor the battery charger, I'm not sure if our
> suspend interface is the right one for it in the first place.

Well, Zaurus does not need to resume everything. SPI+charger code is
enough, and that would be preffered. No need to touch disk etc.

> It seriously looks like a workaround for the lack of appropriately implemented
> runtime PM, just like the Android's opportunistic suspend.

No, not really. I'm running standard Debian; I do not want/need
anything like opportunistic suspend.

> > > Now, I know that it's probably easier to modify the kernel than to write
> > > a user space tool for that, test it and so on, but "easier" is not necessarily
> > > "better".
> > 
> > It is easier, allows us to keep same user-kernel interface on PC and
> > Zaurus, and is compatible with 2.6.38.
> > 
> > Heck, I'm used to typing "echo mem > /sys/power/state". I should not
> > have to learn different interface just because Zaurus does not have
> > proper hardware charger.
> 
> No, this interface should not be used on Zaurus at all.  It's not mean for
> that and while you can hack it to kind of work, it still is hacking rather
> than designing things.

Why not? I want the Zaurus to sleep. Why should I have to know how its
charging unit works? Why should I use different interface than on PC?
I want it to suspend, put it into backpack and move...

> > Bootloader takes care of battery in that case.
> 
> So the difference is that we let someone else worry.  Cool. :-)

Yes.

> > Ok, see the spitz_should_wakeup() function in arch/arm/mach-pxa/* and
> > should_wakeup() usage.
> 
> OK, I will.

Thanks.
									Pavel
Pavel Machek April 26, 2011, 10:14 p.m. UTC | #17
Hi!

> > > What is the root problem that this is trying to solve?
> > 
> > It is trying to fix machines that need to run periodic kernel tasks
> > even when user asked them to sleep. Zaurus needs to periodicaly wake
> > up to be able to charge battery in s2ram state, for example.
> 
> That sounds like a bug in userspace you are trying to work around.  Why
> not solve it there first?

Can you be more specific?

The userspace is standard Debian; or maybe even init=/bin/bash. I want
to sleep the machine before I move, so I do echo mem >
/sys/power/state, as I do on PC.

Then I do plug the machine into AC when I move near outlet. In PC,
embedded controller wakes up and starts charging. In Zaurus, it is
kernel (SPI+charging drivers) that wake up, start charging, and go
back to sleep.

Then I (the user) press the power button and get "bash#" back. Where
is the userspace bug?
								Pavel
Rafael Wysocki April 26, 2011, 10:32 p.m. UTC | #18
On Tuesday, April 26, 2011, Rafael J. Wysocki wrote:
> On Tuesday, April 26, 2011, Pavel Machek wrote:
...
> > Ok, see the spitz_should_wakeup() function in arch/arm/mach-pxa/* and
> > should_wakeup() usage.
> 
> OK, I will.

Well, as far as I can tell, on Zaurus this is all done within the platform
->enter() callback, so it doesn't carry out the device resume, while the
proposed $subject patch does.  For this reason, I'm not even sure if the
proposed approach is really suitable for Zaurus (the resuming and suspending
of all devices every once a while will cost quite some enery I guess).

Still, in the unlikely case it is suitable, here's my opinion.

I don't think syscore_ops is the right place to put the new "suspend
again" callback for reasons stated previously.

I also don't think dev_pm_ops is the right place to put such a callback,
because it will only be necessary on very few platforms and there's no
reason why every driver, subsystem or power domain in the kernel should care.

Moreover, the usage cases appear to be suspend-specific.

For the above reasons, the only place the "suspend again" callback may be
put into is struct platform_suspend_ops.  Then, suspend_devices_and_enter()
may be modified so that it loops between dpm_suspend_start() and
dpm_suspend_end() until that callback returns "false" and there are no
wakeup events (as indicated by pm_wakeup_pending(); but please note that
this would require suspend_enter() to indicate that pm_wakeup_pending()
returned "true" and events_check_enabled should not be reset until
we're sure that we _really_ want to resume).

Alternatively, if that's sufficient, suspend_enter() may be called in a
loop until the "suspend again" callback returns "false" and there are no
wakeup events (in the above sense).

The callback can be called "repeat" and return bool as far as I'm concerned,
but I'm not insisting on that.

MyungJoo, would that be suitable to you?

Rafael
MyungJoo Ham April 27, 2011, 6:36 a.m. UTC | #19
On Wed, Apr 27, 2011 at 7:32 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Tuesday, April 26, 2011, Rafael J. Wysocki wrote:
>> On Tuesday, April 26, 2011, Pavel Machek wrote:
> ...
>> > Ok, see the spitz_should_wakeup() function in arch/arm/mach-pxa/* and
>> > should_wakeup() usage.
>>
>> OK, I will.
>
> Well, as far as I can tell, on Zaurus this is all done within the platform
> ->enter() callback, so it doesn't carry out the device resume, while the
> proposed $subject patch does.  For this reason, I'm not even sure if the
> proposed approach is really suitable for Zaurus (the resuming and suspending
> of all devices every once a while will cost quite some enery I guess).

In fact, the kernel "hacks" of mine have been looping at
sysdev-suspend/resume implemented with platform_suspend_ops; thus it
does not carry device resume either.
I've moved the looping point to general device-suspend/resume because
I thought that other systems may require active devices.
However, if others can be satisfied without resumed devices, it is
perfect with me to move that inside device resume as well.

>
> Still, in the unlikely case it is suitable, here's my opinion.
>
> I don't think syscore_ops is the right place to put the new "suspend
> again" callback for reasons stated previously.
>
> I also don't think dev_pm_ops is the right place to put such a callback,
> because it will only be necessary on very few platforms and there's no
> reason why every driver, subsystem or power domain in the kernel should care.
>
> Moreover, the usage cases appear to be suspend-specific.
>
> For the above reasons, the only place the "suspend again" callback may be
> put into is struct platform_suspend_ops.  Then, suspend_devices_and_enter()
> may be modified so that it loops between dpm_suspend_start() and
> dpm_suspend_end() until that callback returns "false" and there are no
> wakeup events (as indicated by pm_wakeup_pending(); but please note that
> this would require suspend_enter() to indicate that pm_wakeup_pending()
> returned "true" and events_check_enabled should not be reset until
> we're sure that we _really_ want to resume).
>
> Alternatively, if that's sufficient, suspend_enter() may be called in a
> loop until the "suspend again" callback returns "false" and there are no
> wakeup events (in the above sense).
>
> The callback can be called "repeat" and return bool as far as I'm concerned,
> but I'm not insisting on that.
>
> MyungJoo, would that be suitable to you?

As long as sysdevs are resumed and some devices/subsystems (I2C-PMIC,
ADC, and RTC in my cases) can be selectively resumed and suspended, it
is fine.
Thus, your "alternative" suggestion is perfect with me. Actually, this
is almost going back to the original hack. =)

I'll submit next revision with platform_suspend_ops later.

Thank you!

>
> Rafael
>

Cheers!
- MyungJoo
Stanislav Brabec April 27, 2011, 9:46 a.m. UTC | #20
MyungJoo Ham wrote:

> As long as sysdevs are resumed and some devices/subsystems (I2C-PMIC,
> ADC, and RTC in my cases) can be selectively resumed and suspended, it
> is fine.
> Thus, your "alternative" suggestion is perfect with me. Actually, this
> is almost going back to the original hack. =)
> 
> I'll submit next revision with platform_suspend_ops later.

Does kernel have something like "sleepy task" interface, e. g. special
mode that is triggered by some sort of interrupt and instead of going to
perform full resume, it just lets a hook to wake up "manually" needed
devices, perform the "sleepy task" and then tell the system whether full
resume is requested?

It can fit for Zaurus battery charging monitoring - timer interrupt
needs to wake just the SPI bus.

But I can imagine a GPS logger using such interface to save energy -
serial interrupt semi-wakes the system each second, but will not go to
do full resume. It just processes NMEA sentence and buffers the result.
Only if buffer becomes full, it issues full resume and writes data
somewhere.
MyungJoo Ham April 27, 2011, 10:47 a.m. UTC | #21
On Wed, Apr 27, 2011 at 6:46 PM, Stanislav Brabec <utx@penguin.cz> wrote:
> MyungJoo Ham wrote:
>
>> As long as sysdevs are resumed and some devices/subsystems (I2C-PMIC,
>> ADC, and RTC in my cases) can be selectively resumed and suspended, it
>> is fine.
>> Thus, your "alternative" suggestion is perfect with me. Actually, this
>> is almost going back to the original hack. =)
>>
>> I'll submit next revision with platform_suspend_ops later.
>
> Does kernel have something like "sleepy task" interface, e. g. special
> mode that is triggered by some sort of interrupt and instead of going to
> perform full resume, it just lets a hook to wake up "manually" needed
> devices, perform the "sleepy task" and then tell the system whether full
> resume is requested?

Not that I know of. I don't see anything that stops the resuming flow
(full resume will be performed at any wakeup event before this patch)

The patch that is going to be revised as mentioned will allow you to perform
1) "manually" wake up some devices
2) perform "sleepy tasks"
3) "manually" suspend devices waked up at 1)
4) tell the system whether full resume is requested.
at the suspend_again callback of struct platform_suspend_ops suspend_ops.

However, you may need some "helpers" to do 1 and 3 easily.

Anyway, for the "helpers" for 1) and 3), I expect that allowing the
one who provide suspend_ops to provide a list of struct dev pointers
that required wakeup would be enough.

I'd look at this issue as well because I have the same issue, but I'd
do it separately with the "suspend_again".

>
> It can fit for Zaurus battery charging monitoring - timer interrupt
> needs to wake just the SPI bus.
>
> But I can imagine a GPS logger using such interface to save energy -
> serial interrupt semi-wakes the system each second, but will not go to
> do full resume. It just processes NMEA sentence and buffers the result.
> Only if buffer becomes full, it issues full resume and writes data
> somewhere.

Sure, as long as the platform file (or machine/board file) has the
suspend_again callback implemented correctly, it will work as you
expected.

>
> --
> Best Regards / S pozdravem,
>
> Stanislav Brabec
> software developer
> ---------------------------------------------------------------------
> SUSE LINUX, s. r. o.                          e-mail: sbrabec@suse.cz
> Lihovarská 1060/12                            tel: +49 911 7405384547
> 190 00 Praha 9                                  fax: +420 284 028 951
> Czech Republic                                    http://www.suse.cz/
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>


Cheers!

- MyungJoo
diff mbox

Patch

diff --git a/drivers/base/syscore.c b/drivers/base/syscore.c
index 90af294..1a7e08d 100644
--- a/drivers/base/syscore.c
+++ b/drivers/base/syscore.c
@@ -95,6 +95,42 @@  void syscore_resume(void)
 				"Interrupts enabled after %pF\n", ops->resume);
 		}
 }
+
+/**
+ * syscore_suspend_again - Execute all the registeres system core suspend_again
+ *			callbacks. If at least one returns
+ *			SUSPEND_AGAIN_CONTINUE and no one returns
+ *			SUSPEND_AGAIN_STOP, syscore_suspend_again let the system
+ *			enter suspend again.
+ */
+bool syscore_suspend_again(void)
+{
+	struct syscore_ops *ops;
+	enum suspend_again_cond condition = SUSPEND_AGAIN_NC;
+
+	list_for_each_entry(ops, &syscore_ops_list, node)
+		if (ops->suspend_again) {
+			switch (ops->suspend_again()) {
+			case SUSPEND_AGAIN_NC:
+				break;
+			case SUSPEND_AGAIN_CONTINUE:
+				if (condition == SUSPEND_AGAIN_NC)
+					condition = SUSPEND_AGAIN_CONTINUE;
+				break;
+			case SUSPEND_AGAIN_STOP:
+				condition = SUSPEND_AGAIN_STOP;
+				break;
+			default:
+				pr_warn("PM: incorrect return from %pF\n",
+					ops->suspend_again);
+			}
+		}
+
+	if (condition == SUSPEND_AGAIN_CONTINUE)
+		return true;
+
+	return false;
+}
 #endif /* CONFIG_PM_SLEEP */
 
 /**
diff --git a/include/linux/syscore_ops.h b/include/linux/syscore_ops.h
index 27b3b0b..bf9bc4e 100644
--- a/include/linux/syscore_ops.h
+++ b/include/linux/syscore_ops.h
@@ -11,10 +11,16 @@ 
 
 #include <linux/list.h>
 
+enum suspend_again_cond {
+	SUSPEND_AGAIN_NC = 0, /* Do Not Care */
+	SUSPEND_AGAIN_CONTINUE, /* Start or keep the again */
+	SUSPEND_AGAIN_STOP, /* Stop or do not start. Override CONTINUE */
+};
 struct syscore_ops {
 	struct list_head node;
 	int (*suspend)(void);
 	void (*resume)(void);
+	enum suspend_again_cond (*suspend_again)(void);
 	void (*shutdown)(void);
 };
 
@@ -23,6 +29,7 @@  extern void unregister_syscore_ops(struct syscore_ops *ops);
 #ifdef CONFIG_PM_SLEEP
 extern int syscore_suspend(void);
 extern void syscore_resume(void);
+extern bool syscore_suspend_again(void);
 #endif
 extern void syscore_shutdown(void);
 
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 2814c32..aa6a3d1 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -202,43 +202,48 @@  static int suspend_enter(suspend_state_t state)
 int suspend_devices_and_enter(suspend_state_t state)
 {
 	int error;
+	bool recover = false;
 
 	if (!suspend_ops)
 		return -ENOSYS;
 
-	trace_machine_suspend(state);
-	if (suspend_ops->begin) {
-		error = suspend_ops->begin(state);
-		if (error)
-			goto Close;
-	}
-	suspend_console();
-	pm_restrict_gfp_mask();
-	suspend_test_start();
-	error = dpm_suspend_start(PMSG_SUSPEND);
-	if (error) {
-		printk(KERN_ERR "PM: Some devices failed to suspend\n");
-		goto Recover_platform;
-	}
-	suspend_test_finish("suspend devices");
-	if (suspend_test(TEST_DEVICES))
-		goto Recover_platform;
+	do {
+		trace_machine_suspend(state);
+		if (suspend_ops->begin) {
+			error = suspend_ops->begin(state);
+			if (error)
+				goto Close;
+		}
+		suspend_console();
+		pm_restrict_gfp_mask();
+		suspend_test_start();
+		error = dpm_suspend_start(PMSG_SUSPEND);
+		if (error) {
+			printk(KERN_ERR "PM: Some devices failed to suspend\n");
+			goto Recover_platform;
+		}
+		suspend_test_finish("suspend devices");
+		if (suspend_test(TEST_DEVICES))
+			goto Recover_platform;
 
-	suspend_enter(state);
+		error = suspend_enter(state);
 
  Resume_devices:
-	suspend_test_start();
-	dpm_resume_end(PMSG_RESUME);
-	suspend_test_finish("resume devices");
-	pm_restore_gfp_mask();
-	resume_console();
+		suspend_test_start();
+		dpm_resume_end(PMSG_RESUME);
+		suspend_test_finish("resume devices");
+		pm_restore_gfp_mask();
+		resume_console();
  Close:
-	if (suspend_ops->end)
-		suspend_ops->end();
-	trace_machine_suspend(PWR_EVENT_EXIT);
+		if (suspend_ops->end)
+			suspend_ops->end();
+		trace_machine_suspend(PWR_EVENT_EXIT);
+	} while (syscore_suspend_again() && !error && !recover);
+
 	return error;
 
  Recover_platform:
+	recover = true;
 	if (suspend_ops->recover)
 		suspend_ops->recover();
 	goto Resume_devices;