[v5,01/48] kernel: Add support for power-off handler call chain

Message ID 1415292213-28652-2-git-send-email-linux@roeck-us.net (mailing list archive)
State Not Applicable, archived

Commit Message

Guenter Roeck Nov. 6, 2014, 4:42 p.m. UTC
Various drivers implement architecture and/or device specific means to
power off the system.  For the most part, those drivers set the global
variable pm_power_off to point to a function within the driver.

This mechanism has a number of drawbacks.  Typically only one scheme
to remove power is supported (at least if pm_power_off is used).
At least in theory there can be multiple means to remove power, some of
which may be less desirable. For example, some mechanisms may only
power off the CPU or the CPU card, while another may power off the
entire system.  Others may really just execute a restart sequence
or drop into the ROM monitor. Using pm_power_off can also be racy
if the function pointer is set from a driver built as a module, as the
driver may be in the process of being unloaded when pm_power_off is
called. If there are multiple power-off handlers in the system, removing
a module with such a handler may inadvertently reset the pointer to
pm_power_off to NULL, leaving the system with no means to remove power.
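
For illustration only (this is not code from the patch), the overwrite
problem can be modelled in user space roughly as follows. The two drivers
and their function names are hypothetical, and the pointer is a plain
stand-in for the kernel's global:

```c
#include <stddef.h>

/* User-space stand-in for the kernel's global pm_power_off pointer. */
static void (*pm_power_off)(void);

/* Hypothetical drivers A and B, each with its own way to remove power. */
static void drv_a_power_off(void) { /* would switch off the whole system */ }
static void drv_b_power_off(void) { /* would only stop the CPU card */ }

static void drv_a_init(void) { pm_power_off = drv_a_power_off; }
static void drv_b_init(void) { pm_power_off = drv_b_power_off; } /* replaces A */

/* B cleans up after itself on unload, but cannot restore A's handler. */
static void drv_b_exit(void)
{
	if (pm_power_off == drv_b_power_off)
		pm_power_off = NULL;
}
```

After drv_a_init(), drv_b_init() and drv_b_exit(), the pointer is NULL even
though driver A is still loaded and able to remove power.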

Introduce a system power-off handler call chain to solve the described
problems.  This call chain is expected to be executed from the architecture
specific machine_power_off() function.  Drivers and architecture code
providing system power-off functionality are expected to register with
this call chain.  When registering a power-off handler, callers can
provide a priority to control power-off handler execution sequence
and thus ensure that the power-off handler with the optimal capabilities
to remove power for a given system is called first.
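
As a rough user-space sketch of the idea (not the kernel code itself), a
priority-ordered chain can be kept as a singly linked list sorted by
descending priority. The insertion loop below follows the same pattern as
_register_power_off_handler() in this patch, but without RCU or locking.
The names handler_block, register_handler, run_chain and log_handler are
made up for the example, and walking the list front to back is an
assumption about how the chain is executed:

```c
#include <stddef.h>

/* Simplified model of the handler block (no RCU annotation). */
struct handler_block {
	void (*handler)(struct handler_block *);
	struct handler_block *next;
	int priority;
};

static struct handler_block *handler_list;

/* Insert ahead of the first entry with a strictly lower priority. */
static void register_handler(struct handler_block *p)
{
	struct handler_block **pl = &handler_list;

	while (*pl != NULL) {
		if (p->priority > (*pl)->priority)
			break;
		pl = &(*pl)->next;
	}
	p->next = *pl;
	*pl = p;
}

/* Call handlers in list order; a handler that really removes power
 * would never return, so later entries act as fallbacks. */
static void run_chain(void)
{
	struct handler_block *p;

	for (p = handler_list; p != NULL; p = p->next)
		p->handler(p);
}

/* Example handler: only records the order in which it was called. */
static int call_log[8];
static int call_count;

static void log_handler(struct handler_block *p)
{
	if (call_count < 8)
		call_log[call_count++] = p->priority;
}
```

Because the comparison is strict, handlers registered with equal priority
keep their registration order in this sketch.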

Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Alexander Graf <agraf@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Philippe Rétornaz <philippe.retornaz@gmail.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Romain Perier <romain.perier@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
v5:
- Rebase to v3.18-rc3
v4:
- Do not use notifiers but internal functions and data structures to manage
  the list of power-off handlers. Drop unused parameters from callbacks, and
  make the power-off function type void.
  Code to manage and walk the list of callbacks is derived from notifier.c.
v3:
- Rename new file to power_off_handler.c
- Replace poweroff in all newly introduced variables and in text
  with power_off or power-off as appropriate
- Replace POWEROFF_PRIORITY_xxx with POWER_OFF_PRIORITY_xxx
- Execute power-off handlers without any locks held
v2:
- poweroff -> power_off
- Add defines for default priorities
- Use raw notifiers protected by spinlocks instead of atomic notifiers
- Add register_poweroff_handler_simple
- Add devm_register_power_off_handler

 include/linux/pm.h               |  28 ++++
 kernel/power/Makefile            |   1 +
 kernel/power/power_off_handler.c | 293 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 322 insertions(+)
 create mode 100644 kernel/power/power_off_handler.c

Comments

Guenter Roeck Nov. 6, 2014, 10:27 p.m. UTC | #1
On Thu, Nov 06, 2014 at 11:30:59PM +0100, Rafael J. Wysocki wrote:
> On Thursday, November 06, 2014 08:42:45 AM Guenter Roeck wrote:
> > diff --git a/include/linux/pm.h b/include/linux/pm.h
> > index 383fd68..a4d6bf8 100644
> > --- a/include/linux/pm.h
> > +++ b/include/linux/pm.h
> > @@ -35,6 +35,34 @@ extern void (*pm_power_off)(void);
> >  extern void (*pm_power_off_prepare)(void);
> >  
> >  struct device; /* we have a circular dep with device.h */
> > +
> > +/*
> > + * Data structures and callbacks to manage power-off handlers
> > + */
> > +
> > +struct power_off_handler_block {
> > +	void (*handler)(struct power_off_handler_block *);
> > +	struct power_off_handler_block __rcu *next;
> > +	int priority;
> > +};
> > +
> > +int register_power_off_handler(struct power_off_handler_block *);
> > +int devm_register_power_off_handler(struct device *,
> > +				    struct power_off_handler_block *);
> > +int register_power_off_handler_simple(void (*function)(void), int priority);
> > +int unregister_power_off_handler(struct power_off_handler_block *);
> > +void do_kernel_power_off(void);
> > +bool have_kernel_power_off(void);
> > +
> > +/*
> > + * Pre-defined power-off handler priorities
> > + */
> > +#define POWER_OFF_PRIORITY_FALLBACK	0
> > +#define POWER_OFF_PRIORITY_LOW		64
> > +#define POWER_OFF_PRIORITY_DEFAULT	128
> > +#define POWER_OFF_PRIORITY_HIGH		192
> > +#define POWER_OFF_PRIORITY_HIGHEST	255
> 
> I'm not sure why we need these gaps in the priority space.
> 
> I guess it might be possible to use
> 
> enum power_off_priority {
> 	POWER_OFF_PRIORITY_FALLBACK = 0,
> 	POWER_OFF_PRIORITY_LOW,
> 	POWER_OFF_PRIORITY_DEFAULT,
> 	POWER_OFF_PRIORITY_HIGH,
> 	POWER_OFF_PRIORITY_HIGHEST,
> 	POWER_OFF_PRIORITY_LIMIT,
> };

I retained the large number space on purpose, specifically to permit in-between
priorities. In other words, I want people to be able to say "priority for this
handler is higher than low but lower than default". After all, the defines were
intended as hints, not as a "Thou shall use those and only those priorities".
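
To make that concrete, here is a small sketch of what an in-between
priority could look like. The POWER_OFF_PRIORITY_* values are copied from
the patch. BOARD_PMIC_PRIORITY and runs_before() are hypothetical names
used only for this example:

```c
/* Values as defined in the include/linux/pm.h hunk of this patch. */
#define POWER_OFF_PRIORITY_FALLBACK	0
#define POWER_OFF_PRIORITY_LOW		64
#define POWER_OFF_PRIORITY_DEFAULT	128
#define POWER_OFF_PRIORITY_HIGH		192
#define POWER_OFF_PRIORITY_HIGHEST	255

/* Hypothetical handler that should be tried after the default handler
 * but before any low-priority one. */
#define BOARD_PMIC_PRIORITY	(POWER_OFF_PRIORITY_LOW + 32)

/* Nonzero if a handler with priority a is tried before one with
 * priority b, assuming higher priorities are tried first. */
static int runs_before(int a, int b)
{
	return a > b;
}
```

With a dense enum there would be no value available between LOW and
DEFAULT for such a handler.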

Having said that, the important part is to get the series accepted, so I won't
argue if that is what it takes to get an Ack. Let me know.

Thanks,
Guenter

> 
> and then make register_ complain if priority is POWER_OFF_PRIORITY_LIMIT
> or greater.
> 
> But I'm OK with the rest, so if no one else sees a problem here,
> I'm not going to make a fuss about it.
> 
> Rafael
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Nov. 6, 2014, 10:30 p.m. UTC | #2
On Thursday, November 06, 2014 08:42:45 AM Guenter Roeck wrote:
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index 383fd68..a4d6bf8 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -35,6 +35,34 @@ extern void (*pm_power_off)(void);
>  extern void (*pm_power_off_prepare)(void);
>  
>  struct device; /* we have a circular dep with device.h */
> +
> +/*
> + * Data structures and callbacks to manage power-off handlers
> + */
> +
> +struct power_off_handler_block {
> +	void (*handler)(struct power_off_handler_block *);
> +	struct power_off_handler_block __rcu *next;
> +	int priority;
> +};
> +
> +int register_power_off_handler(struct power_off_handler_block *);
> +int devm_register_power_off_handler(struct device *,
> +				    struct power_off_handler_block *);
> +int register_power_off_handler_simple(void (*function)(void), int priority);
> +int unregister_power_off_handler(struct power_off_handler_block *);
> +void do_kernel_power_off(void);
> +bool have_kernel_power_off(void);
> +
> +/*
> + * Pre-defined power-off handler priorities
> + */
> +#define POWER_OFF_PRIORITY_FALLBACK	0
> +#define POWER_OFF_PRIORITY_LOW		64
> +#define POWER_OFF_PRIORITY_DEFAULT	128
> +#define POWER_OFF_PRIORITY_HIGH		192
> +#define POWER_OFF_PRIORITY_HIGHEST	255

I'm not sure why we need these gaps in the priority space.

I guess it might be possible to use

enum power_off_priority {
	POWER_OFF_PRIORITY_FALLBACK = 0,
	POWER_OFF_PRIORITY_LOW,
	POWER_OFF_PRIORITY_DEFAULT,
	POWER_OFF_PRIORITY_HIGH,
	POWER_OFF_PRIORITY_HIGHEST,
	POWER_OFF_PRIORITY_LIMIT,
};

and then make register_ complain if priority is POWER_OFF_PRIORITY_LIMIT
or greater.
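
A minimal sketch of such a check, written as stand-alone user-space C. The
helper name check_power_off_priority() and the -EINVAL return value are
assumptions made for the example, as is the rejection of negative values.
The only requirement stated above is to complain at
POWER_OFF_PRIORITY_LIMIT or greater:

```c
#include <errno.h>

enum power_off_priority {
	POWER_OFF_PRIORITY_FALLBACK = 0,
	POWER_OFF_PRIORITY_LOW,
	POWER_OFF_PRIORITY_DEFAULT,
	POWER_OFF_PRIORITY_HIGH,
	POWER_OFF_PRIORITY_HIGHEST,
	POWER_OFF_PRIORITY_LIMIT,
};

/* Hypothetical validation a register function could run first.
 * Returns 0 for a known priority, -EINVAL otherwise. */
static int check_power_off_priority(int priority)
{
	if (priority < POWER_OFF_PRIORITY_FALLBACK ||
	    priority >= POWER_OFF_PRIORITY_LIMIT)
		return -EINVAL;
	return 0;
}
```

With this scheme the old numeric values such as 128 or 255 would be
rejected, so every caller has to pick one of the five named levels.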

But I'm OK with the rest, so if no one else sees a problem here,
I'm not going to make a fuss about it.

Rafael

Rafael J. Wysocki Nov. 7, 2014, 12:16 a.m. UTC | #3
On Thursday, November 06, 2014 02:27:03 PM Guenter Roeck wrote:
> On Thu, Nov 06, 2014 at 11:30:59PM +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 06, 2014 08:42:45 AM Guenter Roeck wrote:
> > > diff --git a/include/linux/pm.h b/include/linux/pm.h
> > > index 383fd68..a4d6bf8 100644
> > > --- a/include/linux/pm.h
> > > +++ b/include/linux/pm.h
> > > @@ -35,6 +35,34 @@ extern void (*pm_power_off)(void);
> > >  extern void (*pm_power_off_prepare)(void);
> > >  
> > >  struct device; /* we have a circular dep with device.h */
> > > +
> > > +/*
> > > + * Data structures and callbacks to manage power-off handlers
> > > + */
> > > +
> > > +struct power_off_handler_block {
> > > +	void (*handler)(struct power_off_handler_block *);
> > > +	struct power_off_handler_block __rcu *next;
> > > +	int priority;
> > > +};
> > > +
> > > +int register_power_off_handler(struct power_off_handler_block *);
> > > +int devm_register_power_off_handler(struct device *,
> > > +				    struct power_off_handler_block *);
> > > +int register_power_off_handler_simple(void (*function)(void), int priority);
> > > +int unregister_power_off_handler(struct power_off_handler_block *);
> > > +void do_kernel_power_off(void);
> > > +bool have_kernel_power_off(void);
> > > +
> > > +/*
> > > + * Pre-defined power-off handler priorities
> > > + */
> > > +#define POWER_OFF_PRIORITY_FALLBACK	0
> > > +#define POWER_OFF_PRIORITY_LOW		64
> > > +#define POWER_OFF_PRIORITY_DEFAULT	128
> > > +#define POWER_OFF_PRIORITY_HIGH		192
> > > +#define POWER_OFF_PRIORITY_HIGHEST	255
> > 
> > I'm not sure why we need these gaps in the priority space.
> > 
> > I guess it might be possible to use
> > 
> > enum power_off_priority {
> > 	POWER_OFF_PRIORITY_FALLBACK = 0,
> > 	POWER_OFF_PRIORITY_LOW,
> > 	POWER_OFF_PRIORITY_DEFAULT,
> > 	POWER_OFF_PRIORITY_HIGH,
> > 	POWER_OFF_PRIORITY_HIGHEST,
> > 	POWER_OFF_PRIORITY_LIMIT,
> > };
> 
> I retained the large number space on purpose, specifically to permit in-between
> priorities. In other words, I want people to be able to say "priority for this
> handler is higher than low but lower than default". After all, the defines were
> intended as hints, not as a "Thou shall use those and only those priorities".

Problem with that is how they are supposed to know what priority to use then.

How do I know if my priority is between DEFAULT and HIGH and whether it is
closer to HIGH or closer to DEFAULT?  What are the rules?

The only rule that seems to be there is "this handler should be tried before
that one, so it needs to have a higher priority".  But now the question is
how people are going to know which handlers they are competing with and whether
or not they are more "important".

> Having said that, the important part is to get the series accepted, so I won't
> argue if that is what it takes to get an Ack. Let me know.

This isn't worth fighting over in my view, so I won't if everyone else is fine
with it.

Just feel free to ignore this concern if you don't think it is valid.

Rafael

Guenter Roeck Nov. 7, 2014, 3 a.m. UTC | #4
On 11/06/2014 04:16 PM, Rafael J. Wysocki wrote:
> On Thursday, November 06, 2014 02:27:03 PM Guenter Roeck wrote:
>> On Thu, Nov 06, 2014 at 11:30:59PM +0100, Rafael J. Wysocki wrote:
>>> On Thursday, November 06, 2014 08:42:45 AM Guenter Roeck wrote:
>>>> diff --git a/include/linux/pm.h b/include/linux/pm.h
>>>> index 383fd68..a4d6bf8 100644
>>>> --- a/include/linux/pm.h
>>>> +++ b/include/linux/pm.h
>>>> @@ -35,6 +35,34 @@ extern void (*pm_power_off)(void);
>>>>   extern void (*pm_power_off_prepare)(void);
>>>>
>>>>   struct device; /* we have a circular dep with device.h */
>>>> +
>>>> +/*
>>>> + * Data structures and callbacks to manage power-off handlers
>>>> + */
>>>> +
>>>> +struct power_off_handler_block {
>>>> +	void (*handler)(struct power_off_handler_block *);
>>>> +	struct power_off_handler_block __rcu *next;
>>>> +	int priority;
>>>> +};
>>>> +
>>>> +int register_power_off_handler(struct power_off_handler_block *);
>>>> +int devm_register_power_off_handler(struct device *,
>>>> +				    struct power_off_handler_block *);
>>>> +int register_power_off_handler_simple(void (*function)(void), int priority);
>>>> +int unregister_power_off_handler(struct power_off_handler_block *);
>>>> +void do_kernel_power_off(void);
>>>> +bool have_kernel_power_off(void);
>>>> +
>>>> +/*
>>>> + * Pre-defined power-off handler priorities
>>>> + */
>>>> +#define POWER_OFF_PRIORITY_FALLBACK	0
>>>> +#define POWER_OFF_PRIORITY_LOW		64
>>>> +#define POWER_OFF_PRIORITY_DEFAULT	128
>>>> +#define POWER_OFF_PRIORITY_HIGH		192
>>>> +#define POWER_OFF_PRIORITY_HIGHEST	255
>>>
>>> I'm not sure why we need these gaps in the priority space.
>>>
>>> I guess it might be possible to use
>>>
>>> enum power_off_priority {
>>> 	POWER_OFF_PRIORITY_FALLBACK = 0,
>>> 	POWER_OFF_PRIORITY_LOW,
>>> 	POWER_OFF_PRIORITY_DEFAULT,
>>> 	POWER_OFF_PRIORITY_HIGH,
>>> 	POWER_OFF_PRIORITY_HIGHEST,
>>> 	POWER_OFF_PRIORITY_LIMIT,
>>> };
>>
>> I retained the large number space on purpose, specifically to permit in-between
>> priorities. In other words, I want people to be able to say "priority for this
>> handler is higher than low but lower than default". After all, the defines were
>> intended as hints, not as a "Thou shall use those and only those priorities".
>
> Problem with that is how they are supposed to know what priority to use then.
>
> How do I know if my priority is between DEFAULT and HIGH and whether it is
> closer to HIGH or closer to DEFAULT?  What are the rules?
>
Guess there is too much of a Libertarian in me to make up such rules.
Or, in other words, I didn't think that more explicit rules than the ones
provided were needed.

> The only rule that seems to be there is "this handler should be tried before
> that one, so it needs to have a higher priority".  But now the question is
> how people are going to know which handlers they are competing with and whether
> or not they are more "important".
>
Keep in mind those are power-off handlers. The "rule", if one is needed,
would be that the power-off handler which powers off more parts of the system
should get higher priority. Which one that is depends on the platform.
I would think that it is in people's interest not to shoot themselves
in the foot, but maybe I am wrong.

>> Having said that, the important part is to get the series accepted, so I won't
>> argue if that is what it takes to get an Ack. Let me know.
>
> This isn't worth fighting over in my view, so I won't if everyone else is fine
> with it.
>

Linus raised pretty much the same or a similar concern. Everyone else, as far
as I can see, doesn't seem to care. Given that, and since I don't have a strong
opinion either, I'll change it to an enum in the next version. If there is anyone
who disagrees with that, the time to speak up is now.

Please let me know if you have any other concerns.

Thanks,
Guenter

Patch

diff --git a/include/linux/pm.h b/include/linux/pm.h
index 383fd68..a4d6bf8 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -35,6 +35,34 @@  extern void (*pm_power_off)(void);
 extern void (*pm_power_off_prepare)(void);
 
 struct device; /* we have a circular dep with device.h */
+
+/*
+ * Data structures and callbacks to manage power-off handlers
+ */
+
+struct power_off_handler_block {
+	void (*handler)(struct power_off_handler_block *);
+	struct power_off_handler_block __rcu *next;
+	int priority;
+};
+
+int register_power_off_handler(struct power_off_handler_block *);
+int devm_register_power_off_handler(struct device *,
+				    struct power_off_handler_block *);
+int register_power_off_handler_simple(void (*function)(void), int priority);
+int unregister_power_off_handler(struct power_off_handler_block *);
+void do_kernel_power_off(void);
+bool have_kernel_power_off(void);
+
+/*
+ * Pre-defined power-off handler priorities
+ */
+#define POWER_OFF_PRIORITY_FALLBACK	0
+#define POWER_OFF_PRIORITY_LOW		64
+#define POWER_OFF_PRIORITY_DEFAULT	128
+#define POWER_OFF_PRIORITY_HIGH		192
+#define POWER_OFF_PRIORITY_HIGHEST	255
+
 #ifdef CONFIG_VT_CONSOLE_SLEEP
 extern void pm_vt_switch_required(struct device *dev, bool required);
 extern void pm_vt_switch_unregister(struct device *dev);
diff --git a/kernel/power/Makefile b/kernel/power/Makefile
index 29472bf..567eda5 100644
--- a/kernel/power/Makefile
+++ b/kernel/power/Makefile
@@ -2,6 +2,7 @@ 
 ccflags-$(CONFIG_PM_DEBUG)	:= -DDEBUG
 
 obj-y				+= qos.o
+obj-y				+= power_off_handler.o
 obj-$(CONFIG_PM)		+= main.o
 obj-$(CONFIG_VT_CONSOLE_SLEEP)	+= console.o
 obj-$(CONFIG_FREEZER)		+= process.o
diff --git a/kernel/power/power_off_handler.c b/kernel/power/power_off_handler.c
new file mode 100644
index 0000000..e576534
--- /dev/null
+++ b/kernel/power/power_off_handler.c
@@ -0,0 +1,293 @@ 
+/*
+ * kernel/power/power_off_handler.c - Power-off handling functions
+ *
+ * Copyright (c) 2014 Guenter Roeck
+ *
+ * List management code derived from kernel/notifier.c.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt)	"power-off: " fmt
+
+#include <linux/ctype.h>
+#include <linux/device.h>
+#include <linux/export.h>
+#include <linux/kallsyms.h>
+#include <linux/pm.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+/*
+ * List of handlers for kernel code which wants to be called
+ * to power off the system.
+ */
+static struct power_off_handler_block __rcu *power_off_handler_list;
+static DEFINE_SPINLOCK(power_off_handler_lock);
+
+/*
+ * Internal function to register power-off handler.
+ * Must be called with power-off spinlock acquired.
+ */
+static void _register_power_off_handler(struct power_off_handler_block *p)
+{
+	struct power_off_handler_block **pl = &power_off_handler_list;
+
+	while ((*pl) != NULL) {
+		if (p->priority > (*pl)->priority)
+			break;
+		pl = &((*pl)->next);
+	}
+	p->next = *pl;
+	rcu_assign_pointer(*pl, p);
+}
+
+/*
+ * Internal function to unregister a power-off handler.
+ * Must be called with power-off spinlock acquired.
+ */
+static int _unregister_power_off_handler(struct power_off_handler_block *p)
+{
+	struct power_off_handler_block **pl = &power_off_handler_list;
+
+	while ((*pl) != NULL) {
+		if ((*pl) == p) {
+			rcu_assign_pointer(*pl, p->next);
+			return 0;
+		}
+		pl = &((*pl)->next);
+	}
+	return -ENOENT;
+}
+
+/**
+ *	register_power_off_handler - Register function to be called to power off
+ *				     the system
+ *	@pb: Info about handler function to be called
+ *	@pb->priority:	Handler priority. Handlers should follow the
+ *			following guidelines for setting priorities.
+ *			0:	Power-off handler of last resort,
+ *				with limited power-off capabilities,
+ *				such as power-off handlers which
+ *				do not really power off the system
+ *				but loop forever or stop the CPU.
+ *			128:	Default power-off handler; use if no other
+ *				power-off handler is expected to be available,
+ *				and/or if power-off functionality is
+ *				sufficient to power off the entire system
+ *			255:	Highest priority power-off handler, will
+ *				preempt all other power-off handlers
+ *
+ *	Registers a function with code to be called to power off the
+ *	system.
+ *
+ *	Registered functions will be called from machine_power_off as the last
+ *	step of the power-off sequence. Registered functions are expected
+ *	to power off the system immediately. If more than one function is
+ *	registered, the power-off handler priority selects which function
+ *	will be called first.
+ *
+ *	Power-off handlers may be registered from architecture code or from
+ *	drivers. A typical use case would be a system where power off
+ *	functionality is provided through a multi-function chip or through
+ *	a programmable power controller. Multiple power-off handlers may exist;
+ *	for example, one power-off handler might power off the entire system,
+ *	while another only powers off the CPU card. In such cases, the
+ *	power-off handler which only powers off part of the hardware is
+ *	expected to register with low priority to ensure that it only
+ *	runs if no other means to power off the system are available.
+ *
+ *	Always returns zero.
+ */
+int register_power_off_handler(struct power_off_handler_block *pb)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&power_off_handler_lock, flags);
+	_register_power_off_handler(pb);
+	spin_unlock_irqrestore(&power_off_handler_lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL(register_power_off_handler);
+
+/**
+ *	unregister_power_off_handler - Unregister previously registered
+ *				       power-off handler
+ *	@pb: Hook to be unregistered
+ *
+ *	Unregisters a previously registered power-off handler function.
+ *
+ *	Returns zero on success, or %-ENOENT on failure.
+ */
+int unregister_power_off_handler(struct power_off_handler_block *pb)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&power_off_handler_lock, flags);
+	ret = _unregister_power_off_handler(pb);
+	spin_unlock_irqrestore(&power_off_handler_lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL(unregister_power_off_handler);
+
+struct _power_off_handler_data {
+	void (*handler)(void);
+	struct power_off_handler_block power_off_hb;
+};
+
+static void _power_off_handler(struct power_off_handler_block *this)
+{
+	struct _power_off_handler_data *poh =
+		container_of(this, struct _power_off_handler_data,
+			     power_off_hb);
+
+	poh->handler();
+}
+
+static struct _power_off_handler_data power_off_handler_data;
+
+/**
+ *	register_power_off_handler_simple - Register function to be called
+ *					    to power off the system
+ *	@handler:	Function to be called to power off the system
+ *	@priority:	Handler priority. For priority guidelines see
+ *			register_power_off_handler.
+ *
+ *	This is a simplified version of register_power_off_handler. It does
+ *	not take a power-off handler as argument, but a function pointer.
+ *	The function registers a power-off handler with specified priority.
+ *	Power-off handlers registered with this function cannot be
+ *	unregistered, and only a single power-off handler can be installed
+ *	using it.
+ *
+ *	This function must not be called from modules and is therefore
+ *	not exported.
+ *
+ *	Returns %-EBUSY if a power-off handler has already been registered
+ *	using register_power_off_handler_simple. Otherwise returns zero.
+ */
+int register_power_off_handler_simple(void (*handler)(void), int priority)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	spin_lock_irqsave(&power_off_handler_lock, flags);
+
+	if (power_off_handler_data.handler) {
+		pr_warn("Power-off function already registered (%ps), cannot register %ps\n",
+			power_off_handler_data.handler, handler);
+		ret = -EBUSY;
+		goto abort;
+	}
+
+	power_off_handler_data.handler = handler;
+	power_off_handler_data.power_off_hb.handler = _power_off_handler;
+	power_off_handler_data.power_off_hb.priority = priority;
+
+	_register_power_off_handler(&power_off_handler_data.power_off_hb);
+abort:
+	spin_unlock_irqrestore(&power_off_handler_lock, flags);
+	return ret;
+}
+
+/* Device managed power-off handler registration */
+
+static void devm_power_off_release(struct device *dev, void *res)
+{
+	struct power_off_handler_block *hb;
+
+	hb = *(struct power_off_handler_block **)res;
+	unregister_power_off_handler(hb);
+}
+
+/**
+ *	devm_register_power_off_handler - Register function to be called
+ *					  to power off the system
+ *	@dev:		The device registering the power-off handler.
+ *	@hb:		Power-off handler block to be registered.
+ *			For priority guidelines see
+ *			register_power_off_handler.
+ *
+ *	This is the device managed version of register_power_off_handler.
+ *
+ *	Returns %-EINVAL if dev is NULL. Returns %-ENOMEM if the system is
+ *	out of memory. Otherwise returns zero.
+ */
+int devm_register_power_off_handler(struct device *dev,
+				    struct power_off_handler_block *hb)
+{
+	struct power_off_handler_block **ptr;
+	unsigned long flags;
+
+	if (!dev)
+		return -EINVAL;
+
+	ptr = devres_alloc(devm_power_off_release, sizeof(*ptr), GFP_KERNEL);
+	if (!ptr)
+		return -ENOMEM;
+
+	spin_lock_irqsave(&power_off_handler_lock, flags);
+	_register_power_off_handler(hb);
+	spin_unlock_irqrestore(&power_off_handler_lock, flags);
+
+	*ptr = hb;
+	devres_add(dev, ptr);
+	return 0;
+}
+EXPORT_SYMBOL(devm_register_power_off_handler);
+
+/**
+ *	do_kernel_power_off - Execute kernel power-off handler call chain
+ *
+ *	Calls functions registered with register_power_off_handler.
+ *
+ *	Expected to be called from machine_power_off as the last step of
+ *	the power-off sequence.
+ *
+ *	Powers the system off immediately if a power-off handler function
+ *	has been registered. Otherwise does nothing.
+ */
+void do_kernel_power_off(void)
+{
+	struct power_off_handler_block *p, *next_p;
+
+	/*
+	 * No locking. This code can be called from both atomic and non-atomic
+	 * context, with interrupts enabled or disabled, depending on the
+	 * architecture and platform.
+	 *
+	 * Power-off handler registration and de-registration are executed in
+	 * atomic context with interrupts disabled, which guarantees that
+	 * do_kernel_power_off() will not be called while a power-off handler
+	 * is installed or removed.
+	 * There is a theoretical risk that a power-off handler is installed or
+	 * removed while the call chain is traversed, but we'll have to accept
+	 * that risk.
+	 */
+
+	p = rcu_dereference_raw(power_off_handler_list);
+	while (p) {
+		next_p = rcu_dereference_raw(p->next);
+		p->handler(p);
+		p = next_p;
+	}
+}
+
+/**
+ * have_kernel_power_off() - Check if kernel power-off handler is available
+ *
+ * Returns true if a kernel power-off handler is available, false otherwise.
+ */
+bool have_kernel_power_off(void)
+{
+	return pm_power_off != NULL ||
+			rcu_dereference_raw(power_off_handler_list) != NULL;
+}
+EXPORT_SYMBOL(have_kernel_power_off);
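
For illustration only, the priority-ordered insertion and the chain
traversal above can be modelled in user space roughly as follows. This is
a minimal sketch, not part of the patch: it drops RCU and the spinlock, and
all names (struct handler_block, register_handler, unregister_handler,
run_chain, record_call) are hypothetical stand-ins for the kernel symbols.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for struct power_off_handler_block. */
struct handler_block {
	void (*handler)(struct handler_block *);
	struct handler_block *next;
	int priority;
};

static struct handler_block *handler_list;

/* Insert p in front of the first entry with a strictly lower priority. */
static void register_handler(struct handler_block *p)
{
	struct handler_block **pl = &handler_list;

	while (*pl != NULL && p->priority <= (*pl)->priority)
		pl = &(*pl)->next;
	p->next = *pl;
	*pl = p;
}

/* Unlink p; returns 0 on success, -1 if p was not on the list. */
static int unregister_handler(struct handler_block *p)
{
	struct handler_block **pl;

	for (pl = &handler_list; *pl != NULL; pl = &(*pl)->next) {
		if (*pl == p) {
			*pl = p->next;
			return 0;
		}
	}
	return -1;
}

/* Call every handler on the list, highest priority first. */
static void run_chain(void)
{
	struct handler_block *p = handler_list, *next_p;

	while (p) {
		next_p = p->next;	/* fetched before the call, as in the patch */
		p->handler(p);
		p = next_p;
	}
}

/* Demo handler: records the priority of each block as it is called. */
static int call_order[8];
static int call_count;

static void record_call(struct handler_block *this)
{
	if (call_count < 8)
		call_order[call_count++] = this->priority;
}
```

Registering blocks with priorities 128, 0 and 255 (in that order) and then
calling run_chain() invokes them in the order 255, 128, 0. A real handler
would not return after a successful power-off, so in practice later entries
run only if earlier ones fail to remove power.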