Message ID | 20141024195532.GF19377@localhost (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Oct 24, 2014 at 09:55:32PM +0200, Johan Hovold wrote: > On Fri, Oct 24, 2014 at 02:44:42PM -0500, Felipe Balbi wrote: > > On Fri, Oct 24, 2014 at 09:36:55PM +0200, Johan Hovold wrote: > > > On Fri, Oct 24, 2014 at 02:29:48PM -0500, Felipe Balbi wrote: > > > > Hi, > > > > > > > > On Fri, Oct 24, 2014 at 02:25:40PM -0500, Felipe Balbi wrote: > > > > > > with this I always get to "Power off failed -- system halted". If I > > > > > switch to v3.18-rc1 vanilla, then it works. So it's definitely caused by > > > > > your rtc-only patches. > > > > > > That's expected (see below). It works with v3.18-rc1 vanilla because > > > machine_halt is called instead of machine_power_off as there is no > > > registered power-off handler. > > > > yeah, that much I figured :-) > > > > > > ok, so it seems like it takes more than 1 second for things to > > > > propagate. If I increase that mdelay() to 3000, then everything works > > > > fine on my end. I think we should get RMK's input on this 3000ms delay > > > > to machine_power_off(). Should it be generic, or should we add it to our > > > > rtc pm_power_off implementation ? > > > > > > As I wrote above, we still need a 2-second mdelay in rtc-omap, which I > > > intend to add to the pmic_power_en patch. > > > > oh, alright then. If you can Cc me, I'll make sure to test that too ;-) > > I will. :) Just wanted to see whether Andrew preferred I resend the > whole series or just that one patch first. > > The diff is minimal: > > diff --git a/drivers/rtc/rtc-omap.c b/drivers/rtc/rtc-omap.c > index e74750f00b18..e4f97ad9eb21 100644 > --- a/drivers/rtc/rtc-omap.c > +++ b/drivers/rtc/rtc-omap.c > @@ -423,6 +423,8 @@ static void omap_rtc_power_off(void) > val = rtc_read(rtc, OMAP_RTC_INTERRUPTS_REG); > rtc_writel(rtc, OMAP_RTC_INTERRUPTS_REG, > val | OMAP_RTC_INTERRUPTS_IT_ALARM2); > + > + mdelay(2000); > } you can add my: Tested-by: Felipe Balbi <balbi@ti.com> for this too. Now it's all good.
On Fri, 24 Oct 2014 21:55:32 +0200 Johan Hovold <johan@kernel.org> wrote: > On Fri, Oct 24, 2014 at 02:44:42PM -0500, Felipe Balbi wrote: > > On Fri, Oct 24, 2014 at 09:36:55PM +0200, Johan Hovold wrote: > > > On Fri, Oct 24, 2014 at 02:29:48PM -0500, Felipe Balbi wrote: > > > > Hi, > > > > > > > > On Fri, Oct 24, 2014 at 02:25:40PM -0500, Felipe Balbi wrote: > > > > > > with this I always get to "Power off failed -- system halted". If I > > > > > switch to v3.18-rc1 vanilla, then it works. So it's definitely caused by > > > > > your rtc-only patches. > > > > > > That's expected (see below). It works with v3.18-rc1 vanilla because > > > machine_halt is called instead of machine_power_off as there is no > > > registered power-off handler. > > > > yeah, that much I figured :-) > > > > > > ok, so it seems like it takes more than 1 second for things to > > > > propagate. If I increase that mdelay() to 3000, then everything works > > > > fine on my end. I think we should get RMK's input on this 3000ms delay > > > > to machine_power_off(). Should it be generic, or should we add it to our > > > > rtc pm_power_off implementation ? > > > > > > As I wrote above, we still need a 2-second mdelay in rtc-omap, which I > > > intend to add to the pmic_power_en patch. > > > > oh, alright then. If you can Cc me, I'll make sure to test that too ;-) > > I will. :) Just wanted to see whether Andrew preferred I resend the > whole series or just that one patch first. > > The diff is minimal: > > diff --git a/drivers/rtc/rtc-omap.c b/drivers/rtc/rtc-omap.c > index e74750f00b18..e4f97ad9eb21 100644 > --- a/drivers/rtc/rtc-omap.c > +++ b/drivers/rtc/rtc-omap.c > @@ -423,6 +423,8 @@ static void omap_rtc_power_off(void) > val = rtc_read(rtc, OMAP_RTC_INTERRUPTS_REG); > rtc_writel(rtc, OMAP_RTC_INTERRUPTS_REG, > val | OMAP_RTC_INTERRUPTS_IT_ALARM2); > + > + mdelay(2000); > } Yes, having read this threadlet: we need a very good comment in there explaining what's going on, please. Do we even need this delay on anything other than arm? Or even on all arm?
On Mon, Oct 27, 2014 at 04:22:51PM -0700, Andrew Morton wrote: > On Fri, 24 Oct 2014 21:55:32 +0200 Johan Hovold <johan@kernel.org> wrote: > > I will. :) Just wanted to see whether Andrew preferred I resend the > > whole series or just that one patch first. > > > > The diff is minimal: > > > > diff --git a/drivers/rtc/rtc-omap.c b/drivers/rtc/rtc-omap.c > > index e74750f00b18..e4f97ad9eb21 100644 > > --- a/drivers/rtc/rtc-omap.c > > +++ b/drivers/rtc/rtc-omap.c > > @@ -423,6 +423,8 @@ static void omap_rtc_power_off(void) > > val = rtc_read(rtc, OMAP_RTC_INTERRUPTS_REG); > > rtc_writel(rtc, OMAP_RTC_INTERRUPTS_REG, > > val | OMAP_RTC_INTERRUPTS_IT_ALARM2); > > + > > + mdelay(2000); > > } > > Yes, having read this threadlet: we need a very good comment in there > explaining what's going on, please. > > Do we even need this delay on anything other than arm? Or even on all arm? I think I've already commented on the behaviour of the reboot syscalls such as power off which can return to userspace, pointing out that x86 can return to userspace. As long as x86 can return to userspace, I see no harm in ARM returning to userspace. If a driver which is hooking into the power off stuff is unable to immediately shut off the power (wtf it can't for 2 sec I've no idea) then having that driver work around that hardware's specific brokenness with a delay seems entirely reasonable. That allows those SoCs which can do the right thing to do the right thing without being hindered by such silliness. And it also stops the next person coming along and bumping the delay from 2 to 3, to 5, and then what... 10 seconds? Keeping it in the driver means that the workaround for the broken hardware is kept with the driver for the broken hardware - exactly where it should be.
On Tue, Oct 28, 2014 at 12:25:52AM +0000, Russell King - ARM Linux wrote: > On Mon, Oct 27, 2014 at 04:22:51PM -0700, Andrew Morton wrote: > > On Fri, 24 Oct 2014 21:55:32 +0200 Johan Hovold <johan@kernel.org> wrote: > > > I will. :) Just wanted to see whether Andrew preferred I resend the > > > whole series or just that one patch first. > > > > > > The diff is minimal: > > > > > > diff --git a/drivers/rtc/rtc-omap.c b/drivers/rtc/rtc-omap.c > > > index e74750f00b18..e4f97ad9eb21 100644 > > > --- a/drivers/rtc/rtc-omap.c > > > +++ b/drivers/rtc/rtc-omap.c > > > @@ -423,6 +423,8 @@ static void omap_rtc_power_off(void) > > > val = rtc_read(rtc, OMAP_RTC_INTERRUPTS_REG); > > > rtc_writel(rtc, OMAP_RTC_INTERRUPTS_REG, > > > val | OMAP_RTC_INTERRUPTS_IT_ALARM2); > > > + > > > + mdelay(2000); > > > } > > > > Yes, having read this threadlet: we need a very good comment in there > > explaining what's going on, please. > > > > Do we even need this delay on anything other than arm? Or even on all arm? > > I think I've already commented on the behaviour of the reboot syscalls > such as power off which can return to userspace, pointing out that > x86 can return to userspace. > > As long as x86 can return to userspace, I see no harm in ARM returning > to userspace. If a driver which is hooking into the power off stuff > is unable to immediately shut off the power (wtf it can't for 2 sec > I've no idea) then having that driver work around that hardware's > specific brokenness with a delay seems entirely reasonable. Yeah, there are two issues here. If a power-off handler is crazy slow there really should be a delay in the handler. That was just an oversight on my part. [ In this case it takes between one and two seconds due to the resolution of the rtc and they way it's alarm events are triggered. ] The other issue is whether arch code should inform the user about failed power-off, in really exactly the same way as it does for failed reboot, see: ac15e00b1efe ("ARM: restart: move reboot failure handing into machine_restart()" by Russell. It looks like we're soon to be having power-off call chains, with configurable priorities, to shut of various parts of the hardware and this is all at least partly configurable through DT. [1] I think it's reasonable to expect to see more frequent failures to power off either due to (DT) misconfiguration or broken or flakey hardware. Having a short delay (I proposed 1s as for reboot) would also prevent any oopses when returning to user-space for just quite slow devices (e.g. millisecond range) without requiring explicit delays in these handlers. But as Andrew points out above, this really isn't an arm-specific issue, and currently various arches deal with this differently, where some return to user-space, some spin indefinitely (without an error message), and some spin on failed reboot but not power-off (e.g. arm and arm64). > That allows those SoCs which can do the right thing to do the right > thing without being hindered by such silliness. And it also stops > the next person coming along and bumping the delay from 2 to 3, to 5, > and then what... 10 seconds? That wouldn't be an issue then. Arch code would only handle the non-crazy case and complete power-off failures. > Keeping it in the driver means that the workaround for the broken > hardware is kept with the driver for the broken hardware - exactly > where it should be. Agreed. Johan [1] https://lkml.org/lkml/2014/10/27/506
On Tue, Oct 28, 2014 at 09:16:16AM +0100, Johan Hovold wrote: > It looks like we're soon to be having power-off call chains, with > configurable priorities, to shut of various parts of the hardware I really hope that they *don't* get used like that. I guess this is what happens when people don't read the code before they decide to implement something. All the reboot/power off/halt methods already call into the driver model, and the driver model then iterates over all bound drivers calling their "shutdown" method. So, today, as it has been for years, all device drivers are notified of system power off. That's where most device drivers should be cleanly stopping their hardware. The only thing which is left is the actual hardware triggering of the action, and that should be what's done via the notifier.
On Tue, Oct 28, 2014 at 08:47:46AM +0000, Russell King - ARM Linux wrote: > On Tue, Oct 28, 2014 at 09:16:16AM +0100, Johan Hovold wrote: > > It looks like we're soon to be having power-off call chains, with > > configurable priorities, to shut of various parts of the hardware > > I really hope that they *don't* get used like that. I guess this is > what happens when people don't read the code before they decide to > implement something. > > All the reboot/power off/halt methods already call into the driver model, > and the driver model then iterates over all bound drivers calling their > "shutdown" method. So, today, as it has been for years, all device > drivers are notified of system power off. > > That's where most device drivers should be cleanly stopping their > hardware. > > The only thing which is left is the actual hardware triggering of the > action, and that should be what's done via the notifier. That's not what I was trying to refer to. But the patch set explicitly allows for multiple, prioritised power-off handlers, which can power off a board in different ways and with various degrees of success. Specifically, it allows for fallback handlers in case one or more power-off handlers fail. So if we allow for that, what is to prevent the final power-off handler from failing? And should this not be logged by arch code in the same way as failure to restart is? Johan
On Tue, Oct 28, 2014 at 02:12:57PM +0100, Johan Hovold wrote: > That's not what I was trying to refer to. But the patch set explicitly > allows for multiple, prioritised power-off handlers, which can power > off a board in different ways and with various degrees of success. > Specifically, it allows for fallback handlers in case one or more > power-off handlers fail. > > So if we allow for that, what is to prevent the final power-off handler > from failing? And should this not be logged by arch code in the same way > as failure to restart is? And how is that different from having a set of power-off handlers, and reporting when each individual one fails? Don't you want to know if your primary high priority reboot handler fails, just as much as you want to know if your final last-resort power-off handler fails? Or different from having no power-off handlers. Here's the x86 code: void machine_power_off(void) { machine_ops.power_off(); } struct machine_ops machine_ops = { .power_off = native_machine_power_off, ... static void native_machine_power_off(void) { if (pm_power_off) { if (!reboot_force) machine_shutdown(); pm_power_off(); } /* A fallback in case there is no PM info available */ tboot_shutdown(TB_SHUTDOWN_HALT); } void tboot_shutdown(u32 shutdown_type) { void (*shutdown)(void); if (!tboot_enabled()) return; See - x86 can very well just fall straight back out of machine_power_off() if there's no pm_power_off() hook and tboot is not enabled.
On Tue, Oct 28, 2014 at 03:16:10PM +0000, Russell King - ARM Linux wrote: > On Tue, Oct 28, 2014 at 02:12:57PM +0100, Johan Hovold wrote: > > That's not what I was trying to refer to. But the patch set explicitly > > allows for multiple, prioritised power-off handlers, which can power > > off a board in different ways and with various degrees of success. > > Specifically, it allows for fallback handlers in case one or more > > power-off handlers fail. > > > > So if we allow for that, what is to prevent the final power-off handler > > from failing? And should this not be logged by arch code in the same way > > as failure to restart is? > > And how is that different from having a set of power-off handlers, and > reporting when each individual one fails? Don't you want to know if > your primary high priority reboot handler fails, just as much as you > want to know if your final last-resort power-off handler fails? Good point. Failed power-off should probably be logged by the power-off call chain implementation (which seems to makes notifier chains a bad fit). And what about any power-off latencies? Should this always be dealt with in the power-off handler? Again, if it's predictable and high, as in the OMAP RTC case, it should go in the handler. But what if it's just normal bus latencies (peripheral busses, i2c, or whatever people may come up with)? Should there always be a short delay before calling the next handler? > Or different from having no power-off handlers. That is actually quite different, as in that case we call machine_halt instead (via kernel_halt). > Here's the x86 code: > > void machine_power_off(void) > { > machine_ops.power_off(); > } > > struct machine_ops machine_ops = { > .power_off = native_machine_power_off, > ... > > static void native_machine_power_off(void) > { > if (pm_power_off) { > if (!reboot_force) > machine_shutdown(); > pm_power_off(); > } > /* A fallback in case there is no PM info available */ > tboot_shutdown(TB_SHUTDOWN_HALT); > } > > void tboot_shutdown(u32 shutdown_type) > { > void (*shutdown)(void); > > if (!tboot_enabled()) > return; > > See - x86 can very well just fall straight back out of machine_power_off() > if there's no pm_power_off() hook and tboot is not enabled. I never doubted that, but is the right thing to do? Not all arches do it that way. And what about the killing of init? Shall we simply consider that a systemd bug? case LINUX_REBOOT_CMD_POWER_OFF: kernel_power_off(); do_exit(0); break; If power-off fails (for whatever reason), do_exit(0) will trigger a panic when called from PID 1. Johan
Johan:. do you really plan to use this "poweroff-source" property ? As you proposed a renaming few days ago... I don't really want to waste time to propose patches to fix things incrementally and rename it if the old one is used... Romain 2014-10-29 13:34 GMT+01:00 Johan Hovold <johan@kernel.org>: > On Tue, Oct 28, 2014 at 03:16:10PM +0000, Russell King - ARM Linux wrote: >> On Tue, Oct 28, 2014 at 02:12:57PM +0100, Johan Hovold wrote: >> > That's not what I was trying to refer to. But the patch set explicitly >> > allows for multiple, prioritised power-off handlers, which can power >> > off a board in different ways and with various degrees of success. >> > Specifically, it allows for fallback handlers in case one or more >> > power-off handlers fail. >> > >> > So if we allow for that, what is to prevent the final power-off handler >> > from failing? And should this not be logged by arch code in the same way >> > as failure to restart is? >> >> And how is that different from having a set of power-off handlers, and >> reporting when each individual one fails? Don't you want to know if >> your primary high priority reboot handler fails, just as much as you >> want to know if your final last-resort power-off handler fails? > > Good point. Failed power-off should probably be logged by the power-off > call chain implementation (which seems to makes notifier chains a bad > fit). > > And what about any power-off latencies? Should this always be dealt with > in the power-off handler? > > Again, if it's predictable and high, as in the OMAP RTC case, it should > go in the handler. But what if it's just normal bus latencies > (peripheral busses, i2c, or whatever people may come up with)? > > Should there always be a short delay before calling the next handler? > >> Or different from having no power-off handlers. > > That is actually quite different, as in that case we call machine_halt > instead (via kernel_halt). > >> Here's the x86 code: >> >> void machine_power_off(void) >> { >> machine_ops.power_off(); >> } >> >> struct machine_ops machine_ops = { >> .power_off = native_machine_power_off, >> ... >> >> static void native_machine_power_off(void) >> { >> if (pm_power_off) { >> if (!reboot_force) >> machine_shutdown(); >> pm_power_off(); >> } >> /* A fallback in case there is no PM info available */ >> tboot_shutdown(TB_SHUTDOWN_HALT); >> } >> >> void tboot_shutdown(u32 shutdown_type) >> { >> void (*shutdown)(void); >> >> if (!tboot_enabled()) >> return; >> >> See - x86 can very well just fall straight back out of machine_power_off() >> if there's no pm_power_off() hook and tboot is not enabled. > > I never doubted that, but is the right thing to do? Not all arches do it > that way. > > And what about the killing of init? Shall we simply consider that a > systemd bug? > > case LINUX_REBOOT_CMD_POWER_OFF: > kernel_power_off(); > do_exit(0); > break; > > If power-off fails (for whatever reason), do_exit(0) will trigger a > panic when called from PID 1. > > Johan > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
[ Please do not top-post. ] On Wed, Oct 29, 2014 at 01:55:49PM +0100, Romain Perier wrote: > Johan:. do you really plan to use this "poweroff-source" property ? As > you proposed a renaming few days ago... > I don't really want to waste time to propose patches to fix things > incrementally and rename it if the old one is used... Why would I want to use your retracted renaming proposal for this property (i.e. "poweroff-source")? These patches use "ti,system-power-controller" and the vendor prefix will be dropped when your patches have been merged. Johan
On Wed, Oct 29, 2014 at 01:34:18PM +0100, Johan Hovold wrote: > On Tue, Oct 28, 2014 at 03:16:10PM +0000, Russell King - ARM Linux wrote: > > And how is that different from having a set of power-off handlers, and > > reporting when each individual one fails? Don't you want to know if > > your primary high priority reboot handler fails, just as much as you > > want to know if your final last-resort power-off handler fails? > > Good point. Failed power-off should probably be logged by the power-off > call chain implementation (which seems to makes notifier chains a bad > fit). > > And what about any power-off latencies? Should this always be dealt with > in the power-off handler? > > Again, if it's predictable and high, as in the OMAP RTC case, it should > go in the handler. But what if it's just normal bus latencies > (peripheral busses, i2c, or whatever people may come up with)? > > Should there always be a short delay before calling the next handler? If the handler has determined that it has failed, then why delay before trying the next handler? At the point it has decided it has failed, surely that's after it has waited sufficient time to determine that failure? > > Or different from having no power-off handlers. > > That is actually quite different, as in that case we call machine_halt > instead (via kernel_halt). Today, ARM does exactly what x86 does. If there's no power off handler registered, machine_power_off() shuts down other CPUs and returns. > > Here's the x86 code: > > > > void machine_power_off(void) > > { > > machine_ops.power_off(); > > } > > > > struct machine_ops machine_ops = { > > .power_off = native_machine_power_off, > > ... > > > > static void native_machine_power_off(void) > > { > > if (pm_power_off) { > > if (!reboot_force) > > machine_shutdown(); > > pm_power_off(); > > } > > /* A fallback in case there is no PM info available */ > > tboot_shutdown(TB_SHUTDOWN_HALT); > > } > > > > void tboot_shutdown(u32 shutdown_type) > > { > > void (*shutdown)(void); > > > > if (!tboot_enabled()) > > return; > > > > See - x86 can very well just fall straight back out of machine_power_off() > > if there's no pm_power_off() hook and tboot is not enabled. > > I never doubted that, but is the right thing to do? Not all arches do it > that way. Well, the biggest question there is: if the power off or restart syscall fails, what is the _generic_ non-architecture action which is supposed to happen? Whatever the answer is to that question, that action should be performed by the _generic_ non-architecture code, rather than having the same implementation spread across all 30 architectures which the kernel supports today. > And what about the killing of init? Shall we simply consider that a > systemd bug? > > case LINUX_REBOOT_CMD_POWER_OFF: > kernel_power_off(); > do_exit(0); > break; > > If power-off fails (for whatever reason), do_exit(0) will trigger a > panic when called from PID 1. Oh, systemd calls this from PID1? I guess that's another reason to hate systemd with vitriol. :) SysVinit and upstart implementations call it from the "halt" command, which is itself normally run from a script, which totally avoids that problem. I'm quite sure the insane systemd lobby will scream that this is a kernel bug and will want to change it somehow, just like they want to change the kernel in soo many other silly ways.
This just to be sure, nothing more. I did read that you mentionned "poweroff-source" earlier. However If I am wrong, my bad, everything is fine. 2014-10-29 14:00 GMT+01:00 Johan Hovold <johan@kernel.org>: > [ Please do not top-post. ] > > On Wed, Oct 29, 2014 at 01:55:49PM +0100, Romain Perier wrote: >> Johan:. do you really plan to use this "poweroff-source" property ? As >> you proposed a renaming few days ago... >> I don't really want to waste time to propose patches to fix things >> incrementally and rename it if the old one is used... > > Why would I want to use your retracted renaming proposal for this > property (i.e. "poweroff-source")? > > These patches use "ti,system-power-controller" and the vendor prefix > will be dropped when your patches have been merged. > > Johan
On 10/29/2014 05:34 AM, Johan Hovold wrote: > On Tue, Oct 28, 2014 at 03:16:10PM +0000, Russell King - ARM Linux wrote: >> On Tue, Oct 28, 2014 at 02:12:57PM +0100, Johan Hovold wrote: >>> That's not what I was trying to refer to. But the patch set explicitly >>> allows for multiple, prioritised power-off handlers, which can power >>> off a board in different ways and with various degrees of success. >>> Specifically, it allows for fallback handlers in case one or more >>> power-off handlers fail. >>> >>> So if we allow for that, what is to prevent the final power-off handler >>> from failing? And should this not be logged by arch code in the same way >>> as failure to restart is? >> >> And how is that different from having a set of power-off handlers, and >> reporting when each individual one fails? Don't you want to know if >> your primary high priority reboot handler fails, just as much as you >> want to know if your final last-resort power-off handler fails? > > Good point. Failed power-off should probably be logged by the power-off > call chain implementation (which seems to makes notifier chains a bad > fit). > Good that I just replaced notifier chain with an open coded implementation. Sure, that is possible, but I would prefer to do that as a follow-up commit, and it should be discussed in the context of the power-off handler patch set. > And what about any power-off latencies? Should this always be dealt with > in the power-off handler? > > Again, if it's predictable and high, as in the OMAP RTC case, it should > go in the handler. But what if it's just normal bus latencies > (peripheral busses, i2c, or whatever people may come up with)? > > Should there always be a short delay before calling the next handler? > That delay would depend on the individual power-off handler, so I think the current implementation works just fine (where power-off handlers implement the delay). We could move the delay into the infrastructure, but it would have to be configurable. I would prefer to consider that as a follow-up patch to not overload the power-off handler patch set with too many changes at the same time. >> Or different from having no power-off handlers. > > That is actually quite different, as in that case we call machine_halt > instead (via kernel_halt). > >> Here's the x86 code: >> >> void machine_power_off(void) >> { >> machine_ops.power_off(); >> } >> >> struct machine_ops machine_ops = { >> .power_off = native_machine_power_off, >> ... >> >> static void native_machine_power_off(void) >> { >> if (pm_power_off) { >> if (!reboot_force) >> machine_shutdown(); >> pm_power_off(); >> } >> /* A fallback in case there is no PM info available */ >> tboot_shutdown(TB_SHUTDOWN_HALT); >> } >> >> void tboot_shutdown(u32 shutdown_type) >> { >> void (*shutdown)(void); >> >> if (!tboot_enabled()) >> return; >> >> See - x86 can very well just fall straight back out of machine_power_off() >> if there's no pm_power_off() hook and tboot is not enabled. > > I never doubted that, but is the right thing to do? Not all arches do it > that way. > > And what about the killing of init? Shall we simply consider that a > systemd bug? > > case LINUX_REBOOT_CMD_POWER_OFF: > kernel_power_off(); > do_exit(0); > break; > > If power-off fails (for whatever reason), do_exit(0) will trigger a > panic when called from PID 1. > Common handling of that condition - eg to call machine_halt() - might be an option. Separate patch, though. Guenter
On Wed, Oct 29, 2014 at 01:10:20PM +0000, Russell King - ARM Linux wrote: > On Wed, Oct 29, 2014 at 01:34:18PM +0100, Johan Hovold wrote: > > On Tue, Oct 28, 2014 at 03:16:10PM +0000, Russell King - ARM Linux wrote: > > > And how is that different from having a set of power-off handlers, and > > > reporting when each individual one fails? Don't you want to know if > > > your primary high priority reboot handler fails, just as much as you > > > want to know if your final last-resort power-off handler fails? > > > > Good point. Failed power-off should probably be logged by the power-off > > call chain implementation (which seems to makes notifier chains a bad > > fit). > > > > And what about any power-off latencies? Should this always be dealt with > > in the power-off handler? > > > > Again, if it's predictable and high, as in the OMAP RTC case, it should > > go in the handler. But what if it's just normal bus latencies > > (peripheral busses, i2c, or whatever people may come up with)? > > > > Should there always be a short delay before calling the next handler? > > If the handler has determined that it has failed, then why delay before > trying the next handler? At the point it has decided it has failed, > surely that's after it has waited sufficient time to determine that > failure? The current handlers we have are not expecting any other handler to be run after they return. My question was whether all these handlers should get a short mdelay added to them (e.g. to compensate for bus latencies) or if this could be done in the power-off handler (e.g. before printing the error message). > > > Or different from having no power-off handlers. > > > > That is actually quite different, as in that case we call machine_halt > > instead (via kernel_halt). > > Today, ARM does exactly what x86 does. If there's no power off handler > registered, machine_power_off() shuts down other CPUs and returns. No, if there are no power-off handlers registered, kernel/reboot.c will never call machine_power_off: /* Instead of trying to make the power_off code look like * halt when pm_power_off is not set do it the easy way. */ if ((cmd == LINUX_REBOOT_CMD_POWER_OFF) && !pm_power_off) cmd = LINUX_REBOOT_CMD_HALT; So in that case on arm, a system-halted message is printed, and we never return to user-space. > > > Here's the x86 code: > > > > > > void machine_power_off(void) > > > { > > > machine_ops.power_off(); > > > } > > > > > > struct machine_ops machine_ops = { > > > .power_off = native_machine_power_off, > > > ... > > > > > > static void native_machine_power_off(void) > > > { > > > if (pm_power_off) { > > > if (!reboot_force) > > > machine_shutdown(); > > > pm_power_off(); > > > } > > > /* A fallback in case there is no PM info available */ > > > tboot_shutdown(TB_SHUTDOWN_HALT); > > > } > > > > > > void tboot_shutdown(u32 shutdown_type) > > > { > > > void (*shutdown)(void); > > > > > > if (!tboot_enabled()) > > > return; > > > > > > See - x86 can very well just fall straight back out of machine_power_off() > > > if there's no pm_power_off() hook and tboot is not enabled. > > > > I never doubted that, but is the right thing to do? Not all arches do it > > that way. > > Well, the biggest question there is: if the power off or restart syscall > fails, what is the _generic_ non-architecture action which is supposed to > happen? > > Whatever the answer is to that question, that action should be performed > by the _generic_ non-architecture code, rather than having the same > implementation spread across all 30 architectures which the kernel > supports today. I fully agree. > > And what about the killing of init? Shall we simply consider that a > > systemd bug? > > > > case LINUX_REBOOT_CMD_POWER_OFF: > > kernel_power_off(); > > do_exit(0); > > break; > > > > If power-off fails (for whatever reason), do_exit(0) will trigger a > > panic when called from PID 1. > > Oh, systemd calls this from PID1? I guess that's another reason to hate > systemd with vitriol. :) SysVinit and upstart implementations call it > from the "halt" command, which is itself normally run from a script, > which totally avoids that problem. Yeah, that's why I never noticed the missing mdelay. > I'm quite sure the insane systemd lobby will scream that this is a kernel > bug and will want to change it somehow, just like they want to change the > kernel in soo many other silly ways. Will be interesting to follow. :) Johan
On Wed, Oct 29, 2014 at 06:20:40AM -0700, Guenter Roeck wrote: > On 10/29/2014 05:34 AM, Johan Hovold wrote: > > On Tue, Oct 28, 2014 at 03:16:10PM +0000, Russell King - ARM Linux wrote: > >> On Tue, Oct 28, 2014 at 02:12:57PM +0100, Johan Hovold wrote: > >>> That's not what I was trying to refer to. But the patch set explicitly > >>> allows for multiple, prioritised power-off handlers, which can power > >>> off a board in different ways and with various degrees of success. > >>> Specifically, it allows for fallback handlers in case one or more > >>> power-off handlers fail. > >>> > >>> So if we allow for that, what is to prevent the final power-off handler > >>> from failing? And should this not be logged by arch code in the same way > >>> as failure to restart is? > >> > >> And how is that different from having a set of power-off handlers, and > >> reporting when each individual one fails? Don't you want to know if > >> your primary high priority reboot handler fails, just as much as you > >> want to know if your final last-resort power-off handler fails? > > > > Good point. Failed power-off should probably be logged by the power-off > > call chain implementation (which seems to makes notifier chains a bad > > fit). > > Good that I just replaced notifier chain with an open coded implementation. Good to hear. > Sure, that is possible, but I would prefer to do that as a follow-up commit, > and it should be discussed in the context of the power-off handler patch set. Fine with me. > > And what about any power-off latencies? Should this always be dealt with > > in the power-off handler? > > > > Again, if it's predictable and high, as in the OMAP RTC case, it should > > go in the handler. But what if it's just normal bus latencies > > (peripheral busses, i2c, or whatever people may come up with)? > > > > Should there always be a short delay before calling the next handler? > > That delay would depend on the individual power-off handler, so I think > the current implementation works just fine (where power-off handlers > implement the delay). Some don't, and could possibly unknowingly have been relying on the fact that they could return to user space and be powered off at some later time. With systemd that would have caused a panic. Also consider generic power-off handlers such as gpio-poweroff. It currently hard-codes a three-second delay but the actual delay would really be board specific. > We could move the delay into the infrastructure, but it would have > to be configurable. I would prefer to consider that as a follow-up patch > to not overload the power-off handler patch set with too many changes > at the same time. Sure. Johan
[ Again, please stop with the top-posting. A: http://en.wikipedia.org/wiki/Top_post Q: Were do I find info about this thing called top-posting? A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? ] On Wed, Oct 29, 2014 at 02:11:02PM +0100, Romain Perier wrote: > This just to be sure, nothing more. I did read that you mentionned > "poweroff-source" earlier. However If I am wrong, my bad, everything > is fine. I only mentioned that you had proposed that name in my cover letter. Johan
On Wed, Oct 29, 2014 at 02:22:44PM +0100, Johan Hovold wrote: > On Wed, Oct 29, 2014 at 01:10:20PM +0000, Russell King - ARM Linux wrote: > > On Wed, Oct 29, 2014 at 01:34:18PM +0100, Johan Hovold wrote: > > > On Tue, Oct 28, 2014 at 03:16:10PM +0000, Russell King - ARM Linux wrote: > > > > And how is that different from having a set of power-off handlers, and > > > > reporting when each individual one fails? Don't you want to know if > > > > your primary high priority reboot handler fails, just as much as you > > > > want to know if your final last-resort power-off handler fails? > > > > > > Good point. Failed power-off should probably be logged by the power-off > > > call chain implementation (which seems to makes notifier chains a bad > > > fit). > > > > > > And what about any power-off latencies? Should this always be dealt with > > > in the power-off handler? > > > > > > Again, if it's predictable and high, as in the OMAP RTC case, it should > > > go in the handler. But what if it's just normal bus latencies > > > (peripheral busses, i2c, or whatever people may come up with)? > > > > > > Should there always be a short delay before calling the next handler? > > > > If the handler has determined that it has failed, then why delay before > > trying the next handler? At the point it has decided it has failed, > > surely that's after it has waited sufficient time to determine that > > failure? > > The current handlers we have are not expecting any other handler to be > run after they return. My question was whether all these handlers should > get a short mdelay added to them (e.g. to compensate for bus latencies) Some of them do add a delay. > or if this could be done in the power-off handler (e.g. before printing > the error message). > That might make sense, but it would have to be configurable, since the delay is platform specific and power-off handler does not know how long to wait (the longest delay I have seen is 10 seconds). > > > > Or different from having no power-off handlers. > > > > > > That is actually quite different, as in that case we call machine_halt > > > instead (via kernel_halt). > > > > Today, ARM does exactly what x86 does. If there's no power off handler > > registered, machine_power_off() shuts down other CPUs and returns. > > No, if there are no power-off handlers registered, kernel/reboot.c will > never call machine_power_off: > > /* Instead of trying to make the power_off code look like > * halt when pm_power_off is not set do it the easy way. > */ > if ((cmd == LINUX_REBOOT_CMD_POWER_OFF) && !pm_power_off) > cmd = LINUX_REBOOT_CMD_HALT; > > So in that case on arm, a system-halted message is printed, and we never > return to user-space. > Some architectures do that, or go into an endless loop. Others do return from machine_power_off. Having a well defined behavior would be nice (such as dumping an error mesasge and calling machine_halt if machine_power_off returns). Only question would be where to put it. kernel_power_off() might be a good place; only problem is that there are direct callers of machine_power_off(). Guenter
On Wed, Oct 29, 2014 at 02:35:26PM +0100, Johan Hovold wrote: > On Wed, Oct 29, 2014 at 06:20:40AM -0700, Guenter Roeck wrote: > > On 10/29/2014 05:34 AM, Johan Hovold wrote: > > > On Tue, Oct 28, 2014 at 03:16:10PM +0000, Russell King - ARM Linux wrote: > > >> On Tue, Oct 28, 2014 at 02:12:57PM +0100, Johan Hovold wrote: > > >>> That's not what I was trying to refer to. But the patch set explicitly > > >>> allows for multiple, prioritised power-off handlers, which can power > > >>> off a board in different ways and with various degrees of success. > > >>> Specifically, it allows for fallback handlers in case one or more > > >>> power-off handlers fail. > > >>> > > >>> So if we allow for that, what is to prevent the final power-off handler > > >>> from failing? And should this not be logged by arch code in the same way > > >>> as failure to restart is? > > >> > > >> And how is that different from having a set of power-off handlers, and > > >> reporting when each individual one fails? Don't you want to know if > > >> your primary high priority reboot handler fails, just as much as you > > >> want to know if your final last-resort power-off handler fails? > > > > > > Good point. Failed power-off should probably be logged by the power-off > > > call chain implementation (which seems to makes notifier chains a bad > > > fit). > > > > Good that I just replaced notifier chain with an open coded implementation. > > Good to hear. > > > Sure, that is possible, but I would prefer to do that as a follow-up commit, > > and it should be discussed in the context of the power-off handler patch set. > > Fine with me. > > > > And what about any power-off latencies? Should this always be dealt with > > > in the power-off handler? > > > > > > Again, if it's predictable and high, as in the OMAP RTC case, it should > > > go in the handler. But what if it's just normal bus latencies > > > (peripheral busses, i2c, or whatever people may come up with)? > > > > > > Should there always be a short delay before calling the next handler? > > > > That delay would depend on the individual power-off handler, so I think > > the current implementation works just fine (where power-off handlers > > implement the delay). > > Some don't, and could possibly unknowingly have been relying on the fact > that they could return to user space and be powered off at some later > time. With systemd that would have caused a panic. > Agreed, but there are two cases to consider: What should be the delay before the next power-off handler is called, and what should the system do if all power-off handlers fail (or if there are none). The current behavior isn't exactly well defined. Ok, with systemd that results in a crash, but I am not really sure if one can or should blame systemd for that. The discussion about systemd and its philosophy should not cloud the fact that power-off behavior isn't exactly well defined. > Also consider generic power-off handlers such as gpio-poweroff. It > currently hard-codes a three-second delay but the actual delay would > really be board specific. > A configurable delay would address that. The actually required delay could be provided in platform data or as devicetree property. Thanks, Guenter
On Wed, Oct 29, 2014 at 08:25:02AM -0700, Guenter Roeck wrote: > On Wed, Oct 29, 2014 at 02:22:44PM +0100, Johan Hovold wrote: > > On Wed, Oct 29, 2014 at 01:10:20PM +0000, Russell King - ARM Linux wrote: > > > On Wed, Oct 29, 2014 at 01:34:18PM +0100, Johan Hovold wrote: > > > > On Tue, Oct 28, 2014 at 03:16:10PM +0000, Russell King - ARM Linux wrote: > > > > > And how is that different from having a set of power-off handlers, and > > > > > reporting when each individual one fails? Don't you want to know if > > > > > your primary high priority reboot handler fails, just as much as you > > > > > want to know if your final last-resort power-off handler fails? > > > > > > > > Good point. Failed power-off should probably be logged by the power-off > > > > call chain implementation (which seems to makes notifier chains a bad > > > > fit). > > > > > > > > And what about any power-off latencies? Should this always be dealt with > > > > in the power-off handler? > > > > > > > > Again, if it's predictable and high, as in the OMAP RTC case, it should > > > > go in the handler. But what if it's just normal bus latencies > > > > (peripheral busses, i2c, or whatever people may come up with)? > > > > > > > > Should there always be a short delay before calling the next handler? > > > > > > If the handler has determined that it has failed, then why delay before > > > trying the next handler? At the point it has decided it has failed, > > > surely that's after it has waited sufficient time to determine that > > > failure? > > > > The current handlers we have are not expecting any other handler to be > > run after they return. My question was whether all these handlers should > > get a short mdelay added to them (e.g. to compensate for bus latencies) > > Some of them do add a delay. Yes, but not all. > > or if this could be done in the power-off handler (e.g. before printing > > the error message). > > > That might make sense, but it would have to be configurable, since the delay > is platform specific and power-off handler does not know how long to wait > (the longest delay I have seen is 10 seconds). We've already concluded in this thread that such delays must be encoded in the actual handler (even if it's an argument to the power-off call chain code). The only exception should be generic handlers such as gpio-poweroff, which may need to define different delays depending on board. This could of course just be an argument of the corresponding DT node. My question above was if it was reasonable to add some generic short delay after calling each power-off handler to handle short power-off latencies (e.g. bus latencies) so that not every handler needs an explicit delay. > > > > > Or different from having no power-off handlers. > > > > > > > > That is actually quite different, as in that case we call machine_halt > > > > instead (via kernel_halt). > > > > > > Today, ARM does exactly what x86 does. If there's no power off handler > > > registered, machine_power_off() shuts down other CPUs and returns. > > > > No, if there are no power-off handlers registered, kernel/reboot.c will > > never call machine_power_off: > > > > /* Instead of trying to make the power_off code look like > > * halt when pm_power_off is not set do it the easy way. > > */ > > if ((cmd == LINUX_REBOOT_CMD_POWER_OFF) && !pm_power_off) > > cmd = LINUX_REBOOT_CMD_HALT; > > > > So in that case on arm, a system-halted message is printed, and we never > > return to user-space. > > > Some architectures do that, or go into an endless loop. Others do return > from machine_power_off. Please re-read my comment and the code above. machine_power_off is never called if there's no handler registered. Some archs machine_power_off to spin on failed power-off (i.e. when there is a handler), something which I've mentioned a few times already in this thread. > Having a well defined behavior would be nice > (such as dumping an error mesasge and calling machine_halt if > machine_power_off returns). Only question would be where to put it. > kernel_power_off() might be a good place; only problem is that there > are direct callers of machine_power_off(). Indeed. Adding an error message to the power-off handler call chain code would solve the first problem as I mentioned before. Then it's mostly a matter of whether we care about consistency, and either remove the indefinite spins from those non-x86/non-arm arches, or prevent the latter (and some others) from returning to user-space. I'm inclined to having all arches return to user space on failed power-off, even if it means systemd cannot call reboot() from PID 1. Johan
On Wed, Oct 29, 2014 at 08:36:41AM -0700, Guenter Roeck wrote: > On Wed, Oct 29, 2014 at 02:35:26PM +0100, Johan Hovold wrote: > > On Wed, Oct 29, 2014 at 06:20:40AM -0700, Guenter Roeck wrote: > > > On 10/29/2014 05:34 AM, Johan Hovold wrote: > > > > And what about any power-off latencies? Should this always be dealt with > > > > in the power-off handler? > > > > > > > > Again, if it's predictable and high, as in the OMAP RTC case, it should > > > > go in the handler. But what if it's just normal bus latencies > > > > (peripheral busses, i2c, or whatever people may come up with)? > > > > > > > > Should there always be a short delay before calling the next handler? > > > > > > That delay would depend on the individual power-off handler, so I think > > > the current implementation works just fine (where power-off handlers > > > implement the delay). > > > > Some don't, and could possibly unknowingly have been relying on the fact > > that they could return to user space and be powered off at some later > > time. With systemd that would have caused a panic. > > Agreed, but there are two cases to consider: What should be the delay > before the next power-off handler is called, and what should the system > do if all power-off handlers fail (or if there are none). The current > behavior isn't exactly well defined. Ok, with systemd that results in > a crash, but I am not really sure if one can or should blame systemd > for that. The discussion about systemd and its philosophy should not > cloud the fact that power-off behavior isn't exactly well defined. Sounds like we pretty much agree. See my response to your last mail. > > Also consider generic power-off handlers such as gpio-poweroff. It > > currently hard-codes a three-second delay but the actual delay would > > really be board specific. > > > A configurable delay would address that. The actually required delay > could be provided in platform data or as devicetree property. Yep, see mail mentioned above. Johan
On Wed, Oct 29, 2014 at 04:51:09PM +0100, Johan Hovold wrote: > On Wed, Oct 29, 2014 at 08:25:02AM -0700, Guenter Roeck wrote: > > On Wed, Oct 29, 2014 at 02:22:44PM +0100, Johan Hovold wrote: > > > On Wed, Oct 29, 2014 at 01:10:20PM +0000, Russell King - ARM Linux wrote: > > > > On Wed, Oct 29, 2014 at 01:34:18PM +0100, Johan Hovold wrote: > > > > > On Tue, Oct 28, 2014 at 03:16:10PM +0000, Russell King - ARM Linux wrote: > > > > > > And how is that different from having a set of power-off handlers, and > > > > > > reporting when each individual one fails? Don't you want to know if > > > > > > your primary high priority reboot handler fails, just as much as you > > > > > > want to know if your final last-resort power-off handler fails? > > > > > > > > > > Good point. Failed power-off should probably be logged by the power-off > > > > > call chain implementation (which seems to makes notifier chains a bad > > > > > fit). > > > > > > > > > > And what about any power-off latencies? Should this always be dealt with > > > > > in the power-off handler? > > > > > > > > > > Again, if it's predictable and high, as in the OMAP RTC case, it should > > > > > go in the handler. But what if it's just normal bus latencies > > > > > (peripheral busses, i2c, or whatever people may come up with)? > > > > > > > > > > Should there always be a short delay before calling the next handler? > > > > > > > > If the handler has determined that it has failed, then why delay before > > > > trying the next handler? At the point it has decided it has failed, > > > > surely that's after it has waited sufficient time to determine that > > > > failure? > > > > > > The current handlers we have are not expecting any other handler to be > > > run after they return. My question was whether all these handlers should > > > get a short mdelay added to them (e.g. to compensate for bus latencies) > > > > Some of them do add a delay. > > Yes, but not all. > > > > or if this could be done in the power-off handler (e.g. before printing > > > the error message). > > > > > That might make sense, but it would have to be configurable, since the delay > > is platform specific and power-off handler does not know how long to wait > > (the longest delay I have seen is 10 seconds). > > We've already concluded in this thread that such delays must be encoded > in the actual handler (even if it's an argument to the power-off call > chain code). The only exception should be generic handlers such as > gpio-poweroff, which may need to define different delays depending on > board. This could of course just be an argument of the corresponding DT > node. There probably should be a generic device-tree property for this, even if some delays would be device specific (e.g. the two second delay for omap rtc). But on top of that, external latencies would be quite board specific. The PMIC on the Beaglebone black adds a 50ms deglitch time, for instance. > My question above was if it was reasonable to add some generic short > delay after calling each power-off handler to handle short power-off > latencies (e.g. bus latencies) so that not every handler needs an > explicit delay. I just added a 500ms margin to the OMAP rtc power-off handler. That should be more than enough, and perhaps something like that could be a default in the power-off call chain. Consider that arch-arm has a 1s grace period before reporting failed reboot already today. Johan
diff --git a/drivers/rtc/rtc-omap.c b/drivers/rtc/rtc-omap.c index e74750f00b18..e4f97ad9eb21 100644 --- a/drivers/rtc/rtc-omap.c +++ b/drivers/rtc/rtc-omap.c @@ -423,6 +423,8 @@ static void omap_rtc_power_off(void) val = rtc_read(rtc, OMAP_RTC_INTERRUPTS_REG); rtc_writel(rtc, OMAP_RTC_INTERRUPTS_REG, val | OMAP_RTC_INTERRUPTS_IT_ALARM2); + + mdelay(2000); } > > The one-second delay is there in machine_power_off to catch most cases