Message ID: 20170708000303.21863-2-dbasehore@chromium.org (mailing list archive)
State: RFC, archived
On Sat, Jul 8, 2017 at 3:03 AM, Derek Basehore <dbasehore@chromium.org> wrote:
> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
> This won't fully wake up the system (devices are not resumed), but
> allow simple platform functionality to be run during freeze with
> little power impact.
>
> This implementation allows an idle driver to setup a timer event with
> the clock event device when entering freeze by calling
> tick_set_freeze_event. Only one caller should exist for the function.
>
> tick_freeze_event_expired is used to check if the timer went off when
> the CPU wakes.
>
> The event is cleared by tick_clear_freeze_event.

> +int tick_set_freeze_event(int cpu, ktime_t delta)
> +{

> +	printk_deferred(KERN_WARNING
> +			"[%s] unsupported by clock event device\n",

Can it be one line?

> +	printk_deferred(KERN_WARNING
> +			"[%s] clock event device in use\n",

Ditto.

> +	printk_deferred(KERN_WARNING
> +			"[%s] %lluns outside range: [%lluns, %lluns]\n",

Ditto.

> +	printk_deferred(KERN_WARNING
> +			"Failed to program freeze event\n");

Ditto.

> +int tick_freeze_event_expired(int cpu)
> +{
> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
> +
> +	if (!(dev && dev->event_expired))

Usually we use a pattern (!x || !x->y). At least for me it looks
slightly better to read.

> +		return 0;
> +
> +	return dev->event_expired(dev);
> +}
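To illustrate the style point, a minimal side-by-side of the two checks
(the dev and event_expired names come from the patch; both forms are
functionally equivalent):

	/* As written in the patch: one combined negation. */
	if (!(dev && dev->event_expired))
		return 0;

	/* The form suggested above: negate each condition separately. */
	if (!dev || !dev->event_expired)
		return 0;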
On Friday, July 07, 2017 05:03:00 PM Derek Basehore wrote:
> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
This is referred to as "suspend-to-idle" nowadays.
I guess I need to update the code to be more consistent with respect to the
terminology.
Thanks,
Rafael
On Sat, Jul 8, 2017 at 9:05 AM, Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
> On Sat, Jul 8, 2017 at 3:03 AM, Derek Basehore <dbasehore@chromium.org> wrote:
>> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
[...]
>> +	printk_deferred(KERN_WARNING
>> +			"[%s] unsupported by clock event device\n",
>
> Can it be one line?

Sure. It seems that some of these lines were at 80 characters on one
line anyway. Putting some of these on one line breaks the 80-character
limit and doesn't help with grepping through code, though.

>> +	printk_deferred(KERN_WARNING
>> +			"[%s] clock event device in use\n",
>
> Ditto.
>
>> +	printk_deferred(KERN_WARNING
>> +			"[%s] %lluns outside range: [%lluns, %lluns]\n",
>
> Ditto.
>
>> +	printk_deferred(KERN_WARNING
>> +			"Failed to program freeze event\n");
>
> Ditto.
>
>> +int tick_freeze_event_expired(int cpu)
>> +{
>> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
>> +
>> +	if (!(dev && dev->event_expired))
>
> Usually we use a pattern (!x || !x->y). At least for me it looks
> slightly better to read.

Will do.

>> +		return 0;
>> +
>> +	return dev->event_expired(dev);
>> +}
>
> --
> With Best Regards,
> Andy Shevchenko
On Fri, 7 Jul 2017, Derek Basehore wrote:
> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
> This won't fully wake up the system (devices are not resumed), but
> allow simple platform functionality to be run during freeze with
> little power impact.
>
> This implementation allows an idle driver to setup a timer event with
> the clock event device when entering freeze by calling
> tick_set_freeze_event. Only one caller should exist for the function.

Emphasis on should.

> tick_freeze_event_expired is used to check if the timer went off when
> the CPU wakes.

That makes me shudder.

> +/*
> + * Clockevent device may run during freeze
> + */
> +# define CLOCK_EVT_FEAT_FREEZE_NONSTOP	0x000100

Is that really restricted to freezing?

> +/**
> + * tick_set_freeze_event - Set timer to wake up the CPU from freeze.
> + *
> + * @cpu:	CPU to set the clock event for
> + * @delta:	time to wait before waking the CPU
> + *
> + * Returns 0 on success and -EERROR on failure.
> + */
> +int tick_set_freeze_event(int cpu, ktime_t delta)
> +{
> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;

This is fundamentally wrong. If that is invoked for

	cpu != smp_processor_id()

then everything below is utter crap because you CANNOT access a real per
cpu timer of a remote CPU. x86 will silently fiddle on the current CPU and
others will create an utter mess or simply crash and burn.

The only way to use this is w/o the 'cpu' argument and restricted to the
clock events device of the CPU on which this is invoked.

> +	u64 delta_ns;
> +	int ret;
> +
> +	if (!dev->set_next_event ||

We have a feature flag for that.

> +	    !(dev->features & CLOCK_EVT_FEAT_FREEZE_NONSTOP)) {
> +		printk_deferred(KERN_WARNING
> +				"[%s] unsupported by clock event device\n",
> +				__func__);

Please get rid of these __func__ uglies. And looking at it, get rid of all
of this printk stuff. You have proper return codes, so the call site can
act accordingly.

> +		return -EPERM;
> +	}
> +
> +	if (!clockevent_state_shutdown(dev)) {

What puts the device in shutdown mode when the machine is in freeze state?

> +		printk_deferred(KERN_WARNING
> +				"[%s] clock event device in use\n",
> +				__func__);
> +		return -EBUSY;
> +	}
> +
> +	delta_ns = ktime_to_ns(delta);
> +	if (delta_ns > dev->max_delta_ns || delta_ns < dev->min_delta_ns) {
> +		printk_deferred(KERN_WARNING
> +				"[%s] %lluns outside range: [%lluns, %lluns]\n",
> +				__func__, delta_ns, dev->min_delta_ns,
> +				dev->max_delta_ns);
> +		return -ERANGE;
> +	}
> +
> +	clockevents_tick_resume(dev);

That looks wrong as well. What did call suspend on that device?

I'm not aware that freeze will actually call suspend on anything down deep
in the core code. Can you please explain this magic here?

> +	clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT);
> +	ret = dev->set_next_event((delta_ns * dev->mult) >> dev->shift, dev);
> +	if (ret < 0) {
> +		printk_deferred(KERN_WARNING
> +				"Failed to program freeze event\n");
> +		clockevents_shutdown(dev);
> +	}
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tick_set_freeze_event);
> +
> +/**
> + * tick_freeze_event_expired - Check if the programmed freeze event expired
> + *
> + * @cpu:	CPU to check the clock event device for an expired event
> + *
> + * Returns 1 if the event expired and 0 otherwise.
> + */
> +int tick_freeze_event_expired(int cpu)
> +{
> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
> +
> +	if (!(dev && dev->event_expired))
> +		return 0;
> +
> +	return dev->event_expired(dev);

Same issue vs. the cpu argument as above.

> +}
> +
> +/**
> + * tick_clear_freeze_event - Shuts down the clock device after programming a
> + *	freeze event.
> + *
> + * @cpu:	CPU to shutdown the clock device for
> + *
> + * Returns 0 on success and -EERROR otherwise.
> + */
> +int tick_clear_freeze_event(int cpu)
> +{
> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;

Ditto.

> +	if (!dev)
> +		return -ENODEV;
> +
> +	if (!clockevent_state_oneshot(dev))
> +		return -EBUSY;

All of this lacks an explanation how any of this is safe vs. the normal
operation of clock event devices and the tick device code.

This lacks documentation of calling conventions and checks which make sure
they are obeyed.

Thanks,

	tglx
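To make the remote-CPU objection concrete, here is a rough sketch of what a
local-CPU-only variant could look like. It is only an illustration of the
suggested direction: the cpu argument is dropped, the per-CPU device is
taken via this_cpu_ptr(), the printk calls are gone, and the caller is
assumed to run on the CPU being armed. None of this is the posted patch.

	/* Sketch: must be called on the CPU whose clock event device is armed. */
	int tick_set_freeze_event(ktime_t delta)
	{
		struct clock_event_device *dev = this_cpu_ptr(&tick_cpu_device)->evtdev;
		u64 delta_ns = ktime_to_ns(delta);

		if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT) ||
		    !(dev->features & CLOCK_EVT_FEAT_FREEZE_NONSTOP))
			return -EPERM;

		if (!clockevent_state_shutdown(dev))
			return -EBUSY;

		if (delta_ns > dev->max_delta_ns || delta_ns < dev->min_delta_ns)
			return -ERANGE;

		clockevents_tick_resume(dev);
		clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT);
		return dev->set_next_event((delta_ns * dev->mult) >> dev->shift, dev);
	}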
On Wed, Jul 12, 2017 at 2:25 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 7 Jul 2017, Derek Basehore wrote:
>> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
[...]
>> +/*
>> + * Clockevent device may run during freeze
>> + */
>> +# define CLOCK_EVT_FEAT_FREEZE_NONSTOP	0x000100
>
> Is that really restricted to freezing?

I'm not going to make the call on what firmware may do when it enters
suspend. Even if it's not supposed to do anything, it still might.

[...]
>> +	if (!clockevent_state_shutdown(dev)) {
>
> What puts the device in shutdown mode when the machine is in freeze state?

tick_freeze does, through timekeeping_suspend or tick_suspend_local,
depending on whether it's the last CPU to freeze or not.

[...]
>> +	clockevents_tick_resume(dev);
>
> That looks wrong as well. What did call suspend on that device?
>
> I'm not aware that freeze will actually call suspend on anything down deep
> in the core code. Can you please explain this magic here?

tick_freeze in tick-common.c calls the tick suspend code.

[...]
> All of this lacks an explanation how any of this is safe vs. the normal
> operation of clock event devices and the tick device code.
>
> This lacks documentation of calling conventions and checks which make sure
> they are obeyed.

If I get rid of passing in the cpu id, the only thing left to check
seems to be making sure that tick_clear_freeze_event is called on the
same CPU as tick_set_freeze_event. Am I missing something? I'll add
documentation.

> Thanks,
>
> 	tglx
Derek,

On Wed, 12 Jul 2017, dbasehore . wrote:
> On Wed, Jul 12, 2017 at 2:25 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Fri, 7 Jul 2017, Derek Basehore wrote:
> >> +/*
> >> + * Clockevent device may run during freeze
> >> + */
> >> +# define CLOCK_EVT_FEAT_FREEZE_NONSTOP	0x000100
> >
> > Is that really restricted to freezing?
>
> I'm not going to make the call on what firmware may do when it enters
> suspend. Even if it's not supposed to do anything, it still might.

The feature is a description of the hardware capability, not of the way
it's used by software.

> >> +	if (!clockevent_state_shutdown(dev)) {
> >
> > What puts the device in shutdown mode when the machine is in freeze state?
>
> tick_freeze does, through timekeeping_suspend or tick_suspend_local,
> depending on whether it's the last CPU to freeze or not.

Ok. I completely forgot about the inner workings of freeze.

So this check for shutdown actually wants to have a printk, because that's
a state which is wrong. The non-availability of oneshot mode is just
factual information that this is not possible and does not justify dmesg
spam.

> >> +	clockevents_tick_resume(dev);
> >
> > That looks wrong as well. What did call suspend on that device?
> >
> > I'm not aware that freeze will actually call suspend on anything down deep
> > in the core code. Can you please explain this magic here?
>
> tick_freeze in tick-common.c calls the tick suspend code.

Fair enough.

> > All of this lacks an explanation how any of this is safe vs. the normal
> > operation of clock event devices and the tick device code.
> >
> > This lacks documentation of calling conventions and checks which make sure
> > they are obeyed.
>
> If I get rid of passing in the cpu id, the only thing left to check
> seems to be making sure that tick_clear_freeze_event is called on the
> same CPU as tick_set_freeze_event.

Yes, you want to store that information somewhere.

> Am I missing something? I'll add documentation.

Please make that whole thing depend on a Kconfig option. There is no point
in having the code and the exports there for everyone while it has only a
single user.

Thanks,

	tglx
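A sketch of the "store that information somewhere" idea, purely for
illustration; the tick_freeze_event_cpu variable and the WARN_ON_ONCE check
are invented here and are not part of the posted patch:

	/* Set to smp_processor_id() by tick_set_freeze_event(). */
	static int tick_freeze_event_cpu = -1;

	int tick_clear_freeze_event(void)
	{
		struct clock_event_device *dev = this_cpu_ptr(&tick_cpu_device)->evtdev;

		/* Enforce that set and clear happen on the same CPU. */
		if (WARN_ON_ONCE(tick_freeze_event_cpu != smp_processor_id()))
			return -EINVAL;

		if (!clockevent_state_oneshot(dev))
			return -EBUSY;

		clockevents_shutdown(dev);
		tick_freeze_event_cpu = -1;
		return 0;
	}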
On Fri, Jul 07, 2017 at 05:03:00PM -0700, Derek Basehore wrote:
> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
> This won't fully wake up the system (devices are not resumed), but
> allow simple platform functionality to be run during freeze with
> little power impact.
>
> This implementation allows an idle driver to setup a timer event with
> the clock event device when entering freeze by calling
> tick_set_freeze_event. Only one caller should exist for the function.
>
> tick_freeze_event_expired is used to check if the timer went off when
> the CPU wakes.
>
> The event is cleared by tick_clear_freeze_event.

Why? What's wrong with using the RTC stuff? RTC should be able to wake
suspended systems, see RTCWAKE(8).
On Thu, Jul 13, 2017 at 9:32 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jul 07, 2017 at 05:03:00PM -0700, Derek Basehore wrote:
>> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
[...]
> Why? What's wrong with using the RTC stuff? RTC should be able to wake
> suspended systems, see RTCWAKE(8).

The RTC interrupt is an SCI (on ACPI systems) and we don't handle it
at this point, so we don't know what woke us up until we re-enable
interrupt handlers and run the one for the SCI.
On Thu, Jul 13, 2017 at 8:09 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Thu, Jul 13, 2017 at 9:32 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> Why? What's wrong with using the RTC stuff? RTC should be able to wake
>> suspended systems, see RTCWAKE(8).
>
> The RTC interrupt is an SCI (on ACPI systems) and we don't handle it
> at this point, so we don't know what woke us up until we re-enable
> interrupt handlers and run the one for the SCI.

To add to that point, RTC wake ups are valid for fully waking up the
system. The clock event wake up wasn't used for waking up the system
before, so we know that we don't have to check if it should wake up
the system entirely. The way rtc timers work right now, I think that
we'd have to go all the way through resume devices to figure out if we
should resume to user space or freeze again.
On Thursday, July 13, 2017 03:58:53 PM dbasehore . wrote:
> On Thu, Jul 13, 2017 at 8:09 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
[...]
> To add to that point, RTC wake ups are valid for fully waking up the
> system. The clock event wake up wasn't used for waking up the system
> before, so we know that we don't have to check if it should wake up
> the system entirely. The way rtc timers work right now, I think that
> we'd have to go all the way through resume devices to figure out if we
> should resume to user space or freeze again.

Actually, that's not exactly the case any more.

After some changes that went in for 4.13-rc1 there is an additional decision
point in the resume path, after the noirq stage, where we can decide to go
back to sleep if that's the right thing to do.

This means that in principle you might hack the CMOS RTC driver to do
something more sophisticated than just calling pm_wakeup_hard_event() in
rtc_handler().

That's ACPI-specific, but I think you have ACPI on all of the systems where
the residency counters are going to be checked anyway.

Thanks,
Rafael
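For reference, a very rough sketch of the kind of rtc_handler() tweak being
suggested here. The wants_full_resume() helper is hypothetical and only
marks the decision point; the real condition would have to come from
whichever code owns the wake alarm, and the actual handler in
drivers/rtc/rtc-cmos.c does more than this:

	static u32 rtc_handler(void *context)
	{
		struct device *dev = context;

		/*
		 * Only report a hard wakeup event when the alarm that fired
		 * should fully resume the system; otherwise the suspend-to-idle
		 * loop can decide to go back to sleep.
		 */
		if (wants_full_resume(dev))	/* hypothetical helper */
			pm_wakeup_hard_event(dev);

		return ACPI_INTERRUPT_HANDLED;
	}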
On Sat, Jul 15, 2017 at 5:39 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, July 13, 2017 03:58:53 PM dbasehore . wrote:
[...]
> Actually, that's not exactly the case any more.
>
> After some changes that went in for 4.13-rc1 there is an additional decision
> point in the resume path, after the noirq stage, where we can decide to go
> back to sleep if that's the right thing to do.
>
> This means that in principle you might hack the CMOS RTC driver to do
> something more sophisticated than just calling pm_wakeup_hard_event() in
> rtc_handler().
>
> That's ACPI-specific, but I think you have ACPI on all of the systems where
> the residency counters are going to be checked anyway.

This will take more power than the current implementation I have.
While I'm fine with that, since the power draw would probably be within
100uW to 1mW, it's worth noting. This is because of the added overhead
of noirq suspend and resume, which take a combined time of about 50 to
100 ms depending on the platform. The implementation that uses the
APIC uses about 3uW.

Rather than make the change in rtc_handler for the CMOS RTC driver,
the change might be better placed in the drivers/rtc/interface.c code to
better handle multiple RTC alarms. For example, there might be 2
alarms set for the same time (one that won't wake the system and one
that will) or 2 alarms 1 second apart. In the latter case, it's
possible that 1 second will pass before the second alarm is scheduled.
We need to make sure that the RTC irq runs before calling
dpm_suspend_noirq too.

If I remember correctly, I proposed using the RTC to wake up for this
check to you, but you recommended using the APIC instead. This was of
course before the additional decision point was added to freeze.
On Tue, Jul 18, 2017 at 2:30 AM, dbasehore . <dbasehore@chromium.org> wrote:
> On Sat, Jul 15, 2017 at 5:39 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
[...]
> This will take more power than the current implementation I have.
> While I'm fine with that, since the power draw would probably be within
> 100uW to 1mW, it's worth noting. This is because of the added overhead
> of noirq suspend and resume, which take a combined time of about 50 to
> 100 ms depending on the platform. The implementation that uses the
> APIC uses about 3uW.

That's correct, but I'm not quite sure how much of a difference it
makes in practice.

> Rather than make the change in rtc_handler for the CMOS RTC driver,
> the change might be better placed in the drivers/rtc/interface.c code to
> better handle multiple RTC alarms. For example, there might be 2
> alarms set for the same time (one that won't wake the system and one
> that will) or 2 alarms 1 second apart. In the latter case, it's
> possible that 1 second will pass before the second alarm is scheduled.
> We need to make sure that the RTC irq runs before calling
> dpm_suspend_noirq too.

Well, I guess the choice will depend on which patch looks more
straightforward. :-)

> If I remember correctly, I proposed using the RTC to wake up for this
> check to you, but you recommended using the APIC instead. This was of
> course before the additional decision point was added to freeze.

Right. That's why I recommended it at that time.

Thanks,
Rafael
On Mon, Jul 17, 2017 at 6:33 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Jul 18, 2017 at 2:30 AM, dbasehore . <dbasehore@chromium.org> wrote:
[...]
>> Rather than make the change in rtc_handler for the CMOS RTC driver,
>> the change might be better placed in the drivers/rtc/interface.c code to
>> better handle multiple RTC alarms.
[...]
>
> Well, I guess the choice will depend on which patch looks more
> straightforward. :-)

I could make a patch to try it out. I would probably add a flag to RTC
timers to indicate whether they wake the system (default true). We
would have to add a sync with the RTC irq and the RTC irqwork. I would
probably add an rtc_timer_sync function that would flush the RTC irq
and flush the irqwork. I would call this after the freeze_ops sync
function, since the SCI irq needs to finish before syncing with the RTC
irq.

Also, pm_wakeup_irq seems racy with the current implementation of
s2idle_loop, since the RTC irq could be mistakenly set as pm_wakeup_irq
when something else actually triggered the full wakeup. Fortunately, I
don't think pm_wakeup_irq is used for anything except debugging, but
we might change that.

>> If I remember correctly, I proposed using the RTC to wake up for this
>> check to you, but you recommended using the APIC instead. This was of
>> course before the additional decision point was added to freeze.
>
> Right. That's why I recommended it at that time.
>
> Thanks,
> Rafael
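A sketch of the flag Derek describes, assuming struct rtc_timer looked
roughly like this at the time; the wakes_system field is only the proposal
and does not exist in the kernel:

	struct rtc_timer {
		struct timerqueue_node	node;
		ktime_t			period;
		void			(*func)(void *private_data);
		void			*private_data;
		int			enabled;
		bool			wakes_system;	/* proposed, default true */
	};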
On Mon, 17 Jul 2017, dbasehore . wrote:
> On Mon, Jul 17, 2017 at 6:33 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> I could make a patch to try it out. I would probably add a flag to RTC
> timers to indicate whether they wake the system (default true). We
> would have to add a sync with the RTC irq and the RTC irqwork. I would
> probably add an rtc_timer_sync function that would flush the RTC irq
> and flush the irqwork. I would call this after the freeze_ops sync
> function, since the SCI irq needs to finish before syncing with the RTC
> irq. Also, pm_wakeup_irq seems racy with the current implementation of
> s2idle_loop, since the RTC irq could be mistakenly set as pm_wakeup_irq
> when something else actually triggered the full wakeup. Fortunately, I
> don't think pm_wakeup_irq is used for anything except debugging, but
> we might change that.

There is another option which you might consider. We can reserve one of
the HPET comparators for that purpose and have a special interrupt handler
for it. Checking the HPET for expiry from the low level code should be
trivial.

Thanks,

	tglx
On Mon, Jul 17, 2017 at 11:40 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 17 Jul 2017, dbasehore . wrote:
[...]
> There is another option which you might consider. We can reserve one of
> the HPET comparators for that purpose and have a special interrupt handler
> for it. Checking the HPET for expiry from the low level code should be
> trivial.

Does that handle setting up new timers properly, or does the timer sync
code still need to be written?
On Tue, 18 Jul 2017, dbasehore . wrote:
> On Mon, Jul 17, 2017 at 11:40 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > There is another option which you might consider. We can reserve one of
> > the HPET comparators for that purpose and have a special interrupt handler
> > for it. Checking the HPET for expiry from the low level code should be
> > trivial.
>
> Does that handle setting up new timers properly, or does the timer sync
> code still need to be written?

Sorry, I don't understand the question. What is timer sync code?

Thanks,

	tglx
On Tue, Jul 18, 2017 at 2:53 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 18 Jul 2017, dbasehore . wrote:
>> Does that handle setting up new timers properly, or does the timer sync
>> code still need to be written?
>
> Sorry, I don't understand the question. What is timer sync code?

Does the comparator allow you to have a completely separate alarm set
in the hardware? If there's another timer set up (say some user
specified wake alarm), we need to program that alarm after the current
one goes off if we aren't able to program multiple alarms at the same
time.
On Tue, 18 Jul 2017, dbasehore . wrote:
> On Tue, Jul 18, 2017 at 2:53 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > Sorry, I don't understand the question. What is timer sync code?
>
> Does the comparator allow you to have a completely separate alarm set
> in the hardware? If there's another timer set up (say some user
> specified wake alarm), we need to program that alarm after the current
> one goes off if we aren't able to program multiple alarms at the same
> time.

The HPET consists of several independent comparator units. The kernel uses
only a limited set of them. We can reserve one or more for that purpose, so
it does not require any multiplexing or whatever. How many of them do you
need?

Thanks,

	tglx
On Tue, Jul 18, 2017 at 3:22 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 18 Jul 2017, dbasehore . wrote:
[...]
> The HPET consists of several independent comparator units. The kernel uses
> only a limited set of them. We can reserve one or more for that purpose, so
> it does not require any multiplexing or whatever. How many of them do you
> need?

I just need one. Is it a different irq than the RTC? It would be nice
if we could avoid going through the s2idle_loop code.
On Tue, 18 Jul 2017, dbasehore . wrote:
> On Tue, Jul 18, 2017 at 3:22 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > The HPET consists of several independent comparator units. The kernel uses
> > only a limited set of them. We can reserve one or more for that purpose, so
> > it does not require any multiplexing or whatever. How many of them do you
> > need?
>
> I just need one. Is it a different irq than the RTC? It would be nice
> if we could avoid going through the s2idle_loop code.

Yes, it is a different IRQ.

Thanks,

	tglx
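For a sense of what "checking the HPET for expiry from the low level code"
could involve, a rough sketch against the architectural HPET register
layout (offsets per the HPET specification). The comparator number, base
mapping, and tick conversion are all assumptions rather than an
implementation, and readq assumes a 64-bit kernel:

	#define HPET_MAIN_COUNTER	0x0f0
	#define HPET_TN_CFG(n)		(0x100 + 0x20 * (n))
	#define HPET_TN_CMP(n)		(0x108 + 0x20 * (n))
	#define HPET_TN_INT_ENB		(1 << 2)	/* Tn_INT_ENB_CNF */

	/* base: ioremap()ed HPET MMIO block, e.g. from the ACPI HPET table. */
	static void hpet_arm_wakeup(void __iomem *base, int timer, u64 ticks)
	{
		u64 now = readq(base + HPET_MAIN_COUNTER);

		/* One-shot: the comparator fires when the main counter reaches it. */
		writeq(now + ticks, base + HPET_TN_CMP(timer));
		writel(readl(base + HPET_TN_CFG(timer)) | HPET_TN_INT_ENB,
		       base + HPET_TN_CFG(timer));
	}

	static bool hpet_wakeup_expired(void __iomem *base, int timer)
	{
		/* Expired once the free-running counter has passed the comparator. */
		return readq(base + HPET_MAIN_COUNTER) >=
		       readq(base + HPET_TN_CMP(timer));
	}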
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index a116926598fd..6a3f30008020 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -66,12 +66,20 @@ enum clock_event_state {
  */
 # define CLOCK_EVT_FEAT_HRTIMER		0x000080
 
+/*
+ * Clockevent device may run during freeze
+ */
+# define CLOCK_EVT_FEAT_FREEZE_NONSTOP	0x000100
+
 /**
  * struct clock_event_device - clock event device descriptor
  * @event_handler:	Assigned by the framework to be called by the low
  *			level handler of the event source
  * @set_next_event:	set next event function using a clocksource delta
  * @set_next_ktime:	set next event function using a direct ktime value
+ * @event_expired:	check if the programmed event is expired. Used for
+ *			freeze events when timekeeping is suspended and
+ *			irqs are disabled.
  * @next_event:		local storage for the next event in oneshot mode
  * @max_delta_ns:	maximum delta value in ns
  * @min_delta_ns:	minimum delta value in ns
@@ -100,6 +108,7 @@ struct clock_event_device {
 	void			(*event_handler)(struct clock_event_device *);
 	int			(*set_next_event)(unsigned long evt, struct clock_event_device *);
 	int			(*set_next_ktime)(ktime_t expires, struct clock_event_device *);
+	int			(*event_expired)(struct clock_event_device *);
 	ktime_t			next_event;
 	u64			max_delta_ns;
 	u64			min_delta_ns;
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 0b1cf32edfd7..1d56269a7b31 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -248,6 +248,8 @@ static inline bool idle_should_freeze(void)
 }
 
 extern void __init pm_states_init(void);
+int tick_set_freeze_event(int cpu, ktime_t expires);
+int tick_clear_freeze_event(int cpu);
 extern void freeze_set_ops(const struct platform_freeze_ops *ops);
 extern void freeze_wake(void);
 
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 49edc1c4f3e6..688d1c0cad10 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -498,6 +498,97 @@ void tick_freeze(void)
 	raw_spin_unlock(&tick_freeze_lock);
 }
 
+/**
+ * tick_set_freeze_event - Set timer to wake up the CPU from freeze.
+ *
+ * @cpu:	CPU to set the clock event for
+ * @delta:	time to wait before waking the CPU
+ *
+ * Returns 0 on success and -EERROR on failure.
+ */
+int tick_set_freeze_event(int cpu, ktime_t delta)
+{
+	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
+	u64 delta_ns;
+	int ret;
+
+	if (!dev->set_next_event ||
+	    !(dev->features & CLOCK_EVT_FEAT_FREEZE_NONSTOP)) {
+		printk_deferred(KERN_WARNING
+				"[%s] unsupported by clock event device\n",
+				__func__);
+		return -EPERM;
+	}
+
+	if (!clockevent_state_shutdown(dev)) {
+		printk_deferred(KERN_WARNING
+				"[%s] clock event device in use\n",
+				__func__);
+		return -EBUSY;
+	}
+
+	delta_ns = ktime_to_ns(delta);
+	if (delta_ns > dev->max_delta_ns || delta_ns < dev->min_delta_ns) {
+		printk_deferred(KERN_WARNING
+				"[%s] %lluns outside range: [%lluns, %lluns]\n",
+				__func__, delta_ns, dev->min_delta_ns,
+				dev->max_delta_ns);
+		return -ERANGE;
+	}
+
+	clockevents_tick_resume(dev);
+	clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT);
+	ret = dev->set_next_event((delta_ns * dev->mult) >> dev->shift, dev);
+	if (ret < 0) {
+		printk_deferred(KERN_WARNING
+				"Failed to program freeze event\n");
+		clockevents_shutdown(dev);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tick_set_freeze_event);
+
+/**
+ * tick_freeze_event_expired - Check if the programmed freeze event expired
+ *
+ * @cpu:	CPU to check the clock event device for an expired event
+ *
+ * Returns 1 if the event expired and 0 otherwise.
+ */
+int tick_freeze_event_expired(int cpu)
+{
+	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
+
+	if (!(dev && dev->event_expired))
+		return 0;
+
+	return dev->event_expired(dev);
+}
+
+/**
+ * tick_clear_freeze_event - Shuts down the clock device after programming a
+ *	freeze event.
+ *
+ * @cpu:	CPU to shutdown the clock device for
+ *
+ * Returns 0 on success and -EERROR otherwise.
+ */
+int tick_clear_freeze_event(int cpu)
+{
+	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
+
+	if (!dev)
+		return -ENODEV;
+
+	if (!clockevent_state_oneshot(dev))
+		return -EBUSY;
+
+	clockevents_shutdown(dev);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(tick_clear_freeze_event);
+
 /**
  * tick_unfreeze - Resume the local tick and (possibly) timekeeping.
  *
Adds a new feature to tick to schedule wakeups on a CPU during freeze.
This won't fully wake up the system (devices are not resumed), but
allow simple platform functionality to be run during freeze with
little power impact.

This implementation allows an idle driver to setup a timer event with
the clock event device when entering freeze by calling
tick_set_freeze_event. Only one caller should exist for the function.

tick_freeze_event_expired is used to check if the timer went off when
the CPU wakes.

The event is cleared by tick_clear_freeze_event.

Signed-off-by: Derek Basehore <dbasehore@chromium.org>
---
 include/linux/clockchips.h |  9 +++++
 include/linux/suspend.h    |  2 +
 kernel/time/tick-common.c  | 91 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+)
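As context for how an idle driver might consume this API during
suspend-to-idle, a hypothetical calling sequence; the example_enter_freeze
hook and the 30-second interval are placeholders, and the real integration
lives in the rest of the series rather than in this patch:

	static void example_enter_freeze(void)
	{
		int cpu = smp_processor_id();

		/* Arm a wakeup before the CPU parks in its low-power idle state. */
		if (tick_set_freeze_event(cpu, ms_to_ktime(30 * MSEC_PER_SEC)))
			return;

		/* ... enter the platform idle state; the CPU sleeps here ... */

		/* On wakeup, see whether our timer fired or something else woke us. */
		if (tick_freeze_event_expired(cpu)) {
			/* Run the lightweight platform work, e.g. read residency counters. */
		}

		/* Put the clock event device back into shutdown before unfreezing. */
		tick_clear_freeze_event(cpu);
	}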