| Message ID | 20170708000303.21863-2-dbasehore@chromium.org (mailing list archive) |
|---|---|
| State | Deferred, archived |
| Delegated to | Andy Shevchenko |
On Sat, Jul 8, 2017 at 3:03 AM, Derek Basehore <dbasehore@chromium.org> wrote:
> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
> This won't fully wake up the system (devices are not resumed), but
> allow simple platform functionality to be run during freeze with
> little power impact.
>
> This implementation allows an idle driver to setup a timer event with
> the clock event device when entering freeze by calling
> tick_set_freeze_event. Only one caller should exist for the function.
>
> tick_freeze_event_expired is used to check if the timer went off when
> the CPU wakes.
>
> The event is cleared by tick_clear_freeze_event.

> +int tick_set_freeze_event(int cpu, ktime_t delta)
> +{

> +		printk_deferred(KERN_WARNING
> +				"[%s] unsupported by clock event device\n",

Can it be one line?

> +		printk_deferred(KERN_WARNING
> +				"[%s] clock event device in use\n",

Ditto.

> +		printk_deferred(KERN_WARNING
> +				"[%s] %lluns outside range: [%lluns, %lluns]\n",

Ditto.

> +		printk_deferred(KERN_WARNING
> +				"Failed to program freeze event\n");

Ditto.

> +int tick_freeze_event_expired(int cpu)
> +{
> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
> +
> +	if (!(dev && dev->event_expired))

Usually we use a pattern (!x || !x->y). At least for me it looks
slightly better to read.

> +		return 0;
> +
> +	return dev->event_expired(dev);
> +}
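[Editor's note: for clarity, this is what Andy's pointer-check suggestion looks like applied to the function quoted above — a sketch of the review feedback, not a revised patch. The one-line `printk_deferred()` form he asks about is only possible where the message still fits within the 80-column limit.]

```c
/* Sketch only: tick_freeze_event_expired() with the suggested pattern. */
int tick_freeze_event_expired(int cpu)
{
	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;

	/* "(!x || !x->y)" instead of "!(x && x->y)", per the review */
	if (!dev || !dev->event_expired)
		return 0;

	return dev->event_expired(dev);
}
```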
On Friday, July 07, 2017 05:03:00 PM Derek Basehore wrote:
> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
This is referred to as "suspend-to-idle" nowadays.
I guess I need to update the code to be more consistent with respect to the
terminology.
Thanks,
Rafael
On Sat, Jul 8, 2017 at 9:05 AM, Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
> On Sat, Jul 8, 2017 at 3:03 AM, Derek Basehore <dbasehore@chromium.org> wrote:
>> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
>> This won't fully wake up the system (devices are not resumed), but
>> allow simple platform functionality to be run during freeze with
>> little power impact.
>>
>> This implementation allows an idle driver to setup a timer event with
>> the clock event device when entering freeze by calling
>> tick_set_freeze_event. Only one caller should exist for the function.
>>
>> tick_freeze_event_expired is used to check if the timer went off when
>> the CPU wakes.
>>
>> The event is cleared by tick_clear_freeze_event.
>
>> +int tick_set_freeze_event(int cpu, ktime_t delta)
>> +{
>
>> +		printk_deferred(KERN_WARNING
>> +				"[%s] unsupported by clock event device\n",
>
> Can it be one line?

Sure. It seems that some of these lines were at 80 characters on one
line anyways. Putting some of these on one line breaks the 80 character
limit and doesn't help with grepping through code, though.

>
>> +		printk_deferred(KERN_WARNING
>> +				"[%s] clock event device in use\n",
>
> Ditto.
>
>> +		printk_deferred(KERN_WARNING
>> +				"[%s] %lluns outside range: [%lluns, %lluns]\n",
>
> Ditto.
>
>> +		printk_deferred(KERN_WARNING
>> +				"Failed to program freeze event\n");
>
> Ditto.
>
>> +int tick_freeze_event_expired(int cpu)
>> +{
>> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
>> +
>> +	if (!(dev && dev->event_expired))
>
> Usually we use a pattern (!x || !x->y). At least for me it looks
> slightly better to read.

Will do.

>
>> +		return 0;
>> +
>> +	return dev->event_expired(dev);
>> +}
>
> --
> With Best Regards,
> Andy Shevchenko
On Fri, 7 Jul 2017, Derek Basehore wrote:
> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
> This won't fully wake up the system (devices are not resumed), but
> allow simple platform functionality to be run during freeze with
> little power impact.
>
> This implementation allows an idle driver to setup a timer event with
> the clock event device when entering freeze by calling
> tick_set_freeze_event. Only one caller should exist for the function.

Emphasis on should.

> tick_freeze_event_expired is used to check if the timer went off when
> the CPU wakes.

That makes me shudder.

> +/*
> + * Clockevent device may run during freeze
> + */
> +# define CLOCK_EVT_FEAT_FREEZE_NONSTOP	0x000100

Is that really restricted to freezing?

> +/**
> + * tick_set_freeze_event - Set timer to wake up the CPU from freeze.
> + *
> + * @cpu:	CPU to set the clock event for
> + * @delta:	time to wait before waking the CPU
> + *
> + * Returns 0 on success and -EERROR on failure.
> + */
> +int tick_set_freeze_event(int cpu, ktime_t delta)
> +{
> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;

This is fundamentally wrong. If that is invoked for

      cpu != smp_processor_id()

then everything below is utter crap because you CANNOT access a real per
cpu timer of a remote CPU. x86 will silently fiddle on the current CPU and
others will create an utter mess or simply crash and burn.

The only way to use this is w/o the 'cpu' argument and restricted to the
clock events device of the CPU on which this is invoked.

> +	u64 delta_ns;
> +	int ret;
> +
> +	if (!dev->set_next_event ||

We have a feature flag for that.

> +	    !(dev->features & CLOCK_EVT_FEAT_FREEZE_NONSTOP)) {
> +		printk_deferred(KERN_WARNING
> +				"[%s] unsupported by clock event device\n",
> +				__func__);

Please get rid of these __func__ uglies. And looking at it, get rid of all
of this printk stuff. You have proper return codes, so the call site can
act accordingly.

> +		return -EPERM;
> +	}
> +
> +	if (!clockevent_state_shutdown(dev)) {

What puts the device in shutdown mode when the machine is in freeze state?

> +		printk_deferred(KERN_WARNING
> +				"[%s] clock event device in use\n",
> +				__func__);
> +		return -EBUSY;
> +	}
> +
> +	delta_ns = ktime_to_ns(delta);
> +	if (delta_ns > dev->max_delta_ns || delta_ns < dev->min_delta_ns) {
> +		printk_deferred(KERN_WARNING
> +				"[%s] %lluns outside range: [%lluns, %lluns]\n",
> +				__func__, delta_ns, dev->min_delta_ns,
> +				dev->max_delta_ns);
> +		return -ERANGE;
> +	}
> +
> +	clockevents_tick_resume(dev);

That looks wrong as well. What did call suspend on that device?

I'm not aware that freeze will actually call suspend on anything down deep
in the core code. Can you please explain this magic here?

> +	clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT);
> +	ret = dev->set_next_event((delta_ns * dev->mult) >> dev->shift, dev);
> +	if (ret < 0) {
> +		printk_deferred(KERN_WARNING
> +				"Failed to program freeze event\n");
> +		clockevents_shutdown(dev);
> +	}
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tick_set_freeze_event);
> +
> +/**
> + * tick_freeze_event_expired - Check if the programmed freeze event expired
> + *
> + * @cpu:	CPU to check the clock event device for an expired event
> + *
> + * Returns 1 if the event expired and 0 otherwise.
> + */
> +int tick_freeze_event_expired(int cpu)
> +{
> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
> +
> +	if (!(dev && dev->event_expired))
> +		return 0;
> +
> +	return dev->event_expired(dev);

Same issue vs. the cpu argument as above.

> +}
> +
> +/**
> + * tick_clear_freeze_event - Shuts down the clock device after programming a
> + *			     freeze event.
> + *
> + * @cpu:	CPU to shutdown the clock device for
> + *
> + * Returns 0 on success and -EERROR otherwise.
> + */
> +int tick_clear_freeze_event(int cpu)
> +{
> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;

Ditto.

> +	if (!dev)
> +		return -ENODEV;
> +
> +	if (!clockevent_state_oneshot(dev))
> +		return -EBUSY;

All of this lacks an explanation how any of this is safe vs. the normal
operation of clock event devices and the tick device code.

This lacks documentation of calling conventions and checks which make sure
they are obeyed.

Thanks,

	tglx
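[Editor's note: to make Thomas's main objection concrete, a variant restricted to the local CPU — with the printk noise dropped and feature flags used instead of the function-pointer check — might look like the sketch below. This is an illustration under assumptions (the CLOCK_EVT_FEAT_ONESHOT test standing in for the set_next_event check), not a revised version of the patch.]

```c
/*
 * Sketch: local-CPU-only variant per the review. No 'cpu' argument --
 * only the clock event device of the CPU this runs on is touched, and
 * the caller acts purely on the return code.
 */
int tick_set_freeze_event(ktime_t delta)
{
	struct clock_event_device *dev = this_cpu_ptr(&tick_cpu_device)->evtdev;
	u64 delta_ns = ktime_to_ns(delta);
	int ret;

	/* Feature flags instead of checking the function pointer */
	if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT) ||
	    !(dev->features & CLOCK_EVT_FEAT_FREEZE_NONSTOP))
		return -EPERM;
	if (!clockevent_state_shutdown(dev))
		return -EBUSY;
	if (delta_ns > dev->max_delta_ns || delta_ns < dev->min_delta_ns)
		return -ERANGE;

	clockevents_tick_resume(dev);
	clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT);
	ret = dev->set_next_event((delta_ns * dev->mult) >> dev->shift, dev);
	if (ret < 0)
		clockevents_shutdown(dev);
	return ret;
}
```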
On Wed, Jul 12, 2017 at 2:25 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 7 Jul 2017, Derek Basehore wrote:
>> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
>> This won't fully wake up the system (devices are not resumed), but
>> allow simple platform functionality to be run during freeze with
>> little power impact.
>>
>> This implementation allows an idle driver to setup a timer event with
>> the clock event device when entering freeze by calling
>> tick_set_freeze_event. Only one caller should exist for the function.
>
> Emphasis on should.
>
>> tick_freeze_event_expired is used to check if the timer went off when
>> the CPU wakes.
>
> That makes me shudder.
>
>> +/*
>> + * Clockevent device may run during freeze
>> + */
>> +# define CLOCK_EVT_FEAT_FREEZE_NONSTOP	0x000100
>
> Is that really restricted to freezing?

I'm not going to make the call on what firmware may do when it enters
suspend. Even if it's not supposed to do anything, it still might.

>
>> +/**
>> + * tick_set_freeze_event - Set timer to wake up the CPU from freeze.
>> + *
>> + * @cpu:	CPU to set the clock event for
>> + * @delta:	time to wait before waking the CPU
>> + *
>> + * Returns 0 on success and -EERROR on failure.
>> + */
>> +int tick_set_freeze_event(int cpu, ktime_t delta)
>> +{
>> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
>
> This is fundamentally wrong. If that is invoked for
>
>       cpu != smp_processor_id()
>
> then everything below is utter crap because you CANNOT access a real per
> cpu timer of a remote CPU. x86 will silently fiddle on the current CPU and
> others will create an utter mess or simply crash and burn.
>
> The only way to use this is w/o the 'cpu' argument and restricted to the
> clock events device of the CPU on which this is invoked.
>
>> +	u64 delta_ns;
>> +	int ret;
>> +
>> +	if (!dev->set_next_event ||
>
> We have a feature flag for that.
>
>> +	    !(dev->features & CLOCK_EVT_FEAT_FREEZE_NONSTOP)) {
>> +		printk_deferred(KERN_WARNING
>> +				"[%s] unsupported by clock event device\n",
>> +				__func__);
>
> Please get rid of these __func__ uglies. And looking at it, get rid of all
> of this printk stuff. You have proper return codes, so the call site can
> act accordingly.
>
>> +		return -EPERM;
>> +	}
>> +
>> +	if (!clockevent_state_shutdown(dev)) {
>
> What puts the device in shutdown mode when the machine is in freeze state?

tick_freeze does through timekeeping_suspend or tick_suspend_local
depending on whether it's the last CPU to freeze or not.

>
>> +		printk_deferred(KERN_WARNING
>> +				"[%s] clock event device in use\n",
>> +				__func__);
>> +		return -EBUSY;
>> +	}
>> +
>> +	delta_ns = ktime_to_ns(delta);
>> +	if (delta_ns > dev->max_delta_ns || delta_ns < dev->min_delta_ns) {
>> +		printk_deferred(KERN_WARNING
>> +				"[%s] %lluns outside range: [%lluns, %lluns]\n",
>> +				__func__, delta_ns, dev->min_delta_ns,
>> +				dev->max_delta_ns);
>> +		return -ERANGE;
>> +	}
>> +
>> +	clockevents_tick_resume(dev);
>
> That looks wrong as well. What did call suspend on that device?
>
> I'm not aware that freeze will actually call suspend on anything down deep
> in the core code. Can you please explain this magic here?

tick_freeze in tick-common.c calls the tick suspend code.

>
>> +	clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT);
>> +	ret = dev->set_next_event((delta_ns * dev->mult) >> dev->shift, dev);
>> +	if (ret < 0) {
>> +		printk_deferred(KERN_WARNING
>> +				"Failed to program freeze event\n");
>> +		clockevents_shutdown(dev);
>> +	}
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(tick_set_freeze_event);
>> +
>> +/**
>> + * tick_freeze_event_expired - Check if the programmed freeze event expired
>> + *
>> + * @cpu:	CPU to check the clock event device for an expired event
>> + *
>> + * Returns 1 if the event expired and 0 otherwise.
>> + */
>> +int tick_freeze_event_expired(int cpu)
>> +{
>> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
>> +
>> +	if (!(dev && dev->event_expired))
>> +		return 0;
>> +
>> +	return dev->event_expired(dev);
>
> Same issue vs. the cpu argument as above.
>
>> +}
>> +
>> +/**
>> + * tick_clear_freeze_event - Shuts down the clock device after programming a
>> + *			     freeze event.
>> + *
>> + * @cpu:	CPU to shutdown the clock device for
>> + *
>> + * Returns 0 on success and -EERROR otherwise.
>> + */
>> +int tick_clear_freeze_event(int cpu)
>> +{
>> +	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
>
> Ditto.
>
>> +	if (!dev)
>> +		return -ENODEV;
>> +
>> +	if (!clockevent_state_oneshot(dev))
>> +		return -EBUSY;
>
> All of this lacks an explanation how any of this is safe vs. the normal
> operation of clock event devices and the tick device code.
>
> This lacks documentation of calling conventions and checks which make sure
> they are obeyed.

If I get rid of passing in the cpu id, the only thing left to check
seems to be making sure that tick_clear_freeze_event is called on the
same CPU as tick_set_freeze_event. Am I missing something? I'll add
Documentation.

>
> Thanks,
>
>	tglx
Derek,

On Wed, 12 Jul 2017, dbasehore . wrote:
> On Wed, Jul 12, 2017 at 2:25 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Fri, 7 Jul 2017, Derek Basehore wrote:
> >> +/*
> >> + * Clockevent device may run during freeze
> >> + */
> >> +# define CLOCK_EVT_FEAT_FREEZE_NONSTOP	0x000100
> >
> > Is that really restricted to freezing?
>
> I'm not going to make the call on what firmware may do when it enters
> suspend. Even if it's not supposed to do anything, it still might.

The feature is a description of the hardware capability not of the way how
it's used by software.

> >> +	if (!clockevent_state_shutdown(dev)) {
> >
> > What puts the device in shutdown mode when the machine is in freeze state?
>
> tick_freeze does through timekeeping_suspend or tick_suspend_local
> depending on whether it's the last CPU to freeze or not.

Ok. I completely forgot about the inner workings of freeze.

So this check for shutdown actually wants to have a printk, because that's
a state which is wrong. The nonavailability of oneshot mode is just
factual information that this is not possible and does not justify dmesg
spam.

> >> +	clockevents_tick_resume(dev);
> >
> > That looks wrong as well. What did call suspend on that device?
> >
> > I'm not aware that freeze will actually call suspend on anything down deep
> > in the core code. Can you please explain this magic here?
>
> tick_freeze in tick-common.c calls the tick suspend code.

Fair enough.

> > All of this lacks an explanation how any of this is safe vs. the normal
> > operation of clock event devices and the tick device code.
> >
> > This lacks documentation of calling conventions and checks which make sure
> > they are obeyed.
>
> If I get rid of passing in the cpu id, the only thing left to check
> seems to be making sure that tick_clear_freeze_event is called on the
> same CPU as tick_set_freeze_event.

Yes, you want to store that information somewhere.

> Am I missing something? I'll add Documentation.

Please make that whole thing depend on a Kconfig option. There is no point
having the code and the exports there for everyone while it has only a
single user.

Thanks,

	tglx
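[Editor's note: the Kconfig gating Thomas asks for would typically show up in the header as ifdef-guarded declarations with static-inline stubs — the usual kernel pattern for optional APIs. The sketch below assumes a hypothetical option name, CONFIG_TICK_FREEZE_EVENT, and drops the cpu argument per the earlier review.]

```c
/*
 * Sketch: gating the API behind a Kconfig option, as requested in review.
 * CONFIG_TICK_FREEZE_EVENT is a hypothetical option name; callers compile
 * against the stubs when the single user of this API is not selected.
 */
#ifdef CONFIG_TICK_FREEZE_EVENT
int tick_set_freeze_event(ktime_t delta);
int tick_freeze_event_expired(void);
int tick_clear_freeze_event(void);
#else
static inline int tick_set_freeze_event(ktime_t delta) { return -ENODEV; }
static inline int tick_freeze_event_expired(void) { return 0; }
static inline int tick_clear_freeze_event(void) { return -ENODEV; }
#endif
```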
On Fri, Jul 07, 2017 at 05:03:00PM -0700, Derek Basehore wrote:
> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
> This won't fully wake up the system (devices are not resumed), but
> allow simple platform functionality to be run during freeze with
> little power impact.
>
> This implementation allows an idle driver to setup a timer event with
> the clock event device when entering freeze by calling
> tick_set_freeze_event. Only one caller should exist for the function.
>
> tick_freeze_event_expired is used to check if the timer went off when
> the CPU wakes.
>
> The event is cleared by tick_clear_freeze_event.

Why? What's wrong with using the RTC stuff? RTC should be able to wake
suspended systems, see RTCWAKE(8).
On Thu, Jul 13, 2017 at 9:32 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jul 07, 2017 at 05:03:00PM -0700, Derek Basehore wrote:
>> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
>> This won't fully wake up the system (devices are not resumed), but
>> allow simple platform functionality to be run during freeze with
>> little power impact.
>>
>> This implementation allows an idle driver to setup a timer event with
>> the clock event device when entering freeze by calling
>> tick_set_freeze_event. Only one caller should exist for the function.
>>
>> tick_freeze_event_expired is used to check if the timer went off when
>> the CPU wakes.
>>
>> The event is cleared by tick_clear_freeze_event.
>
> Why? What's wrong with using the RTC stuff? RTC should be able to wake
> suspended systems, see RTCWAKE(8).

The RTC interrupt is an SCI (on ACPI systems) and we don't handle it
at this point, so we don't know what woke us up until we re-enable
interrupt handlers and run the one for the SCI.
On Thu, Jul 13, 2017 at 8:09 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Thu, Jul 13, 2017 at 9:32 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Fri, Jul 07, 2017 at 05:03:00PM -0700, Derek Basehore wrote:
>>> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
>>> This won't fully wake up the system (devices are not resumed), but
>>> allow simple platform functionality to be run during freeze with
>>> little power impact.
>>>
>>> This implementation allows an idle driver to setup a timer event with
>>> the clock event device when entering freeze by calling
>>> tick_set_freeze_event. Only one caller should exist for the function.
>>>
>>> tick_freeze_event_expired is used to check if the timer went off when
>>> the CPU wakes.
>>>
>>> The event is cleared by tick_clear_freeze_event.
>>
>> Why? What's wrong with using the RTC stuff? RTC should be able to wake
>> suspended systems, see RTCWAKE(8).
>
> The RTC interrupt is an SCI (on ACPI systems) and we don't handle it
> at this point, so we don't know what woke us up until we re-enable
> interrupt handlers and run the one for the SCI.

To add to that point, RTC wake ups are valid for fully waking up the
system. The clock event wake up wasn't used for waking up the system
before, so we know that we don't have to check if it should wake up
the system entirely. The way rtc timers work right now, I think that
we'd have to go all the way through resume devices to figure out if we
should resume to user space or freeze again.
On Thursday, July 13, 2017 03:58:53 PM dbasehore . wrote:
> On Thu, Jul 13, 2017 at 8:09 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Thu, Jul 13, 2017 at 9:32 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> On Fri, Jul 07, 2017 at 05:03:00PM -0700, Derek Basehore wrote:
> >>> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
> >>> This won't fully wake up the system (devices are not resumed), but
> >>> allow simple platform functionality to be run during freeze with
> >>> little power impact.
> >>>
> >>> This implementation allows an idle driver to setup a timer event with
> >>> the clock event device when entering freeze by calling
> >>> tick_set_freeze_event. Only one caller should exist for the function.
> >>>
> >>> tick_freeze_event_expired is used to check if the timer went off when
> >>> the CPU wakes.
> >>>
> >>> The event is cleared by tick_clear_freeze_event.
> >>
> >> Why? What's wrong with using the RTC stuff? RTC should be able to wake
> >> suspended systems, see RTCWAKE(8).
> >
> > The RTC interrupt is an SCI (on ACPI systems) and we don't handle it
> > at this point, so we don't know what woke us up until we re-enable
> > interrupt handlers and run the one for the SCI.
>
> To add to that point, RTC wake ups are valid for fully waking up the
> system. The clock event wake up wasn't used for waking up the system
> before, so we know that we don't have to check if it should wake up
> the system entirely. The way rtc timers work right now, I think that
> we'd have to go all the way through resume devices to figure out if we
> should resume to user space or freeze again.

Actually, that's not exactly the case any more.

After some changes that went in for 4.13-rc1 there is an additional decision
point in the resume path, after the noirq stage, where we can decide to go back
to sleep if that's the right thing to do.

This means that in principle you might hack the CMOS RTC driver to do something
more sophisticated than just calling pm_wakeup_hard_event() in rtc_handler().

That's ACPI-specific, but I think you have ACPI on all of the systems where the
residency counters are going to be checked anyway.

Thanks,

Rafael
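[Editor's note: in principle, the "more sophisticated" rtc_handler() hack Rafael describes might look like the sketch below. `rtc_alarm_is_system_wake()` is a hypothetical helper; `pm_wakeup_hard_event()` and `pm_wakeup_event()` are the real 4.13-era APIs that the new suspend-to-idle decision point distinguishes between.]

```c
/*
 * Sketch only: deciding in the CMOS RTC ACPI fixed-event handler whether
 * the alarm warrants a full wakeup. rtc_alarm_is_system_wake() does not
 * exist in the tree; it stands in for whatever bookkeeping the rtc core
 * would need to grow.
 */
static u32 rtc_handler(void *context)
{
	struct device *dev = context;

	if (rtc_alarm_is_system_wake(dev))	/* hypothetical check */
		pm_wakeup_hard_event(dev);	/* abort suspend-to-idle */
	else
		pm_wakeup_event(dev, 0);	/* soft: may go back to sleep */

	acpi_clear_event(ACPI_EVENT_RTC);
	return ACPI_INTERRUPT_HANDLED;
}
```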
On Sat, Jul 15, 2017 at 5:39 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, July 13, 2017 03:58:53 PM dbasehore . wrote:
>> On Thu, Jul 13, 2017 at 8:09 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> > On Thu, Jul 13, 2017 at 9:32 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> >> On Fri, Jul 07, 2017 at 05:03:00PM -0700, Derek Basehore wrote:
>> >>> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
>> >>> This won't fully wake up the system (devices are not resumed), but
>> >>> allow simple platform functionality to be run during freeze with
>> >>> little power impact.
>> >>>
>> >>> This implementation allows an idle driver to setup a timer event with
>> >>> the clock event device when entering freeze by calling
>> >>> tick_set_freeze_event. Only one caller should exist for the function.
>> >>>
>> >>> tick_freeze_event_expired is used to check if the timer went off when
>> >>> the CPU wakes.
>> >>>
>> >>> The event is cleared by tick_clear_freeze_event.
>> >>
>> >> Why? What's wrong with using the RTC stuff? RTC should be able to wake
>> >> suspended systems, see RTCWAKE(8).
>> >
>> > The RTC interrupt is an SCI (on ACPI systems) and we don't handle it
>> > at this point, so we don't know what woke us up until we re-enable
>> > interrupt handlers and run the one for the SCI.
>>
>> To add to that point, RTC wake ups are valid for fully waking up the
>> system. The clock event wake up wasn't used for waking up the system
>> before, so we know that we don't have to check if it should wake up
>> the system entirely. The way rtc timers work right now, I think that
>> we'd have to go all the way through resume devices to figure out if we
>> should resume to user space or freeze again.
>
> Actually, that's not exactly the case any more.
>
> After some changes that went in for 4.13-rc1 there is an additional decision
> point in the resume path, after the noirq stage, where we can decide to go back
> to sleep if that's the right thing to do.
>
> This means that in principle you might hack the CMOS RTC driver to do something
> more sophisticated than just calling pm_wakeup_hard_event() in rtc_handler().
>
> That's ACPI-specific, but I think you have ACPI on all of the systems where the
> residency counters are going to be checked anyway.

This will take more power than the current implementation I have.
While I'm fine with that since the power draw would probably be within
100uW to 1mW, it's worth noting. This is because of the added overhead
of noirq suspend and resume which take a combined time of about 50 to
100 ms depending on the platform. The implementation that used the
APIC uses about 3uW.

Rather than make the change in rtc_handler for the CMOS RTC driver,
the change might be better in the drivers/rtc/interface.c code to
better handle multiple RTC alarms. For example, there might be 2
alarms set for the same time (one that won't wake the system and one
that will) or 2 alarms 1 second apart. In the latter case, it's
possible that 1 second will pass before the second alarm is scheduled.
We need to make sure that the RTC irq runs before calling
dpm_suspend_noirq too.

If I remember correctly, I proposed using the RTC to wakeup for this
check to you, but you recommended using the APIC instead. This was of
course before the additional decision point was added to freeze.

>
> Thanks,
> Rafael
>
On Tue, Jul 18, 2017 at 2:30 AM, dbasehore . <dbasehore@chromium.org> wrote:
> On Sat, Jul 15, 2017 at 5:39 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> On Thursday, July 13, 2017 03:58:53 PM dbasehore . wrote:
>>> On Thu, Jul 13, 2017 at 8:09 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>>> > On Thu, Jul 13, 2017 at 9:32 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> >> On Fri, Jul 07, 2017 at 05:03:00PM -0700, Derek Basehore wrote:
>>> >>> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
>>> >>> This won't fully wake up the system (devices are not resumed), but
>>> >>> allow simple platform functionality to be run during freeze with
>>> >>> little power impact.
>>> >>>
>>> >>> This implementation allows an idle driver to setup a timer event with
>>> >>> the clock event device when entering freeze by calling
>>> >>> tick_set_freeze_event. Only one caller should exist for the function.
>>> >>>
>>> >>> tick_freeze_event_expired is used to check if the timer went off when
>>> >>> the CPU wakes.
>>> >>>
>>> >>> The event is cleared by tick_clear_freeze_event.
>>> >>
>>> >> Why? What's wrong with using the RTC stuff? RTC should be able to wake
>>> >> suspended systems, see RTCWAKE(8).
>>> >
>>> > The RTC interrupt is an SCI (on ACPI systems) and we don't handle it
>>> > at this point, so we don't know what woke us up until we re-enable
>>> > interrupt handlers and run the one for the SCI.
>>>
>>> To add to that point, RTC wake ups are valid for fully waking up the
>>> system. The clock event wake up wasn't used for waking up the system
>>> before, so we know that we don't have to check if it should wake up
>>> the system entirely. The way rtc timers work right now, I think that
>>> we'd have to go all the way through resume devices to figure out if we
>>> should resume to user space or freeze again.
>>
>> Actually, that's not exactly the case any more.
>>
>> After some changes that went in for 4.13-rc1 there is an additional decision
>> point in the resume path, after the noirq stage, where we can decide to go back
>> to sleep if that's the right thing to do.
>>
>> This means that in principle you might hack the CMOS RTC driver to do something
>> more sophisticated than just calling pm_wakeup_hard_event() in rtc_handler().
>>
>> That's ACPI-specific, but I think you have ACPI on all of the systems where the
>> residency counters are going to be checked anyway.
>
> This will take more power than the current implementation I have.
> While I'm fine with that since the power draw would probably be within
> 100uW to 1mW, it's worth noting. This is because of the added overhead
> of noirq suspend and resume which take a combined time of about 50 to
> 100 ms depending on the platform. The implementation that used the
> APIC uses about 3uW.

That's correct, but I'm not quite sure how much of a difference it
makes in practice.

> Rather than make the change in rtc_handler for the CMOS RTC driver,
> the change might be better in the drivers/rtc/interface.c code to
> better handle multiple RTC alarms. For example, there might be 2
> alarms set for the same time (one that won't wake the system and one
> that will) or 2 alarms 1 second apart. In the latter case, it's
> possible that 1 second will pass before the second alarm is scheduled.
> We need to make sure that the RTC irq runs before calling
> dpm_suspend_noirq too.

Well, I guess the choice will depend on which patch looks more
straightforward. :-)

> If I remember correctly, I proposed using the RTC to wakeup for this
> check to you, but you recommended using the APIC instead. This was of
> course before the additional decision point was added to freeze.

Right. That's why I recommended it at that time.

Thanks,
Rafael
On Mon, Jul 17, 2017 at 6:33 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Jul 18, 2017 at 2:30 AM, dbasehore . <dbasehore@chromium.org> wrote:
>> On Sat, Jul 15, 2017 at 5:39 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> On Thursday, July 13, 2017 03:58:53 PM dbasehore . wrote:
>>>> On Thu, Jul 13, 2017 at 8:09 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>>>> > On Thu, Jul 13, 2017 at 9:32 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>> >> On Fri, Jul 07, 2017 at 05:03:00PM -0700, Derek Basehore wrote:
>>>> >>> Adds a new feature to tick to schedule wakeups on a CPU during freeze.
>>>> >>> This won't fully wake up the system (devices are not resumed), but
>>>> >>> allow simple platform functionality to be run during freeze with
>>>> >>> little power impact.
>>>> >>>
>>>> >>> This implementation allows an idle driver to setup a timer event with
>>>> >>> the clock event device when entering freeze by calling
>>>> >>> tick_set_freeze_event. Only one caller should exist for the function.
>>>> >>>
>>>> >>> tick_freeze_event_expired is used to check if the timer went off when
>>>> >>> the CPU wakes.
>>>> >>>
>>>> >>> The event is cleared by tick_clear_freeze_event.
>>>> >>
>>>> >> Why? What's wrong with using the RTC stuff? RTC should be able to wake
>>>> >> suspended systems, see RTCWAKE(8).
>>>> >
>>>> > The RTC interrupt is an SCI (on ACPI systems) and we don't handle it
>>>> > at this point, so we don't know what woke us up until we re-enable
>>>> > interrupt handlers and run the one for the SCI.
>>>>
>>>> To add to that point, RTC wake ups are valid for fully waking up the
>>>> system. The clock event wake up wasn't used for waking up the system
>>>> before, so we know that we don't have to check if it should wake up
>>>> the system entirely. The way rtc timers work right now, I think that
>>>> we'd have to go all the way through resume devices to figure out if we
>>>> should resume to user space or freeze again.
>>>
>>> Actually, that's not exactly the case any more.
>>>
>>> After some changes that went in for 4.13-rc1 there is an additional decision
>>> point in the resume path, after the noirq stage, where we can decide to go back
>>> to sleep if that's the right thing to do.
>>>
>>> This means that in principle you might hack the CMOS RTC driver to do something
>>> more sophisticated than just calling pm_wakeup_hard_event() in rtc_handler().
>>>
>>> That's ACPI-specific, but I think you have ACPI on all of the systems where the
>>> residency counters are going to be checked anyway.
>>
>> This will take more power than the current implementation I have.
>> While I'm fine with that since the power draw would probably be within
>> 100uW to 1mW, it's worth noting. This is because of the added overhead
>> of noirq suspend and resume which take a combined time of about 50 to
>> 100 ms depending on the platform. The implementation that used the
>> APIC uses about 3uW.
>
> That's correct, but I'm not quite sure how much of a difference it
> makes in practice.
>
>> Rather than make the change in rtc_handler for the CMOS RTC driver,
>> the change might be better in the drivers/rtc/interface.c code to
>> better handle multiple RTC alarms. For example, there might be 2
>> alarms set for the same time (one that won't wake the system and one
>> that will) or 2 alarms 1 second apart. In the latter case, it's
>> possible that 1 second will pass before the second alarm is scheduled.
>> We need to make sure that the RTC irq runs before calling
>> dpm_suspend_noirq too.
>
> Well, I guess the choice will depend on which patch looks more
> straightforward. :-)

I could make a patch to try it out. I would probably add a flag to
rtc timers to indicate whether it wakes the system (default true). We
would have to add a sync with the rtc irq and the rtc irqwork. I would
probably add a rtc_timer_sync function that would flush the rtc irq
and flush the irqwork. I would call this after the freeze_ops sync
function since the sci irq needs to finish before syncing with the rtc
irq. Also, pm_wakeup_irq seems racy with the current implementation of
s2idle_loop since the RTC irq could be mistakenly set as pm_wakeup_irq
when something else actually triggered the full wakeup. Fortunately, I
don't think pm_wakeup_irq is used for anything except debugging, but
we might change that.

>
>> If I remember correctly, I proposed using the RTC to wakeup for this
>> check to you, but you recommended using the APIC instead. This was of
>> course before the additional decision point was added to freeze.
>
> Right. That's why I recommended it at that time.
>
> Thanks,
> Rafael
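[Editor's note: the flag Derek proposes might look roughly like the sketch below against struct rtc_timer from include/linux/rtc.h. The `wake_system` field is entirely hypothetical — no such field exists in the tree; defaulting it to true would preserve the existing behavior where any rtc alarm wakes the system.]

```c
/*
 * Sketch of the proposed flag on struct rtc_timer (include/linux/rtc.h).
 * Existing fields are as in the 4.12-era tree; 'wake_system' is the
 * hypothetical addition.
 */
struct rtc_timer {
	struct timerqueue_node	node;
	ktime_t			period;
	void			(*func)(void *private_data);
	void			*private_data;
	int			enabled;
	bool			wake_system;	/* proposed: default true */
};
```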
On Mon, 17 Jul 2017, dbasehore . wrote:
> On Mon, Jul 17, 2017 at 6:33 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> I could make a patch to try it out. I would probably add a flag to rtc
> timers to indicate whether it wakes the system (default true). We
> would have to add a sync with the rtc irq and the rtc irqwork. I would
> probably add a rtc_timer_sync function that would flush the rtc irq
> and flush the irqwork. I would call this after the freeze_ops sync
> function since the sci irq needs to finish before syncing with the rtc
> irq. Also, pm_wakeup_irq seems racy with the current implementation of
> s2idle_loop since the RTC irq could be mistakenly set as pm_wakeup_irq
> when something else actually triggered the full wakeup. Fortunately, I
> don't think pm_wakeup_irq is used for anything except debugging, but
> we might change that.

There is another option which you might consider. We can reserve one of the
HPET comparators for that purpose and have a special interrupt handler for
it. Checking the HPET for expiry from the low level code should be trivial.

Thanks,

	tglx
On Mon, Jul 17, 2017 at 11:40 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 17 Jul 2017, dbasehore . wrote:
>> On Mon, Jul 17, 2017 at 6:33 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> I could make a patch to try it out. I would probably add a flag to rtc
>> timers to indicate whether it wakes the system (default true). We
>> would have to add a sync with the rtc irq and the rtc irqwork. I would
>> probably add a rtc_timer_sync function that would flush the rtc irq
>> and flush the irqwork. I would call this after the freeze_ops sync
>> function since the sci irq needs to finish before syncing with the rtc
>> irq. Also, pm_wakeup_irq seems racy with the current implementation of
>> s2idle_loop since the RTC irq could be mistakenly set as pm_wakeup_irq
>> when something else actually triggered the full wakeup. Fortunately, I
>> don't think pm_wakeup_irq is used for anything except debugging, but
>> we might change that.
>
> There is another option which you might consider. We can reserve one of the
> HPET comparators for that purpose and have a special interrupt handler for
> it. Checking the HPET for expiry from the low level code should be trivial.
>
Does that handle setting up new timers properly or does the timer sync
code still need to be written?

> Thanks,
>
>	tglx
>
On Tue, 18 Jul 2017, dbasehore . wrote:
> On Mon, Jul 17, 2017 at 11:40 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Mon, 17 Jul 2017, dbasehore . wrote:
> >> On Mon, Jul 17, 2017 at 6:33 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> I could make a patch to try it out. I would probably add a flag to rtc
> >> timers to indicate whether it wakes the system (default true). We
> >> would have to add a sync with the rtc irq and the rtc irqwork. I would
> >> probably add a rtc_timer_sync function that would flush the rtc irq
> >> and flush the irqwork. I would call this after the freeze_ops sync
> >> function since the sci irq needs to finish before syncing with the rtc
> >> irq. Also, pm_wakeup_irq seems racy with the current implementation of
> >> s2idle_loop since the RTC irq could be mistakenly set as pm_wakeup_irq
> >> when something else actually triggered the full wakeup. Fortunately, I
> >> don't think pm_wakeup_irq is used for anything except debugging, but
> >> we might change that.
> >
> > There is another option which you might consider. We can reserve one of the
> > HPET comparators for that purpose and have a special interrupt handler for
> > it. Checking the HPET for expiry from the low level code should be trivial.
> >
> Does that handle setting up new timers properly or does the timer sync
> code still need to be written?

Sorry, I don't understand the question. What is timer sync code?

Thanks,

	tglx
On Tue, Jul 18, 2017 at 2:53 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 18 Jul 2017, dbasehore . wrote:
>> On Mon, Jul 17, 2017 at 11:40 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > On Mon, 17 Jul 2017, dbasehore . wrote:
>> >> On Mon, Jul 17, 2017 at 6:33 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> >> I could make a patch to try it out. I would probably add a flag to rtc
>> >> timers to indicate whether it wakes the system (default true). We
>> >> would have to add a sync with the rtc irq and the rtc irqwork. I would
>> >> probably add a rtc_timer_sync function that would flush the rtc irq
>> >> and flush the irqwork. I would call this after the freeze_ops sync
>> >> function since the sci irq needs to finish before syncing with the rtc
>> >> irq. Also, pm_wakeup_irq seems racy with the current implementation of
>> >> s2idle_loop since the RTC irq could be mistakenly set as pm_wakeup_irq
>> >> when something else actually triggered the full wakeup. Fortunately, I
>> >> don't think pm_wakeup_irq is used for anything except debugging, but
>> >> we might change that.
>> >
>> > There is another option which you might consider. We can reserve one of the
>> > HPET comparators for that purpose and have a special interrupt handler for
>> > it. Checking the HPET for expiry from the low level code should be trivial.
>> >
>> Does that handle setting up new timers properly or does the timer sync
>> code still need to be written?
>
> Sorry, I don't understand the question. What is timer sync code?
>
Does the comparator allow you to have a completely separate alarm set
in the hardware? If there's another timer setup (say some user
specified wake alarm), we need to program that alarm after the current
one goes off if we aren't able to program multiple alarms at the same
time.

> Thanks,
>
>	tglx
On Tue, 18 Jul 2017, dbasehore . wrote:
> On Tue, Jul 18, 2017 at 2:53 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Tue, 18 Jul 2017, dbasehore . wrote:
> >> On Mon, Jul 17, 2017 at 11:40 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> > On Mon, 17 Jul 2017, dbasehore . wrote:
> >> >> On Mon, Jul 17, 2017 at 6:33 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> >> I could make a patch to try it out. I would probably add a flag to rtc
> >> >> timers to indicate whether it wakes the system (default true). We
> >> >> would have to add a sync with the rtc irq and the rtc irqwork. I would
> >> >> probably add a rtc_timer_sync function that would flush the rtc irq
> >> >> and flush the irqwork. I would call this after the freeze_ops sync
> >> >> function since the sci irq needs to finish before syncing with the rtc
> >> >> irq. Also, pm_wakeup_irq seems racy with the current implementation of
> >> >> s2idle_loop since the RTC irq could be mistakenly set as pm_wakeup_irq
> >> >> when something else actually triggered the full wakeup. Fortunately, I
> >> >> don't think pm_wakeup_irq is used for anything except debugging, but
> >> >> we might change that.
> >> >
> >> > There is another option which you might consider. We can reserve one of the
> >> > HPET comparators for that purpose and have a special interrupt handler for
> >> > it. Checking the HPET for expiry from the low level code should be trivial.
> >> >
> >> Does that handle setting up new timers properly or does the timer sync
> >> code still need to be written?
> >
> > Sorry, I don't understand the question. What is timer sync code?
> >
> Does the comparator allow you to have a completely separate alarm set
> in the hardware? If there's another timer setup (say some user
> specified wake alarm), we need to program that alarm after the current
> one goes off if we aren't able to program multiple alarms at the same
> time.

The HPET consists of several independent comparator units. The kernel uses
only a limited set of them. We can reserve one or more for that purpose, so
it does not require any multiplexing or whatever. How many of them do you
need?

Thanks,

	tglx
On Tue, Jul 18, 2017 at 3:22 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 18 Jul 2017, dbasehore . wrote:
>> On Tue, Jul 18, 2017 at 2:53 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > On Tue, 18 Jul 2017, dbasehore . wrote:
>> >> On Mon, Jul 17, 2017 at 11:40 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> >> > On Mon, 17 Jul 2017, dbasehore . wrote:
>> >> >> On Mon, Jul 17, 2017 at 6:33 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> >> >> I could make a patch to try it out. I would probably add a flag to rtc
>> >> >> timers to indicate whether it wakes the system (default true). We
>> >> >> would have to add a sync with the rtc irq and the rtc irqwork. I would
>> >> >> probably add a rtc_timer_sync function that would flush the rtc irq
>> >> >> and flush the irqwork. I would call this after the freeze_ops sync
>> >> >> function since the sci irq needs to finish before syncing with the rtc
>> >> >> irq. Also, pm_wakeup_irq seems racy with the current implementation of
>> >> >> s2idle_loop since the RTC irq could be mistakenly set as pm_wakeup_irq
>> >> >> when something else actually triggered the full wakeup. Fortunately, I
>> >> >> don't think pm_wakeup_irq is used for anything except debugging, but
>> >> >> we might change that.
>> >> >
>> >> > There is another option which you might consider. We can reserve one of the
>> >> > HPET comparators for that purpose and have a special interrupt handler for
>> >> > it. Checking the HPET for expiry from the low level code should be trivial.
>> >> >
>> >> Does that handle setting up new timers properly or does the timer sync
>> >> code still need to be written?
>> >
>> > Sorry, I don't understand the question. What is timer sync code?
>> >
>> Does the comparator allow you to have a completely separate alarm set
>> in the hardware? If there's another timer setup (say some user
>> specified wake alarm), we need to program that alarm after the current
>> one goes off if we aren't able to program multiple alarms at the same
>> time.
>
> The HPET consists of several independent comparator units. The kernel uses
> only a limited set of them. We can reserve one or more for that purpose, so
> it does not require any multiplexing or whatever. How many of them do you
> need?
>
I just need one. Is it a different irq than the RTC? It would be nice
if we could avoid going through the s2idle_loop code.

> Thanks,
>
>	tglx
>
On Tue, 18 Jul 2017, dbasehore . wrote:
> On Tue, Jul 18, 2017 at 3:22 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > The HPET consists of several independent comparator units. The kernel uses
> > only a limited set of them. We can reserve one or more for that purpose, so
> > it does not require any multiplexing or whatever. How many of them do you
> > need?
> >
> I just need one. Is it a different irq than the RTC? It would be nice
> if we could avoid going through the s2idle_loop code.

Yes, it is a different IRQ.

Thanks,

	tglx
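[Editor's note: for the record, the "check expiry from the low level code" part of Thomas's HPET suggestion could be as simple as comparing the reserved comparator against the main counter. The sketch below assumes x86 with the channel programmed in 32-bit mode; `hpet_readl()`, `HPET_COUNTER`, and `HPET_Tn_CMP()` come from arch/x86/include/asm/hpet.h.]

```c
#include <asm/hpet.h>

/*
 * Sketch: has the reserved HPET comparator 'ch' fired? Assumes the
 * channel was set up in 32-bit mode, so a signed wraparound comparison
 * of counter vs. comparator is valid.
 */
static bool hpet_wake_expired(unsigned int ch)
{
	u32 cnt = hpet_readl(HPET_COUNTER);
	u32 cmp = hpet_readl(HPET_Tn_CMP(ch));

	return (s32)(cnt - cmp) >= 0;
}
```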
```diff
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index a116926598fd..6a3f30008020 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -66,12 +66,20 @@ enum clock_event_state {
  */
 # define CLOCK_EVT_FEAT_HRTIMER		0x000080
 
+/*
+ * Clockevent device may run during freeze
+ */
+# define CLOCK_EVT_FEAT_FREEZE_NONSTOP	0x000100
+
 /**
  * struct clock_event_device - clock event device descriptor
  * @event_handler:	Assigned by the framework to be called by the low
  *			level handler of the event source
  * @set_next_event:	set next event function using a clocksource delta
  * @set_next_ktime:	set next event function using a direct ktime value
+ * @event_expired:	check if the programmed event is expired. Used for
+ *			freeze events when timekeeping is suspended and
+ *			irqs are disabled.
  * @next_event:		local storage for the next event in oneshot mode
  * @max_delta_ns:	maximum delta value in ns
  * @min_delta_ns:	minimum delta value in ns
@@ -100,6 +108,7 @@ struct clock_event_device {
 	void			(*event_handler)(struct clock_event_device *);
 	int			(*set_next_event)(unsigned long evt, struct clock_event_device *);
 	int			(*set_next_ktime)(ktime_t expires, struct clock_event_device *);
+	int			(*event_expired)(struct clock_event_device *);
 	ktime_t			next_event;
 	u64			max_delta_ns;
 	u64			min_delta_ns;
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 0b1cf32edfd7..1d56269a7b31 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -248,6 +248,8 @@ static inline bool idle_should_freeze(void)
 }
 
 extern void __init pm_states_init(void);
+int tick_set_freeze_event(int cpu, ktime_t expires);
+int tick_clear_freeze_event(int cpu);
 extern void freeze_set_ops(const struct platform_freeze_ops *ops);
 extern void freeze_wake(void);
 
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 49edc1c4f3e6..688d1c0cad10 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -498,6 +498,97 @@ void tick_freeze(void)
 	raw_spin_unlock(&tick_freeze_lock);
 }
 
+/**
+ * tick_set_freeze_event - Set timer to wake up the CPU from freeze.
+ *
+ * @cpu:	CPU to set the clock event for
+ * @delta:	time to wait before waking the CPU
+ *
+ * Returns 0 on success and -EERROR on failure.
+ */
+int tick_set_freeze_event(int cpu, ktime_t delta)
+{
+	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
+	u64 delta_ns;
+	int ret;
+
+	if (!dev->set_next_event ||
+	    !(dev->features & CLOCK_EVT_FEAT_FREEZE_NONSTOP)) {
+		printk_deferred(KERN_WARNING
+				"[%s] unsupported by clock event device\n",
+				__func__);
+		return -EPERM;
+	}
+
+	if (!clockevent_state_shutdown(dev)) {
+		printk_deferred(KERN_WARNING
+				"[%s] clock event device in use\n",
+				__func__);
+		return -EBUSY;
+	}
+
+	delta_ns = ktime_to_ns(delta);
+	if (delta_ns > dev->max_delta_ns || delta_ns < dev->min_delta_ns) {
+		printk_deferred(KERN_WARNING
+				"[%s] %lluns outside range: [%lluns, %lluns]\n",
+				__func__, delta_ns, dev->min_delta_ns,
+				dev->max_delta_ns);
+		return -ERANGE;
+	}
+
+	clockevents_tick_resume(dev);
+	clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT);
+	ret = dev->set_next_event((delta_ns * dev->mult) >> dev->shift, dev);
+	if (ret < 0) {
+		printk_deferred(KERN_WARNING
+				"Failed to program freeze event\n");
+		clockevents_shutdown(dev);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tick_set_freeze_event);
+
+/**
+ * tick_freeze_event_expired - Check if the programmed freeze event expired
+ *
+ * @cpu:	CPU to check the clock event device for an expired event
+ *
+ * Returns 1 if the event expired and 0 otherwise.
+ */
+int tick_freeze_event_expired(int cpu)
+{
+	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
+
+	if (!(dev && dev->event_expired))
+		return 0;
+
+	return dev->event_expired(dev);
+}
+
+/**
+ * tick_clear_freeze_event - Shuts down the clock device after programming a
+ *			     freeze event.
+ *
+ * @cpu:	CPU to shutdown the clock device for
+ *
+ * Returns 0 on success and -EERROR otherwise.
+ */
+int tick_clear_freeze_event(int cpu)
+{
+	struct clock_event_device *dev = per_cpu(tick_cpu_device, cpu).evtdev;
+
+	if (!dev)
+		return -ENODEV;
+
+	if (!clockevent_state_oneshot(dev))
+		return -EBUSY;
+
+	clockevents_shutdown(dev);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(tick_clear_freeze_event);
+
 /**
  * tick_unfreeze - Resume the local tick and (possibly) timekeeping.
  *
```
Adds a new feature to tick to schedule wakeups on a CPU during freeze.
This won't fully wake up the system (devices are not resumed), but
allow simple platform functionality to be run during freeze with
little power impact.

This implementation allows an idle driver to setup a timer event with
the clock event device when entering freeze by calling
tick_set_freeze_event. Only one caller should exist for the function.

tick_freeze_event_expired is used to check if the timer went off when
the CPU wakes.

The event is cleared by tick_clear_freeze_event.

Signed-off-by: Derek Basehore <dbasehore@chromium.org>
---
 include/linux/clockchips.h |  9 +++++
 include/linux/suspend.h    |  2 +
 kernel/time/tick-common.c  | 91 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+)
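[Editor's note: as a sanity check on the calling convention the cover text describes, an idle driver using this API as posted (with the cpu argument that was later questioned in review) would do something like the sketch below. The wrapper function name is hypothetical and error handling beyond the return codes is omitted.]

```c
/*
 * Sketch of a caller: program a wakeup before entering freeze, then
 * check and clear it on wake. Per the cover text, only one such caller
 * should exist.
 */
static void idle_driver_freeze_poll(int cpu, ktime_t period)
{
	if (tick_set_freeze_event(cpu, period))
		return;		/* device can't do freeze wakeups */

	/* ... enter the low-power idle state; the CPU wakes on the event ... */

	if (tick_freeze_event_expired(cpu)) {
		/* run the simple platform check (e.g. residency counters) */
	}

	tick_clear_freeze_event(cpu);
}
```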