
cpuidle: use high confidence factors only when considering polling

Message ID 20160318173551.675ff6e5@annuminas.surriel.com (mailing list archive)
State Superseded, archived
Delegated to: Rafael Wysocki

Commit Message

Rik van Riel March 18, 2016, 9:35 p.m. UTC
On Fri, 18 Mar 2016 11:32:28 -0700
"Doug Smythies" <dsmythies@telus.net> wrote:

> Energy:
> Kernel 4.5-rc7-rjw10-rvr6: 55864 Joules
> 
> Trace data (very crude summary):
> Kernel 4.5-rc7-rjw10-rvr5: ~3049 long durations at high CPU load (idle state 0)
> Kernel 4.5-rc7-rjw10-rvr5: ~183 long durations at high, but less, CPU load (not all idle state 0)

This is a bit of a big hammer, but since the polling loop already uses
full CPU power anyway, this could be harmless, and solve your problem.

---8<---

Subject: cpuidle: break out of idle polling loop after HLT threshold

Staying in the poll loop can be beneficial for workloads that
wake back up very, very quickly, but staying in the poll loop
for too long can waste a lot of power.

Break out of the idle polling loop if the system has spent more
time in the loop than the threshold for entering the next power
saving mode.

The poll loop uses full CPU power already, so this will not
increase the power use of the poll loop.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 drivers/cpuidle/driver.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)


Comments

Rafael J. Wysocki March 18, 2016, 9:48 p.m. UTC | #1
On Fri, Mar 18, 2016 at 10:35 PM, Rik van Riel <riel@redhat.com> wrote:
> On Fri, 18 Mar 2016 11:32:28 -0700
> "Doug Smythies" <dsmythies@telus.net> wrote:
>
>> Energy:
>> Kernel 4.5-rc7-rjw10-rvr6: 55864 Joules
>>
>> Trace data (very crude summary):
>> Kernel 4.5-rc7-rjw10-rvr5: ~3049 long durations at high CPU load (idle state 0)
>> Kernel 4.5-rc7-rjw10-rvr5: ~183 long durations at high, but less, CPU load (not all idle state 0)
>
> This is a bit of a big hammer, but since the polling loop already uses
> full CPU power anyway, this could be harmless, and solve your problem.

Yes, it would be good to test this.

No, I don't like it.

If we are to break out of the polling loop after a specific time, we
shouldn't be using polling at all in the first place.

So I very much prefer to compare data->next_timer_us with the target
residency of C1 instead of doing this.
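
As an illustrative aside, the comparison suggested here amounts to
skipping the polling state whenever the predicted sleep length already
exceeds what C1 can recoup.  A minimal sketch with simplified stand-in
types (not the menu governor's actual structures or field layout):

struct cpuidle_state_info { unsigned int target_residency; }; /* usec */
struct prediction         { unsigned int next_timer_us;     };

/* Return the index of the shallowest state worth considering:
 * index 0 is the polling state, index 1 is C1. */
static int first_candidate_state(const struct prediction *data,
				 const struct cpuidle_state_info *states)
{
	/* If the next timer is further away than C1's target residency,
	 * polling cannot pay off, so start the search at C1. */
	if (data->next_timer_us > states[1].target_residency)
		return 1;
	return 0;
}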
Rik van Riel March 18, 2016, 9:52 p.m. UTC | #2
On Fri, 2016-03-18 at 22:48 +0100, Rafael J. Wysocki wrote:
> On Fri, Mar 18, 2016 at 10:35 PM, Rik van Riel <riel@redhat.com>
> wrote:
> > 
> > On Fri, 18 Mar 2016 11:32:28 -0700
> > "Doug Smythies" <dsmythies@telus.net> wrote:
> > 
> > > 
> > > Energy:
> > > Kernel 4.5-rc7-rjw10-rvr6: 55864 Joules
> > > 
> > > Trace data (very crude summary):
> > > Kernel 4.5-rc7-rjw10-rvr5: ~3049 long durations at high CPU load
> > > (idle state 0)
> > > Kernel 4.5-rc7-rjw10-rvr5: ~183 long durations at high, but less,
> > > CPU load (not all idle state 0)
> > This is a bit of a big hammer, but since the polling loop already
> > uses
> > full CPU power anyway, this could be harmless, and solve your
> > problem.
> Yes, it would be good to test this.
> 
> No, I don't like it.
> 
> If we are to break out of the polling loop after a specific time, we
> shouldn't be using polling at all in the first place.
> 
> So I very much prefer to compare data->next_timer_us with the target
> residency of C1 instead of doing this.

The problem is that data->next_timer_us does not tell us
when the next network packet is about to arrive, though.

This could cause issues when dealing with a fast stream
of network traffic, which get_typical_interval() would
be able to deal with.
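
For context, get_typical_interval() examines the last eight observed
idle intervals and only endorses very short sleeps when they are both
short and regular.  A rough, self-contained userspace sketch of that
kind of detection (the eight-sample window matches the menu governor,
but the thresholds and names below are made up for illustration):

#include <stdio.h>

#define NR_RECENT	8

/* Decide between busy-polling and entering C1 based on how short and
 * how regular the recent idle intervals were.  Illustrative only. */
static int should_poll(const unsigned int us[NR_RECENT],
		       unsigned int c1_residency_us)
{
	unsigned long long sum = 0, var = 0;
	unsigned int avg;
	int i;

	for (i = 0; i < NR_RECENT; i++)
		sum += us[i];
	avg = sum / NR_RECENT;

	for (i = 0; i < NR_RECENT; i++) {
		long long d = (long long)us[i] - (long long)avg;
		var += d * d;
	}
	var /= NR_RECENT;

	/* Poll only if the intervals are consistently shorter than what
	 * C1 needs to break even. */
	return avg < c1_residency_us &&
	       var <= ((unsigned long long)avg * avg) / 4;
}

int main(void)
{
	/* Wakeup intervals in microseconds. */
	unsigned int fast_stream[NR_RECENT] = { 2, 3, 2, 2, 3, 2, 2, 3 };
	unsigned int mixed_load[NR_RECENT]  = { 2, 3, 40, 2, 55, 2, 3, 60 };

	printf("fast stream: %s\n",
	       should_poll(fast_stream, 20) ? "poll" : "enter C1");
	printf("mixed load : %s\n",
	       should_poll(mixed_load, 20) ? "poll" : "enter C1");
	return 0;
}

A steady stream of sub-residency wakeups keeps the first case polling,
while a handful of long gaps is enough to push the second case into C1.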
Rafael J. Wysocki March 18, 2016, 10:09 p.m. UTC | #3
On Friday, March 18, 2016 05:52:49 PM Rik van Riel wrote:
> On Fri, 2016-03-18 at 22:48 +0100, Rafael J. Wysocki wrote:
> > On Fri, Mar 18, 2016 at 10:35 PM, Rik van Riel <riel@redhat.com>
> > wrote:
> > > 
> > > On Fri, 18 Mar 2016 11:32:28 -0700
> > > "Doug Smythies" <dsmythies@telus.net> wrote:
> > > 
> > > > 
> > > > Energy:
> > > > Kernel 4.5-rc7-rjw10-rvr6: 55864 Joules
> > > > 
> > > > Trace data (very crude summary):
> > > > Kernel 4.5-rc7-rjw10-rvr5: ~3049 long durations at high CPU load
> > > > (idle state 0)
> > > > Kernel 4.5-rc7-rjw10-rvr5: ~183 long durations at high, but less,
> > > > CPU load (not all idle state 0)
> > > This is a bit of a big hammer, but since the polling loop already
> > > uses
> > > full CPU power anyway, this could be harmless, and solve your
> > > problem.
> > Yes, it would be good to test this.
> > 
> > No, I don't like it.
> > 
> > If we are to break out of the polling loop after a specific time, we
> > shouldn't be using polling at all in the first place.
> > 
> > So I very much prefer to compare data->next_timer_us with the target
> > residency of C1 instead of doing this.
> 
> The problem is that data->next_timer_us does not tell us
> when the next network packet is about to arrive, though.
> 
> This could cause issues when dealing with a fast stream
> of network traffic, which get_typical_interval() would
> be able to deal with.

If it predicted things correctly for small intervals, that is.

It doesn't seem to be able to do that, however.

You seem to want to trade energy consumption for performance with a very narrow
use case in mind, and to do that by changing behavior that's been there pretty
much forever, so by now it really has become the expected one.

Needless to say I'm not a fan of changes like that.
Rik van Riel March 18, 2016, 10:28 p.m. UTC | #4
On Fri, 2016-03-18 at 23:09 +0100, Rafael J. Wysocki wrote:
> On Friday, March 18, 2016 05:52:49 PM Rik van Riel wrote:
> > 
> > On Fri, 2016-03-18 at 22:48 +0100, Rafael J. Wysocki wrote:
> > > 
> > > On Fri, Mar 18, 2016 at 10:35 PM, Rik van Riel <riel@redhat.com>
> > > wrote:
> > > > 
> > > > 
> > > > On Fri, 18 Mar 2016 11:32:28 -0700
> > > > "Doug Smythies" <dsmythies@telus.net> wrote:
> > > > 
> > > > > 
> > > > > 
> > > > > Energy:
> > > > > Kernel 4.5-rc7-rjw10-rvr6: 55864 Joules
> > > > > 
> > > > > Trace data (very crude summary):
> > > > > Kernel 4.5-rc7-rjw10-rvr5: ~3049 long durations at high CPU
> > > > > load
> > > > > (idle state 0)
> > > > > Kernel 4.5-rc7-rjw10-rvr5: ~183 long durations at high, but
> > > > > less,
> > > > > CPU load (not all idle state 0)
> > > > This is a bit of a big hammer, but since the polling loop
> > > > already
> > > > uses
> > > > full CPU power anyway, this could be harmless, and solve your
> > > > problem.
> > > Yes, it would be good to test this.
> > > 
> > > No, I don't like it.
> > > 
> > > If we are to break out of the polling loop after a specific time,
> > > we
> > > shouldn't be using polling at all in the first place.
> > > 
> > > So I very much prefer to compare data->next_timer_us with the
> > > target
> > > residency of C1 instead of doing this.
> > The problem is that data->next_timer_us does not tell us
> > when the next network packet is about to arrive, though.
> > 
> > This could cause issues when dealing with a fast stream
> > of network traffic, which get_typical_interval() would
> > be able to deal with.
> If it predicted things correctly for small intervals, that is.
> 
> It doesn't seem to be able to do that, however.

It appears to get it right the vast majority of the
time; it is just that when it gets it wrong, in a tiny
minority of cases, the penalty for polling is
really high.

> You seem to want to trade energy consumption for performance with a
> very narrow use case in mind, and to do that by changing behavior
> that's been there pretty much forever, so by now it really has become
> the expected one.
> 
> Needless to say I'm not a fan of changes like that.

I am not sure it is a narrow use case, and I suspect
cases with over half a million wakeups a second are
not going to be all that unusual.

The same thing applies not just to network traffic,
but also to tasks ping-ponging info between CPUs,
e.g. because some tasks are pinned, or because we
are dealing with a controller thread and a larger
number of workers.
Rafael J. Wysocki March 18, 2016, 10:56 p.m. UTC | #5
On Fri, Mar 18, 2016 at 10:35 PM, Rik van Riel <riel@redhat.com> wrote:
> On Fri, 18 Mar 2016 11:32:28 -0700
> "Doug Smythies" <dsmythies@telus.net> wrote:
>
>> Energy:
>> Kernel 4.5-rc7-rjw10-rvr6: 55864 Joules
>>
>> Trace data (very crude summary):
>> Kernel 4.5-rc7-rjw10-rvr5: ~3049 long durations at high CPU load (idle state 0)
>> Kernel 4.5-rc7-rjw10-rvr5: ~183 long durations at high, but less, CPU load (not all idle state 0)
>
> This is a bit of a big hammer, but since the polling loop already uses
> full CPU power anyway, this could be harmless, and solve your problem.
>
> ---8<---
>
> Subject: cpuidle: break out of idle polling loop after HLT threshold
>
> Staying in the poll loop can be beneficial for workloads that
> wake back up very, very quickly, but staying in the poll loop
> for too long can waste a lot of power.
>
> Break out of the idle polling loop if the system has spent more
> time in the loop than the threshold for entering the next power
> saving mode.
>
> The poll loop uses full CPU power already, so this will not
> increase the power use of the poll loop.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  drivers/cpuidle/driver.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
> index 389ade4572be..a5dfeafd243f 100644
> --- a/drivers/cpuidle/driver.c
> +++ b/drivers/cpuidle/driver.c
> @@ -181,10 +181,29 @@ static void __cpuidle_driver_init(struct cpuidle_driver *drv)
>  static int poll_idle(struct cpuidle_device *dev,
>                 struct cpuidle_driver *drv, int index)
>  {
> +       /*
> +        * Polling is state 0; break out of the polling loop after the
> +        * threshold for the next power state has passed. Doing this several
> +        * times in a row should cause the menu governor to immediately
> +        * select a deeper power state.
> +        */
> +       u64 limit = drv->states[1].target_residency * NSEC_PER_USEC;
> +       ktime_t start = ktime_get();
> +       int i = 0;
> +
>         local_irq_enable();
>         if (!current_set_polling_and_test()) {
> -               while (!need_resched())
> +               while (!need_resched()) {
> +                       ktime_t now;
>                         cpu_relax();
> +
> +                       /* Occasionally check for a timeout. */
> +                       if (!(i % 1000)) {
> +                               now = ktime_get();
> +                               if (ktime_to_ns(ktime_sub(now, start)) > limit)
> +                                       break;
> +                       }
> +               }
>         }
>         current_clr_polling();

I'm not sure how this is going to help, but maybe I'm missing something.

Say we hit the timeout and break out of the loop.  We get back to
cpuidle_enter_state() and from there (via a couple of irrelevant
steps) to cpuidle_idle_call() and to cpu_idle_loop().  Then, since we
are still idle, the while (1) loop continues and we land in
menu_select() again.  That will choose polling again (because now
there's less time to the next timer than it was last time, so if it
didn't choose C1 then, it won't do that now either) and we land in
poll_idle() again.
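
For readers following along, a heavily condensed sketch of that round
trip; the function names are the real ones mentioned above, but the
loop body is paraphrased rather than taken from the kernel sources:

	while (1) {					/* cpu_idle_loop()        */
		if (need_resched())
			break;				/* a task became runnable */
		index = menu_select(drv, dev);		/* may pick state 0 again */
		cpuidle_enter_state(dev, drv, index);	/* state 0 -> poll_idle() */
	}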

So is there anything I'm overlooking?
Rafael J. Wysocki March 18, 2016, 11:29 p.m. UTC | #6
On Fri, Mar 18, 2016 at 11:28 PM, Rik van Riel <riel@redhat.com> wrote:
> On Fri, 2016-03-18 at 23:09 +0100, Rafael J. Wysocki wrote:
>> On Friday, March 18, 2016 05:52:49 PM Rik van Riel wrote:
>> >
>> > On Fri, 2016-03-18 at 22:48 +0100, Rafael J. Wysocki wrote:
>> > >
>> > > On Fri, Mar 18, 2016 at 10:35 PM, Rik van Riel <riel@redhat.com>
>> > > wrote:
>> > > >
>> > > >
>> > > > On Fri, 18 Mar 2016 11:32:28 -0700
>> > > > "Doug Smythies" <dsmythies@telus.net> wrote:
>> > > >
>> > > > >
>> > > > >
>> > > > > Energy:
>> > > > > Kernel 4.5-rc7-rjw10-rvr6: 55864 Joules
>> > > > >
>> > > > > Trace data (very crude summary):
>> > > > > Kernel 4.5-rc7-rjw10-rvr5: ~3049 long durations at high CPU
>> > > > > load
>> > > > > (idle state 0)
>> > > > > Kernel 4.5-rc7-rjw10-rvr5: ~183 long durations at high, but
>> > > > > less,
>> > > > > CPU load (not all idle state 0)
>> > > > This is a bit of a big hammer, but since the polling loop
>> > > > already
>> > > > uses
>> > > > full CPU power anyway, this could be harmless, and solve your
>> > > > problem.
>> > > Yes, it would be good to test this.
>> > >
>> > > No, I don't like it.
>> > >
>> > > If we are to break out of the polling loop after a specific time,
>> > > we
>> > > shouldn't be using polling at all in the first place.
>> > >
>> > > So I very much prefer to compare data->next_timer_us with the
>> > > target
>> > > residency of C1 instead of doing this.
>> > The problem is that data->next_timer_us does not tell us
>> > when the next network packet is about to arrive, though.
>> >
>> > This could cause issues when dealing with a fast stream
>> > of network traffic, which get_typical_interval() would
>> > be able to deal with.
>> If it predicted things correctly for small intervals, that is.
>>
>> It doesn't seem to be able to do that, however.
>
> It appears to get it right the vast majority of the
> time; it is just that when it gets it wrong, in a tiny
> minority of cases, the penalty for polling is
> really high.

The overall penalty is what matters, though (and it is cumulative,
unfortunately).

>> You seem to want to trade energy consumption for performance with a
>> very narrow use case in mind, and to do that by changing behavior
>> that's been there pretty much forever, so by now it really has become
>> the expected one.
>>
>> Needless to say I'm not a fan of changes like that.
>
> I am not sure it is a narrow use case, and I suspect
> cases with over half a million wakeups a second are
> not going to be all that unusual.

Well, the definition of "unusual" is somewhat elusive. :-)

> The same thing applies not just to network traffic,
> but also to tasks ping-ponging info between CPUs,
> e.g. because some tasks are pinned, or because we
> are dealing with a controller thread and a larger
> number of workers.

All true, but a 10% energy consumption increase is still rather a big deal.

At this point I'd like to hear whether going to a nanosecond time scale
makes any difference in the first place.
Rik van Riel March 19, 2016, 1:53 a.m. UTC | #7
On Fri, 2016-03-18 at 23:56 +0100, Rafael J. Wysocki wrote:
> On Fri, Mar 18, 2016 at 10:35 PM, Rik van Riel <riel@redhat.com>
> wrote:
> > 
> > On Fri, 18 Mar 2016 11:32:28 -0700
> > "Doug Smythies" <dsmythies@telus.net> wrote:
> > 
> > > 
> > > Energy:
> > > Kernel 4.5-rc7-rjw10-rvr6: 55864 Joules
> > > 
> > > Trace data (very crude summary):
> > > Kernel 4.5-rc7-rjw10-rvr5: ~3049 long durations at high CPU load
> > > (idle state 0)
> > > Kernel 4.5-rc7-rjw10-rvr5: ~183 long durations at high, but less,
> > > CPU load (not all idle state 0)
> > This is a bit of a big hammer, but since the polling loop already
> > uses
> > full CPU power anyway, this could be harmless, and solve your
> > problem.
> > 
> > ---8<---
> > 
> > Subject: cpuidle: break out of idle polling loop after HLT
> > threshold
> > 
> > Staying in the poll loop can be beneficial for workloads that
> > wake back up very, very quickly, but staying in the poll loop
> > for too long can waste a lot of power.
> > 
> > Break out of the idle polling loop if the system has spent more
> > time in the loop than the threshold for entering the next power
> > saving mode.
> > 
> > The poll loop uses full CPU power already, so this will not
> > increase the power use of the poll loop.
> > 
> > Signed-off-by: Rik van Riel <riel@redhat.com>
> > ---
> >  drivers/cpuidle/driver.c | 21 ++++++++++++++++++++-
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
> > index 389ade4572be..a5dfeafd243f 100644
> > --- a/drivers/cpuidle/driver.c
> > +++ b/drivers/cpuidle/driver.c
> > @@ -181,10 +181,29 @@ static void __cpuidle_driver_init(struct
> > cpuidle_driver *drv)
> >  static int poll_idle(struct cpuidle_device *dev,
> >                 struct cpuidle_driver *drv, int index)
> >  {
> > +       /*
> > +        * Polling is state 0; break out of the polling loop after
> > the
> > +        * threshold for the next power state has passed. Doing
> > this several
> > +        * times in a row should cause the menu governor to
> > immediately
> > +        * select a deeper power state.
> > +        */
> > +       u64 limit = drv->states[1].target_residency *
> > NSEC_PER_USEC;
> > +       ktime_t start = ktime_get();
> > +       int i = 0;
> > +
> >         local_irq_enable();
> >         if (!current_set_polling_and_test()) {
> > -               while (!need_resched())
> > +               while (!need_resched()) {
> > +                       ktime_t now;
> >                         cpu_relax();
> > +
> > +                       /* Occasionally check for a timeout. */
> > +                       if (!(i % 1000)) {
> > +                               now = ktime_get();
> > +                               if (ktime_to_ns(ktime_sub(now,
> > start)) > limit)
> > +                                       break;
> > +                       }
> > +               }
> >         }
> >         current_clr_polling();
> I'm not sure how this is going to help, but maybe I'm missing
> something.
> 
> Say we hit the timeout and break out of the loop.  We get back to
> cpuidle_enter_state() and from there (via a couple of irrelevant
> steps) to cpuidle_idle_call() and to cpu_idle_loop().  Then, since we
> are still idle, the while (1) loop continues and we land in
> menu_select() again.  That will choose polling again (because now
> there's less time to the next timer than it was last time, so if it
> didn't choose C1 then, it won't do that now either) and we land in
> poll_idle() again.
> 
> So is there anything I'm overlooking?

After about three microseconds of the above, get_typical_interval()
will return an interval larger than the C1 target residency, or
fail to find a typical interval, because 3 of the 8 intervals will
exceed the C1 target residency at that point.

Only repeated very fast wakeups will lead to polling, and just
a few periods of polling timing out will lead to the system
reverting to HLT.

In other words, the system will only use polling while it appears
to be beneficial, and should snap out of that mode very quickly.
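
As a rough worked example (the numbers here are assumed, not taken from
Doug's traces): with a C1 target residency of 2 us, each poll period
that times out is recorded as an idle interval of roughly 2 us or more,
so after three such periods three of the eight intervals that
get_typical_interval() examines already sit at or above the C1
threshold.  At that point it either reports a typical interval too long
to justify polling or finds no typical interval at all, and the
governor stops selecting the polling state.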

The kernel test robot also showed over 6% improvement in (IIRC)
the wikibench test, with polling being done a little more often.

Other parts of the kernel do pretty much anything for 2% extra
performance, so if we can find a way to preserve that 6%, without
hurting power usage significantly, I suspect it may be worthwhile.
Rafael J. Wysocki March 19, 2016, 2:06 a.m. UTC | #8
On Sat, Mar 19, 2016 at 2:53 AM, Rik van Riel <riel@redhat.com> wrote:
> On Fri, 2016-03-18 at 23:56 +0100, Rafael J. Wysocki wrote:
>> On Fri, Mar 18, 2016 at 10:35 PM, Rik van Riel <riel@redhat.com>
>> wrote:
>> >
>> > On Fri, 18 Mar 2016 11:32:28 -0700
>> > "Doug Smythies" <dsmythies@telus.net> wrote:
>> >
>> > >
>> > > Energy:
>> > > Kernel 4.5-rc7-rjw10-rvr6: 55864 Joules
>> > >
>> > > Trace data (very crude summary):
>> > > Kernel 4.5-rc7-rjw10-rvr5: ~3049 long durations at high CPU load
>> > > (idle state 0)
>> > > Kernel 4.5-rc7-rjw10-rvr5: ~183 long durations at high, but less,
>> > > CPU load (not all idle state 0)
>> > This is a bit of a big hammer, but since the polling loop already
>> > uses
>> > full CPU power anyway, this could be harmless, and solve your
>> > problem.
>> >
>> > ---8<---
>> >
>> > Subject: cpuidle: break out of idle polling loop after HLT
>> > threshold
>> >
>> > Staying in the poll loop can be beneficial for workloads that
>> > wake back up very, very quickly, but staying in the poll loop
>> > for too long can waste a lot of power.
>> >
>> > Break out of the idle polling loop if the system has spent more
>> > time in the loop than the threshold for entering the next power
>> > saving mode.
>> >
>> > The poll loop uses full CPU power already, so this will not
>> > increase the power use of the poll loop.
>> >
>> > Signed-off-by: Rik van Riel <riel@redhat.com>
>> > ---
>> >  drivers/cpuidle/driver.c | 21 ++++++++++++++++++++-
>> >  1 file changed, 20 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
>> > index 389ade4572be..a5dfeafd243f 100644
>> > --- a/drivers/cpuidle/driver.c
>> > +++ b/drivers/cpuidle/driver.c
>> > @@ -181,10 +181,29 @@ static void __cpuidle_driver_init(struct
>> > cpuidle_driver *drv)
>> >  static int poll_idle(struct cpuidle_device *dev,
>> >                 struct cpuidle_driver *drv, int index)
>> >  {
>> > +       /*
>> > +        * Polling is state 0; break out of the polling loop after
>> > the
>> > +        * threshold for the next power state has passed. Doing
>> > this several
>> > +        * times in a row should cause the menu governor to
>> > immediately
>> > +        * select a deeper power state.
>> > +        */
>> > +       u64 limit = drv->states[1].target_residency *
>> > NSEC_PER_USEC;
>> > +       ktime_t start = ktime_get();
>> > +       int i = 0;
>> > +
>> >         local_irq_enable();
>> >         if (!current_set_polling_and_test()) {
>> > -               while (!need_resched())
>> > +               while (!need_resched()) {
>> > +                       ktime_t now;
>> >                         cpu_relax();
>> > +
>> > +                       /* Occasionally check for a timeout. */
>> > +                       if (!(i % 1000)) {
>> > +                               now = ktime_get();
>> > +                               if (ktime_to_ns(ktime_sub(now,
>> > start)) > limit)
>> > +                                       break;
>> > +                       }
>> > +               }
>> >         }
>> >         current_clr_polling();
>> I'm not sure how this is going to help, but maybe I'm missing
>> something.
>>
>> Say we hit the timeout and break out of the loop.  We get back to
>> cpuidle_enter_state() and from there (via a couple of irrelevant
>> steps) to cpuidle_idle_call() and to cpu_idle_loop().  Then, since we
>> are still idle, the while (1) loop continues and we land in
>> menu_select() again.  That will choose polling again (because now
>> there's less time to the next timer than it was last time, so if it
>> didn't choose C1 then, it won't do that now either) and we land in
>> poll_idle() again.
>>
>> So is there anything I'm overlooking?
>
> After about three microseconds of the above, get_typical_interval()
> will return an interval larger than the C1 target residency, or
> fail to find a typical interval, because 3 of the 8 intervals will
> exceed the C1 target residency at that point.
>
> Only repeated very fast wakeups will lead to polling, and just
> a few periods of polling timing out will lead to the system
> reverting to HLT.
>
> In other words, the system will only use polling while it appears
> to be beneficial, and should snap out of that mode very quickly.

OK, thanks!

> The kernel test robot also showed over 6% improvement in (IIRC)
> the wikibench test, with polling being done a little more often.
>
> Other parts of the kernel do pretty much anything for 2% extra
> performance, so if we can find a way to preserve that 6%, without
> hurting power usage significantly, I suspect it may be worthwhile.

Well. OK.  That is a worthy goal, but things in this thread look a bit
too ad-hoc to me to be honest.

I'd like to take one step at a time and in such a way that
improvements are actually confirmed in every step.

Does this sound reasonable?

If so, I first would like to restore the energy consumption in Doug's
case at least to the level from before commit a9ceb78bc75c
(cpuidle,menu: use interactivity_req to disable polling).  That's what
I'd like to do for 4.6 and possibly before the merge window is over.

After 4.6-rc1, in the 4.7 cycle, we can think about improving
performance here and maybe sacrificing some energy for that, but in a
controlled way as I said above.

What do you think?
Rik van Riel March 19, 2016, 2:17 a.m. UTC | #9
On Sat, 2016-03-19 at 03:06 +0100, Rafael J. Wysocki wrote:
> On Sat, Mar 19, 2016 at 2:53 AM, Rik van Riel <riel@redhat.com>
> wrote:
> > 
> > The kernel test robot also showed over 6% improvement in (IIRC)
> > the wikibench test, with polling being done a little more often.
> > 
> > Other parts of the kernel do pretty much anything for 2% extra
> > performance, so if we can find a way to preserve that 6%, without
> > hurting power usage significantly, I suspect it may be worthwhile.
> Well. OK.  That is a worthy goal, but things in this thread look a
> bit
> too ad-hoc to me to be honest.
> 
> I'd like to take one step at a time and in such a way that
> improvements are actually confirmed in every step.
> 
> Does this sound reasonable?
> 
> If so, I first would like to restore the energy consumption in Doug's
> case at least to the level from before commit a9ceb78bc75c
> (cpuidle,menu: use interactivity_req to disable polling).  That's
> what
> I'd like to do for 4.6 and possibly before the merge window is over.
> 
> After 4.6-rc1, in the 4.7 cycle, we can think about improving
> performance here and maybe sacrificing some energy for that, but in a
> controlled way as I said above.
> 
> What do you think?

That sounds perfectly reasonable to me.

Count me in for any efforts to improve performance with
the cpuidle code, and hopefully allow more people to enable
c-states on their servers.
Rafael J. Wysocki March 19, 2016, 2:33 a.m. UTC | #10
On Sat, Mar 19, 2016 at 3:17 AM, Rik van Riel <riel@redhat.com> wrote:
> On Sat, 2016-03-19 at 03:06 +0100, Rafael J. Wysocki wrote:
>> On Sat, Mar 19, 2016 at 2:53 AM, Rik van Riel <riel@redhat.com>
>> wrote:
>> >
>> > The kernel test robot also showed over 6% improvement in (IIRC)
>> > the wikibench test, with polling being done a little more often.
>> >
>> > Other parts of the kernel do pretty much anything for 2% extra
>> > performance, so if we can find a way to preserve that 6%, without
>> > hurting power usage significantly, I suspect it may be worthwhile.
>> Well. OK.  That is a worthy goal, but things in this thread look a
>> bit
>> too ad-hoc to me to be honest.
>>
>> I'd like to take one step at a time and in such a way that
>> improvements are actually confirmed in every step.
>>
>> Does this sound reasonable?
>>
>> If so, I first would like to restore the energy consumption in Doug's
>> case at least to the level from before commit a9ceb78bc75c
>> (cpuidle,menu: use interactivity_req to disable polling).  That's
>> what
>> I'd like to do for 4.6 and possibly before the merge window is over.
>>
>> After 4.6-rc1, in the 4.7 cycle, we can think about improving
>> performance here and maybe sacrificing some energy for that, but in a
>> controlled way as I said above.
>>
>> What do you think?
>
> That sounds perfectly reasonable to me.
>
> Count me in for any efforts to improve performance with
> the cpuidle code, and hopefully allow more people to enable
> c-states on their servers.

OK, great!

I'm going to send a patch for Doug to test that I'd like to apply for
4.6 (on top of http://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=e132b9b3bc7f19e9b158e42b323881d5dee5ecf3),
maybe not today (because I'm falling asleep now), but likely tomorrow
or on Sunday.  If that restores things for him, I'll apply it and that
will be the starting point for future changes.

Patch

diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 389ade4572be..a5dfeafd243f 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -181,10 +181,29 @@  static void __cpuidle_driver_init(struct cpuidle_driver *drv)
 static int poll_idle(struct cpuidle_device *dev,
 		struct cpuidle_driver *drv, int index)
 {
+	/*
+	 * Polling is state 0; break out of the polling loop after the
+	 * threshold for the next power state has passed. Doing this several
+	 * times in a row should cause the menu governor to immediately
+	 * select a deeper power state.
+	 */
+	u64 limit = drv->states[1].target_residency * NSEC_PER_USEC;
+	ktime_t start = ktime_get();
+	int i = 0;
+
 	local_irq_enable();
 	if (!current_set_polling_and_test()) {
-		while (!need_resched())
+		while (!need_resched()) {
+			ktime_t now;
 			cpu_relax();
+
+			/* Occasionally check for a timeout. */
+			if (!(i % 1000)) {
+				now = ktime_get();
+				if (ktime_to_ns(ktime_sub(now, start)) > limit)
+					break;
+			}
+		}
 	}
 	current_clr_polling();