[v10,1/3] cpufreq: Add mechanism for registering utilization update callbacks

Message ID 16016177.YFqb4gVNBo@vostro.rjw.lan (mailing list archive)
State Accepted, archived
Delegated to: Rafael Wysocki

Commit Message

Rafael J. Wysocki Feb. 15, 2016, 9:47 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Introduce a mechanism by which parts of the cpufreq subsystem
("setpolicy" drivers or the core) can register callbacks to be
executed from cpufreq_update_util() which is invoked by the
scheduler's update_load_avg() on CPU utilization changes.

This allows the "setpolicy" drivers to dispense with their timers
and do all of the computations they need and frequency/voltage
adjustments in the update_load_avg() code path, among other things.
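
For illustration, a minimal sketch of how a "setpolicy" driver could hook
into this (struct update_util_data, cpufreq_set_update_util_data() and the
util/max arguments are the interface added by this patch; the driver-side
names are made up):

#include <linux/cpufreq.h>

static void my_driver_update_util(struct update_util_data *data, u64 time,
				  unsigned long util, unsigned long max)
{
	/*
	 * Called from the scheduler's update_load_avg() path on the local
	 * CPU: evaluate the target frequency/voltage from util/max and
	 * program it here instead of waiting for a deferrable timer.
	 */
}

static struct update_util_data my_update_data = {
	.func = my_driver_update_util,
};

static void my_driver_start(void)
{
	int cpu;

	/* Register the callback on each CPU; passing NULL unregisters it. */
	for_each_possible_cpu(cpu)
		cpufreq_set_update_util_data(cpu, &my_update_data);
}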

The update_load_avg() changes were suggested by Peter Zijlstra.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
---

Changes from v9:
- Move the additional RT/DL hooks back to update_curr_rt/dl() (Peter says
  that's OK), but only call them if updating the current CPU's rq, update
  the cpufreq_trigger_update() kerneldoc.

Changes from v8:
- Peter thinks that cpufreq hooks in update_curr_rt/dl() are overkill so
  move them to task_tick_rt/dl() and enqueue_task_rt/dl() (in case RT/DL
  tasks are only active between ticks), update the cpufreq_trigger_update()
  kerneldoc.

Changes from v7:
- cpufreq_trigger_update() has a kerneldoc describing it as a band-aid to
  be replaced in the future, and the comments next to its call sites point
  the reader to that kerneldoc.

  No functional changes. 

Changes from v6:
- Steve suggested using rq_clock() instead of rq_clock_task() as the time
  argument for cpufreq_update_util(), as that seems to be more suitable for
  this purpose.

---
 drivers/cpufreq/cpufreq.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/cpufreq.h   |   34 ++++++++++++++++++++++++++++++++++
 kernel/sched/deadline.c   |    4 ++++
 kernel/sched/fair.c       |   26 +++++++++++++++++++++++++-
 kernel/sched/rt.c         |    4 ++++
 kernel/sched/sched.h      |    1 +
 6 files changed, 113 insertions(+), 1 deletion(-)



Comments

Rafael J. Wysocki Feb. 18, 2016, 8:22 p.m. UTC | #1
On Mon, Feb 15, 2016 at 10:47 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Introduce a mechanism by which parts of the cpufreq subsystem
> ("setpolicy" drivers or the core) can register callbacks to be
> executed from cpufreq_update_util() which is invoked by the
> scheduler's update_load_avg() on CPU utilization changes.
>
> This allows the "setpolicy" drivers to dispense with their timers
> and do all of the computations they need and frequency/voltage
> adjustments in the update_load_avg() code path, among other things.
>
> The update_load_avg() changes were suggested by Peter Zijlstra.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>
> Changes from v9:
> - Move the additional RT/DL hooks back to update_curr_rt/dl() (Peter says
>   that's OK), but only call them if updating the current CPU's rq, update
>   the cpufreq_trigger_update() kerneldoc.
>
> Changes from v8:
> - Peter thinks that cpufreq hooks in update_curr_rt/dl() are overkill so
>   move them to task_tick_rt/dl() and enqueue_task_rt/dl() (in case RT/DL
>   tasks are only active between ticks), update the cpufreq_trigger_update()
>   kerneldoc.
>
> Changes from v7
> - cpufreq_trigger_update() has a kerneldoc describing it as a band-aid to
>   be replaced in the future and the comments next to its call sites ask
>   the reader to see that comment.
>
>   No functional changes.
>
> Changes from v6:
> - Steve suggested to use rq_clock() instead of rq_clock_task() as the time
>   argument for cpufreq_update_util() as that seems to be more suitable for
>   this purpose.
>
> ---
>  drivers/cpufreq/cpufreq.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/cpufreq.h   |   34 ++++++++++++++++++++++++++++++++++
>  kernel/sched/deadline.c   |    4 ++++
>  kernel/sched/fair.c       |   26 +++++++++++++++++++++++++-
>  kernel/sched/rt.c         |    4 ++++
>  kernel/sched/sched.h      |    1 +
>  6 files changed, 113 insertions(+), 1 deletion(-)

So if anyone has any issues with this one, please let me know.

It has been in linux-next for a few days and seems to be doing well.

As I said previously, there is a metric ton of cpufreq improvements
depending on it, so I'd rather not delay integrating it any more.

Thanks,
Rafael
Juri Lelli Feb. 19, 2016, 8:09 a.m. UTC | #2
Hi Rafael,

On 18/02/16 21:22, Rafael J. Wysocki wrote:
> On Mon, Feb 15, 2016 at 10:47 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >

[...]

> 
> So if anyone has any issues with this one, please let me know.
> 

I'm repeating myself a bit, but I'll try to articulate my only concern
once again anyway. I ran some tests on a couple of arm boxes and I
didn't notice any regressions or improvements for ondemand and
conservative (FWIW this might also work as a tested-by), so I tend to
take this series as a way to replace governor timers, making further
cleanups and fixes possible. I think you already confirmed this and I
understand why you'd like this series to go in, as I also think that
what we have on top is beneficial.

However, I still don't quite get why we want to introduce an interface
for explicit passing of util and max if we are not using such parameters
yet. Also, I couldn't find any indication of how such parameters will be
used in the future. If what we need today is a periodic kick for cpufreq
governors that need it, we should simply do what we already do for RT
and DL, IMHO. Also, the places where the current hooks reside might not
be the correct and useful ones once we start using the utilization
parameters. I could probably make a case for DL, where we should place
hooks in the admission control path (or somewhere else once more
sophisticated mechanisms are in place) rather than in the periodic
tick.

> It has been in linux-next for a few days and seems to be doing well.
> 
> As I said previously, there is a metric ton of cpufreq improvements
> depending on it, so I'd rather not delay integrating it any more.
> 

As I said, I'm not against these changes, since they open up further
substantial fixes. I'm only wondering if we are doing the right thing
defining an interface that nobody is using, without an indication of
how such a thing will be used in the future.

Best,

- Juri
Srinivas Pandruvada Feb. 19, 2016, 4:42 p.m. UTC | #3
On Fri, 2016-02-19 at 08:09 +0000, Juri Lelli wrote:
Hi Juri,
> > 
> Hi Rafael,
> 
> On 18/02/16 21:22, Rafael J. Wysocki wrote:
> > On Mon, Feb 15, 2016 at 10:47 PM, Rafael J. Wysocki <rjw@rjwysocki.
> > net> wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > 
> 
[...]

> However, I still don't quite get why we want to introduce an
> interface
> for explicit passing of util and max if we are not using such
> parameters
> yet. Also, I couldn't find any indication of how such parameters will
> be
> used in the future. If what we need today is a periodic kick for
> cpufreq
> governors that need it, we should simply do how we already do for RT
> and
> DL, IMHO. Also because the places where the current hooks reside
> might
> not be the correct and useful one once we'll start using the
> utilization
> parameters. I could probably make a case for DL where we should place
> hooks in admission control path (or somewhere else when more
> sophisticated mechanisms we'll be in place) rather then in the
> periodic
> tick.
We did experiments using util/max in intel_pstate. For some benchmarks
there was a regression of 4 to 5%; for others it performed on par with
getting utilization from the processor. Further optimization of the
algorithm is possible and still in progress. The idea is that we can
change the P-state fast enough and be more reactive. Once I have good
data, I will send it to this list. The algorithm can be part of the
cpufreq governor too.

Thanks,
Srinivas


Juri Lelli Feb. 19, 2016, 5:26 p.m. UTC | #4
Hi Srinivas,

On 19/02/16 08:42, Srinivas Pandruvada wrote:
> On Fri, 2016-02-19 at 08:09 +0000, Juri Lelli wrote:
> Hi Juri,
> > > 
> > Hi Rafael,
> > 
> > On 18/02/16 21:22, Rafael J. Wysocki wrote:
> > > On Mon, Feb 15, 2016 at 10:47 PM, Rafael J. Wysocki <rjw@rjwysocki.
> > > net> wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > 
> > 
> [...]
> 
> > However, I still don't quite get why we want to introduce an
> > interface
> > for explicit passing of util and max if we are not using such
> > parameters
> > yet. Also, I couldn't find any indication of how such parameters will
> > be
> > used in the future. If what we need today is a periodic kick for
> > cpufreq
> > governors that need it, we should simply do how we already do for RT
> > and
> > DL, IMHO. Also because the places where the current hooks reside
> > might
> > not be the correct and useful one once we'll start using the
> > utilization
> > parameters. I could probably make a case for DL where we should place
> > hooks in admission control path (or somewhere else when more
> > sophisticated mechanisms we'll be in place) rather then in the
> > periodic
> > tick.
> We did experiments using util/max in intel_pstate. For some benchmarks
> there were regression of 4 to 5%, for some benchmarks it performed at
> par with getting utilization from the processor. Further optimization
> in the algorithm is possible and still in progress. Idea is that we can
> change P-State fast enough and be more reactive. Once I have good data,
> I will send to this list. The algorithm can be part of the cpufreq
> governor too.
> 

Thanks for your answer. What you are experimenting with looks really
interesting and I'm certainly more than interested in looking at your
findings and patches when they will hit the list.

My point was more about what we can look at today, though. Without a
clear understanding of how and where util and max will be used and which
scheduler paths such information should come from, it is a bit difficult
to tell if the current interface and hooks are fine, IMHO. I'd suggest
we leave this part to the discussion we will have once your proposal is
public; to facilitate that, we should remove those arguments from the
current interface.

Best,

- Juri
Steve Muckle Feb. 19, 2016, 5:28 p.m. UTC | #5
On 02/19/2016 08:42 AM, Srinivas Pandruvada wrote:
> We did experiments using util/max in intel_pstate. For some benchmarks
> there were regression of 4 to 5%, for some benchmarks it performed at
> par with getting utilization from the processor. Further optimization
> in the algorithm is possible and still in progress. Idea is that we can
> change P-State fast enough and be more reactive. Once I have good data,
> I will send to this list. The algorithm can be part of the cpufreq
> governor too.

There has been a lot of work in the area of scheduler-driven CPU
frequency selection by Linaro and ARM as well. It was posted most
recently a couple months ago:

http://thread.gmane.org/gmane.linux.power-management.general/69176

It was also posted as part of the energy-aware scheduling series last
July. There's a new RFC series forthcoming which I had hoped (and
failed) to post prior to my business travel this week; it should be out
next week. It will address the feedback received thus far along with
locking and other things.

The scheduler hooks for utilization-based cpufreq operation deserve a
lot more debate I think. They could quite possibly have different
requirements than hooks which are chosen just to guarantee periodic
callbacks into sampling-based governors.

For my part I think it would be best if the util/max parameters are
omitted until it's clear whether these same hooks can be effectively
used for architecture agnostic scheduler-guided (capacity driven) CPU
frequency support. My upcoming RFC will provide another opportunity to
debate the hooks as well as how scheduler-guided CPU frequency should be
structured.

thanks,
Steve
Rafael J. Wysocki Feb. 19, 2016, 10:14 p.m. UTC | #6
On Friday, February 19, 2016 08:09:17 AM Juri Lelli wrote:
> Hi Rafael,
> 
> On 18/02/16 21:22, Rafael J. Wysocki wrote:
> > On Mon, Feb 15, 2016 at 10:47 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> 
> [...]
> 
> > 
> > So if anyone has any issues with this one, please let me know.
> > 
> 
> I'm repeating myself a bit, but I'll try to articulate my only concern
> once again anyway. I run some tests on a couple of arm boxes and I
> didn't notice any regression or improvements for ondemand and
> conservative (FWIW this might also work as a tested-by), so I tend to
> take this series as a way to replace governor timers, making further
> cleanups and fixes possibile. I think you already confirmed this and I
> understand why you'd like this series to go in as I also think that what
> we have on top is beneficial.

OK

> However, I still don't quite get why we want to introduce an interface
> for explicit passing of util and max if we are not using such parameters
> yet. Also, I couldn't find any indication of how such parameters will be
> used in the future. If what we need today is a periodic kick for cpufreq
> governors that need it, we should simply do how we already do for RT and
> DL, IMHO. Also because the places where the current hooks reside might
> not be the correct and useful one once we'll start using the utilization
> parameters. I could probably make a case for DL where we should place
> hooks in admission control path (or somewhere else when more
> sophisticated mechanisms we'll be in place) rather then in the periodic
> tick.

Well, the hook in DL is explicitly denoted as a temporary band-aid.

Srinivas and I have said multiple times that we are going to use the
scheduler's utilization data in intel_pstate.  Admittedly, we haven't shown
any patches implementing that, but that's because Srinivas doesn't regard
that work as ready yet.

I also have something for the general cpufreq in the works.  I may be able
to send it as an RFC over the weekend, depending on how much time I can
spend on it.

That said, if the concern is that there are plans to change the way the
scheduler computes the utilization numbers and that may become difficult to
carry out if cpufreq starts to depend on them in their current form, then I
may agree that it is valid, but I'm not aware of those plans ATM.

However, if the numbers are going to stay what they are, I don't see why
passing them to cpufreq may possibly become problematic at any point.

> > It has been in linux-next for a few days and seems to be doing well.
> > 
> > As I said previously, there is a metric ton of cpufreq improvements
> > depending on it, so I'd rather not delay integrating it any more.
> > 
> 
> As said. I'm not against these changes since they open up to further
> substantial fixes.

Good. :-)

> I'm only wondering if we are doing the right thing defining an interface
> that nobody is using and without an indication of how such thing we'll be
> used in the future.

That indication may be coming though. :-)

Thanks,
Rafael

Rafael J. Wysocki Feb. 19, 2016, 10:35 p.m. UTC | #7
On Friday, February 19, 2016 09:28:23 AM Steve Muckle wrote:
> On 02/19/2016 08:42 AM, Srinivas Pandruvada wrote:
> > We did experiments using util/max in intel_pstate. For some benchmarks
> > there were regression of 4 to 5%, for some benchmarks it performed at
> > par with getting utilization from the processor. Further optimization
> > in the algorithm is possible and still in progress. Idea is that we can
> > change P-State fast enough and be more reactive. Once I have good data,
> > I will send to this list. The algorithm can be part of the cpufreq
> > governor too.
> 
> There has been a lot of work in the area of scheduler-driven CPU
> frequency selection by Linaro and ARM as well. It was posted most
> recently a couple months ago:
> 
> http://thread.gmane.org/gmane.linux.power-management.general/69176
> 
> It was also posted as part of the energy-aware scheduling series last
> July. There's a new RFC series forthcoming which I had hoped (and
> failed) to post prior to my business travel this week; it should be out
> next week. It will address the feedback received thus far along with
> locking and other things.
> 
> The scheduler hooks for utilization-based cpufreq operation deserve a
> lot more debate I think. They could quite possibly have different
> requirements than hooks which are chosen just to guarantee periodic
> callbacks into sampling-based governors.

Yes, they could.

The point here, though, is that even the sampling-based governors may 
benefit from using the numbers provided by the scheduler instead of trying
to come up with analogous numbers themselves.
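
To make that concrete (purely illustrative, not the code in this series):
a sampling-style callback could rate-limit itself on the time argument and
derive its load estimate from util/max instead of measuring idle time.
All names here are hypothetical, and per-CPU state and locking are omitted
for brevity:

#include <linux/cpufreq.h>
#include <linux/ktime.h>

static u64 my_last_sample_time;

static void my_sampling_update(struct update_util_data *data, u64 time,
			       unsigned long util, unsigned long max)
{
	unsigned long load;

	/* Sample roughly every 10 ms instead of running a timer. */
	if (time - my_last_sample_time < 10 * NSEC_PER_MSEC)
		return;

	my_last_sample_time = time;

	/* Ignore the util == ULONG_MAX, max == 0 "trigger" case here. */
	if (!max)
		return;

	/* 0..100, analogous to the idle-time based load estimate. */
	load = min(100UL, (100UL * util) / max);

	/* ...feed 'load' into the governor's usual frequency selection... */
}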

> For my part I think it would be best if the util/max parameters are
> omitted

OK, so please see the patch I've just sent to Juri:

https://patchwork.kernel.org/patch/8364621/

> until it's clear whether these same hooks can be effectively
> used for architecture agnostic scheduler-guided (capacity driven) CPU
> frequency support.

Well, if they can't, then we'll need to move the hooks, but I'm not sure
how this is related to the arguments they take.

> My upcoming RFC will provide another opportunity to debate the hooks as
> well as how scheduler-guided CPU frequency should be structured.

OK, looking forward to seeing the RFC then. :-)

Thanks,
Rafael

Juri Lelli Feb. 22, 2016, 9:32 a.m. UTC | #8
On 19/02/16 23:14, Rafael J. Wysocki wrote:
> On Friday, February 19, 2016 08:09:17 AM Juri Lelli wrote:
> > Hi Rafael,
> > 
> > On 18/02/16 21:22, Rafael J. Wysocki wrote:
> > > On Mon, Feb 15, 2016 at 10:47 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > 
> > [...]
> > 
> > > 
> > > So if anyone has any issues with this one, please let me know.
> > > 
> > 
> > I'm repeating myself a bit, but I'll try to articulate my only concern
> > once again anyway. I run some tests on a couple of arm boxes and I
> > didn't notice any regression or improvements for ondemand and
> > conservative (FWIW this might also work as a tested-by), so I tend to
> > take this series as a way to replace governor timers, making further
> > cleanups and fixes possibile. I think you already confirmed this and I
> > understand why you'd like this series to go in as I also think that what
> > we have on top is beneficial.
> 
> OK
> 
> > However, I still don't quite get why we want to introduce an interface
> > for explicit passing of util and max if we are not using such parameters
> > yet. Also, I couldn't find any indication of how such parameters will be
> > used in the future. If what we need today is a periodic kick for cpufreq
> > governors that need it, we should simply do how we already do for RT and
> > DL, IMHO. Also because the places where the current hooks reside might
> > not be the correct and useful one once we'll start using the utilization
> > parameters. I could probably make a case for DL where we should place
> > hooks in admission control path (or somewhere else when more
> > sophisticated mechanisms we'll be in place) rather then in the periodic
> > tick.
> 
> Well, the hook in DL is explicitly denoted as a temporary band-aid.
> 
> I and Srinivas have said for multiple times that we are going to use the
> scheduler's utilization data in intel_pstate.  Admittedly, we haven't shown
> any patches implementing that, but that's because Srinivas doesn't regard
> that work as ready yet.
> 
> I also have something for the general cpufreq in the works.  I may be able
> to send it as an RFC over the weekend, depending on how much time I can
> spend on it.
> 

Saw that, thanks. Please allow me some time to review and test. :-)

> That said, if the concern is that there are plans to change the way the
> scheduler computes the utilization numbers and that may become difficult to
> carry out if cpufreq starts to depend on them in their current form, then I
> may agree that it is valid, but I'm not aware of those plans ATM.
> 

No, I don't think there's any substantial discussion going on about the
utilization numbers.

> However, if the numbers are going to stay what they are, I don't see why
> passing them to cpufreq may possibly become problematic at any point.

My concern was mostly on the fact that there is already another RFC
under discussion that uses the same numbers and has different hooks
placed in scheduler code (Steve's sched-freq); so, additional hooks
might generate confusion, IMHO.
 
> > > It has been in linux-next for a few days and seems to be doing well.
> > > 
> > > As I said previously, there is a metric ton of cpufreq improvements
> > > depending on it, so I'd rather not delay integrating it any more.
> > > 
> > 
> > As said. I'm not against these changes since they open up to further
> > substantial fixes.
> 
> Good. :-)
> 
> > I'm only wondering if we are doing the right thing defining an interface
> > that nobody is using and without an indication of how such thing we'll be
> > used in the future.
> 
> That indication may be coming though. :-)
> 

Thanks again. I'm going to have a look at that.

Best,

- Juri
Rafael J. Wysocki Feb. 22, 2016, 9:26 p.m. UTC | #9
On Mon, Feb 22, 2016 at 10:32 AM, Juri Lelli <juri.lelli@arm.com> wrote:
> On 19/02/16 23:14, Rafael J. Wysocki wrote:
>> On Friday, February 19, 2016 08:09:17 AM Juri Lelli wrote:
>> > Hi Rafael,
>> >
>> > On 18/02/16 21:22, Rafael J. Wysocki wrote:
>> > > On Mon, Feb 15, 2016 at 10:47 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> > > >

[cut]

>> That said, if the concern is that there are plans to change the way the
>> scheduler computes the utilization numbers and that may become difficult to
>> carry out if cpufreq starts to depend on them in their current form, then I
>> may agree that it is valid, but I'm not aware of those plans ATM.
>>
>
> No, I don't think there's any substantial discussion going on about the
> utilization numbers.

OK, so the statement below applies.

>> However, if the numbers are going to stay what they are, I don't see why
>> passing them to cpufreq may possibly become problematic at any point.
>
> My concern was mostly on the fact that there is already another RFC
> under discussion that uses the same numbers and has different hooks
> placed in scheduler code (Steve's sched-freq); so, additional hooks
> might generate confusion, IMHO.

So this is about the hooks rather than about their arguments after
all, isn't it?

I fail to see why it is better to drop the arguments and leave the hooks, then.

OTOH, I see reasons for keeping the arguments along with the hooks,
but let me address that in my next reply.

Now, if the call sites of the hooks change in the future, it won't be
a problem for me as long as the new hooks are invoked on a regular
basis or, if they aren't, as long as I can figure out from the
arguments they pass that I should not expect an update any time soon.

If the arguments change, it won't be a problem either as long as they
are sufficient to be inserted into the frequency selection formula
used by the schedutil governor I posted and produce sensible
frequencies for the CPU.

Thanks,
Rafael
Steve Muckle Feb. 23, 2016, 3:58 a.m. UTC | #10
On 02/19/2016 02:35 PM, Rafael J. Wysocki wrote:
>> The scheduler hooks for utilization-based cpufreq operation deserve a
>> lot more debate I think. They could quite possibly have different
>> requirements than hooks which are chosen just to guarantee periodic
>> callbacks into sampling-based governors.
> 
> Yes, they could.
> 
> The point here, though, is that even the sampling-based governors may 
> benefit from using the numbers provided by the scheduler instead of trying
> to come up with analogous numbers themselves.

It seems premature to me to merge supporting infrastructure (the
utilization hooks) before we have changes, be they modifications to the
sampling-based governors to use utilization or a scheduler-guided
governor, which are well tested and proven to yield reasonable
performance and power across various platforms and workloads.

Perhaps I'm a pessimist but I think it's going to be a challenge to get
utilization-based cpufreq on par, and I think getting the hooks right
will be part of that challenge.

>> For my part I think it would be best if the util/max parameters are
>> omitted
> 
> OK, so please see the patch I've just sent to Juri:
> 
> https://patchwork.kernel.org/patch/8364621/

Looked good to me.

thanks,
Steve

Juri Lelli Feb. 23, 2016, 11:01 a.m. UTC | #11
On 22/02/16 22:26, Rafael J. Wysocki wrote:
> On Mon, Feb 22, 2016 at 10:32 AM, Juri Lelli <juri.lelli@arm.com> wrote:
> > On 19/02/16 23:14, Rafael J. Wysocki wrote:
> >> On Friday, February 19, 2016 08:09:17 AM Juri Lelli wrote:
> >> > Hi Rafael,
> >> >
> >> > On 18/02/16 21:22, Rafael J. Wysocki wrote:
> >> > > On Mon, Feb 15, 2016 at 10:47 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> > > >
> 
> [cut]
> 
> >> That said, if the concern is that there are plans to change the way the
> >> scheduler computes the utilization numbers and that may become difficult to
> >> carry out if cpufreq starts to depend on them in their current form, then I
> >> may agree that it is valid, but I'm not aware of those plans ATM.
> >>
> >
> > No, I don't think there's any substantial discussion going on about the
> > utilization numbers.
> 
> OK, so the statement below applies.
> 
> >> However, if the numbers are going to stay what they are, I don't see why
> >> passing them to cpufreq may possibly become problematic at any point.
> >
> > My concern was mostly on the fact that there is already another RFC
> > under discussion that uses the same numbers and has different hooks
> > placed in scheduler code (Steve's sched-freq); so, additional hooks
> > might generate confusion, IMHO.
> 
> So this is about the hooks rather than about their arguments after
> all, isn't it?
> 
> I fail to see why it is better to drop the arguments and leave the hooks, then.
> 

It's about where we place such hooks and what arguments they have.
Without the schedutil governor as a consumer the current position makes
sense, but some of the arguments are not used. With schedutil both
position and arguments make sense, but a different implementation
(sched-freq) might have different needs w.r.t. position and arguments.

> OTOH, I see reasons for keeping the arguments along with the hooks,
> but let me address that in my next reply.
> 
> Now, if the call sites of the hooks change in the future, it won't be
> a problem for me as long as the new hooks are invoked on a regular
> basis or, if they aren't, as long as I can figure out from the
> arguments they pass that I should not expect an update any time soon.
> 

OK.

> If the arguments change, it won't be a problem either as long as they
> are sufficient to be inserted into the frequency selection formula
> used by the schedutil governor I posted and produce sensible
> frequencies for the CPU.
> 

Right, I guess this applies to any kind of governor.

Best,

- Juri
Rafael J. Wysocki Feb. 24, 2016, 2:01 a.m. UTC | #12
On Tuesday, February 23, 2016 11:01:18 AM Juri Lelli wrote:
> On 22/02/16 22:26, Rafael J. Wysocki wrote:
> > On Mon, Feb 22, 2016 at 10:32 AM, Juri Lelli <juri.lelli@arm.com> wrote:
> > > On 19/02/16 23:14, Rafael J. Wysocki wrote:
> > >> On Friday, February 19, 2016 08:09:17 AM Juri Lelli wrote:
> > >> > Hi Rafael,
> > >> >
> > >> > On 18/02/16 21:22, Rafael J. Wysocki wrote:
> > >> > > On Mon, Feb 15, 2016 at 10:47 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >> > > >
> > 
> > [cut]
> > 
> > >> That said, if the concern is that there are plans to change the way the
> > >> scheduler computes the utilization numbers and that may become difficult to
> > >> carry out if cpufreq starts to depend on them in their current form, then I
> > >> may agree that it is valid, but I'm not aware of those plans ATM.
> > >>
> > >
> > > No, I don't think there's any substantial discussion going on about the
> > > utilization numbers.
> > 
> > OK, so the statement below applies.
> > 
> > >> However, if the numbers are going to stay what they are, I don't see why
> > >> passing them to cpufreq may possibly become problematic at any point.
> > >
> > > My concern was mostly on the fact that there is already another RFC
> > > under discussion that uses the same numbers and has different hooks
> > > placed in scheduler code (Steve's sched-freq); so, additional hooks
> > > might generate confusion, IMHO.
> > 
> > So this is about the hooks rather than about their arguments after
> > all, isn't it?
> > 
> > I fail to see why it is better to drop the arguments and leave the hooks, then.
> > 
> 
> It's about where we place such hooks and what arguments they have.
> Without the schedutil governor as a consumer the current position makes
> sense, but some of the arguments are not used. With schedutil both
> position and arguments make sense, but a different implementation
> (sched-freq) might have different needs w.r.t. position and arguments.

And that's fine.  If the current position and/or arguments are not suitable,
they'll need to be changed.  It's not like things introduced today are set
in stone forever.

Peter has already shown how they may be changed to make everyone happy,
so I don't really see what the fuss is about.

> > OTOH, I see reasons for keeping the arguments along with the hooks,
> > but let me address that in my next reply.
> > 
> > Now, if the call sites of the hooks change in the future, it won't be
> > a problem for me as long as the new hooks are invoked on a regular
> > basis or, if they aren't, as long as I can figure out from the
> > arguments they pass that I should not expect an update any time soon.
> > 
> 
> OK.
> 
> > If the arguments change, it won't be a problem either as long as they
> > are sufficient to be inserted into the frequency selection formula
> > used by the schedutil governor I posted and produce sensible
> > frequencies for the CPU.
> > 
> 
> Right, I guess this applies to any kind of governor.

Sure, but this particular formula is very simple.  It just assumes that
util <= max so dividing the former by the latter will always yield a number
between 0 and 1.  [And the interpretation of util > max is totally arbitrary
today and regarded as temporary anyway, so that's just irrelevant.]
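
For instance, assuming a simple linear mapping of that ratio onto the
CPU's frequency range (just an illustration, not necessarily the exact
schedutil formula):

	next_freq = max_freq * util / max;

so with util = 512, max = 1024 and max_freq = 2 GHz the result is 1 GHz,
and because util <= max the result can never exceed max_freq.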

Thanks,
Rafael

Michael Turquette March 8, 2016, 7:24 p.m. UTC | #13
Quoting Rafael J. Wysocki (2016-02-23 18:01:06)
> On Tuesday, February 23, 2016 11:01:18 AM Juri Lelli wrote:
> > On 22/02/16 22:26, Rafael J. Wysocki wrote:
> > > On Mon, Feb 22, 2016 at 10:32 AM, Juri Lelli <juri.lelli@arm.com> wrote:
> > > > On 19/02/16 23:14, Rafael J. Wysocki wrote:
> > > >> On Friday, February 19, 2016 08:09:17 AM Juri Lelli wrote:
> > > >> > Hi Rafael,
> > > >> >
> > > >> > On 18/02/16 21:22, Rafael J. Wysocki wrote:
> > > >> > > On Mon, Feb 15, 2016 at 10:47 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > >> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >> > > >
> > > 
> > > [cut]
> > > 
> > > >> That said, if the concern is that there are plans to change the way the
> > > >> scheduler computes the utilization numbers and that may become difficult to
> > > >> carry out if cpufreq starts to depend on them in their current form, then I
> > > >> may agree that it is valid, but I'm not aware of those plans ATM.
> > > >>
> > > >
> > > > No, I don't think there's any substantial discussion going on about the
> > > > utilization numbers.
> > > 
> > > OK, so the statement below applies.
> > > 
> > > >> However, if the numbers are going to stay what they are, I don't see why
> > > >> passing them to cpufreq may possibly become problematic at any point.
> > > >
> > > > My concern was mostly on the fact that there is already another RFC
> > > > under discussion that uses the same numbers and has different hooks
> > > > placed in scheduler code (Steve's sched-freq); so, additional hooks
> > > > might generate confusion, IMHO.
> > > 
> > > So this is about the hooks rather than about their arguments after
> > > all, isn't it?
> > > 
> > > I fail to see why it is better to drop the arguments and leave the hooks, then.
> > > 
> > 
> > It's about where we place such hooks and what arguments they have.
> > Without the schedutil governor as a consumer the current position makes
> > sense, but some of the arguments are not used. With schedutil both
> > position and arguments make sense, but a different implementation
> > (sched-freq) might have different needs w.r.t. position and arguments.
> 
> And that's fine.  If the current position and/or arguments are not suitable,
> they'll need to be changed.  It's not like things introduced today are set
> in stone forever.
> 
> Peter has already shown how they may be changed to make everyone happy,
> so I don't really see what the fuss is about.

I see this patch in linux-next now. Did it ever get Peter's or Ingo's
Ack?

Also it seems weird to me that this patch touching sched code is going
through the pm tree.

When it comes time to experiment more with the interfaces and make the
"future changes" that everyone keeps talking about, who is the
maintainer? Who has the last word?

Regards,
Mike

> 
> > > OTOH, I see reasons for keeping the arguments along with the hooks,
> > > but let me address that in my next reply.
> > > 
> > > Now, if the call sites of the hooks change in the future, it won't be
> > > a problem for me as long as the new hooks are invoked on a regular
> > > basis or, if they aren't, as long as I can figure out from the
> > > arguments they pass that I should not expect an update any time soon.
> > > 
> > 
> > OK.
> > 
> > > If the arguments change, it won't be a problem either as long as they
> > > are sufficient to be inserted into the frequency selection formula
> > > used by the schedutil governor I posted and produce sensible
> > > frequencies for the CPU.
> > > 
> > 
> > Right, I guess this applies to any kind of governor.
> 
> Sure, but this particular formula is very simple.  It just assumes that
> util <= max so dividing the former by the latter will always yield a number
> between 0 and 1.  [And the interpretation of util > max is totally arbitrary
> today and regarded as temporary anyway, so that's just irrelevant.]
> 
> Thanks,
> Rafael
> 
Rafael J. Wysocki March 8, 2016, 8:40 p.m. UTC | #14
On Tue, Mar 8, 2016 at 8:24 PM, Michael Turquette
<mturquette@baylibre.com> wrote:
> Quoting Rafael J. Wysocki (2016-02-23 18:01:06)
>> On Tuesday, February 23, 2016 11:01:18 AM Juri Lelli wrote:
>> > On 22/02/16 22:26, Rafael J. Wysocki wrote:
>> > > On Mon, Feb 22, 2016 at 10:32 AM, Juri Lelli <juri.lelli@arm.com> wrote:
>> > > > On 19/02/16 23:14, Rafael J. Wysocki wrote:
>> > > >> On Friday, February 19, 2016 08:09:17 AM Juri Lelli wrote:
>> > > >> > Hi Rafael,
>> > > >> >
>> > > >> > On 18/02/16 21:22, Rafael J. Wysocki wrote:
>> > > >> > > On Mon, Feb 15, 2016 at 10:47 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > > >> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> > > >> > > >
>> > >
>> > > [cut]
>> > >
>> > > >> That said, if the concern is that there are plans to change the way the
>> > > >> scheduler computes the utilization numbers and that may become difficult to
>> > > >> carry out if cpufreq starts to depend on them in their current form, then I
>> > > >> may agree that it is valid, but I'm not aware of those plans ATM.
>> > > >>
>> > > >
>> > > > No, I don't think there's any substantial discussion going on about the
>> > > > utilization numbers.
>> > >
>> > > OK, so the statement below applies.
>> > >
>> > > >> However, if the numbers are going to stay what they are, I don't see why
>> > > >> passing them to cpufreq may possibly become problematic at any point.
>> > > >
>> > > > My concern was mostly on the fact that there is already another RFC
>> > > > under discussion that uses the same numbers and has different hooks
>> > > > placed in scheduler code (Steve's sched-freq); so, additional hooks
>> > > > might generate confusion, IMHO.
>> > >
>> > > So this is about the hooks rather than about their arguments after
>> > > all, isn't it?
>> > >
>> > > I fail to see why it is better to drop the arguments and leave the hooks, then.
>> > >
>> >
>> > It's about where we place such hooks and what arguments they have.
>> > Without the schedutil governor as a consumer the current position makes
>> > sense, but some of the arguments are not used. With schedutil both
>> > position and arguments make sense, but a different implementation
>> > (sched-freq) might have different needs w.r.t. position and arguments.
>>
>> And that's fine.  If the current position and/or arguments are not suitable,
>> they'll need to be changed.  It's not like things introduced today are set
>> in stone forever.
>>
>> Peter has already shown how they may be changed to make everyone happy,
>> so I don't really see what the fuss is about.
>
> I see this patch in linux-next now. Did it ever get Peter's or Ingo's
> Ack?

No, but none of them said "no" either.

And the interface was suggested by Peter in the first place.

> Also it seems weird to me that this patch touching sched code is going
> through the pm tree.

That's for purely practical reasons.  There are lots of PM changes
depending on it that have nothing to do with the scheduler.  I've been
talking about that several times now, most recently in yesterday's
post (http://marc.info/?l=linux-pm&m=145740561402948&w=2).  I've been
talking openly about what I'm going to do with this all the time.

No one is hiding things from anyone or trying to slip them through
past somebody here if that's what you're worried about.

> When it comes times to experiment more with the interfaces and make the
> "future changes" that everyone keeps talking about, who is the
> maintainer? Who has the last word?

As usual, it is about consensus.

This is on a boundary of two subsystems and I have good reasons to do
it.  One of the maintainers of the other subsystem involved is working
with me all the time and I'm following his suggestions.  Isn't that
really sufficient?

But really please see
http://marc.info/?l=linux-pm&m=145740561402948&w=2 as it means I'm
actually going to do what Juri and Steve asked for unless I'm told
that this is a bad idea.

Thanks,
Rafael
Rafael J. Wysocki March 8, 2016, 10:43 p.m. UTC | #15
On Tue, Mar 8, 2016 at 11:06 PM, Michael Turquette
<mturquette@baylibre.com> wrote:
> Quoting Rafael J. Wysocki (2016-03-08 12:40:18)
>> On Tue, Mar 8, 2016 at 8:24 PM, Michael Turquette
>> <mturquette@baylibre.com> wrote:
>> > Quoting Rafael J. Wysocki (2016-02-23 18:01:06)
>> >> On Tuesday, February 23, 2016 11:01:18 AM Juri Lelli wrote:
>> >> > On 22/02/16 22:26, Rafael J. Wysocki wrote:
>> >> > > On Mon, Feb 22, 2016 at 10:32 AM, Juri Lelli <juri.lelli@arm.com> wrote:
>> >> > > > On 19/02/16 23:14, Rafael J. Wysocki wrote:
>> >> > > >> On Friday, February 19, 2016 08:09:17 AM Juri Lelli wrote:
>> >> > > >> > Hi Rafael,
>> >> > > >> >
>> >> > > >> > On 18/02/16 21:22, Rafael J. Wysocki wrote:
>> >> > > >> > > On Mon, Feb 15, 2016 at 10:47 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> >> > > >> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> >> > > >> > > >
>> >> > >
>> >> > > [cut]
>> >> > >
>> >> > > >> That said, if the concern is that there are plans to change the way the
>> >> > > >> scheduler computes the utilization numbers and that may become difficult to
>> >> > > >> carry out if cpufreq starts to depend on them in their current form, then I
>> >> > > >> may agree that it is valid, but I'm not aware of those plans ATM.
>> >> > > >>
>> >> > > >
>> >> > > > No, I don't think there's any substantial discussion going on about the
>> >> > > > utilization numbers.
>> >> > >
>> >> > > OK, so the statement below applies.
>> >> > >
>> >> > > >> However, if the numbers are going to stay what they are, I don't see why
>> >> > > >> passing them to cpufreq may possibly become problematic at any point.
>> >> > > >
>> >> > > > My concern was mostly on the fact that there is already another RFC
>> >> > > > under discussion that uses the same numbers and has different hooks
>> >> > > > placed in scheduler code (Steve's sched-freq); so, additional hooks
>> >> > > > might generate confusion, IMHO.
>> >> > >
>> >> > > So this is about the hooks rather than about their arguments after
>> >> > > all, isn't it?
>> >> > >
>> >> > > I fail to see why it is better to drop the arguments and leave the hooks, then.
>> >> > >
>> >> >
>> >> > It's about where we place such hooks and what arguments they have.
>> >> > Without the schedutil governor as a consumer the current position makes
>> >> > sense, but some of the arguments are not used. With schedutil both
>> >> > position and arguments make sense, but a different implementation
>> >> > (sched-freq) might have different needs w.r.t. position and arguments.
>> >>
>> >> And that's fine.  If the current position and/or arguments are not suitable,
>> >> they'll need to be changed.  It's not like things introduced today are set
>> >> in stone forever.
>> >>
>> >> Peter has already shown how they may be changed to make everyone happy,
>> >> so I don't really see what the fuss is about.
>> >
>> > I see this patch in linux-next now. Did it ever get Peter's or Ingo's
>> > Ack?
>>
>> No, but none of them said "no" either.
>>
>> And the interface was suggested by Peter in the first place.
>>
>> > Also it seems weird to me that this patch touching sched code is going
>> > through the pm tree.
>>
>> That's for purely practical reasons.  There are lots of PM changes
>> depending on it that have nothing to do with the scheduler.  I've been
>> talking about that for several times now, last time in my yesterday
>> post (http://marc.info/?l=linux-pm&m=145740561402948&w=2).  I've been
>> talking openly about what I'm going to do with this all the time.
>>
>> No one is hiding things from anyone or trying to slip them through
>> past somebody here if that's what you're worried about.
>>
>> > When it comes times to experiment more with the interfaces and make the
>> > "future changes" that everyone keeps talking about, who is the
>> > maintainer? Who has the last word?
>>
>> As usual, it is about consensus.
>
> To be fair, that consensus should be recorded formally by Reviewed-by
> and Acked-by tags.

I would feel much more comfortable with ACKs on the commits touching
the scheduler code, no question about that. :-)

That said, if another maintainer makes PM-related or ACPI-related
changes and follows my suggestions all the way through, I may not feel
like I have to ACK all of that every time.  After all, it all boils
down to what happens to the pull request eventually and Acked-by tags
may or may not help there.

>>
>> This is on a boundary of two subsystems and I have good reasons to do
>> it.  One of the maintainers of the other subsystem involved is working
>> with me all the time and I'm following his suggestions.  Isn't that
>> really sufficient?
>>
>> But really please see
>> http://marc.info/?l=linux-pm&m=145740561402948&w=2 as it means I'm
>> actually going to do what Juri and Steve asked for unless I'm told
>> that this is a bad idea.
>
> I'll take a look. Note that Steve, Juri and Vincent are all at a
> conference this week so their responses may be slow.

That's fine.

I'm not going to send new versions of the patches any time soon
(unless somebody points out a problem to fix in them to me).

Thanks,
Rafael
Peter Zijlstra March 9, 2016, 12:35 p.m. UTC | #16
On Mon, Feb 15, 2016 at 10:47:22PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Introduce a mechanism by which parts of the cpufreq subsystem
> ("setpolicy" drivers or the core) can register callbacks to be
> executed from cpufreq_update_util() which is invoked by the
> scheduler's update_load_avg() on CPU utilization changes.
> 
> This allows the "setpolicy" drivers to dispense with their timers
> and do all of the computations they need and frequency/voltage
> adjustments in the update_load_avg() code path, among other things.
> 
> The update_load_avg() changes were suggested by Peter Zijlstra.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/cpufreq.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/cpufreq.h   |   34 ++++++++++++++++++++++++++++++++++
>  kernel/sched/deadline.c   |    4 ++++
>  kernel/sched/fair.c       |   26 +++++++++++++++++++++++++-
>  kernel/sched/rt.c         |    4 ++++
>  kernel/sched/sched.h      |    1 +
>  6 files changed, 113 insertions(+), 1 deletion(-)
> 

So with the understanding that we'll work on getting rid of
cpufreq_trigger_update().

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Also, Vincent had some concerns about the exact placement of the
callback, and I see no problem in moving it if there's need.
Rafael J. Wysocki March 9, 2016, 1:22 p.m. UTC | #17
On Wed, Mar 9, 2016 at 1:35 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Feb 15, 2016 at 10:47:22PM +0100, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Introduce a mechanism by which parts of the cpufreq subsystem
>> ("setpolicy" drivers or the core) can register callbacks to be
>> executed from cpufreq_update_util() which is invoked by the
>> scheduler's update_load_avg() on CPU utilization changes.
>>
>> This allows the "setpolicy" drivers to dispense with their timers
>> and do all of the computations they need and frequency/voltage
>> adjustments in the update_load_avg() code path, among other things.
>>
>> The update_load_avg() changes were suggested by Peter Zijlstra.
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
>> ---
>>  drivers/cpufreq/cpufreq.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/cpufreq.h   |   34 ++++++++++++++++++++++++++++++++++
>>  kernel/sched/deadline.c   |    4 ++++
>>  kernel/sched/fair.c       |   26 +++++++++++++++++++++++++-
>>  kernel/sched/rt.c         |    4 ++++
>>  kernel/sched/sched.h      |    1 +
>>  6 files changed, 113 insertions(+), 1 deletion(-)
>>
>
> So with the understanding that we'll work on getting rid of
> cpufreq_trigger_update().

That definitely is the plan.

> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thanks! :-)

> Also, Vincent had some concerns about the exact placement of the
> callback, and I see no problem in moving it if there's need.

Yup, same here.
Ingo Molnar March 9, 2016, 1:32 p.m. UTC | #18
* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Feb 15, 2016 at 10:47:22PM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > Introduce a mechanism by which parts of the cpufreq subsystem
> > ("setpolicy" drivers or the core) can register callbacks to be
> > executed from cpufreq_update_util() which is invoked by the
> > scheduler's update_load_avg() on CPU utilization changes.
> > 
> > This allows the "setpolicy" drivers to dispense with their timers
> > and do all of the computations they need and frequency/voltage
> > adjustments in the update_load_avg() code path, among other things.
> > 
> > The update_load_avg() changes were suggested by Peter Zijlstra.
> > 
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> > ---
> >  drivers/cpufreq/cpufreq.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/cpufreq.h   |   34 ++++++++++++++++++++++++++++++++++
> >  kernel/sched/deadline.c   |    4 ++++
> >  kernel/sched/fair.c       |   26 +++++++++++++++++++++++++-
> >  kernel/sched/rt.c         |    4 ++++
> >  kernel/sched/sched.h      |    1 +
> >  6 files changed, 113 insertions(+), 1 deletion(-)
> > 
> 
> So with the understanding that we'll work on getting rid of
> cpufreq_trigger_update().
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

I'm happy with the latest iteration and with the general direction as well!

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo
Rafael J. Wysocki March 9, 2016, 1:39 p.m. UTC | #19
On Wed, Mar 9, 2016 at 2:32 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Mon, Feb 15, 2016 at 10:47:22PM +0100, Rafael J. Wysocki wrote:
>> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> >
>> > Introduce a mechanism by which parts of the cpufreq subsystem
>> > ("setpolicy" drivers or the core) can register callbacks to be
>> > executed from cpufreq_update_util() which is invoked by the
>> > scheduler's update_load_avg() on CPU utilization changes.
>> >
>> > This allows the "setpolicy" drivers to dispense with their timers
>> > and do all of the computations they need and frequency/voltage
>> > adjustments in the update_load_avg() code path, among other things.
>> >
>> > The update_load_avg() changes were suggested by Peter Zijlstra.
>> >
>> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> > Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
>> > ---
>> >  drivers/cpufreq/cpufreq.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>> >  include/linux/cpufreq.h   |   34 ++++++++++++++++++++++++++++++++++
>> >  kernel/sched/deadline.c   |    4 ++++
>> >  kernel/sched/fair.c       |   26 +++++++++++++++++++++++++-
>> >  kernel/sched/rt.c         |    4 ++++
>> >  kernel/sched/sched.h      |    1 +
>> >  6 files changed, 113 insertions(+), 1 deletion(-)
>> >
>>
>> So with the understanding that we'll work on getting rid of
>> cpufreq_trigger_update().
>>
>> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> I'm happy with the latest iteration and with the general direction as well!
>
> Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks a lot!

Rafael
Vincent Guittot March 10, 2016, 2:12 a.m. UTC | #20
On 9 March 2016 at 19:35, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Feb 15, 2016 at 10:47:22PM +0100, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Introduce a mechanism by which parts of the cpufreq subsystem
>> ("setpolicy" drivers or the core) can register callbacks to be
>> executed from cpufreq_update_util() which is invoked by the
>> scheduler's update_load_avg() on CPU utilization changes.
>>
>> This allows the "setpolicy" drivers to dispense with their timers
>> and do all of the computations they need and frequency/voltage
>> adjustments in the update_load_avg() code path, among other things.
>>
>> The update_load_avg() changes were suggested by Peter Zijlstra.
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
>> ---
>>  drivers/cpufreq/cpufreq.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/cpufreq.h   |   34 ++++++++++++++++++++++++++++++++++
>>  kernel/sched/deadline.c   |    4 ++++
>>  kernel/sched/fair.c       |   26 +++++++++++++++++++++++++-
>>  kernel/sched/rt.c         |    4 ++++
>>  kernel/sched/sched.h      |    1 +
>>  6 files changed, 113 insertions(+), 1 deletion(-)
>>
>
> So with the understanding that we'll work on getting rid of
> cpufreq_trigger_update().
>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> Also, Vincent had some concerns about the exact placement of the
> callback, and I see no problem in moving it if there's need.

Yes, as explained previously, we can probably use another placement so
as not to miss any immediate change of the rq's utilization due to task
migration, but this optimization can probably be done as a next step.


Patch

Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -151,6 +151,36 @@  static inline bool policy_is_shared(stru
 extern struct kobject *cpufreq_global_kobject;
 
 #ifdef CONFIG_CPU_FREQ
+void cpufreq_update_util(u64 time, unsigned long util, unsigned long max);
+
+/**
+ * cpufreq_trigger_update - Trigger CPU performance state evaluation if needed.
+ * @time: Current time.
+ *
+ * The way cpufreq is currently arranged requires it to evaluate the CPU
+ * performance state (frequency/voltage) on a regular basis to prevent it from
+ * being stuck in a completely inadequate performance level for too long.
+ * That is not guaranteed to happen if the updates are only triggered from CFS,
+ * though, because they may not be coming in if RT or deadline tasks are active
+ * all the time (or there are RT and DL tasks only).
+ *
+ * As a workaround for that issue, this function is called by the RT and DL
+ * sched classes to trigger extra cpufreq updates to prevent it from stalling,
+ * but that really is a band-aid.  Going forward it should be replaced with
+ * solutions targeted more specifically at RT and DL tasks.
+ */
+static inline void cpufreq_trigger_update(u64 time)
+{
+	cpufreq_update_util(time, ULONG_MAX, 0);
+}
+
+struct update_util_data {
+	void (*func)(struct update_util_data *data,
+		     u64 time, unsigned long util, unsigned long max);
+};
+
+void cpufreq_set_update_util_data(int cpu, struct update_util_data *data);
+
 unsigned int cpufreq_get(unsigned int cpu);
 unsigned int cpufreq_quick_get(unsigned int cpu);
 unsigned int cpufreq_quick_get_max(unsigned int cpu);
@@ -162,6 +192,10 @@  int cpufreq_update_policy(unsigned int c
 bool have_governor_per_policy(void);
 struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
 #else
+static inline void cpufreq_update_util(u64 time, unsigned long util,
+				       unsigned long max) {}
+static inline void cpufreq_trigger_update(u64 time) {}
+
 static inline unsigned int cpufreq_get(unsigned int cpu)
 {
 	return 0;
Index: linux-pm/kernel/sched/sched.h
===================================================================
--- linux-pm.orig/kernel/sched/sched.h
+++ linux-pm/kernel/sched/sched.h
@@ -9,6 +9,7 @@ 
 #include <linux/irq_work.h>
 #include <linux/tick.h>
 #include <linux/slab.h>
+#include <linux/cpufreq.h>
 
 #include "cpupri.h"
 #include "cpudeadline.h"
Index: linux-pm/kernel/sched/fair.c
===================================================================
--- linux-pm.orig/kernel/sched/fair.c
+++ linux-pm/kernel/sched/fair.c
@@ -2824,7 +2824,8 @@  static inline void update_load_avg(struc
 {
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 	u64 now = cfs_rq_clock_task(cfs_rq);
-	int cpu = cpu_of(rq_of(cfs_rq));
+	struct rq *rq = rq_of(cfs_rq);
+	int cpu = cpu_of(rq);
 
 	/*
 	 * Track task load average for carrying it to new CPU after migrated, and
@@ -2836,6 +2837,29 @@  static inline void update_load_avg(struc
 
 	if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
 		update_tg_load_avg(cfs_rq, 0);
+
+	if (cpu == smp_processor_id() && &rq->cfs == cfs_rq) {
+		unsigned long max = rq->cpu_capacity_orig;
+
+		/*
+		 * There are a few boundary cases this might miss but it should
+		 * get called often enough that that should (hopefully) not be
+		 * a real problem -- added to that it only calls on the local
+		 * CPU, so if we enqueue remotely we'll miss an update, but
+		 * the next tick/schedule should update.
+		 *
+		 * It will not get called when we go idle, because the idle
+		 * thread is a different class (!fair), nor will the utilization
+		 * number include things like RT tasks.
+		 *
+		 * As is, the util number is not freq-invariant (we'd have to
+		 * implement arch_scale_freq_capacity() for that).
+		 *
+		 * See cpu_util().
+		 */
+		cpufreq_update_util(rq_clock(rq),
+				    min(cfs_rq->avg.util_avg, max), max);
+	}
 }
 
 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
Index: linux-pm/kernel/sched/deadline.c
===================================================================
--- linux-pm.orig/kernel/sched/deadline.c
+++ linux-pm/kernel/sched/deadline.c
@@ -726,6 +726,10 @@  static void update_curr_dl(struct rq *rq
 	if (!dl_task(curr) || !on_dl_rq(dl_se))
 		return;
 
+	/* Kick cpufreq (see the comment in linux/cpufreq.h). */
+	if (cpu_of(rq) == smp_processor_id())
+		cpufreq_trigger_update(rq_clock(rq));
+
 	/*
 	 * Consumed budget is computed considering the time as
 	 * observed by schedulable tasks (excluding time spent
Index: linux-pm/kernel/sched/rt.c
===================================================================
--- linux-pm.orig/kernel/sched/rt.c
+++ linux-pm/kernel/sched/rt.c
@@ -945,6 +945,10 @@  static void update_curr_rt(struct rq *rq
 	if (curr->sched_class != &rt_sched_class)
 		return;
 
+	/* Kick cpufreq (see the comment in linux/cpufreq.h). */
+	if (cpu_of(rq) == smp_processor_id())
+		cpufreq_trigger_update(rq_clock(rq));
+
 	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
 	if (unlikely((s64)delta_exec <= 0))
 		return;
Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -102,6 +102,51 @@  static LIST_HEAD(cpufreq_governor_list);
 static struct cpufreq_driver *cpufreq_driver;
 static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
 static DEFINE_RWLOCK(cpufreq_driver_lock);
+
+static DEFINE_PER_CPU(struct update_util_data *, cpufreq_update_util_data);
+
+/**
+ * cpufreq_set_update_util_data - Populate the CPU's update_util_data pointer.
+ * @cpu: The CPU to set the pointer for.
+ * @data: New pointer value.
+ *
+ * Set and publish the update_util_data pointer for the given CPU.  That pointer
+ * points to a struct update_util_data object containing a callback function
+ * to call from cpufreq_update_util().  That function will be called from an RCU
+ * read-side critical section, so it must not sleep.
+ *
+ * Callers must use RCU callbacks to free any memory that might be accessed
+ * via the old update_util_data pointer or invoke synchronize_rcu() right after
+ * this function to avoid use-after-free.
+ */
+void cpufreq_set_update_util_data(int cpu, struct update_util_data *data)
+{
+	rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
+}
+EXPORT_SYMBOL_GPL(cpufreq_set_update_util_data);
+
+/**
+ * cpufreq_update_util - Take a note about CPU utilization changes.
+ * @time: Current time.
+ * @util: Current utilization.
+ * @max: Utilization ceiling.
+ *
+ * This function is called by the scheduler on every invocation of
+ * update_load_avg() on the CPU whose utilization is being updated.
+ */
+void cpufreq_update_util(u64 time, unsigned long util, unsigned long max)
+{
+	struct update_util_data *data;
+
+	rcu_read_lock();
+
+	data = rcu_dereference(*this_cpu_ptr(&cpufreq_update_util_data));
+	if (data && data->func)
+		data->func(data, time, util, max);
+
+	rcu_read_unlock();
+}
+
 DEFINE_MUTEX(cpufreq_governor_lock);
 
 /* Flag to suspend/resume CPUFreq governors */