diff mbox

[5/8] sched/cpufreq: pass sched class into cpufreq_update_util

Message ID 20160316131008.GW6344@twins.programming.kicks-ass.net (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Peter Zijlstra March 16, 2016, 1:10 p.m. UTC
On Wed, Mar 16, 2016 at 01:39:10PM +0100, Rafael J. Wysocki wrote:
> On Wed, Mar 16, 2016 at 9:53 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, Mar 16, 2016 at 09:29:59AM +0100, Vincent Guittot wrote:
> >> I wonder if it's really worth passing per sched_class request to
> >> sched_util ? sched_util is about selecting a frequency based on the
> >> utilization of the CPU, it only needs a value that reflect the whole
> >> utilization. Can't we sum  (or whatever the formula we want to apply)
> >> utilizations before calling cpufreq_update_util
> >
> > So I've thought the same; but I'm conflicted, its a shame to compute
> > anything if the call then doesn't do anything with it.
> >
> > And keeping a structure of all the various numbers to pass in also has
> > cost of yet another cacheline to touch.
> 
> In principle we can use high-order bits of util and max to encode the
> information on where they come from.
> 
> Of course, that translates to additional ifs in the governor, but I
> guess they are unavoidable anyway.

Another thing we can do, for as long as we have the indirect function
call anyway, is stuff extra pointers in that same cacheline we pull the
function from.

Something like the below; there's room for 8 pointers (including the
function pointer) in a cacheline.

That would allow the callback to fetch whatever data it feels is
required (could be all of it).

We could also put a u64 *now = &rq->clock in, which would leave another
4 pointers for DL/RT support.

And since we're then back to 1-2 arguments on the function, we can add a
flags/mask field to indicate what changed (and if the function
throttles, it can keep a mask of all that changed since last time it
actually did something, or allow punching through the throttle if our
minimum guarantee changes or whatnot).

(this would of course require we allocate struct update_util_data with
the proper alignment thingies etc..)

Then again, maybe this is somewhat overboard :-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Rafael J. Wysocki March 16, 2016, 1:23 p.m. UTC | #1
On Wed, Mar 16, 2016 at 2:10 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Mar 16, 2016 at 01:39:10PM +0100, Rafael J. Wysocki wrote:
>> On Wed, Mar 16, 2016 at 9:53 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Wed, Mar 16, 2016 at 09:29:59AM +0100, Vincent Guittot wrote:
>> >> I wonder if it's really worth passing per sched_class request to
>> >> sched_util ? sched_util is about selecting a frequency based on the
>> >> utilization of the CPU, it only needs a value that reflect the whole
>> >> utilization. Can't we sum  (or whatever the formula we want to apply)
>> >> utilizations before calling cpufreq_update_util
>> >
>> > So I've thought the same; but I'm conflicted, its a shame to compute
>> > anything if the call then doesn't do anything with it.
>> >
>> > And keeping a structure of all the various numbers to pass in also has
>> > cost of yet another cacheline to touch.
>>
>> In principle we can use high-order bits of util and max to encode the
>> information on where they come from.
>>
>> Of course, that translates to additional ifs in the governor, but I
>> guess they are unavoidable anyway.
>
> Another thing we can do, for as long as we have the indirect function
> call anyway, is stuff extra pointers in that same cacheline we pull the
> function from.
>
> Something like the below; there's room for 8 pointers (including the
> function pointer) in a cacheline.
>
> That would allow the callback to fetch whatever data it feels is
> required (could be all of it).
>
> We could also put a u64 *now = &rq->clock in, which would leave another
> 4 pointers for DL/RT support.
>
> And since we're then back to 1-2 arguments on the function, we can add a
> flags/mask field to indicate what changed (and if the function
> throttles, it can keep a mask of all that changed since last time it
> actually did something, or allow punching through the throttle if our
> minimum guarantee changes or whatnot).
>
> (this would of course require we allocate struct update_util_data with
> the proper alignment thingies etc..)
>
> Then again, maybe this is somewhat overboard :-)

I was thinking about something along these lines, but then I thought
that passing in registers would be more efficient.

One advantage I can see here is that we don't pass arguments that may
not be used by the callee.

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index ba49c9efd0b2..d34d75c5cc93 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -3236,8 +3236,10 @@ static inline unsigned long rlimit_max(unsigned int limit)
>
>  #ifdef CONFIG_CPU_FREQ
>  struct update_util_data {
> -       void (*func)(struct update_util_data *data,
> -                    u64 time, unsigned long util, unsigned long max);
> +       unsigned long *cfs_util_avg;
> +       unsigned long *cfs_util_max;
> +
> +       void (*func)(struct update_util_data *data, u64 time);
>  };

How do we ensure proper alignment?

>  void cpufreq_set_update_util_data(int cpu, struct update_util_data *data);
> diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
> index 928c4ba32f68..de5b20b11de3 100644
> --- a/kernel/sched/cpufreq.c
> +++ b/kernel/sched/cpufreq.c
> @@ -32,6 +32,9 @@ void cpufreq_set_update_util_data(int cpu, struct update_util_data *data)
>         if (WARN_ON(data && !data->func))
>                 return;
>
> +       data->cfs_util_avg = &cpu_rq(cpu)->cfs.avg.util_avg;
> +       data->cfs_util_max = &cpu_rq(cpu)->cpu_capacity_orig;
> +
>         rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_set_update_util_data);
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra March 16, 2016, 1:43 p.m. UTC | #2
On Wed, Mar 16, 2016 at 02:23:21PM +0100, Rafael J. Wysocki wrote:
> On Wed, Mar 16, 2016 at 2:10 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > (this would of course require we allocate struct update_util_data with
> > the proper alignment thingies etc..)

> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index ba49c9efd0b2..d34d75c5cc93 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -3236,8 +3236,10 @@ static inline unsigned long rlimit_max(unsigned int limit)
> >
> >  #ifdef CONFIG_CPU_FREQ
> >  struct update_util_data {
> > -       void (*func)(struct update_util_data *data,
> > -                    u64 time, unsigned long util, unsigned long max);
> > +       unsigned long *cfs_util_avg;
> > +       unsigned long *cfs_util_max;
> > +
> > +       void (*func)(struct update_util_data *data, u64 time);
> >  };

we should add: ____cacheline_aligned here

> How do we ensure proper alignment?

Depends on the allocator; not all of them respect the struct alignment
attribute.

kernel/sched/cpufreq.c:DEFINE_PER_CPU(struct update_util_data *, cpufreq_update_util_data);

That one could use:

DEFINE_PER_CPU_ALIGNED() instead

as would this one:

drivers/cpufreq/cpufreq_governor.c:static DEFINE_PER_CPU(struct cpu_dbs_info, cpu_dbs);

Because when you cacheline align dbs_info, its update_util_data member
will also get the correct alignment because of the structure attribute.


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ba49c9efd0b2..d34d75c5cc93 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3236,8 +3236,10 @@  static inline unsigned long rlimit_max(unsigned int limit)
 
 #ifdef CONFIG_CPU_FREQ
 struct update_util_data {
-	void (*func)(struct update_util_data *data,
-		     u64 time, unsigned long util, unsigned long max);
+	unsigned long *cfs_util_avg;
+	unsigned long *cfs_util_max;
+
+	void (*func)(struct update_util_data *data, u64 time);
 };
 
 void cpufreq_set_update_util_data(int cpu, struct update_util_data *data);
diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index 928c4ba32f68..de5b20b11de3 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -32,6 +32,9 @@  void cpufreq_set_update_util_data(int cpu, struct update_util_data *data)
 	if (WARN_ON(data && !data->func))
 		return;
 
+	data->cfs_util_avg = &cpu_rq(cpu)->cfs.avg.util_avg;
+	data->cfs_util_max = &cpu_rq(cpu)->cpu_capacity_orig;
+
 	rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
 }
 EXPORT_SYMBOL_GPL(cpufreq_set_update_util_data);