
[6/8] cpufreq/schedutil: sum per-sched class utilization

Message ID 1457932932-28444-7-git-send-email-mturquette+renesas@baylibre.com (mailing list archive)
State RFC, archived

Commit Message

Michael Turquette March 14, 2016, 5:22 a.m. UTC
The patch "sched/cpufreq: pass sched class into cpufreq_update_util" made
it possible for callers of cpufreq_update_util() to specify their
scheduler class, namely cfs, rt or dl.

Update the schedutil governor to store these individual utilizations per
cpu and sum them to create a total utilization contribution.
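[Editor's note: the bookkeeping described above can be sketched as standalone C. The enum, struct and function names below mirror the patch at the end of this thread but are a userspace illustration, not the kernel code itself.]

```c
#include <assert.h>

/*
 * Standalone sketch of the patch's bookkeeping: one utilization slot
 * per scheduler class, summed before frequency selection. Names mirror
 * the patch; this is an illustration, not kernel code.
 */
enum sched_class_util { UTIL_CFS, UTIL_RT, UTIL_DL, NR_UTIL_TYPES };

struct sugov_cpu_sketch {
	unsigned long util[NR_UTIL_TYPES];
	unsigned long total_util;
};

static unsigned long sugov_sum_total_util(struct sugov_cpu_sketch *sg_cpu)
{
	/* sum the utilization of all sched classes */
	sg_cpu->total_util = 0;
	for (int sc = 0; sc < NR_UTIL_TYPES; sc++)
		sg_cpu->total_util += sg_cpu->util[sc];
	return sg_cpu->total_util;
}

/*
 * Frequency selection as in sugov_update_single(): scale max_f by the
 * summed utilization, saturating at max_f once util exceeds capacity.
 */
static unsigned int sugov_pick_freq(unsigned long total_util,
				    unsigned long max, unsigned int max_f)
{
	return total_util > max ? max_f : total_util * max_f / max;
}
```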

Signed-off-by: Michael Turquette <mturquette+renesas@baylibre.com>
---
 drivers/cpufreq/cpufreq_schedutil.c | 39 +++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

Comments

Peter Zijlstra March 15, 2016, 9:29 p.m. UTC | #1
On Sun, Mar 13, 2016 at 10:22:10PM -0700, Michael Turquette wrote:

> +static unsigned long sugov_sum_total_util(struct sugov_cpu *sg_cpu)
> +{
> +	enum sched_class_util sc;
> +
> +	/* sum the utilization of all sched classes */
> +	sg_cpu->total_util = 0;
> +	for (sc = 0; sc < nr_util_types; sc++)
> +		sg_cpu->total_util += sg_cpu->util[sc];
> +
> +	return sg_cpu->total_util;
> +}

> @@ -153,7 +172,7 @@ static unsigned int sugov_next_freq(struct sugov_policy *sg_policy,
>  		if ((s64)delta_ns > NSEC_PER_SEC / HZ)
>  			continue;
>  
> -		j_util = j_sg_cpu->util;
> +		j_util = j_sg_cpu->total_util;
>  		j_max = j_sg_cpu->max;
>  		if (j_util > j_max)
>  			return max_f;

So while not strictly wrong, I think we can do so much better.

Changelog doesn't mention anything useful, like that this is indeed very
rough and what we really should be doing, etc.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra March 16, 2016, 7:38 a.m. UTC | #2
On Tue, Mar 15, 2016 at 03:09:51PM -0700, Michael Turquette wrote:
> Quoting Peter Zijlstra (2016-03-15 14:29:26)
> > On Sun, Mar 13, 2016 at 10:22:10PM -0700, Michael Turquette wrote:
> > 
> > > +static unsigned long sugov_sum_total_util(struct sugov_cpu *sg_cpu)
> > > +{
> > > +     enum sched_class_util sc;
> > > +
> > > +     /* sum the utilization of all sched classes */
> > > +     sg_cpu->total_util = 0;
> > > +     for (sc = 0; sc < nr_util_types; sc++)
> > > +             sg_cpu->total_util += sg_cpu->util[sc];
> > > +
> > > +     return sg_cpu->total_util;
> > > +}
> > 
> > > @@ -153,7 +172,7 @@ static unsigned int sugov_next_freq(struct sugov_policy *sg_policy,
> > >               if ((s64)delta_ns > NSEC_PER_SEC / HZ)
> > >                       continue;
> > >  
> > > -             j_util = j_sg_cpu->util;
> > > +             j_util = j_sg_cpu->total_util;
> > >               j_max = j_sg_cpu->max;
> > >               if (j_util > j_max)
> > >                       return max_f;
> > 
> > So while not strictly wrong, I think we can do so much better.
> > 
> > Changelog doesn't mention anything useful, like that this is indeed very
> > rough and what we really should be doing, etc.
> 
> What should we really be doing? Summing the scheduler class
> contributions seems correct to me.
> 
> Are you referring to the fact that dl and rt are passing bogus values
> into cpufreq_update_util()? If so I'm happy to add a note about that in
> the changelog.

Somewhere in the giant discussions I mentioned that we should be looking
at a CPPC like interface and pass {min,max} tuples to the cpufreq
selection thingy.

In that same discussion I also mentioned that we must compute min as the
hard dl reservation, but that for max we can actually use the avg dl +
avg rt + avg cfs.

That way there is far more room for selecting a sensible frequency.
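[Editor's note: the {min, max} scheme described above might look roughly like this sketch. All names here are hypothetical; no such interface exists in this patch set or in the kernel at this point.]

```c
#include <assert.h>

/*
 * Hypothetical sketch of a CPPC-like {min, max} tuple: min is the hard
 * deadline reservation (admitted DL bandwidth), max is the sum of the
 * average per-class utilizations, never hinted below the reservation.
 */
struct freq_hint {
	unsigned long min;	/* floor: hard dl reservation */
	unsigned long max;	/* ceiling hint: avg dl + avg rt + avg cfs */
};

static struct freq_hint make_freq_hint(unsigned long dl_reserved_bw,
				       unsigned long avg_dl,
				       unsigned long avg_rt,
				       unsigned long avg_cfs)
{
	struct freq_hint h;

	h.min = dl_reserved_bw;
	h.max = avg_dl + avg_rt + avg_cfs;
	if (h.max < h.min)	/* never hint below the reservation */
		h.max = h.min;
	return h;
}
```

The point of the tuple is exactly the extra room Peter mentions: the platform may pick any frequency in [min, max] rather than being pinned to a single requested value.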
Steve Muckle March 16, 2016, 6:20 p.m. UTC | #3
On 03/16/2016 12:38 AM, Peter Zijlstra wrote:
> Somewhere in the giant discussions I mentioned that we should be looking
> at a CPPC like interface and pass {min,max} tuples to the cpufreq
> selection thingy.
> 
> In that same discussion I also mentioned that we must compute min as the
> hard dl reservation, but that for max we can actually use the avg dl +
> avg rt + avg cfs.
> 
> That way there is far more room for selecting a sensible frequency.

Doesn't the above min/max policy mean that the platform will likely
underserve the task load? If avg dl+rt+cfs represents our best estimate
of the work to be done, I would think that should be the min.

Peter Zijlstra March 16, 2016, 6:36 p.m. UTC | #4
On Wed, Mar 16, 2016 at 11:20:45AM -0700, Steve Muckle wrote:
> On 03/16/2016 12:38 AM, Peter Zijlstra wrote:
> > Somewhere in the giant discussions I mentioned that we should be looking
> > at a CPPC like interface and pass {min,max} tuples to the cpufreq
> > selection thingy.
> > 
> > In that same discussion I also mentioned that we must compute min as the
> > hard dl reservation, but that for max we can actually use the avg dl +
> > avg rt + avg cfs.
> > 
> > That way there is far more room for selecting a sensible frequency.
> 
> Doesn't the above min/max policy mean that the platform will likely
> underserve the task load? If avg dl+rt+cfs represents our best estimate
> of the work to be done, I would think that should be the min.

Can't be the min; avg_dl might be (and typically will be) much lower than
the worst case utilization estimates.

However if we use that as our min, peaks in DL utilization will not
complete, because we run at too low a frequency.

Therefore, the min must be given by our worst case utilization
reservation, not by the actual avg consumed.
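[Editor's note: a toy calculation, with invented numbers, illustrates the argument. A DL task reserving 400/1024 of capacity may average only 100/1024; a minimum frequency derived from the average then gives a full-reservation burst about a quarter of the cycles it needs.]

```c
#include <assert.h>

/*
 * Toy illustration: schedutil-style frequency scaling applied to the
 * average DL utilization vs. the hard DL reservation. Capacity and
 * frequency values are invented for this example.
 */
static unsigned int freq_for_util(unsigned long util, unsigned long max,
				  unsigned int max_f)
{
	return util > max ? max_f : util * max_f / max;
}
```

With max = 1024 and max_f = 1000000, freq_for_util(100, ...) yields 97656 while freq_for_util(400, ...) yields 390625, so the avg-derived floor runs the burst roughly 4x too slow.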
Steve Muckle March 16, 2016, 7:12 p.m. UTC | #5
On 03/16/2016 11:36 AM, Peter Zijlstra wrote:
> On Wed, Mar 16, 2016 at 11:20:45AM -0700, Steve Muckle wrote:
>> On 03/16/2016 12:38 AM, Peter Zijlstra wrote:
>>> Somewhere in the giant discussions I mentioned that we should be looking
>>> at a CPPC like interface and pass {min,max} tuples to the cpufreq
>>> selection thingy.
>>>
>>> In that same discussion I also mentioned that we must compute min as the
>>> hard dl reservation, but that for max we can actually use the avg dl +
>>> avg rt + avg cfs.
>>>
>>> That way there is far more room for selecting a sensible frequency.
>>
>> Doesn't the above min/max policy mean that the platform will likely
>> underserve the task load? If avg dl+rt+cfs represents our best estimate
>> of the work to be done, I would think that should be the min.
> 
> Can't be the min; avg_dl might be (and typically will be) much lower than
> the worst case utilization estimates.
> 
> However if we use that as our min, peaks in DL utilization will not
> complete, because we run at too low a frequency.

Doesn't that mean the max (if one is specified) should also use hard dl?
I.e. hard dl + rt + cfs.

> Therefore, the min must be given by our worst case utilization
> reservation, not by the actual avg consumed.

Ok sure. My concern was more about the platform potentially ignoring the
CFS and RT capacity requests. So given the point about DL bw I'd think
the min would then be hard dl + rt + cfs.

Patch

diff --git a/drivers/cpufreq/cpufreq_schedutil.c b/drivers/cpufreq/cpufreq_schedutil.c
index 18d9ca3..b9234e1 100644
--- a/drivers/cpufreq/cpufreq_schedutil.c
+++ b/drivers/cpufreq/cpufreq_schedutil.c
@@ -46,8 +46,10 @@  struct sugov_cpu {
 	struct freq_update_hook update_hook;
 	struct sugov_policy *sg_policy;
 
+	unsigned long util[nr_util_types];
+	unsigned long total_util;
+
 	/* The fields below are only needed when sharing a policy. */
-	unsigned long util;
 	unsigned long max;
 	u64 last_update;
 };
@@ -106,6 +108,18 @@  static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
 	trace_cpu_frequency(freq, smp_processor_id());
 }
 
+static unsigned long sugov_sum_total_util(struct sugov_cpu *sg_cpu)
+{
+	enum sched_class_util sc;
+
+	/* sum the utilization of all sched classes */
+	sg_cpu->total_util = 0;
+	for (sc = 0; sc < nr_util_types; sc++)
+		sg_cpu->total_util += sg_cpu->util[sc];
+
+	return sg_cpu->total_util;
+}
+
 static void sugov_update_single(struct freq_update_hook *hook,
 				enum sched_class_util sc, u64 time,
 				unsigned long util, unsigned long max)
@@ -113,12 +127,17 @@  static void sugov_update_single(struct freq_update_hook *hook,
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_hook);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned int max_f, next_f;
+	unsigned long total_util;
 
 	if (!sugov_should_update_freq(sg_policy, time))
 		return;
 
+	/* update per-sched_class utilization for this cpu */
+	sg_cpu->util[sc] = util;
+	total_util = sugov_sum_total_util(sg_cpu);
+
 	max_f = sg_policy->max_freq;
-	next_f = util > max ? max_f : util * max_f / max;
+	next_f = total_util > max ? max_f : total_util * max_f / max;
 	sugov_update_commit(sg_policy, time, next_f);
 }
 
@@ -153,7 +172,7 @@  static unsigned int sugov_next_freq(struct sugov_policy *sg_policy,
 		if ((s64)delta_ns > NSEC_PER_SEC / HZ)
 			continue;
 
-		j_util = j_sg_cpu->util;
+		j_util = j_sg_cpu->total_util;
 		j_max = j_sg_cpu->max;
 		if (j_util > j_max)
 			return max_f;
@@ -174,15 +193,19 @@  static void sugov_update_shared(struct freq_update_hook *hook,
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_hook);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned int next_f;
+	unsigned long total_util;
 
 	raw_spin_lock(&sg_policy->update_lock);
 
-	sg_cpu->util = util;
+	sg_cpu->util[sc] = util;
 	sg_cpu->max = max;
 	sg_cpu->last_update = time;
 
+	/* update per-sched_class utilization for this cpu */
+	total_util = sugov_sum_total_util(sg_cpu);
+
 	if (sugov_should_update_freq(sg_policy, time)) {
-		next_f = sugov_next_freq(sg_policy, util, max);
+		next_f = sugov_next_freq(sg_policy, total_util, max);
 		sugov_update_commit(sg_policy, time, next_f);
 	}
 
@@ -423,6 +446,7 @@  static int sugov_start(struct cpufreq_policy *policy)
 {
 	struct sugov_policy *sg_policy = policy->governor_data;
 	unsigned int cpu;
+	enum sched_class_util sc;
 
 	sg_policy->freq_update_delay_ns = sg_policy->tunables->rate_limit_us * NSEC_PER_USEC;
 	sg_policy->last_freq_update_time = 0;
@@ -434,8 +458,11 @@  static int sugov_start(struct cpufreq_policy *policy)
 		struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);
 
 		sg_cpu->sg_policy = sg_policy;
+		for (sc = 0; sc < nr_util_types; sc++) {
+			sg_cpu->util[sc] = ULONG_MAX;
+			sg_cpu->total_util = ULONG_MAX;
+		}
 		if (policy_is_shared(policy)) {
-			sg_cpu->util = ULONG_MAX;
 			sg_cpu->max = 0;
 			sg_cpu->last_update = 0;
 			cpufreq_set_freq_update_hook(cpu, &sg_cpu->update_hook,