
[RFD,4/5] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq

Message ID 20170324140900.7334-5-juri.lelli@arm.com (mailing list archive)
State RFC, archived

Commit Message

Juri Lelli March 24, 2017, 2:08 p.m. UTC
No assumption can be made about the rate at which frequency updates get
triggered, as some scheduling policies (like SCHED_DEADLINE) don't
trigger them that frequently.

Remove this assumption from the code.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Claudio Scordino <claudio@evidence.eu.com>
---
 kernel/sched/cpufreq_schedutil.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

Comments

Rafael J. Wysocki March 29, 2017, 10:41 p.m. UTC | #1
On Friday, March 24, 2017 02:08:59 PM Juri Lelli wrote:
> No assumption can be made about the rate at which frequency updates get
> triggered, as some scheduling policies (like SCHED_DEADLINE) don't
> trigger them that frequently.
> 
> Remove this assumption from the code.

But the util/max values for idle CPUs may be stale, no?

Thanks,
Rafael

Juri Lelli March 30, 2017, 8:58 a.m. UTC | #2
Hi,

On 30/03/17 00:41, Rafael J. Wysocki wrote:
> But the util/max values for idle CPUs may be stale, no?
> 

Right, that might be a problem. A proper solution I think would be to
remotely update such values for idle CPUs, and I believe Vincent is
working on a patch for that.

As mid-term workarounds, two variations of the current one come to
mind:

 - apply the TICK_NSEC check (and continue) only when SCHED_CPUFREQ_DL
   is not set
 - remove the CFS contribution (without triggering a freq update) when a
   CPU enters idle; this might not work well, though, as we probably want
   to keep the blocked util contribution around for a bit

What do you think is the way to go?

Thanks,

- Juri
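
The first workaround above could look roughly like the following user-space model of the shared-policy aggregation loop in sugov_next_freq_shared(). It is only a sketch: struct model_cpu, model_pick_cpu(), MODEL_FLAG_DL and the 4 ms tick (HZ=250) are hypothetical stand-ins rather than kernel definitions, and the RT shortcut and iowait boosting are left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for SCHED_CPUFREQ_DL and TICK_NSEC (HZ=250). */
#define MODEL_FLAG_DL   (1u << 1)
#define MODEL_TICK_NSEC 4000000LL

struct model_cpu {
	int64_t last_update;   /* ns timestamp of the last util update */
	unsigned long util;
	unsigned long max;
	unsigned int flags;
};

/*
 * Select the CPU with the highest util/max ratio. A CPU whose last update
 * is older than a tick is skipped only if it has no DL request pending.
 * Returns the selected index, or -1 if no considered CPU had non-zero util.
 */
static int model_pick_cpu(const struct model_cpu *cpus, int nr,
			  int64_t last_freq_update_time,
			  unsigned long *util, unsigned long *max)
{
	int best = -1;

	*util = 0;
	*max = 1;
	for (int i = 0; i < nr; i++) {
		int64_t delta_ns = last_freq_update_time - cpus[i].last_update;

		/* Workaround 1: the staleness check ignores DL-flagged CPUs. */
		if (delta_ns > MODEL_TICK_NSEC && !(cpus[i].flags & MODEL_FLAG_DL))
			continue;

		/* Compare util/max ratios by cross-multiplying, no division. */
		if (cpus[i].util * *max > cpus[i].max * *util) {
			*util = cpus[i].util;
			*max = cpus[i].max;
			best = i;
		}
	}
	return best;
}
```

In this model a CPU that last updated 10 ms ago with a DL request still drives the frequency choice, while the same CPU without the DL flag is ignored.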

Vincent Guittot March 30, 2017, 1:21 p.m. UTC | #3
On 30 March 2017 at 10:58, Juri Lelli <juri.lelli@arm.com> wrote:
> Hi,
>
> On 30/03/17 00:41, Rafael J. Wysocki wrote:
>> But the util/max values for idle CPUs may be stale, no?
>>
>
> Right, that might be a problem. A proper solution I think would be to
> remotely update such values for idle CPUs, and I believe Vincent is
> working on a patch for that.

Yes. I'm working on a patch that will regularly update the blocked
load/utilization of idle CPUs. This update will be done at a slow pace
to make sure that utilization and load are decayed regularly.

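The effect of such periodic decay can be illustrated with a floating-point model of PELT-style geometric decay, in which each period of roughly 1 ms scales the blocked contribution by y, with y^32 = 0.5 (a 32-period half-life). This is a simplification: the kernel uses fixed-point arithmetic and precomputed tables, and model_decay_blocked_util() and MODEL_PELT_Y are hypothetical names.

```c
/* Approximately 0.5^(1/32): 32 periods halve the contribution. */
#define MODEL_PELT_Y 0.978572062

/*
 * Floating-point model of decaying a blocked utilization value over a
 * number of elapsed ~1 ms periods (not the kernel's fixed-point code).
 */
static double model_decay_blocked_util(double util, unsigned int periods)
{
	while (periods--)
		util *= MODEL_PELT_Y;
	return util;
}
```

With this model, a CPU that went idle with util 512 would report about 256 after 32 periods, provided something refreshes its blocked utilization; without that refresh, schedutil keeps seeing the stale value of 512.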

Rafael J. Wysocki March 30, 2017, 8:13 p.m. UTC | #4
On Thu, Mar 30, 2017 at 10:58 AM, Juri Lelli <juri.lelli@arm.com> wrote:
> Hi,

Hi,

> On 30/03/17 00:41, Rafael J. Wysocki wrote:
>> But the util/max values for idle CPUs may be stale, no?
>>
>
> Right, that might be a problem. A proper solution I think would be to
> remotely update such values for idle CPUs, and I believe Vincent is
> working on a patch for that.
>
> As mid-term workarounds, two variations of the current one come to
> mind:
>
>  - apply the TICK_NSEC check (and continue) only when SCHED_CPUFREQ_DL
>    is not set
>  - remove the CFS contribution (without triggering a freq update) when a
>    CPU enters idle; this might not work well, though, as we probably want
>    to keep the blocked util contribution around for a bit
>
> What do you think is the way to go?

Well, do we want SCHED_DEADLINE util contribution to be there even for
idle CPUs?

Thanks,
Rafael

Juri Lelli March 31, 2017, 7:31 a.m. UTC | #5
On 30/03/17 22:13, Rafael J. Wysocki wrote:
> Well, do we want SCHED_DEADLINE util contribution to be there even for
> idle CPUs?
> 

DEADLINE util contribution is removed, even if the CPU is idle, by the
reclaiming mechanism when we know (applying GRUB algorithm rules [1])
that it can't be used anymore by a task (roughly speaking). So, we
shouldn't have this problem in the DEADLINE case.

[1] https://marc.info/?l=linux-kernel&m=149029880524038
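
As a rough illustration of the rule referenced above: in GRUB-style reclaiming, a blocked task's bandwidth stays accounted as active until its "0-lag time", the instant t0 at which (deadline - t0) * U equals the remaining runtime q, with U = dl_runtime / dl_period; that is, t0 = deadline - q * dl_period / dl_runtime. The sketch below computes how long that contribution would remain; struct model_dl_task and model_zerolag_delay() are hypothetical, and the actual kernel logic (timers, rounding, locking) is more involved.

```c
#include <stdint.h>

/* Hypothetical model of a SCHED_DEADLINE task's parameters, in ns. */
struct model_dl_task {
	int64_t deadline;      /* absolute scheduling deadline */
	int64_t runtime_left;  /* remaining runtime q in the current period */
	int64_t dl_runtime;    /* reserved runtime per period */
	int64_t dl_period;     /* reservation period */
};

/*
 * Delay from 'now' until the 0-lag time of a task that just blocked.
 * Returns 0 if the 0-lag time has already passed, meaning the task's
 * contribution could be dropped immediately.
 */
static int64_t model_zerolag_delay(const struct model_dl_task *t, int64_t now)
{
	int64_t zerolag = t->deadline -
			  t->runtime_left * t->dl_period / t->dl_runtime;

	return zerolag > now ? zerolag - now : 0;
}
```

For example, a 10 ms / 100 ms reservation that blocks at t = 20 ms with 5 ms of runtime left and a deadline at 100 ms has its 0-lag time at 50 ms, so in this model its contribution lingers for another 30 ms and then goes away even though the CPU stays idle.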

Rafael J. Wysocki March 31, 2017, 9:03 a.m. UTC | #6
On Fri, Mar 31, 2017 at 9:31 AM, Juri Lelli <juri.lelli@arm.com> wrote:
> On 30/03/17 22:13, Rafael J. Wysocki wrote:
>> Well, do we want SCHED_DEADLINE util contribution to be there even for
>> idle CPUs?
>>
>
> DEADLINE util contribution is removed, even if the CPU is idle, by the
> reclaiming mechanism when we know (applying GRUB algorithm rules [1])
> that it can't be used anymore by a task (roughly speaking). So, we
> shouldn't have this problem in the DEADLINE case.
>
> [1] https://marc.info/?l=linux-kernel&m=149029880524038

OK

Why don't you store the contributions from DL and CFS separately, then
(say, as util_dl, util_cfs, respectively) and only discard the CFS one
if delta_ns > TICK_NSEC?
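
Rafael's suggestion could be modelled as below: DL and CFS utilization are stored separately per CPU, and only the CFS part is dropped when the last update is older than a tick. The names (struct model_cpu_split, util_cfs, util_dl, model_effective_util()) and the 4 ms tick (HZ=250) are assumptions for illustration; summing the two parts and clamping to max is one possible way of combining them, not necessarily what a later version of the patch does.

```c
#include <stdint.h>

#define MODEL_TICK_NSEC 4000000LL /* assumes HZ=250 */

/* Hypothetical per-CPU state with DL and CFS contributions kept apart. */
struct model_cpu_split {
	int64_t last_update;     /* ns timestamp of the last util update */
	unsigned long util_cfs;  /* may be stale if the CPU went idle */
	unsigned long util_dl;   /* removed by DL reclaiming when unused */
	unsigned long max;
};

/*
 * Effective utilization of one CPU at frequency-selection time: the CFS
 * part is discarded when the last update is older than a tick, while the
 * DL part is always kept. The sum is clamped to max.
 */
static unsigned long model_effective_util(const struct model_cpu_split *c,
					  int64_t last_freq_update_time)
{
	int64_t delta_ns = last_freq_update_time - c->last_update;
	unsigned long util = c->util_dl;

	if (delta_ns <= MODEL_TICK_NSEC)
		util += c->util_cfs;
	return util < c->max ? util : c->max;
}
```

Keeping the two values apart lets the staleness heuristic stay in place for CFS, whose blocked utilization is not refreshed on idle CPUs, without hiding DL bandwidth that is still reserved.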

Juri Lelli March 31, 2017, 9:16 a.m. UTC | #7
On 31/03/17 11:03, Rafael J. Wysocki wrote:
> Why don't you store the contributions from DL and CFS separately, then
> (say, as util_dl, util_cfs, respectively) and only discard the CFS one
> if delta_ns > TICK_NSEC?

Sure, this should work as well. I'll try this approach for the next version.

Thanks,

- Juri

Patch

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index da67a1cf91e7..40f30373b709 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -233,14 +233,13 @@  static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
 		 * If the CPU utilization was last updated before the previous
 		 * frequency update and the time elapsed between the last update
 		 * of the CPU utilization and the last frequency update is long
-		 * enough, don't take the CPU into account as it probably is
-		 * idle now (and clear iowait_boost for it).
+		 * enough, reset iowait_boost, as it probably is not boosted
+		 * anymore now.
 		 */
 		delta_ns = last_freq_update_time - j_sg_cpu->last_update;
-		if (delta_ns > TICK_NSEC) {
+		if (delta_ns > TICK_NSEC)
 			j_sg_cpu->iowait_boost = 0;
-			continue;
-		}
+
 		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
 			return policy->cpuinfo.max_freq;
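
A minimal user-space model of the loop in sugov_next_freq_shared() after this patch is sketched below. All names are hypothetical, the tick is assumed to be 4 ms (HZ=250), and both the RT shortcut and the application of iowait_boost to util/max are left out. The point is only that a CPU whose last update is older than a tick still contributes its util/max and merely has its iowait_boost cleared.

```c
#include <stdint.h>

#define MODEL_TICK_NSEC 4000000LL /* assumes HZ=250 */

/* Hypothetical subset of the per-CPU schedutil state. */
struct model_sg_cpu {
	int64_t last_update;        /* ns timestamp of the last util update */
	unsigned long util;
	unsigned long max;
	unsigned long iowait_boost;
};

/*
 * After the patch: every CPU in the policy contributes to the choice of
 * the highest util/max ratio; stale CPUs only have iowait_boost reset.
 */
static void model_aggregate(struct model_sg_cpu *cpus, int nr,
			    int64_t last_freq_update_time,
			    unsigned long *util, unsigned long *max)
{
	*util = 0;
	*max = 1;
	for (int i = 0; i < nr; i++) {
		int64_t delta_ns = last_freq_update_time - cpus[i].last_update;

		/* Stale CPU: drop the boost, but no longer skip the CPU. */
		if (delta_ns > MODEL_TICK_NSEC)
			cpus[i].iowait_boost = 0;

		/* Keep the pair with the highest util/max ratio. */
		if (cpus[i].util * *max > cpus[i].max * *util) {
			*util = cpus[i].util;
			*max = cpus[i].max;
		}
	}
}
```

With one stale CPU at 700/1024 and one fresh CPU at 100/1024, this model selects 700/1024; before the patch the stale CPU would have been skipped and 100/1024 selected, which is exactly the stale-value concern raised in the review.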