Message ID | 1472236848-17038-3-git-send-email-smuckle@linaro.org (mailing list archive) |
---|---|
State | Changes Requested, archived |
On Friday, August 26, 2016 11:40:48 AM Steve Muckle wrote:
> A policy of going to fmax on any RT activity will be detrimental
> for power on many platforms. Often RT accounts for only a small amount
> of CPU activity so sending the CPU frequency to fmax is overkill. Worse
> still, some platforms may not be able to even complete the CPU frequency
> change before the RT activity has already completed.
>
> Cpufreq governors have not treated RT activity this way in the past so
> it is not part of the expected semantics of the RT scheduling class. The
> DL class offers guarantees about task completion and could be used for
> this purpose.
>
> Modify the schedutil algorithm to instead use rt_avg as an estimate of
> RT utilization of the CPU.
>
> Based on previous work by Vincent Guittot <vincent.guittot@linaro.org>.

If we do it for RT, why not to do a similar thing for DL?  As in the
original patch from Peter, for example?

> Signed-off-by: Steve Muckle <smuckle@linaro.org>
> ---
>  kernel/sched/cpufreq_schedutil.c | 26 +++++++++++++++++---------
>  1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index cb8a77b1ef1b..89094a466250 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -146,13 +146,21 @@ static unsigned int get_next_freq(struct sugov_cpu *sg_cpu, unsigned long util,
>
>  static void sugov_get_util(unsigned long *util, unsigned long *max)
>  {
> -	struct rq *rq = this_rq();
> -	unsigned long cfs_max;
> +	int cpu = smp_processor_id();
> +	struct rq *rq = cpu_rq(cpu);
> +	unsigned long max_cap, rt;
> +	s64 delta;
>
> -	cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> +	max_cap = arch_scale_cpu_capacity(NULL, cpu);
>
> -	*util = min(rq->cfs.avg.util_avg, cfs_max);
> -	*max = cfs_max;
> +	delta = rq_clock(rq) - rq->age_stamp;
> +	if (unlikely(delta < 0))
> +		delta = 0;
> +	rt = div64_u64(rq->rt_avg, sched_avg_period() + delta);
> +	rt = (rt * max_cap) >> SCHED_CAPACITY_SHIFT;

These computations are rather heavy, so I wonder if they are avoidable based
on the flags, for example?

Plus is SCHED_CAPACITY_SHIFT actually defined for all architectures?

One more ugly thing is about using rq_clock(rq) directly from here whereas we
pass it around as the 'time' argument elsewhere.

> +
> +	*util = min(rq->cfs.avg.util_avg + rt, max_cap);
> +	*max = max_cap;
>  }

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Fri, Aug 26, 2016 at 11:40:48AM -0700, Steve Muckle wrote:
> A policy of going to fmax on any RT activity will be detrimental
> for power on many platforms. Often RT accounts for only a small amount
> of CPU activity so sending the CPU frequency to fmax is overkill. Worse
> still, some platforms may not be able to even complete the CPU frequency
> change before the RT activity has already completed.
>
> Cpufreq governors have not treated RT activity this way in the past so
> it is not part of the expected semantics of the RT scheduling class. The
> DL class offers guarantees about task completion and could be used for
> this purpose.

Not entirely true. People have simply disabled cpufreq because of this.

Yes, RR/FIFO are a pain, but they should still be deterministic, and
variable cpufreq destroys that.

I realize that the fmax thing is annoying, but I'm not seeing how rt_avg
is much better.

On Wed, Aug 31, 2016 at 03:31:07AM +0200, Rafael J. Wysocki wrote:
> On Friday, August 26, 2016 11:40:48 AM Steve Muckle wrote:
> > A policy of going to fmax on any RT activity will be detrimental
> > for power on many platforms. Often RT accounts for only a small amount
> > of CPU activity so sending the CPU frequency to fmax is overkill. Worse
> > still, some platforms may not be able to even complete the CPU frequency
> > change before the RT activity has already completed.
> >
> > Cpufreq governors have not treated RT activity this way in the past so
> > it is not part of the expected semantics of the RT scheduling class. The
> > DL class offers guarantees about task completion and could be used for
> > this purpose.
> >
> > Modify the schedutil algorithm to instead use rt_avg as an estimate of
> > RT utilization of the CPU.
> >
> > Based on previous work by Vincent Guittot <vincent.guittot@linaro.org>.
>
> If we do it for RT, why not to do a similar thing for DL?  As in the
> original patch from Peter, for example?

Agreed DL should have a similar change. I think that could be done in a
separate patch. I also would need to discuss it with the deadline sched
devs to fully understand the metric used there.

>
> > Signed-off-by: Steve Muckle <smuckle@linaro.org>
> > ---
> >  kernel/sched/cpufreq_schedutil.c | 26 +++++++++++++++++---------
> >  1 file changed, 17 insertions(+), 9 deletions(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index cb8a77b1ef1b..89094a466250 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -146,13 +146,21 @@ static unsigned int get_next_freq(struct sugov_cpu *sg_cpu, unsigned long util,
> >
> >  static void sugov_get_util(unsigned long *util, unsigned long *max)
> >  {
> > -	struct rq *rq = this_rq();
> > -	unsigned long cfs_max;
> > +	int cpu = smp_processor_id();
> > +	struct rq *rq = cpu_rq(cpu);
> > +	unsigned long max_cap, rt;
> > +	s64 delta;
> >
> > -	cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> > +	max_cap = arch_scale_cpu_capacity(NULL, cpu);
> >
> > -	*util = min(rq->cfs.avg.util_avg, cfs_max);
> > -	*max = cfs_max;
> > +	delta = rq_clock(rq) - rq->age_stamp;
> > +	if (unlikely(delta < 0))
> > +		delta = 0;
> > +	rt = div64_u64(rq->rt_avg, sched_avg_period() + delta);
> > +	rt = (rt * max_cap) >> SCHED_CAPACITY_SHIFT;
>
> These computations are rather heavy, so I wonder if they are avoidable based
> on the flags, for example?

Yeah the div is bad. I don't know that we can avoid it based on the
flags because rt_avg will decay during CFS activity and you'd want to
take note of that.

One way to make this a little better is to assume that the divisor,
sched_avg_period() + delta, fits into 32 bits so that div_u64 can be
used, which I believe is less bad. Doing that means placing a
restriction on how large sysctl_sched_time_avg (which determines
sched_avg_period()) can be, a max of 4.2 seconds I think. I don't know
that anyone uses a value that large anyway but there's currently no
limit on it.

Another option would be just adding another separate metric to track rt
activity that is more mathematically favorable to deal with.

Both these seemed potentially heavy handed so I figured I'd just start
with the obvious, if suboptimal, solution...

> Plus is SCHED_CAPACITY_SHIFT actually defined for all architectures?

Yes.

> One more ugly thing is about using rq_clock(rq) directly from here whereas we
> pass it around as the 'time' argument elsewhere.

Sure I'll clean this up.

>
> > +
> > +	*util = min(rq->cfs.avg.util_avg + rt, max_cap);
> > +	*max = max_cap;
> >  }
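
To put a number on the 32-bit divisor restriction discussed above, here is a small userspace sketch. It is not kernel code: `divisor_fits_u32` and `max_u32_divisor_ms` are hypothetical helper names, and the divisor is simply treated as a nanosecond count, which is an assumption about how `sched_avg_period() + delta` is expressed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): returns 1 when the divisor
 * period_ns + delta_ns still fits in 32 bits, i.e. when a 64-by-32 divide
 * in the style of div_u64() could replace the 64-by-64 div64_u64().
 */
static int divisor_fits_u32(uint64_t period_ns, uint64_t delta_ns)
{
	/* Guard against wrap-around of the 64-bit sum in this sketch. */
	assert(period_ns <= UINT64_MAX - delta_ns);
	return period_ns + delta_ns <= UINT32_MAX;
}

/* Largest divisor a u32 can hold, expressed in whole milliseconds. */
static uint64_t max_u32_divisor_ms(void)
{
	return (uint64_t)UINT32_MAX / 1000000ULL;
}
```

With nanosecond units the ceiling works out to roughly 4.29 seconds, in line with the "4.2 seconds I think" estimate in the reply. Any growth in delta eats into that budget, so a real limit on sysctl_sched_time_avg would need some headroom.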

On Wed, Aug 31, 2016 at 04:39:07PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 26, 2016 at 11:40:48AM -0700, Steve Muckle wrote:
> > A policy of going to fmax on any RT activity will be detrimental
> > for power on many platforms. Often RT accounts for only a small amount
> > of CPU activity so sending the CPU frequency to fmax is overkill. Worse
> > still, some platforms may not be able to even complete the CPU frequency
> > change before the RT activity has already completed.
> >
> > Cpufreq governors have not treated RT activity this way in the past so
> > it is not part of the expected semantics of the RT scheduling class. The
> > DL class offers guarantees about task completion and could be used for
> > this purpose.
>
> Not entirely true. People have simply disabled cpufreq because of this.
>
> Yes, RR/FIFO are a pain, but they should still be deterministic, and
> variable cpufreq destroys that.

That is the way it's been with cpufreq and many systems (including all
mobile devices) rely on that to not destroy power. RT + variable cpufreq
is not deterministic.

Given we don't have good constraints on RT tasks I don't think we should
try to strengthen the semantics there. Folks should either move to DL if
they want determinism *and* not-sucky power, or continue disabling
cpufreq if they are able to do so.

> I realize that the fmax thing is annoying, but I'm not seeing how rt_avg
> is much better.

Rt_avg is much closer to the current behavior offered by the most
commonly used cpufreq governors since it tracks actual CPU utilization.
Power is not impacted by minimal RT activity and the frequency is raised
if RT activity is high.

On Wed, 31 Aug 2016, Steve Muckle wrote:
> On Wed, Aug 31, 2016 at 04:39:07PM +0200, Peter Zijlstra wrote:
> > On Fri, Aug 26, 2016 at 11:40:48AM -0700, Steve Muckle wrote:
> > > A policy of going to fmax on any RT activity will be detrimental
> > > for power on many platforms. Often RT accounts for only a small amount
> > > of CPU activity so sending the CPU frequency to fmax is overkill. Worse
> > > still, some platforms may not be able to even complete the CPU frequency
> > > change before the RT activity has already completed.
> > >
> > > Cpufreq governors have not treated RT activity this way in the past so
> > > it is not part of the expected semantics of the RT scheduling class. The
> > > DL class offers guarantees about task completion and could be used for
> > > this purpose.
> >
> > Not entirely true. People have simply disabled cpufreq because of this.
> >
> > Yes, RR/FIFO are a pain, but they should still be deterministic, and
> > variable cpufreq destroys that.
>
> That is the way it's been with cpufreq and many systems (including all
> mobile devices) rely on that to not destroy power. RT + variable cpufreq
> is not deterministic.
>
> Given we don't have good constraints on RT tasks I don't think we should
> try to strengthen the semantics there. Folks should either move to DL if
> they want determinism *and* not-sucky power, or continue disabling
> cpufreq if they are able to do so.

RT deterministic behaviour is all about meeting the deadlines. If your
deadline is relaxed enough that you can meet it even with the lowest cpu
frequency then it's perfectly fine to enable cpufreq. The same logic
applies to C-States.

There are a lot of RT systems out there which enable both. If cpufreq or
c-states cause a deadline violation because the constraints of the system
are tight, then people will disable it and we need a knob for both.

Realtime is not as fast as possible. It's as fast as specified.

Thanks,

	tglx
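
The "as fast as specified" argument can be made concrete with a deliberately simple model. The sketch below is illustrative only: `wcet_us` and `meets_deadline_at` are hypothetical names, and the model assumes execution time scales purely with clock frequency, ignoring memory stalls, frequency-transition latency, and preemption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: execution time in microseconds for a job that needs
 * `cycles` CPU cycles at `freq_khz`, rounded up.
 * cycles / (freq_khz * 1000 Hz) seconds == cycles * 1000 / freq_khz us.
 */
static uint64_t wcet_us(uint64_t cycles, uint64_t freq_khz)
{
	assert(freq_khz != 0);
	return (cycles * 1000 + freq_khz - 1) / freq_khz;
}

/* Returns 1 when the job finishes within deadline_us at the given frequency. */
static int meets_deadline_at(uint64_t cycles, uint64_t freq_khz,
			     uint64_t deadline_us)
{
	return wcet_us(cycles, freq_khz) <= deadline_us;
}
```

Under this model a 300000-cycle job takes 1000 us at 300 MHz, so a 1000 us deadline is met even at that lowest frequency and scaling can stay enabled, while a 999 us deadline is only met at a higher operating point, which is the case where a knob is needed.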

On Wed, Aug 31, 2016 at 06:28:10PM +0200, Thomas Gleixner wrote:
> > That is the way it's been with cpufreq and many systems (including all
> > mobile devices) rely on that to not destroy power. RT + variable cpufreq
> > is not deterministic.
> >
> > Given we don't have good constraints on RT tasks I don't think we should
> > try to strengthen the semantics there. Folks should either move to DL if
> > they want determinism *and* not-sucky power, or continue disabling
> > cpufreq if they are able to do so.
>
> RT deterministic behaviour is all about meeting the deadlines. If your
> deadline is relaxed enough that you can meet it even with the lowest cpu
> frequency then it's perfectly fine to enable cpufreq. The same logic applies
> to C-States.
>
> There are a lot of RT systems out there which enable both. If cpufreq or
> c-states cause a deadline violation because the constraints of the system are
> tight, then people will disable it and we need a knob for both.
>
> Realtime is not as fast as possible. It's as fast as specified.

Sure, problem is of course that RR/FIFO doesn't specify anything so the
users are left to prod knobs.

Another problem is that we have many semi related knobs; we have the
global RT runtime limit knob, but that doesn't affect cpufreq (maybe it
should) and cpufreq has knobs to set f_min and f_max, which again are
unaware of RT anything.

So before we go do anything, I'd like input on what is needed and how
things should tie together to make most sense.

On 31/08/16 18:40, Peter Zijlstra wrote:
> On Wed, Aug 31, 2016 at 06:28:10PM +0200, Thomas Gleixner wrote:
> > > That is the way it's been with cpufreq and many systems (including all
> > > mobile devices) rely on that to not destroy power. RT + variable cpufreq
> > > is not deterministic.
> > >
> > > Given we don't have good constraints on RT tasks I don't think we should
> > > try to strengthen the semantics there. Folks should either move to DL if
> > > they want determinism *and* not-sucky power, or continue disabling
> > > cpufreq if they are able to do so.
> >
> > RT deterministic behaviour is all about meeting the deadlines. If your
> > deadline is relaxed enough that you can meet it even with the lowest cpu
> > frequency then it's perfectly fine to enable cpufreq. The same logic applies
> > to C-States.
> >
> > There are a lot of RT systems out there which enable both. If cpufreq or
> > c-states cause a deadline violation because the constraints of the system are
> > tight, then people will disable it and we need a knob for both.
> >
> > Realtime is not as fast as possible. It's as fast as specified.
>
> Sure, problem is of course that RR/FIFO doesn't specify anything so the
> users are left to prod knobs.
>
> Another problem is that we have many semi related knobs; we have the
> global RT runtime limit knob, but that doesn't affect cpufreq (maybe it
> should)

Maybe we could create this sort of link when using the cgroup RT
throttling interface as well? It should still then fit well once we
replace the underlying mechanism with DL reservations. And, AFAIK, the
interface is used by Android folks already.

> and cpufreq has knobs to set f_min and f_max, which again are
> unaware of RT anything.
>
> So before we go do anything, I'd like input on what is needed and how
> things should tie together to make most sense.
>

On Wednesday, August 31, 2016 06:40:09 PM Peter Zijlstra wrote:
> On Wed, Aug 31, 2016 at 06:28:10PM +0200, Thomas Gleixner wrote:
> > > That is the way it's been with cpufreq and many systems (including all
> > > mobile devices) rely on that to not destroy power. RT + variable cpufreq
> > > is not deterministic.
> > >
> > > Given we don't have good constraints on RT tasks I don't think we should
> > > try to strengthen the semantics there. Folks should either move to DL if
> > > they want determinism *and* not-sucky power, or continue disabling
> > > cpufreq if they are able to do so.
> >
> > RT deterministic behaviour is all about meeting the deadlines. If your
> > deadline is relaxed enough that you can meet it even with the lowest cpu
> > frequency then it's perfectly fine to enable cpufreq. The same logic applies
> > to C-States.
> >
> > There are a lot of RT systems out there which enable both. If cpufreq or
> > c-states cause a deadline violation because the constraints of the system are
> > tight, then people will disable it and we need a knob for both.
> >
> > Realtime is not as fast as possible. It's as fast as specified.
>
> Sure, problem is of course that RR/FIFO doesn't specify anything so the
> users are left to prod knobs.
>
> Another problem is that we have many semi related knobs; we have the
> global RT runtime limit knob, but that doesn't affect cpufreq (maybe it
> should) and cpufreq has knobs to set f_min and f_max, which again are
> unaware of RT anything.
>
> So before we go do anything, I'd like input on what is needed and how
> things should tie together to make most sense.

I totally agree.

We need to know where we want to get to before deciding on which way to go.

Thanks,
Rafael

On Wed, Aug 31, 2016 at 06:00:02PM +0100, Juri Lelli wrote:
> On 31/08/16 18:40, Peter Zijlstra wrote:
> > Another problem is that we have many semi related knobs; we have the
> > global RT runtime limit knob, but that doesn't affect cpufreq (maybe it
> > should)
>
> Maybe we could create this sort of link when using the cgroup RT
> throttling interface as well? It should still then fit well once we
> replace the underlying mechanism with DL reservations. And, AFAIK, the
> interface is used by Android folks already.

Tricky, but possible I suppose. Since minimal cpufreq is 'global', the
cgroup reservation only matters if there are no tasks in any of its
parent groups.

Computing the effective rt min then again becomes somewhat tricky, since
we'd have to iterate the cgroup tree.

On Wed, Aug 31, 2016 at 06:00:02PM +0100, Juri Lelli wrote:
> > Another problem is that we have many semi related knobs; we have the
> > global RT runtime limit knob, but that doesn't affect cpufreq (maybe it
> > should)
>
> Maybe we could create this sort of link when using the cgroup RT
> throttling interface as well? It should still then fit well once we
> replace the underlying mechanism with DL reservations. And, AFAIK, the
> interface is used by Android folks already.

I'm not sure how the upper bounds can be used to infer CPU frequency...
On my Nexus 6p (an Android device), the global RT runtime limit
seems to be set at 950ms/1sec, the root cgroup is set to 800ms/1sec, and
bg_non_interactive is set at 700ms/1sec.

On Wed, 31 Aug 2016, Peter Zijlstra wrote:
> On Wed, Aug 31, 2016 at 06:28:10PM +0200, Thomas Gleixner wrote:
> > > That is the way it's been with cpufreq and many systems (including all
> > > mobile devices) rely on that to not destroy power. RT + variable cpufreq
> > > is not deterministic.
> > >
> > > Given we don't have good constraints on RT tasks I don't think we should
> > > try to strengthen the semantics there. Folks should either move to DL if
> > > they want determinism *and* not-sucky power, or continue disabling
> > > cpufreq if they are able to do so.
> >
> > RT deterministic behaviour is all about meeting the deadlines. If your
> > deadline is relaxed enough that you can meet it even with the lowest cpu
> > frequency then it's perfectly fine to enable cpufreq. The same logic applies
> > to C-States.
> >
> > There are a lot of RT systems out there which enable both. If cpufreq or
> > c-states cause a deadline violation because the constraints of the system are
> > tight, then people will disable it and we need a knob for both.
> >
> > Realtime is not as fast as possible. It's as fast as specified.
>
> Sure, problem is of course that RR/FIFO doesn't specify anything so the
> users are left to prod knobs.

I know :(

> Another problem is that we have many semi related knobs; we have the
> global RT runtime limit knob, but that doesn't affect cpufreq (maybe it
> should) and cpufreq has knobs to set f_min and f_max, which again are
> unaware of RT anything.
>
> So before we go do anything, I'd like input on what is needed and how
> things should tie together to make most sense.

RT systems and especially RR/FIFO driven ones need a lot of specific
tuning and configuration. I doubt that we can do anything except lousy
heuristics which will end up being wrong for most use cases.

In the DL case we certainly can do informed decisions, but for the
RR/FIFO case the global RT runtime limit is just a too big hammer which
shouldn't be abused for calculating cpufreq limits.

I think that we should concentrate on DL and make it work very well and
just leave the rest of the RT folks with rather simplistic knobs (i.e.
on/off/hard limits). That will force people who have RT _and_ power
constraints to think harder about their system design and eventually
make them move over to DL.

Thanks,

	tglx

On 01/09/16 14:48, Steve Muckle wrote:
> On Wed, Aug 31, 2016 at 06:00:02PM +0100, Juri Lelli wrote:
> > > Another problem is that we have many semi related knobs; we have the
> > > global RT runtime limit knob, but that doesn't affect cpufreq (maybe it
> > > should)
> >
> > Maybe we could create this sort of link when using the cgroup RT
> > throttling interface as well? It should still then fit well once we
> > replace the underlying mechanism with DL reservations. And, AFAIK, the
> > interface is used by Android folks already.
>
> I'm not sure how the upper bounds can be used to infer CPU frequency...
> On my Nexus 6p (an Android device), the global RT runtime limit
> seems to be set at 950ms/1sec, the root cgroup is set to 800ms/1sec, and
> bg_non_interactive is set at 700ms/1sec.
>

Right, unfortunately. Still too coarse grained (as Thomas is also saying
in his last reply, if I read it correctly). Doesn't pay off the added
complexity I'm afraid.

On Fri, 2 Sep 2016, Juri Lelli wrote:
> On 01/09/16 14:48, Steve Muckle wrote:
> > On Wed, Aug 31, 2016 at 06:00:02PM +0100, Juri Lelli wrote:
> > > > Another problem is that we have many semi related knobs; we have the
> > > > global RT runtime limit knob, but that doesn't affect cpufreq (maybe it
> > > > should)
> > >
> > > Maybe we could create this sort of link when using the cgroup RT
> > > throttling interface as well? It should still then fit well once we
> > > replace the underlying mechanism with DL reservations. And, AFAIK, the
> > > interface is used by Android folks already.
> >
> > I'm not sure how the upper bounds can be used to infer CPU frequency...
> > On my Nexus 6p (an Android device), the global RT runtime limit
> > seems to be set at 950ms/1sec, the root cgroup is set to 800ms/1sec, and
> > bg_non_interactive is set at 700ms/1sec.
> >
>
> Right, unfortunately. Still too coarse grained (as Thomas is also saying
> in his last reply, if I read it correctly).

Yes, you do. It's a big hammer and really unsuitable for this kind of
mechanism.

> Doesn't pay off the added complexity I'm afraid.

Certainly not. And the only choice we have is heuristics. Heuristic is an
euphemism for saying that it cannot work.

Thanks,

	tglx

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index cb8a77b1ef1b..89094a466250 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -146,13 +146,21 @@ static unsigned int get_next_freq(struct sugov_cpu *sg_cpu, unsigned long util,
 
 static void sugov_get_util(unsigned long *util, unsigned long *max)
 {
-	struct rq *rq = this_rq();
-	unsigned long cfs_max;
+	int cpu = smp_processor_id();
+	struct rq *rq = cpu_rq(cpu);
+	unsigned long max_cap, rt;
+	s64 delta;
 
-	cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());
+	max_cap = arch_scale_cpu_capacity(NULL, cpu);
 
-	*util = min(rq->cfs.avg.util_avg, cfs_max);
-	*max = cfs_max;
+	delta = rq_clock(rq) - rq->age_stamp;
+	if (unlikely(delta < 0))
+		delta = 0;
+	rt = div64_u64(rq->rt_avg, sched_avg_period() + delta);
+	rt = (rt * max_cap) >> SCHED_CAPACITY_SHIFT;
+
+	*util = min(rq->cfs.avg.util_avg + rt, max_cap);
+	*max = max_cap;
 }
 
 static void sugov_update_single(struct update_util_data *hook, u64 time,
@@ -167,7 +175,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	if (!sugov_should_update_freq(sg_policy, time))
 		return;
 
-	if (flags & SCHED_CPUFREQ_RT_DL) {
+	if (flags & SCHED_CPUFREQ_DL) {
 		next_f = policy->cpuinfo.max_freq;
 	} else {
 		sugov_get_util(&util, &max);
@@ -186,7 +194,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
 	u64 last_freq_update_time = sg_policy->last_freq_update_time;
 	unsigned int j;
 
-	if (flags & SCHED_CPUFREQ_RT_DL)
+	if (flags & SCHED_CPUFREQ_DL)
 		return max_f;
 
 	for_each_cpu(j, policy->cpus) {
@@ -209,7 +217,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
 		if (delta_ns > TICK_NSEC)
 			continue;
 
-		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
+		if (j_sg_cpu->flags & SCHED_CPUFREQ_DL)
 			return max_f;
 
 		j_util = j_sg_cpu->util;
@@ -467,7 +475,7 @@ static int sugov_start(struct cpufreq_policy *policy)
 		if (policy_is_shared(policy)) {
 			sg_cpu->util = 0;
 			sg_cpu->max = 0;
-			sg_cpu->flags = SCHED_CPUFREQ_RT;
+			sg_cpu->flags = SCHED_CPUFREQ_DL;
 			sg_cpu->last_update = 0;
 			sg_cpu->cached_raw_freq = 0;
 			cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util,

A policy of going to fmax on any RT activity will be detrimental
for power on many platforms. Often RT accounts for only a small amount
of CPU activity so sending the CPU frequency to fmax is overkill. Worse
still, some platforms may not be able to even complete the CPU frequency
change before the RT activity has already completed.

Cpufreq governors have not treated RT activity this way in the past so
it is not part of the expected semantics of the RT scheduling class. The
DL class offers guarantees about task completion and could be used for
this purpose.

Modify the schedutil algorithm to instead use rt_avg as an estimate of
RT utilization of the CPU.

Based on previous work by Vincent Guittot <vincent.guittot@linaro.org>.

Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)
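
For readers who want to trace the arithmetic of the patched sugov_get_util() outside the kernel, the following userspace sketch mirrors its steps with plain integers. It is an illustration under assumptions, not the kernel implementation: `sketch_util` and `CAPACITY_SHIFT` are hypothetical names, the shift is assumed to be 10 (capacity scale 1024), and rt_avg is assumed to accumulate RT runtime in nanoseconds scaled by that same 1024 factor.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for SCHED_CAPACITY_SHIFT; the value 10 is an assumption here. */
#define CAPACITY_SHIFT 10

/*
 * Userspace sketch of the patched utilization estimate. Every argument is
 * a plain number standing in for a runqueue field or helper result:
 *   cfs_util      ~ rq->cfs.avg.util_avg
 *   rt_avg        ~ rq->rt_avg
 *   avg_period_ns ~ sched_avg_period()
 *   delta_ns      ~ rq_clock(rq) - rq->age_stamp
 *   max_cap       ~ arch_scale_cpu_capacity(NULL, cpu)
 */
static uint64_t sketch_util(uint64_t cfs_util, uint64_t rt_avg,
			    uint64_t avg_period_ns, int64_t delta_ns,
			    uint64_t max_cap)
{
	uint64_t divisor, rt, util;

	/* A negative delta is clamped to zero, as in the patch. */
	if (delta_ns < 0)
		delta_ns = 0;

	divisor = avg_period_ns + (uint64_t)delta_ns;
	assert(divisor != 0);

	/* Average RT load over the window, then scale to this CPU's capacity. */
	rt = rt_avg / divisor;
	rt = (rt * max_cap) >> CAPACITY_SHIFT;

	/* CFS and RT contributions are summed and capped at the capacity. */
	util = cfs_util + rt;
	return util < max_cap ? util : max_cap;
}
```

As a worked case under those assumptions, 50 ms of RT runtime in a 500 ms window gives rt_avg = 50000000 * 1024, which divides down to 102 on a full-capacity CPU; added to a CFS utilization of 300 the result is 402 out of 1024, well short of the fmax request the old RT flag would have triggered.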