| Message ID | 20171222100626.7g5yklspjofcp2we@hirez.programming.kicks-ass.net (mailing list archive) |
|---|---|
| State | Not Applicable, archived |
| Headers | show |
On 22-Dec 11:06, Peter Zijlstra wrote:
> On Wed, Dec 20, 2017 at 06:38:14PM +0100, Juri Lelli wrote:
> > On 20/12/17 16:30, Peter Zijlstra wrote:
> >
> > [...]
> >
> > > @@ -327,12 +331,7 @@ static unsigned int sugov_next_freq_shar
> > >  		if (delta_ns > TICK_NSEC) {
> > >  			j_sg_cpu->iowait_boost = 0;
> > >  			j_sg_cpu->iowait_boost_pending = false;
> > > -			j_sg_cpu->util_cfs = 0;
> > > -			if (j_sg_cpu->util_dl == 0)
> > > -				continue;
> > >  		}
> >
> > This goes away because with Brendan/Vincent fix we don't need the
> > workaround for stale CFS util contribution for idle CPUs anymore?
>
> An easy fix would be something like the below I suppose (also folded a
> change from Viresh).
>
> This way it completely ignores the demand from idle CPUs. Which I
> suppose is exactly what you want, no?
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index ab84d2261554..9736b537386a 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -315,8 +315,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>  		unsigned long j_util, j_max;
>  		s64 delta_ns;
>
> -		if (j_sg_cpu != sg_cpu)
> -			sugov_get_util(j_sg_cpu);
> +		if (idle_cpu(j))
> +			continue;

That should work to skip IDLE CPUs... however I'm missing where now we
get the sugov_get_util(j_sg_cpu) for active CPUs. It has been moved
somewhere else I guess...

Moreover, that way don't we completely disregard CFS blocked load for
IDLE CPUs... as well as DL reserved utilization, which should be
released only at the 0-lag time?

>
>  		/*
>  		 * If the CFS CPU utilization was last updated before the
> @@ -354,7 +354,6 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>
>  	raw_spin_lock(&sg_policy->update_lock);
>
> -	sugov_get_util(sg_cpu);
>  	sugov_set_iowait_boost(sg_cpu, time, flags);
>  	sg_cpu->last_update = time;
>
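For readers following along, the shape of the per-CPU aggregation loop being discussed can be modeled outside the kernel roughly as below — a minimal standalone sketch assuming the patch's behaviour of skipping idle CPUs entirely; `struct cpu_util` and `shared_util_ratio()` are invented for illustration and are not kernel APIs:

```c
#include <assert.h>

/* Hypothetical per-CPU snapshot -- names loosely modeled on sugov_cpu,
 * not the real kernel struct. */
struct cpu_util {
	int idle;           /* would be idle_cpu(j) in the kernel */
	unsigned long util; /* CFS + DL contribution */
	unsigned long max;  /* CPU capacity */
};

/* Pick the highest relative utilization among non-idle CPUs sharing
 * a frequency domain, scaled to a common 0..1024 range. */
static unsigned long shared_util_ratio(const struct cpu_util *cpus, int n)
{
	unsigned long best = 0;

	for (int i = 0; i < n; i++) {
		if (cpus[i].idle)
			continue; /* the proposed patch: ignore idle CPUs */
		unsigned long r = cpus[i].util * 1024 / cpus[i].max;
		if (r > best)
			best = r;
	}
	return best;
}
```

This makes Patrick's concern concrete: an idle CPU contributes nothing at all, even if it still carries blocked CFS load or a DL reservation that has not yet reached its 0-lag time.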
On Fri, Dec 22, 2017 at 11:02:06AM +0000, Patrick Bellasi wrote:
> > @@ -315,8 +315,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> >  		unsigned long j_util, j_max;
> >  		s64 delta_ns;
> >
> > -		if (j_sg_cpu != sg_cpu)
> > -			sugov_get_util(j_sg_cpu);
> > +		if (idle_cpu(j))
> > +			continue;
>
> That should work to skip IDLE CPUs... however I'm missing where now we
> get the sugov_get_util(j_sg_cpu) for active CPUs. It has been moved
> somewhere else I guess...

No, I'm just an idiot... lemme fix that.

> Moreover, that way don't we completely disregard CFS blocked load for
> IDLE CPUs... as well as DL reserved utilization, which should be
> released only at the 0-lag time?

I was thinking that since dl is a 'global' scheduler the reservation
would be too and thus the freq just needs a single CPU to be observed;
but I suppose there's nothing stopping anybody from splitting a clock
domain down the middle scheduling wise. So yes, good point.

Blergh that'd make a mess of things again.
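On the 0-lag time mentioned above: for a DL entity it is the instant at which the task's remaining runtime, consumed at its reserved bandwidth (dl_runtime/dl_period), would bring its lag to zero, and only then may its bandwidth be released. A hedged sketch of that formula, with plain integer nanoseconds standing in for the dl_se fields (this models the idea, not the kernel's implementation):

```c
#include <assert.h>

/* 0-lag time of a DL entity: the absolute deadline minus the time the
 * remaining runtime represents at the reserved bandwidth
 * dl_runtime/dl_period. All values in nanoseconds. */
static long long zero_lag_time(long long deadline, long long runtime,
			       long long dl_runtime, long long dl_period)
{
	return deadline - (runtime * dl_period) / dl_runtime;
}
```

E.g. a 10ms/100ms reservation with 5 ms of runtime left and a deadline at t=100 ms reaches zero lag at t=50 ms; releasing its utilization before that point would under-estimate the demand the task may still legitimately present.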
Hi Peter,

On 22/12/17 12:46, Peter Zijlstra wrote:
> On Fri, Dec 22, 2017 at 11:02:06AM +0000, Patrick Bellasi wrote:
> > > @@ -315,8 +315,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> > >  		unsigned long j_util, j_max;
> > >  		s64 delta_ns;
> > >
> > > -		if (j_sg_cpu != sg_cpu)
> > > -			sugov_get_util(j_sg_cpu);
> > > +		if (idle_cpu(j))
> > > +			continue;
> >
> > That should work to skip IDLE CPUs... however I'm missing where now we
> > get the sugov_get_util(j_sg_cpu) for active CPUs. It has been moved
> > somewhere else I guess...
>
> No, I'm just an idiot... lemme fix that.
>
> > Moreover, that way don't we completely disregard CFS blocked load for
> > IDLE CPUs... as well as DL reserved utilization, which should be
> > released only at the 0-lag time?
>
> I was thinking that since dl is a 'global' scheduler the reservation
> would be too and thus the freq just needs a single CPU to be observed;
> but I suppose there's nothing stopping anybody from splitting a clock
> domain down the middle scheduling wise. So yes, good point.

Also, for CFS the current behaviour is to start ignoring contributions
after TICK_NS. It seems that your change might introduce regressions?

Thanks,

- Juri
On 22-Dec 12:46, Peter Zijlstra wrote:
> On Fri, Dec 22, 2017 at 11:02:06AM +0000, Patrick Bellasi wrote:
> > > @@ -315,8 +315,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> > >  		unsigned long j_util, j_max;
> > >  		s64 delta_ns;
> > >
> > > -		if (j_sg_cpu != sg_cpu)
> > > -			sugov_get_util(j_sg_cpu);
> > > +		if (idle_cpu(j))
> > > +			continue;
> >
> > That should work to skip IDLE CPUs... however I'm missing where now we
> > get the sugov_get_util(j_sg_cpu) for active CPUs. It has been moved
> > somewhere else I guess...
>
> No, I'm just an idiot... lemme fix that.

Then you just missed a call to sugov_get_util(j_sg_cpu) after the
above if... right, actually that was Viresh's proposal...

> > Moreover, that way don't we completely disregard CFS blocked load for
> > IDLE CPUs... as well as DL reserved utilization, which should be
> > released only at the 0-lag time?
>
> I was thinking that since dl is a 'global' scheduler the reservation
> would be too and thus the freq just needs a single CPU to be observed;

AFAIU global is only the admission control (which is something worth a
thread by itself...) while the dl_se->dl_bw are aggregated into the
dl_rq->running_bw, which ultimately represents the DL bandwidth
required for just a CPU.

> but I suppose there's nothing stopping anybody from splitting a clock
> domain down the middle scheduling wise. So yes, good point.

That makes sense... moreover, using the global utilization, we would
end up asking for capacities which cannot be provided by a single CPU.

> Blergh that'd make a mess of things again.

Actually, looking better at your patch: are we not just ok with that?

I mean, we don't need this check on idle_cpu since in
sugov_aggregate_util we already skip the util=sg_cpu->max in case of
!rq->rt.rt_nr_running, while we aggregate just CFS and DL requests.
On 22-Dec 13:07, Juri Lelli wrote:
> Hi Peter,
>
> On 22/12/17 12:46, Peter Zijlstra wrote:
> > On Fri, Dec 22, 2017 at 11:02:06AM +0000, Patrick Bellasi wrote:
> > > > @@ -315,8 +315,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> > > >  		unsigned long j_util, j_max;
> > > >  		s64 delta_ns;
> > > >
> > > > -		if (j_sg_cpu != sg_cpu)
> > > > -			sugov_get_util(j_sg_cpu);
> > > > +		if (idle_cpu(j))
> > > > +			continue;
> > >
> > > That should work to skip IDLE CPUs... however I'm missing where now we
> > > get the sugov_get_util(j_sg_cpu) for active CPUs. It has been moved
> > > somewhere else I guess...
> >
> > No, I'm just an idiot... lemme fix that.
> >
> > > Moreover, that way don't we completely disregard CFS blocked load for
> > > IDLE CPUs... as well as DL reserved utilization, which should be
> > > released only at the 0-lag time?
> >
> > I was thinking that since dl is a 'global' scheduler the reservation
> > would be too and thus the freq just needs a single CPU to be observed;
> > but I suppose there's nothing stopping anybody from splitting a clock
> > domain down the middle scheduling wise. So yes, good point.
>
> Also, for CFS the current behaviour is to start ignoring contributions
> after TICK_NS. It seems that your change might introduce regressions?

Good point, an energy regression I guess you mean...

I think that check is already gone for CFS in the current PeterZ tree.
It seems we use TICK_NS just for the reset of iowait_boost, isn't it?

However, if the remote updates of CFS work as expected, wasn't the
removal of the TICK_NS check for CFS intentional?
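The TICK_NS behaviour Juri refers to can be sketched as a simple staleness rule: a CPU's cached CFS utilization is trusted only if it was updated within the last tick. A toy model (the 4 ms TICK_NSEC value, matching HZ=250, and the function name are illustrative assumptions, not the kernel code):

```c
#include <assert.h>

#define TICK_NSEC 4000000ULL	/* 4 ms tick, as with HZ=250; illustrative */

/* Return the CFS contribution of a CPU, discarding it entirely when the
 * cached value is older than one tick (the CPU is treated as idle). */
static unsigned long cfs_contrib(unsigned long util_cfs,
				 unsigned long long now_ns,
				 unsigned long long last_update_ns)
{
	if (now_ns - last_update_ns > TICK_NSEC)
		return 0;	/* stale: ignore this CPU's demand */
	return util_cfs;
}
```

Removing this aging while also skipping idle CPUs changes when blocked load stops influencing the shared frequency, which is the regression Juri is worried about.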
On Fri, Dec 22, 2017 at 12:07:37PM +0000, Patrick Bellasi wrote:
> > I was thinking that since dl is a 'global' scheduler the reservation
> > would be too and thus the freq just needs a single CPU to be observed;
>
> AFAIU global is only the admission control (which is something worth a
> thread by itself...) while the dl_se->dl_bw are aggregated into the
> dl_rq->running_bw, which ultimately represents the DL bandwidth
> required for just a CPU.

Oh urgh yes, forgot that.. then the dl freq stuff isn't strictly correct
I think. But yes, that's another thread.

> > but I suppose there's nothing stopping anybody from splitting a clock
> > domain down the middle scheduling wise. So yes, good point.
>
> That makes sense... moreover, using the global utilization, we would
> end up asking for capacities which cannot be provided by a single CPU.

Yes, but that _should_ not be a problem if you clock them all high
enough. But this gets to be complicated real fast I think.

> > Blergh that'd make a mess of things again.
>
> Actually, looking better at your patch: are we not just ok with that?
>
> I mean, we don't need this check on idle_cpu since in
> sugov_aggregate_util we already skip the util=sg_cpu->max in case of
> !rq->rt.rt_nr_running, while we aggregate just CFS and DL requests.

Right, well, I don't actually have an environment to test this sanely,
so someone will have to go play with the various variations and see what
works.
On Fri, Dec 22, 2017 at 12:14:45PM +0000, Patrick Bellasi wrote:
> I think that check is already gone for CFS in the current PeterZ tree.
> It seems we use TICK_NS just for the reset of iowait_boost, isn't it?

Easy enough to bring back though.

> However, if the remote updates of CFS work as expected, wasn't the
> removal of the TICK_NS check for CFS intentional?

So I killed that because I now do get_util for all CPUs and figured
get_util provides up-to-date numbers. So we don't need no artificial
aging.

Esp. once we get that whole blocked load stuff sorted.
On 22/12/17 13:19, Peter Zijlstra wrote:
> On Fri, Dec 22, 2017 at 12:07:37PM +0000, Patrick Bellasi wrote:
> > > I was thinking that since dl is a 'global' scheduler the reservation
> > > would be too and thus the freq just needs a single CPU to be observed;
> >
> > AFAIU global is only the admission control (which is something worth a
> > thread by itself...) while the dl_se->dl_bw are aggregated into the
> > dl_rq->running_bw, which ultimately represents the DL bandwidth
> > required for just a CPU.
>
> Oh urgh yes, forgot that.. then the dl freq stuff isn't strictly correct
> I think. But yes, that's another thread.
>
> > > but I suppose there's nothing stopping anybody from splitting a clock
> > > domain down the middle scheduling wise. So yes, good point.
> >
> > That makes sense... moreover, using the global utilization, we would
> > end up asking for capacities which cannot be provided by a single CPU.
>
> Yes, but that _should_ not be a problem if you clock them all high
> enough. But this gets to be complicated real fast I think.
>
> > > Blergh that'd make a mess of things again.
> >
> > Actually, looking better at your patch: are we not just ok with that?
> >
> > I mean, we don't need this check on idle_cpu since in
> > sugov_aggregate_util we already skip the util=sg_cpu->max in case of
> > !rq->rt.rt_nr_running, while we aggregate just CFS and DL requests.
>
> Right, well, I don't actually have an environment to test this sanely,
> so someone will have to go play with the various variations and see what
> works.

Adding Claudio and Luca to the thread (as I don't have a testing
platform myself ATM). ;)
On 22-Dec 13:19, Peter Zijlstra wrote:
> On Fri, Dec 22, 2017 at 12:07:37PM +0000, Patrick Bellasi wrote:
> > > I was thinking that since dl is a 'global' scheduler the reservation
> > > would be too and thus the freq just needs a single CPU to be observed;
> >
> > AFAIU global is only the admission control (which is something worth a
> > thread by itself...) while the dl_se->dl_bw are aggregated into the
> > dl_rq->running_bw, which ultimately represents the DL bandwidth
> > required for just a CPU.
>
> Oh urgh yes, forgot that.. then the dl freq stuff isn't strictly correct
> I think. But yes, that's another thread.

Mmm... maybe I don't get your point... I was referring to the global
admission control of DL. If you have for example 3 60% DL tasks on a
2CPU system, AFAIU the CBS will allow the tasks in the system (since
the overall utilization is 180 < 200 * 0.95) although that workload is
not necessarily schedulable (for example if the tasks wake up at the
same time one of them will miss its deadline).

But, yeah... maybe I'm completely wrong or, in any case, it's for a
different thread...

> > > but I suppose there's nothing stopping anybody from splitting a clock
> > > domain down the middle scheduling wise. So yes, good point.
> >
> > That makes sense... moreover, using the global utilization, we would
> > end up asking for capacities which cannot be provided by a single CPU.
>
> Yes, but that _should_ not be a problem if you clock them all high
> enough. But this gets to be complicated real fast I think.

IMO the current solution with Juri's patches is working as expected:
we know how many DL tasks are runnable on a CPU and we properly
account for their utilization.

The only "issue/limitation" is (eventually) the case described above.
Dunno if we can enqueue 2 60% DL tasks on the same CPU... in that case
we will ask for 120% utilization?

> > > Blergh that'd make a mess of things again.
> >
> > Actually, looking better at your patch: are we not just ok with that?
> >
> > I mean, we don't need this check on idle_cpu since in
> > sugov_aggregate_util we already skip the util=sg_cpu->max in case of
> > !rq->rt.rt_nr_running, while we aggregate just CFS and DL requests.
>
> Right, well, I don't actually have an environment to test this sanely,
> so someone will have to go play with the various variations and see what
> works.

Definitely, we have some synthetics for mainline... as well as we can
easily backport this series to v4.9 and test for power/perf using a
full Android stack.

But, given today is the 22nd, I guess we can do that after the holidays
(in ~2 weeks).
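Patrick's argument — that the explicit idle_cpu() check may be redundant because the aggregation step already handles each class — relies on logic of roughly this shape. A simplified model for illustration, not the actual sugov_aggregate_util() implementation (field names mirror the discussion, not the exact kernel struct):

```c
#include <assert.h>

/* Per-CPU demand as seen by the governor in this sketch. */
struct cpu_demand {
	unsigned long max;		/* CPU capacity */
	unsigned int  rt_nr_running;	/* runnable RT tasks */
	unsigned long util_cfs;		/* CFS utilization */
	unsigned long util_dl;		/* DL running bandwidth */
};

/* RT demands full capacity; otherwise CFS and DL requests are summed
 * and clamped to the CPU's capacity. */
static unsigned long aggregate_util(const struct cpu_demand *d)
{
	if (d->rt_nr_running)
		return d->max;	/* RT present: request max frequency */

	unsigned long util = d->util_cfs + d->util_dl;
	return util < d->max ? util : d->max;
}
```

Under this model an idle CPU simply contributes util_cfs + util_dl (its blocked/reserved demand), so no separate idle check is needed — which is exactly the behaviour Patrick suggests is already sufficient.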
On 22/12/17 12:38, Patrick Bellasi wrote:
> On 22-Dec 13:19, Peter Zijlstra wrote:
> > On Fri, Dec 22, 2017 at 12:07:37PM +0000, Patrick Bellasi wrote:
> > > > I was thinking that since dl is a 'global' scheduler the reservation
> > > > would be too and thus the freq just needs a single CPU to be observed;
> > >
> > > AFAIU global is only the admission control (which is something worth a
> > > thread by itself...) while the dl_se->dl_bw are aggregated into the
> > > dl_rq->running_bw, which ultimately represents the DL bandwidth
> > > required for just a CPU.
> >
> > Oh urgh yes, forgot that.. then the dl freq stuff isn't strictly correct
> > I think. But yes, that's another thread.
>
> Mmm... maybe I don't get your point... I was referring to the global
> admission control of DL. If you have for example 3 60% DL tasks on a
> 2CPU system, AFAIU the CBS will allow the tasks in the system (since
> the overall utilization is 180 < 200 * 0.95) although that workload is
> not necessarily schedulable (for example if the tasks wake up at the
> same time one of them will miss its deadline).
>
> But, yeah... maybe I'm completely wrong or, in any case, it's for a
> different thread...
>
> > > > but I suppose there's nothing stopping anybody from splitting a clock
> > > > domain down the middle scheduling wise. So yes, good point.
> > >
> > > That makes sense... moreover, using the global utilization, we would
> > > end up asking for capacities which cannot be provided by a single CPU.
> >
> > Yes, but that _should_ not be a problem if you clock them all high
> > enough. But this gets to be complicated real fast I think.
>
> IMO the current solution with Juri's patches is working as expected:
> we know how many DL tasks are runnable on a CPU and we properly
> account for their utilization.
>
> The only "issue/limitation" is (eventually) the case described above.
> Dunno if we can enqueue 2 60% DL tasks on the same CPU... in that case
> we will ask for 120% utilization?

In general it depends on the other parameters, deadline and period.
On 22-Dec 13:43, Juri Lelli wrote:
> On 22/12/17 12:38, Patrick Bellasi wrote:
> > On 22-Dec 13:19, Peter Zijlstra wrote:
> > > On Fri, Dec 22, 2017 at 12:07:37PM +0000, Patrick Bellasi wrote:
> > > > > I was thinking that since dl is a 'global' scheduler the reservation
> > > > > would be too and thus the freq just needs a single CPU to be observed;
> > > >
> > > > AFAIU global is only the admission control (which is something worth a
> > > > thread by itself...) while the dl_se->dl_bw are aggregated into the
> > > > dl_rq->running_bw, which ultimately represents the DL bandwidth
> > > > required for just a CPU.
> > >
> > > Oh urgh yes, forgot that.. then the dl freq stuff isn't strictly correct
> > > I think. But yes, that's another thread.
> >
> > Mmm... maybe I don't get your point... I was referring to the global
> > admission control of DL. If you have for example 3 60% DL tasks on a
> > 2CPU system, AFAIU the CBS will allow the tasks in the system (since
> > the overall utilization is 180 < 200 * 0.95) although that workload is
> > not necessarily schedulable (for example if the tasks wake up at the
> > same time one of them will miss its deadline).
> >
> > But, yeah... maybe I'm completely wrong or, in any case, it's for a
> > different thread...
> >
> > > > > but I suppose there's nothing stopping anybody from splitting a clock
> > > > > domain down the middle scheduling wise. So yes, good point.
> > > >
> > > > That makes sense... moreover, using the global utilization, we would
> > > > end up asking for capacities which cannot be provided by a single CPU.
> > >
> > > Yes, but that _should_ not be a problem if you clock them all high
> > > enough. But this gets to be complicated real fast I think.
> >
> > IMO the current solution with Juri's patches is working as expected:
> > we know how many DL tasks are runnable on a CPU and we properly
> > account for their utilization.
> >
> > The only "issue/limitation" is (eventually) the case described above.
> > Dunno if we can enqueue 2 60% DL tasks on the same CPU... in that case
> > we will ask for 120% utilization?
>
> In general it depends on the other parameters, deadline and period.

Right, but what about the case deadline==period, with 60% utilization?
AFAIU, 3 DL tasks with the same parameters as above will be accepted on
a 2 CPU system, isn't it?

And thus, in that case, we can end up with a 120% utilization request
from DL for a single CPU... but, considering it's lunch o'clock, I'm
likely missing something...
On 22/12/17 12:50, Patrick Bellasi wrote:
> On 22-Dec 13:43, Juri Lelli wrote:
> > On 22/12/17 12:38, Patrick Bellasi wrote:
> > > On 22-Dec 13:19, Peter Zijlstra wrote:
> > > > On Fri, Dec 22, 2017 at 12:07:37PM +0000, Patrick Bellasi wrote:
> > > > > > I was thinking that since dl is a 'global' scheduler the reservation
> > > > > > would be too and thus the freq just needs a single CPU to be observed;
> > > > >
> > > > > AFAIU global is only the admission control (which is something worth a
> > > > > thread by itself...) while the dl_se->dl_bw are aggregated into the
> > > > > dl_rq->running_bw, which ultimately represents the DL bandwidth
> > > > > required for just a CPU.
> > > >
> > > > Oh urgh yes, forgot that.. then the dl freq stuff isn't strictly correct
> > > > I think. But yes, that's another thread.
> > >
> > > Mmm... maybe I don't get your point... I was referring to the global
> > > admission control of DL. If you have for example 3 60% DL tasks on a
> > > 2CPU system, AFAIU the CBS will allow the tasks in the system (since
> > > the overall utilization is 180 < 200 * 0.95) although that workload is
> > > not necessarily schedulable (for example if the tasks wake up at the
> > > same time one of them will miss its deadline).
> > >
> > > But, yeah... maybe I'm completely wrong or, in any case, it's for a
> > > different thread...
> > >
> > > > > > but I suppose there's nothing stopping anybody from splitting a clock
> > > > > > domain down the middle scheduling wise. So yes, good point.
> > > > >
> > > > > That makes sense... moreover, using the global utilization, we would
> > > > > end up asking for capacities which cannot be provided by a single CPU.
> > > >
> > > > Yes, but that _should_ not be a problem if you clock them all high
> > > > enough. But this gets to be complicated real fast I think.
> > >
> > > IMO the current solution with Juri's patches is working as expected:
> > > we know how many DL tasks are runnable on a CPU and we properly
> > > account for their utilization.
> > >
> > > The only "issue/limitation" is (eventually) the case described above.
> > > Dunno if we can enqueue 2 60% DL tasks on the same CPU... in that case
> > > we will ask for 120% utilization?
> >
> > In general it depends on the other parameters, deadline and period.
>
> Right, but what about the case deadline==period, with 60% utilization?
> AFAIU, 3 DL tasks with the same parameters as above will be accepted on
> a 2 CPU system, isn't it?
>
> And thus, in that case, we can end up with a 120% utilization request
> from DL for a single CPU... but, considering it's lunch o'clock, I'm
> likely missing something...

Nope. CBS on SMP only gives you bounded tardiness (at least with the
admission control the kernel does). Some deadlines might be missed.
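The arithmetic behind the 3x60%-on-2-CPUs example can be checked with a small model of the global admission test (the 95% limit corresponds to the kernel's default RT throttling ratio; this sketch is illustrative and not the kernel's fixed-point implementation):

```c
#include <assert.h>

/* Global DL admission control as discussed in the thread: admit the
 * task set if the summed bandwidth, in percent, fits within
 * nr_cpus * 95% (the kernel's default bandwidth limit). */
static int dl_global_admit(const int *util_pct, int n_tasks, int nr_cpus)
{
	int total = 0;

	for (int i = 0; i < n_tasks; i++)
		total += util_pct[i];
	return total < nr_cpus * 95;	/* 180 < 190 for 3x60% on 2 CPUs */
}
```

Three 60% tasks pass (180 < 190) even though no partitioning of three such tasks onto two CPUs keeps every CPU at or below 100%, which is why only bounded tardiness, not every deadline, can be guaranteed.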
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index ab84d2261554..9736b537386a 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -315,8 +315,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		unsigned long j_util, j_max;
 		s64 delta_ns;
 
-		if (j_sg_cpu != sg_cpu)
-			sugov_get_util(j_sg_cpu);
+		if (idle_cpu(j))
+			continue;
 
 		/*
 		 * If the CFS CPU utilization was last updated before the
@@ -354,7 +354,6 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 
 	raw_spin_lock(&sg_policy->update_lock);
 
-	sugov_get_util(sg_cpu);
 	sugov_set_iowait_boost(sg_cpu, time, flags);
 	sg_cpu->last_update = time;