Message ID | 20170705085905.6558-9-juri.lelli@arm.com (mailing list archive) |
---|---|
State | RFC, archived |
Headers | show |
On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote: > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq) > if (unlikely(dl_entity_is_special(dl_se))) > return; > > - if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) > - delta_exec = grub_reclaim(delta_exec, rq, &curr->dl); > - dl_se->runtime -= delta_exec; > + /* > + * For tasks that participate in GRUB, we implement GRUB-PA: the > + * spare reclaimed bandwidth is used to clock down frequency. > + * > + * For the others, we still need to scale reservation parameters > + * according to current frequency and CPU maximum capacity. > + */ > + if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) { > + scaled_delta_exec = grub_reclaim(delta_exec, > + rq, > + &curr->dl); > + } else { > + unsigned long scale_freq = arch_scale_freq_capacity(cpu); > + unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu); > + > + scaled_delta_exec = cap_scale(delta_exec, scale_freq); > + scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu); > + } > + > + dl_se->runtime -= scaled_delta_exec; > This I don't get...
On 19/07/17 09:21, Peter Zijlstra wrote: > On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote: > > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq) > > if (unlikely(dl_entity_is_special(dl_se))) > > return; > > > > - if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) > > - delta_exec = grub_reclaim(delta_exec, rq, &curr->dl); > > - dl_se->runtime -= delta_exec; > > + /* > > + * For tasks that participate in GRUB, we implement GRUB-PA: the > > + * spare reclaimed bandwidth is used to clock down frequency. > > + * > > + * For the others, we still need to scale reservation parameters > > + * according to current frequency and CPU maximum capacity. > > + */ > > + if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) { > > + scaled_delta_exec = grub_reclaim(delta_exec, > > + rq, > > + &curr->dl); > > + } else { > > + unsigned long scale_freq = arch_scale_freq_capacity(cpu); > > + unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu); > > + > > + scaled_delta_exec = cap_scale(delta_exec, scale_freq); > > + scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu); > > + } > > + > > + dl_se->runtime -= scaled_delta_exec; > > > > This I don't get... Considering that we use GRUB's active utilization to drive clock frequency selection, rationale is that GRUB tasks don't need any special scaling, as their delta_exec is already scaled according to GRUB rules. OTOH, normal tasks need to have their runtime (delta_exec) explicitly scaled considering current frequency (and CPU max capacity), otherwise they are going to receive less runtime than granted at AC, when frequency is reduced.
On Wed, Jul 19, 2017 at 10:20:29AM +0100, Juri Lelli wrote: > On 19/07/17 09:21, Peter Zijlstra wrote: > > On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote: > > > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq) > > > if (unlikely(dl_entity_is_special(dl_se))) > > > return; > > > > > > - if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) > > > - delta_exec = grub_reclaim(delta_exec, rq, &curr->dl); > > > - dl_se->runtime -= delta_exec; > > > + /* > > > + * For tasks that participate in GRUB, we implement GRUB-PA: the > > > + * spare reclaimed bandwidth is used to clock down frequency. > > > + * > > > + * For the others, we still need to scale reservation parameters > > > + * according to current frequency and CPU maximum capacity. > > > + */ > > > + if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) { > > > + scaled_delta_exec = grub_reclaim(delta_exec, > > > + rq, > > > + &curr->dl); > > > + } else { > > > + unsigned long scale_freq = arch_scale_freq_capacity(cpu); > > > + unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu); > > > + > > > + scaled_delta_exec = cap_scale(delta_exec, scale_freq); > > > + scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu); > > > + } > > > + > > > + dl_se->runtime -= scaled_delta_exec; > > > > > > > This I don't get... > > > Considering that we use GRUB's active utilization to drive clock > frequency selection, rationale is that GRUB tasks don't need any special > scaling, as their delta_exec is already scaled according to GRUB rules. > OTOH, normal tasks need to have their runtime (delta_exec) explicitly > scaled considering current frequency (and CPU max capacity), otherwise > they are going to receive less runtime than granted at AC, when > frequency is reduced. I don't think that quite works out. Given that the frequency selection will never quite end up at exactly the same fraction (if the hardware listens to your requests at all). Also, by not scaling the GRUB stuff, don't you run the risk of attempting to hand out more idle time than there actually is?
On 19/07/17 13:00, Peter Zijlstra wrote: > On Wed, Jul 19, 2017 at 10:20:29AM +0100, Juri Lelli wrote: > > On 19/07/17 09:21, Peter Zijlstra wrote: > > > On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote: > > > > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq) > > > > if (unlikely(dl_entity_is_special(dl_se))) > > > > return; > > > > > > > > - if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) > > > > - delta_exec = grub_reclaim(delta_exec, rq, &curr->dl); > > > > - dl_se->runtime -= delta_exec; > > > > + /* > > > > + * For tasks that participate in GRUB, we implement GRUB-PA: the > > > > + * spare reclaimed bandwidth is used to clock down frequency. > > > > + * > > > > + * For the others, we still need to scale reservation parameters > > > > + * according to current frequency and CPU maximum capacity. > > > > + */ > > > > + if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) { > > > > + scaled_delta_exec = grub_reclaim(delta_exec, > > > > + rq, > > > > + &curr->dl); > > > > + } else { > > > > + unsigned long scale_freq = arch_scale_freq_capacity(cpu); > > > > + unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu); > > > > + > > > > + scaled_delta_exec = cap_scale(delta_exec, scale_freq); > > > > + scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu); > > > > + } > > > > + > > > > + dl_se->runtime -= scaled_delta_exec; > > > > > > > > > > This I don't get... > > > > > > Considering that we use GRUB's active utilization to drive clock > > frequency selection, rationale is that GRUB tasks don't need any special > > scaling, as their delta_exec is already scaled according to GRUB rules. > > OTOH, normal tasks need to have their runtime (delta_exec) explicitly > > scaled considering current frequency (and CPU max capacity), otherwise > > they are going to receive less runtime than granted at AC, when > > frequency is reduced. > > I don't think that quite works out. Given that the frequency selection > will never quite end up at exactly the same fraction (if the hardware > listens to your requests at all). > It's an approximation yes (how big it depends on the granularity of the available frequencies). But, for the !GRUB tasks it should be OK, as we always select a frequency (among the available ones) bigger than the current active utilization. Also, for platforms/archs that don't redefine arch_scale_* this is not used. In case they are defined instead the assumption is that either hw listens to requests or scaling factors can be derived in some other ways (avgs?). > Also, by not scaling the GRUB stuff, don't you run the risk of > attempting to hand out more idle time than there actually is? The way I understand it is that for GRUB tasks we always scale considering the "correct" factor. Then frequency could be higher, but this spare idle time will be reclaimed by other GRUB tasks.
On Wed, Jul 19, 2017 at 12:16:24PM +0100, Juri Lelli wrote: > On 19/07/17 13:00, Peter Zijlstra wrote: > > On Wed, Jul 19, 2017 at 10:20:29AM +0100, Juri Lelli wrote: > > > On 19/07/17 09:21, Peter Zijlstra wrote: > > > > On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote: > > > > > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq) > > > > > if (unlikely(dl_entity_is_special(dl_se))) > > > > > return; > > > > > > > > > > - if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) > > > > > - delta_exec = grub_reclaim(delta_exec, rq, &curr->dl); > > > > > - dl_se->runtime -= delta_exec; > > > > > + /* > > > > > + * For tasks that participate in GRUB, we implement GRUB-PA: the > > > > > + * spare reclaimed bandwidth is used to clock down frequency. > > > > > + * > > > > > + * For the others, we still need to scale reservation parameters > > > > > + * according to current frequency and CPU maximum capacity. > > > > > + */ > > > > > + if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) { > > > > > + scaled_delta_exec = grub_reclaim(delta_exec, > > > > > + rq, > > > > > + &curr->dl); > > > > > + } else { > > > > > + unsigned long scale_freq = arch_scale_freq_capacity(cpu); > > > > > + unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu); > > > > > + > > > > > + scaled_delta_exec = cap_scale(delta_exec, scale_freq); > > > > > + scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu); > > > > > + } > > > > > + > > > > > + dl_se->runtime -= scaled_delta_exec; > > > > > > > > > > > > > This I don't get... > > > > > > > > > Considering that we use GRUB's active utilization to drive clock > > > frequency selection, rationale is that GRUB tasks don't need any special > > > scaling, as their delta_exec is already scaled according to GRUB rules. > > > OTOH, normal tasks need to have their runtime (delta_exec) explicitly > > > scaled considering current frequency (and CPU max capacity), otherwise > > > they are going to receive less runtime than granted at AC, when > > > frequency is reduced. > > > > I don't think that quite works out. Given that the frequency selection > > will never quite end up at exactly the same fraction (if the hardware > > listens to your requests at all). > > > > It's an approximation yes (how big it depends on the granularity of the > available frequencies). But, for the !GRUB tasks it should be OK, as we > always select a frequency (among the available ones) bigger than the > current active utilization. > > Also, for platforms/archs that don't redefine arch_scale_* this is not > used. In case they are defined instead the assumption is that either hw > listens to requests or scaling factors can be derived in some other ways > (avgs?). > > > Also, by not scaling the GRUB stuff, don't you run the risk of > > attempting to hand out more idle time than there actually is? > > The way I understand it is that for GRUB tasks we always scale > considering the "correct" factor. Then frequency could be higher, but > this spare idle time will be reclaimed by other GRUB tasks. I'm still confused.. So GRUB does: dq = Uact -dt right? (yeah, I know, the code does something a little more complicated, but it should still be more or less the same of you take out the 'extra' bits). Now, you do DVFS using that same Uact. If we lower the clock, we need more time, so would we then not end up with something like: dq = 1/Uact -dt After all; our budget assignment is such that we're able to complete our work at max freq. Therefore, when we lower the frequency, we'll have to increase budget pro rata, otherwise we'll not complete our work and badness happens. Say we have a 1 Ghz part and Uact=0.5 we'd select 500 Mhz and need double the time to complete. Now, if we fold these two together, you'd get: dq = Uact/Uact -dt = -dt Because, after all, if we lowered the clock to consume all idle time, there's no further idle time to reclaim. Now, of course, our DVFS assignment isn't as precise nor as deterministic as this, so we'll get a slightly different ratio, lets call that Udvfs. So would then not GRUB change into something like: dq = Uact/Udvfs -dt Allowing it to only reclaim that idle time that exists because our DVFS level is strictly higher than required? This way, on our 1 GHz part, with Uact=.5 but Udvfs=.6, we'll allow it to reclaim just the additional 100Mhz of idle time. Or am I completely off the rails now?
Hi Peter, On Mon, 24 Jul 2017 18:43:49 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Jul 19, 2017 at 12:16:24PM +0100, Juri Lelli wrote: > > On 19/07/17 13:00, Peter Zijlstra wrote: > > > On Wed, Jul 19, 2017 at 10:20:29AM +0100, Juri Lelli wrote: > > > > On 19/07/17 09:21, Peter Zijlstra wrote: > > > > > On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote: > > > > > > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq) > > > > > > if (unlikely(dl_entity_is_special(dl_se))) > > > > > > return; > > > > > > > > > > > > - if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) > > > > > > - delta_exec = grub_reclaim(delta_exec, rq, &curr->dl); > > > > > > - dl_se->runtime -= delta_exec; > > > > > > + /* > > > > > > + * For tasks that participate in GRUB, we implement GRUB-PA: the > > > > > > + * spare reclaimed bandwidth is used to clock down frequency. > > > > > > + * > > > > > > + * For the others, we still need to scale reservation parameters > > > > > > + * according to current frequency and CPU maximum capacity. > > > > > > + */ > > > > > > + if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) { > > > > > > + scaled_delta_exec = grub_reclaim(delta_exec, > > > > > > + rq, > > > > > > + &curr->dl); > > > > > > + } else { > > > > > > + unsigned long scale_freq = arch_scale_freq_capacity(cpu); > > > > > > + unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu); > > > > > > + > > > > > > + scaled_delta_exec = cap_scale(delta_exec, scale_freq); > > > > > > + scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu); > > > > > > + } > > > > > > + > > > > > > + dl_se->runtime -= scaled_delta_exec; > > > > > > > > > > > > > > > > This I don't get... > > > > > > > > > > > > Considering that we use GRUB's active utilization to drive clock > > > > frequency selection, rationale is that GRUB tasks don't need any special > > > > scaling, as their delta_exec is already scaled according to GRUB rules. > > > > OTOH, normal tasks need to have their runtime (delta_exec) explicitly > > > > scaled considering current frequency (and CPU max capacity), otherwise > > > > they are going to receive less runtime than granted at AC, when > > > > frequency is reduced. > > > > > > I don't think that quite works out. Given that the frequency selection > > > will never quite end up at exactly the same fraction (if the hardware > > > listens to your requests at all). > > > > > > > It's an approximation yes (how big it depends on the granularity of the > > available frequencies). But, for the !GRUB tasks it should be OK, as we > > always select a frequency (among the available ones) bigger than the > > current active utilization. > > > > Also, for platforms/archs that don't redefine arch_scale_* this is not > > used. In case they are defined instead the assumption is that either hw > > listens to requests or scaling factors can be derived in some other ways > > (avgs?). > > > > > Also, by not scaling the GRUB stuff, don't you run the risk of > > > attempting to hand out more idle time than there actually is? > > > > The way I understand it is that for GRUB tasks we always scale > > considering the "correct" factor. Then frequency could be higher, but > > this spare idle time will be reclaimed by other GRUB tasks. > > I'm still confused.. > > So GRUB does: > > dq = Uact -dt > > right? Right. This is what the original (single processor) GRUB did. And this was used by the "GRUB-PA" algorithm: https://www.researchgate.net/profile/Giuseppe_Lipari/publication/220800940_Using_resource_reservation_techniques_for_power-aware_scheduling/links/09e41513639b2703fc000000.pdf (basically, GRUB-PA uses GRUB for reclaiming, and scales the CPU frequency based on Uact) > Now, you do DVFS using that same Uact. If we lower the clock, we need > more time, so would we then not end up with something like: > > dq = 1/Uact -dt Well, in the GRUB-PA algorithm GRUB reclaiming is the mechanism used to give more runtime to the task... Since Uact is < 1, doing dq = - Uact * dt means that we decrease the current runtime by a smaller amount of time. And so we end up giving more runtime to the task: instead of giving dl_runtime every dl_period, we give "dl_runtime / Uact" every dl_period... And since the CPU is slower (by a ratio Uact), this is equivalent to giving dl_runtime at the maximum CPU speed / frequency (at least, in theory :). > After all; our budget assignment is such that we're able to complete > our work at max freq. Therefore, when we lower the frequency, we'll have > to increase budget pro rata, otherwise we'll not complete our work and > badness happens. Right. But instead of increasing dl_runtime, GRUB-PA decreases the amount of time accounted to the current runtime. > Say we have a 1 Ghz part and Uact=0.5 we'd select 500 Mhz and need > double the time to complete. > > Now, if we fold these two together, you'd get: > > dq = Uact/Uact -dt = -dt Not sure why " / Uact"... According to the GRUB-PA algorithm, you just do dq = - Uact * dt = -0.5dt and you end up giving the CPU to the task for 2 * dl_runtime every dl_period (as expected) > Because, after all, if we lowered the clock to consume all idle time, > there's no further idle time to reclaim. The idea of GRUB-PA is that you do not change dl_runtime... So, the task is still "nominally reserved" dl_runtime every dl_period (in this example, 1/2*dl_period every dl_period)... It is the reclaiming mechanism that allows the task to execute for dl_runtime/Uact every dl_period (in this example, for dl_period every dl_period, so it reclaims 1/2dl_period every dl_period). > Now, of course, our DVFS assignment isn't as precise nor as > deterministic as this, so we'll get a slightly different ratio, lets > call that Udvfs. This is where GRUB-PA starts to have issues... :) The implicit assumption is (I think) that is the DVFS mechanism cannot assign exactly the requested frequency then it makes a "conservative" assignment (assigning a frequency higher than the requested one). > So would then not GRUB change into something like: > > dq = Uact/Udvfs -dt > > Allowing it to only reclaim that idle time that exists because our DVFS > level is strictly higher than required? I think GRUB should still do "dq = -Uact * dt", trying to reclaim all the idle CPU time (up to the configured limit, of course). Luca > > This way, on our 1 GHz part, with Uact=.5 but Udvfs=.6, we'll allow it > to reclaim just the additional 100Mhz of idle time. > > > Or am I completely off the rails now?
On Tue, Jul 25, 2017 at 09:03:08AM +0200, Luca Abeni wrote: > > I'm still confused.. > > > > So GRUB does: > > > > dq = Uact -dt > > > > right? > > Right. This is what the original (single processor) GRUB did. And this > was used by the "GRUB-PA" algorithm: > https://www.researchgate.net/profile/Giuseppe_Lipari/publication/220800940_Using_resource_reservation_techniques_for_power-aware_scheduling/links/09e41513639b2703fc000000.pdf > > (basically, GRUB-PA uses GRUB for reclaiming, and scales the CPU > frequency based on Uact) > > > > Now, you do DVFS using that same Uact. If we lower the clock, we need > > more time, so would we then not end up with something like: > > > > dq = 1/Uact -dt > > Well, in the GRUB-PA algorithm GRUB reclaiming is the mechanism used to > give more runtime to the task... Since Uact is < 1, doing > dq = - Uact * dt > means that we decrease the current runtime by a smaller amount of time. > And so we end up giving more runtime to the task: instead of giving > dl_runtime every dl_period, we give "dl_runtime / Uact" every > dl_period... And since the CPU is slower (by a ratio Uact), this is > equivalent to giving dl_runtime at the maximum CPU speed / frequency > (at least, in theory :). > > > > After all; our budget assignment is such that we're able to complete > > our work at max freq. Therefore, when we lower the frequency, we'll have > > to increase budget pro rata, otherwise we'll not complete our work and > > badness happens. > > Right. But instead of increasing dl_runtime, GRUB-PA decreases the > amount of time accounted to the current runtime. > > > > Say we have a 1 Ghz part and Uact=0.5 we'd select 500 Mhz and need > > double the time to complete. > > > > Now, if we fold these two together, you'd get: > > > > dq = Uact/Uact -dt = -dt > > Not sure why " / Uact"... According to the GRUB-PA algorithm, you just > do > dq = - Uact * dt = -0.5dt > and you end up giving the CPU to the task for 2 * dl_runtime every > dl_period (as expected) Yeah, I seem to have gone off the rails there... Bah I'm terminally confused now. Let me try and get my brain the right way up.
On Tue, 25 Jul 2017 15:51:05 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, Jul 25, 2017 at 09:03:08AM +0200, Luca Abeni wrote: > > > > I'm still confused.. > > > > > > So GRUB does: > > > > > > dq = Uact -dt > > > > > > right? > > > > Right. This is what the original (single processor) GRUB did. And > > this was used by the "GRUB-PA" algorithm: > > https://www.researchgate.net/profile/Giuseppe_Lipari/publication/220800940_Using_resource_reservation_techniques_for_power-aware_scheduling/links/09e41513639b2703fc000000.pdf > > > > (basically, GRUB-PA uses GRUB for reclaiming, and scales the CPU > > frequency based on Uact) > > > > > > > Now, you do DVFS using that same Uact. If we lower the clock, we > > > need more time, so would we then not end up with something like: > > > > > > dq = 1/Uact -dt > > > > Well, in the GRUB-PA algorithm GRUB reclaiming is the mechanism > > used to give more runtime to the task... Since Uact is < 1, doing > > dq = - Uact * dt > > means that we decrease the current runtime by a smaller amount of > > time. And so we end up giving more runtime to the task: instead of > > giving dl_runtime every dl_period, we give "dl_runtime / Uact" every > > dl_period... And since the CPU is slower (by a ratio Uact), this is > > equivalent to giving dl_runtime at the maximum CPU speed / frequency > > (at least, in theory :). > > > > > > > After all; our budget assignment is such that we're able to > > > complete our work at max freq. Therefore, when we lower the > > > frequency, we'll have to increase budget pro rata, otherwise > > > we'll not complete our work and badness happens. > > > > Right. But instead of increasing dl_runtime, GRUB-PA decreases the > > amount of time accounted to the current runtime. > > > > > > > Say we have a 1 Ghz part and Uact=0.5 we'd select 500 Mhz and need > > > double the time to complete. > > > > > > Now, if we fold these two together, you'd get: > > > > > > dq = Uact/Uact -dt = -dt > > > > Not sure why " / Uact"... According to the GRUB-PA algorithm, you > > just do > > dq = - Uact * dt = -0.5dt > > and you end up giving the CPU to the task for 2 * dl_runtime every > > dl_period (as expected) > > Yeah, I seem to have gone off the rails there... Bah I'm terminally > confused now. Let me try and get my brain the right way up. This stuff always confuses me too... :) The parts that gives me more headaches is how to combine GRUB-PA with non-reclaiming tasks, and how to cope with "real world issues" (such as an actual DVFS frequency different from the theoretical one, GRUB reclaiming less than 100% of the CPU time, etc...) Anyway, Claudio is running some experiments with this patchset, measuring power saving and missed deadlines for various sets of periodic real-time tasks... We hope to present the results at RTLWS. Luca
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 4ec82fa56b04..5bed1a2fae06 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1122,7 +1122,8 @@ static void update_curr_dl(struct rq *rq) { struct task_struct *curr = rq->curr; struct sched_dl_entity *dl_se = &curr->dl; - u64 delta_exec; + u64 delta_exec, scaled_delta_exec; + int cpu = cpu_of(rq); if (!dl_task(curr) || !on_dl_rq(dl_se)) return; @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq) if (unlikely(dl_entity_is_special(dl_se))) return; - if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) - delta_exec = grub_reclaim(delta_exec, rq, &curr->dl); - dl_se->runtime -= delta_exec; + /* + * For tasks that participate in GRUB, we implement GRUB-PA: the + * spare reclaimed bandwidth is used to clock down frequency. + * + * For the others, we still need to scale reservation parameters + * according to current frequency and CPU maximum capacity. + */ + if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) { + scaled_delta_exec = grub_reclaim(delta_exec, + rq, + &curr->dl); + } else { + unsigned long scale_freq = arch_scale_freq_capacity(cpu); + unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu); + + scaled_delta_exec = cap_scale(delta_exec, scale_freq); + scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu); + } + + dl_se->runtime -= scaled_delta_exec; throttle: if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) { diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 97bd5aae285d..a6e8a4f5786f 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2844,8 +2844,6 @@ static u32 __accumulate_pelt_segments(u64 periods, u32 d1, u32 d3) return c1 + c2 + c3; } -#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT) - /* * Accumulate the three separate parts of the sum; d1 the remainder * of the last (incomplete) period, d2 the span of full periods and d3 diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 178f4a5df2fa..688185761eaf 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -155,6 +155,8 @@ static inline int task_has_dl_policy(struct task_struct *p) return dl_policy(p->policy); } +#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT) + /* * !! For sched_setattr_nocheck() (kernel) only !! *
Apply frequency and cpu scale-invariance correction factor to bandwidth enforcement (similar to what we already do to fair utilization tracking). Each delta_exec gets scaled considering current frequency and maximum cpu capacity; which means that the reservation runtime parameter (that need to be specified profiling the task execution at max frequency on biggest capacity core) gets thus scaled accordingly. Signed-off-by: Juri Lelli <juri.lelli@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Luca Abeni <luca.abeni@santannapisa.it> Cc: Claudio Scordino <claudio@evidence.eu.com> --- kernel/sched/deadline.c | 26 ++++++++++++++++++++++---- kernel/sched/fair.c | 2 -- kernel/sched/sched.h | 2 ++ 3 files changed, 24 insertions(+), 6 deletions(-)