Message ID: 20171130114723.29210-2-patrick.bellasi@arm.com (mailing list archive)
State: Not Applicable, archived
Hi,

On 30/11/17 11:47, Patrick Bellasi wrote:

[...]

> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 2f52ec0f1539..67339ccb5595 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>
>  	sg_cpu->util = util;
>  	sg_cpu->max = max;
> +
> +	/* CPU is entering IDLE, reset flags without triggering an update */
> +	if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> +		sg_cpu->flags = 0;
> +		goto done;
> +	}

Looks good for now. I'm just thinking about what will happen for DL, as
a CPU that still "has" a sleeping task is not going to be really idle
until the 0-lag time. I guess we could move this at that point in time?

>  	sg_cpu->flags = flags;
>
>  	sugov_set_iowait_boost(sg_cpu, time, flags);
> @@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  		sugov_update_commit(sg_policy, time, next_f);
>  	}
>
> +done:
>  	raw_spin_unlock(&sg_policy->update_lock);
>  }
>
> diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> index d518664cce4f..6e8ae2aa7a13 100644
> --- a/kernel/sched/idle_task.c
> +++ b/kernel/sched/idle_task.c
> @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
>  	put_prev_task(rq, prev);
>  	update_idle_core(rq);
>  	schedstat_inc(rq->sched_goidle);
> +
> +	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
> +	cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);

Don't know if it makes things any cleaner, but you could add to the
comment that we don't actually trigger a frequency update with this
call.

Best,

Juri
On 30-Nov 14:12, Juri Lelli wrote:
> Hi,
>
> On 30/11/17 11:47, Patrick Bellasi wrote:

[...]

> > +	/* CPU is entering IDLE, reset flags without triggering an update */
> > +	if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> > +		sg_cpu->flags = 0;
> > +		goto done;
> > +	}
>
> Looks good for now. I'm just thinking that we will happen for DL, as a
> CPU that still "has" a sleeping task is not going to be really idle
> until the 0-lag time.

AFAIU, for the time being, DL already cannot really rely on this flag
for its behavior to be correct. Indeed, the flags are reset as soon as
a FAIR task wakes up and is enqueued.

Only once your DL integration patches are in will we not depend on the
flags anymore, since DL will report a certain utilization up to the
0-lag time, isn't it?

If that's the case, I would say that the flags will be used only to
jump to the max OPP for RT tasks. Thus, this patch should still be
valid.

> I guess we could move this at that point in time?

Not sure what you mean here. Right now the new SCHED_CPUFREQ_IDLE flag
is notified only by idle tasks. That's the only code path where we are
sure the CPU is entering IDLE.

> > +	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
> > +	cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
>
> Don't know if it make things any cleaner, but you could add to the
> comment that we don't actually trigger a frequency update with this
> call.

Right, will add on next posting.

> Best,
>
> Juri

Cheers,
Patrick
On 30/11/17 15:41, Patrick Bellasi wrote:

[...]

> AFAIU, for the time being, DL already cannot really rely on this flag
> for its behaviors to be correct. Indeed, flags are reset as soon as
> a FAIR task wakes up and it's enqueued.

Right, and your flags ORing patch should help with this.

> Only once your DL integration patches are in, we do not depends on
> flags anymore since DL will report a ceratain utilization up to the
> 0-lag time, isn't it?

Utilization won't decrease until the 0-lag time, correct. I was just
wondering if resetting the flags before that time (when a CPU enters
idle) might be an issue.

> If that's the case, I would say that the flags will be used only to
> jump to the max OPP for RT tasks. Thus, this patch should still be valid.
>
> > I guess we could move this at that point in time?
>
> Not sure what you mean here. Right now the new SCHED_CPUFREQ_IDLE flag
> is notified only by idle tasks. That's the only code path where we are
> sure the CPU is entering IDLE.

W.r.t. the possible issue above, I was thinking that we might want to
reset the flags at the 0-lag time for DL (if the CPU is still idle).

Anyway, two distinct sets of patches. Whoever gets in last will have to
ponder the thing a little bit more. :)

Best,

Juri
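For context, the "0-lag time" discussed above is the instant at which a sleeping DL task's remaining runtime, consumed at its reserved bandwidth (dl_runtime/dl_period), would reach zero. A standalone sketch of that computation follows; this is an illustrative model, not the kernel code (it mirrors what `task_non_contending()` in kernel/sched/deadline.c computes, with hypothetical names and all values in nanoseconds):

```c
#include <stdint.h>

/*
 * Illustrative sketch (not kernel code): a DL task that blocks with
 * `runtime` ns of budget left and bandwidth dl_runtime/dl_period keeps
 * "counting" toward the CPU's utilization until
 *
 *     zero_lag = deadline - runtime * dl_period / dl_runtime
 *
 * so a CPU holding such a task is not really idle before that time.
 */
static inline int64_t dl_zero_lag_time(int64_t deadline, int64_t runtime,
				       int64_t dl_period, int64_t dl_runtime)
{
	return deadline - (runtime * dl_period) / dl_runtime;
}
```

For example, a task with 2 ms of runtime left, a 5 ms / 10 ms reservation, and an absolute deadline at t = 100 ms reaches its 0-lag point at t = 96 ms, which is when the flag reset Juri mentions would be safe for DL.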
On 30-Nov 17:02, Juri Lelli wrote:

[...]

> > Only once your DL integration patches are in, we do not depends on
> > flags anymore since DL will report a ceratain utilization up to the
> > 0-lag time, isn't it?
>
> Utilization won't decrease until 0-lag time, correct.

Then IMO, with your DL patches, the DL class doesn't need the flags
anymore, since schedutil will know (and account for) the utilization
required by the DL tasks. Isn't it?

> I was just wondering if resetting flags before that time (when a CPU
> enters idle) might be an issue.

If the above is correct, then the flags will be used only for the RT
class (and IO boosting)... and thus this patch will still be useful as
it is now: meaning that once the idle task is selected we no longer
care about RT and IO boosting (only).

> > If that's the case, I would say that the flags will be used only to
> > jump to the max OPP for RT tasks. Thus, this patch should still be valid.
> >
> > > I guess we could move this at that point in time?
> >
> > Not sure what you mean here. Right now the new SCHED_CPUFREQ_IDLE flag
> > is notified only by idle tasks. That's the only code path where we are
> > sure the CPU is entering IDLE.
>
> W.r.t. the possible issue above, I was thinking that we might want to
> reset flags at 0-lag time for DL (if CPU is still idle). Anyway, two
> distinct set of patches. Who gets in last will have to ponder the thing
> a little bit more. :)

Perhaps I'm still a bit confused but, to me, it seems that with your
patches we completely fix DL, while we can still use this exact same
patch just for RT tasks.

> Best,
>
> Juri
On 30/11/17 16:19, Patrick Bellasi wrote:

[...]

> > I was just wondering if resetting flags before that time (when a CPU
> > enters idle) might be an issue.
>
> If the above is correct, then flags will be used only for the RT class (and
> IO boosting)... and thus this patch will still be useful as it is now:
> meaning that once the idle task is selected we do not care anymore
> about RT and IOBoosting (only).
>
> Perhaps I'm still a bit confused but, to me, it seems that with your
> patches we completely fix DL but we still can use this exact same
> patch just for RT tasks.

We don't use the flags for bailing out during aggregation, so it should
be OK for DL, yes.

Thanks,

Juri
On 30-11-17, 11:47, Patrick Bellasi wrote:
> diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
> index d1ad3d825561..bb5f778db023 100644
> --- a/include/linux/sched/cpufreq.h
> +++ b/include/linux/sched/cpufreq.h
> @@ -11,6 +11,7 @@
>  #define SCHED_CPUFREQ_RT	(1U << 0)
>  #define SCHED_CPUFREQ_DL	(1U << 1)
>  #define SCHED_CPUFREQ_IOWAIT	(1U << 2)
> +#define SCHED_CPUFREQ_IDLE	(1U << 3)

[...]

> +	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
> +	cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);

We posted some comments on V2 for this particular patch suggesting some
improvements. The patch hasn't changed at all and you haven't replied to
a few of those suggestions either. Any particular reason for that?

For example:

- I suggested getting rid of the conditional expression you have added
  in the cpufreq_schedutil.c file.
- And Joel suggested clearing the RT/DL flags from the dequeue path to
  avoid adding the SCHED_CPUFREQ_IDLE flag.
Hi Viresh,

On 07-Dec 10:31, Viresh Kumar wrote:
> On 30-11-17, 11:47, Patrick Bellasi wrote:

[...]

> We posted some comments on V2 for this particular patch suggesting
> some improvements. The patch hasn't changed at all and you haven't
> replied to few of those suggestions as well. Any particular reason for
> that?

You're right; since the previous posting was a long time ago, with this
one I mainly wanted to refresh the discussion. Thanks for highlighting
hereafter which ones were the main discussion points.

> For example:
> - I suggested to get rid of the conditional expression in
>   cpufreq_schedutil.c file that you have added.

We can probably set the flags to SCHED_CPUFREQ_IDLE (instead of
resetting them); however, I think we still need an if condition
somewhere.

Indeed, when SCHED_CPUFREQ_IDLE is asserted we don't want to trigger an
OPP change (for the reasons described in the changelog).

If that's still a goal, then we need to check this flag and bail out
from sugov_update_shared() straight away. That's why I've added a check
at the beginning and also marked it as unlikely(), to have no impact on
all the cases where we call a schedutil update with runnable tasks.

Does this make sense?

> - And Joel suggested to clear the RT/DL flags from dequeue path to
>   avoid adding SCHED_CPUFREQ_IDLE flag.

I had a thought about Joel's proposal:

>> wouldn't another way be to just clear the flag from the RT scheduling
>> class with an extra call to cpufreq_update_util with flags = 0 during
>> dequeue_rt_entity?

The main concern for me was that the current API does not expose which
scheduling class is calling schedutil for updates. Thus, at dequeue
time of an RT task we cannot really clear all the flags (e.g. the
IOWAIT flag of a FAIR task); we should clear only the RT related flags.

This means that we would likely need to implement Joel's idea by:

 1. adding a new set of flags like SCHED_CPUFREQ_RT_IDLE,
    SCHED_CPUFREQ_DL_IDLE, etc...
 2. adding an operation flag, e.g. SCHED_CPUFREQ_SET,
    SCHED_CPUFREQ_RESET, to be ORed with the class flag, e.g.
    cpufreq_update_util(rq, SCHED_CPUFREQ_SET | SCHED_CPUFREQ_RT);
 3. changing the API to carry the operation required for a flag, e.g.
    cpufreq_update_util(rq, flag, set={true, false});

To be honest, I don't like any of those, especially compared to the
simplicity of the one proposed by this patch. :)

IMO, the only pitfall of this patch is that (as Juri pointed out in v2)
for DL it can happen that we do not want to reset the flag right when a
CPU enters IDLE. We would instead need a specific call to reset the DL
flag at the 0-lag time.

However, AFAIU, this special case for DL will disappear once Juri's
latest set [1] is in. Indeed, at that point, schedutil will always and
only need to know the utilization required by DL.

[1] https://lkml.org/lkml/2017/12/4/173

Cheers,
Patrick
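A minimal standalone sketch of the per-class set/reset protocol described above (purely hypothetical names and semantics, not an existing kernel interface): each class clears only its own bits on dequeue, so e.g. a FAIR task's IOWAIT boost survives an RT dequeue:

```c
/*
 * Hypothetical sketch of a per-class flags set/clear protocol; the
 * macro values match include/linux/sched/cpufreq.h, the helpers are
 * illustrative only.
 */
#define SCHED_CPUFREQ_RT	(1U << 0)
#define SCHED_CPUFREQ_DL	(1U << 1)
#define SCHED_CPUFREQ_IOWAIT	(1U << 2)

/* An enqueue path would OR its own class bit into the current set... */
static inline unsigned int sugov_flags_set(unsigned int cur,
					   unsigned int class_flags)
{
	return cur | class_flags;
}

/* ...while a dequeue path would clear only its own class bits. */
static inline unsigned int sugov_flags_clear(unsigned int cur,
					     unsigned int class_flags)
{
	return cur & ~class_flags;
}
```

With such helpers, an RT dequeue performing `sugov_flags_clear(flags, SCHED_CPUFREQ_RT)` leaves a pending IOWAIT bit untouched, which is exactly the property a blanket `flags = 0` at dequeue time would break.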
On 12/07/2017 01:45 PM, Patrick Bellasi wrote:
> Hi Viresh,
>
> On 07-Dec 10:31, Viresh Kumar wrote:
> > On 30-11-17, 11:47, Patrick Bellasi wrote:

[...]

> If that's still a goal, then we will need to check this flag and bail
> out from sugov_update_shared straight away. That's why I've added a
> check at the beginning and also defined it as unlikely to have not
> impact on all cases where we call a schedutil update with runnable
> tasks.
>
> Does this makes sense?

IIRC, there was also the question of doing this not only in the shared
but also in the single case ...

[...]
On Thu, Nov 30, 2017 at 11:47:18AM +0000, Patrick Bellasi wrote:
> Currently, sg_cpu's flags are set to the value defined by the last call
> of the cpufreq_update_util(); for RT/DL classes this corresponds to the
> SCHED_CPUFREQ_{RT/DL} flags always being set.
>
> When multiple CPUs share the same frequency domain it might happen that
> a CPU which executed an RT task, right before entering IDLE, has one of
> the SCHED_CPUFREQ_RT_DL flags set, permanently, until it exits IDLE.
>
> Although such an idle CPU is _going to be_ ignored by the
> sugov_next_freq_shared():
>   1. this kind of "useless RT requests" are ignored only if more then
>      TICK_NSEC have elapsed since the last update
>   2. we can still potentially trigger an already too late switch to
>      MAX, which starts also a new throttling interval
>   3. the internal state machine is not consistent with what the
>      scheduler knows, i.e. the CPU is now actually idle

So I _really_ hate having to clutter the idle path for this shared
case :/

1. can obviously be fixed by short-circuiting the timeout when idle.

2. I'm not sure about if you do 1; anybody doing a switch will go
through sugov_next_freq_shared(), which will poll all relevant CPUs
and, per 1, will see it's idle, no?

Not sure what that leaves for 3.

> diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> index d518664cce4f..6e8ae2aa7a13 100644
> --- a/kernel/sched/idle_task.c
> +++ b/kernel/sched/idle_task.c
> @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
>  	put_prev_task(rq, prev);
>  	update_idle_core(rq);
>  	schedstat_inc(rq->sched_goidle);
> +
> +	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
> +	cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
> +
>  	return rq->idle;
>  }
On 20-Dec 15:33, Peter Zijlstra wrote:
> On Thu, Nov 30, 2017 at 11:47:18AM +0000, Patrick Bellasi wrote:

[...]

> So I _really_ hate having to clutter the idle path for this shared case
> :/

:) We would like to have per-CPU frequency domains... but the HW guys
always complain that's too costly from an HW/power standpoint... and
they are likely right :-/

So, here we are just trying hard to have a SW status matching the HW
status... which is just another pain :-/

> 1, can obviously be fixed by short-circuiting the timeout when idle.

Mmm.. right... it should be possible for schedutil to detect that a
certain CPU is currently idle. Can we use core.c::idle_cpu() from
cpufreq_schedutil?

> 2. not sure how if you do 1; anybody doing a switch will go through
> sugov_next_freq_shared() which will poll all relevant CPUs and per 1
> will see its idle, no?

Right, that should work...

> Not sure what that leaves for 3.

When a CPU is detected idle, perhaps we can still clear the RT flags...
just for "consistency" of the current status representation.

[...]
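Peter's alternative, short-circuiting the staleness timeout when the remote CPU is idle, can be modeled in a few lines. This is a standalone sketch of the aggregation decision, not the actual schedutil code: the kernel's `idle_cpu()` check is stood in for by a boolean, and the tick length is an illustrative constant:

```c
#include <stdbool.h>
#include <stdint.h>

#define TICK_NSEC 4000000ULL	/* illustrative tick length: 4 ms */

/*
 * Model of the per-CPU decision inside sugov_next_freq_shared():
 * a remote CPU's RT/DL "go to max" request is discarded when its data
 * is older than TICK_NSEC, or (per Peter's suggestion) as soon as that
 * CPU is seen idle, instead of having the idle path raise a
 * SCHED_CPUFREQ_IDLE notification.
 */
static bool discard_rt_dl_request(uint64_t now, uint64_t last_update,
				  bool cpu_idle)
{
	if (cpu_idle)
		return true;			/* short-circuit the timeout */
	return (now - last_update) > TICK_NSEC;	/* existing staleness check */
}
```

This shows why point 1 of the changelog goes away with Peter's scheme: an idle CPU's stale RT request is discarded immediately rather than lingering for up to TICK_NSEC.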
diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index d1ad3d825561..bb5f778db023 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -11,6 +11,7 @@
 #define SCHED_CPUFREQ_RT	(1U << 0)
 #define SCHED_CPUFREQ_DL	(1U << 1)
 #define SCHED_CPUFREQ_IOWAIT	(1U << 2)
+#define SCHED_CPUFREQ_IDLE	(1U << 3)
 
 #define SCHED_CPUFREQ_RT_DL	(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
 
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 2f52ec0f1539..67339ccb5595 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 
 	sg_cpu->util = util;
 	sg_cpu->max = max;
+
+	/* CPU is entering IDLE, reset flags without triggering an update */
+	if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
+		sg_cpu->flags = 0;
+		goto done;
+	}
 	sg_cpu->flags = flags;
 
 	sugov_set_iowait_boost(sg_cpu, time, flags);
@@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 		sugov_update_commit(sg_policy, time, next_f);
 	}
 
+done:
 	raw_spin_unlock(&sg_policy->update_lock);
 }
 
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index d518664cce4f..6e8ae2aa7a13 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 	put_prev_task(rq, prev);
 	update_idle_core(rq);
 	schedstat_inc(rq->sched_goidle);
+
+	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
+	cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
+
 	return rq->idle;
 }
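The net behavior the patch introduces can be summarized with a small userspace model of the shared-policy update path (a hedged sketch, not kernel code): an IDLE notification clears any latched RT/DL/IOWAIT bits and suppresses the frequency update, while any other notification latches its flags and may commit a new frequency:

```c
#include <stdbool.h>

/* Flag values as in include/linux/sched/cpufreq.h after this patch. */
#define SCHED_CPUFREQ_RT	(1U << 0)
#define SCHED_CPUFREQ_DL	(1U << 1)
#define SCHED_CPUFREQ_IOWAIT	(1U << 2)
#define SCHED_CPUFREQ_IDLE	(1U << 3)

/* Minimal stand-in for the per-CPU schedutil state. */
struct sg_cpu_model {
	unsigned int flags;
};

/*
 * Model of the hunk added to sugov_update_shared(): returns true when
 * the update may proceed to the frequency-selection path, false when
 * it only resets state (the SCHED_CPUFREQ_IDLE "goto done" case).
 */
static bool model_update_shared(struct sg_cpu_model *sg_cpu,
				unsigned int flags)
{
	if (flags & SCHED_CPUFREQ_IDLE) {
		sg_cpu->flags = 0;	/* reset without triggering an update */
		return false;
	}
	sg_cpu->flags = flags;
	return true;
}
```

Running an RT update followed by an IDLE notification against this model shows the fix: the RT bit no longer survives into the idle period, so a sibling CPU aggregating requests does not see a stale "go to max" hint.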