Message ID | 1514608.eWxQqcMBcc@aspire.rjw.lan (mailing list archive) |
---|---|
State | Mainlined |
Delegated to: | Rafael Wysocki |
Headers | show |
Hi Rafael, On 30/03/17 23:36, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > The schedutil governor reduces frequencies too fast in some > situations which cases undesirable performance drops to > appear. > > To address that issue, make schedutil reduce the frequency slower by > setting it to the average of the value chosen during the previous > iteration of governor computations and the new one coming from its > frequency selection formula. > I'm curious to test this out on Pixel phones once back in office, but I've already got some considerations about this patch. Please find them inline below. > Link: https://bugzilla.kernel.org/show_bug.cgi?id=194963 > Reported-by: John <john.ettedgui@gmail.com> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > > This addresses a practical issue, but one in the "responsiveness" or > "interactivity" category which is quite hard to represent quantitatively. > > As reported by John in BZ194963, schedutil does not ramp up P-states quickly > enough which causes audio issues to appear in his gaming setup. At least it > evidently is worse than ondemand in this respect and the patch below helps. > Might also be a PELT problem? > The patch essentially repeats the trick added some time ago to the load-based > P-state selection algorithm in intel_pstate, which allowed us to make it viable > for performance-oriented users, and which is to reduce frequencies at a slower > pace. > > The reason why I chose the average is because it is computationally cheap > and pretty much the max reasonable slowdown and the idea is that in case > there's something about to run that we don't know about yet, it is better to > stay at a higher level for a while more to avoid having to get up from the floor > every time. > Another approach we have been playing with on Android (to solve what seem to me similar issues) is to have decoupled up and down frequency changes thresholds. With this you can decide how much quick to react to a sudden increase in utilization and how much "hysteresis" you want to have before slowing down. Every platfrom can also be easily tuned as needed (instead of having the same filter applied for every platform). We seemed to actually recently come to the consideration that the up threshold is probably not much needed (and it is in fact set to very small values in practice). Once one is confident that the utilization signal is not too jumpy, responding to a request for additional capacity quickly seems the right thing to do (especially for RT/DL policies). What's your opinion? > But technically speaking it is a filter. :-) > > So among other things I'm wondering if that leads to substantial increases in > energy consumption anywhere. > Having a tunable might help getting the tradeoff right for different platforms, maybe? As we discussed at the last LPC, having an energy model handy and use that to decide how quickly to ramp up or slow down seems the desirable long term solution, but we probably need something (as you are proposing) until we get there. Thanks, - Juri
On Friday, March 31, 2017 11:22:23 AM Juri Lelli wrote: > Hi Rafael, Hi Juri, > On 30/03/17 23:36, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > The schedutil governor reduces frequencies too fast in some > > situations which cases undesirable performance drops to > > appear. > > > > To address that issue, make schedutil reduce the frequency slower by > > setting it to the average of the value chosen during the previous > > iteration of governor computations and the new one coming from its > > frequency selection formula. > > > > I'm curious to test this out on Pixel phones once back in office, but > I've already got some considerations about this patch. Please find them > inline below. > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=194963 > > Reported-by: John <john.ettedgui@gmail.com> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > --- > > > > This addresses a practical issue, but one in the "responsiveness" or > > "interactivity" category which is quite hard to represent quantitatively. > > > > As reported by John in BZ194963, schedutil does not ramp up P-states quickly > > enough which causes audio issues to appear in his gaming setup. At least it > > evidently is worse than ondemand in this respect and the patch below helps. > > > > Might also be a PELT problem? I don't think so. As mentioned below, intel_pstate had it too and it doesn't use PELT. :-) This appears to be a general issue with load-based (or utilization-based) frequency selection algorithms using periodic sampling. Roughly, if something unanticipated is going to happen shortly (such as a burst in audio activity in a game), it may take a whole period to notice what's going on and the frequency set for that period can make a difference between sufficient and insufficient provisioning. What the patch does is to increase the likelihood that the frequency in question will be sufficient to avoid noticeable effects (such as audio cracks) and it tends to do the trick most of the time. [Of course, you may argue that this is related to the rate limitting in schedutil and intel_pstate, but then PELT itself is sampled periodically AFAICS.] > > The patch essentially repeats the trick added some time ago to the load-based > > P-state selection algorithm in intel_pstate, which allowed us to make it viable > > for performance-oriented users, and which is to reduce frequencies at a slower > > pace. > > > > The reason why I chose the average is because it is computationally cheap > > and pretty much the max reasonable slowdown and the idea is that in case > > there's something about to run that we don't know about yet, it is better to > > stay at a higher level for a while more to avoid having to get up from the floor > > every time. > > > > Another approach we have been playing with on Android (to solve what > seem to me similar issues) is to have decoupled up and down frequency > changes thresholds. With this you can decide how much quick to react to > a sudden increase in utilization and how much "hysteresis" you want to > have before slowing down. Every platfrom can also be easily tuned as > needed (instead of having the same filter applied for every platform). > > We seemed to actually recently come to the consideration that the up > threshold is probably not much needed (and it is in fact set to very > small values in practice). Once one is confident that the utilization > signal is not too jumpy, responding to a request for additional capacity > quickly seems the right thing to do (especially for RT/DL policies). > > What's your opinion? As I said, responding to increased load may take a whole period to notice and it looks like what happens during that period may be quite important. To me, thresholds have a problem that from the algorithm perspective they are constant values set externally. This means they likely need to be tuned once in a while by whatever entity that had set them (it is difficult to imagine that the same values will always be suitable for every workload) and that means an additional layer of (dynamic) control on top of the governor. > > But technically speaking it is a filter. :-) > > > > So among other things I'm wondering if that leads to substantial increases in > > energy consumption anywhere. > > > > Having a tunable might help getting the tradeoff right for different > platforms, maybe? It might, but it would mean additional computational cost (at least one more integer multiplication AFAICS). > As we discussed at the last LPC, having an energy model handy and use > that to decide how quickly to ramp up or slow down seems the desirable > long term solution, but we probably need something (as you are > proposing) until we get there. Well, we definitely need something to address real use cases, like the one that I responded to with this patch. :-) Thanks, Rafael
On Sat, Apr 1, 2017 at 1:39 PM, Andres Oportus <andresoportus@android.com> wrote: > Hi Rafael, Juri, > > On Fri, Mar 31, 2017 at 2:51 PM, Rafael J. Wysocki <rjw@rjwysocki.net> > wrote: >> >> On Friday, March 31, 2017 11:22:23 AM Juri Lelli wrote: >> > Hi Rafael, >> >> Hi Juri, >> >> > On 30/03/17 23:36, Rafael J. Wysocki wrote: >> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> > > >> > > The schedutil governor reduces frequencies too fast in some >> > > situations which cases undesirable performance drops to >> > > appear. >> > > >> > > To address that issue, make schedutil reduce the frequency slower by >> > > setting it to the average of the value chosen during the previous >> > > iteration of governor computations and the new one coming from its >> > > frequency selection formula. >> > > >> > >> > I'm curious to test this out on Pixel phones once back in office, but >> > I've already got some considerations about this patch. Please find them >> > inline below. >> > >> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=194963 >> > > Reported-by: John <john.ettedgui@gmail.com> >> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> > > --- >> > > >> > > This addresses a practical issue, but one in the "responsiveness" or >> > > "interactivity" category which is quite hard to represent >> > > quantitatively. >> > > >> > > As reported by John in BZ194963, schedutil does not ramp up P-states >> > > quickly >> > > enough which causes audio issues to appear in his gaming setup. At >> > > least it >> > > evidently is worse than ondemand in this respect and the patch below >> > > helps. >> > > >> > >> > Might also be a PELT problem? >> >> I don't think so. >> >> As mentioned below, intel_pstate had it too and it doesn't use PELT. :-) >> >> This appears to be a general issue with load-based (or utilization-based) >> frequency selection algorithms using periodic sampling. Roughly, if >> something >> unanticipated is going to happen shortly (such as a burst in audio >> activity in >> a game), it may take a whole period to notice what's going on and the >> frequency >> set for that period can make a difference between sufficient and >> insufficient >> provisioning. >> >> >> What the patch does is to increase the likelihood that the frequency in >> question will be sufficient to avoid noticeable effects (such as audio >> cracks) >> and it tends to do the trick most of the time. >> >> [Of course, you may argue that this is related to the rate limitting in >> schedutil and intel_pstate, but then PELT itself is sampled periodically >> AFAICS.] >> >> > > The patch essentially repeats the trick added some time ago to the >> > > load-based >> > > P-state selection algorithm in intel_pstate, which allowed us to make >> > > it viable >> > > for performance-oriented users, and which is to reduce frequencies at >> > > a slower >> > > pace. >> > > >> > > The reason why I chose the average is because it is computationally >> > > cheap >> > > and pretty much the max reasonable slowdown and the idea is that in >> > > case >> > > there's something about to run that we don't know about yet, it is >> > > better to >> > > stay at a higher level for a while more to avoid having to get up from >> > > the floor >> > > every time. >> > > >> > >> > Another approach we have been playing with on Android (to solve what >> > seem to me similar issues) is to have decoupled up and down frequency >> > changes thresholds. With this you can decide how much quick to react to >> > a sudden increase in utilization and how much "hysteresis" you want to >> > have before slowing down. Every platfrom can also be easily tuned as >> > needed (instead of having the same filter applied for every platform). >> >> > >> > We seemed to actually recently come to the consideration that the up >> > threshold is probably not much needed (and it is in fact set to very >> > small values in practice). Once one is confident that the utilization >> > signal is not too jumpy, responding to a request for additional capacity >> > quickly seems the right thing to do (especially for RT/DL policies). >> > >> > What's your opinion? >> >> As I said, responding to increased load may take a whole period to notice >> and it looks like what happens during that period may be quite important. >> >> To me, thresholds have a problem that from the algorithm perspective they >> are constant values set externally. This means they likely need to be >> tuned >> once in a while by whatever entity that had set them (it is difficult to >> imagine that the same values will always be suitable for every workload) >> and >> that means an additional layer of (dynamic) control on top of the >> governor. >> >> > > But technically speaking it is a filter. :-) >> > > >> > > So among other things I'm wondering if that leads to substantial >> > > increases in >> > > energy consumption anywhere. >> > > >> > >> > Having a tunable might help getting the tradeoff right for different >> > platforms, maybe? >> >> It might, but it would mean additional computational cost (at least one >> more >> integer multiplication AFAICS). >> >> > As we discussed at the last LPC, having an energy model handy and use >> > that to decide how quickly to ramp up or slow down seems the desirable >> > long term solution, but we probably need something (as you are >> > proposing) until we get there. >> >> Well, we definitely need something to address real use cases, like the one >> that >> I responded to with this patch. :-) > > I don't know the history/intent behind schedutil rate limiting, but if we > make it to be only "down" as Juri mentioned we would not be adding a new > tunable but rather changing the current one to be more restricted (maybe > some renaming would be in order if this is done), this would provide > hysteresis to reduce this problem without locking the amount of the > hysteresis which may not work for all platforms. I also agree that "it is > difficult to imagine that the same values will always be suitable for every > workload", but without any value to control the whole system, we get nothing > in between. Ultimately I also think we should solve the hysteresis problem > at the root, i.e. the input to the governor in the form of util/load that > has not only hysteresis and energy model, but also any other behavioral > inputs built-in. > > Thanks, > Andres >> >> >> Thanks, >> Rafael >> >
On Sun, Apr 2, 2017 at 1:29 AM, Andres Oportus <andresoportus@android.com> wrote: > On Sat, Apr 1, 2017 at 1:39 PM, Andres Oportus > <andresoportus@android.com> wrote: >> Hi Rafael, Juri, [cut] >>> >>> > As we discussed at the last LPC, having an energy model handy and use >>> > that to decide how quickly to ramp up or slow down seems the desirable >>> > long term solution, but we probably need something (as you are >>> > proposing) until we get there. >>> >>> Well, we definitely need something to address real use cases, like the one >>> that >>> I responded to with this patch. :-) >> >> I don't know the history/intent behind schedutil rate limiting, but if we Basically, the hardware cannot be requested to change the frequency too often as that would be too expensive in general. >> make it to be only "down" as Juri mentioned we would not be adding a new >> tunable but rather changing the current one to be more restricted (maybe >> some renaming would be in order if this is done), this would provide >> hysteresis to reduce this problem without locking the amount of the >> hysteresis which may not work for all platforms. I also agree that "it is >> difficult to imagine that the same values will always be suitable for every >> workload", but without any value to control the whole system, we get nothing >> in between. Ultimately I also think we should solve the hysteresis problem >> at the root, i.e. the input to the governor in the form of util/load that >> has not only hysteresis and energy model, but also any other behavioral >> inputs built-in. That's long-term, though, and besides knobs only help if users change the defaults. From my experience they usually don't do that. Thanks, Rafael
On Sat, Apr 1, 2017 at 7:03 PM, Rafael J. Wysocki <rafael@kernel.org> wrote: > On Sun, Apr 2, 2017 at 1:29 AM, Andres Oportus > <andresoportus@android.com> wrote: >> On Sat, Apr 1, 2017 at 1:39 PM, Andres Oportus >> <andresoportus@android.com> wrote: >>> Hi Rafael, Juri, > > [cut] > >>>> >>>> > As we discussed at the last LPC, having an energy model handy and use >>>> > that to decide how quickly to ramp up or slow down seems the desirable >>>> > long term solution, but we probably need something (as you are >>>> > proposing) until we get there. >>>> >>>> Well, we definitely need something to address real use cases, like the one >>>> that >>>> I responded to with this patch. :-) >>> >>> I don't know the history/intent behind schedutil rate limiting, but if we > > Basically, the hardware cannot be requested to change the frequency > too often as that would be too expensive in general. Expensiveness of changing the frequency too often is one of the reasons we need to have down throttle in Android, but we can't have this value also slow down the frequency increases (so up throttle being equal to down throttle is bad for us) which would affect interactivity/response-time. > >>> make it to be only "down" as Juri mentioned we would not be adding a new >>> tunable but rather changing the current one to be more restricted (maybe >>> some renaming would be in order if this is done), this would provide >>> hysteresis to reduce this problem without locking the amount of the >>> hysteresis which may not work for all platforms. I also agree that "it is >>> difficult to imagine that the same values will always be suitable for every >>> workload", but without any value to control the whole system, we get nothing >>> in between. Ultimately I also think we should solve the hysteresis problem >>> at the root, i.e. the input to the governor in the form of util/load that >>> has not only hysteresis and energy model, but also any other behavioral >>> inputs built-in. > > That's long-term, though, and besides knobs only help if users change > the defaults. From my experience they usually don't do that. Agreed that it is long term. Regarding knobs, we are already tuning the down throttle value for Android (we currently do 50ms down for the similar knob in schedfreq). > > Thanks, > Rafael Thanks, Andres
On Thu, Mar 30, 2017 at 11:36 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > The schedutil governor reduces frequencies too fast in some > situations which cases undesirable performance drops to > appear. > > To address that issue, make schedutil reduce the frequency slower by > setting it to the average of the value chosen during the previous > iteration of governor computations and the new one coming from its > frequency selection formula. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=194963 > Reported-by: John <john.ettedgui@gmail.com> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > > This addresses a practical issue, but one in the "responsiveness" or > "interactivity" category which is quite hard to represent quantitatively. > > As reported by John in BZ194963, schedutil does not ramp up P-states quickly > enough which causes audio issues to appear in his gaming setup. At least it > evidently is worse than ondemand in this respect and the patch below helps. > > The patch essentially repeats the trick added some time ago to the load-based > P-state selection algorithm in intel_pstate, which allowed us to make it viable > for performance-oriented users, and which is to reduce frequencies at a slower > pace. > > The reason why I chose the average is because it is computationally cheap > and pretty much the max reasonable slowdown and the idea is that in case > there's something about to run that we don't know about yet, it is better to > stay at a higher level for a while more to avoid having to get up from the floor > every time. > > But technically speaking it is a filter. :-) > > So among other things I'm wondering if that leads to substantial increases in > energy consumption anywhere. I haven't seen any numbers indicating that, so since the issue addressed by this patch is real, I'd like to go ahead with it for 4.12. Thanks, Rafael
Hi Rafael, On 11/04/17 23:23, Rafael J. Wysocki wrote: > On Thu, Mar 30, 2017 at 11:36 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > The schedutil governor reduces frequencies too fast in some > > situations which cases undesirable performance drops to > > appear. > > > > To address that issue, make schedutil reduce the frequency slower by > > setting it to the average of the value chosen during the previous > > iteration of governor computations and the new one coming from its > > frequency selection formula. > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=194963 > > Reported-by: John <john.ettedgui@gmail.com> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > --- > > > > This addresses a practical issue, but one in the "responsiveness" or > > "interactivity" category which is quite hard to represent quantitatively. > > > > As reported by John in BZ194963, schedutil does not ramp up P-states quickly > > enough which causes audio issues to appear in his gaming setup. At least it > > evidently is worse than ondemand in this respect and the patch below helps. > > > > The patch essentially repeats the trick added some time ago to the load-based > > P-state selection algorithm in intel_pstate, which allowed us to make it viable > > for performance-oriented users, and which is to reduce frequencies at a slower > > pace. > > > > The reason why I chose the average is because it is computationally cheap > > and pretty much the max reasonable slowdown and the idea is that in case > > there's something about to run that we don't know about yet, it is better to > > stay at a higher level for a while more to avoid having to get up from the floor > > every time. > > > > But technically speaking it is a filter. :-) > > > > So among other things I'm wondering if that leads to substantial increases in > > energy consumption anywhere. > > I haven't seen any numbers indicating that, so since the issue > addressed by this patch is real, I'd like to go ahead with it for > 4.12. > Sorry, it took a bit of time to get results. I didn't see big power/perf regressions on interactive type of workloads running on Pixel phones. Youtube only showed a +~2% energy increase, but I can't say for certain that is not in the noise. My experiments also have your other two patches [1] applied. So, yeah. I guess we could go ahead with this and see if we can improve further in the future. Thanks, - Juri [1] https://marc.info/?l=linux-kernel&m=149178351728004&w=2
On 30-03-17, 23:36, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > The schedutil governor reduces frequencies too fast in some > situations which cases undesirable performance drops to > appear. > > To address that issue, make schedutil reduce the frequency slower by > setting it to the average of the value chosen during the previous > iteration of governor computations and the new one coming from its > frequency selection formula. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=194963 > Reported-by: John <john.ettedgui@gmail.com> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > > This addresses a practical issue, but one in the "responsiveness" or > "interactivity" category which is quite hard to represent quantitatively. > > As reported by John in BZ194963, schedutil does not ramp up P-states quickly > enough which causes audio issues to appear in his gaming setup. At least it > evidently is worse than ondemand in this respect and the patch below helps. > > The patch essentially repeats the trick added some time ago to the load-based > P-state selection algorithm in intel_pstate, which allowed us to make it viable > for performance-oriented users, and which is to reduce frequencies at a slower > pace. > > The reason why I chose the average is because it is computationally cheap > and pretty much the max reasonable slowdown and the idea is that in case > there's something about to run that we don't know about yet, it is better to > stay at a higher level for a while more to avoid having to get up from the floor > every time. > > But technically speaking it is a filter. :-) > > So among other things I'm wondering if that leads to substantial increases in > energy consumption anywhere. > > Thanks, > Rafael > > --- > kernel/sched/cpufreq_schedutil.c | 3 +++ > 1 file changed, 3 insertions(+) > > Index: linux-pm/kernel/sched/cpufreq_schedutil.c > =================================================================== > --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c > +++ linux-pm/kernel/sched/cpufreq_schedutil.c > @@ -101,6 +101,9 @@ static void sugov_update_commit(struct s > if (sg_policy->next_freq == next_freq) > return; > Maybe add a comment over here on why we are adding this workaround . > + if (sg_policy->next_freq > next_freq) > + next_freq = (sg_policy->next_freq + next_freq) >> 1; > + > sg_policy->next_freq = next_freq; > sg_policy->last_freq_update_time = time; Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Index: linux-pm/kernel/sched/cpufreq_schedutil.c =================================================================== --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c +++ linux-pm/kernel/sched/cpufreq_schedutil.c @@ -101,6 +101,9 @@ static void sugov_update_commit(struct s if (sg_policy->next_freq == next_freq) return; + if (sg_policy->next_freq > next_freq) + next_freq = (sg_policy->next_freq + next_freq) >> 1; + sg_policy->next_freq = next_freq; sg_policy->last_freq_update_time = time;