Message ID | 20200625064614.101183-1-wvw@google.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Series | cpufreq: schedutil: force frequency update when limits change |
On 24-06-20, 23:46, Wei Wang wrote:
> To avoid reducing the frequency of a CPU prematurely, we skip reducing
> the frequency if the CPU had been busy recently.
>
> This should not be done when the limits of the policy are changed, for
> example due to thermal throttling. We should always get the frequency
> within the new limits as soon as possible.
>
> There was a fix in
> commit 600f5badb78c ("cpufreq: schedutil: Don't skip freq update when
> limits change") upstream which introduced another flag. However, the
> fix didn't address the case when next_freq is the same as previously
> voted, which is then checked in sugov_update_next_freq. As a result, the
> frequency would be stuck at low until the high demanding workload quits.
>
> test trace:
> kworker/u19:0-1872 ( 1872) [002] .... 347.878871: cpu_frequency_limits: min=600000 max=2348000 cpu_id=6
> dhry64-11525 (11525) [007] d.h2 347.880012: sugov_should_update_freq: thermal limit on policy6
> dhry64-11525 (11525) [007] d.h2 347.880012: sugov_deferred_update: policy6 skipped update
> dhry64-11525 (11525) [007] d.h2 347.884040: sugov_deferred_update: policy6 skipped update

I am not sure these are helpful in the logs as the code which
generated them isn't there in the kernel.

> ...
>
> This patch fixes this by skipping the check and forcing an update in
> this case. The second flag was kept as the limits_change flag could be
> updated in thermal kworker from another CPU.

I am sorry but I am not fully sure of what the problem is. Can you
describe that by giving an example with some random frequency, and
tell the expected and actual behavior ?

> Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
> Signed-off-by: Wei Wang <wvw@google.com>
> ---
>  kernel/sched/cpufreq_schedutil.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 7fbaee24c824..dc2cd768022e 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -102,11 +102,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
>  				   unsigned int next_freq)
>  {
> -	if (sg_policy->next_freq == next_freq)
> +	if (!sg_policy->need_freq_update && sg_policy->next_freq == next_freq)

AFAIU, if the next freq is same as currently programmed one, there is
no need to force update it.

>  		return false;
>
>  	sg_policy->next_freq = next_freq;
>  	sg_policy->last_freq_update_time = time;
> +	sg_policy->need_freq_update = false;
>
>  	return true;
>  }
> @@ -178,7 +179,6 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>  	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
>  		return sg_policy->next_freq;
>
> -	sg_policy->need_freq_update = false;
>  	sg_policy->cached_raw_freq = freq;
>  	return cpufreq_driver_resolve_freq(policy, freq);
>  }
> --
> 2.27.0.212.ge8ba1cc988-goog

On Thu, Jun 25, 2020 at 3:23 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 24-06-20, 23:46, Wei Wang wrote:
> > To avoid reducing the frequency of a CPU prematurely, we skip reducing
> > the frequency if the CPU had been busy recently.
> >
> > This should not be done when the limits of the policy are changed, for
> > example due to thermal throttling. We should always get the frequency
> > within the new limits as soon as possible.
> >
> > There was a fix in
> > commit 600f5badb78c ("cpufreq: schedutil: Don't skip freq update when
> > limits change") upstream which introduced another flag. However, the
> > fix didn't address the case when next_freq is the same as previously
> > voted, which is then checked in sugov_update_next_freq. As a result, the
> > frequency would be stuck at low until the high demanding workload quits.
> >
> > test trace:
> > kworker/u19:0-1872 ( 1872) [002] .... 347.878871: cpu_frequency_limits: min=600000 max=2348000 cpu_id=6
> > dhry64-11525 (11525) [007] d.h2 347.880012: sugov_should_update_freq: thermal limit on policy6
> > dhry64-11525 (11525) [007] d.h2 347.880012: sugov_deferred_update: policy6 skipped update
> > dhry64-11525 (11525) [007] d.h2 347.884040: sugov_deferred_update: policy6 skipped update
>
> I am not sure these are helpful in the logs as the code which
> generated them isn't there in the kernel.
>

Yes, those traceprintk were added to those particular functions to help debug.

> > ...
> >
> > This patch fixes this by skipping the check and forcing an update in
> > this case. The second flag was kept as the limits_change flag could be
> > updated in thermal kworker from another CPU.
>
> I am sorry but I am not fully sure of what the problem is. Can you
> describe that by giving an example with some random frequency, and
> tell the expected and actual behavior ?
>

The problem is sugov thought next_freq already updated (but actually
skipped by the rate limit thing) and all following updates will be
skipped.

Actually this is specifically for Android common kernel 4.19's issue
which has sugov_up_down_rate_limit in sugov_update_next_freq, let's
continue discussion there.

Thanks!
-Wei

> > Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
> > Signed-off-by: Wei Wang <wvw@google.com>
> > ---
> >  kernel/sched/cpufreq_schedutil.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 7fbaee24c824..dc2cd768022e 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -102,11 +102,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> >  				   unsigned int next_freq)
> >  {
> > -	if (sg_policy->next_freq == next_freq)
> > +	if (!sg_policy->need_freq_update && sg_policy->next_freq == next_freq)
>
> AFAIU, if the next freq is same as currently programmed one, there is
> no need to force update it.
>
> >  		return false;
> >
> >  	sg_policy->next_freq = next_freq;
> >  	sg_policy->last_freq_update_time = time;
> > +	sg_policy->need_freq_update = false;
> >
> >  	return true;
> >  }
> > @@ -178,7 +179,6 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> >  	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> >  		return sg_policy->next_freq;
> >
> > -	sg_policy->need_freq_update = false;
> >  	sg_policy->cached_raw_freq = freq;
> >  	return cpufreq_driver_resolve_freq(policy, freq);
> >  }
> > --
> > 2.27.0.212.ge8ba1cc988-goog
>
> --
> viresh

On 25-06-20, 13:47, Wei Wang wrote:
> On Thu, Jun 25, 2020 at 3:23 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > I am sorry but I am not fully sure of what the problem is. Can you
> > describe that by giving an example with some random frequency, and
> > tell the expected and actual behavior ?
> >
> The problem is sugov thought next_freq already updated (but actually
> skipped by the rate limit thing) and all following updates will be
> skipped.

I am sorry, can you please give a detailed example with existing
frequency and limits, then the limits changed to new values, then what
exactly happens ?

> Actually this is specifically for Android common kernel 4.19's issue
> which has sugov_up_down_rate_limit in sugov_update_next_freq, let's
> continue discussion there.

If it is a mainline problem, we will surely get it fixed here. Just
that I am not able to understand the problem yet. Sorry about that.

On 26-06-20, 07:44, Viresh Kumar wrote:
> On 25-06-20, 13:47, Wei Wang wrote:
> > On Thu, Jun 25, 2020 at 3:23 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > I am sorry but I am not fully sure of what the problem is. Can you
> > > describe that by giving an example with some random frequency, and
> > > tell the expected and actual behavior ?
> > >
> > The problem is sugov thought next_freq already updated (but actually
> > skipped by the rate limit thing) and all following updates will be
> > skipped.

The rate-limiting thing is specific to android and not present in
mainline. Even in android I see next_freq getting updated only after
rate-limiting is verified.

I think you maybe trying to fix an android only problem in mainline,
which may not be required at all. And I am not sure if Android has a
problem as well :)

> I am sorry, can you please give a detailed example with existing
> frequency and limits, then the limits changed to new values, then what
> exactly happens ?
>
> > Actually this is specifically for Android common kernel 4.19's issue
> > which has sugov_up_down_rate_limit in sugov_update_next_freq, let's
> > continue discussion there.
>
> If it is a mainline problem, we will surely get it fixed here. Just
> that I am not able to understand the problem yet. Sorry about that.

On Thu, Jun 25, 2020 at 7:32 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 26-06-20, 07:44, Viresh Kumar wrote:
> > On 25-06-20, 13:47, Wei Wang wrote:
> > > On Thu, Jun 25, 2020 at 3:23 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > > I am sorry but I am not fully sure of what the problem is. Can you
> > > > describe that by giving an example with some random frequency, and
> > > > tell the expected and actual behavior ?
> > > >
> > > The problem is sugov thought next_freq already updated (but actually
> > > skipped by the rate limit thing) and all following updates will be
> > > skipped.
>
> The rate-limiting thing is specific to android and not present in
> mainline. Even in android I see next_freq getting updated only after
> rate-limiting is verified.
>
> I think you maybe trying to fix an android only problem in mainline,
> which may not be required at all. And I am not sure if Android has a
> problem as well :)
>

Yes, that is Android specific, I added you to the Gerrit already.

Thanks!
-Wei

> > I am sorry, can you please give a detailed example with existing
> > frequency and limits, then the limits changed to new values, then what
> > exactly happens ?
> >
> > > Actually this is specifically for Android common kernel 4.19's issue
> > > which has sugov_up_down_rate_limit in sugov_update_next_freq, let's
> > > continue discussion there.
> >
> > If it is a mainline problem, we will surely get it fixed here. Just
> > that I am not able to understand the problem yet. Sorry about that.
>
> --
> viresh

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 7fbaee24c824..dc2cd768022e 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -102,11 +102,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
 				   unsigned int next_freq)
 {
-	if (sg_policy->next_freq == next_freq)
+	if (!sg_policy->need_freq_update && sg_policy->next_freq == next_freq)
 		return false;

 	sg_policy->next_freq = next_freq;
 	sg_policy->last_freq_update_time = time;
+	sg_policy->need_freq_update = false;

 	return true;
 }
@@ -178,7 +179,6 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
 		return sg_policy->next_freq;

-	sg_policy->need_freq_update = false;
 	sg_policy->cached_raw_freq = freq;
 	return cpufreq_driver_resolve_freq(policy, freq);
 }

To avoid reducing the frequency of a CPU prematurely, we skip reducing
the frequency if the CPU had been busy recently.

This should not be done when the limits of the policy are changed, for
example due to thermal throttling. We should always get the frequency
within the new limits as soon as possible.

There was a fix in
commit 600f5badb78c ("cpufreq: schedutil: Don't skip freq update when
limits change") upstream which introduced another flag. However, the
fix didn't address the case when next_freq is the same as previously
voted, which is then checked in sugov_update_next_freq. As a result, the
frequency would be stuck at low until the high demanding workload quits.

test trace:
kworker/u19:0-1872 ( 1872) [002] .... 347.878871: cpu_frequency_limits: min=600000 max=2348000 cpu_id=6
dhry64-11525 (11525) [007] d.h2 347.880012: sugov_should_update_freq: thermal limit on policy6
dhry64-11525 (11525) [007] d.h2 347.880012: sugov_deferred_update: policy6 skipped update
dhry64-11525 (11525) [007] d.h2 347.884040: sugov_deferred_update: policy6 skipped update
...

This patch fixes this by skipping the check and forcing an update in
this case. The second flag was kept as the limits_change flag could be
updated in thermal kworker from another CPU.

Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
Signed-off-by: Wei Wang <wvw@google.com>
---
 kernel/sched/cpufreq_schedutil.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
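
For readers following the thread, here is a minimal user-space sketch of what the two hunks change. It is not kernel code. `struct model_policy`, the `_old`/`_new` function names, and `request_after_limits_change()` are hypothetical stand-ins, and `cpufreq_driver_resolve_freq()` is replaced by an identity mapping, so clamping to the new policy limits is deliberately not modeled. The sketch only shows how the `need_freq_update` flag interacts with the `next_freq` comparison before and after the patch when the same frequency is requested again right after a limits change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for struct sugov_policy. */
struct model_policy {
    unsigned int next_freq;        /* last frequency voted by the governor */
    unsigned int cached_raw_freq;  /* last raw frequency that was resolved */
    bool need_freq_update;         /* set when the policy limits changed */
};

/* Pre-patch model: the flag is consumed while computing the frequency. */
static unsigned int get_next_freq_old(struct model_policy *p, unsigned int freq)
{
    if (freq == p->cached_raw_freq && !p->need_freq_update)
        return p->next_freq;

    p->need_freq_update = false;
    p->cached_raw_freq = freq;
    return freq; /* assumption: identity instead of cpufreq_driver_resolve_freq() */
}

static bool update_next_freq_old(struct model_policy *p, unsigned int next_freq)
{
    if (p->next_freq == next_freq)
        return false;

    p->next_freq = next_freq;
    return true;
}

/* Patched model: the flag survives until the update decision is made. */
static unsigned int get_next_freq_new(struct model_policy *p, unsigned int freq)
{
    if (freq == p->cached_raw_freq && !p->need_freq_update)
        return p->next_freq;

    p->cached_raw_freq = freq;
    return freq; /* same identity assumption as above */
}

static bool update_next_freq_new(struct model_policy *p, unsigned int next_freq)
{
    if (!p->need_freq_update && p->next_freq == next_freq)
        return false;

    p->next_freq = next_freq;
    p->need_freq_update = false;
    return true;
}

/*
 * Hypothetical scenario: the limits have just changed (flag set) and the
 * governor computes the same frequency it voted for last time.
 * Returns true if the model would issue a frequency update.
 */
bool request_after_limits_change(bool patched, unsigned int freq)
{
    struct model_policy p = {
        .next_freq = freq,
        .cached_raw_freq = freq,
        .need_freq_update = true,
    };
    unsigned int next;

    assert(freq != 0); /* the model has no notion of an unset frequency */

    if (patched) {
        next = get_next_freq_new(&p, freq);
        return update_next_freq_new(&p, next);
    }

    next = get_next_freq_old(&p, freq);
    return update_next_freq_old(&p, next);
}
```

In this model, `request_after_limits_change(false, 1200000)` reports no update, because the flag is already cleared by the time the equality check runs, while `request_after_limits_change(true, 1200000)` reports an update. Whether forcing an update for an unchanged frequency is actually needed in mainline is the question raised in the review above; the sketch takes no position on that.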