Message ID | 20211103161020.26714-5-lukasz.luba@arm.com (mailing list archive) |
---|---|
State | Superseded |
Series | Refactor thermal pressure update to avoid code duplication |
Hi Lukasz,

On 11/3/21 12:10 PM, Lukasz Luba wrote:
> Thermal pressure provides a new API, which allows using the CPU frequency
> as an argument. That removes the need for local conversion to capacity.
> Use this new API and remove the old local conversion code.
>
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>  drivers/cpufreq/qcom-cpufreq-hw.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index 0138b2ec406d..425f351450ad 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -275,10 +275,10 @@ static unsigned int qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)
>
>  static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>  {
> -	unsigned long max_capacity, capacity, freq_hz, throttled_freq;
>  	struct cpufreq_policy *policy = data->policy;
>  	int cpu = cpumask_first(policy->cpus);
>  	struct device *dev = get_cpu_device(cpu);
> +	unsigned long freq_hz, throttled_freq;
>  	struct dev_pm_opp *opp;
>  	unsigned int freq;
>
> @@ -295,17 +295,12 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>
>  	throttled_freq = freq_hz / HZ_PER_KHZ;
>
> -	/* Update thermal pressure */
> -
> -	max_capacity = arch_scale_cpu_capacity(cpu);
> -	capacity = mult_frac(max_capacity, throttled_freq, policy->cpuinfo.max_freq);
> -
>  	/* Don't pass boost capacity to scheduler */
> -	if (capacity > max_capacity)
> -		capacity = max_capacity;

So, I think this should go into the common
topology_update_thermal_pressure in lieu of

+	if (WARN_ON(max_freq < capped_freq))
+		return;

This will fix the issue Steev Klimaszewski has been reporting
https://lore.kernel.org/linux-arm-kernel/3cba148a-7077-7b6b-f131-dc65045aa348@arm.com/
Hi Thara,

+CC Steev, who discovered this issue with boost frequency

On 11/5/21 7:12 PM, Thara Gopinath wrote:
> Hi Lukasz,
>
> On 11/3/21 12:10 PM, Lukasz Luba wrote:
>> Thermal pressure provides a new API, which allows using the CPU frequency
>> as an argument. That removes the need for local conversion to capacity.
>> Use this new API and remove the old local conversion code.
>>
>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>> ---
>>  drivers/cpufreq/qcom-cpufreq-hw.c | 15 +++++----------
>>  1 file changed, 5 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
>> index 0138b2ec406d..425f351450ad 100644
>> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
>> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
>> @@ -275,10 +275,10 @@ static unsigned int qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)
>>
>>  static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>>  {
>> -	unsigned long max_capacity, capacity, freq_hz, throttled_freq;
>>  	struct cpufreq_policy *policy = data->policy;
>>  	int cpu = cpumask_first(policy->cpus);
>>  	struct device *dev = get_cpu_device(cpu);
>> +	unsigned long freq_hz, throttled_freq;
>>  	struct dev_pm_opp *opp;
>>  	unsigned int freq;
>>
>> @@ -295,17 +295,12 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>>
>>  	throttled_freq = freq_hz / HZ_PER_KHZ;
>>
>> -	/* Update thermal pressure */
>> -
>> -	max_capacity = arch_scale_cpu_capacity(cpu);
>> -	capacity = mult_frac(max_capacity, throttled_freq, policy->cpuinfo.max_freq);
>> -
>>  	/* Don't pass boost capacity to scheduler */
>> -	if (capacity > max_capacity)
>> -		capacity = max_capacity;
>
> So, I think this should go into the common
> topology_update_thermal_pressure in lieu of
>
> +	if (WARN_ON(max_freq < capped_freq))
> +		return;
>
> This will fix the issue Steev Klimaszewski has been reporting
> https://lore.kernel.org/linux-arm-kernel/3cba148a-7077-7b6b-f131-dc65045aa348@arm.com/

Well, I think the issue is broader. Look at the code which
calculates this 'capacity'. It's just a multiplication and division:

	max_capacity = arch_scale_cpu_capacity(cpu); // =1024 in our case
	capacity = mult_frac(max_capacity, throttled_freq, policy->cpuinfo.max_freq);

From the sysfs cpufreq output reported by Steev we know
that the value of 'policy->cpuinfo.max_freq' is:
/sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:2956800

so when we put the values into the equation we get:
	capacity = 1024 * 2956800 / 2956800; // =1024
The 'capacity' will always be <= 1024 and this check won't
be triggered:

	/* Don't pass boost capacity to scheduler */
	if (capacity > max_capacity)
		capacity = max_capacity;

IIUC your original code, you don't want this boost
frequency to be treated as 1024 capacity. The reason is that
the whole capacity machinery in arch_topology.c is calculated based
on the max freq value = 2841600,
so the max capacity 1024 would be pinned to that frequency
(according to Steev's log:
[ 22.552273] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956) for CPUs [4-7]
)

With all this in mind, the multiplication and division in your
original code should have been:

	capacity = 1024 * 2956800 / 2841600; // = 1065

then clamped to 1024.

My change just unveiled this division issue.

With that in mind, I tend to agree that I should not have
relied on the passed boost freq value and should try applying the
check you suggested. Let me experiment with that...

Regards,
Lukasz
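The arithmetic in the message above can be checked with a small userspace sketch. This is only an illustration, not kernel code: `freq_to_capacity()` and `clamp_capacity()` are hypothetical helper names, plain 64-bit integer arithmetic stands in for the kernel's `mult_frac()`, and the frequencies (in kHz) are the ones quoted from Steev's sysfs output and log.

```c
#include <stdint.h>

/* Value of arch_scale_cpu_capacity(cpu) assumed in the thread. */
#define MAX_CAPACITY 1024ULL

/*
 * Hypothetical helper: capacity implied by freq_khz relative to the
 * reference frequency ref_khz. Integer division truncates, as in C.
 */
static uint64_t freq_to_capacity(uint64_t freq_khz, uint64_t ref_khz)
{
	return MAX_CAPACITY * freq_khz / ref_khz;
}

/* Hypothetical helper mirroring the "Don't pass boost capacity" clamp. */
static uint64_t clamp_capacity(uint64_t capacity)
{
	return capacity > MAX_CAPACITY ? MAX_CAPACITY : capacity;
}
```

With 2956800 as both arguments the result is exactly 1024, so the clamp never fires; with 2841600 as the reference the result is 1065, which the clamp brings back to 1024.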
On 11/8/21 9:12 AM, Lukasz Luba wrote:

...snip

> Well, I think the issue is broader. Look at the code which
> calculates this 'capacity'. It's just a multiplication and division:
>
> 	max_capacity = arch_scale_cpu_capacity(cpu); // =1024 in our case
> 	capacity = mult_frac(max_capacity, throttled_freq, policy->cpuinfo.max_freq);
>
> From the sysfs cpufreq output reported by Steev we know
> that the value of 'policy->cpuinfo.max_freq' is:
> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:2956800
>
> so when we put the values into the equation we get:
> 	capacity = 1024 * 2956800 / 2956800; // =1024
> The 'capacity' will always be <= 1024 and this check won't
> be triggered:
>
> 	/* Don't pass boost capacity to scheduler */
> 	if (capacity > max_capacity)
> 		capacity = max_capacity;
>
> IIUC your original code, you don't want this boost
> frequency to be treated as 1024 capacity. The reason is that
> the whole capacity machinery in arch_topology.c is calculated based
> on the max freq value = 2841600,
> so the max capacity 1024 would be pinned to that frequency
> (according to Steev's log:
> [ 22.552273] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956) for CPUs [4-7]
> )

Hi Lukasz,

Yes, you are right in that I was using policy->cpuinfo.max_freq whereas
I should have used freq_factor. So now that you are using freq_factor,
it makes sense to cap the capacity at the max capacity calculated by the
scheduler.

I agree that the problem is complex because at some point we should look
at rebuilding the topology based on changes to policy->cpuinfo.max_freq.

> With all this in mind, the multiplication and division in your
> original code should have been:
>
> 	capacity = 1024 * 2956800 / 2841600; // = 1065
>
> then clamped to 1024.
>
> My change just unveiled this division issue.
>
> With that in mind, I tend to agree that I should not have
> relied on the passed boost freq value and should try applying the
> check you suggested. Let me experiment with that...
>
> Regards,
> Lukasz
On 11/8/21 9:23 PM, Thara Gopinath wrote:
> On 11/8/21 9:12 AM, Lukasz Luba wrote:
>
> ...snip
>
>> Well, I think the issue is broader. Look at the code which
>> calculates this 'capacity'. It's just a multiplication and division:
>>
>> 	max_capacity = arch_scale_cpu_capacity(cpu); // =1024 in our case
>> 	capacity = mult_frac(max_capacity, throttled_freq, policy->cpuinfo.max_freq);
>>
>> From the sysfs cpufreq output reported by Steev we know
>> that the value of 'policy->cpuinfo.max_freq' is:
>> /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:2956800
>>
>> so when we put the values into the equation we get:
>> 	capacity = 1024 * 2956800 / 2956800; // =1024
>> The 'capacity' will always be <= 1024 and this check won't
>> be triggered:
>>
>> 	/* Don't pass boost capacity to scheduler */
>> 	if (capacity > max_capacity)
>> 		capacity = max_capacity;
>>
>> IIUC your original code, you don't want this boost
>> frequency to be treated as 1024 capacity. The reason is that
>> the whole capacity machinery in arch_topology.c is calculated based
>> on the max freq value = 2841600,
>> so the max capacity 1024 would be pinned to that frequency
>> (according to Steev's log:
>> [ 22.552273] THERMAL_PRESSURE: max_freq(2841) < capped_freq(2956) for CPUs [4-7]
>> )
>
> Hi Lukasz,
>
> Yes, you are right in that I was using policy->cpuinfo.max_freq whereas
> I should have used freq_factor. So now that you are using freq_factor,
> it makes sense to cap the capacity at the max capacity calculated by the
> scheduler.
>
> I agree that the problem is complex because at some point we should look
> at rebuilding the topology based on changes to policy->cpuinfo.max_freq.

I probably cannot fix your driver easily right now. What I can do, and
what is actually required for this new API arch_update_thermal_pressure(),
is to accept boost frequencies (values which are higher than 'freq_factor')
without triggering a warning and just set the thermal pressure to 0
(since we are told that the frequency capping is completely removed even
for boost values).

The next step would be a longer investigation into how the boost
frequencies are accepted and then triggered/used by the scheduler and
other involved machinery. I've asked Steev for help with setting up this
Rockchip RK3399 new boost frequency which actually is used. I want to
understand why that platform is able to use the boost freq and this Qcom
SoC is not.

I agree with you that at some point we might need to try rebuilding
the topology information based on these policy->cpuinfo.max_freq changes.

I hope it would take only a few steps to fix these issues completely,
without destroying a lot of existing code...

Regards,
Lukasz
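The behaviour proposed above (accept a capped frequency above the reference maximum and report zero pressure instead of warning) could look roughly like the following userspace model. It is a sketch under stated assumptions, not the kernel's arch_update_thermal_pressure(): the function name `thermal_pressure_model()`, its parameters, and the use of kHz for both frequencies are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical model: thermal pressure is the capacity lost to capping,
 * i.e. max_capacity minus the capacity reachable at capped_freq.
 * A capped_freq at or above max_freq (for example a boost frequency)
 * means no capping is in effect, so the pressure is 0.
 */
static uint64_t thermal_pressure_model(uint64_t max_capacity,
				       uint64_t capped_freq_khz,
				       uint64_t max_freq_khz)
{
	uint64_t capacity;

	if (capped_freq_khz >= max_freq_khz)
		return 0;

	capacity = max_capacity * capped_freq_khz / max_freq_khz;
	return max_capacity - capacity;
}
```

For the values from the thread, a capped frequency of 2956800 against a reference of 2841600 yields a pressure of 0, while capping to half the reference frequency yields 512 out of 1024.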
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 0138b2ec406d..425f351450ad 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -275,10 +275,10 @@ static unsigned int qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)
 
 static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
 {
-	unsigned long max_capacity, capacity, freq_hz, throttled_freq;
 	struct cpufreq_policy *policy = data->policy;
 	int cpu = cpumask_first(policy->cpus);
 	struct device *dev = get_cpu_device(cpu);
+	unsigned long freq_hz, throttled_freq;
 	struct dev_pm_opp *opp;
 	unsigned int freq;
 
@@ -295,17 +295,12 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
 
 	throttled_freq = freq_hz / HZ_PER_KHZ;
 
-	/* Update thermal pressure */
-
-	max_capacity = arch_scale_cpu_capacity(cpu);
-	capacity = mult_frac(max_capacity, throttled_freq, policy->cpuinfo.max_freq);
-
 	/* Don't pass boost capacity to scheduler */
-	if (capacity > max_capacity)
-		capacity = max_capacity;
+	if (throttled_freq > policy->cpuinfo.max_freq)
+		throttled_freq = policy->cpuinfo.max_freq;
 
-	arch_set_thermal_pressure(policy->related_cpus,
-				  max_capacity - capacity);
+	/* Update thermal pressure */
+	arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
 
 	/*
 	 * In the unlikely case policy is unregistered do not enable
Thermal pressure provides a new API, which allows using the CPU frequency
as an argument. That removes the need for local conversion to capacity.
Use this new API and remove the old local conversion code.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/cpufreq/qcom-cpufreq-hw.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)