Message ID | 20230901130312.247719-5-vincent.guittot@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Series | consolidate and cleanup CPU capacity |
On 9/1/23 14:03, Vincent Guittot wrote:
> The last item of a performance domain is not always the performance point
> that has been used to compute the CPU's capacity. This can lead to a
> different target frequency compared with other parts of the system, like
> schedutil, and would result in a wrong energy estimation.
>
> A new arch_scale_freq_ref() is available to return a fixed and coherent
> frequency reference that can be used when computing the CPU's frequency
> for a level of utilization. Use this function when available, or fall back
> to the last performance domain item otherwise.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  include/linux/energy_model.h | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index b9caa01dfac4..7ee07be6928e 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -204,6 +204,20 @@ struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
>  	return ps;
>  }
>
> +#ifdef arch_scale_freq_ref
> +static __always_inline
> +unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)
> +{
> +	return arch_scale_freq_ref(cpu);
> +}
> +#else
> +static __always_inline
> +unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)
> +{
> +	return pd->table[pd->nr_perf_states - 1].frequency;
> +}
> +#endif
> +
>  /**
>   * em_cpu_energy() - Estimates the energy consumed by the CPUs of a
>   *                performance domain
> @@ -224,7 +238,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>  				unsigned long max_util, unsigned long sum_util,
>  				unsigned long allowed_cpu_cap)
>  {
> -	unsigned long freq, scale_cpu;
> +	unsigned long freq, ref_freq, scale_cpu;
>  	struct em_perf_state *ps;
>  	int cpu;
>
> @@ -241,11 +255,11 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>  	 */
>  	cpu = cpumask_first(to_cpumask(pd->cpus));
>  	scale_cpu = arch_scale_cpu_capacity(cpu);
> -	ps = &pd->table[pd->nr_perf_states - 1];
> +	ref_freq = arch_scale_freq_ref_em(cpu, pd);
>
>  	max_util = map_util_perf(max_util);
>  	max_util = min(max_util, allowed_cpu_cap);
> -	freq = map_util_freq(max_util, ps->frequency, scale_cpu);
> +	freq = map_util_freq(max_util, ref_freq, scale_cpu);
>
>  	/*
>  	 * Find the lowest performance state of the Energy Model above the

LGTM,

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

FYI, I'm going to send my v4 for the EM, hopefully in the next few days, so
those changes might collide. But we can sort this out later (when both
are ready).

Regards,
Lukasz
Hello Vincent,
I tried the patch-set on a platform using cppc_cpufreq and that has boosting
frequencies,

1-
On such a platform, the CPU capacity comes from the CPPC highest_frequency
field. The CPU capacity is set to the capacity of the boosting frequency.
This behaviour is different from DT platforms where the CPU capacity is
updated whenever the boosting mode is enabled (it seems).

Wouldn't it be better to have CPU max capacities set to their boosting
capacity as for CPPC-based platforms? It seems the max frequency is always
available somehow for all the cpufreq drivers with boosting available, i.e.
acpi-cpufreq, amd-pstate, cppc_cpufreq.

2-
On the CPPC-based platforms, the per_cpu freq_factor is not used/updated,
meaning that we have:
arch_scale_freq_ref_em()
\-arch_scale_freq_ref()
  \-topology_get_freq_ref()
    \-per_cpu(freq_factor, cpu) (set to the default value: 1)
and em_cpu_energy()'s ref_freq variable is then set to 1 instead of the max
frequency (leading to a 0 energy computation).

3-
Also just in case, arch_scale_freq_ref_policy() and cpufreq_get_hw_max_freq()
seem to have close (but not identical) purpose,

Regards,
Pierre

On 9/1/23 15:03, Vincent Guittot wrote:
> The last item of a performance domain is not always the performance point
> that has been used to compute the CPU's capacity. This can lead to a
> different target frequency compared with other parts of the system, like
> schedutil, and would result in a wrong energy estimation.
>
> A new arch_scale_freq_ref() is available to return a fixed and coherent
> frequency reference that can be used when computing the CPU's frequency
> for a level of utilization. Use this function when available, or fall back
> to the last performance domain item otherwise.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  include/linux/energy_model.h | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index b9caa01dfac4..7ee07be6928e 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -204,6 +204,20 @@ struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
>  	return ps;
>  }
>
> +#ifdef arch_scale_freq_ref
> +static __always_inline
> +unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)
> +{
> +	return arch_scale_freq_ref(cpu);
> +}
> +#else
> +static __always_inline
> +unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)
> +{
> +	return pd->table[pd->nr_perf_states - 1].frequency;
> +}
> +#endif
> +
>  /**
>   * em_cpu_energy() - Estimates the energy consumed by the CPUs of a
>   *                performance domain
> @@ -224,7 +238,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>  				unsigned long max_util, unsigned long sum_util,
>  				unsigned long allowed_cpu_cap)
>  {
> -	unsigned long freq, scale_cpu;
> +	unsigned long freq, ref_freq, scale_cpu;
>  	struct em_perf_state *ps;
>  	int cpu;
>
> @@ -241,11 +255,11 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>  	 */
>  	cpu = cpumask_first(to_cpumask(pd->cpus));
>  	scale_cpu = arch_scale_cpu_capacity(cpu);
> -	ps = &pd->table[pd->nr_perf_states - 1];
> +	ref_freq = arch_scale_freq_ref_em(cpu, pd);
>
>  	max_util = map_util_perf(max_util);
>  	max_util = min(max_util, allowed_cpu_cap);
> -	freq = map_util_freq(max_util, ps->frequency, scale_cpu);
> +	freq = map_util_freq(max_util, ref_freq, scale_cpu);
>
>  	/*
>  	 * Find the lowest performance state of the Energy Model above the
On Tue, Sep 05, 2023 at 12:05:30PM +0200, Pierre Gondois wrote:
> Hello Vincent,
> I tried the patch-set on a platform using cppc_cpufreq and that has boosting
> frequencies,
>
> 1-
> On such a platform, the CPU capacity comes from the CPPC highest_frequency
> field. The CPU capacity is set to the capacity of the boosting frequency.
> This behaviour is different from DT platforms where the CPU capacity is
> updated whenever the boosting mode is enabled (it seems).
>
> Wouldn't it be better to have CPU max capacities set to their boosting
> capacity as for CPPC-based platforms? It seems the max frequency is always
> available somehow for all the cpufreq drivers with boosting available, i.e.
> acpi-cpufreq, amd-pstate, cppc_cpufreq.

So on Intel we don't use the max (turbo) boost value, but typically end
up picking the 4-core turbo value or something. There's a comment in
arch/x86/kernel/cpu/aperfmperf.c.

Per that comment it probably makes sense to be able to differentiate
between a mobile device and a server, or perhaps we can (ab)use the EAS
enable knob for this distinction?

That is, I'm not sure it makes sense to always pick the highest boost
frequency for ARM64 servers, very much analogous to how we don't do that
on Intel.
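The choice of reference matters because capacity is expressed relative to it. As a rough user-space sketch (not kernel code), the hypothetical helper freq_to_capacity() below scales a frequency against a chosen reference, assuming a full-scale capacity of 1024; the local SCHED_CAPACITY_SCALE definition and the sample frequencies are assumptions made for illustration only.

```c
#include <assert.h>

/* Local stand-in for the kernel's full-scale capacity value (assumed 1024). */
#define SCHED_CAPACITY_SCALE 1024UL

/*
 * Hypothetical helper: capacity delivered at freq_khz when the CPU's
 * maximum capacity is defined at ref_freq_khz.
 */
static unsigned long freq_to_capacity(unsigned long freq_khz,
				      unsigned long ref_freq_khz)
{
	assert(ref_freq_khz != 0);
	return SCHED_CAPACITY_SCALE * freq_khz / ref_freq_khz;
}
```

With a sustained reference of 2000000 kHz, a 2500000 kHz boost state maps to 1280, above full scale; with the boost state as the reference, the sustained frequency maps to only 819.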
On Tue, 5 Sept 2023 at 12:05, Pierre Gondois <pierre.gondois@arm.com> wrote:
>
> Hello Vincent,
> I tried the patch-set on a platform using cppc_cpufreq and that has boosting
> frequencies,
>
> 1-
> On such a platform, the CPU capacity comes from the CPPC highest_frequency
> field. The CPU capacity is set to the capacity of the boosting frequency.
> This behaviour is different from DT platforms where the CPU capacity is
> updated whenever the boosting mode is enabled (it seems).

OK, I hadn't noticed that cppc_cpufreq would be impacted by this change
in arch_topology. I'm going to check how to fix that.

>
> Wouldn't it be better to have CPU max capacities set to their boosting
> capacity as for CPPC-based platforms? It seems the max frequency is always
> available somehow for all the cpufreq drivers with boosting available, i.e.
> acpi-cpufreq, amd-pstate, cppc_cpufreq.

Some platforms will never enable boost, or boost is only temporarily
available before being capped. As a result, some prefer to use a more
sustainable freq for their max capacity. That's why we can't always
use the max/boost freq.

>
>
> 2-
> On the CPPC-based platforms, the per_cpu freq_factor is not used/updated,
> meaning that we have:
> arch_scale_freq_ref_em()
> \-arch_scale_freq_ref()
>   \-topology_get_freq_ref()
>     \-per_cpu(freq_factor, cpu) (set to the default value: 1)
> and em_cpu_energy()'s ref_freq variable is then set to 1 instead of the max
> frequency (leading to a 0 energy computation).

IIUC, cppc uses the default CPU capacity of arch_topology and then
never updates it, and it creates an EM for this SMP system.

OK, so you have an EM set up with ACPI and SMP. I'm going to check
where we could set this reference frequency for your case.

>
> 3-
> Also just in case, arch_scale_freq_ref_policy() and cpufreq_get_hw_max_freq()
> seem to have close (but not identical) purpose,
>
> Regards,
> Pierre
>
> On 9/1/23 15:03, Vincent Guittot wrote:
> > The last item of a performance domain is not always the performance point
> > that has been used to compute the CPU's capacity. This can lead to a
> > different target frequency compared with other parts of the system, like
> > schedutil, and would result in a wrong energy estimation.
> >
> > A new arch_scale_freq_ref() is available to return a fixed and coherent
> > frequency reference that can be used when computing the CPU's frequency
> > for a level of utilization. Use this function when available, or fall back
> > to the last performance domain item otherwise.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >  include/linux/energy_model.h | 20 +++++++++++++++++---
> >  1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> > index b9caa01dfac4..7ee07be6928e 100644
> > --- a/include/linux/energy_model.h
> > +++ b/include/linux/energy_model.h
> > @@ -204,6 +204,20 @@ struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
> >  	return ps;
> >  }
> >
> > +#ifdef arch_scale_freq_ref
> > +static __always_inline
> > +unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)
> > +{
> > +	return arch_scale_freq_ref(cpu);
> > +}
> > +#else
> > +static __always_inline
> > +unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)
> > +{
> > +	return pd->table[pd->nr_perf_states - 1].frequency;
> > +}
> > +#endif
> > +
> >  /**
> >   * em_cpu_energy() - Estimates the energy consumed by the CPUs of a
> >   *                performance domain
> > @@ -224,7 +238,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
> >  				unsigned long max_util, unsigned long sum_util,
> >  				unsigned long allowed_cpu_cap)
> >  {
> > -	unsigned long freq, scale_cpu;
> > +	unsigned long freq, ref_freq, scale_cpu;
> >  	struct em_perf_state *ps;
> >  	int cpu;
> >
> > @@ -241,11 +255,11 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
> >  	 */
> >  	cpu = cpumask_first(to_cpumask(pd->cpus));
> >  	scale_cpu = arch_scale_cpu_capacity(cpu);
> > -	ps = &pd->table[pd->nr_perf_states - 1];
> > +	ref_freq = arch_scale_freq_ref_em(cpu, pd);
> >
> >  	max_util = map_util_perf(max_util);
> >  	max_util = min(max_util, allowed_cpu_cap);
> > -	freq = map_util_freq(max_util, ps->frequency, scale_cpu);
> > +	freq = map_util_freq(max_util, ref_freq, scale_cpu);
> >
> >  	/*
> >  	 * Find the lowest performance state of the Energy Model above the
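The effect described in point 2 of the report can be reproduced outside the kernel. The sketch below re-implements map_util_perf() and map_util_freq() in user space, assuming they compute util + util / 4 and freq * util / cap respectively, as the kernel helpers did at the time of this series; em_target_freq() is a hypothetical name for the first steps of em_cpu_energy(), and the numbers are made up.

```c
#include <assert.h>

/* User-space copy of the kernel helper (assumed): add 25% headroom. */
static unsigned long map_util_perf(unsigned long util)
{
	return util + (util >> 2);
}

/* User-space copy of the kernel helper (assumed): scale freq by util / cap. */
static unsigned long map_util_freq(unsigned long util, unsigned long freq,
				   unsigned long cap)
{
	assert(cap != 0);
	return freq * util / cap;
}

/*
 * Hypothetical helper mirroring the start of em_cpu_energy():
 * add headroom, clamp to the allowed capacity, map to a frequency.
 */
static unsigned long em_target_freq(unsigned long max_util,
				    unsigned long allowed_cpu_cap,
				    unsigned long ref_freq,
				    unsigned long scale_cpu)
{
	max_util = map_util_perf(max_util);
	if (max_util > allowed_cpu_cap)
		max_util = allowed_cpu_cap;
	return map_util_freq(max_util, ref_freq, scale_cpu);
}
```

With ref_freq left at the default of 1, em_target_freq(512, 1024, 1, 1024) returns 0 because of the integer division, whereas a reference of 2000000 kHz gives 1250000 kHz. The requested frequency therefore no longer tracks utilization when the reference is never set.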
On 01/09/2023 15:03, Vincent Guittot wrote:

[...]

> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index b9caa01dfac4..7ee07be6928e 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -204,6 +204,20 @@ struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
>  	return ps;
>  }
>
> +#ifdef arch_scale_freq_ref
> +static __always_inline
> +unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)

Why is this function named with the arch prefix?

So far we have 5 arch functions (arch_scale_freq_tick() <->
arch_scale_freq_ref()) and e.g. Arm/Arm64 defines them with their
topology_foo implementations.

Isn't arch_scale_freq_ref_em() (as well as arch_scale_freq_ref_policy())
different in this sense and so a proper EM function which should
manifest in its name?

> +{
> +	return arch_scale_freq_ref(cpu);
> +}
> +#else
> +static __always_inline
> +unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)
> +{
> +	return pd->table[pd->nr_perf_states - 1].frequency;
> +}
> +#endif

[...]

> @@ -241,11 +255,11 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>  	 */
>  	cpu = cpumask_first(to_cpumask(pd->cpus));
>  	scale_cpu = arch_scale_cpu_capacity(cpu);
> -	ps = &pd->table[pd->nr_perf_states - 1];
> +	ref_freq = arch_scale_freq_ref_em(cpu, pd);

Why not use the existing `unsigned long freq` here, like in schedutil's
get_next_freq()?

>
>  	max_util = map_util_perf(max_util);

[...]
On Thu, 14 Sept 2023 at 23:07, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 01/09/2023 15:03, Vincent Guittot wrote:
>
> [...]
>
> > diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> > index b9caa01dfac4..7ee07be6928e 100644
> > --- a/include/linux/energy_model.h
> > +++ b/include/linux/energy_model.h
> > @@ -204,6 +204,20 @@ struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
> >  	return ps;
> >  }
> >
> > +#ifdef arch_scale_freq_ref
> > +static __always_inline
> > +unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)
>
> Why is this function named with the arch prefix?
>
> So far we have 5 arch functions (arch_scale_freq_tick() <->
> arch_scale_freq_ref()) and e.g. Arm/Arm64 defines them with their
> topology_foo implementations.
>
> Isn't arch_scale_freq_ref_em() (as well as arch_scale_freq_ref_policy())
> different in this sense and so a proper EM function which should
> manifest in its name?

arch_scale_freq_ref_em() is there to handle cases where
arch_scale_freq_ref() is not defined by the arch. I keep the arch_ prefix
because this should be provided by an architecture which wants to use the EM.

In the case of EM, it's only there for allyes|randconfig on an arch that
doesn't use arch_topology.c, like x86_64.

>
> > +{
> > +	return arch_scale_freq_ref(cpu);
> > +}
> > +#else
> > +static __always_inline
> > +unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)
> > +{
> > +	return pd->table[pd->nr_perf_states - 1].frequency;
> > +}
> > +#endif
>
> [...]
>
> > @@ -241,11 +255,11 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
> >  	 */
> >  	cpu = cpumask_first(to_cpumask(pd->cpus));
> >  	scale_cpu = arch_scale_cpu_capacity(cpu);
> > -	ps = &pd->table[pd->nr_perf_states - 1];
> > +	ref_freq = arch_scale_freq_ref_em(cpu, pd);
>
> Why not use the existing `unsigned long freq` here, like in schedutil's
> get_next_freq()?

I find it easier to read and understand, and it will not make any
difference in the compiled code.

>
> >
> >  	max_util = map_util_perf(max_util);
>
> [...]
>
On 15/09/2023 15:35, Vincent Guittot wrote:
> On Thu, 14 Sept 2023 at 23:07, Dietmar Eggemann
> <dietmar.eggemann@arm.com> wrote:
>>
>> On 01/09/2023 15:03, Vincent Guittot wrote:

[...]

>>> +#ifdef arch_scale_freq_ref
>>> +static __always_inline
>>> +unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)
>>
>> Why is this function named with the arch prefix?
>>
>> So far we have 5 arch functions (arch_scale_freq_tick() <->
>> arch_scale_freq_ref()) and e.g. Arm/Arm64 defines them with their
>> topology_foo implementations.
>>
>> Isn't arch_scale_freq_ref_em() (as well as arch_scale_freq_ref_policy())
>> different in this sense and so a proper EM function which should
>> manifest in its name?
>
> arch_scale_freq_ref_em() is there to handle cases where
> arch_scale_freq_ref() is not defined by the arch. I keep the arch_ prefix
> because this should be provided by an architecture which wants to use the EM.

That's correct, x86_64 with CONFIG_ENERGY_MODEL=y needs
arch_scale_freq_ref_em() returning the highest perf_state of the
perf_domain.

But this function, as opposed to arch_scale_freq_ref(), does not have to
be provided by the arch itself. It's provided by the EM instead. That's
why I doubt whether it should be named arch_scale_freq_ref_em().

> In the case of EM, it's only there for allyes|randconfig on an arch that
> doesn't use arch_topology.c, like x86_64.

[...]

>>> @@ -241,11 +255,11 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>>>  	 */
>>>  	cpu = cpumask_first(to_cpumask(pd->cpus));
>>>  	scale_cpu = arch_scale_cpu_capacity(cpu);
>>> -	ps = &pd->table[pd->nr_perf_states - 1];
>>> +	ref_freq = arch_scale_freq_ref_em(cpu, pd);
>>
>> Why not use the existing `unsigned long freq` here, like in schedutil's
>> get_next_freq()?
>
> I find it easier to read and understand, and it will not make any
> difference in the compiled code.

True, but I thought it would be easier to detect the functional
similarity between em_cpu_energy() (*) and get_next_freq().

freq = arch_scale_freq_ref_{policy,em}({policy,(cpu, pd)});

... (in case of *)

freq = map_util_freq(util, freq, max);

Just a nitpick ...

[...]
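The two spellings discussed in this exchange can be compared side by side in a small user-space sketch. Both function names below are hypothetical, and map_util_freq() is a local copy assumed to compute freq * util / cap as the kernel helper did at the time; the only difference between the variants is whether a separate ref_freq local is used.

```c
#include <assert.h>

/* User-space copy of the kernel helper (assumed): scale freq by util / cap. */
static unsigned long map_util_freq(unsigned long util, unsigned long freq,
				   unsigned long cap)
{
	assert(cap != 0);
	return freq * util / cap;
}

/* Variant used in the patch: a dedicated ref_freq local. */
static unsigned long target_freq_separate(unsigned long util,
					  unsigned long ref,
					  unsigned long cap)
{
	unsigned long freq, ref_freq;

	ref_freq = ref;
	freq = map_util_freq(util, ref_freq, cap);
	return freq;
}

/* Variant suggested in review: reuse freq for the reference value. */
static unsigned long target_freq_reused(unsigned long util,
					unsigned long ref,
					unsigned long cap)
{
	unsigned long freq;

	freq = ref;
	freq = map_util_freq(util, freq, cap);
	return freq;
}
```

Both variants return the same value for the same inputs, so the choice is one of readability rather than behaviour.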
On Friday 01 Sep 2023 at 15:03:12 (+0200), Vincent Guittot wrote:
> The last item of a performance domain is not always the performance point
> that has been used to compute the CPU's capacity. This can lead to a
> different target frequency compared with other parts of the system, like
> schedutil, and would result in a wrong energy estimation.
>
> A new arch_scale_freq_ref() is available to return a fixed and coherent
> frequency reference that can be used when computing the CPU's frequency
> for a level of utilization. Use this function when available, or fall back
> to the last performance domain item otherwise.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  include/linux/energy_model.h | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index b9caa01dfac4..7ee07be6928e 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -204,6 +204,20 @@ struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
>  	return ps;
>  }
>
> +#ifdef arch_scale_freq_ref
> +static __always_inline
> +unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)

The comments in patch 3/4 should be considered for this function and its
use as well.

Thanks,
Ionela.

> +{
> +	return arch_scale_freq_ref(cpu);
> +}
> +#else
> +static __always_inline
> +unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)
> +{
> +	return pd->table[pd->nr_perf_states - 1].frequency;
> +}
> +#endif
> +
>  /**
>   * em_cpu_energy() - Estimates the energy consumed by the CPUs of a
>   *                performance domain
> @@ -224,7 +238,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>  				unsigned long max_util, unsigned long sum_util,
>  				unsigned long allowed_cpu_cap)
>  {
> -	unsigned long freq, scale_cpu;
> +	unsigned long freq, ref_freq, scale_cpu;
>  	struct em_perf_state *ps;
>  	int cpu;
>
> @@ -241,11 +255,11 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
>  	 */
>  	cpu = cpumask_first(to_cpumask(pd->cpus));
>  	scale_cpu = arch_scale_cpu_capacity(cpu);
> -	ps = &pd->table[pd->nr_perf_states - 1];
> +	ref_freq = arch_scale_freq_ref_em(cpu, pd);
>
>  	max_util = map_util_perf(max_util);
>  	max_util = min(max_util, allowed_cpu_cap);
> -	freq = map_util_freq(max_util, ps->frequency, scale_cpu);
> +	freq = map_util_freq(max_util, ref_freq, scale_cpu);
>
>  	/*
>  	 * Find the lowest performance state of the Energy Model above the
> --
> 2.34.1
>
>
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index b9caa01dfac4..7ee07be6928e 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -204,6 +204,20 @@ struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
 	return ps;
 }
 
+#ifdef arch_scale_freq_ref
+static __always_inline
+unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)
+{
+	return arch_scale_freq_ref(cpu);
+}
+#else
+static __always_inline
+unsigned long arch_scale_freq_ref_em(int cpu, struct em_perf_domain *pd)
+{
+	return pd->table[pd->nr_perf_states - 1].frequency;
+}
+#endif
+
 /**
  * em_cpu_energy() - Estimates the energy consumed by the CPUs of a
  *                performance domain
@@ -224,7 +238,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 				unsigned long max_util, unsigned long sum_util,
 				unsigned long allowed_cpu_cap)
 {
-	unsigned long freq, scale_cpu;
+	unsigned long freq, ref_freq, scale_cpu;
 	struct em_perf_state *ps;
 	int cpu;
 
@@ -241,11 +255,11 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 */
 	cpu = cpumask_first(to_cpumask(pd->cpus));
 	scale_cpu = arch_scale_cpu_capacity(cpu);
-	ps = &pd->table[pd->nr_perf_states - 1];
+	ref_freq = arch_scale_freq_ref_em(cpu, pd);
 
 	max_util = map_util_perf(max_util);
 	max_util = min(max_util, allowed_cpu_cap);
-	freq = map_util_freq(max_util, ps->frequency, scale_cpu);
+	freq = map_util_freq(max_util, ref_freq, scale_cpu);
 
 	/*
 	 * Find the lowest performance state of the Energy Model above the
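The #ifdef in the first hunk relies on the convention that an architecture providing a hook also defines a macro of the same name, so generic code can detect it at compile time. The stand-alone sketch below shows the same pattern with simplified, hypothetical types (struct perf_state, struct perf_domain) and a hypothetical wrapper name freq_ref_em(); it is not the kernel's definition.

```c
#include <assert.h>

/* Simplified, hypothetical stand-ins for the Energy Model structures. */
struct perf_state {
	unsigned long frequency;
};

struct perf_domain {
	struct perf_state *table;
	int nr_perf_states;
};

/*
 * An architecture that provides the hook would define it as a macro, e.g.
 *   #define arch_scale_freq_ref my_arch_freq_ref
 * It is left undefined here, so the fallback branch is the one compiled.
 */
#ifdef arch_scale_freq_ref
static inline unsigned long freq_ref_em(int cpu, struct perf_domain *pd)
{
	(void)pd;
	return arch_scale_freq_ref(cpu);
}
#else
static inline unsigned long freq_ref_em(int cpu, struct perf_domain *pd)
{
	(void)cpu;
	assert(pd->nr_perf_states > 0);
	/* Fallback: use the last (highest) entry of the table. */
	return pd->table[pd->nr_perf_states - 1].frequency;
}
#endif
```

Callers use freq_ref_em() unconditionally; which body they get is decided by the preprocessor, so there is no run-time cost for the selection.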
The last item of a performance domain is not always the performance point
that has been used to compute the CPU's capacity. This can lead to a
different target frequency compared with other parts of the system, like
schedutil, and would result in a wrong energy estimation.

A new arch_scale_freq_ref() is available to return a fixed and coherent
frequency reference that can be used when computing the CPU's frequency
for a level of utilization. Use this function when available, or fall back
to the last performance domain item otherwise.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 include/linux/energy_model.h | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)
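To illustrate the mismatch the commit message describes, here is a user-space sketch with made-up numbers. It assumes map_util_freq() computes freq * util / cap, as the kernel helper did at the time; pick_state_freq() and the frequency table are hypothetical and stand in for the Energy Model lookup.

```c
#include <assert.h>
#include <stddef.h>

/* User-space copy of the kernel helper (assumed): scale freq by util / cap. */
static unsigned long map_util_freq(unsigned long util, unsigned long freq,
				   unsigned long cap)
{
	assert(cap != 0);
	return freq * util / cap;
}

/*
 * Hypothetical helper: return the lowest table frequency that is >= the
 * frequency requested for util, or the highest one if none qualifies.
 * table must be sorted in ascending order and n must be > 0.
 */
static unsigned long pick_state_freq(const unsigned long *table, size_t n,
				     unsigned long util,
				     unsigned long ref_freq,
				     unsigned long cap)
{
	unsigned long freq = map_util_freq(util, ref_freq, cap);
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		if (table[i] >= freq)
			return table[i];
	}
	return table[n - 1];
}
```

For a table of 1000000, 1500000, 2000000 and 2500000 kHz, a utilization of 640 out of 1024 selects the 1500000 kHz state when the reference is 2000000 kHz, but the 2000000 kHz state when the last table entry (2500000 kHz) is used as the reference. The energy estimate would then be based on a different performance state than the one the frequency governor would request.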