diff mbox series

arm64: Provide an AMU-based version of arch_freq_get_on_cpu

Message ID 20230606155754.245998-1-beata.michalska@arm.com (mailing list archive)
State New, archived
Headers show
Series arm64: Provide an AMU-based version of arch_freq_get_on_cpu | expand

Commit Message

Beata Michalska June 6, 2023, 3:57 p.m. UTC
With the Frequency Invariance Engine (FIE) being already wired up with
sched tick and making use of relevant (core counter and constant
counter) AMU counters, getting the current frequency for a given CPU
on supported platforms, can be achieved by utilizing the frequency scale
factor which reflects an average CPU frequency for the last tick period
length.

With that at hand, arch_freq_get_on_cpu dedicated implementation
gets enrolled into cpuinfo_cur_freq policy sysfs attribute handler,
which is expected to represent the current frequency of a given CPU,
as obtained by the hardware. This is exactly the type of feedback that
cycle counters provide.

In order to avoid calling arch_freq_get_on_cpu from the scaling_cur_freq
attribute handler for platforms that do provide cpuinfo_cur_freq, and
yet keeping things intact for those platform that do not, its use gets
conditioned on the presence of cpufreq_driver (*get) callback (which also
seems to be the case for creating cpuinfo_cur_freq attribute).

Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
 arch/arm64/kernel/topology.c | 34 ++++++++++++++++++++++++++++++++++
 drivers/cpufreq/cpufreq.c    |  9 +++++++--
 2 files changed, 41 insertions(+), 2 deletions(-)

Comments

Sudeep Holla June 7, 2023, 9:58 a.m. UTC | #1
On Tue, Jun 06, 2023 at 04:57:54PM +0100, Beata Michalska wrote:
> With the Frequency Invariance Engine (FIE) being already wired up with
> sched tick and making use of relevant (core counter and constant
> counter) AMU counters, getting the current frequency for a given CPU
> on supported platforms, can be achieved by utilizing the frequency scale
> factor which reflects an average CPU frequency for the last tick period
> length.
> 
> With that at hand, arch_freq_get_on_cpu dedicated implementation
> gets enrolled into cpuinfo_cur_freq policy sysfs attribute handler,
> which is expected to represent the current frequency of a given CPU,
> as obtained by the hardware. This is exactly the type of feedback that
> cycle counters provide.
> 
> In order to avoid calling arch_freq_get_on_cpu from the scaling_cur_freq
> attribute handler for platforms that do provide cpuinfo_cur_freq, and
> yet keeping things intact for those platform that do not, its use gets
> conditioned on the presence of cpufreq_driver (*get) callback (which also
> seems to be the case for creating cpuinfo_cur_freq attribute).
>

LGTM,

Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>

However I fail to understand if both the changes are dependent ?
Can this be split into 2 patches ? I fail to see the dependency, what
am I missing ? Even if there is some dependency to get arch value
(arch_freq_get_on_cpu() from show_cpuinfo_cur_freq()), you can push
that change first followed by the arm64 change as 2 different change.
Beata Michalska June 7, 2023, 2 p.m. UTC | #2
On Wed, Jun 07, 2023 at 10:58:56AM +0100, Sudeep Holla wrote:
> On Tue, Jun 06, 2023 at 04:57:54PM +0100, Beata Michalska wrote:
> > With the Frequency Invariance Engine (FIE) being already wired up with
> > sched tick and making use of relevant (core counter and constant
> > counter) AMU counters, getting the current frequency for a given CPU
> > on supported platforms, can be achieved by utilizing the frequency scale
> > factor which reflects an average CPU frequency for the last tick period
> > length.
> > 
> > With that at hand, arch_freq_get_on_cpu dedicated implementation
> > gets enrolled into cpuinfo_cur_freq policy sysfs attribute handler,
> > which is expected to represent the current frequency of a given CPU,
> > as obtained by the hardware. This is exactly the type of feedback that
> > cycle counters provide.
> > 
> > In order to avoid calling arch_freq_get_on_cpu from the scaling_cur_freq
> > attribute handler for platforms that do provide cpuinfo_cur_freq, and
> > yet keeping things intact for those platform that do not, its use gets
> > conditioned on the presence of cpufreq_driver (*get) callback (which also
> > seems to be the case for creating cpuinfo_cur_freq attribute).
> >
> 
> LGTM,
> 
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> 
Thanks for the review.
> However I fail to understand if both the changes are dependent ?
> Can this be split into 2 patches ? I fail to see the dependency, what
> am I missing ? Even if there is some dependency to get arch value
> (arch_freq_get_on_cpu() from show_cpuinfo_cur_freq()), you can push
> that change first followed by the arm64 change as 2 different change.
> 
I guess I could split the patch into two parts:
1. adding implementation for the arch_freq_get_on_cpu
2. wiring it up with the cpufreq relevant attrib handlers

or the other way round (if that's what you have in mind).

Will wait a bit for any further comments before pushing new v.

---
BR
B.

> -- 
> Regards,
> Sudeep
Viresh Kumar June 8, 2023, 5:15 a.m. UTC | #3
+Len

On 06-06-23, 16:57, Beata Michalska wrote:
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> +unsigned int arch_freq_get_on_cpu(int cpu)
> +{
> +	unsigned int freq;
> +	u64 scale;
> +
> +	if (!cpumask_test_cpu(cpu, amu_fie_cpus))
> +		return 0;
> +
> +	if (!housekeeping_cpu(cpu, HK_TYPE_TICK)) {

I am not sure what we are doing in the `if` block here, at least a comment would
be useful.

> +		struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> +		int ref_cpu = nr_cpu_ids;
> +
> +		if (cpumask_intersects(housekeeping_cpumask(HK_TYPE_TICK),
> +				       policy->cpus))
> +			ref_cpu = cpumask_nth_and(cpu, policy->cpus,
> +						  housekeeping_cpumask(HK_TYPE_TICK));
> +		cpufreq_cpu_put(policy);
> +		if (ref_cpu >= nr_cpu_ids)
> +			return 0;
> +		cpu = ref_cpu;
> +	}

A blank line here please.

> +	/*
> +	 * Reversed computation to the one used to determine
> +	 * the arch_freq_scale value
> +	 * (see amu_scale_freq_tick for details)
> +	 */
> +	scale = per_cpu(arch_freq_scale, cpu);
> +	scale *= cpufreq_get_hw_max_freq(cpu);
> +	freq = scale >> SCHED_CAPACITY_SHIFT;
> +
> +	return freq;
> +}
> +
>  #ifdef CONFIG_ACPI_CPPC_LIB
>  #include <acpi/cppc_acpi.h>
>  
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 6b52ebe5a890..9f2cf45bf190 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -710,7 +710,8 @@ static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
>  	ssize_t ret;
>  	unsigned int freq;
>  
> -	freq = arch_freq_get_on_cpu(policy->cpu);
> +	freq = !cpufreq_driver->get ? arch_freq_get_on_cpu(policy->cpu)
> +				    : 0;

You may have changed the logic for X86 parts as well here. For a x86 platform
with setpolicy() and get() callbacks, we will not call arch_freq_get_on_cpu()
anymore ?

>  	if (freq)
>  		ret = sprintf(buf, "%u\n", freq);
>  	else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
> @@ -747,7 +748,11 @@ store_one(scaling_max_freq, max);
>  static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy,
>  					char *buf)
>  {
> -	unsigned int cur_freq = __cpufreq_get(policy);
> +	unsigned int cur_freq;
> +
> +	cur_freq = arch_freq_get_on_cpu(policy->cpu);
> +	if (!cur_freq)
> +		cur_freq = __cpufreq_get(policy);

For this and the above change, I am not sure what is the right thing to do.

From Len's commit [1]:

    Here we provide an x86 routine to make this calculation
    on supported hardware, and use it in preference to any
    driver driver-specific cpufreq_driver.get() routine.

I am not sure why Len updated `show_scaling_cur_freq()` and not
`show_cpuinfo_cur_freq()` ? Maybe we should update both these routines ?

Also, I don't think this is something that should have different logic for ARM
and X86, we should be consistent here as a cpufreq decision. Since both these
routines are reached via a read operation to a sysfs file, we shouldn't be
concerned about performance too.

What about doing this for both the routines, for all platforms now:

	cur_freq = arch_freq_get_on_cpu(policy->cpu);
	if (!cur_freq)
                ... get freq via policy->get() or policy->cur;
Viresh Kumar June 8, 2023, 5:18 a.m. UTC | #4
+Vincent

On 08-06-23, 10:45, Viresh Kumar wrote:
> +Len
> 
> On 06-06-23, 16:57, Beata Michalska wrote:
> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > +unsigned int arch_freq_get_on_cpu(int cpu)
> > +{
> > +	unsigned int freq;
> > +	u64 scale;
> > +
> > +	if (!cpumask_test_cpu(cpu, amu_fie_cpus))
> > +		return 0;
> > +
> > +	if (!housekeeping_cpu(cpu, HK_TYPE_TICK)) {
> 
> I am not sure what we are doing in the `if` block here, at least a comment would
> be useful.
> 
> > +		struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> > +		int ref_cpu = nr_cpu_ids;
> > +
> > +		if (cpumask_intersects(housekeeping_cpumask(HK_TYPE_TICK),
> > +				       policy->cpus))
> > +			ref_cpu = cpumask_nth_and(cpu, policy->cpus,
> > +						  housekeeping_cpumask(HK_TYPE_TICK));
> > +		cpufreq_cpu_put(policy);
> > +		if (ref_cpu >= nr_cpu_ids)
> > +			return 0;
> > +		cpu = ref_cpu;
> > +	}
> 
> A blank line here please.
> 
> > +	/*
> > +	 * Reversed computation to the one used to determine
> > +	 * the arch_freq_scale value
> > +	 * (see amu_scale_freq_tick for details)
> > +	 */
> > +	scale = per_cpu(arch_freq_scale, cpu);
> > +	scale *= cpufreq_get_hw_max_freq(cpu);
> > +	freq = scale >> SCHED_CAPACITY_SHIFT;
> > +
> > +	return freq;
> > +}
> > +
> >  #ifdef CONFIG_ACPI_CPPC_LIB
> >  #include <acpi/cppc_acpi.h>
> >  
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 6b52ebe5a890..9f2cf45bf190 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -710,7 +710,8 @@ static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
> >  	ssize_t ret;
> >  	unsigned int freq;
> >  
> > -	freq = arch_freq_get_on_cpu(policy->cpu);
> > +	freq = !cpufreq_driver->get ? arch_freq_get_on_cpu(policy->cpu)
> > +				    : 0;
> 
> You may have changed the logic for X86 parts as well here. For a x86 platform
> with setpolicy() and get() callbacks, we will not call arch_freq_get_on_cpu()
> anymore ?
> 
> >  	if (freq)
> >  		ret = sprintf(buf, "%u\n", freq);
> >  	else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
> > @@ -747,7 +748,11 @@ store_one(scaling_max_freq, max);
> >  static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy,
> >  					char *buf)
> >  {
> > -	unsigned int cur_freq = __cpufreq_get(policy);
> > +	unsigned int cur_freq;
> > +
> > +	cur_freq = arch_freq_get_on_cpu(policy->cpu);
> > +	if (!cur_freq)
> > +		cur_freq = __cpufreq_get(policy);
> 
> For this and the above change, I am not sure what is the right thing to do.
> 
> >From Len's commit [1]:
> 
>     Here we provide an x86 routine to make this calculation
>     on supported hardware, and use it in preference to any
>     driver driver-specific cpufreq_driver.get() routine.
> 
> I am not sure why Len updated `show_scaling_cur_freq()` and not
> `show_cpuinfo_cur_freq()` ? Maybe we should update both these routines ?
> 
> Also, I don't think this is something that should have different logic for ARM
> and X86, we should be consistent here as a cpufreq decision. Since both these
> routines are reached via a read operation to a sysfs file, we shouldn't be
> concerned about performance too.
> 
> What about doing this for both the routines, for all platforms now:
> 
> 	cur_freq = arch_freq_get_on_cpu(policy->cpu);
> 	if (!cur_freq)
>                 ... get freq via policy->get() or policy->cur;
> 
> -- 
> viresh
> 
> [1] commit f8475cef9008 ("x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF")
Beata Michalska June 8, 2023, 2:45 p.m. UTC | #5
On Thu, Jun 08, 2023 at 10:48:16AM +0530, Viresh Kumar wrote:
> +Vincent
> 
> On 08-06-23, 10:45, Viresh Kumar wrote:
> > +Len
> > 
> > On 06-06-23, 16:57, Beata Michalska wrote:
> > > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > > +unsigned int arch_freq_get_on_cpu(int cpu)
> > > +{
> > > +	unsigned int freq;
> > > +	u64 scale;
> > > +
> > > +	if (!cpumask_test_cpu(cpu, amu_fie_cpus))
> > > +		return 0;
> > > +
> > > +	if (!housekeeping_cpu(cpu, HK_TYPE_TICK)) {
> > 
> > I am not sure what we are doing in the `if` block here, at least a comment would
> > be useful.
For those CPUs that are in full dynticks mode , the arch_freq_scale_factor will
be basically useless (as there will be no regular sched_tick which eventually
calls topology_scale_freq_tick()), so the code below will look for any other
available CPU within current policy that could server as the source of the
counters. If there is none it will fallback to cpufreq driver to provide
current frequency.
Comment to be added.

> > 
> > > +		struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> > > +		int ref_cpu = nr_cpu_ids;
> > > +
> > > +		if (cpumask_intersects(housekeeping_cpumask(HK_TYPE_TICK),
> > > +				       policy->cpus))
> > > +			ref_cpu = cpumask_nth_and(cpu, policy->cpus,
> > > +						  housekeeping_cpumask(HK_TYPE_TICK));
> > > +		cpufreq_cpu_put(policy);
> > > +		if (ref_cpu >= nr_cpu_ids)
> > > +			return 0;
> > > +		cpu = ref_cpu;
> > > +	}
> > 
> > A blank line here please.
Noted.
> > 
> > > +	/*
> > > +	 * Reversed computation to the one used to determine
> > > +	 * the arch_freq_scale value
> > > +	 * (see amu_scale_freq_tick for details)
> > > +	 */
> > > +	scale = per_cpu(arch_freq_scale, cpu);
> > > +	scale *= cpufreq_get_hw_max_freq(cpu);
> > > +	freq = scale >> SCHED_CAPACITY_SHIFT;
> > > +
> > > +	return freq;
> > > +}
> > > +
> > >  #ifdef CONFIG_ACPI_CPPC_LIB
> > >  #include <acpi/cppc_acpi.h>
> > >  
> > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > index 6b52ebe5a890..9f2cf45bf190 100644
> > > --- a/drivers/cpufreq/cpufreq.c
> > > +++ b/drivers/cpufreq/cpufreq.c
> > > @@ -710,7 +710,8 @@ static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
> > >  	ssize_t ret;
> arch_freq_get_on_cpu> >  	unsigned int freq;
> > >  
> > > -	freq = arch_freq_get_on_cpu(policy->cpu);
> > > +	freq = !cpufreq_driver->get ? arch_freq_get_on_cpu(policy->cpu)
> > > +				    : 0;
> > 
> > You may have changed the logic for X86 parts as well here. For a x86 platform
> > with setpolicy() and get() callbacks, we will not call arch_freq_get_on_cpu()
> > anymore ?
> > 
There is a little bit of ambiguity around both 'cpuinfo_cur_freq' and
'scaling_cur_freq' and how those two are being handled on different platforms.
If I got things right, the first one is supposed to reflect the frequency as
obtained from  the hardware, whereas the latter is more of a sw point of view on
that, with the exception of x86, where that one is actually relying on info
retrieved from the hardware. It would potentially break the 'x86' platforms that
do provide both: setpolicy and get, though apparently I must have missed one (?)
(... so I did miss LongRun although not sure how much of an issue it would be
for that case ...)
> > >  	if (freq)
> > >  		ret = sprintf(buf, "%u\n", freq);
> > >  	else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
> > > @@ -747,7 +748,11 @@ store_one(scaling_max_freq, max);
> > >  static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy,
> > >  					char *buf)
> > >  {
> > > -	unsigned int cur_freq = __cpufreq_get(policy);
> > > +	unsigned int cur_freq;
> > > +
> > > +	cur_freq = arch_freq_get_on_cpu(policy->cpu);
> > > +	if (!cur_freq)
> > > +		cur_freq = __cpufreq_get(policy);
> > 
> > For this and the above change, I am not sure what is the right thing to do.
> > 
> > >From Len's commit [1]:
> > 
> >     Here we provide an x86 routine to make this calculation
> >     on supported hardware, and use it in preference to any
> >     driver driver-specific cpufreq_driver.get() routine.
> > 
I've seen that one as well though there seems to be no cpufreq driver that does
support both.
(scratch that ... there is LongRun)
> > I am not sure why Len updated `show_scaling_cur_freq()` and not
> > `show_cpuinfo_cur_freq()` ? Maybe we should update both these routines ?
> > 
> > Also, I don't think this is something that should have different logic for ARM
> > and X86, we should be consistent here as a cpufreq decision. Since both these
I agree, it would be good to align the two.
> > routines are reached via a read operation to a sysfs file, we shouldn't be
> > concerned about performance too.
> > 
> > What about doing this for both the routines, for all platforms now:
> > 
> > 	cur_freq = arch_freq_get_on_cpu(policy->cpu);
> > 	if (!cur_freq)
> >                 ... get freq via policy->get() or policy->cur;
> > 
That could work, I guess. But then we would have 'cpuinfo_cur_freq' ==
'scaling_cur_freq' for platforms that do provide arch_freq_get_on_cpu,
which might lead to more confusion as per what is the actual difference between
the two (?)

Thank you for all the comments!

---
BR
B.
> > -- 
> > viresh
> > 
> > [1] commit f8475cef9008 ("x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF")
> 
> -- 
> viresh
Viresh Kumar June 9, 2023, 4:39 a.m. UTC | #6
On 08-06-23, 15:45, Beata Michalska wrote:
> For those CPUs that are in full dynticks mode , the arch_freq_scale_factor will
> be basically useless (as there will be no regular sched_tick which eventually
> calls topology_scale_freq_tick()), so the code below will look for any other
> available CPU within current policy that could server as the source of the
> counters. If there is none it will fallback to cpufreq driver to provide
> current frequency.

Understood.

> There is a little bit of ambiguity around both 'cpuinfo_cur_freq' and
> 'scaling_cur_freq' and how those two are being handled on different platforms.
> If I got things right, the first one is supposed to reflect the frequency as
> obtained from  the hardware,

Yes, this must be accurate, as much as possible.

> whereas the latter is more of a sw point of view on that,

Historically, it was more about the last frequency requested by the software.
But that has changed, for example for X86 and now there is no limitation here
which disallows one to report the more accurate one.

> That could work, I guess. But then we would have 'cpuinfo_cur_freq' ==
> 'scaling_cur_freq' for platforms that do provide arch_freq_get_on_cpu,
> which might lead to more confusion as per what is the actual difference between
> the two (?)

The behavior should be same for all platforms. That's my primary concern here.
If getting same freq from both these files is okay for X86, then it should be
for ARM as well.

If the X86 commit (f8475cef9008) wasn't already merged, I would have suggested
to do this aperf/mperf thing only in cpuinfo_cur_freq() and not
scaling_cur_freq().

Maybe we can still revert back if there is no hard dependency here.

Len / Rafael ?

The question is if we should make scaling_cur_freq() to always return the last
requested frequency and make cpuinfo_cur_freq() to return the most accurate one,
preferably using aperf/mperf ?
Sumit Gupta June 14, 2023, 6:59 p.m. UTC | #7
On 06/06/23 21:27, Beata Michalska wrote:
> External email: Use caution opening links or attachments
> 
> 
> With the Frequency Invariance Engine (FIE) being already wired up with
> sched tick and making use of relevant (core counter and constant
> counter) AMU counters, getting the current frequency for a given CPU
> on supported platforms, can be achieved by utilizing the frequency scale
> factor which reflects an average CPU frequency for the last tick period
> length.
> 
> With that at hand, arch_freq_get_on_cpu dedicated implementation
> gets enrolled into cpuinfo_cur_freq policy sysfs attribute handler,
> which is expected to represent the current frequency of a given CPU,
> as obtained by the hardware. This is exactly the type of feedback that
> cycle counters provide.
> 
> In order to avoid calling arch_freq_get_on_cpu from the scaling_cur_freq
> attribute handler for platforms that do provide cpuinfo_cur_freq, and
> yet keeping things intact for those platform that do not, its use gets
> conditioned on the presence of cpufreq_driver (*get) callback (which also
> seems to be the case for creating cpuinfo_cur_freq attribute).
> 

Tested the change with frequency switch stress test but was getting big 
delta between set and get freq.
After passing "nohz=off" and commenting "wfi" in "cpu_do_idle()", the
delta is less. This confirms that more delta is due to AMU counters
stopping at "WFI".

   +++ b/arch/arm64/kernel/idle.c
   @@ -27,7 +27,7 @@ void noinstr cpu_do_idle(void)
           arm_cpuidle_save_irq_context(&context);

           dsb(sy);
   -       wfi();
   +//     wfi();

I am not sure if the expected behavior here is right.
In our tests, we compare the last set frequency against the re-generated
value from counters to confirm that the CPU is actually running at the
requested frequency and the counters are working correct. But that won't
happen with this change.

In [1] and later in the updated patch within [2], we are busy looping
on the target CPU and avoid WFI to get the actual frequency.

Please share what you think is the right expected behavior.

[1] https://lore.kernel.org/lkml/20230418113459.12860-7-sumitg@nvidia.com/
[2] 
https://lore.kernel.org/lkml/cde1d8a9-3a21-e82b-7895-40603a14d898@nvidia.com/T/#mb898a75fd0c72d166b26b04da3ad162afe068a82
Beata Michalska June 16, 2023, 9:53 a.m. UTC | #8
On Thu, Jun 15, 2023 at 12:29:57AM +0530, Sumit Gupta wrote:
> 
> 
> On 06/06/23 21:27, Beata Michalska wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > With the Frequency Invariance Engine (FIE) being already wired up with
> > sched tick and making use of relevant (core counter and constant
> > counter) AMU counters, getting the current frequency for a given CPU
> > on supported platforms, can be achieved by utilizing the frequency scale
> > factor which reflects an average CPU frequency for the last tick period
> > length.
> > 
> > With that at hand, arch_freq_get_on_cpu dedicated implementation
> > gets enrolled into cpuinfo_cur_freq policy sysfs attribute handler,
> > which is expected to represent the current frequency of a given CPU,
> > as obtained by the hardware. This is exactly the type of feedback that
> > cycle counters provide.
> > 
> > In order to avoid calling arch_freq_get_on_cpu from the scaling_cur_freq
> > attribute handler for platforms that do provide cpuinfo_cur_freq, and
> > yet keeping things intact for those platform that do not, its use gets
> > conditioned on the presence of cpufreq_driver (*get) callback (which also
> > seems to be the case for creating cpuinfo_cur_freq attribute).
> > 
> 
> Tested the change with frequency switch stress test but was getting big
> delta between set and get freq.
Would you mind sharing some more data re your testing ?
The arch_freq_get_on_cpu will provided an average freq for last tick period,
with an updated occurring on each sched tick so the differences between set
and get might show up. With your stress testing, if the frequency change comes
at the end of current tick period, it might not be reflected until next one
elapses.
In case of idle states - if the CPU for which the current frequency is being
requested is in idle mode, the frequency returned will be the last one before
entering idle, which seems reasonable (?).
I guess the question here would be what is your tolerance level for those
differences.
> After passing "nohz=off" and commenting "wfi" in "cpu_do_idle()", the
> delta is less. This confirms that more delta is due to AMU counters
> stopping at "WFI".
> 
>   +++ b/arch/arm64/kernel/idle.c
>   @@ -27,7 +27,7 @@ void noinstr cpu_do_idle(void)
>           arm_cpuidle_save_irq_context(&context);
> 
>           dsb(sy);
>   -       wfi();
>   +//     wfi();
> 
> I am not sure if the expected behavior here is right.
Both CPU_CYCLES and CNT_CYCLES are not incremented in WFI.
> In our tests, we compare the last set frequency against the re-generated
> value from counters to confirm that the CPU is actually running at the
> requested frequency and the counters are working correct. But that won't
> happen with this change.
> 
> In [1] and later in the updated patch within [2], we are busy looping
> on the target CPU and avoid WFI to get the actual frequency.
> 
> Please share what you think is the right expected behavior.
> 
> [1] https://lore.kernel.org/lkml/20230418113459.12860-7-sumitg@nvidia.com/
> [2] https://lore.kernel.org/lkml/cde1d8a9-3a21-e82b-7895-40603a14d898@nvidia.com/T/#mb898a75fd0c72d166b26b04da3ad162afe068a82
Beata Michalska June 16, 2023, 9:57 a.m. UTC | #9
On Fri, Jun 09, 2023 at 10:09:22AM +0530, Viresh Kumar wrote:
> On 08-06-23, 15:45, Beata Michalska wrote:
> > For those CPUs that are in full dynticks mode , the arch_freq_scale_factor will
> > be basically useless (as there will be no regular sched_tick which eventually
> > calls topology_scale_freq_tick()), so the code below will look for any other
> > available CPU within current policy that could server as the source of the
> > counters. If there is none it will fallback to cpufreq driver to provide
> > current frequency.
> 
> Understood.
> 
> > There is a little bit of ambiguity around both 'cpuinfo_cur_freq' and
> > 'scaling_cur_freq' and how those two are being handled on different platforms.
> > If I got things right, the first one is supposed to reflect the frequency as
> > obtained from  the hardware,
> 
> Yes, this must be accurate, as much as possible.
> 
> > whereas the latter is more of a sw point of view on that,
> 
> Historically, it was more about the last frequency requested by the software.
> But that has changed, for example for X86 and now there is no limitation here
> which disallows one to report the more accurate one.
>
That's my observation as well - thank you for clarifying.
> > That could work, I guess. But then we would have 'cpuinfo_cur_freq' ==
> > 'scaling_cur_freq' for platforms that do provide arch_freq_get_on_cpu,
> > which might lead to more confusion as per what is the actual difference between
> > the two (?)
> 
> The behavior should be same for all platforms. That's my primary concern here.
> If getting same freq from both these files is okay for X86, then it should be
> for ARM as well.
> 
I agree it would be good to align the behaviour here.
I guess we should wait for more input on what we can and cannot do for x86.

---
BR
B.
> If the X86 commit (f8475cef9008) wasn't already merged, I would have suggested
> to do this aperf/mperf thing only in cpuinfo_cur_freq() and not
> scaling_cur_freq().
> 
> Maybe we can still revert back if there is no hard dependency here.
> 
> Len / Rafael ?
> 
> The question is if we should make scaling_cur_freq() to always return the last
> requested frequency and make cpuinfo_cur_freq() to return the most accurate one,
> preferably using aperf/mperf ?
> 
> -- 
> viresh
Sumit Gupta June 23, 2023, 2:33 p.m. UTC | #10
On 16/06/23 15:23, Beata Michalska wrote:
> External email: Use caution opening links or attachments
> 
> 
> On Thu, Jun 15, 2023 at 12:29:57AM +0530, Sumit Gupta wrote:
>>
>>
>> On 06/06/23 21:27, Beata Michalska wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> With the Frequency Invariance Engine (FIE) being already wired up with
>>> sched tick and making use of relevant (core counter and constant
>>> counter) AMU counters, getting the current frequency for a given CPU
>>> on supported platforms, can be achieved by utilizing the frequency scale
>>> factor which reflects an average CPU frequency for the last tick period
>>> length.
>>>
>>> With that at hand, arch_freq_get_on_cpu dedicated implementation
>>> gets enrolled into cpuinfo_cur_freq policy sysfs attribute handler,
>>> which is expected to represent the current frequency of a given CPU,
>>> as obtained by the hardware. This is exactly the type of feedback that
>>> cycle counters provide.
>>>
>>> In order to avoid calling arch_freq_get_on_cpu from the scaling_cur_freq
>>> attribute handler for platforms that do provide cpuinfo_cur_freq, and
>>> yet keeping things intact for those platform that do not, its use gets
>>> conditioned on the presence of cpufreq_driver (*get) callback (which also
>>> seems to be the case for creating cpuinfo_cur_freq attribute).
>>>
>>
>> Tested the change with frequency switch stress test but was getting big
>> delta between set and get freq.
> Would you mind sharing some more data re your testing ?
> The arch_freq_get_on_cpu will provided an average freq for last tick period,
> with an updated occurring on each sched tick so the differences between set
> and get might show up. With your stress testing, if the frequency change comes
> at the end of current tick period, it might not be reflected until next one
> elapses.
> In case of idle states - if the CPU for which the current frequency is being
> requested is in idle mode, the frequency returned will be the last one before
> entering idle, which seems reasonable (?).
> I guess the question here would be what is your tolerance level for those
> differences.

The test waits for 50ms before reading back the set frequency which is
much more than a tick period.

High delta value might be due to the reference counter increments
happening in bursts in Tegra241 SoC as mentioned in [1].
If the CPU is idle for most of the period then observation window will
be small due to AMU counters stopping at WFI. This can cause more error
in Tegra241 as increasing the observation time interval reduces error
percentage.

I created 100% CPU load and then the test is passing. So, looks like we 
will have to create load before the frequency switch test. Please share
if more suggestions.
[1] https://lore.kernel.org/lkml/20230418113459.12860-5-sumitg@nvidia.com/

>> After passing "nohz=off" and commenting "wfi" in "cpu_do_idle()", the
>> delta is less. This confirms that more delta is due to AMU counters
>> stopping at "WFI".
>>
>>    +++ b/arch/arm64/kernel/idle.c
>>    @@ -27,7 +27,7 @@ void noinstr cpu_do_idle(void)
>>            arm_cpuidle_save_irq_context(&context);
>>
>>            dsb(sy);
>>    -       wfi();
>>    +//     wfi();
>>
>> I am not sure if the expected behavior here is right.
> Both CPU_CYCLES and CNT_CYCLES are not incremented in WFI.
>> In our tests, we compare the last set frequency against the re-generated
>> value from counters to confirm that the CPU is actually running at the
>> requested frequency and the counters are working correct. But that won't
>> happen with this change.
>>
>> In [1] and later in the updated patch within [2], we are busy looping
>> on the target CPU and avoid WFI to get the actual frequency.
>>
>> Please share what you think is the right expected behavior.
>>
>> [1] https://lore.kernel.org/lkml/20230418113459.12860-7-sumitg@nvidia.com/
>> [2] https://lore.kernel.org/lkml/cde1d8a9-3a21-e82b-7895-40603a14d898@nvidia.com/T/#mb898a75fd0c72d166b26b04da3ad162afe068a82
Will Deacon July 27, 2023, 9:56 a.m. UTC | #11
On Wed, Jun 07, 2023 at 03:00:49PM +0100, Beata Michalska wrote:
> On Wed, Jun 07, 2023 at 10:58:56AM +0100, Sudeep Holla wrote:
> > On Tue, Jun 06, 2023 at 04:57:54PM +0100, Beata Michalska wrote:
> > > With the Frequency Invariance Engine (FIE) being already wired up with
> > > sched tick and making use of relevant (core counter and constant
> > > counter) AMU counters, getting the current frequency for a given CPU
> > > on supported platforms, can be achieved by utilizing the frequency scale
> > > factor which reflects an average CPU frequency for the last tick period
> > > length.
> > > 
> > > With that at hand, arch_freq_get_on_cpu dedicated implementation
> > > gets enrolled into cpuinfo_cur_freq policy sysfs attribute handler,
> > > which is expected to represent the current frequency of a given CPU,
> > > as obtained by the hardware. This is exactly the type of feedback that
> > > cycle counters provide.
> > > 
> > > In order to avoid calling arch_freq_get_on_cpu from the scaling_cur_freq
> > > attribute handler for platforms that do provide cpuinfo_cur_freq, and
> > > yet keeping things intact for those platform that do not, its use gets
> > > conditioned on the presence of cpufreq_driver (*get) callback (which also
> > > seems to be the case for creating cpuinfo_cur_freq attribute).
> > >
> > 
> > LGTM,
> > 
> > Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> > 
> Thanks for the review.
> > However I fail to understand if both the changes are dependent ?
> > Can this be split into 2 patches ? I fail to see the dependency, what
> > am I missing ? Even if there is some dependency to get arch value
> > (arch_freq_get_on_cpu() from show_cpuinfo_cur_freq()), you can push
> > that change first followed by the arm64 change as 2 different change.
> > 
> I guess I could split the patch into two parts:
> 1. adding implementation for the arch_freq_get_on_cpu
> 2. wiring it up with the cpufreq relevant attrib handlers
> 
> or the other way round (if that's what you have in mind).
> 
> Will wait a bit for any further comments before pushing new v.

Are you still planning on a v2?

Will
Beata Michalska Aug. 14, 2023, 7:27 a.m. UTC | #12
On Thu, Jul 27, 2023 at 10:56:05AM +0100, Will Deacon wrote:
> On Wed, Jun 07, 2023 at 03:00:49PM +0100, Beata Michalska wrote:
> > On Wed, Jun 07, 2023 at 10:58:56AM +0100, Sudeep Holla wrote:
> > > On Tue, Jun 06, 2023 at 04:57:54PM +0100, Beata Michalska wrote:
> > > > With the Frequency Invariance Engine (FIE) being already wired up with
> > > > sched tick and making use of relevant (core counter and constant
> > > > counter) AMU counters, getting the current frequency for a given CPU
> > > > on supported platforms, can be achieved by utilizing the frequency scale
> > > > factor which reflects an average CPU frequency for the last tick period
> > > > length.
> > > > 
> > > > With that at hand, arch_freq_get_on_cpu dedicated implementation
> > > > gets enrolled into cpuinfo_cur_freq policy sysfs attribute handler,
> > > > which is expected to represent the current frequency of a given CPU,
> > > > as obtained by the hardware. This is exactly the type of feedback that
> > > > cycle counters provide.
> > > > 
> > > > In order to avoid calling arch_freq_get_on_cpu from the scaling_cur_freq
> > > > attribute handler for platforms that do provide cpuinfo_cur_freq, and
> > > > yet keeping things intact for those platform that do not, its use gets
> > > > conditioned on the presence of cpufreq_driver (*get) callback (which also
> > > > seems to be the case for creating cpuinfo_cur_freq attribute).
> > > >
> > > 
> > > LGTM,
> > > 
> > > Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> > > 
> > Thanks for the review.
> > > However I fail to understand if both the changes are dependent ?
> > > Can this be split into 2 patches ? I fail to see the dependency, what
> > > am I missing ? Even if there is some dependency to get arch value
> > > (arch_freq_get_on_cpu() from show_cpuinfo_cur_freq()), you can push
> > > that change first followed by the arm64 change as 2 different change.
> > > 
> > I guess I could split the patch into two parts:
> > 1. adding implementation for the arch_freq_get_on_cpu
> > 2. wiring it up with the cpufreq relevant attrib handlers
> > 
> > or the other way round (if that's what you have in mind).
> > 
> > Will wait a bit for any further comments before pushing new v.
> 
> Are you still planning on a v2?

Apologies for late reply, 've been away for a while and then got bit swamped.
I do not think there will be v2 unless I'll find reasonable way to handle cases
as one mentioned in [1].

---
BR
B.

[1] https://lore.kernel.org/linux-arm-kernel/691d3eb2-cd93-f0fc-a7a4-2a8c0d44262c@nvidia.com/T/#m4e74cb5a0aaa353c60fedc6cfb95ab7a6e381e3c
> 
> Will
Sumit Gupta Oct. 18, 2023, 1:05 p.m. UTC | #13
On 15/06/23 00:29, Sumit Gupta wrote:
> 
> 
> On 06/06/23 21:27, Beata Michalska wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> With the Frequency Invariance Engine (FIE) being already wired up with
>> sched tick and making use of relevant (core counter and constant
>> counter) AMU counters, getting the current frequency for a given CPU
>> on supported platforms, can be achieved by utilizing the frequency scale
>> factor which reflects an average CPU frequency for the last tick period
>> length.
>>
>> With that at hand, arch_freq_get_on_cpu dedicated implementation
>> gets enrolled into cpuinfo_cur_freq policy sysfs attribute handler,
>> which is expected to represent the current frequency of a given CPU,
>> as obtained by the hardware. This is exactly the type of feedback that
>> cycle counters provide.
>>
>> In order to avoid calling arch_freq_get_on_cpu from the scaling_cur_freq
>> attribute handler for platforms that do provide cpuinfo_cur_freq, and
>> yet keeping things intact for those platform that do not, its use gets
>> conditioned on the presence of cpufreq_driver (*get) callback (which also
>> seems to be the case for creating cpuinfo_cur_freq attribute).
>>
> 
> Tested the change with frequency switch stress test but was getting big 
> delta between set and get freq.
> After passing "nohz=off" and commenting "wfi" in "cpu_do_idle()", the
> delta is less. This confirms that more delta is due to AMU counters
> stopping at "WFI".
> 
>    +++ b/arch/arm64/kernel/idle.c
>    @@ -27,7 +27,7 @@ void noinstr cpu_do_idle(void)
>            arm_cpuidle_save_irq_context(&context);
> 
>            dsb(sy);
>    -       wfi();
>    +//     wfi();
> 
> I am not sure if the expected behavior here is right.
> In our tests, we compare the last set frequency against the re-generated
> value from counters to confirm that the CPU is actually running at the
> requested frequency and the counters are working correct. But that won't
> happen with this change.
> 
> In [1] and later in the updated patch within [2], we are busy looping
> on the target CPU and avoid WFI to get the actual frequency.
> 
> Please share what you think is the right expected behavior.
> 
> [1] https://lore.kernel.org/lkml/20230418113459.12860-7-sumitg@nvidia.com/
> [2] 
> https://lore.kernel.org/lkml/cde1d8a9-3a21-e82b-7895-40603a14d898@nvidia.com/T/#mb898a75fd0c72d166b26b04da3ad162afe068a82

Observed another issue where CPUFREQ is coming too high when the
performance governor is set as default.

Below change solves that by using the new API arch_freq_get_on_cpu()
if present over the existing one, while verifying the currently set 
frequency.

  diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
  index 62face349fd2..2c74e70f701e 100644
  --- a/drivers/cpufreq/cpufreq.c
  +++ b/drivers/cpufreq/cpufreq.c
  @@ -1761,9 +1761,12 @@ static unsigned int 
cpufreq_verify_current_freq(struct cpufreq_policy *policy, b
   {
  		unsigned int new_freq;

  -       new_freq = cpufreq_driver->get(policy->cpu);
  -       if (!new_freq)
  -               return 0;
  +       new_freq = arch_freq_get_on_cpu(policy->cpu);
  +       if (!new_freq) {
  +               new_freq = cpufreq_driver->get(policy->cpu);
  +               if (!new_freq)
  +                       return 0;
  +       }


Best Regards,
Sumit Gupta
diff mbox series

Patch

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 817d788cd866..00a1aa421ec2 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -17,6 +17,7 @@ 
 #include <linux/cpufreq.h>
 #include <linux/init.h>
 #include <linux/percpu.h>
+#include <linux/sched/isolation.h>
 
 #include <asm/cpu.h>
 #include <asm/cputype.h>
@@ -251,6 +252,39 @@  static int __init init_amu_fie(void)
 }
 core_initcall(init_amu_fie);
 
+unsigned int arch_freq_get_on_cpu(int cpu)
+{
+	unsigned int freq;
+	u64 scale;
+
+	if (!cpumask_test_cpu(cpu, amu_fie_cpus))
+		return 0;
+
+	if (!housekeeping_cpu(cpu, HK_TYPE_TICK)) {
+		struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+		int ref_cpu = nr_cpu_ids;
+
+		if (cpumask_intersects(housekeeping_cpumask(HK_TYPE_TICK),
+				       policy->cpus))
+			ref_cpu = cpumask_nth_and(cpu, policy->cpus,
+						  housekeeping_cpumask(HK_TYPE_TICK));
+		cpufreq_cpu_put(policy);
+		if (ref_cpu >= nr_cpu_ids)
+			return 0;
+		cpu = ref_cpu;
+	}
+	/*
+	 * Reversed computation to the one used to determine
+	 * the arch_freq_scale value
+	 * (see amu_scale_freq_tick for details)
+	 */
+	scale = per_cpu(arch_freq_scale, cpu);
+	scale *= cpufreq_get_hw_max_freq(cpu);
+	freq = scale >> SCHED_CAPACITY_SHIFT;
+
+	return freq;
+}
+
 #ifdef CONFIG_ACPI_CPPC_LIB
 #include <acpi/cppc_acpi.h>
 
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 6b52ebe5a890..9f2cf45bf190 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -710,7 +710,8 @@  static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
 	ssize_t ret;
 	unsigned int freq;
 
-	freq = arch_freq_get_on_cpu(policy->cpu);
+	freq = !cpufreq_driver->get ? arch_freq_get_on_cpu(policy->cpu)
+				    : 0;
 	if (freq)
 		ret = sprintf(buf, "%u\n", freq);
 	else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
@@ -747,7 +748,11 @@  store_one(scaling_max_freq, max);
 static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy,
 					char *buf)
 {
-	unsigned int cur_freq = __cpufreq_get(policy);
+	unsigned int cur_freq;
+
+	cur_freq = arch_freq_get_on_cpu(policy->cpu);
+	if (!cur_freq)
+		cur_freq = __cpufreq_get(policy);
 
 	if (cur_freq)
 		return sprintf(buf, "%u\n", cur_freq);