diff mbox series

[v3,6/7] arm64: use activity monitors for frequency invariance

Message ID 20200211184542.29585-7-ionela.voinescu@arm.com (mailing list archive)
State Not Applicable, archived
Headers show
Series arm64: ARMv8.4 Activity Monitors support | expand

Commit Message

Ionela Voinescu Feb. 11, 2020, 6:45 p.m. UTC
The Frequency Invariance Engine (FIE) is providing a frequency
scaling correction factor that helps achieve more accurate
load-tracking.

So far, for arm and arm64 platforms, this scale factor has been
obtained based on the ratio between the current frequency and the
maximum supported frequency recorded by the cpufreq policy. The
setting of this scale factor is triggered from cpufreq drivers by
calling arch_set_freq_scale. The current frequency used in computation
is the frequency requested by a governor, but it may not be the
frequency that was implemented by the platform.

This correction factor can also be obtained using a core counter and a
constant counter to get information on the performance (frequency based
only) obtained in a period of time. This will more accurately reflect
the actual current frequency of the CPU, compared with the alternative
implementation that reflects the request of a performance level from
the OS.

Therefore, implement arch_scale_freq_tick to use activity monitors, if
present, for the computation of the frequency scale factor.

The use of AMU counters depends on:
 - CONFIG_ARM64_AMU_EXTN - depents on the AMU extension being present
 - (optional) CONFIG_CPU_FREQ - the current frequency obtained using
   counter information is divided by the maximum frequency obtained from
   the cpufreq policy. But the use of cpufreq policy maximum frequency
   is weak and cpu_get_max_freq can be redefined to provide the data
   some other way.

While it is possible to have a combination of CPUs in the system with
and without support for activity monitors, the use of counters for
frequency invariance is only enabled for a CPU if all related CPUs
(CPUs in the same frequency domain) support and have enabled the core
and constant activity monitor counters. In this way, there is a clear
separation between the policies for which arch_set_freq_scale (cpufreq
based FIE) is used, and the policies for which arch_scale_freq_tick
(counter based FIE) is used to set the frequency scale factor. For
this purpose, a late_initcall_sync is registered to trigger validation
work for policies that will enable or disable the use of AMU counters
for frequency invariance. If CONFIG_CPU_FREQ is not defined, the use
of counters is enabled on all CPUs only if all possible CPUs correctly
support the necessary counters.

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
---
 arch/arm64/include/asm/topology.h |  16 +++
 arch/arm64/kernel/cpufeature.c    |   4 +
 arch/arm64/kernel/topology.c      | 185 ++++++++++++++++++++++++++++++
 drivers/base/arch_topology.c      |   8 ++
 include/linux/topology.h          |   7 ++
 5 files changed, 220 insertions(+)

Comments

Lukasz Luba Feb. 12, 2020, 6:59 p.m. UTC | #1
Hi Ionela,

On 2/11/20 6:45 PM, Ionela Voinescu wrote:
> The Frequency Invariance Engine (FIE) is providing a frequency
> scaling correction factor that helps achieve more accurate
> load-tracking.
> 
> So far, for arm and arm64 platforms, this scale factor has been
> obtained based on the ratio between the current frequency and the
> maximum supported frequency recorded by the cpufreq policy. The
> setting of this scale factor is triggered from cpufreq drivers by
> calling arch_set_freq_scale. The current frequency used in computation
> is the frequency requested by a governor, but it may not be the
> frequency that was implemented by the platform.
> 
> This correction factor can also be obtained using a core counter and a
> constant counter to get information on the performance (frequency based
> only) obtained in a period of time. This will more accurately reflect
> the actual current frequency of the CPU, compared with the alternative
> implementation that reflects the request of a performance level from
> the OS.
> 
> Therefore, implement arch_scale_freq_tick to use activity monitors, if
> present, for the computation of the frequency scale factor.
> 
> The use of AMU counters depends on:
>   - CONFIG_ARM64_AMU_EXTN - depents on the AMU extension being present
>   - (optional) CONFIG_CPU_FREQ - the current frequency obtained using
>     counter information is divided by the maximum frequency obtained from
>     the cpufreq policy. But the use of cpufreq policy maximum frequency
>     is weak and cpu_get_max_freq can be redefined to provide the data
>     some other way.
> 
> While it is possible to have a combination of CPUs in the system with
> and without support for activity monitors, the use of counters for
> frequency invariance is only enabled for a CPU if all related CPUs
> (CPUs in the same frequency domain) support and have enabled the core
> and constant activity monitor counters. In this way, there is a clear
> separation between the policies for which arch_set_freq_scale (cpufreq
> based FIE) is used, and the policies for which arch_scale_freq_tick
> (counter based FIE) is used to set the frequency scale factor. For
> this purpose, a late_initcall_sync is registered to trigger validation
> work for policies that will enable or disable the use of AMU counters
> for frequency invariance. If CONFIG_CPU_FREQ is not defined, the use
> of counters is enabled on all CPUs only if all possible CPUs correctly
> support the necessary counters.
> 
> Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> ---
>   arch/arm64/include/asm/topology.h |  16 +++
>   arch/arm64/kernel/cpufeature.c    |   4 +
>   arch/arm64/kernel/topology.c      | 185 ++++++++++++++++++++++++++++++
>   drivers/base/arch_topology.c      |   8 ++
>   include/linux/topology.h          |   7 ++
>   5 files changed, 220 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
> index a4d945db95a2..d910d463cf76 100644
> --- a/arch/arm64/include/asm/topology.h
> +++ b/arch/arm64/include/asm/topology.h
> @@ -16,6 +16,22 @@ int pcibus_to_node(struct pci_bus *bus);
>   
>   #include <linux/arch_topology.h>
>   
> +#ifdef CONFIG_ARM64_AMU_EXTN
> +extern unsigned int cpu_get_max_freq(unsigned int cpu);
> +/*
> + * Replace default function that signals use of counters
> + * for frequency invariance for given CPUs.
> + */
> +bool topology_cpu_freq_counters(struct cpumask *cpus);
> +#define arch_cpu_freq_counters topology_cpu_freq_counters
> +/*
> + * Replace task scheduler's default counter-based
> + * frequency-invariance scale factor setting.
> + */
> +void topology_scale_freq_tick(void);
> +#define arch_scale_freq_tick topology_scale_freq_tick
> +#endif /* CONFIG_ARM64_AMU_EXTN */
> +
>   /* Replace task scheduler's default frequency-invariant accounting */
>   #define arch_scale_freq_capacity topology_get_freq_scale
>   
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 029a473ad273..a4620b269b56 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1172,12 +1172,16 @@ bool cpu_has_amu_feat(int cpu)
>   	return false;
>   }
>   
> +/* Initialize the use of AMU counters for frequency invariance */
> +extern void init_cpu_freq_invariance_counters(void);
> +
>   static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
>   {
>   	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
>   		pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
>   			smp_processor_id());
>   		cpumask_set_cpu(smp_processor_id(), amu_cpus);
> +		init_cpu_freq_invariance_counters();
>   	}
>   }
>   
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index fa9528dfd0ce..3f6e379f07b3 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -14,6 +14,7 @@
>   #include <linux/acpi.h>
>   #include <linux/arch_topology.h>
>   #include <linux/cacheinfo.h>
> +#include <linux/cpufreq.h>
>   #include <linux/init.h>
>   #include <linux/percpu.h>
>   
> @@ -120,4 +121,188 @@ int __init parse_acpi_topology(void)
>   }
>   #endif
>   
> +#ifdef CONFIG_ARM64_AMU_EXTN
>   
> +#undef pr_fmt
> +#define pr_fmt(fmt) "AMU: " fmt
> +
> +static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale);
> +static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> +static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> +static cpumask_var_t amu_fie_cpus;
> +
> +/* Obtain max frequency (in KHz) as reported by hardware */
> +__weak unsigned int cpu_get_max_freq(unsigned int cpu)
> +{
> +	return 0;
> +}
> +
> +#ifdef CONFIG_CPU_FREQ
> +/* Replace max frequency getter with cpufreq based function */
> +#define cpu_get_max_freq cpufreq_get_hw_max_freq
> +#endif

Can we just use cpufreq_get_hw_max_freq()?
We have cpufreq_get_hw_max_freq returning 0 in such case, so it should
be OK.

Is there a possibility that some platform which has !CONFIG_CPU_FREQ
would define its own cpu_get_max_freq() overwriting the weak function
above?
Based on the code which checks 'if (unlikely(!max_freq_hz))' it should,
otherwise 'valid_cpus' is not set.

I would assume that we won't see such platform, interested
in AMU freq invariance without CONFIG_CPU_FREQ.

We already have a lot of these defines or __weak functions, which is
hard to follow.

Apart from that and the issue with cpu_has_amu_feat() it looks good.

> +
> +/* Initialize counter reference per-cpu variables for the current CPU */
> +void init_cpu_freq_invariance_counters(void)
> +{
> +	this_cpu_write(arch_core_cycles_prev,
> +		       read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0));
> +	this_cpu_write(arch_const_cycles_prev,
> +		       read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0));
> +}
> +
> +static int validate_cpu_freq_invariance_counters(int cpu)
> +{
> +	u64 max_freq_hz, ratio;
> +
> +	if (!cpu_has_amu_feat(cpu)) {

As Suzuki pointed out with 'amu_cpus', make sure we read
a one instance of mask here.


Regards,
Lukasz
Ionela Voinescu Feb. 13, 2020, 9:47 a.m. UTC | #2
Hi Lukasz,

[..]
> > +
> > +/* Obtain max frequency (in KHz) as reported by hardware */
> > +__weak unsigned int cpu_get_max_freq(unsigned int cpu)
> > +{
> > +	return 0;
> > +}
> > +
> > +#ifdef CONFIG_CPU_FREQ
> > +/* Replace max frequency getter with cpufreq based function */
> > +#define cpu_get_max_freq cpufreq_get_hw_max_freq
> > +#endif
> 
> Can we just use cpufreq_get_hw_max_freq()?
> We have cpufreq_get_hw_max_freq returning 0 in such case, so it should
> be OK.
> 

The reasoning for the implementation is the following:
 - For CONFIG_CPU_FREQ we use cpufreq_get_hw_max_freq (weak default or
   strong alternative)
 - For !CONFIG_CPU_FREQ cpufreq_get_hw_max_freq returns 0 - it signals
   that cpufreq cannot return the hardware max frequency. In this case
   cpu_get_max_freq is used (weak default or strong alternative
   implementation).

> Is there a possibility that some platform which has !CONFIG_CPU_FREQ
> would define its own cpu_get_max_freq() overwriting the weak function
> above?
> Based on the code which checks 'if (unlikely(!max_freq_hz))' it should,
> otherwise 'valid_cpus' is not set.
> 
> I would assume that we won't see such platform, interested
> in AMU freq invariance without CONFIG_CPU_FREQ.
> 
> We already have a lot of these defines or __weak functions, which is
> hard to follow.

There is no dependency between CONFIG_CPU_FREQ and frequency invariance.
Therefore, I did not see a reason to potentially bypass the use of AMU
for frequency invariance for !CONFIG_CPU_FREQ.

But I agree it makes the code harder to read so I can remove
cpu_get_max_freq and keep cpufreq_get_hw_max_freq only until there is a
provable need for this. 

Thank you for the review,
Ionela.
Valentin Schneider Feb. 17, 2020, 4:59 p.m. UTC | #3
Hi Ionela,

Overall I think this approach is much better, and apart from a few nits below
this is looking pretty good.

On 2/11/20 6:45 PM, Ionela Voinescu wrote:
> @@ -120,4 +121,188 @@ int __init parse_acpi_topology(void)
>  }
>  #endif
>  
> +#ifdef CONFIG_ARM64_AMU_EXTN
>  
> +#undef pr_fmt
> +#define pr_fmt(fmt) "AMU: " fmt
> +
> +static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale);
> +static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> +static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> +static cpumask_var_t amu_fie_cpus;
> +
> +/* Obtain max frequency (in KHz) as reported by hardware */
> +__weak unsigned int cpu_get_max_freq(unsigned int cpu)
> +{
> +	return 0;
> +}
> +
> +#ifdef CONFIG_CPU_FREQ
> +/* Replace max frequency getter with cpufreq based function */
> +#define cpu_get_max_freq cpufreq_get_hw_max_freq
> +#endif
> +
> +/* Initialize counter reference per-cpu variables for the current CPU */
> +void init_cpu_freq_invariance_counters(void)
> +{
> +	this_cpu_write(arch_core_cycles_prev,
> +		       read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0));
> +	this_cpu_write(arch_const_cycles_prev,
> +		       read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0));
> +}
> +
> +static int validate_cpu_freq_invariance_counters(int cpu)
> +{
> +	u64 max_freq_hz, ratio;
> +
> +	if (!cpu_has_amu_feat(cpu)) {
> +		pr_debug("CPU%d: counters are not supported.\n", cpu);
> +		return -EINVAL;
> +	}
> +
> +	if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
> +		     !per_cpu(arch_core_cycles_prev, cpu))) {
> +		pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
> +		return -EINVAL;
> +	}
> +
> +	/* Convert maximum frequency from KHz to Hz and validate */
> +	max_freq_hz = cpu_get_max_freq(cpu) * 1000;
> +	if (unlikely(!max_freq_hz)) {
> +		pr_debug("CPU%d: invalid maximum frequency.\n", cpu);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Pre-compute the fixed ratio between the frequency of the constant
> +	 * counter and the maximum frequency of the CPU.
> +	 *
> +	 *			      const_freq
> +	 * arch_max_freq_scale =   ---------------- * SCHED_CAPACITY_SCALE²
> +	 *			   cpuinfo_max_freq
> +	 *
> +	 * We use a factor of 2 * SCHED_CAPACITY_SHIFT -> SCHED_CAPACITY_SCALE²
> +	 * in order to ensure a good resolution for arch_max_freq_scale for
> +	 * very low arch timer frequencies (up to the KHz range which should be
                                            ^^^^^
<pedantic hat on>: s/up to/down to/

> +	 * unlikely).
> +	 */
> +	ratio = (u64)arch_timer_get_rate() << (2 * SCHED_CAPACITY_SHIFT);
> +	ratio = div64_u64(ratio, max_freq_hz);
> +	if (!ratio) {
> +		pr_err("System timer frequency too low.\n");

Should that be a WARN_ONCE() instead? If the arch timer freq is too low,
we'll end up spamming this message, since we go through this for all CPUs.


> +		return -EINVAL;
> +	}
> +
> +	per_cpu(arch_max_freq_scale, cpu) = (unsigned long)ratio;
> +

It occurred to me that this isn't strictly speaking a per-CPU information as
it only depends on the max possible frequency. Not really worth bothering
about though, I think.

> +	return 0;
> +}
> +
> +static inline int
> +enable_policy_freq_counters(int cpu, cpumask_var_t valid_cpus)
> +{
> +	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> +
> +	if (!policy) {
> +		pr_debug("CPU%d: No cpufreq policy found.\n", cpu);
> +		return false;
> +	}
> +
> +	if (cpumask_subset(policy->related_cpus, valid_cpus)) {
> +		cpumask_or(amu_fie_cpus, policy->related_cpus,
> +			   amu_fie_cpus);
> +		pr_info("CPUs[%*pbl]: counters will be used for FIE.",
> +			cpumask_pr_args(amu_fie_cpus));

Could we have a single print of all CPUs in one go? AIUI this will generate a
line per cpufreq policy. Maybe just something at the tail of init_amu_fie():

if (!cpumask_empty(amu_fie_cpus))
	pr_info(<blah>);

> +	}
> +
> +	cpufreq_cpu_put(policy);
> +
> +	return true;
> +}
> +
> +static int __init init_amu_fie(void)
> +{
> +	cpumask_var_t valid_cpus;
> +	bool have_policy = false;
> +	int cpu;
> +
> +	if (!zalloc_cpumask_var(&valid_cpus, GFP_KERNEL) ||
> +	    !zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (validate_cpu_freq_invariance_counters(cpu))
> +			continue;
> +		cpumask_set_cpu(cpu, valid_cpus);
> +		have_policy = enable_policy_freq_counters(cpu, valid_cpus) ||
> +			      have_policy;

What about:
		have_policy |= enable_policy_freq_counters(cpu, valid_cpus);

> +	}
> +
> +	if (!have_policy) {
> +		/*
> +		 * If we are not restricted by cpufreq policies, we only enable
> +		 * the use of the AMU feature for FIE if all CPUs support AMU.
> +		 * Otherwise, enable_policy_freq_counters has already enabled
> +		 * policy cpus.
> +		 */
> +		if (cpumask_equal(valid_cpus, cpu_possible_mask)) {

Mmm so I'm thinking what we want here is the cpu_present_mask rather than
the possible one. This is very corner-casy, but I think that if we fail to
boot a secondary, we'll have it possible but not present.

While at it you could make the loop only target present CPUs, but I think the
one bit that matters is this check right here (!present should fail at
validate_cpu_freq_invariance_counters()).

> +			cpumask_or(amu_fie_cpus, amu_fie_cpus, valid_cpus);
> +			pr_info("CPUs[%*pbl]: counters will be used for FIE.",
> +				cpumask_pr_args(amu_fie_cpus));
> +		}
> +	}
> +
> +	free_cpumask_var(valid_cpus);
> +
> +	return 0;
> +}
> +late_initcall_sync(init_amu_fie);
> +
> +bool topology_cpu_freq_counters(struct cpumask *cpus)
> +{
> +	return cpumask_available(amu_fie_cpus) &&
> +	       cpumask_subset(cpus, amu_fie_cpus);
> +}
> +
> +void topology_scale_freq_tick(void)
> +{
> +	u64 prev_core_cnt, prev_const_cnt;
> +	u64 core_cnt, const_cnt, scale;
> +	int cpu = smp_processor_id();
> +
> +	if (!cpumask_available(amu_fie_cpus) ||
> +	    !cpumask_test_cpu(cpu, amu_fie_cpus))
> +		return;

It might be a good idea to have a static key to gate our entry into this
function - that way we can lessen our impact on older platforms (without AMUs)
running a recent kernel with CONFIG_ARM64_AMU_EXTN=y.

x86 does just that, if you look at their arch_scale_freq_tick()
implementation. FWIW I don't think we should bother with playing with the
key counter to count AMU-enabled CPUs, just enable it at startup if we have
> 1 such CPU and let the cpumask drive the rest.

In your check here, the static key check could replace the cpumask_available()
check. The static key could also be used for topology_cpu_freq_counters().

> +
> +	const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
> +	core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
> +	prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
> +	prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
> +
> +	if (unlikely(core_cnt <= prev_core_cnt ||
> +		     const_cnt <= prev_const_cnt))
> +		goto store_and_exit;
> +
> +	/*
> +	 *	    /\core    arch_max_freq_scale
> +	 * scale =  ------- * --------------------
> +	 *	    /\const   SCHED_CAPACITY_SCALE
> +	 *
> +	 * We shift by SCHED_CAPACITY_SHIFT (divide by SCHED_CAPACITY_SCALE)
> +	 * in order to compensate for the SCHED_CAPACITY_SCALE² factor in
> +	 * arch_max_freq_scale (used to ensure its resolution) while keeping
> +	 * the scale value in the 0-SCHED_CAPACITY_SCALE capacity range.
> +	 */

A simple "See validate_cpu_freq_invariance_counters() for details on the
scale factor" would suffice wrt the shifting details.

> +	scale = core_cnt - prev_core_cnt;
> +	scale *= this_cpu_read(arch_max_freq_scale);
> +	scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
> +			  const_cnt - prev_const_cnt);
> +
> +	scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> +	this_cpu_write(freq_scale, (unsigned long)scale);
> +
> +store_and_exit:
> +	this_cpu_write(arch_core_cycles_prev, core_cnt);
> +	this_cpu_write(arch_const_cycles_prev, const_cnt);
> +}
> +#endif /* CONFIG_ARM64_AMU_EXTN */
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 1eb81f113786..1ab2b7503d63 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -29,6 +29,14 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
>  	unsigned long scale;
>  	int i;
>  
> +	/*
> +	 * If the use of counters for FIE is enabled, just return as we don't
> +	 * want to update the scale factor with information from CPUFREQ.
> +	 * Instead the scale factor will be updated from arch_scale_freq_tick.
> +	 */
> +	if (arch_cpu_freq_counters(cpus))
> +		return;
> +
>  	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
>  
>  	for_each_cpu(i, cpus)
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index eb2fe6edd73c..397aad6ae163 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -227,5 +227,12 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
>  	return cpumask_of_node(cpu_to_node(cpu));
>  }
>  
> +#ifndef arch_cpu_freq_counters
> +static __always_inline
> +bool arch_cpu_freq_counters(struct cpumask *cpus)
> +{
> +	return false;
> +}
> +#endif
>  
>  #endif /* _LINUX_TOPOLOGY_H */
>
Ionela Voinescu Feb. 23, 2020, 6:49 p.m. UTC | #4
Hi Valentin,

Sorry for the delay in my reply and thank you very much for the review!

I will push v4 very soon with these changes.

On Monday 17 Feb 2020 at 16:59:24 (+0000), Valentin Schneider wrote:
> > +	 * Pre-compute the fixed ratio between the frequency of the constant
> > +	 * counter and the maximum frequency of the CPU.
> > +	 *
> > +	 *			      const_freq
> > +	 * arch_max_freq_scale =   ---------------- * SCHED_CAPACITY_SCALE²
> > +	 *			   cpuinfo_max_freq
> > +	 *
> > +	 * We use a factor of 2 * SCHED_CAPACITY_SHIFT -> SCHED_CAPACITY_SCALE²
> > +	 * in order to ensure a good resolution for arch_max_freq_scale for
> > +	 * very low arch timer frequencies (up to the KHz range which should be
>                                             ^^^^^
> <pedantic hat on>: s/up to/down to/

Done!

> > +	 * unlikely).
> > +	 */
> > +	ratio = (u64)arch_timer_get_rate() << (2 * SCHED_CAPACITY_SHIFT);
> > +	ratio = div64_u64(ratio, max_freq_hz);
> > +	if (!ratio) {
> > +		pr_err("System timer frequency too low.\n");
> 
> Should that be a WARN_ONCE() instead? If the arch timer freq is too low,
> we'll end up spamming this message, since we go through this for all CPUs.

Done!

> > +		return -EINVAL;
> > +	}
> > +
> > +	per_cpu(arch_max_freq_scale, cpu) = (unsigned long)ratio;
> > +
> 
> It occurred to me that this isn't strictly speaking a per-CPU information as
> it only depends on the max possible frequency. Not really worth bothering
> about though, I think.
> 

Yes, it depends on the max possible frequency of all CPUs in a frequency
domain. But I wanted to put this factor in a per-cpu variable in order
to be able to retrieve it faster in topology_scale_freq_tick, rather
than having to consider policies and related CPUs in that function.

> > +	return 0;
> > +}
> > +
> > +static inline int
> > +enable_policy_freq_counters(int cpu, cpumask_var_t valid_cpus)
> > +{
> > +	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> > +
> > +	if (!policy) {
> > +		pr_debug("CPU%d: No cpufreq policy found.\n", cpu);
> > +		return false;
> > +	}
> > +
> > +	if (cpumask_subset(policy->related_cpus, valid_cpus)) {
> > +		cpumask_or(amu_fie_cpus, policy->related_cpus,
> > +			   amu_fie_cpus);
> > +		pr_info("CPUs[%*pbl]: counters will be used for FIE.",
> > +			cpumask_pr_args(amu_fie_cpus));
> 
> Could we have a single print of all CPUs in one go? AIUI this will generate a
> line per cpufreq policy. Maybe just something at the tail of init_amu_fie():
> 
> if (!cpumask_empty(amu_fie_cpus))
> 	pr_info(<blah>);
> 

Done. I've used this location as well to set the static key that you've
suggested below.

> > +	}
> > +
> > +	cpufreq_cpu_put(policy);
> > +
> > +	return true;
> > +}
> > +
> > +static int __init init_amu_fie(void)
> > +{
> > +	cpumask_var_t valid_cpus;
> > +	bool have_policy = false;
> > +	int cpu;
> > +
> > +	if (!zalloc_cpumask_var(&valid_cpus, GFP_KERNEL) ||
> > +	    !zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL))
> > +		return -ENOMEM;
> > +
> > +	for_each_possible_cpu(cpu) {
> > +		if (validate_cpu_freq_invariance_counters(cpu))
> > +			continue;
> > +		cpumask_set_cpu(cpu, valid_cpus);
> > +		have_policy = enable_policy_freq_counters(cpu, valid_cpus) ||
> > +			      have_policy;
> 
> What about:
> 		have_policy |= enable_policy_freq_counters(cpu, valid_cpus);
> 

Done as well.

> > +	}
> > +
> > +	if (!have_policy) {
> > +		/*
> > +		 * If we are not restricted by cpufreq policies, we only enable
> > +		 * the use of the AMU feature for FIE if all CPUs support AMU.
> > +		 * Otherwise, enable_policy_freq_counters has already enabled
> > +		 * policy cpus.
> > +		 */
> > +		if (cpumask_equal(valid_cpus, cpu_possible_mask)) {
> 
> Mmm so I'm thinking what we want here is the cpu_present_mask rather than
> the possible one. This is very corner-casy, but I think that if we fail to
> boot a secondary, we'll have it possible but not present.
>

Yes, this is correct. It does depend on the stage it fails at: for
example if some feature checks fail, a CPU will not be marked in
cpu_present_mask (see cpu_die_early()), while the following will result
in possible == present.

---
[    0.056524] EFI services will not be available.
[    0.065690] smp: Bringing up secondary CPUs ...
[    0.098010] psci: failed to boot CPU1 (-22)
[    0.098037] CPU1: failed to boot: -22
[    0.130290] psci: failed to boot CPU2 (-22)
[    0.130315] CPU2: failed to boot: -22
[    0.162568] psci: failed to boot CPU3 (-22)
[    0.162594] CPU3: failed to boot: -22
[    0.194890] Detected PIPT I-cache on CPU4
[    0.194990] GICv3: CPU4: found redistributor 100 region
0:0x000000002f120000
[    0.195046] GICv3: CPU4: using allocated LPI pending table
@0x00000000fc0d0000
[    0.195133] CPU4: Booted secondary processor 0x0000000100
[0x410fd0f0]
[    0.227190] psci: failed to boot CPU5 (-22)
[    0.227412] CPU5: failed to boot: -22
[    0.259431] psci: failed to boot CPU6 (-22)
[    0.259522] CPU6: failed to boot: -22
[    0.291683] psci: failed to boot CPU7 (-22)
[    0.291709] CPU7: failed to boot: -22
[    0.291990] smp: Brought up 1 node, 2 CPUs  
[..]
root@buildroot:~# cat present
0-7
root@buildroot:~# cat possible
0-7

This failure happens while the CPU is being brought up (__cpu_up).
I'm not sure if this should result in set_cpu_present(cpu, 0) as well.
But it's unrelated to this..

In any case, your suggestion is valid and cpu_present_mask is better to
be used here.


> While at it you could make the loop only target present CPUs, but I think the
> one bit that matters is this check right here (!present should fail at
> validate_cpu_freq_invariance_counters()).
> 

Will change the loop as well. Thanks!

> > +			cpumask_or(amu_fie_cpus, amu_fie_cpus, valid_cpus);
> > +			pr_info("CPUs[%*pbl]: counters will be used for FIE.",
> > +				cpumask_pr_args(amu_fie_cpus));
> > +		}
> > +	}
> > +
> > +	free_cpumask_var(valid_cpus);
> > +
> > +	return 0;
> > +}
> > +late_initcall_sync(init_amu_fie);
> > +
> > +bool topology_cpu_freq_counters(struct cpumask *cpus)
> > +{
> > +	return cpumask_available(amu_fie_cpus) &&
> > +	       cpumask_subset(cpus, amu_fie_cpus);
> > +}
> > +
> > +void topology_scale_freq_tick(void)
> > +{
> > +	u64 prev_core_cnt, prev_const_cnt;
> > +	u64 core_cnt, const_cnt, scale;
> > +	int cpu = smp_processor_id();
> > +
> > +	if (!cpumask_available(amu_fie_cpus) ||
> > +	    !cpumask_test_cpu(cpu, amu_fie_cpus))
> > +		return;
> 
> It might be a good idea to have a static key to gate our entry into this
> function - that way we can lessen our impact on older platforms (without AMUs)
> running a recent kernel with CONFIG_ARM64_AMU_EXTN=y.
> 
> x86 does just that, if you look at their arch_scale_freq_tick()
> implementation. FWIW I don't think we should bother with playing with the
> key counter to count AMU-enabled CPUs, just enable it at startup if we have
> > 1 such CPU and let the cpumask drive the rest.
> 
> In your check here, the static key check could replace the cpumask_available()
> check. The static key could also be used for topology_cpu_freq_counters().
>

Very good idea! Done as well. Yes, the counter (number of AMU enabled
CPUs) would not be of much help for the moment.

> > +
> > +	const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
> > +	core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
> > +	prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
> > +	prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
> > +
> > +	if (unlikely(core_cnt <= prev_core_cnt ||
> > +		     const_cnt <= prev_const_cnt))
> > +		goto store_and_exit;
> > +
> > +	/*
> > +	 *	    /\core    arch_max_freq_scale
> > +	 * scale =  ------- * --------------------
> > +	 *	    /\const   SCHED_CAPACITY_SCALE
> > +	 *
> > +	 * We shift by SCHED_CAPACITY_SHIFT (divide by SCHED_CAPACITY_SCALE)
> > +	 * in order to compensate for the SCHED_CAPACITY_SCALE² factor in
> > +	 * arch_max_freq_scale (used to ensure its resolution) while keeping
> > +	 * the scale value in the 0-SCHED_CAPACITY_SCALE capacity range.
> > +	 */
> 
> A simple "See validate_cpu_freq_invariance_counters() for details on the
> scale factor" would suffice wrt the shifting details.
>

Done!

Thank you,
Ionela.

> > +	scale = core_cnt - prev_core_cnt;
> > +	scale *= this_cpu_read(arch_max_freq_scale);
> > +	scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
> > +			  const_cnt - prev_const_cnt);
> > +
> > +	scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> > +	this_cpu_write(freq_scale, (unsigned long)scale);
> > +
> > +store_and_exit:
> > +	this_cpu_write(arch_core_cycles_prev, core_cnt);
> > +	this_cpu_write(arch_const_cycles_prev, const_cnt);
> > +}
> > +#endif /* CONFIG_ARM64_AMU_EXTN */
> > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> > index 1eb81f113786..1ab2b7503d63 100644
> > --- a/drivers/base/arch_topology.c
> > +++ b/drivers/base/arch_topology.c
> > @@ -29,6 +29,14 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
> >  	unsigned long scale;
> >  	int i;
> >  
> > +	/*
> > +	 * If the use of counters for FIE is enabled, just return as we don't
> > +	 * want to update the scale factor with information from CPUFREQ.
> > +	 * Instead the scale factor will be updated from arch_scale_freq_tick.
> > +	 */
> > +	if (arch_cpu_freq_counters(cpus))
> > +		return;
> > +
> >  	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
> >  
> >  	for_each_cpu(i, cpus)
> > diff --git a/include/linux/topology.h b/include/linux/topology.h
> > index eb2fe6edd73c..397aad6ae163 100644
> > --- a/include/linux/topology.h
> > +++ b/include/linux/topology.h
> > @@ -227,5 +227,12 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
> >  	return cpumask_of_node(cpu_to_node(cpu));
> >  }
> >  
> > +#ifndef arch_cpu_freq_counters
> > +static __always_inline
> > +bool arch_cpu_freq_counters(struct cpumask *cpus)
> > +{
> > +	return false;
> > +}
> > +#endif
> >  
> >  #endif /* _LINUX_TOPOLOGY_H */
> >
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index a4d945db95a2..d910d463cf76 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -16,6 +16,22 @@  int pcibus_to_node(struct pci_bus *bus);
 
 #include <linux/arch_topology.h>
 
+#ifdef CONFIG_ARM64_AMU_EXTN
+extern unsigned int cpu_get_max_freq(unsigned int cpu);
+/*
+ * Replace default function that signals use of counters
+ * for frequency invariance for given CPUs.
+ */
+bool topology_cpu_freq_counters(struct cpumask *cpus);
+#define arch_cpu_freq_counters topology_cpu_freq_counters
+/*
+ * Replace task scheduler's default counter-based
+ * frequency-invariance scale factor setting.
+ */
+void topology_scale_freq_tick(void);
+#define arch_scale_freq_tick topology_scale_freq_tick
+#endif /* CONFIG_ARM64_AMU_EXTN */
+
 /* Replace task scheduler's default frequency-invariant accounting */
 #define arch_scale_freq_capacity topology_get_freq_scale
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 029a473ad273..a4620b269b56 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1172,12 +1172,16 @@  bool cpu_has_amu_feat(int cpu)
 	return false;
 }
 
+/* Initialize the use of AMU counters for frequency invariance */
+extern void init_cpu_freq_invariance_counters(void);
+
 static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
 {
 	if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
 		pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
 			smp_processor_id());
 		cpumask_set_cpu(smp_processor_id(), amu_cpus);
+		init_cpu_freq_invariance_counters();
 	}
 }
 
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index fa9528dfd0ce..3f6e379f07b3 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -14,6 +14,7 @@ 
 #include <linux/acpi.h>
 #include <linux/arch_topology.h>
 #include <linux/cacheinfo.h>
+#include <linux/cpufreq.h>
 #include <linux/init.h>
 #include <linux/percpu.h>
 
@@ -120,4 +121,188 @@  int __init parse_acpi_topology(void)
 }
 #endif
 
+#ifdef CONFIG_ARM64_AMU_EXTN
 
+#undef pr_fmt
+#define pr_fmt(fmt) "AMU: " fmt
+
+static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale);
+static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
+static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
+static cpumask_var_t amu_fie_cpus;
+
+/* Obtain max frequency (in KHz) as reported by hardware */
+__weak unsigned int cpu_get_max_freq(unsigned int cpu)
+{
+	return 0;
+}
+
+#ifdef CONFIG_CPU_FREQ
+/* Replace max frequency getter with cpufreq based function */
+#define cpu_get_max_freq cpufreq_get_hw_max_freq
+#endif
+
+/* Initialize counter reference per-cpu variables for the current CPU */
+void init_cpu_freq_invariance_counters(void)
+{
+	this_cpu_write(arch_core_cycles_prev,
+		       read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0));
+	this_cpu_write(arch_const_cycles_prev,
+		       read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0));
+}
+
+static int validate_cpu_freq_invariance_counters(int cpu)
+{
+	u64 max_freq_hz, ratio;
+
+	if (!cpu_has_amu_feat(cpu)) {
+		pr_debug("CPU%d: counters are not supported.\n", cpu);
+		return -EINVAL;
+	}
+
+	if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
+		     !per_cpu(arch_core_cycles_prev, cpu))) {
+		pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
+		return -EINVAL;
+	}
+
+	/* Convert maximum frequency from KHz to Hz and validate */
+	max_freq_hz = cpu_get_max_freq(cpu) * 1000;
+	if (unlikely(!max_freq_hz)) {
+		pr_debug("CPU%d: invalid maximum frequency.\n", cpu);
+		return -EINVAL;
+	}
+
+	/*
+	 * Pre-compute the fixed ratio between the frequency of the constant
+	 * counter and the maximum frequency of the CPU.
+	 *
+	 *			      const_freq
+	 * arch_max_freq_scale =   ---------------- * SCHED_CAPACITY_SCALE²
+	 *			   cpuinfo_max_freq
+	 *
+	 * We use a factor of 2 * SCHED_CAPACITY_SHIFT -> SCHED_CAPACITY_SCALE²
+	 * in order to ensure a good resolution for arch_max_freq_scale for
+	 * very low arch timer frequencies (up to the KHz range which should be
+	 * unlikely).
+	 */
+	ratio = (u64)arch_timer_get_rate() << (2 * SCHED_CAPACITY_SHIFT);
+	ratio = div64_u64(ratio, max_freq_hz);
+	if (!ratio) {
+		pr_err("System timer frequency too low.\n");
+		return -EINVAL;
+	}
+
+	per_cpu(arch_max_freq_scale, cpu) = (unsigned long)ratio;
+
+	return 0;
+}
+
+static inline int
+enable_policy_freq_counters(int cpu, cpumask_var_t valid_cpus)
+{
+	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+
+	if (!policy) {
+		pr_debug("CPU%d: No cpufreq policy found.\n", cpu);
+		return false;
+	}
+
+	if (cpumask_subset(policy->related_cpus, valid_cpus)) {
+		cpumask_or(amu_fie_cpus, policy->related_cpus,
+			   amu_fie_cpus);
+		pr_info("CPUs[%*pbl]: counters will be used for FIE.",
+			cpumask_pr_args(amu_fie_cpus));
+	}
+
+	cpufreq_cpu_put(policy);
+
+	return true;
+}
+
+static int __init init_amu_fie(void)
+{
+	cpumask_var_t valid_cpus;
+	bool have_policy = false;
+	int cpu;
+
+	if (!zalloc_cpumask_var(&valid_cpus, GFP_KERNEL) ||
+	    !zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL))
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		if (validate_cpu_freq_invariance_counters(cpu))
+			continue;
+		cpumask_set_cpu(cpu, valid_cpus);
+		have_policy = enable_policy_freq_counters(cpu, valid_cpus) ||
+			      have_policy;
+	}
+
+	if (!have_policy) {
+		/*
+		 * If we are not restricted by cpufreq policies, we only enable
+		 * the use of the AMU feature for FIE if all CPUs support AMU.
+		 * Otherwise, enable_policy_freq_counters has already enabled
+		 * policy cpus.
+		 */
+		if (cpumask_equal(valid_cpus, cpu_possible_mask)) {
+			cpumask_or(amu_fie_cpus, amu_fie_cpus, valid_cpus);
+			pr_info("CPUs[%*pbl]: counters will be used for FIE.",
+				cpumask_pr_args(amu_fie_cpus));
+		}
+	}
+
+	free_cpumask_var(valid_cpus);
+
+	return 0;
+}
+late_initcall_sync(init_amu_fie);
+
+bool topology_cpu_freq_counters(struct cpumask *cpus)
+{
+	return cpumask_available(amu_fie_cpus) &&
+	       cpumask_subset(cpus, amu_fie_cpus);
+}
+
+void topology_scale_freq_tick(void)
+{
+	u64 prev_core_cnt, prev_const_cnt;
+	u64 core_cnt, const_cnt, scale;
+	int cpu = smp_processor_id();
+
+	if (!cpumask_available(amu_fie_cpus) ||
+	    !cpumask_test_cpu(cpu, amu_fie_cpus))
+		return;
+
+	const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
+	core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
+	prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
+	prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
+
+	if (unlikely(core_cnt <= prev_core_cnt ||
+		     const_cnt <= prev_const_cnt))
+		goto store_and_exit;
+
+	/*
+	 *	    /\core    arch_max_freq_scale
+	 * scale =  ------- * --------------------
+	 *	    /\const   SCHED_CAPACITY_SCALE
+	 *
+	 * We shift by SCHED_CAPACITY_SHIFT (divide by SCHED_CAPACITY_SCALE)
+	 * in order to compensate for the SCHED_CAPACITY_SCALE² factor in
+	 * arch_max_freq_scale (used to ensure its resolution) while keeping
+	 * the scale value in the 0-SCHED_CAPACITY_SCALE capacity range.
+	 */
+	scale = core_cnt - prev_core_cnt;
+	scale *= this_cpu_read(arch_max_freq_scale);
+	scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
+			  const_cnt - prev_const_cnt);
+
+	scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
+	this_cpu_write(freq_scale, (unsigned long)scale);
+
+store_and_exit:
+	this_cpu_write(arch_core_cycles_prev, core_cnt);
+	this_cpu_write(arch_const_cycles_prev, const_cnt);
+}
+#endif /* CONFIG_ARM64_AMU_EXTN */
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 1eb81f113786..1ab2b7503d63 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -29,6 +29,14 @@  void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
 	unsigned long scale;
 	int i;
 
+	/*
+	 * If the use of counters for FIE is enabled, just return as we don't
+	 * want to update the scale factor with information from CPUFREQ.
+	 * Instead the scale factor will be updated from arch_scale_freq_tick.
+	 */
+	if (arch_cpu_freq_counters(cpus))
+		return;
+
 	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
 
 	for_each_cpu(i, cpus)
diff --git a/include/linux/topology.h b/include/linux/topology.h
index eb2fe6edd73c..397aad6ae163 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -227,5 +227,12 @@  static inline const struct cpumask *cpu_cpu_mask(int cpu)
 	return cpumask_of_node(cpu_to_node(cpu));
 }
 
+#ifndef arch_cpu_freq_counters
+static __always_inline
+bool arch_cpu_freq_counters(struct cpumask *cpus)
+{
+	return false;
+}
+#endif
 
 #endif /* _LINUX_TOPOLOGY_H */