diff mbox series

[4/8] cpufreq,vexpress-spc: fix Frequency Invariance (FI) for bL switching

Message ID 20200701090751.7543-5-ionela.voinescu@arm.com (mailing list archive)
State Changes Requested, archived
Headers show
Series cpufreq: improve frequency invariance support | expand

Commit Message

Ionela Voinescu July 1, 2020, 9:07 a.m. UTC
In the majority of cases, the index argument to cpufreq's target_index()
is meant to identify the frequency that is requested from the hardware,
according to the frequency table: policy->freq_table[index].frequency.

After successfully requesting it from the hardware, this value, together
with the maximum hardware frequency (policy->cpuinfo.max_freq) are used
as arguments to arch_set_freq_scale(), in order to set the task scheduler
frequency scale factor. This is a normalized indication of a CPU's
current performance.

But for the vexpress-spc-cpufreq driver, when big.LITTLE switching [1]
is enabled, there are three issues with using the above information for
setting the FI scale factor:

 - cur_freq: policy->freq_table[index].frequency is not the frequency
   requested from the hardware. ve_spc_cpufreq_set_rate() will convert
   from this virtual frequency to an actual frequency, which is then
   requested from the hardware. For the A7 cluster, the virtual frequency
   is half the actual frequency. The use of the virtual policy->freq_table
   frequency results in an incorrect FI scale factor.

 - max_freq: policy->cpuinfo.max_freq does not correctly identify the
   maximum frequency of the physical cluster. This value identifies the
   maximum frequency achievable by the big-LITTLE pair, that is the
   maximum frequency of the big CPU. But when the LITTLE CPU in the group
   is used, the hardware maximum frquency passed to arch_set_freq_scale()
   is incorrect.

 - missing a scale factor update: when switching clusters, the driver
   recalculates the frequency of the old clock domain based on the
   requests of the remaining CPUs in the domain and asks for a clock
   change. But this does not result in an update in the scale factor.

Therefore, introduce a local function bLs_set_sched_freq_scale() that
helps call arch_set_freq_scale() with correct information for the
is_bL_switching_enabled() case, while maintaining the old, more
efficient, call site of arch_set_freq_scale() for when cluster
switching is disabled.

Also, because of these requirements in computing the scale factor, this
driver is the only one that maintains custom support for FI, which is
marked by the presence of the CPUFREQ_CUSTOM_SET_FREQ_SCALE flag.

[1] https://lwn.net/Articles/481055/

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Liviu Dudau <liviu.dudau@arm.com>
---
 drivers/cpufreq/vexpress-spc-cpufreq.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

Comments

Viresh Kumar July 1, 2020, 10:46 a.m. UTC | #1
On 01-07-20, 10:07, Ionela Voinescu wrote:
> In the majority of cases, the index argument to cpufreq's target_index()
> is meant to identify the frequency that is requested from the hardware,
> according to the frequency table: policy->freq_table[index].frequency.
> 
> After successfully requesting it from the hardware, this value, together
> with the maximum hardware frequency (policy->cpuinfo.max_freq) are used
> as arguments to arch_set_freq_scale(), in order to set the task scheduler
> frequency scale factor. This is a normalized indication of a CPU's
> current performance.
> 
> But for the vexpress-spc-cpufreq driver, when big.LITTLE switching [1]
> is enabled, there are three issues with using the above information for
> setting the FI scale factor:
> 
>  - cur_freq: policy->freq_table[index].frequency is not the frequency
>    requested from the hardware. ve_spc_cpufreq_set_rate() will convert
>    from this virtual frequency to an actual frequency, which is then
>    requested from the hardware. For the A7 cluster, the virtual frequency
>    is half the actual frequency. The use of the virtual policy->freq_table
>    frequency results in an incorrect FI scale factor.
> 
>  - max_freq: policy->cpuinfo.max_freq does not correctly identify the
>    maximum frequency of the physical cluster. This value identifies the
>    maximum frequency achievable by the big-LITTLE pair, that is the
>    maximum frequency of the big CPU. But when the LITTLE CPU in the group
>    is used, the hardware maximum frquency passed to arch_set_freq_scale()
>    is incorrect.
> 
>  - missing a scale factor update: when switching clusters, the driver
>    recalculates the frequency of the old clock domain based on the
>    requests of the remaining CPUs in the domain and asks for a clock
>    change. But this does not result in an update in the scale factor.
> 
> Therefore, introduce a local function bLs_set_sched_freq_scale() that
> helps call arch_set_freq_scale() with correct information for the
> is_bL_switching_enabled() case, while maintaining the old, more
> efficient, call site of arch_set_freq_scale() for when cluster
> switching is disabled.
> 
> Also, because of these requirements in computing the scale factor, this
> driver is the only one that maintains custom support for FI, which is
> marked by the presence of the CPUFREQ_CUSTOM_SET_FREQ_SCALE flag.
> 
> [1] https://lwn.net/Articles/481055/
> 
> Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Liviu Dudau <liviu.dudau@arm.com>
> ---
>  drivers/cpufreq/vexpress-spc-cpufreq.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)

Is there anyone who cares for this driver and EAS ? I will just skip doing the
FIE thing here and mark it skipped.
Ionela Voinescu July 1, 2020, 2:07 p.m. UTC | #2
On Wednesday 01 Jul 2020 at 16:16:19 (+0530), Viresh Kumar wrote:
> On 01-07-20, 10:07, Ionela Voinescu wrote:
> > In the majority of cases, the index argument to cpufreq's target_index()
> > is meant to identify the frequency that is requested from the hardware,
> > according to the frequency table: policy->freq_table[index].frequency.
> > 
> > After successfully requesting it from the hardware, this value, together
> > with the maximum hardware frequency (policy->cpuinfo.max_freq) are used
> > as arguments to arch_set_freq_scale(), in order to set the task scheduler
> > frequency scale factor. This is a normalized indication of a CPU's
> > current performance.
> > 
> > But for the vexpress-spc-cpufreq driver, when big.LITTLE switching [1]
> > is enabled, there are three issues with using the above information for
> > setting the FI scale factor:
> > 
> >  - cur_freq: policy->freq_table[index].frequency is not the frequency
> >    requested from the hardware. ve_spc_cpufreq_set_rate() will convert
> >    from this virtual frequency to an actual frequency, which is then
> >    requested from the hardware. For the A7 cluster, the virtual frequency
> >    is half the actual frequency. The use of the virtual policy->freq_table
> >    frequency results in an incorrect FI scale factor.
> > 
> >  - max_freq: policy->cpuinfo.max_freq does not correctly identify the
> >    maximum frequency of the physical cluster. This value identifies the
> >    maximum frequency achievable by the big-LITTLE pair, that is the
> >    maximum frequency of the big CPU. But when the LITTLE CPU in the group
> >    is used, the hardware maximum frquency passed to arch_set_freq_scale()
> >    is incorrect.
> > 
> >  - missing a scale factor update: when switching clusters, the driver
> >    recalculates the frequency of the old clock domain based on the
> >    requests of the remaining CPUs in the domain and asks for a clock
> >    change. But this does not result in an update in the scale factor.
> > 
> > Therefore, introduce a local function bLs_set_sched_freq_scale() that
> > helps call arch_set_freq_scale() with correct information for the
> > is_bL_switching_enabled() case, while maintaining the old, more
> > efficient, call site of arch_set_freq_scale() for when cluster
> > switching is disabled.
> > 
> > Also, because of these requirements in computing the scale factor, this
> > driver is the only one that maintains custom support for FI, which is
> > marked by the presence of the CPUFREQ_CUSTOM_SET_FREQ_SCALE flag.
> > 
> > [1] https://lwn.net/Articles/481055/
> > 
> > Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Cc: Sudeep Holla <sudeep.holla@arm.com>
> > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Cc: Liviu Dudau <liviu.dudau@arm.com>
> > ---
> >  drivers/cpufreq/vexpress-spc-cpufreq.c | 23 ++++++++++++++++++++++-
> >  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> Is there anyone who cares for this driver and EAS ? I will just skip doing the
> FIE thing here and mark it skipped.
> 

That is a good question. The vexpress driver is still used for TC2, but
I don't know of any users of this bL switcher functionality that's part
of the driver. I think there were a few people wondering recently if
it's still used [1].

If we disconsider the bL switcher functionality, then the vexpress
driver itself gets in line with the other drivers supported by this
series. Therefore there would not be a flag set needed here. Also, to
maintain current functionality, we would not need to introduce a flag
at all.

But, the frequency invariance fix is also useful for schedutil to
better select a frequency based on invariant utilization. So if we
independently decide having a flag like the one introduced at 1/8 is
useful, I would recommend to consider this patch, as it does fix some
current functionality in the kernel (whether we can determine if it's
used much or not).

Therefore, IMO, if it's not used any more it should be removed, but if
it's kept it should be fixed.

[1] https://lore.kernel.org/linux-arm-kernel/20200624195811.435857-8-maz@kernel.org/


Thanks,
Ionela.


> -- 
> viresh
Viresh Kumar July 2, 2020, 3:05 a.m. UTC | #3
On 01-07-20, 15:07, Ionela Voinescu wrote:
> On Wednesday 01 Jul 2020 at 16:16:19 (+0530), Viresh Kumar wrote:
> > Is there anyone who cares for this driver and EAS ? I will just skip doing the
> > FIE thing here and mark it skipped.
> 
> That is a good question. The vexpress driver is still used for TC2, but
> I don't know of any users of this bL switcher functionality that's part
> of the driver. I think there were a few people wondering recently if
> it's still used [1].

Even if it is used by some, there is no need, I believe, to enable
freq-invariance for it, which wasn't enabled until now.

And considering that we are going to enable the flag only for the
interested parties now, as from the discussion on 1/8, this shouldn't
be required.
Ionela Voinescu July 2, 2020, 11:41 a.m. UTC | #4
Hi,

On Thursday 02 Jul 2020 at 08:35:51 (+0530), Viresh Kumar wrote:
> On 01-07-20, 15:07, Ionela Voinescu wrote:
> > On Wednesday 01 Jul 2020 at 16:16:19 (+0530), Viresh Kumar wrote:
> > > Is there anyone who cares for this driver and EAS ? I will just skip doing the
> > > FIE thing here and mark it skipped.
> > 
> > That is a good question. The vexpress driver is still used for TC2, but
> > I don't know of any users of this bL switcher functionality that's part
> > of the driver. I think there were a few people wondering recently if
> > it's still used [1].
> 
> Even if it is used by some, there is no need, I believe, to enable
> freq-invariance for it, which wasn't enabled until now.
>

It was enabled until now, but it was partially broken. If you look over
the driver you'll see arch_set_freq_scale() being called for both
is_bL_switching_enabled() and for when it's not [1].

For !is_bL_switching_enabled() this is fine. But for
is_bL_switching_enabled(), it is broken as described in 4/8.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/cpufreq/vexpress-spc-cpufreq.c?h=v5.8-rc3#n203

> And considering that we are going to enable the flag only for the
> interested parties now, as from the discussion on 1/8, this shouldn't
> be required.
> 

If we just don't want frequency invariance for
is_bL_switching_enabled(), I can just guard the setting of the flag
suggested by Rafael at 1/8 by !CONFIG_BL_SWITCHER.

I'll proceed to do that and remove the fix at 4/8.

Many thanks!
Ionela.

> -- 
> viresh
Viresh Kumar July 2, 2020, 11:46 a.m. UTC | #5
On 02-07-20, 12:41, Ionela Voinescu wrote:
> It was enabled until now, but it was partially broken. If you look over
> the driver you'll see arch_set_freq_scale() being called for both
> is_bL_switching_enabled() and for when it's not [1].

I missed that completely, it was indeed added here:

commit 518accf20629 ("cpufreq: arm_big_little: invoke frequency-invariance setter function")

and so this patch or a version of it is required here.

> If we just don't want frequency invariance for
> is_bL_switching_enabled(), I can just guard the setting of the flag
> suggested by Rafael at 1/8 by !CONFIG_BL_SWITCHER.
> 
> I'll proceed to do that and remove the fix at 4/8.

I think it would be better to do that and avoid any complicate code
unnecessarily here.
diff mbox series

Patch

diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c
index e0a1a3367ec5..f2caf67d4050 100644
--- a/drivers/cpufreq/vexpress-spc-cpufreq.c
+++ b/drivers/cpufreq/vexpress-spc-cpufreq.c
@@ -55,6 +55,8 @@  static atomic_t cluster_usage[MAX_CLUSTERS + 1];
 static unsigned int clk_big_min;	/* (Big) clock frequencies */
 static unsigned int clk_little_max;	/* Maximum clock frequency (Little) */
 
+static inline u32 get_table_max(struct cpufreq_frequency_table *table);
+
 static DEFINE_PER_CPU(unsigned int, physical_cluster);
 static DEFINE_PER_CPU(unsigned int, cpu_last_req_freq);
 
@@ -87,6 +89,18 @@  static unsigned int find_cluster_maxfreq(int cluster)
 	return max_freq;
 }
 
+static void bLs_set_sched_freq_scale(int cluster, unsigned long cur_freq)
+{
+	unsigned long max_freq = get_table_max(freq_table[cluster]);
+	int j;
+
+	for_each_online_cpu(j) {
+		if (cluster == per_cpu(physical_cluster, j))
+			arch_set_freq_scale(get_cpu_mask(j), cur_freq,
+					    max_freq);
+	}
+}
+
 static unsigned int clk_get_cpu_rate(unsigned int cpu)
 {
 	u32 cur_cluster = per_cpu(physical_cluster, cpu);
@@ -154,6 +168,9 @@  ve_spc_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate)
 
 	mutex_unlock(&cluster_lock[new_cluster]);
 
+	if (bLs)
+		bLs_set_sched_freq_scale(new_cluster, new_rate);
+
 	/* Recalc freq for old cluster when switching clusters */
 	if (old_cluster != new_cluster) {
 		/* Switch cluster */
@@ -170,7 +187,11 @@  ve_spc_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate)
 			pr_err("%s: clk_set_rate failed: %d, old cluster: %d\n",
 			       __func__, ret, old_cluster);
 		}
+
 		mutex_unlock(&cluster_lock[old_cluster]);
+
+		if (new_rate)
+			bLs_set_sched_freq_scale(old_cluster, new_rate);
 	}
 
 	return 0;
@@ -200,7 +221,7 @@  static int ve_spc_cpufreq_set_target(struct cpufreq_policy *policy,
 	ret = ve_spc_cpufreq_set_rate(cpu, actual_cluster, new_cluster,
 				      freqs_new);
 
-	if (!ret) {
+	if (!is_bL_switching_enabled() && !ret) {
 		arch_set_freq_scale(policy->related_cpus, freqs_new,
 				    policy->cpuinfo.max_freq);
 	}