Message ID | d398729676f3d2b0d2ab024a2c9ea6e9ee1d0dca.1611829953.git.viresh.kumar@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | cpufreq: cppc: Add support for frequency invariance | expand |
Hi Viresh, Many thanks for the renaming of functions/variables/enums. I've cropped all the code that looks good to me, and I kept some portions of interest. On Thursday 28 Jan 2021 at 16:18:55 (+0530), Viresh Kumar wrote: > This patch attempts to make it generic enough so other parts of the > kernel can also provide their own implementation of scale_freq_tick() > callback, which is called by the scheduler periodically to update the > per-cpu freq_scale variable. [..] > static void amu_fie_setup(const struct cpumask *cpus) > { > @@ -227,7 +263,7 @@ static void amu_fie_setup(const struct cpumask *cpus) > if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask)) > return; > > - static_branch_enable(&amu_fie_key); > + topology_set_scale_freq_source(&amu_sfd, amu_fie_cpus); > > pr_debug("CPUs[%*pbl]: counters will be used for FIE.", > cpumask_pr_args(cpus)); > @@ -283,53 +319,6 @@ static int __init init_amu_fie(void) > } > core_initcall(init_amu_fie); [..] > +void topology_set_scale_freq_source(struct scale_freq_data *data, > + const struct cpumask *cpus) > +{ > + struct scale_freq_data *sfd; > + int cpu; > + > + for_each_cpu(cpu, cpus) { > + sfd = per_cpu(sft_data, cpu); > + > + /* Use ARCH provided counters whenever possible */ > + if (!sfd || sfd->source != SCALE_FREQ_SOURCE_ARCH) { > + per_cpu(sft_data, cpu) = data; > + cpumask_set_cpu(cpu, &scale_freq_counters_mask); > + } > + } > } > +EXPORT_SYMBOL_GPL(topology_set_scale_freq_source); [..] I have one single comment left for this patch, and apologies in advance for the long story. In general, for frequency invariance, we're interested in whether the full system is frequency invariant as well, for two reasons: - We want to be able to either set a scale factor for all CPUs or none at all. - If at some point during the boot process the system invariance status changes, we want to be able to inform dependents: schedutil and EAS. This management is currently done on amu_fie_setup(), because before these patches we only had two sources for frequency invariance: cpufreq and AMU counters. But that's not enough after these two patches, from both functional and code design point of view. I have to mention that your code will work as it is for now, but only because CPPC is the new other source of counters, and for CPPC we also have cpufreq invariance available. But this only hides what can become a problem later: if in the future we won't have cpufreq invariance for CPPC or if another provider of counters is added and used in a system without cpufreq invariance, the late initialization of these counters will either break schedutil/scheduler if not all CPUs support those counters, or keep EAS disabled, even if all CPUs support these counters, and frequency invariance is later enabled. Therefore, I think system level invariance management (checks and call to rebuild_sched_domains_energy()) also needs to move from arm64 code to arch_topology code. Thanks, Ionela.
On 03-02-21, 11:45, Ionela Voinescu wrote: > Therefore, I think system level invariance management (checks and > call to rebuild_sched_domains_energy()) also needs to move from arm64 > code to arch_topology code. Here is the 3rd patch of this series then :) From: Viresh Kumar <viresh.kumar@linaro.org> Date: Fri, 5 Feb 2021 13:31:53 +0530 Subject: [PATCH] drivers: arch_topology: rebuild sched domains on invariance change We already do this for the arm64, move it to arch_topology.c as we manage all sched_freq_tick sources here now. Reported-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> --- arch/arm64/kernel/topology.c | 16 ---------------- drivers/base/arch_topology.c | 22 ++++++++++++++++++++++ 2 files changed, 22 insertions(+), 16 deletions(-) diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index 1e47dfd465f8..47fca7376c93 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -240,7 +240,6 @@ static struct scale_freq_data amu_sfd = { static void amu_fie_setup(const struct cpumask *cpus) { - bool invariant; int cpu; /* We are already set since the last insmod of cpufreq driver */ @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus) cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus); - invariant = topology_scale_freq_invariant(); - - /* We aren't fully invariant yet */ - if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask)) - return; - topology_set_scale_freq_source(&amu_sfd, amu_fie_cpus); pr_debug("CPUs[%*pbl]: counters will be used for FIE.", cpumask_pr_args(cpus)); - - /* - * Task scheduler behavior depends on frequency invariance support, - * either cpufreq or counter driven. If the support status changes as - * a result of counter initialisation and use, retrigger the build of - * scheduling domains to ensure the information is propagated properly. - */ - if (!invariant) - rebuild_sched_domains_energy(); } static int init_amu_fie_callback(struct notifier_block *nb, unsigned long val, diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index 20b511949cd8..3631877f4440 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -23,6 +23,7 @@ static DEFINE_PER_CPU(struct scale_freq_data *, sft_data); static struct cpumask scale_freq_counters_mask; +static bool scale_freq_invariant; static bool supports_scale_freq_counters(const struct cpumask *cpus) { @@ -35,6 +36,23 @@ bool topology_scale_freq_invariant(void) supports_scale_freq_counters(cpu_online_mask); } +static void update_scale_freq_invariant(bool status) +{ + if (scale_freq_invariant == status) + return; + + /* + * Task scheduler behavior depends on frequency invariance support, + * either cpufreq or counter driven. If the support status changes as + * a result of counter initialisation and use, retrigger the build of + * scheduling domains to ensure the information is propagated properly. + */ + if (topology_scale_freq_invariant() == status) { + scale_freq_invariant = status; + rebuild_sched_domains_energy(); + } +} + void topology_set_scale_freq_source(struct scale_freq_data *data, const struct cpumask *cpus) { @@ -50,6 +68,8 @@ void topology_set_scale_freq_source(struct scale_freq_data *data, cpumask_set_cpu(cpu, &scale_freq_counters_mask); } } + + update_scale_freq_invariant(true); } EXPORT_SYMBOL_GPL(topology_set_scale_freq_source); @@ -67,6 +87,8 @@ void topology_clear_scale_freq_source(enum scale_freq_source source, cpumask_clear_cpu(cpu, &scale_freq_counters_mask); } } + + update_scale_freq_invariant(false); } EXPORT_SYMBOL_GPL(topology_clear_scale_freq_source);
Hey, On Friday 05 Feb 2021 at 14:44:24 (+0530), Viresh Kumar wrote: > On 03-02-21, 11:45, Ionela Voinescu wrote: > > Therefore, I think system level invariance management (checks and > > call to rebuild_sched_domains_energy()) also needs to move from arm64 > > code to arch_topology code. > > Here is the 3rd patch of this series then :) > I think it could be merged in patch 1/2 as it's part of enabling the use of multiple sources of information for FIE. Up to you! > From: Viresh Kumar <viresh.kumar@linaro.org> > Date: Fri, 5 Feb 2021 13:31:53 +0530 > Subject: [PATCH] drivers: arch_topology: rebuild sched domains on invariance > change > > We already do this for the arm64, move it to arch_topology.c as we > manage all sched_freq_tick sources here now. > > Reported-by: Ionela Voinescu <ionela.voinescu@arm.com> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > --- > arch/arm64/kernel/topology.c | 16 ---------------- > drivers/base/arch_topology.c | 22 ++++++++++++++++++++++ > 2 files changed, 22 insertions(+), 16 deletions(-) > > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c > index 1e47dfd465f8..47fca7376c93 100644 > --- a/arch/arm64/kernel/topology.c > +++ b/arch/arm64/kernel/topology.c > @@ -240,7 +240,6 @@ static struct scale_freq_data amu_sfd = { > > static void amu_fie_setup(const struct cpumask *cpus) > { > - bool invariant; > int cpu; > > /* We are already set since the last insmod of cpufreq driver */ > @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus) > > cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus); > > - invariant = topology_scale_freq_invariant(); > - > - /* We aren't fully invariant yet */ > - if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask)) > - return; > - You still need these checks, otherwise you could end up with only part of the CPUs setting a scale factor, when only part of the CPUs support AMUs and there is no cpufreq support for FIE. > topology_set_scale_freq_source(&amu_sfd, amu_fie_cpus); > > pr_debug("CPUs[%*pbl]: counters will be used for FIE.", > cpumask_pr_args(cpus)); > - > - /* > - * Task scheduler behavior depends on frequency invariance support, > - * either cpufreq or counter driven. If the support status changes as > - * a result of counter initialisation and use, retrigger the build of > - * scheduling domains to ensure the information is propagated properly. > - */ > - if (!invariant) > - rebuild_sched_domains_energy(); > } > > static int init_amu_fie_callback(struct notifier_block *nb, unsigned long val, > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c > index 20b511949cd8..3631877f4440 100644 > --- a/drivers/base/arch_topology.c > +++ b/drivers/base/arch_topology.c > @@ -23,6 +23,7 @@ > > static DEFINE_PER_CPU(struct scale_freq_data *, sft_data); > static struct cpumask scale_freq_counters_mask; > +static bool scale_freq_invariant; > > static bool supports_scale_freq_counters(const struct cpumask *cpus) > { > @@ -35,6 +36,23 @@ bool topology_scale_freq_invariant(void) > supports_scale_freq_counters(cpu_online_mask); > } > > +static void update_scale_freq_invariant(bool status) > +{ > + if (scale_freq_invariant == status) > + return; > + > + /* > + * Task scheduler behavior depends on frequency invariance support, > + * either cpufreq or counter driven. If the support status changes as > + * a result of counter initialisation and use, retrigger the build of > + * scheduling domains to ensure the information is propagated properly. > + */ > + if (topology_scale_freq_invariant() == status) { > + scale_freq_invariant = status; > + rebuild_sched_domains_energy(); > + } > +} > + > void topology_set_scale_freq_source(struct scale_freq_data *data, > const struct cpumask *cpus) > { > @@ -50,6 +68,8 @@ void topology_set_scale_freq_source(struct scale_freq_data *data, > cpumask_set_cpu(cpu, &scale_freq_counters_mask); > } > } Small(ish) optimisation at the beginning of this function: if (cpumask_empty(&scale_freq_counters_mask)) scale_freq_invariant = topology_scale_freq_invariant(); This will save you a call to rebuild_sched_domains_energy(), which is quite expensive, when cpufreq supports FIE and we also have counters. After comments addressed, Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Tested-by: Ionela Voinescu <ionela.voinescu@arm.com> ..for 1/2 and this addition. Thanks, Ionela. > + > + update_scale_freq_invariant(true); > } > EXPORT_SYMBOL_GPL(topology_set_scale_freq_source); > > @@ -67,6 +87,8 @@ void topology_clear_scale_freq_source(enum scale_freq_source source, > cpumask_clear_cpu(cpu, &scale_freq_counters_mask); > } > } > + > + update_scale_freq_invariant(false); > } > EXPORT_SYMBOL_GPL(topology_clear_scale_freq_source); > > -- > 2.25.0.rc1.19.g042ed3e048af > > -- > viresh
On 17-02-21, 00:24, Ionela Voinescu wrote: > I think it could be merged in patch 1/2 as it's part of enabling the use > of multiple sources of information for FIE. Up to you! Sure. > > static void amu_fie_setup(const struct cpumask *cpus) > > { > > - bool invariant; > > int cpu; > > > > /* We are already set since the last insmod of cpufreq driver */ > > @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus) > > > > cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus); > > > > - invariant = topology_scale_freq_invariant(); > > - > > - /* We aren't fully invariant yet */ > > - if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask)) > > - return; > > - > > You still need these checks, otherwise you could end up with only part > of the CPUs setting a scale factor, when only part of the CPUs support > AMUs and there is no cpufreq support for FIE. Both supports_scale_freq_counters() and topology_scale_freq_invariant() take care of this now and they will keep reporting the system as invariant until the time all the CPUs have counters (in absence of cpufreq). The topology_set_scale_freq_source() API is supposed to be called multiple times, probably once for each policy and so I don't see a need of these checks anymore. > Small(ish) optimisation at the beginning of this function: > > if (cpumask_empty(&scale_freq_counters_mask)) > scale_freq_invariant = topology_scale_freq_invariant(); > > This will save you a call to rebuild_sched_domains_energy(), which is > quite expensive, when cpufreq supports FIE and we also have counters. Good Point. > After comments addressed, > > Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Thanks. > Tested-by: Ionela Voinescu <ionela.voinescu@arm.com> Just out of curiosity, what exactly did you test and what was the setup ? :)
Hi, Replying this first as it's going to be relevant below: > Just out of curiosity, what exactly did you test and what was the setup ? :) I tested it on: - Juno R0 (CPUs [0, 3-5] are littles, CPUs [1-2] are bigs) + PMUs faking AMUs + userspace/schedutil + + cpufreq-FIE/!cpufreq-FIE + DT This testing did not yet cover patch 2/2. My checklist shows: - system invariance status correct - passed - scale factor correct (userspace cpufreq governor) - passed - arch_set_freq_scale bypassed - passed - partial "AMUs" support - failed (see below) - EAS enabling - passed I don't have an automated process for this as many test cases involve kernel source changes. In time I will automate all of these and possibly cover all scenarios with FVP (fast models) testing, but for now human error is possible :). On Wednesday 17 Feb 2021 at 09:55:58 (+0530), Viresh Kumar wrote: > On 17-02-21, 00:24, Ionela Voinescu wrote: > > I think it could be merged in patch 1/2 as it's part of enabling the use > > of multiple sources of information for FIE. Up to you! > > Sure. > > > > static void amu_fie_setup(const struct cpumask *cpus) > > > { > > > - bool invariant; > > > int cpu; > > > > > > /* We are already set since the last insmod of cpufreq driver */ > > > @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus) > > > > > > cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus); > > > > > > - invariant = topology_scale_freq_invariant(); > > > - > > > - /* We aren't fully invariant yet */ > > > - if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask)) > > > - return; > > > - > > > > You still need these checks, otherwise you could end up with only part > > of the CPUs setting a scale factor, when only part of the CPUs support > > AMUs and there is no cpufreq support for FIE. > > Both supports_scale_freq_counters() and topology_scale_freq_invariant() take > care of this now and they will keep reporting the system as invariant until the > time all the CPUs have counters (in absence of cpufreq). > Correct! > The topology_set_scale_freq_source() API is supposed to be called multiple > times, probably once for each policy and so I don't see a need of these checks > anymore. > The problem is not topology_scale_freq_invariant() but whether a scale factor is set for some CPUs. Scenario (test system above): - "AMUs" are only supported for [1-2], - cpufreq_supports_freq_invariance() -> false What should happen: - topology_scale_freq_invariant() -> false (passed) - all CPUs should have their freq_scale unmodified (1024) - (failed) because only 2 out of 6 CPUs have a method of setting a scale factor What does happen: - arch_set_freq_tick() -> topology_set_freq_tick() will set a scale factor for [1-2] based on AMUs. This should not happen. We will end up with invariant signals for bigs and signals that are not freq invariant for littles. Ionela. > > Small(ish) optimisation at the beginning of this function: > > > > if (cpumask_empty(&scale_freq_counters_mask)) > > scale_freq_invariant = topology_scale_freq_invariant(); > > > > This will save you a call to rebuild_sched_domains_energy(), which is > > quite expensive, when cpufreq supports FIE and we also have counters. > > Good Point. > > > After comments addressed, > > > > Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> > > Thanks. > > > Tested-by: Ionela Voinescu <ionela.voinescu@arm.com> > > > -- > viresh
On 17-02-21, 11:30, Ionela Voinescu wrote: > The problem is not topology_scale_freq_invariant() but whether a scale > factor is set for some CPUs. > > Scenario (test system above): > - "AMUs" are only supported for [1-2], > - cpufreq_supports_freq_invariance() -> false > > What should happen: > - topology_scale_freq_invariant() -> false (passed) > - all CPUs should have their freq_scale unmodified (1024) - (failed) > because only 2 out of 6 CPUs have a method of setting a scale factor > > What does happen: > - arch_set_freq_tick() -> topology_set_freq_tick() will set a scale > factor for [1-2] based on AMUs. This should not happen. We will end > up with invariant signals for bigs and signals that are not freq > invariant for littles. Another case. cpufreq is included as a module and AMU is implemented partially. - first time cpufreq driver is inserted, we set up everything and freq_scale gets updated on ticks. - remove cpufreq driver, we are back in same situation. We can't control it that way.. Or we add another call layer in middle before the tick-handler gets called for AMU, which will check if we are fully invariant or not ?
On Wednesday 17 Feb 2021 at 17:10:27 (+0530), Viresh Kumar wrote: > On 17-02-21, 11:30, Ionela Voinescu wrote: > > The problem is not topology_scale_freq_invariant() but whether a scale > > factor is set for some CPUs. > > > > Scenario (test system above): > > - "AMUs" are only supported for [1-2], > > - cpufreq_supports_freq_invariance() -> false > > > > What should happen: > > - topology_scale_freq_invariant() -> false (passed) > > - all CPUs should have their freq_scale unmodified (1024) - (failed) > > because only 2 out of 6 CPUs have a method of setting a scale factor > > > > What does happen: > > - arch_set_freq_tick() -> topology_set_freq_tick() will set a scale > > factor for [1-2] based on AMUs. This should not happen. We will end > > up with invariant signals for bigs and signals that are not freq > > invariant for littles. > > Another case. cpufreq is included as a module and AMU is implemented > partially. > > - first time cpufreq driver is inserted, we set up everything and > freq_scale gets updated on ticks. > > - remove cpufreq driver, we are back in same situation. > Yes, but the littles (lacking AMUs) would have had a scale factor set through arch_set_freq_scale() which will correspond to the last frequency change through the cpufreq driver. When removing the driver, it's unlikely that the frequency of littles will change (no driver). - topology_scale_freq_invariant() will still return true. - littles would still have a scale factor set which is likely accurate - bigs will continue updating the scale factor through AMUs. See a very useful comment someone added recently :) : """ + /* + * We don't need to handle CPUFREQ_REMOVE_POLICY event as the AMU + * counters don't have any dependency on cpufreq driver once we have + * initialized AMU support and enabled invariance. The AMU counters will + * keep on working just fine in the absence of the cpufreq driver, and + * for the CPUs for which there are no counters available, the last set + * value of freq_scale will remain valid as that is the frequency those + * CPUs are running at. + */ """ > We can't control it that way.. Or we add another call layer in middle > before the tick-handler gets called for AMU, which will check if we > are fully invariant or not ? > I would avoid additional work done on the tick, especially for a scenario which is unlikely. If you think this case is worth supporting, it might be best to do it at CPUFREQ_REMOVE_POLICY event. Thanks, Ionela. > -- > viresh
On 17-02-21, 11:57, Ionela Voinescu wrote: > See a very useful comment someone added recently :) : > > """ > + /* > + * We don't need to handle CPUFREQ_REMOVE_POLICY event as the AMU > + * counters don't have any dependency on cpufreq driver once we have > + * initialized AMU support and enabled invariance. The AMU counters will > + * keep on working just fine in the absence of the cpufreq driver, and > + * for the CPUs for which there are no counters available, the last set > + * value of freq_scale will remain valid as that is the frequency those > + * CPUs are running at. > + */ > """ Lol...
On 17-02-21, 00:24, Ionela Voinescu wrote: > > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c > > index 1e47dfd465f8..47fca7376c93 100644 > > --- a/arch/arm64/kernel/topology.c > > +++ b/arch/arm64/kernel/topology.c > > @@ -240,7 +240,6 @@ static struct scale_freq_data amu_sfd = { > > > > static void amu_fie_setup(const struct cpumask *cpus) > > { > > - bool invariant; > > int cpu; > > > > /* We are already set since the last insmod of cpufreq driver */ > > @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus) > > > > cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus); > > > > - invariant = topology_scale_freq_invariant(); > > - > > - /* We aren't fully invariant yet */ > > - if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask)) > > - return; > > - > > You still need these checks, otherwise you could end up with only part > of the CPUs setting a scale factor, when only part of the CPUs support > AMUs and there is no cpufreq support for FIE. Another look at it and here goes another reason (hope I don't have another in-code comment somewhere else to kill this one) :) We don't need to care for the reason you gave (which is a valid reason otherwise), as we are talking specifically about amu_fie_setup() here and it gets called from cpufreq policy-notifier. i.e. we won't support AMUs without cpufreq being there in the first place and the same goes for cppc-driver. Does that sound reasonable ?
Hey, On Thursday 18 Feb 2021 at 15:03:04 (+0530), Viresh Kumar wrote: > On 17-02-21, 00:24, Ionela Voinescu wrote: > > > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c > > > index 1e47dfd465f8..47fca7376c93 100644 > > > --- a/arch/arm64/kernel/topology.c > > > +++ b/arch/arm64/kernel/topology.c > > > @@ -240,7 +240,6 @@ static struct scale_freq_data amu_sfd = { > > > > > > static void amu_fie_setup(const struct cpumask *cpus) > > > { > > > - bool invariant; > > > int cpu; > > > > > > /* We are already set since the last insmod of cpufreq driver */ > > > @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus) > > > > > > cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus); > > > > > > - invariant = topology_scale_freq_invariant(); > > > - > > > - /* We aren't fully invariant yet */ > > > - if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask)) > > > - return; > > > - > > > > You still need these checks, otherwise you could end up with only part > > of the CPUs setting a scale factor, when only part of the CPUs support > > AMUs and there is no cpufreq support for FIE. > > Another look at it and here goes another reason (hope I don't have > another in-code comment somewhere else to kill this one) :) > > We don't need to care for the reason you gave (which is a valid reason > otherwise), as we are talking specifically about amu_fie_setup() here > and it gets called from cpufreq policy-notifier. i.e. we won't support > AMUs without cpufreq being there in the first place and the same goes > for cppc-driver. > > Does that sound reasonable ? > Yes, we don't care if there is no cpufreq driver, as the use of AMUs won't get initialised either. But we do care if there is a cpufreq driver that does not support frequency invariance, which is the example above. The intention with the patches that made cpufreq based invariance generic a while back was for it to be present, seamlessly, for as many drivers as possible, as a less than accurate invariance default method is still better than nothing. So only a few drivers today don't support cpufreq based FI, but it's not a guarantee that it will stay this way. Hope it makes sense, Ionela. > -- > viresh
On 18-02-21, 16:36, Ionela Voinescu wrote: > Yes, we don't care if there is no cpufreq driver, as the use of AMUs won't > get initialised either. But we do care if there is a cpufreq driver that > does not support frequency invariance, which is the example above. > > The intention with the patches that made cpufreq based invariance generic > a while back was for it to be present, seamlessly, for as many drivers as > possible, as a less than accurate invariance default method is still > better than nothing. Right. > So only a few drivers today don't support cpufreq based FI Only two AFAICT, both x86, and the AMU stuff doesn't conflict with them. drivers/cpufreq/intel_pstate.c drivers/cpufreq/longrun.c > but it's not a guarantee that it will stay this way. What do you mean by "no guarantee" here ? The very core routines (cpufreq_freq_transition_end() and cpufreq_driver_fast_switch()) of the cpufreq core call arch_set_freq_scale() today and this isn't going to change anytime soon. If something gets changed there someone will need to see other parts of the kernel which may get broken with that. I don't see any need of complicating other parts of the kernel like, amu or cppc code for that. They should be kept simple and they should assume cpufreq invariance will be supported as it is today.
On Friday 19 Feb 2021 at 10:28:23 (+0530), Viresh Kumar wrote: > On 18-02-21, 16:36, Ionela Voinescu wrote: > > Yes, we don't care if there is no cpufreq driver, as the use of AMUs won't > > get initialised either. But we do care if there is a cpufreq driver that > > does not support frequency invariance, which is the example above. > > > > The intention with the patches that made cpufreq based invariance generic > > a while back was for it to be present, seamlessly, for as many drivers as > > possible, as a less than accurate invariance default method is still > > better than nothing. > > Right. > > > So only a few drivers today don't support cpufreq based FI > > Only two AFAICT, both x86, and the AMU stuff doesn't conflict with > them. > > drivers/cpufreq/intel_pstate.c > drivers/cpufreq/longrun.c > > > but it's not a guarantee that it will stay this way. > > What do you mean by "no guarantee" here ? > > The very core routines (cpufreq_freq_transition_end() and > cpufreq_driver_fast_switch()) of the cpufreq core call > arch_set_freq_scale() today and this isn't going to change anytime > soon. If something gets changed there someone will need to see other > parts of the kernel which may get broken with that. > Yes, but it won't really be straightforward to notice this breakage if that happens, so in my opinion it was worth to keep that condition. > I don't see any need of complicating other parts of the kernel like, > amu or cppc code for that. They should be kept simple and they should > assume cpufreq invariance will be supported as it is today. > Fair enough! It is a corner case after all. Thanks, Ionela. > -- > viresh
On 19-02-21, 09:44, Ionela Voinescu wrote: > On Friday 19 Feb 2021 at 10:28:23 (+0530), Viresh Kumar wrote: > > The very core routines (cpufreq_freq_transition_end() and > > cpufreq_driver_fast_switch()) of the cpufreq core call > > arch_set_freq_scale() today and this isn't going to change anytime > > soon. If something gets changed there someone will need to see other > > parts of the kernel which may get broken with that. > > > > Yes, but it won't really be straightforward to notice this breakage if > that happens, so in my opinion it was worth to keep that condition. Right, but chances of that happening are close to zero right now. I don't see any changes being made there in near future and so as we agreed, lets leave it as is. Btw, thanks for your feedback, it was indeed very valuable.
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h index 3b8dca4eb08d..ec2db3419c41 100644 --- a/arch/arm64/include/asm/topology.h +++ b/arch/arm64/include/asm/topology.h @@ -17,17 +17,9 @@ int pcibus_to_node(struct pci_bus *bus); #include <linux/arch_topology.h> void update_freq_counters_refs(void); -void topology_scale_freq_tick(void); - -#ifdef CONFIG_ARM64_AMU_EXTN -/* - * Replace task scheduler's default counter-based - * frequency-invariance scale factor setting. - */ -#define arch_scale_freq_tick topology_scale_freq_tick -#endif /* CONFIG_ARM64_AMU_EXTN */ /* Replace task scheduler's default frequency-invariant accounting */ +#define arch_scale_freq_tick topology_scale_freq_tick #define arch_set_freq_scale topology_set_freq_scale #define arch_scale_freq_capacity topology_get_freq_scale #define arch_scale_freq_invariant topology_scale_freq_invariant diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index e08a4126453a..1e47dfd465f8 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -199,8 +199,44 @@ static int freq_inv_set_max_ratio(int cpu, u64 max_rate, u64 ref_rate) return 0; } -static DEFINE_STATIC_KEY_FALSE(amu_fie_key); -#define amu_freq_invariant() static_branch_unlikely(&amu_fie_key) +static void amu_scale_freq_tick(void) +{ + u64 prev_core_cnt, prev_const_cnt; + u64 core_cnt, const_cnt, scale; + + prev_const_cnt = this_cpu_read(arch_const_cycles_prev); + prev_core_cnt = this_cpu_read(arch_core_cycles_prev); + + update_freq_counters_refs(); + + const_cnt = this_cpu_read(arch_const_cycles_prev); + core_cnt = this_cpu_read(arch_core_cycles_prev); + + if (unlikely(core_cnt <= prev_core_cnt || + const_cnt <= prev_const_cnt)) + return; + + /* + * /\core arch_max_freq_scale + * scale = ------- * -------------------- + * /\const SCHED_CAPACITY_SCALE + * + * See validate_cpu_freq_invariance_counters() for details on + * arch_max_freq_scale and the use of SCHED_CAPACITY_SHIFT. + */ + scale = core_cnt - prev_core_cnt; + scale *= this_cpu_read(arch_max_freq_scale); + scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT, + const_cnt - prev_const_cnt); + + scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE); + this_cpu_write(freq_scale, (unsigned long)scale); +} + +static struct scale_freq_data amu_sfd = { + .source = SCALE_FREQ_SOURCE_ARCH, + .set_freq_scale = amu_scale_freq_tick, +}; static void amu_fie_setup(const struct cpumask *cpus) { @@ -227,7 +263,7 @@ static void amu_fie_setup(const struct cpumask *cpus) if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask)) return; - static_branch_enable(&amu_fie_key); + topology_set_scale_freq_source(&amu_sfd, amu_fie_cpus); pr_debug("CPUs[%*pbl]: counters will be used for FIE.", cpumask_pr_args(cpus)); @@ -283,53 +319,6 @@ static int __init init_amu_fie(void) } core_initcall(init_amu_fie); -bool arch_freq_counters_available(const struct cpumask *cpus) -{ - return amu_freq_invariant() && - cpumask_subset(cpus, amu_fie_cpus); -} - -void topology_scale_freq_tick(void) -{ - u64 prev_core_cnt, prev_const_cnt; - u64 core_cnt, const_cnt, scale; - int cpu = smp_processor_id(); - - if (!amu_freq_invariant()) - return; - - if (!cpumask_test_cpu(cpu, amu_fie_cpus)) - return; - - prev_const_cnt = this_cpu_read(arch_const_cycles_prev); - prev_core_cnt = this_cpu_read(arch_core_cycles_prev); - - update_freq_counters_refs(); - - const_cnt = this_cpu_read(arch_const_cycles_prev); - core_cnt = this_cpu_read(arch_core_cycles_prev); - - if (unlikely(core_cnt <= prev_core_cnt || - const_cnt <= prev_const_cnt)) - return; - - /* - * /\core arch_max_freq_scale - * scale = ------- * -------------------- - * /\const SCHED_CAPACITY_SCALE - * - * See validate_cpu_freq_invariance_counters() for details on - * arch_max_freq_scale and the use of SCHED_CAPACITY_SHIFT. - */ - scale = core_cnt - prev_core_cnt; - scale *= this_cpu_read(arch_max_freq_scale); - scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT, - const_cnt - prev_const_cnt); - - scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE); - this_cpu_write(freq_scale, (unsigned long)scale); -} - #ifdef CONFIG_ACPI_CPPC_LIB #include <acpi/cppc_acpi.h> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c index de8587cc119e..20b511949cd8 100644 --- a/drivers/base/arch_topology.c +++ b/drivers/base/arch_topology.c @@ -21,17 +21,65 @@ #include <linux/sched.h> #include <linux/smp.h> +static DEFINE_PER_CPU(struct scale_freq_data *, sft_data); +static struct cpumask scale_freq_counters_mask; + +static bool supports_scale_freq_counters(const struct cpumask *cpus) +{ + return cpumask_subset(cpus, &scale_freq_counters_mask); +} + bool topology_scale_freq_invariant(void) { return cpufreq_supports_freq_invariance() || - arch_freq_counters_available(cpu_online_mask); + supports_scale_freq_counters(cpu_online_mask); +} + +void topology_set_scale_freq_source(struct scale_freq_data *data, + const struct cpumask *cpus) +{ + struct scale_freq_data *sfd; + int cpu; + + for_each_cpu(cpu, cpus) { + sfd = per_cpu(sft_data, cpu); + + /* Use ARCH provided counters whenever possible */ + if (!sfd || sfd->source != SCALE_FREQ_SOURCE_ARCH) { + per_cpu(sft_data, cpu) = data; + cpumask_set_cpu(cpu, &scale_freq_counters_mask); + } + } } +EXPORT_SYMBOL_GPL(topology_set_scale_freq_source); -__weak bool arch_freq_counters_available(const struct cpumask *cpus) +void topology_clear_scale_freq_source(enum scale_freq_source source, + const struct cpumask *cpus) { - return false; + struct scale_freq_data *sfd; + int cpu; + + for_each_cpu(cpu, cpus) { + sfd = per_cpu(sft_data, cpu); + + if (sfd && sfd->source == source) { + per_cpu(sft_data, cpu) = NULL; + cpumask_clear_cpu(cpu, &scale_freq_counters_mask); + } + } } +EXPORT_SYMBOL_GPL(topology_clear_scale_freq_source); + +void topology_scale_freq_tick(void) +{ + struct scale_freq_data *sfd = *this_cpu_ptr(&sft_data); + + if (sfd) + sfd->set_freq_scale(); +} + DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE; +EXPORT_SYMBOL_GPL(freq_scale); void topology_set_freq_scale(const struct cpumask *cpus, unsigned long cur_freq, unsigned long max_freq) @@ -47,7 +95,7 @@ void topology_set_freq_scale(const struct cpumask *cpus, unsigned long cur_freq, * want to update the scale factor with information from CPUFREQ. * Instead the scale factor will be updated from arch_scale_freq_tick. */ - if (arch_freq_counters_available(cpus)) + if (supports_scale_freq_counters(cpus)) return; scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq; diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h index 0f6cd6b73a61..3bcfba5c21a7 100644 --- a/include/linux/arch_topology.h +++ b/include/linux/arch_topology.h @@ -34,7 +34,19 @@ void topology_set_freq_scale(const struct cpumask *cpus, unsigned long cur_freq, unsigned long max_freq); bool topology_scale_freq_invariant(void); -bool arch_freq_counters_available(const struct cpumask *cpus); +enum scale_freq_source { + SCALE_FREQ_SOURCE_CPUFREQ = 0, + SCALE_FREQ_SOURCE_ARCH, +}; + +struct scale_freq_data { + enum scale_freq_source source; + void (*set_freq_scale)(void); +}; + +void topology_scale_freq_tick(void); +void topology_set_scale_freq_source(struct scale_freq_data *data, const struct cpumask *cpus); +void topology_clear_scale_freq_source(enum scale_freq_source source, const struct cpumask *cpus); DECLARE_PER_CPU(unsigned long, thermal_pressure);
This patch attempts to make it generic enough so other parts of the kernel can also provide their own implementation of scale_freq_tick() callback, which is called by the scheduler periodically to update the per-cpu freq_scale variable. The implementations now need to provide struct scale_freq_data for the CPUs for which they have hardware counters available, and a callback gets registered for each possible CPU in a per-cpu variable. The arch specific (or ARM AMU) counters are updated to adapt to this and they take the highest priority if they are available, i.e. they will be used instead of CPPC based counters for example. Note that this also defines SCALE_FREQ_SOURCE_CPUFREQ but doesn't use it and it is added to show that cpufreq is also acts as source of information for FIE and will be used by default if no other counters are supported for a platform. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> --- arch/arm64/include/asm/topology.h | 10 +--- arch/arm64/kernel/topology.c | 89 ++++++++++++++----------------- drivers/base/arch_topology.c | 56 +++++++++++++++++-- include/linux/arch_topology.h | 14 ++++- 4 files changed, 105 insertions(+), 64 deletions(-)