diff mbox series

[2/7] cpufreq: dt: Register an Energy Model

Message ID 20190128165522.31749-3-quentin.perret@arm.com (mailing list archive)
State Changes Requested, archived
Headers show
Series Register an Energy Model for Arm reference platforms | expand

Commit Message

Quentin Perret Jan. 28, 2019, 4:55 p.m. UTC
Now that PM_OPP provides a helper function to estimate the power
consumed by CPUs, make sure to try and register an Energy Model (EM)
from cpufreq-dt, hence ensuring interested subsystems (the task
scheduler, for example) can make use of that information when available.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
---
 drivers/cpufreq/cpufreq-dt.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Matthias Kaehlcke Jan. 28, 2019, 7:36 p.m. UTC | #1
On Mon, Jan 28, 2019 at 04:55:17PM +0000, Quentin Perret wrote:
> Now that PM_OPP provides a helper function to estimate the power
> consumed by CPUs, make sure to try and register an Energy Model (EM)
> from cpufreq-dt, hence ensuring interested subsystems (the task
> scheduler, for example) can make use of that information when available.
> 
> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> ---
>  drivers/cpufreq/cpufreq-dt.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
> index e58bfcb1169e..7556e07e7a9f 100644
> --- a/drivers/cpufreq/cpufreq-dt.c
> +++ b/drivers/cpufreq/cpufreq-dt.c
> @@ -16,6 +16,7 @@
>  #include <linux/cpu_cooling.h>
>  #include <linux/cpufreq.h>
>  #include <linux/cpumask.h>
> +#include <linux/energy_model.h>
>  #include <linux/err.h>
>  #include <linux/module.h>
>  #include <linux/of.h>
> @@ -152,6 +153,7 @@ static int resources_available(void)
>  
>  static int cpufreq_init(struct cpufreq_policy *policy)
>  {
> +	struct em_data_callback em_cb = EM_DATA_CB(of_dev_pm_opp_get_cpu_power);
>  	struct cpufreq_frequency_table *freq_table;
>  	struct opp_table *opp_table = NULL;
>  	struct private_data *priv;
> @@ -160,7 +162,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
>  	unsigned int transition_latency;
>  	bool fallback = false;
>  	const char *name;
> -	int ret;
> +	int ret, nr_opp;
>  
>  	cpu_dev = get_cpu_device(policy->cpu);
>  	if (!cpu_dev) {
> @@ -237,6 +239,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
>  		ret = -EPROBE_DEFER;
>  		goto out_free_opp;
>  	}
> +	nr_opp = ret;
>  
>  	if (fallback) {
>  		cpumask_setall(policy->cpus);
> @@ -280,6 +283,8 @@ static int cpufreq_init(struct cpufreq_policy *policy)
>  	policy->cpuinfo.transition_latency = transition_latency;
>  	policy->dvfs_possible_from_any_cpu = true;
>  
> +	em_register_perf_domain(policy->cpus, nr_opp, &em_cb);

The perf domain is registered here but not unregistered in exit()
(there is currently no API do do this). When init() is called for the
second (or third, ...) time em_register_perf_domain() is called again,
though it skips the registration. To make things more explicit and not
rely on internal behavior of em_register_perf_domain() you could place
the registration inside an 'if (!em_cpu_get(cpu))' branch.

You should probably check the return value of em_register_perf_domain()
and at least print a warning in case of failure. This would require to
call em_register_perf_domain() only when CONFIG_ENERGY_MODEL=y (the
function returns -EINVAL otherwise).

I think this patch will result in error messages at registration on
platforms that use the cpufreq-dt driver and don't specify
'dynamic-power-coefficient' for the CPUs in the DT. Not sure if that's
a problem as long as the cpufreq initialization succeeds regardless,
it could be seen as a not-so-gentle nudge to add the values.

Cheers

Matthias
Viresh Kumar Jan. 29, 2019, 5:21 a.m. UTC | #2
On 28-01-19, 11:36, Matthias Kaehlcke wrote:
> I think this patch will result in error messages at registration on
> platforms that use the cpufreq-dt driver and don't specify
> 'dynamic-power-coefficient' for the CPUs in the DT. Not sure if that's
> a problem as long as the cpufreq initialization succeeds regardless,
> it could be seen as a not-so-gentle nudge to add the values.

That wouldn't be acceptable.
Viresh Kumar Jan. 29, 2019, 5:30 a.m. UTC | #3
On 28-01-19, 16:55, Quentin Perret wrote:
> Now that PM_OPP provides a helper function to estimate the power
> consumed by CPUs, make sure to try and register an Energy Model (EM)
> from cpufreq-dt, hence ensuring interested subsystems (the task
> scheduler, for example) can make use of that information when available.
> 
> Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> ---
>  drivers/cpufreq/cpufreq-dt.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
> index e58bfcb1169e..7556e07e7a9f 100644
> --- a/drivers/cpufreq/cpufreq-dt.c
> +++ b/drivers/cpufreq/cpufreq-dt.c
> @@ -16,6 +16,7 @@
>  #include <linux/cpu_cooling.h>
>  #include <linux/cpufreq.h>
>  #include <linux/cpumask.h>
> +#include <linux/energy_model.h>
>  #include <linux/err.h>
>  #include <linux/module.h>
>  #include <linux/of.h>
> @@ -152,6 +153,7 @@ static int resources_available(void)
>  
>  static int cpufreq_init(struct cpufreq_policy *policy)
>  {
> +	struct em_data_callback em_cb = EM_DATA_CB(of_dev_pm_opp_get_cpu_power);

Hmm, so all the earlier comments regarding prototype of this routine are invalid
because of this.

>  	struct cpufreq_frequency_table *freq_table;
>  	struct opp_table *opp_table = NULL;
>  	struct private_data *priv;
> @@ -160,7 +162,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
>  	unsigned int transition_latency;
>  	bool fallback = false;
>  	const char *name;
> -	int ret;
> +	int ret, nr_opp;
>  
>  	cpu_dev = get_cpu_device(policy->cpu);
>  	if (!cpu_dev) {
> @@ -237,6 +239,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
>  		ret = -EPROBE_DEFER;
>  		goto out_free_opp;
>  	}
> +	nr_opp = ret;

Instead use nr_opp instead of ret while calling dev_pm_opp_get_opp_count().

>  
>  	if (fallback) {
>  		cpumask_setall(policy->cpus);
> @@ -280,6 +283,8 @@ static int cpufreq_init(struct cpufreq_policy *policy)
>  	policy->cpuinfo.transition_latency = transition_latency;
>  	policy->dvfs_possible_from_any_cpu = true;
>  
> +	em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
> +
>  	return 0;
>  
>  out_free_cpufreq_table:
> -- 
> 2.20.1
Quentin Perret Jan. 29, 2019, 9:10 a.m. UTC | #4
On Tuesday 29 Jan 2019 at 11:00:41 (+0530), Viresh Kumar wrote:
> On 28-01-19, 16:55, Quentin Perret wrote:
> > Now that PM_OPP provides a helper function to estimate the power
> > consumed by CPUs, make sure to try and register an Energy Model (EM)
> > from cpufreq-dt, hence ensuring interested subsystems (the task
> > scheduler, for example) can make use of that information when available.
> > 
> > Signed-off-by: Quentin Perret <quentin.perret@arm.com>
> > ---
> >  drivers/cpufreq/cpufreq-dt.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
> > index e58bfcb1169e..7556e07e7a9f 100644
> > --- a/drivers/cpufreq/cpufreq-dt.c
> > +++ b/drivers/cpufreq/cpufreq-dt.c
> > @@ -16,6 +16,7 @@
> >  #include <linux/cpu_cooling.h>
> >  #include <linux/cpufreq.h>
> >  #include <linux/cpumask.h>
> > +#include <linux/energy_model.h>
> >  #include <linux/err.h>
> >  #include <linux/module.h>
> >  #include <linux/of.h>
> > @@ -152,6 +153,7 @@ static int resources_available(void)
> >  
> >  static int cpufreq_init(struct cpufreq_policy *policy)
> >  {
> > +	struct em_data_callback em_cb = EM_DATA_CB(of_dev_pm_opp_get_cpu_power);
> 
> Hmm, so all the earlier comments regarding prototype of this routine are invalid
> because of this.

Right.

> 
> >  	struct cpufreq_frequency_table *freq_table;
> >  	struct opp_table *opp_table = NULL;
> >  	struct private_data *priv;
> > @@ -160,7 +162,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
> >  	unsigned int transition_latency;
> >  	bool fallback = false;
> >  	const char *name;
> > -	int ret;
> > +	int ret, nr_opp;
> >  
> >  	cpu_dev = get_cpu_device(policy->cpu);
> >  	if (!cpu_dev) {
> > @@ -237,6 +239,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
> >  		ret = -EPROBE_DEFER;
> >  		goto out_free_opp;
> >  	}
> > +	nr_opp = ret;
> 
> Instead use nr_opp instead of ret while calling dev_pm_opp_get_opp_count().

Will do.

> 
> >  
> >  	if (fallback) {
> >  		cpumask_setall(policy->cpus);
> > @@ -280,6 +283,8 @@ static int cpufreq_init(struct cpufreq_policy *policy)
> >  	policy->cpuinfo.transition_latency = transition_latency;
> >  	policy->dvfs_possible_from_any_cpu = true;
> >  
> > +	em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
> > +
> >  	return 0;
> >  
> >  out_free_cpufreq_table:
> > -- 
> > 2.20.1
> 
> -- 
> viresh

Thanks,
Quentin
Quentin Perret Jan. 29, 2019, 9:15 a.m. UTC | #5
On Tuesday 29 Jan 2019 at 10:51:44 (+0530), Viresh Kumar wrote:
> On 28-01-19, 11:36, Matthias Kaehlcke wrote:
> > I think this patch will result in error messages at registration on
> > platforms that use the cpufreq-dt driver and don't specify
> > 'dynamic-power-coefficient' for the CPUs in the DT. Not sure if that's
> > a problem as long as the cpufreq initialization succeeds regardless,
> > it could be seen as a not-so-gentle nudge to add the values.
> 
> That wouldn't be acceptable.

Fair enough. What I can propose in this case is to have in PM_OPP a
helper called 'dev_pm_opp_of_register_em()' or something like this. This
function will check all prerequisites are present (we have the right
values in DT, and so on) and then call (or not) em_register_perf_domain().
Then we can make the CPUFreq drivers use that instead of calling
em_register_perf_domain() directly.

That would also make it easy to implement Matthias' suggestion to not
call em_register_perf_domain() if an EM is already present.

Would that work ?

Thanks,
Quentin
Viresh Kumar Jan. 30, 2019, 5:18 a.m. UTC | #6
On 29-01-19, 09:15, Quentin Perret wrote:
> On Tuesday 29 Jan 2019 at 10:51:44 (+0530), Viresh Kumar wrote:
> > On 28-01-19, 11:36, Matthias Kaehlcke wrote:
> > > I think this patch will result in error messages at registration on
> > > platforms that use the cpufreq-dt driver and don't specify
> > > 'dynamic-power-coefficient' for the CPUs in the DT. Not sure if that's
> > > a problem as long as the cpufreq initialization succeeds regardless,
> > > it could be seen as a not-so-gentle nudge to add the values.
> > 
> > That wouldn't be acceptable.
> 
> Fair enough. What I can propose in this case is to have in PM_OPP a
> helper called 'dev_pm_opp_of_register_em()' or something like this. This
> function will check all prerequisites are present (we have the right
> values in DT, and so on) and then call (or not) em_register_perf_domain().
> Then we can make the CPUFreq drivers use that instead of calling
> em_register_perf_domain() directly.

That should be fine.

> That would also make it easy to implement Matthias' suggestion to not
> call em_register_perf_domain() if an EM is already present.

So you will track registration state within the OPP core for that ?
Sorry but that doesn't sound right. What's wrong with having an
unregister helper in energy-model to keep proper code flow everywhere
?
Quentin Perret Jan. 30, 2019, 9:12 a.m. UTC | #7
Hi Viresh,

On Wednesday 30 Jan 2019 at 10:48:06 (+0530), Viresh Kumar wrote:
> On 29-01-19, 09:15, Quentin Perret wrote:
> > On Tuesday 29 Jan 2019 at 10:51:44 (+0530), Viresh Kumar wrote:
> > > On 28-01-19, 11:36, Matthias Kaehlcke wrote:
> > > > I think this patch will result in error messages at registration on
> > > > platforms that use the cpufreq-dt driver and don't specify
> > > > 'dynamic-power-coefficient' for the CPUs in the DT. Not sure if that's
> > > > a problem as long as the cpufreq initialization succeeds regardless,
> > > > it could be seen as a not-so-gentle nudge to add the values.
> > > 
> > > That wouldn't be acceptable.
> > 
> > Fair enough. What I can propose in this case is to have in PM_OPP a
> > helper called 'dev_pm_opp_of_register_em()' or something like this. This
> > function will check all prerequisites are present (we have the right
> > values in DT, and so on) and then call (or not) em_register_perf_domain().
> > Then we can make the CPUFreq drivers use that instead of calling
> > em_register_perf_domain() directly.
> 
> That should be fine.
> 
> > That would also make it easy to implement Matthias' suggestion to not
> > call em_register_perf_domain() if an EM is already present.
> 
> So you will track registration state within the OPP core for that ?

What I had in mind is something as simple as:

	void of_dev_pm_opp_register_em(struct cpumask *cpus)
	{
		/* Bail out if an EM is there */
		if (em_cpu_get(cpumask_first(cpus)))
			return;

		/* Check prerequisites: dpc coeff in DT, ... */
		...

		em_register_perf_domain(...);
	}

IIUC, Matthias' point was that if the EM is already registered, there is
no good reason to call em_register_perf_domain() again. Now, that should
in fact be harmless because em_register_perf_domain() already does that
check. It's just cleaner and easier to understand from a conceptual
standpoint to not call that function several times for no reason I
assume.

> Sorry but that doesn't sound right. What's wrong with having an
> unregister helper in energy-model to keep proper code flow everywhere
> ?

For the EM we basically allocate the tables in memory once and for all
at boot time and never touch them again. That makes it easy for users
(the scheduler as of now, IPA soon) to access them without messing
around with RCU or so. So there isn't really a concept or unregistering
a pd right now.

Thanks,
Quentin
Viresh Kumar Jan. 30, 2019, 10:17 a.m. UTC | #8
On 30-01-19, 09:12, Quentin Perret wrote:
> What I had in mind is something as simple as:
> 
> 	void of_dev_pm_opp_register_em(struct cpumask *cpus)
> 	{
> 		/* Bail out if an EM is there */
> 		if (em_cpu_get(cpumask_first(cpus)))
> 			return;
> 
> 		/* Check prerequisites: dpc coeff in DT, ... */
> 		...
> 
> 		em_register_perf_domain(...);
> 	}
> 
> IIUC, Matthias' point was that if the EM is already registered, there is
> no good reason to call em_register_perf_domain() again. Now, that should
> in fact be harmless because em_register_perf_domain() already does that
> check. It's just cleaner and easier to understand from a conceptual
> standpoint to not call that function several times for no reason I
> assume.

If there is no good reason to call em_register_perf_domain() several
times, then the same applies to of_dev_pm_opp_register_em() as well,
isn't it ?

This is init code anyway isn't going to run a lot, so I wouldn't
suggest adding any such (duplicate) checks in
of_dev_pm_opp_register_em().
Quentin Perret Jan. 30, 2019, 10:20 a.m. UTC | #9
On Wednesday 30 Jan 2019 at 15:47:15 (+0530), Viresh Kumar wrote:
> On 30-01-19, 09:12, Quentin Perret wrote:
> > What I had in mind is something as simple as:
> > 
> > 	void of_dev_pm_opp_register_em(struct cpumask *cpus)
> > 	{
> > 		/* Bail out if an EM is there */
> > 		if (em_cpu_get(cpumask_first(cpus)))
> > 			return;
> > 
> > 		/* Check prerequisites: dpc coeff in DT, ... */
> > 		...
> > 
> > 		em_register_perf_domain(...);
> > 	}
> > 
> > IIUC, Matthias' point was that if the EM is already registered, there is
> > no good reason to call em_register_perf_domain() again. Now, that should
> > in fact be harmless because em_register_perf_domain() already does that
> > check. It's just cleaner and easier to understand from a conceptual
> > standpoint to not call that function several times for no reason I
> > assume.
> 
> If there is no good reason to call em_register_perf_domain() several
> times, then the same applies to of_dev_pm_opp_register_em() as well,
> isn't it ?

Hrmpf, that is true ... :-)

> This is init code anyway isn't going to run a lot, so I wouldn't
> suggest adding any such (duplicate) checks in
> of_dev_pm_opp_register_em().

Fair enough, I'll post a v2 w/o the check and we'll see if others have
complaints.

Thanks,
Quentin
diff mbox series

Patch

diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index e58bfcb1169e..7556e07e7a9f 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -16,6 +16,7 @@ 
 #include <linux/cpu_cooling.h>
 #include <linux/cpufreq.h>
 #include <linux/cpumask.h>
+#include <linux/energy_model.h>
 #include <linux/err.h>
 #include <linux/module.h>
 #include <linux/of.h>
@@ -152,6 +153,7 @@  static int resources_available(void)
 
 static int cpufreq_init(struct cpufreq_policy *policy)
 {
+	struct em_data_callback em_cb = EM_DATA_CB(of_dev_pm_opp_get_cpu_power);
 	struct cpufreq_frequency_table *freq_table;
 	struct opp_table *opp_table = NULL;
 	struct private_data *priv;
@@ -160,7 +162,7 @@  static int cpufreq_init(struct cpufreq_policy *policy)
 	unsigned int transition_latency;
 	bool fallback = false;
 	const char *name;
-	int ret;
+	int ret, nr_opp;
 
 	cpu_dev = get_cpu_device(policy->cpu);
 	if (!cpu_dev) {
@@ -237,6 +239,7 @@  static int cpufreq_init(struct cpufreq_policy *policy)
 		ret = -EPROBE_DEFER;
 		goto out_free_opp;
 	}
+	nr_opp = ret;
 
 	if (fallback) {
 		cpumask_setall(policy->cpus);
@@ -280,6 +283,8 @@  static int cpufreq_init(struct cpufreq_policy *policy)
 	policy->cpuinfo.transition_latency = transition_latency;
 	policy->dvfs_possible_from_any_cpu = true;
 
+	em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
+
 	return 0;
 
 out_free_cpufreq_table: