diff mbox series

[RFC] PM/OPP: Add disable count

Message ID 20180829224014.70036-1-mka@chromium.org (mailing list archive)
State RFC, archived
Headers show
Series [RFC] PM/OPP: Add disable count | expand

Commit Message

Matthias Kaehlcke Aug. 29, 2018, 10:40 p.m. UTC
The devfreq sub-system uses the OPP API to determine which frequencies are
available. A devfreq cooling device (drivers/thermal/devfreq_cooling.c)
may disable certain OPPs depending on temperature input and the thermal
policy.

The WIP throttler driver (https://lore.kernel.org/patchwork/project/lkml/list/?series=357937)
can also limit the frequency range of devfreq devices. The current
version does this by introducing a function similar to cpufreq's
cpufreq_verify_within_limits(), however devfreq maintainer Chanwoo Choi
wants the throttler to use the OPP interface for limiting the frequency
range.
A devfreq cooling device and the throttler manipulate OPPs independently
from each other, one might enable an OPP, the other disable it. This patch
adds a 'disable count' to the OPP, to allow to track how often the OPP
was disabled without a subsequent opp_enable(). The OPP is considered
disabled when the disable count is > 0 , even directly after a
opp_enable() call.

This patch is an initial draft, intended to get feedback from OPP
maintainers on the general idea.

Suggested-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---

Even though I sent this patch I have reservations about it, some of
general nature. and others in the context of devfreq/throttler/cooling
device:

- possible race conditions, devfreq may end up using a disabled OPP
  (cooling device and throttler run asynchronously to devfreq).
  This is not really a new issue though, I think this can already
  happen today with devfreq and cooling device. It doesn't cause
  problems since currently disabling an OPP doesn't results in disabling
  regulators, clocks or other components

- all OPPs could end up disabled since multiple drivers manipulate OPPs
  independently without any arbitration

- a crashing/buggy driver can leave OPPs disabled forever

- a typical caller would expect a OPP to be enabled after opp_enable(),
  this is not always the case

- added complexity for callers of opp_enable/disable() to keep track
  which OPPs they enabled/disabled previously
---
 drivers/opp/core.c | 9 +++++++++
 drivers/opp/opp.h  | 3 +++
 2 files changed, 12 insertions(+)

Comments

Matthias Kaehlcke Sept. 13, 2018, 7:01 p.m. UTC | #1
Any comments on this?

On Wed, Aug 29, 2018 at 03:40:14PM -0700, Matthias Kaehlcke wrote:
> The devfreq sub-system uses the OPP API to determine which frequencies are
> available. A devfreq cooling device (drivers/thermal/devfreq_cooling.c)
> may disable certain OPPs depending on temperature input and the thermal
> policy.
> 
> The WIP throttler driver (https://lore.kernel.org/patchwork/project/lkml/list/?series=357937)
> can also limit the frequency range of devfreq devices. The current
> version does this by introducing a function similar to cpufreq's
> cpufreq_verify_within_limits(), however devfreq maintainer Chanwoo Choi
> wants the throttler to use the OPP interface for limiting the frequency
> range.
> A devfreq cooling device and the throttler manipulate OPPs independently
> from each other, one might enable an OPP, the other disable it. This patch
> adds a 'disable count' to the OPP, to allow to track how often the OPP
> was disabled without a subsequent opp_enable(). The OPP is considered
> disabled when the disable count is > 0 , even directly after a
> opp_enable() call.
> 
> This patch is an initial draft, intended to get feedback from OPP
> maintainers on the general idea.
> 
> Suggested-by: Chanwoo Choi <cw00.choi@samsung.com>
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
> 
> Even though I sent this patch I have reservations about it, some of
> general nature. and others in the context of devfreq/throttler/cooling
> device:
> 
> - possible race conditions, devfreq may end up using a disabled OPP
>   (cooling device and throttler run asynchronously to devfreq).
>   This is not really a new issue though, I think this can already
>   happen today with devfreq and cooling device. It doesn't cause
>   problems since currently disabling an OPP doesn't results in disabling
>   regulators, clocks or other components
> 
> - all OPPs could end up disabled since multiple drivers manipulate OPPs
>   independently without any arbitration
> 
> - a crashing/buggy driver can leave OPPs disabled forever
> 
> - a typical caller would expect a OPP to be enabled after opp_enable(),
>   this is not always the case
> 
> - added complexity for callers of opp_enable/disable() to keep track
>   which OPPs they enabled/disabled previously
> ---
>  drivers/opp/core.c | 9 +++++++++
>  drivers/opp/opp.h  | 3 +++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 31ff03dbeb83..a6d23e8560d2 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -1072,6 +1072,7 @@ int _opp_add(struct device *dev, struct dev_pm_opp *new_opp,
>  
>  	if (!_opp_supported_by_regulators(new_opp, opp_table)) {
>  		new_opp->available = false;
> +		new_opp->disable_count = 1;
>  		dev_warn(dev, "%s: OPP not supported by regulators (%lu)\n",
>  			 __func__, new_opp->rate);
>  	}
> @@ -1592,10 +1593,18 @@ static int _opp_set_availability(struct device *dev, unsigned long freq,
>  		goto unlock;
>  	}
>  
> +	if (availability_req)
> +		opp->disable_count--;
> +	else
> +		opp->disable_count++;
> +
>  	/* Is update really needed? */
>  	if (opp->available == availability_req)
>  		goto unlock;
>  
> +	if (availability_req && opp->disable_count > 0)
> +		goto unlock;
> +
>  	opp->available = availability_req;
>  
>  	dev_pm_opp_get(opp);
> diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
> index 7c540fd063b2..a72acb16c364 100644
> --- a/drivers/opp/opp.h
> +++ b/drivers/opp/opp.h
> @@ -55,6 +55,8 @@ extern struct list_head opp_tables;
>   *		order.
>   * @kref:	for reference count of the OPP.
>   * @available:	true/false - marks if this OPP as available or not
> + * @disable_count: tracks the number of times the OPP has been disabled
> + *              (without re-enabling it).
>   * @dynamic:	not-created from static DT entries.
>   * @turbo:	true if turbo (boost) OPP
>   * @suspend:	true if suspend OPP
> @@ -74,6 +76,7 @@ struct dev_pm_opp {
>  	struct kref kref;
>  
>  	bool available;
> +	unsigned int disable_count;
>  	bool dynamic;
>  	bool turbo;
>  	bool suspend;
Viresh Kumar Oct. 1, 2018, 9:26 a.m. UTC | #2
Hi Matthias,

On 29-08-18, 15:40, Matthias Kaehlcke wrote:
> The devfreq sub-system uses the OPP API to determine which frequencies are
> available. A devfreq cooling device (drivers/thermal/devfreq_cooling.c)
> may disable certain OPPs depending on temperature input and the thermal
> policy.

I looked at it and the way it is doing it currently has certain assumptions.
Like it can enable an already enabled OPP without any side effects, which may
not be true after this patch.

> The WIP throttler driver (https://lore.kernel.org/patchwork/project/lkml/list/?series=357937)
> can also limit the frequency range of devfreq devices. The current
> version does this by introducing a function similar to cpufreq's
> cpufreq_verify_within_limits(), however devfreq maintainer Chanwoo Choi

I don't like this interface and have plans on updating it with something where a
user asks for some constraints and then releases them at a later point of time.
That will make cpufreq and other frameworks manage things in a better way.

> wants the throttler to use the OPP interface for limiting the frequency
> range.

I am not sure if that would be the right approach as well. Maybe something
similar to what I wrote earlier. For example all throttlers can provide min/max
frequencies they support at a certain point of time and the devfreq core
accumulates all such requests and finds the right target frequency. You wouldn't
require to disable OPPs then to make it work.

> A devfreq cooling device and the throttler manipulate OPPs independently
> from each other, one might enable an OPP, the other disable it. This patch
> adds a 'disable count' to the OPP, to allow to track how often the OPP
> was disabled without a subsequent opp_enable(). The OPP is considered
> disabled when the disable count is > 0 , even directly after a
> opp_enable() call.
> 
> This patch is an initial draft, intended to get feedback from OPP
> maintainers on the general idea.
> 
> Suggested-by: Chanwoo Choi <cw00.choi@samsung.com>
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
> 
> Even though I sent this patch I have reservations about it, some of
> general nature. and others in the context of devfreq/throttler/cooling
> device:
> 
> - possible race conditions, devfreq may end up using a disabled OPP
>   (cooling device and throttler run asynchronously to devfreq).
>   This is not really a new issue though, I think this can already
>   happen today with devfreq and cooling device. It doesn't cause
>   problems since currently disabling an OPP doesn't results in disabling
>   regulators, clocks or other components
> 
> - all OPPs could end up disabled since multiple drivers manipulate OPPs
>   independently without any arbitration
> 
> - a crashing/buggy driver can leave OPPs disabled forever
> 
> - a typical caller would expect a OPP to be enabled after opp_enable(),
>   this is not always the case
> 
> - added complexity for callers of opp_enable/disable() to keep track
>   which OPPs they enabled/disabled previously

Yeah, its unnecessary complexity on a framework like OPP which should normally
be a helper library to provide data.

> ---
>  drivers/opp/core.c | 9 +++++++++
>  drivers/opp/opp.h  | 3 +++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 31ff03dbeb83..a6d23e8560d2 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -1072,6 +1072,7 @@ int _opp_add(struct device *dev, struct dev_pm_opp *new_opp,
>  
>  	if (!_opp_supported_by_regulators(new_opp, opp_table)) {
>  		new_opp->available = false;
> +		new_opp->disable_count = 1;
>  		dev_warn(dev, "%s: OPP not supported by regulators (%lu)\n",
>  			 __func__, new_opp->rate);
>  	}
> @@ -1592,10 +1593,18 @@ static int _opp_set_availability(struct device *dev, unsigned long freq,
>  		goto unlock;
>  	}
>  
> +	if (availability_req)
> +		opp->disable_count--;

what if disable_count becomes < 0 ? Is that a bug ?

> +	else
> +		opp->disable_count++;
> +
>  	/* Is update really needed? */
>  	if (opp->available == availability_req)
>  		goto unlock;
>  
> +	if (availability_req && opp->disable_count > 0)
> +		goto unlock;
> +
>  	opp->available = availability_req;
>  
>  	dev_pm_opp_get(opp);
> diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
> index 7c540fd063b2..a72acb16c364 100644
> --- a/drivers/opp/opp.h
> +++ b/drivers/opp/opp.h
> @@ -55,6 +55,8 @@ extern struct list_head opp_tables;
>   *		order.
>   * @kref:	for reference count of the OPP.
>   * @available:	true/false - marks if this OPP as available or not
> + * @disable_count: tracks the number of times the OPP has been disabled
> + *              (without re-enabling it).
>   * @dynamic:	not-created from static DT entries.
>   * @turbo:	true if turbo (boost) OPP
>   * @suspend:	true if suspend OPP
> @@ -74,6 +76,7 @@ struct dev_pm_opp {
>  	struct kref kref;
>  
>  	bool available;
> +	unsigned int disable_count;
>  	bool dynamic;
>  	bool turbo;
>  	bool suspend;
> -- 
> 2.19.0.rc0.228.g281dcd1b4d0-goog
Matthias Kaehlcke Oct. 2, 2018, 10:06 p.m. UTC | #3
Hi Viresh,

thanks for your feedback!

Some comments inline

On Mon, Oct 01, 2018 at 02:56:44PM +0530, Viresh Kumar wrote:
> Hi Matthias,
> 
> On 29-08-18, 15:40, Matthias Kaehlcke wrote:
> > The devfreq sub-system uses the OPP API to determine which frequencies are
> > available. A devfreq cooling device (drivers/thermal/devfreq_cooling.c)
> > may disable certain OPPs depending on temperature input and the thermal
> > policy.
> 
> I looked at it and the way it is doing it currently has certain assumptions.
> Like it can enable an already enabled OPP without any side effects, which may
> not be true after this patch.

Yes, devfreq_cooling.c would need to be adapted to keep track of which
OPPs it previously en/disabled, which is one of the reasons why I'm
reluctant about this change.

> > The WIP throttler driver (https://lore.kernel.org/patchwork/project/lkml/list/?series=357937)
> > can also limit the frequency range of devfreq devices. The current
> > version does this by introducing a function similar to cpufreq's
> > cpufreq_verify_within_limits(), however devfreq maintainer Chanwoo Choi
> 
> I don't like this interface and have plans on updating it with something where a
> user asks for some constraints and then releases them at a later point of time.
> That will make cpufreq and other frameworks manage things in a better way.

At first sight this doesn't sound much different, only that cpufreq
(or a meta-framework) retains information about the constraints. Also
the constraints could contain additional information (e.g. it's a
thermal or user defined constraint), which might facilitate the
resolution of possible conflicts.

Do you have a guesstimate on when you might find time for this?

> > wants the throttler to use the OPP interface for limiting the frequency
> > range.
> 
> I am not sure if that would be the right approach as well. Maybe something
> similar to what I wrote earlier. For example all throttlers can provide min/max
> frequencies they support at a certain point of time and the devfreq core
> accumulates all such requests and finds the right target frequency. You wouldn't
> require to disable OPPs then to make it work.

Do you think the same 'constraint framework' could be used for cpufreq
and devfreq? The problem appears to be very similar for both
sub-systems, having a single solution seems desirable unless there are
major caveats.

From what you described I came up with the following draft for a
possible API for a common framework:

enum FreqConstraintType {
     	// constraint from the thermal framework, typically max freq
	FREQ_CONSTRAINT_THERMAL,
	// userspace contraint, min or max freq
	FREQ_CONSTRAINT_USER,
	// constraint to reduce power consumption, typically max freq
	FREQ_CONSTRAINT_POWER,
	// constraint to ensure a certain performance, typically min freq
	FREQ_CONSTRAINT_PERFORMANCE,
};

/*
 * a single contraint
 *
 * since most constraints either impose a max (e.g. thermal) or a min
 * (e.g. performance) it would also be an option to use a single
 * frequency here instead of a pair and derive whether it is min/max from
 * the type (and have USER_MIN and USER_MAX)
 */
struct freq_constraint {
	unsigned long min_freq;
	unsigned long max_freq;
	enum FreqConstraintType type;
};

/* set of constraints for a device/CPU/... */
struct freq_constraint_set {
	struct idr *idr;
};

/* compound constraint after conflict resolution */
struct compound_freq_constraint {
	unsigned long min;
	unsigned long max;
};

// returns an IDR
int freq_constraints_add(struct freq_constraint_set *fcs,
			 enum FreqConstraintType type,
			 unsigned long min_freq, unsigned long max_freq);

int freq_constraints_update(struct freq_constraint_set *fcs, int constraint_id,
			    unsigned long min_freq, unsigned long max_freq);

int freq_constraints_remove(struct freq_constraint_set *fcs, int constraint_id)

// determines the compound constrained, resolves conflicts if necessary
int freq_constraints_get_compound_constraint(struct freq_constraint_set *fcs,
                                             struct compound_freq_constraint *constraint);


For conflict resolution the order of preference could be:

max freq: thermal > user > power
min freq: user > performance

Or in prose:

If userspace sets a max frequency through sysfs that is higher than
the one requested by the thermal framework, the constraint of the
thermal framework is used.

A 'power' constraint is loosely related with thermal constraints, but
its purpose is less to avoid possible damage than to reduce power
consumption (the exact reasons may vary, this would be one of our
real world throttler use cases). If userspace decides to overwrite
a 'power' constraint let them have their way.

A 'performance' constraint could be set by the driver that owns a
devfreq device, to request a minimum frequency to perform a certain
task. If userspace thinks they are smarter allow it and let them deal
with the possible consequences. Not sure if 'performance' constraints
are really needed.

Does this in general seem reasonable and is in line with what you
envision?

> > A devfreq cooling device and the throttler manipulate OPPs independently
> > from each other, one might enable an OPP, the other disable it. This patch
> > adds a 'disable count' to the OPP, to allow to track how often the OPP
> > was disabled without a subsequent opp_enable(). The OPP is considered
> > disabled when the disable count is > 0 , even directly after a
> > opp_enable() call.
> > 
> > This patch is an initial draft, intended to get feedback from OPP
> > maintainers on the general idea.
> > 
> > Suggested-by: Chanwoo Choi <cw00.choi@samsung.com>
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > ---
> > 
> > Even though I sent this patch I have reservations about it, some of
> > general nature. and others in the context of devfreq/throttler/cooling
> > device:
> > 
> > - possible race conditions, devfreq may end up using a disabled OPP
> >   (cooling device and throttler run asynchronously to devfreq).
> >   This is not really a new issue though, I think this can already
> >   happen today with devfreq and cooling device. It doesn't cause
> >   problems since currently disabling an OPP doesn't results in disabling
> >   regulators, clocks or other components
> > 
> > - all OPPs could end up disabled since multiple drivers manipulate OPPs
> >   independently without any arbitration
> > 
> > - a crashing/buggy driver can leave OPPs disabled forever
> > 
> > - a typical caller would expect a OPP to be enabled after opp_enable(),
> >   this is not always the case
> > 
> > - added complexity for callers of opp_enable/disable() to keep track
> >   which OPPs they enabled/disabled previously
> 
> Yeah, its unnecessary complexity on a framework like OPP which should normally
> be a helper library to provide data.
> 
> > ---
> >  drivers/opp/core.c | 9 +++++++++
> >  drivers/opp/opp.h  | 3 +++
> >  2 files changed, 12 insertions(+)
> > 
> > diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> > index 31ff03dbeb83..a6d23e8560d2 100644
> > --- a/drivers/opp/core.c
> > +++ b/drivers/opp/core.c
> > @@ -1072,6 +1072,7 @@ int _opp_add(struct device *dev, struct dev_pm_opp *new_opp,
> >  
> >  	if (!_opp_supported_by_regulators(new_opp, opp_table)) {
> >  		new_opp->available = false;
> > +		new_opp->disable_count = 1;
> >  		dev_warn(dev, "%s: OPP not supported by regulators (%lu)\n",
> >  			 __func__, new_opp->rate);
> >  	}
> > @@ -1592,10 +1593,18 @@ static int _opp_set_availability(struct device *dev, unsigned long freq,
> >  		goto unlock;
> >  	}
> >  
> > +	if (availability_req)
> > +		opp->disable_count--;
> 
> what if disable_count becomes < 0 ? Is that a bug ?

Yes, that would be a bug

Cheers

Matthias
Viresh Kumar Oct. 10, 2018, 7:21 a.m. UTC | #4
On 02-10-18, 15:06, Matthias Kaehlcke wrote:
> On Mon, Oct 01, 2018 at 02:56:44PM +0530, Viresh Kumar wrote:
> > On 29-08-18, 15:40, Matthias Kaehlcke wrote:
> > > The WIP throttler driver (https://lore.kernel.org/patchwork/project/lkml/list/?series=357937)
> > > can also limit the frequency range of devfreq devices. The current
> > > version does this by introducing a function similar to cpufreq's
> > > cpufreq_verify_within_limits(), however devfreq maintainer Chanwoo Choi
> > 
> > I don't like this interface and have plans on updating it with something where a
> > user asks for some constraints and then releases them at a later point of time.
> > That will make cpufreq and other frameworks manage things in a better way.
> 
> At first sight this doesn't sound much different, only that cpufreq
> (or a meta-framework) retains information about the constraints. Also
> the constraints could contain additional information (e.g. it's a
> thermal or user defined constraint), which might facilitate the
> resolution of possible conflicts.
> 
> Do you have a guesstimate on when you might find time for this?

I have lot of pending things to do right now and will easily say I
wouldn't touch it until mid of November :)

> > > wants the throttler to use the OPP interface for limiting the frequency
> > > range.
> > 
> > I am not sure if that would be the right approach as well. Maybe something
> > similar to what I wrote earlier. For example all throttlers can provide min/max
> > frequencies they support at a certain point of time and the devfreq core
> > accumulates all such requests and finds the right target frequency. You wouldn't
> > require to disable OPPs then to make it work.
> 
> Do you think the same 'constraint framework' could be used for cpufreq
> and devfreq? The problem appears to be very similar for both
> sub-systems, having a single solution seems desirable unless there are
> major caveats.

I wasn't thinking of writing a separate layer for this but to put that
as part of cpufreq core itself. But now that we have more than one
user for the same, I don't see why shouldn't we write something based
on your stuff.

> >From what you described I came up with the following draft for a
> possible API for a common framework:
> 
> enum FreqConstraintType {
>      	// constraint from the thermal framework, typically max freq

We may actually need a min here. There are discussions around cases
where the hardware works in very cold places and they may need to run
the CPUs at a higher frequencies to make it hot enough to work
properly :)

> 	FREQ_CONSTRAINT_THERMAL,
> 	// userspace contraint, min or max freq
> 	FREQ_CONSTRAINT_USER,
> 	// constraint to reduce power consumption, typically max freq
> 	FREQ_CONSTRAINT_POWER,
> 	// constraint to ensure a certain performance, typically min freq
> 	FREQ_CONSTRAINT_PERFORMANCE,
> };
> 
> /*
>  * a single contraint
>  *
>  * since most constraints either impose a max (e.g. thermal) or a min
>  * (e.g. performance) it would also be an option to use a single
>  * frequency here instead of a pair and derive whether it is min/max from
>  * the type (and have USER_MIN and USER_MAX)
>  */
> struct freq_constraint {
> 	unsigned long min_freq;
> 	unsigned long max_freq;
> 	enum FreqConstraintType type;
> };
> 
> /* set of constraints for a device/CPU/... */
> struct freq_constraint_set {
> 	struct idr *idr;
> };
> 
> /* compound constraint after conflict resolution */
> struct compound_freq_constraint {
> 	unsigned long min;
> 	unsigned long max;
> };
> 
> // returns an IDR
> int freq_constraints_add(struct freq_constraint_set *fcs,
> 			 enum FreqConstraintType type,
> 			 unsigned long min_freq, unsigned long max_freq);
> 
> int freq_constraints_update(struct freq_constraint_set *fcs, int constraint_id,
> 			    unsigned long min_freq, unsigned long max_freq);
> 
> int freq_constraints_remove(struct freq_constraint_set *fcs, int constraint_id)
> 
> // determines the compound constrained, resolves conflicts if necessary
> int freq_constraints_get_compound_constraint(struct freq_constraint_set *fcs,
>                                              struct compound_freq_constraint *constraint);
> 
> 
> For conflict resolution the order of preference could be:
> 
> max freq: thermal > user > power
> min freq: user > performance
> 
> Or in prose:
> 
> If userspace sets a max frequency through sysfs that is higher than
> the one requested by the thermal framework, the constraint of the
> thermal framework is used.
> 
> A 'power' constraint is loosely related with thermal constraints, but
> its purpose is less to avoid possible damage than to reduce power
> consumption (the exact reasons may vary, this would be one of our
> real world throttler use cases). If userspace decides to overwrite
> a 'power' constraint let them have their way.
> 
> A 'performance' constraint could be set by the driver that owns a
> devfreq device, to request a minimum frequency to perform a certain
> task. If userspace thinks they are smarter allow it and let them deal
> with the possible consequences. Not sure if 'performance' constraints
> are really needed.
> 
> Does this in general seem reasonable and is in line with what you
> envision?

It does.

If you are willing to do this stuff, maybe just send patches for this
and I can help reviewing it.
Matthias Kaehlcke Oct. 12, 2018, 7:23 p.m. UTC | #5
On Wed, Oct 10, 2018 at 12:51:51PM +0530, Viresh Kumar wrote:
> On 02-10-18, 15:06, Matthias Kaehlcke wrote:
> > On Mon, Oct 01, 2018 at 02:56:44PM +0530, Viresh Kumar wrote:
> > > On 29-08-18, 15:40, Matthias Kaehlcke wrote:
> > > > The WIP throttler driver (https://lore.kernel.org/patchwork/project/lkml/list/?series=357937)
> > > > can also limit the frequency range of devfreq devices. The current
> > > > version does this by introducing a function similar to cpufreq's
> > > > cpufreq_verify_within_limits(), however devfreq maintainer Chanwoo Choi
> > > 
> > > I don't like this interface and have plans on updating it with something where a
> > > user asks for some constraints and then releases them at a later point of time.
> > > That will make cpufreq and other frameworks manage things in a better way.
> > 
> > At first sight this doesn't sound much different, only that cpufreq
> > (or a meta-framework) retains information about the constraints. Also
> > the constraints could contain additional information (e.g. it's a
> > thermal or user defined constraint), which might facilitate the
> > resolution of possible conflicts.
> > 
> > Do you have a guesstimate on when you might find time for this?
> 
> I have lot of pending things to do right now and will easily say I
> wouldn't touch it until mid of November :)
> 
> > > > wants the throttler to use the OPP interface for limiting the frequency
> > > > range.
> > > 
> > > I am not sure if that would be the right approach as well. Maybe something
> > > similar to what I wrote earlier. For example all throttlers can provide min/max
> > > frequencies they support at a certain point of time and the devfreq core
> > > accumulates all such requests and finds the right target frequency. You wouldn't
> > > require to disable OPPs then to make it work.
> > 
> > Do you think the same 'constraint framework' could be used for cpufreq
> > and devfreq? The problem appears to be very similar for both
> > sub-systems, having a single solution seems desirable unless there are
> > major caveats.
> 
> I wasn't thinking of writing a separate layer for this but to put that
> as part of cpufreq core itself. But now that we have more than one
> user for the same, I don't see why shouldn't we write something based
> on your stuff.
> 
> > >From what you described I came up with the following draft for a
> > possible API for a common framework:
> > 
> > enum FreqConstraintType {
> >      	// constraint from the thermal framework, typically max freq
> 
> We may actually need a min here. There are discussions around cases
> where the hardware works in very cold places and they may need to run
> the CPUs at a higher frequencies to make it hot enough to work
> properly :)

Interesting, thanks for the heads up!

> > 	FREQ_CONSTRAINT_THERMAL,
> > 	// userspace contraint, min or max freq
> > 	FREQ_CONSTRAINT_USER,
> > 	// constraint to reduce power consumption, typically max freq
> > 	FREQ_CONSTRAINT_POWER,
> > 	// constraint to ensure a certain performance, typically min freq
> > 	FREQ_CONSTRAINT_PERFORMANCE,
> > };
> > 
> > /*
> >  * a single contraint
> >  *
> >  * since most constraints either impose a max (e.g. thermal) or a min
> >  * (e.g. performance) it would also be an option to use a single
> >  * frequency here instead of a pair and derive whether it is min/max from
> >  * the type (and have USER_MIN and USER_MAX)
> >  */
> > struct freq_constraint {
> > 	unsigned long min_freq;
> > 	unsigned long max_freq;
> > 	enum FreqConstraintType type;
> > };
> > 
> > /* set of constraints for a device/CPU/... */
> > struct freq_constraint_set {
> > 	struct idr *idr;
> > };
> > 
> > /* compound constraint after conflict resolution */
> > struct compound_freq_constraint {
> > 	unsigned long min;
> > 	unsigned long max;
> > };
> > 
> > // returns an IDR
> > int freq_constraints_add(struct freq_constraint_set *fcs,
> > 			 enum FreqConstraintType type,
> > 			 unsigned long min_freq, unsigned long max_freq);
> > 
> > int freq_constraints_update(struct freq_constraint_set *fcs, int constraint_id,
> > 			    unsigned long min_freq, unsigned long max_freq);
> > 
> > int freq_constraints_remove(struct freq_constraint_set *fcs, int constraint_id)
> > 
> > // determines the compound constrained, resolves conflicts if necessary
> > int freq_constraints_get_compound_constraint(struct freq_constraint_set *fcs,
> >                                              struct compound_freq_constraint *constraint);
> > 
> > 
> > For conflict resolution the order of preference could be:
> > 
> > max freq: thermal > user > power
> > min freq: user > performance
> > 
> > Or in prose:
> > 
> > If userspace sets a max frequency through sysfs that is higher than
> > the one requested by the thermal framework, the constraint of the
> > thermal framework is used.
> > 
> > A 'power' constraint is loosely related with thermal constraints, but
> > its purpose is less to avoid possible damage than to reduce power
> > consumption (the exact reasons may vary, this would be one of our
> > real world throttler use cases). If userspace decides to overwrite
> > a 'power' constraint let them have their way.
> > 
> > A 'performance' constraint could be set by the driver that owns a
> > devfreq device, to request a minimum frequency to perform a certain
> > task. If userspace thinks they are smarter allow it and let them deal
> > with the possible consequences. Not sure if 'performance' constraints
> > are really needed.
> > 
> > Does this in general seem reasonable and is in line with what you
> > envision?
> 
> It does.

Thanks for your feedback!

> If you are willing to do this stuff, maybe just send patches for this
> and I can help reviewing it.

Ok, I'll try to find some time for it, might take a bit though.

Cheers

Matthias
diff mbox series

Patch

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 31ff03dbeb83..a6d23e8560d2 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -1072,6 +1072,7 @@  int _opp_add(struct device *dev, struct dev_pm_opp *new_opp,
 
 	if (!_opp_supported_by_regulators(new_opp, opp_table)) {
 		new_opp->available = false;
+		new_opp->disable_count = 1;
 		dev_warn(dev, "%s: OPP not supported by regulators (%lu)\n",
 			 __func__, new_opp->rate);
 	}
@@ -1592,10 +1593,18 @@  static int _opp_set_availability(struct device *dev, unsigned long freq,
 		goto unlock;
 	}
 
+	if (availability_req)
+		opp->disable_count--;
+	else
+		opp->disable_count++;
+
 	/* Is update really needed? */
 	if (opp->available == availability_req)
 		goto unlock;
 
+	if (availability_req && opp->disable_count > 0)
+		goto unlock;
+
 	opp->available = availability_req;
 
 	dev_pm_opp_get(opp);
diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
index 7c540fd063b2..a72acb16c364 100644
--- a/drivers/opp/opp.h
+++ b/drivers/opp/opp.h
@@ -55,6 +55,8 @@  extern struct list_head opp_tables;
  *		order.
  * @kref:	for reference count of the OPP.
  * @available:	true/false - marks if this OPP as available or not
+ * @disable_count: tracks the number of times the OPP has been disabled
+ *              (without re-enabling it).
  * @dynamic:	not-created from static DT entries.
  * @turbo:	true if turbo (boost) OPP
  * @suspend:	true if suspend OPP
@@ -74,6 +76,7 @@  struct dev_pm_opp {
 	struct kref kref;
 
 	bool available;
+	unsigned int disable_count;
 	bool dynamic;
 	bool turbo;
 	bool suspend;