
[1/8] cpufreq: allow drivers to flag custom support for freq invariance

Message ID 20200701090751.7543-2-ionela.voinescu@arm.com (mailing list archive)
State Changes Requested, archived
Series cpufreq: improve frequency invariance support

Commit Message

Ionela Voinescu July 1, 2020, 9:07 a.m. UTC
The scheduler's Frequency Invariance Engine (FIE) provides a frequency
scale correction factor that helps achieve more accurate load tracking
by conveying the currently selected frequency relative to the maximum
supported frequency of a CPU.

In some cases this is achieved by passing information from cpufreq
drivers about the frequency selection done by cpufreq.

Given that most drivers follow a similar process of selecting and
setting a frequency, there is a strong case for moving the setting of
the frequency scale factor from the cpufreq drivers' frequency switch
callbacks (target_index() and fast_switch()) to the cpufreq core
functions that call them.

In preparation for this, acknowledge that some drivers have a custom
frequency setting process, and that these drivers will therefore want
to provide and flag custom support for setting the scheduler's
frequency invariance (FI) scale factor as well. Prepare for this by
introducing a new flag: CPUFREQ_CUSTOM_SET_FREQ_SCALE.

Examples of users of this flag are:
 - drivers that do not implement the callbacks that lend themselves
   to triggering the setting of the FI scale factor,
 - drivers that implement the appropriate callbacks but which have
   an atypical implementation.

Currently, given that all drivers call arch_set_freq_scale() directly,
flag all of them with CPUFREQ_CUSTOM_SET_FREQ_SCALE. These driver
changes also help keep the series bisectable while the setting of the
FI scale factor is moved from the drivers to the core.
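
As an illustration, the per-driver pattern referred to above looks
roughly like the sketch below (an illustrative skeleton rather than a
copy of any particular driver; foo_hw_set_freq() is a made-up helper):

static int foo_cpufreq_target_index(struct cpufreq_policy *policy,
                                    unsigned int index)
{
        unsigned int target_freq = policy->freq_table[index].frequency;
        int ret;

        /* program the hardware (hypothetical helper) */
        ret = foo_hw_set_freq(policy, target_freq);
        if (!ret)
                /* the per-driver call this series moves into the core */
                arch_set_freq_scale(policy->related_cpus, target_freq,
                                    policy->cpuinfo.max_freq);

        return ret;
}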

Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq-dt.c           |  3 ++-
 drivers/cpufreq/qcom-cpufreq-hw.c      |  3 ++-
 drivers/cpufreq/scmi-cpufreq.c         |  3 ++-
 drivers/cpufreq/scpi-cpufreq.c         |  3 ++-
 drivers/cpufreq/vexpress-spc-cpufreq.c |  3 ++-
 include/linux/cpufreq.h                | 10 +++++++++-
 6 files changed, 19 insertions(+), 6 deletions(-)

Comments

Viresh Kumar July 1, 2020, 10:46 a.m. UTC | #1
On 01-07-20, 10:07, Ionela Voinescu wrote:
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 3494f6763597..42668588f9f8 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name)
>  
>  struct cpufreq_driver {
>  	char		name[CPUFREQ_NAME_LEN];
> -	u8		flags;
> +	u16		flags;

Let's make it u32.

>  	void		*driver_data;
>  
>  	/* needed by all drivers */
> @@ -417,6 +417,14 @@ struct cpufreq_driver {
>   */
>  #define CPUFREQ_IS_COOLING_DEV			BIT(7)
>  
> +/*
> + * Set by drivers which implement the necessary calls to the scheduler's
> + * frequency invariance engine. The use of this flag will result in the
> + * default arch_set_freq_scale calls being skipped in favour of custom
> + * driver calls.
> + */
> +#define CPUFREQ_CUSTOM_SET_FREQ_SCALE		BIT(8)

I would rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and
functionality. We need to give drivers a choice if they do not want
the core to do it on their behalf, either because they are doing it on
their own or because they don't want it done at all.
Ionela Voinescu July 1, 2020, 1:33 p.m. UTC | #2
Hi,

Thank you for taking a look over these so quickly.

On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:
> On 01-07-20, 10:07, Ionela Voinescu wrote:
> > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > index 3494f6763597..42668588f9f8 100644
> > --- a/include/linux/cpufreq.h
> > +++ b/include/linux/cpufreq.h
> > @@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name)
> >  
> >  struct cpufreq_driver {
> >  	char		name[CPUFREQ_NAME_LEN];
> > -	u8		flags;
> > +	u16		flags;
> 
> Lets make it u32.
> 
> >  	void		*driver_data;
> >  
> >  	/* needed by all drivers */
> > @@ -417,6 +417,14 @@ struct cpufreq_driver {
> >   */
> >  #define CPUFREQ_IS_COOLING_DEV			BIT(7)
> >  
> > +/*
> > + * Set by drivers which implement the necessary calls to the scheduler's
> > + * frequency invariance engine. The use of this flag will result in the
> > + * default arch_set_freq_scale calls being skipped in favour of custom
> > + * driver calls.
> > + */
> > +#define CPUFREQ_CUSTOM_SET_FREQ_SCALE		BIT(8)
> 
> I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and
> functionality. We need to give drivers a choice if they do not want
> the core to do it on their behalf, because they are doing it on their
> own or they don't want to do it.
> 

In this case we would not be able to tell if cpufreq (driver or core)
can provide the frequency scale factor, so we would not be able to tell
if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE
would be set if either:
 - the driver calls arch_set_freq_scale() on its own
 - the driver does not want arch_set_freq_scale() to be called.

So at the core level we would not be able to distinguish between the
two, and return whether cpufreq-based invariance is supported.

I don't really see a reason why a driver would not want to set the
frequency scale factor, if it has the proper mechanisms to do so
(therefore excluding the exceptions mentioned in 2/8). I think the
cpufreq core or drivers should produce the information (set the scale
factor) and it should be up to the users to decide whether to use it or
not. But being invariant should always be the default.

Therefore, there are a few reasons I went for
CPUFREQ_CUSTOM_SET_FREQ_SCALE instead:
 - It tells us whether the driver has custom mechanisms to set the scale
   factor, which both lets the cpufreq core filter out its own setting
   and informs the core whether the system is frequency invariant.
 - It does have a user in the vexpress-spc driver.
 - Currently there aren't any drivers that could set the frequency scale
   factor but choose not to, and in my opinion that should not be the
   case.

Thanks,
Ionela.

> -- 
> viresh
Rafael J. Wysocki July 1, 2020, 4:05 p.m. UTC | #3
On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <ionela.voinescu@arm.com> wrote:
>
> Hi,
>
> Thank you for taking a look over these so quickly.
>
> On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:
> > On 01-07-20, 10:07, Ionela Voinescu wrote:
> > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > > index 3494f6763597..42668588f9f8 100644
> > > --- a/include/linux/cpufreq.h
> > > +++ b/include/linux/cpufreq.h
> > > @@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name)
> > >
> > >  struct cpufreq_driver {
> > >     char            name[CPUFREQ_NAME_LEN];
> > > -   u8              flags;
> > > +   u16             flags;
> >
> > Lets make it u32.
> >
> > >     void            *driver_data;
> > >
> > >     /* needed by all drivers */
> > > @@ -417,6 +417,14 @@ struct cpufreq_driver {
> > >   */
> > >  #define CPUFREQ_IS_COOLING_DEV                     BIT(7)
> > >
> > > +/*
> > > + * Set by drivers which implement the necessary calls to the scheduler's
> > > + * frequency invariance engine. The use of this flag will result in the
> > > + * default arch_set_freq_scale calls being skipped in favour of custom
> > > + * driver calls.
> > > + */
> > > +#define CPUFREQ_CUSTOM_SET_FREQ_SCALE              BIT(8)
> >
> > I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and
> > functionality. We need to give drivers a choice if they do not want
> > the core to do it on their behalf, because they are doing it on their
> > own or they don't want to do it.

Well, this would go backwards to me, as we seem to be designing an
opt-out flag for something that's not even implemented yet.

I would go for an opt-in instead.  That would be much cleaner and less
prone to regressions IMO.

>
> In this case we would not be able to tell if cpufreq (driver or core)
> can provide the frequency scale factor, so we would not be able to tell
> if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE
> would be set if either:
>  - the driver calls arch_set_freq_scale() on its own
>  - the driver does not want arch_set_freq_scale() to be called.
>
> So at the core level we would not be able to distinguish between the
> two, and return whether cpufreq-based invariance is supported.
>
> I don't really see a reason why a driver would not want to set the
> frequency scale factor, if it has the proper mechanisms to do so
> (therefore excluding the exceptions mentioned in 2/8). I think the
> cpufreq core or drivers should produce the information (set the scale
> factor) and it should be up to the users to decide whether to use it or
> not. But being invariant should always be the default.

So instead of what is being introduced by this patch, there should be
an opt-in mechanism for drivers to tell the core to do the freq-scale
factor setting on behalf of the driver.

Then, the driver would be responsible to only opt-in for that if it
knows it for a fact that the sched tick doesn't set the freq-scale
factor.
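
A rough sketch of what that opt-in could look like in the core (the
helper and flag names here are purely illustrative, not something from
the posted series):

/* Illustrative sketch only; flag and helper names are hypothetical. */
static void cpufreq_update_freq_scale(struct cpufreq_policy *policy,
                                      unsigned int new_freq)
{
        /* only act for drivers that explicitly opted in */
        if (!(cpufreq_driver->flags & CPUFREQ_SET_FREQ_SCALE))
                return;

        arch_set_freq_scale(policy->related_cpus, new_freq,
                            policy->cpuinfo.max_freq);
}

The core would call something like this right after a successful
->target_index() or ->fast_switch() invocation.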

> Therefore, there are a few reasons I went for
> CPUFREQ_CUSTOM_SET_FREQ_SCALE instead:
>  - It tells us if the driver has custom mechanisms to set the scale
>    factor to filter the setting in cpufreq core and to inform the
>    core on whether the system is frequency invariant.
>  - It does have a user in the vexpress-spc driver.
>  - Currently there aren't drivers that could but choose not to set
>    the frequency scale factor, and it my opinion this should not be
>    the case.

Well, that depends on what you mean by "could".

For example, it doesn't really make sense to set the freq-scale factor
in either the ACPI cpufreq driver or intel_pstate, because the
frequency (or P-state to be precise) requested by them may not be the
one the CPU ends up running at and even so it may change at any time
for various reasons (eg. in the turbo range).  However, the ACPI
cpufreq driver as well as intel_pstate in the passive mode both set
policy->cur, so that might be used for setting the freq-scale factor
in principle, but that freq-scale factor may not be very useful in
practice.

Thanks!
Ionela Voinescu July 1, 2020, 6:06 p.m. UTC | #4
Hi Rafael,

Thank you for the review!

On Wednesday 01 Jul 2020 at 18:05:33 (+0200), Rafael J. Wysocki wrote:
> On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <ionela.voinescu@arm.com> wrote:
> >
> > Hi,
> >
> > Thank you for taking a look over these so quickly.
> >
> > On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:
> > > On 01-07-20, 10:07, Ionela Voinescu wrote:
> > > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > > > index 3494f6763597..42668588f9f8 100644
> > > > --- a/include/linux/cpufreq.h
> > > > +++ b/include/linux/cpufreq.h
> > > > @@ -293,7 +293,7 @@ __ATTR(_name, 0644, show_##_name, store_##_name)
> > > >
> > > >  struct cpufreq_driver {
> > > >     char            name[CPUFREQ_NAME_LEN];
> > > > -   u8              flags;
> > > > +   u16             flags;
> > >
> > > Lets make it u32.
> > >
> > > >     void            *driver_data;
> > > >
> > > >     /* needed by all drivers */
> > > > @@ -417,6 +417,14 @@ struct cpufreq_driver {
> > > >   */
> > > >  #define CPUFREQ_IS_COOLING_DEV                     BIT(7)
> > > >
> > > > +/*
> > > > + * Set by drivers which implement the necessary calls to the scheduler's
> > > > + * frequency invariance engine. The use of this flag will result in the
> > > > + * default arch_set_freq_scale calls being skipped in favour of custom
> > > > + * driver calls.
> > > > + */
> > > > +#define CPUFREQ_CUSTOM_SET_FREQ_SCALE              BIT(8)
> > >
> > > I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and
> > > functionality. We need to give drivers a choice if they do not want
> > > the core to do it on their behalf, because they are doing it on their
> > > own or they don't want to do it.
> 
> Well, this would go backwards to me, as we seem to be designing an
> opt-out flag for something that's not even implemented already.
> 
> I would go for an opt-in instead.  That would be much cleaner and less
> prone to regressions IMO.
> 
> >
> > In this case we would not be able to tell if cpufreq (driver or core)
> > can provide the frequency scale factor, so we would not be able to tell
> > if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE
> > would be set if either:
> >  - the driver calls arch_set_freq_scale() on its own
> >  - the driver does not want arch_set_freq_scale() to be called.
> >
> > So at the core level we would not be able to distinguish between the
> > two, and return whether cpufreq-based invariance is supported.
> >
> > I don't really see a reason why a driver would not want to set the
> > frequency scale factor, if it has the proper mechanisms to do so
> > (therefore excluding the exceptions mentioned in 2/8). I think the
> > cpufreq core or drivers should produce the information (set the scale
> > factor) and it should be up to the users to decide whether to use it or
> > not. But being invariant should always be the default.
> 
> So instead of what is being introduced by this patch, there should be
> an opt-in mechanism for drivers to tell the core to do the freq-scale
> factor setting on behalf of the driver.
> 


This could work better, as it covers the following scenarios:
 - All the drivers in patch 3/8 would just use the flag to inform the
   core that it can call arch_set_freq_scale() on their behalf.
 - Omitting it truly conveys that cpufreq information should not be
   used for frequency invariance, no matter the implementation of
   arch_set_freq_scale() (more details below).

The only case it does not cover is the scenario in patch 4/8: one in
which the driver is atypical and needs its own calls to
arch_set_freq_scale(), while it still wants to be able to report support
for frequency invariance through cpufreq_sets_freq_scale() and later
arch_scale_freq_invariant(). But the jury is still out on whether that
part of the vexpress-spc driver should be given that much consideration.

My choice of flag was considering this case and potentially other future
ones like it, but this alternative also sounds good to me.


> Then, the driver would be responsible to only opt-in for that if it
> knows it for a fact that the sched tick doesn't set the freq-scale
> factor.
> 

I think that would create a tight coupling between the driver and the
architecture, when arch_set_freq_scale() is already meant to serve that
purpose while also providing some flexibility. Let me expand on this
below.

> > Therefore, there are a few reasons I went for
> > CPUFREQ_CUSTOM_SET_FREQ_SCALE instead:
> >  - It tells us if the driver has custom mechanisms to set the scale
> >    factor to filter the setting in cpufreq core and to inform the
> >    core on whether the system is frequency invariant.
> >  - It does have a user in the vexpress-spc driver.
> >  - Currently there aren't drivers that could but choose not to set
> >    the frequency scale factor, and it my opinion this should not be
> >    the case.
> 
> Well, that depends on what you mean by "could".
> 
> For example, it doesn't really make sense to set the freq-scale factor
> in either the ACPI cpufreq driver or intel_pstate, because the
> frequency (or P-state to be precise) requested by them may not be the
> one the CPU ends up running at and even so it may change at any time
> for various reasons (eg. in the turbo range).  However, the ACPI
> cpufreq driver as well as intel_pstate in the passive mode both set
> policy->cur, so that might be used for setting the freq-scale factor
> in principle, but that freq-scale factor may not be very useful in
> practice.
> 

Yes, this completely makes sense, and if there are more accurate methods
of obtaining information about the current performance level, by using
counters for example, they should definitely be used.

But in my opinion it should not be up to the driver to choose between
the methods. The driver and core would only have some information on the
current performance level (more or less accurate) and
arch_set_freq_scale() is called to *potentially* use it to set the scale
factor. So the use of policy->cur would be entirely dependent on the
implementation of arch_set_freq_scale().

There could be a few scenarios here:
 - arch_set_freq_scale() is left to its weak default that does nothing
   (which would be the case for when the ACPI cpufreq driver or
   intel_psate are used)
 - arch_set_freq_scale() is implemented in such a way that it takes into
   account the presence of a counter-based method of setting the scale
   factor and makes that take precedence (currently done for the users
   of the arch_topology driver). This also provides support for platforms
   that have partial support for counters, where the use of cpufreq
   information is still useful for the CPUs that don't support counters.
   For those cases, some information, although not entirely accurate,
   is still better than no information at all.

So I believe cpufreq should just provide the information, if it can,
and let the user decide whether to use it, or what source of information
takes precedence. Therefore, arch_set_freq_scale() would decide
whether to filter it out.
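
As a reference for the second scenario, the arch_topology-based
definition (what arch_set_freq_scale() maps to on arm/arm64) behaves
roughly like the simplified sketch below (helper names and details may
differ slightly from the actual code):

void topology_set_freq_scale(const struct cpumask *cpus,
                             unsigned long cur_freq, unsigned long max_freq)
{
        unsigned long scale;
        int i;

        if (WARN_ON_ONCE(!cur_freq || !max_freq))
                return;

        /* counter-based FIE takes precedence on CPUs that support it */
        if (arch_freq_counters_available(cpus))
                return;

        scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;

        for_each_cpu(i, cpus)
                per_cpu(freq_scale, i) = scale;
}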

In any case, your suggestion regarding the choice of flag would make
bypassing the use of cpufreq information in setting the scale factor
explicit, no matter the definition of arch_set_freq_scale(). But it
would also require writers of cpufreq driver code to remember to
consider the setting of that flag.

I'll consider this more while gauging interest in 4/8.

Many thanks,
Ionela.

> Thanks!
Viresh Kumar July 2, 2020, 2:58 a.m. UTC | #5
On 01-07-20, 18:05, Rafael J. Wysocki wrote:
> On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <ionela.voinescu@arm.com> wrote:
> > On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:
> > > I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and
> > > functionality. We need to give drivers a choice if they do not want
> > > the core to do it on their behalf, because they are doing it on their
> > > own or they don't want to do it.
> 
> Well, this would go backwards to me, as we seem to be designing an
> opt-out flag for something that's not even implemented already.
> 
> I would go for an opt-in instead.  That would be much cleaner and less
> prone to regressions IMO.

That's fine, I just wanted an option for drivers to opt out of this
thing. I felt okay with the opt-out flag as this should be enabled for
most of the drivers and so enabling by default looked okay as well.

> > In this case we would not be able to tell if cpufreq (driver or core)
> > can provide the frequency scale factor, so we would not be able to tell
> > if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE

That is easy to fix. Let the drivers call
enable_cpufreq_freq_invariance() and set the flag.
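
Something like this rough sketch (enable_cpufreq_freq_invariance() is
only the name suggested above and does not exist today;
cpufreq_sets_freq_scale() is the query helper from this series):

/* Rough sketch only; none of this is existing code. */
static bool cpufreq_freq_invariance;

/* called by a driver that sets the scale factor on its own */
void enable_cpufreq_freq_invariance(void)
{
        cpufreq_freq_invariance = true;
}

/* lets the rest of the kernel see that cpufreq-based FIE is in use */
bool cpufreq_sets_freq_scale(void)
{
        return cpufreq_freq_invariance;
}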

> > would be set if either:
> >  - the driver calls arch_set_freq_scale() on its own
> >  - the driver does not want arch_set_freq_scale() to be called.
> >
> > So at the core level we would not be able to distinguish between the
> > two, and return whether cpufreq-based invariance is supported.
> >
> > I don't really see a reason why a driver would not want to set the
> > frequency scale factor

A simple case is one where the driver doesn't have any idea what the
real freq of the CPU is and doesn't have counters to guess it either.

There can be other reasons which we aren't able to imagine at this
point in time.
Ionela Voinescu July 2, 2020, 11:44 a.m. UTC | #6
Hi,

On Thursday 02 Jul 2020 at 08:28:18 (+0530), Viresh Kumar wrote:
> On 01-07-20, 18:05, Rafael J. Wysocki wrote:
> > On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <ionela.voinescu@arm.com> wrote:
> > > On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:
> > > > I will rather suggest CPUFREQ_SKIP_SET_FREQ_SCALE as the name and
> > > > functionality. We need to give drivers a choice if they do not want
> > > > the core to do it on their behalf, because they are doing it on their
> > > > own or they don't want to do it.
> > 
> > Well, this would go backwards to me, as we seem to be designing an
> > opt-out flag for something that's not even implemented already.
> > 
> > I would go for an opt-in instead.  That would be much cleaner and less
> > prone to regressions IMO.
> 
> That's fine, I just wanted an option for drivers to opt-out of this
> thing. I felt okay with the opt-out flag as this should be enabled for
> most of the drivers and so enabling by default looked okay as well.
> 
> > > In this case we would not be able to tell if cpufreq (driver or core)
> > > can provide the frequency scale factor, so we would not be able to tell
> > > if the system is really frequency invariant; CPUFREQ_SKIP_SET_FREQ_SCALE
> 
> That is easy to fix. Let the drivers call
> enable_cpufreq_freq_invariance() and set the flag.
> 

Right! I suppose part of "the dream" :) was for drivers to be ignorant of
frequency invariance, and for the core to figure out if it has proper
information to potentially* pass to the scheduler.

*potentially = depending on the arch_set_freq_scale() definition.

> > > would be set if either:
> > >  - the driver calls arch_set_freq_scale() on its own
> > >  - the driver does not want arch_set_freq_scale() to be called.
> > >
> > > So at the core level we would not be able to distinguish between the
> > > two, and return whether cpufreq-based invariance is supported.
> > >
> > > I don't really see a reason why a driver would not want to set the
> > > frequency scale factor
> 
> A simple case where the driver doesn't have any idea what the real
> freq 

For me, this would have been filtered either by the type of callback
they use (target_index(), fast_switch() and even target() would offer a
close-to-accurate indication of the current frequency, while
setpolicy() obviously targets a range of frequencies) or by the
definition of arch_set_freq_scale().

> ..of the CPU is and it doesn't have counters to guess it as well.
> 
> There can be other reasons which we aren't able to imagine at this
> point of time.
> 

But I understand the points both you and Rafael raised, so it's clear
that an 'opt in' flag would be the better option.

Thank you both,
Ionela.

> -- 
> viresh
Dietmar Eggemann July 6, 2020, 12:14 p.m. UTC | #7
On 02/07/2020 13:44, Ionela Voinescu wrote:
> Hi,
> 
> On Thursday 02 Jul 2020 at 08:28:18 (+0530), Viresh Kumar wrote:
>> On 01-07-20, 18:05, Rafael J. Wysocki wrote:
>>> On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <ionela.voinescu@arm.com> wrote:
>>>> On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:

[...]

>> There can be other reasons which we aren't able to imagine at this
>> point of time.
>>
> 
> But I understand both the points you and Rafael raised so it's obvious
> that a 'opt in' flag would be the better option.

Why can't we just move the arch_set_freq_scale() call from the cpufreq
driver to the cpufreq core w/o introducing a FIE-related driver flag?

Current scenario for Frequency Invariance Engine (FIE) on arm/arm64.

+------------------------------+       +------------------------------+
|                              |       |                              |
| cpufreq core:                |       | arch: (arm, arm64)           |
|                              |       |                              |
| weak arch_set_freq_scale() {}|       |                              |
|                              |       |                              |
+------------------------------+       |                              |
                                       |                              |
+------------------------------+       |                              |
|                              |       |                              |
| cpufreq driver:              |       |                              |
|                            +-----------> arch_set_freq_scale()      |
|                              |       |   {                          |
+------------------------------+       |      if (use counters)       |
                                       |        return;               |
+------------------------------+       |      ...                     |
|                              |       |   }                          |
| task scheduler:              |       |                              |
|                            +-----------> arch_scale_freq_tick()*    |
|                              |       |   {                          |
|                              |       |      if (!use counters)      |
|                              |       |        return;               |
|                              |       |      ...                     |
|                              |       |   }                          |
+------------------------------+       +------------------------------+

* defined as topology_scale_freq_tick() in arm64

Only Arm/Arm64 defines arch_set_freq_scale() to get the 'legacy'
cpufreq-based FIE. This would still be the case when we move
arch_set_freq_scale() from individual cpufreq drivers to the cpufreq core.

Arm64 is the only arch which has to choose at runtime between two
different FIEs. This is currently done by bailing out early in one of
the FIE functions based on 'use counters'.

X86 (and others) will continue to not define arch_set_freq_scale().

The issue with CONFIG_BL_SWITCHER (vexpress-spc-cpufreq.c) could be
solved within arm/arm64 (arch_topology.c) by putting
arch_set_freq_scale() under a !CONFIG_BL_SWITCHER guard.
I doubt that there are any arm bL systems out there running it. At least
I'm not aware of any complaints due to missing FIE support in bL
switcher setups so far.
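
The guard itself would be small, e.g. (illustrative sketch, placed
wherever arch_set_freq_scale() is wired up to topology_set_freq_scale()
on arm/arm64):

/* Illustrative sketch of the suggestion above. */
#ifndef CONFIG_BL_SWITCHER
#define arch_set_freq_scale	topology_set_freq_scale
#endif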
Ionela Voinescu July 9, 2020, 8:53 a.m. UTC | #8
Hi guys,

On Monday 06 Jul 2020 at 14:14:47 (+0200), Dietmar Eggemann wrote:
> On 02/07/2020 13:44, Ionela Voinescu wrote:
> > Hi,
> > 
> > On Thursday 02 Jul 2020 at 08:28:18 (+0530), Viresh Kumar wrote:
> >> On 01-07-20, 18:05, Rafael J. Wysocki wrote:
> >>> On Wed, Jul 1, 2020 at 3:33 PM Ionela Voinescu <ionela.voinescu@arm.com> wrote:
> >>>> On Wednesday 01 Jul 2020 at 16:16:17 (+0530), Viresh Kumar wrote:
> 
> [...]
> 
> >> There can be other reasons which we aren't able to imagine at this
> >> point of time.
> >>
> > 
> > But I understand both the points you and Rafael raised so it's obvious
> > that a 'opt in' flag would be the better option.
> 
> Why can't we just move the arch_set_freq_scale() call from cpufreq
> driver to cpufreq core w/o introducing a FIE related driver flag?
> 
> Current scenario for Frequency Invariance Engine (FIE) on arm/arm64.
> 
> +------------------------------+       +------------------------------+
> |                              |       |                              |
> | cpufreq core:                |       | arch: (arm, arm64)           |
> 
> |                              |       |                              |
> | weak arch_set_freq_scale() {}|       |                              |
> |                              |       |                              |
> +------------------------------+       |                              |
>                                        |                              |
> +------------------------------+       |                              |
> |                              |       |                              |
> | cpufreq driver:              |       |                              |
> |                            +-----------> arch_set_freq_scale()      |
> |                              |       |   {                          |
> +------------------------------+       |      if (use counters)       |
>                                        |        return;               |
> +------------------------------+       |      ...                     |
> |                              |       |   }                          |
> | task scheduler:              |       |                              |
> |                            +-----------> arch_scale_freq_tick()*    |
> |                              |       |   {                          |
> 
> |                              |       |      if (!use counters)      |
> |                              |       |        return;               |
> |                              |       |      ...                     |
> |                              |       |   }                          |
> +------------------------------+       +------------------------------+
> 
> * defined as topology_scale_freq_tick() in arm64
> 
> Only Arm/Arm64 defines arch_set_freq_scale() to get the 'legacy' CPUfreq
> based FIE. This would still be the case when we move
> arch_set_freq_scale() from individual cpufreq drivers to cpufreq core.
> 
> Arm64 is the only arch which has to runtime-choose between two different
> FIEs. This is currently done by bailing out early in one of the FIE
> functions based on 'use counters'.
> 
> X86 (and others) will continue to not define arch_set_freq_scale().
> 
> The issue with CONFIG_BL_SWITCHER (vexpress-spc-cpufreq.c) could be
> solved arm/arm64 internally (arch_topology.c) by putting
> arch_set_freq_scale() under a !CONFIG_BL_SWITCHER guard.
> I doubt that there are any arm bL systems out there running it. At least
> I'm not aware of any complaints due to missing FIE support in bl
> switcher setups so far.

Thank you, Dietmar, for your review.

I was trying to suggest the same in my other replies. Given that BL_SWITCHER
can be removed as an argument for introducing a flag, I would also find it
cleaner to just skip introducing a flag altogether, at least until we
have a driver/scenario in the kernel that will functionally benefit from it.
This would also give us the chance to reconsider the best meaning of the
flag we later introduce.

The introduction of the 'opt in' flag would be the next best thing as
suggested in the other replies, but currently it would not result in
anything functionally different.


Rafael, Viresh, would you mind confirming whether you still consider
an 'opt in' flag preferable here?

Many thanks,
Ionela.
Viresh Kumar July 9, 2020, 9:09 a.m. UTC | #9
On 09-07-20, 09:53, Ionela Voinescu wrote:
> On Monday 06 Jul 2020 at 14:14:47 (+0200), Dietmar Eggemann wrote:
> > Why can't we just move the arch_set_freq_scale() call from cpufreq
> > driver to cpufreq core w/o introducing a FIE related driver flag?
> > 
> > Current scenario for Frequency Invariance Engine (FIE) on arm/arm64.
> > 
> > +------------------------------+       +------------------------------+
> > |                              |       |                              |
> > | cpufreq core:                |       | arch: (arm, arm64)           |
> > 
> > |                              |       |                              |
> > | weak arch_set_freq_scale() {}|       |                              |
> > |                              |       |                              |
> > +------------------------------+       |                              |
> >                                        |                              |
> > +------------------------------+       |                              |
> > |                              |       |                              |
> > | cpufreq driver:              |       |                              |
> > |                            +-----------> arch_set_freq_scale()      |
> > |                              |       |   {                          |
> > +------------------------------+       |      if (use counters)       |
> >                                        |        return;               |
> > +------------------------------+       |      ...                     |
> > |                              |       |   }                          |
> > | task scheduler:              |       |                              |
> > |                            +-----------> arch_scale_freq_tick()*    |
> > |                              |       |   {                          |
> > 
> > |                              |       |      if (!use counters)      |
> > |                              |       |        return;               |
> > |                              |       |      ...                     |
> > |                              |       |   }                          |
> > +------------------------------+       +------------------------------+
> > 
> > * defined as topology_scale_freq_tick() in arm64
> > 
> > Only Arm/Arm64 defines arch_set_freq_scale() to get the 'legacy' CPUfreq
> > based FIE. This would still be the case when we move
> > arch_set_freq_scale() from individual cpufreq drivers to cpufreq core.
> > 
> > Arm64 is the only arch which has to runtime-choose between two different
> > FIEs. This is currently done by bailing out early in one of the FIE
> > functions based on 'use counters'.
> > 
> > X86 (and others) will continue to not define arch_set_freq_scale().
> > 
> > The issue with CONFIG_BL_SWITCHER (vexpress-spc-cpufreq.c) could be
> > solved arm/arm64 internally (arch_topology.c) by putting
> > arch_set_freq_scale() under a !CONFIG_BL_SWITCHER guard.
> > I doubt that there are any arm bL systems out there running it. At least
> > I'm not aware of any complaints due to missing FIE support in bl
> > switcher setups so far.

I agree to that.

> Thank you Dietmar, for your review.
> 
> I was trying to suggest the same in my other replies.

I am sorry, I must have overlooked that part in your replies,
otherwise I may have agreed to it :)

> Rafael, Viresh, would you mind confirming whether you still consider
> having an 'opt in' flag is preferable here?

Well, we wanted an opt-in flag instead of an opt-out one. And no flag
is certainly better.

Patch

diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index 944d7b45afe9..8e0571a49d1e 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -331,7 +331,8 @@  static int cpufreq_exit(struct cpufreq_policy *policy)
 
 static struct cpufreq_driver dt_cpufreq_driver = {
 	.flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
-		 CPUFREQ_IS_COOLING_DEV,
+		 CPUFREQ_IS_COOLING_DEV |
+		 CPUFREQ_CUSTOM_SET_FREQ_SCALE,
 	.verify = cpufreq_generic_frequency_table_verify,
 	.target_index = set_target,
 	.get = cpufreq_generic_get,
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 573630c23aca..e13780beb373 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -337,7 +337,8 @@  static struct freq_attr *qcom_cpufreq_hw_attr[] = {
 static struct cpufreq_driver cpufreq_qcom_hw_driver = {
 	.flags		= CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
 			  CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
-			  CPUFREQ_IS_COOLING_DEV,
+			  CPUFREQ_IS_COOLING_DEV |
+			  CPUFREQ_CUSTOM_SET_FREQ_SCALE,
 	.verify		= cpufreq_generic_frequency_table_verify,
 	.target_index	= qcom_cpufreq_hw_target_index,
 	.get		= qcom_cpufreq_hw_get,
diff --git a/drivers/cpufreq/scmi-cpufreq.c b/drivers/cpufreq/scmi-cpufreq.c
index fb42e3390377..16ab4ecc75e4 100644
--- a/drivers/cpufreq/scmi-cpufreq.c
+++ b/drivers/cpufreq/scmi-cpufreq.c
@@ -223,7 +223,8 @@  static struct cpufreq_driver scmi_cpufreq_driver = {
 	.name	= "scmi",
 	.flags	= CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
 		  CPUFREQ_NEED_INITIAL_FREQ_CHECK |
-		  CPUFREQ_IS_COOLING_DEV,
+		  CPUFREQ_IS_COOLING_DEV |
+		  CPUFREQ_CUSTOM_SET_FREQ_SCALE,
 	.verify	= cpufreq_generic_frequency_table_verify,
 	.attr	= cpufreq_generic_attr,
 	.target_index	= scmi_cpufreq_set_target,
diff --git a/drivers/cpufreq/scpi-cpufreq.c b/drivers/cpufreq/scpi-cpufreq.c
index b0f5388b8854..6b5f56dc3ca3 100644
--- a/drivers/cpufreq/scpi-cpufreq.c
+++ b/drivers/cpufreq/scpi-cpufreq.c
@@ -197,7 +197,8 @@  static struct cpufreq_driver scpi_cpufreq_driver = {
 	.name	= "scpi-cpufreq",
 	.flags	= CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
 		  CPUFREQ_NEED_INITIAL_FREQ_CHECK |
-		  CPUFREQ_IS_COOLING_DEV,
+		  CPUFREQ_IS_COOLING_DEV |
+		  CPUFREQ_CUSTOM_SET_FREQ_SCALE,
 	.verify	= cpufreq_generic_frequency_table_verify,
 	.attr	= cpufreq_generic_attr,
 	.get	= scpi_cpufreq_get_rate,
diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c
index 4e8b1dee7c9a..e0a1a3367ec5 100644
--- a/drivers/cpufreq/vexpress-spc-cpufreq.c
+++ b/drivers/cpufreq/vexpress-spc-cpufreq.c
@@ -496,7 +496,8 @@  static struct cpufreq_driver ve_spc_cpufreq_driver = {
 	.name			= "vexpress-spc",
 	.flags			= CPUFREQ_STICKY |
 					CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
-					CPUFREQ_NEED_INITIAL_FREQ_CHECK,
+					CPUFREQ_NEED_INITIAL_FREQ_CHECK |
+					CPUFREQ_CUSTOM_SET_FREQ_SCALE,
 	.verify			= cpufreq_generic_frequency_table_verify,
 	.target_index		= ve_spc_cpufreq_set_target,
 	.get			= ve_spc_cpufreq_get_rate,
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 3494f6763597..42668588f9f8 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -293,7 +293,7 @@  __ATTR(_name, 0644, show_##_name, store_##_name)
 
 struct cpufreq_driver {
 	char		name[CPUFREQ_NAME_LEN];
-	u8		flags;
+	u16		flags;
 	void		*driver_data;
 
 	/* needed by all drivers */
@@ -417,6 +417,14 @@  struct cpufreq_driver {
  */
 #define CPUFREQ_IS_COOLING_DEV			BIT(7)
 
+/*
+ * Set by drivers which implement the necessary calls to the scheduler's
+ * frequency invariance engine. The use of this flag will result in the
+ * default arch_set_freq_scale calls being skipped in favour of custom
+ * driver calls.
+ */
+#define CPUFREQ_CUSTOM_SET_FREQ_SCALE		BIT(8)
+
 int cpufreq_register_driver(struct cpufreq_driver *driver_data);
 int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);