mbox series

[v1,0/3] Add required-opps support to devfreq passive gov

Message ID 20190622003449.33707-1-saravanak@google.com (mailing list archive)
Headers show
Series Add required-opps support to devfreq passive gov | expand

Message

Saravana Kannan June 22, 2019, 12:34 a.m. UTC
The devfreq passive governor scales the frequency of a "child" device
based on the current frequency of a "parent" device (not parent/child in
the sense of device hierarchy). As of today, the passive governor
requires one of the following to work correctly:
1. The parent and child device have the same number of frequencies
2. The child device driver passes a mapping function to translate from
   parent frequency to child frequency.

When (1) is not true, (2) is the only option right now. But often times,
all that is required is a simple mapping from parent's frequency to
child's frequency.

Since OPPs already support pointing to other "required-opps", add
support for using that to map from parent device frequency to child
device frequency. That way, every child device driver doesn't have to
implement a separate mapping function anytime (1) isn't true.

-Saravana

Saravana Kannan (3):
  OPP: Allow required-opps even if the device doesn't have power-domains
  OPP: Add function to look up required OPP's for a given OPP
  PM / devfreq: Add required OPPs support to passive governor

 drivers/devfreq/governor_passive.c | 25 +++++++++++--
 drivers/opp/core.c                 | 56 +++++++++++++++++++++++++++++-
 drivers/opp/of.c                   | 14 --------
 include/linux/pm_opp.h             | 11 ++++++
 4 files changed, 89 insertions(+), 17 deletions(-)

Comments

Viresh Kumar June 24, 2019, 9:43 a.m. UTC | #1
On 21-06-19, 17:34, Saravana Kannan wrote:
> The devfreq passive governor scales the frequency of a "child" device
> based on the current frequency of a "parent" device (not parent/child in
> the sense of device hierarchy). As of today, the passive governor
> requires one of the following to work correctly:
> 1. The parent and child device have the same number of frequencies
> 2. The child device driver passes a mapping function to translate from
>    parent frequency to child frequency.
> 
> When (1) is not true, (2) is the only option right now. But often times,
> all that is required is a simple mapping from parent's frequency to
> child's frequency.
> 
> Since OPPs already support pointing to other "required-opps", add
> support for using that to map from parent device frequency to child
> device frequency. That way, every child device driver doesn't have to
> implement a separate mapping function anytime (1) isn't true.

Can you please provide a real world example with DT code here so I
can understand it better ?
Saravana Kannan June 24, 2019, 10:17 p.m. UTC | #2
On Mon, Jun 24, 2019 at 2:43 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 21-06-19, 17:34, Saravana Kannan wrote:
> > The devfreq passive governor scales the frequency of a "child" device
> > based on the current frequency of a "parent" device (not parent/child in
> > the sense of device hierarchy). As of today, the passive governor
> > requires one of the following to work correctly:
> > 1. The parent and child device have the same number of frequencies
> > 2. The child device driver passes a mapping function to translate from
> >    parent frequency to child frequency.
> >
> > When (1) is not true, (2) is the only option right now. But often times,
> > all that is required is a simple mapping from parent's frequency to
> > child's frequency.
> >
> > Since OPPs already support pointing to other "required-opps", add
> > support for using that to map from parent device frequency to child
> > device frequency. That way, every child device driver doesn't have to
> > implement a separate mapping function anytime (1) isn't true.
>
> Can you please provide a real world example with DT code here so I
> can understand it better ?
>

Here's an example. This can't be done today, but can be done with this change.

In arch/arm64/boot/dts/exynos/exynos5433-bus.dtsi you have something
like this with the following changes:

        bus_g2d_400: bus0 {
                compatible = "samsung,exynos-bus";
                clocks = <&cmu_top CLK_ACLK_G2D_400>;
                clock-names = "bus";
                operating-points-v2 = <&bus_g2d_400_opp_table>;
                status = "disabled";
        };

        bus_noc2: bus9 {
                compatible = "samsung,exynos-bus";
                clocks = <&cmu_mif CLK_ACLK_BUS2_400>;
                clock-names = "bus";
                operating-points-v2 = <&bus_noc2_opp_table>;
                status = "disabled";
        };

        bus_g2d_400_opp_table: opp_table2 {
                compatible = "operating-points-v2";
                opp-shared;

                opp-400000000 {
                        opp-hz = /bits/ 64 <400000000>;
                        opp-microvolt = <1075000>;
+                       required-opps = <&noc2_400>;
                };
                opp-267000000 {
                        opp-hz = /bits/ 64 <267000000>;
                        opp-microvolt = <1000000>;
+                       required-opps = <&noc2_200>;
                };
                opp-200000000 {
                        opp-hz = /bits/ 64 <200000000>;
                        opp-microvolt = <975000>;
+                       required-opps = <&noc2_200>;
                };
                opp-160000000 {
                        opp-hz = /bits/ 64 <160000000>;
                        opp-microvolt = <962500>;
+                       required-opps = <&noc2_134>;
                };
                opp-134000000 {
                        opp-hz = /bits/ 64 <134000000>;
                        opp-microvolt = <950000>;
+                       required-opps = <&noc2_134>;
                };
                opp-100000000 {
                        opp-hz = /bits/ 64 <100000000>;
                        opp-microvolt = <937500>;
+                       required-opps = <&noc2_100>;
                };
        };

        bus_noc2_opp_table: opp_table6 {
                compatible = "operating-points-v2";

-               opp-400000000 {
+               noc2_400: opp-400000000 {
                        opp-hz = /bits/ 64 <400000000>;
                };
-               opp-200000000 {
+               noc2_200: opp-200000000 {
                        opp-hz = /bits/ 64 <200000000>;
                };
-               opp-134000000 {
+               noc2_134: opp-134000000 {
                        opp-hz = /bits/ 64 <134000000>;
                };
-               opp-100000000 {
+               noc2_100: opp-100000000 {
                        opp-hz = /bits/ 64 <100000000>;
                };
        };

Thanks,
Saravana
Viresh Kumar June 25, 2019, 4:10 a.m. UTC | #3
On 24-06-19, 15:17, Saravana Kannan wrote:
> Here's an example. This can't be done today, but can be done with this change.
> 
> In arch/arm64/boot/dts/exynos/exynos5433-bus.dtsi you have something
> like this with the following changes:
> 
>         bus_g2d_400: bus0 {
>                 compatible = "samsung,exynos-bus";
>                 clocks = <&cmu_top CLK_ACLK_G2D_400>;
>                 clock-names = "bus";
>                 operating-points-v2 = <&bus_g2d_400_opp_table>;
>                 status = "disabled";
>         };
> 
>         bus_noc2: bus9 {
>                 compatible = "samsung,exynos-bus";
>                 clocks = <&cmu_mif CLK_ACLK_BUS2_400>;
>                 clock-names = "bus";
>                 operating-points-v2 = <&bus_noc2_opp_table>;
>                 status = "disabled";
>         };

And what is the relation between these two busses ?

>         bus_g2d_400_opp_table: opp_table2 {
>                 compatible = "operating-points-v2";
>                 opp-shared;
> 
>                 opp-400000000 {
>                         opp-hz = /bits/ 64 <400000000>;
>                         opp-microvolt = <1075000>;
> +                       required-opps = <&noc2_400>;
>                 };
>                 opp-267000000 {
>                         opp-hz = /bits/ 64 <267000000>;
>                         opp-microvolt = <1000000>;
> +                       required-opps = <&noc2_200>;
>                 };
>                 opp-200000000 {
>                         opp-hz = /bits/ 64 <200000000>;
>                         opp-microvolt = <975000>;
> +                       required-opps = <&noc2_200>;
>                 };
>                 opp-160000000 {
>                         opp-hz = /bits/ 64 <160000000>;
>                         opp-microvolt = <962500>;
> +                       required-opps = <&noc2_134>;
>                 };
>                 opp-134000000 {
>                         opp-hz = /bits/ 64 <134000000>;
>                         opp-microvolt = <950000>;
> +                       required-opps = <&noc2_134>;
>                 };
>                 opp-100000000 {
>                         opp-hz = /bits/ 64 <100000000>;
>                         opp-microvolt = <937500>;
> +                       required-opps = <&noc2_100>;
>                 };
>         };
> 
>         bus_noc2_opp_table: opp_table6 {
>                 compatible = "operating-points-v2";
> 
> -               opp-400000000 {
> +               noc2_400: opp-400000000 {
>                         opp-hz = /bits/ 64 <400000000>;
>                 };
> -               opp-200000000 {
> +               noc2_200: opp-200000000 {
>                         opp-hz = /bits/ 64 <200000000>;
>                 };
> -               opp-134000000 {
> +               noc2_134: opp-134000000 {
>                         opp-hz = /bits/ 64 <134000000>;
>                 };
> -               opp-100000000 {
> +               noc2_100: opp-100000000 {
>                         opp-hz = /bits/ 64 <100000000>;
>                 };
>         };
> 
> Thanks,
> Saravana
Saravana Kannan June 25, 2019, 5 a.m. UTC | #4
On Mon, Jun 24, 2019 at 9:11 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 24-06-19, 15:17, Saravana Kannan wrote:
> > Here's an example. This can't be done today, but can be done with this change.
> >
> > In arch/arm64/boot/dts/exynos/exynos5433-bus.dtsi you have something
> > like this with the following changes:
> >
> >         bus_g2d_400: bus0 {
> >                 compatible = "samsung,exynos-bus";
> >                 clocks = <&cmu_top CLK_ACLK_G2D_400>;
> >                 clock-names = "bus";
> >                 operating-points-v2 = <&bus_g2d_400_opp_table>;
> >                 status = "disabled";
> >         };
> >
> >         bus_noc2: bus9 {
> >                 compatible = "samsung,exynos-bus";
> >                 clocks = <&cmu_mif CLK_ACLK_BUS2_400>;
> >                 clock-names = "bus";
> >                 operating-points-v2 = <&bus_noc2_opp_table>;
> >                 status = "disabled";
> >         };
>
> And what is the relation between these two busses ?

I can't speak for the Exynos hardware. Maybe Chanwoo knows.

But a couple of common reasons to do this between devices are:
1. These were the combination of frequencies that were
validated/screen during the manufacturing process.
2. These are the sensible performance combinations between two devices
interacting with each other. So that when one runs fast the other
doesn't become the bottleneck.
3. Hardware bugs requiring some kind of frequency ratio between devices.

All of the cases above are some real world scenarios I've come across.
CPU and L2/L3 on ARM systems are a good example of (2) but the passive
governor doesn't work with CPUs yet. But I plan to work on that later
as that's not related to this patch series.

-Saravana

> >         bus_g2d_400_opp_table: opp_table2 {
> >                 compatible = "operating-points-v2";
> >                 opp-shared;
> >
> >                 opp-400000000 {
> >                         opp-hz = /bits/ 64 <400000000>;
> >                         opp-microvolt = <1075000>;
> > +                       required-opps = <&noc2_400>;
> >                 };
> >                 opp-267000000 {
> >                         opp-hz = /bits/ 64 <267000000>;
> >                         opp-microvolt = <1000000>;
> > +                       required-opps = <&noc2_200>;
> >                 };
> >                 opp-200000000 {
> >                         opp-hz = /bits/ 64 <200000000>;
> >                         opp-microvolt = <975000>;
> > +                       required-opps = <&noc2_200>;
> >                 };
> >                 opp-160000000 {
> >                         opp-hz = /bits/ 64 <160000000>;
> >                         opp-microvolt = <962500>;
> > +                       required-opps = <&noc2_134>;
> >                 };
> >                 opp-134000000 {
> >                         opp-hz = /bits/ 64 <134000000>;
> >                         opp-microvolt = <950000>;
> > +                       required-opps = <&noc2_134>;
> >                 };
> >                 opp-100000000 {
> >                         opp-hz = /bits/ 64 <100000000>;
> >                         opp-microvolt = <937500>;
> > +                       required-opps = <&noc2_100>;
> >                 };
> >         };
> >
> >         bus_noc2_opp_table: opp_table6 {
> >                 compatible = "operating-points-v2";
> >
> > -               opp-400000000 {
> > +               noc2_400: opp-400000000 {
> >                         opp-hz = /bits/ 64 <400000000>;
> >                 };
> > -               opp-200000000 {
> > +               noc2_200: opp-200000000 {
> >                         opp-hz = /bits/ 64 <200000000>;
> >                 };
> > -               opp-134000000 {
> > +               noc2_134: opp-134000000 {
> >                         opp-hz = /bits/ 64 <134000000>;
> >                 };
> > -               opp-100000000 {
> > +               noc2_100: opp-100000000 {
> >                         opp-hz = /bits/ 64 <100000000>;
> >                 };
> >         };
> >
> > Thanks,
> > Saravana
>
> --
> viresh
Viresh Kumar June 25, 2019, 5:22 a.m. UTC | #5
On 24-06-19, 22:00, Saravana Kannan wrote:
> All of the cases above are some real world scenarios I've come across.
> CPU and L2/L3 on ARM systems are a good example of (2) but the passive
> governor doesn't work with CPUs yet. But I plan to work on that later
> as that's not related to this patch series.

So in case of CPUs, the cache will be the parent device and CPU be the
children ? And CPUs nodes will contain the required-opps property ?
Saravana Kannan June 25, 2019, 5:29 a.m. UTC | #6
On Mon, Jun 24, 2019 at 10:22 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 24-06-19, 22:00, Saravana Kannan wrote:
> > All of the cases above are some real world scenarios I've come across.
> > CPU and L2/L3 on ARM systems are a good example of (2) but the passive
> > governor doesn't work with CPUs yet. But I plan to work on that later
> > as that's not related to this patch series.
>
> So in case of CPUs, the cache will be the parent device and CPU be the
> children ? And CPUs nodes will contain the required-opps property ?

No, the CPUs will be the "parent" and the cache will be the "child".
CPU is a special case when it comes to the actual software (not DT) as
we'll need the devfreq governor to look at all the CPUfreq policies to
decide the cache frequency (max of all their requirements).

I think "master" and "slave" would have been a better term as the
master device determines its frequency using whatever means and the
"slave" device just "follows" the master device.

-Saravana
Viresh Kumar June 26, 2019, 6:32 a.m. UTC | #7
On 24-06-19, 22:29, Saravana Kannan wrote:
> No, the CPUs will be the "parent" and the cache will be the "child".
> CPU is a special case when it comes to the actual software (not DT) as
> we'll need the devfreq governor to look at all the CPUfreq policies to
> decide the cache frequency (max of all their requirements).
> 
> I think "master" and "slave" would have been a better term as the
> master device determines its frequency using whatever means and the
> "slave" device just "follows" the master device.

Okay, so to confirm again this is what we will have:

- CPUs are called masters and Caches are slaves.

- The devfreq governor we are talking about takes care of changing
  frequency of caches (slaves) and chooses a target frequency for
  caches based on what the masters are running at.

- The CPUs OPP nodes will have required-opps property and will be
  pointing to the caches OPP nodes. The CPUs may already be using
  required-opps node for PM domain performance state thing.


Now the problem is "required-opp" means something really *required*
and it is not optional. Like a specific voltage level before we can
switch to a particular frequency. And this is how I have though of it
until now. And this shouldn't be handled asynchronously, i.e. required
OPPs must be set while configuring OPP of a device.

So, when a CPU changes frequency, we must change the performance state
of PM domain and change frequency/bw of the cache synchronously. And
in such a case "required-opps" can be a very good fit for this use
case.

But with what you are trying to do it is no longer required-opp but
good-to-have-opp. And that worries me.
Saravana Kannan June 26, 2019, 6:10 p.m. UTC | #8
On Tue, Jun 25, 2019 at 11:32 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 24-06-19, 22:29, Saravana Kannan wrote:
> > No, the CPUs will be the "parent" and the cache will be the "child".
> > CPU is a special case when it comes to the actual software (not DT) as
> > we'll need the devfreq governor to look at all the CPUfreq policies to
> > decide the cache frequency (max of all their requirements).
> >
> > I think "master" and "slave" would have been a better term as the
> > master device determines its frequency using whatever means and the
> > "slave" device just "follows" the master device.
>
> Okay, so to confirm again this is what we will have:
>
> - CPUs are called masters and Caches are slaves.
>
> - The devfreq governor we are talking about takes care of changing
>   frequency of caches (slaves) and chooses a target frequency for
>   caches based on what the masters are running at.
>
> - The CPUs OPP nodes will have required-opps property and will be
>   pointing to the caches OPP nodes. The CPUs may already be using
>   required-opps node for PM domain performance state thing.
>
>
> Now the problem is "required-opp" means something really *required*
> and it is not optional.

But we could interpret it as "required" for different things.

> Like a specific voltage level before we can
> switch to a particular frequency.

Required for stability.

> And this is how I have though of it
> until now. And this shouldn't be handled asynchronously, i.e. required
> OPPs must be set while configuring OPP of a device.

The users of clocks are expected to set up the voltage correctly
before changing the frequency, the drivers are expected to power up
the device before trying to access its registers, etc. So I don't
think there is one correct way. Also OPP sets the pstate only if
dev_pm_opp_set_genpd_virt_dev() has been called to set them up. So
this is not a mandatory principle in the kernel that a framework
providing an API should have all the dependencies. So I don't think
we'll be violating some golden rule.

> So, when a CPU changes frequency, we must change the performance state
> of PM domain and change frequency/bw of the cache synchronously.

I mean, it's going to be changed when we get the CPUfreq transition
notifiers. From a correctness point of view, setting it inside the OPP
framework is not any better than doing it when we get the notifiers.

> And
> in such a case "required-opps" can be a very good fit for this use
> case.

Glad you agree :)

> But with what you are trying to do it is no longer required-opp but
> good-to-have-opp. And that worries me.

I see this as "required for good performance". So I don't see it as
redefining required-opps. If someone wants good performance/power
balance they follow the "required-opps". Technically even the PM
pstates are required for good power. Otherwise, the system could leave
the voltage at max and stuff would still work.

Also, the slave device might need to get input from multiple master
devices and aggregate the request before setting the slave device
frequency. So I don't think OPP  framework would be the right place to
deal with those things. For example, L3 might (will) have different
mappings for big vs little cores. So that needs to be aggregated and
set properly by the slave device driver. Also, GPU might have a
mapping for L3 too. In which case the L3 slave driver needs to take
input from even more masters before it decides its frequency. But most
importantly, we still need the ability to change governors for L3.
Again these are just examples with L3 and it can get more complicated
based on the situation.

Most importantly, instead of always going by mapping, one might decide
to scale the L3 based on some other governor (that looks at some HW
counter). Or just set it to performance governor for a use case for
which performance is more important. All of this comes for free with
devfreq and if we always set it from OPP framework we don't give this
required control to userspace.

I think going through devfreq is the right approach for this. And we
can always rewrite the software if we find problems in the future. But
as it stands today, this will help cases like exynos without the need
for a lot of changes. Hope I've convinced you.

-Saravana
Viresh Kumar June 28, 2019, 6:49 a.m. UTC | #9
On 26-06-19, 11:10, Saravana Kannan wrote:
> On Tue, Jun 25, 2019 at 11:32 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:

> > So, when a CPU changes frequency, we must change the performance state
> > of PM domain and change frequency/bw of the cache synchronously.
> 
> I mean, it's going to be changed when we get the CPUfreq transition
> notifiers. From a correctness point of view, setting it inside the OPP
> framework is not any better than doing it when we get the notifiers.

That's what the problem is. All maintainers now a days ask people to
stay away from notifiers and we are making that base of another new
thing we are starting.

Over that, with many cpufreq drivers we have fast switching enabled
and notifiers disabled. How will they make these things work ? We
still want to scale L3 in those cases as well.

> I see this as "required for good performance". So I don't see it as
> redefining required-opps. If someone wants good performance/power
> balance they follow the "required-opps". Technically even the PM
> pstates are required for good power. Otherwise, the system could leave
> the voltage at max and stuff would still work.
> 
> Also, the slave device might need to get input from multiple master
> devices and aggregate the request before setting the slave device
> frequency. So I don't think OPP  framework would be the right place to
> deal with those things. For example, L3 might (will) have different
> mappings for big vs little cores. So that needs to be aggregated and
> set properly by the slave device driver. Also, GPU might have a
> mapping for L3 too. In which case the L3 slave driver needs to take
> input from even more masters before it decides its frequency. But most
> importantly, we still need the ability to change governors for L3.
> Again these are just examples with L3 and it can get more complicated
> based on the situation.
> 
> Most importantly, instead of always going by mapping, one might decide
> to scale the L3 based on some other governor (that looks at some HW
> counter). Or just set it to performance governor for a use case for
> which performance is more important. All of this comes for free with
> devfreq and if we always set it from OPP framework we don't give this
> required control to userspace.
> 
> I think going through devfreq is the right approach for this. And we
> can always rewrite the software if we find problems in the future. But
> as it stands today, this will help cases like exynos without the need
> for a lot of changes. Hope I've convinced you.

I understand the aggregation thing and fully support that the
aggregation can't happen in OPP core and must be done somewhere else.
But the input can go from OPP core while the frequency is changing,
isn't it ?
Saravana Kannan June 28, 2019, 8:26 p.m. UTC | #10
On Thu, Jun 27, 2019 at 11:49 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 26-06-19, 11:10, Saravana Kannan wrote:
> > On Tue, Jun 25, 2019 at 11:32 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> > > So, when a CPU changes frequency, we must change the performance state
> > > of PM domain and change frequency/bw of the cache synchronously.
> >
> > I mean, it's going to be changed when we get the CPUfreq transition
> > notifiers. From a correctness point of view, setting it inside the OPP
> > framework is not any better than doing it when we get the notifiers.
>
> That's what the problem is. All maintainers now a days ask people to
> stay away from notifiers and we are making that base of another new
> thing we are starting.

In that case we can just add direct calls in cpufreq.c to let devfreq
know about the frequency changes. But then again, CPU is just one
example for this use case. I'm just using that because people are more
familiar with that.

> Over that, with many cpufreq drivers we have fast switching enabled
> and notifiers disabled. How will they make these things work ? We
> still want to scale L3 in those cases as well.

Nothing is preventing them from using the xlate OPP API I added to
figure out all the CPU to L3 frequency mapping and then set the L3
frequency directly from the CPUfreq driver.

Also, whether we use OPP framework or devfreq to set the L3 frequency,
it's going to block fast switching because both these frameworks have
APIs that can sleep.

But really, most mobile use cases don't want to permanently tie L3
freq to CPU freq. Having it go through devfreq and being able to
switch governors is a very important need for mobile products.

Keep in mind that nothing in this series does any of the cpufreq stuff
yet. That'll need a few more changes. I was just using CPUfreq as an
example.

> > I see this as "required for good performance". So I don't see it as
> > redefining required-opps. If someone wants good performance/power
> > balance they follow the "required-opps". Technically even the PM
> > pstates are required for good power. Otherwise, the system could leave
> > the voltage at max and stuff would still work.
> >
> > Also, the slave device might need to get input from multiple master
> > devices and aggregate the request before setting the slave device
> > frequency. So I don't think OPP  framework would be the right place to
> > deal with those things. For example, L3 might (will) have different
> > mappings for big vs little cores. So that needs to be aggregated and
> > set properly by the slave device driver. Also, GPU might have a
> > mapping for L3 too. In which case the L3 slave driver needs to take
> > input from even more masters before it decides its frequency. But most
> > importantly, we still need the ability to change governors for L3.
> > Again these are just examples with L3 and it can get more complicated
> > based on the situation.
> >
> > Most importantly, instead of always going by mapping, one might decide
> > to scale the L3 based on some other governor (that looks at some HW
> > counter). Or just set it to performance governor for a use case for
> > which performance is more important. All of this comes for free with
> > devfreq and if we always set it from OPP framework we don't give this
> > required control to userspace.
> >
> > I think going through devfreq is the right approach for this. And we
> > can always rewrite the software if we find problems in the future. But
> > as it stands today, this will help cases like exynos without the need
> > for a lot of changes. Hope I've convinced you.
>
> I understand the aggregation thing and fully support that the
> aggregation can't happen in OPP core and must be done somewhere else.
> But the input can go from OPP core while the frequency is changing,
> isn't it ?

I'm not opposed to OPP sending input to devfreq to let it know that a
master device frequency change is happening. But I think this is kinda
orthogonal to this patch series.

Today the passive governor looks at the master device's devfreq
frequency changes to trigger the frequency change of the slave
devfreq. It neither supports tracking OPP frequency change nor CPUfreq
frequency change. If that's something we want to add, we can look into
that separately as passive governor (or a new governor) changes.

But then not all devices (CPUfreq or otherwise) use OPP to set the
frequencies. So it's beneficial to have all of these frameworks as
inputs for devfreq passive (like) governor. CPUfreq is actually a bit
more tricky because we'll also have to track hotplug, etc. So direct
calls from CPUfreq to devfreq (similar to cpufreq stats tracking)
would be good.

-Saravana
Saravana Kannan July 11, 2019, 11:16 p.m. UTC | #11
Bump. I saw your email in Sibi's patch series. His patch series is
just one use case/user of this feature. This patch series is useful in
general. Do plan to Ack this? Or thoughts on my earlier response?

Thanks,
Saravana

On Fri, Jun 28, 2019 at 1:26 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Thu, Jun 27, 2019 at 11:49 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 26-06-19, 11:10, Saravana Kannan wrote:
> > > On Tue, Jun 25, 2019 at 11:32 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > > > So, when a CPU changes frequency, we must change the performance state
> > > > of PM domain and change frequency/bw of the cache synchronously.
> > >
> > > I mean, it's going to be changed when we get the CPUfreq transition
> > > notifiers. From a correctness point of view, setting it inside the OPP
> > > framework is not any better than doing it when we get the notifiers.
> >
> > That's what the problem is. All maintainers now a days ask people to
> > stay away from notifiers and we are making that base of another new
> > thing we are starting.
>
> In that case we can just add direct calls in cpufreq.c to let devfreq
> know about the frequency changes. But then again, CPU is just one
> example for this use case. I'm just using that because people are more
> familiar with that.
>
> > Over that, with many cpufreq drivers we have fast switching enabled
> > and notifiers disabled. How will they make these things work ? We
> > still want to scale L3 in those cases as well.
>
> Nothing is preventing them from using the xlate OPP API I added to
> figure out all the CPU to L3 frequency mapping and then set the L3
> frequency directly from the CPUfreq driver.
>
> Also, whether we use OPP framework or devfreq to set the L3 frequency,
> it's going to block fast switching because both these frameworks have
> APIs that can sleep.
>
> But really, most mobile use cases don't want to permanently tie L3
> freq to CPU freq. Having it go through devfreq and being able to
> switch governors is a very important need for mobile products.
>
> Keep in mind that nothing in this series does any of the cpufreq stuff
> yet. That'll need a few more changes. I was just using CPUfreq as an
> example.
>
> > > I see this as "required for good performance". So I don't see it as
> > > redefining required-opps. If someone wants good performance/power
> > > balance they follow the "required-opps". Technically even the PM
> > > pstates are required for good power. Otherwise, the system could leave
> > > the voltage at max and stuff would still work.
> > >
> > > Also, the slave device might need to get input from multiple master
> > > devices and aggregate the request before setting the slave device
> > > frequency. So I don't think OPP  framework would be the right place to
> > > deal with those things. For example, L3 might (will) have different
> > > mappings for big vs little cores. So that needs to be aggregated and
> > > set properly by the slave device driver. Also, GPU might have a
> > > mapping for L3 too. In which case the L3 slave driver needs to take
> > > input from even more masters before it decides its frequency. But most
> > > importantly, we still need the ability to change governors for L3.
> > > Again these are just examples with L3 and it can get more complicated
> > > based on the situation.
> > >
> > > Most importantly, instead of always going by mapping, one might decide
> > > to scale the L3 based on some other governor (that looks at some HW
> > > counter). Or just set it to performance governor for a use case for
> > > which performance is more important. All of this comes for free with
> > > devfreq and if we always set it from OPP framework we don't give this
> > > required control to userspace.
> > >
> > > I think going through devfreq is the right approach for this. And we
> > > can always rewrite the software if we find problems in the future. But
> > > as it stands today, this will help cases like exynos without the need
> > > for a lot of changes. Hope I've convinced you.
> >
> > I understand the aggregation thing and fully support that the
> > aggregation can't happen in OPP core and must be done somewhere else.
> > But the input can go from OPP core while the frequency is changing,
> > isn't it ?
>
> I'm not opposed to OPP sending input to devfreq to let it know that a
> master device frequency change is happening. But I think this is kinda
> orthogonal to this patch series.
>
> Today the passive governor looks at the master device's devfreq
> frequency changes to trigger the frequency change of the slave
> devfreq. It neither supports tracking OPP frequency change nor CPUfreq
> frequency change. If that's something we want to add, we can look into
> that separately as passive governor (or a new governor) changes.
>
> But then not all devices (CPUfreq or otherwise) use OPP to set the
> frequencies. So it's beneficial to have all of these frameworks as
> inputs for devfreq passive (like) governor. CPUfreq is actually a bit
> more tricky because we'll also have to track hotplug, etc. So direct
> calls from CPUfreq to devfreq (similar to cpufreq stats tracking)
> would be good.
>
> -Saravana