[RFC,v2,01/11] OPP: Don't overwrite rounded clk rate

Message ID	20190320094918.20234-2-rnayak@codeaurora.org (mailing list archive)
State	RFC, archived
Headers	show Return-Path: <linux-pm-owner@kernel.org> sender: rnayak@smtp.codeaurora.org) by smtp.codeaurora.org (Postfix) with ESMTPSA id 691C06072E; Wed, 20 Mar 2019 09:49:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 691C06072E From: Rajendra Nayak <rnayak@codeaurora.org> To: linux-kernel@vger.kernel.org Cc: linux-arm-msm@vger.kernel.org, linux-pm@vger.kernel.org, linux-serial@vger.kernel.org, linux-spi@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-scsi@vger.kernel.org, swboyd@chromium.org, ulf.hansson@linaro.org, viresh.kumar@linaro.org, dianders@chromium.org, rafael@kernel.org, Rajendra Nayak <rnayak@codeaurora.org> Subject: [RFC v2 01/11] OPP: Don't overwrite rounded clk rate Date: Wed, 20 Mar 2019 15:19:08 +0530 Message-Id: <20190320094918.20234-2-rnayak@codeaurora.org> In-Reply-To: <20190320094918.20234-1-rnayak@codeaurora.org> References: <20190320094918.20234-1-rnayak@codeaurora.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-pm-owner@vger.kernel.org Precedence: bulk
Series	DVFS in the OPP core \| expand [RFC,v2,00/11] DVFS in the OPP core [RFC,v2,01/11] OPP: Don't overwrite rounded clk rate [RFC,v2,02/11] OPP: Make dev_pm_opp_set_rate() with freq=0 as valid [RFC,v2,03/11] tty: serial: qcom_geni_serial: Use OPP API to set clk/perf state [RFC,v2,04/11] spi: spi-geni-qcom: Use OPP API to set clk/perf state [RFC,v2,05/11] arm64: dts: sdm845: Add OPP table for all qup devices [RFC,v2,06/11] scsi: ufs: Add support to manage multiple power domains in ufshcd-pltfrm [RFC,v2,07/11] scsi: ufs: Add support for specifying OPP tables in DT [RFC,v2,08/11] arm64: dts: sdm845: Add ufs opps and power-domains [RFC,v2,09/11] drm/msm/dpu: Use OPP API to set clk/perf state [RFC,v2,10/11] drm/msm: dsi: Use OPP API to set clk/perf state [RFC,v2,11/11] arm64: dts: sdm845: Add DSI and MDP OPP tables and power-domains

Rajendra Nayak March 20, 2019, 9:49 a.m. UTC

From: Stephen Boyd <swboyd@chromium.org>

Doing this allows us to call this API with any rate requested and have
it not need to match in the OPP table. Instead, we'll round the rate up
to the nearest OPP that we see so that we can get the voltage or level
that's required for that OPP. This supports users of OPP that want to
specify the 'fmax' tables of a device instead of every single frequency
that they need. And for devices that required the exact frequency, we
can rely on the clk framework to round the rate to the nearest supported
frequency instead of the OPP framework to do so.

Note that this may affect drivers that don't want the clk framework to
do rounding, but instead want the OPP table to do the rounding for them.
Do we have that case? Should we add some flag to the OPP table to
indicate this and then not have that flag set when there isn't an OPP
table for the device and also introduce a property like 'opp-use-clk' to
tell the table that it should use the clk APIs to round rates instead of
OPP?

Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
---
 drivers/opp/core.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Viresh Kumar June 11, 2019, 10:54 a.m. UTC | #1

On 20-03-19, 15:19, Rajendra Nayak wrote:
> From: Stephen Boyd <swboyd@chromium.org>
> 
> Doing this allows us to call this API with any rate requested and have
> it not need to match in the OPP table. Instead, we'll round the rate up
> to the nearest OPP that we see so that we can get the voltage or level
> that's required for that OPP. This supports users of OPP that want to
> specify the 'fmax' tables of a device instead of every single frequency
> that they need. And for devices that required the exact frequency, we
> can rely on the clk framework to round the rate to the nearest supported
> frequency instead of the OPP framework to do so.
> 
> Note that this may affect drivers that don't want the clk framework to
> do rounding, but instead want the OPP table to do the rounding for them.
> Do we have that case? Should we add some flag to the OPP table to
> indicate this and then not have that flag set when there isn't an OPP
> table for the device and also introduce a property like 'opp-use-clk' to
> tell the table that it should use the clk APIs to round rates instead of
> OPP?
> 
> Signed-off-by: Stephen Boyd <swboyd@chromium.org>
> Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
> ---
>  drivers/opp/core.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 0420f7e8ad5b..bc9a7762dd4c 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -703,7 +703,7 @@ static int _set_required_opps(struct device *dev,
>  int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
>  {
>  	struct opp_table *opp_table;
> -	unsigned long freq, old_freq;
> +	unsigned long freq, opp_freq, old_freq, old_opp_freq;
>  	struct dev_pm_opp *old_opp, *opp;
>  	struct clk *clk;
>  	int ret;
> @@ -742,13 +742,15 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
>  		goto put_opp_table;
>  	}
>  
> -	old_opp = _find_freq_ceil(opp_table, &old_freq);
> +	old_opp_freq = old_freq;
> +	old_opp = _find_freq_ceil(opp_table, &old_opp_freq);
>  	if (IS_ERR(old_opp)) {
>  		dev_err(dev, "%s: failed to find current OPP for freq %lu (%ld)\n",
>  			__func__, old_freq, PTR_ERR(old_opp));
>  	}
>  
> -	opp = _find_freq_ceil(opp_table, &freq);
> +	opp_freq = freq;
> +	opp = _find_freq_ceil(opp_table, &opp_freq);
>  	if (IS_ERR(opp)) {
>  		ret = PTR_ERR(opp);
>  		dev_err(dev, "%s: failed to find OPP for freq %lu (%d)\n",

I see a logical problem with this patch.

Suppose the clock driver supports following frequencies: 500M, 800M,
1G, 1.2G and the OPP table contains following list: 500M, 1G, 1.2G
(i.e. missing 800M).

Now 800M should never get programmed as it isn't part of the OPP
table. But if you pass 600M to opp-set-rate, then it will end up
selecting 800M as clock driver will round up to the closest value.

Even if no one is doing this right now, it is a sensible usecase,
specially during testing of patches and I don't think we should avoid
it.

What exactly is the use case for which we need this patch ? What kind
of driver ? Some detail can be helpful to find another solution that
fixes this problem.

Rajendra Nayak June 12, 2019, 7:42 a.m. UTC | #2

On 6/11/2019 4:24 PM, Viresh Kumar wrote:
> On 20-03-19, 15:19, Rajendra Nayak wrote:
>> From: Stephen Boyd <swboyd@chromium.org>
>>
>> Doing this allows us to call this API with any rate requested and have
>> it not need to match in the OPP table. Instead, we'll round the rate up
>> to the nearest OPP that we see so that we can get the voltage or level
>> that's required for that OPP. This supports users of OPP that want to
>> specify the 'fmax' tables of a device instead of every single frequency
>> that they need. And for devices that required the exact frequency, we
>> can rely on the clk framework to round the rate to the nearest supported
>> frequency instead of the OPP framework to do so.
>>
>> Note that this may affect drivers that don't want the clk framework to
>> do rounding, but instead want the OPP table to do the rounding for them.
>> Do we have that case? Should we add some flag to the OPP table to
>> indicate this and then not have that flag set when there isn't an OPP
>> table for the device and also introduce a property like 'opp-use-clk' to
>> tell the table that it should use the clk APIs to round rates instead of
>> OPP?
>>
>> Signed-off-by: Stephen Boyd <swboyd@chromium.org>
>> Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
>> ---

[]...

> 
> I see a logical problem with this patch.
> 
> Suppose the clock driver supports following frequencies: 500M, 800M,
> 1G, 1.2G and the OPP table contains following list: 500M, 1G, 1.2G
> (i.e. missing 800M).
> 
> Now 800M should never get programmed as it isn't part of the OPP
> table. But if you pass 600M to opp-set-rate, then it will end up
> selecting 800M as clock driver will round up to the closest value.

correct

> 
> Even if no one is doing this right now, it is a sensible usecase,
> specially during testing of patches and I don't think we should avoid
> it.
> 
> What exactly is the use case for which we need this patch ? 
Like the changelog says 'This supports users of OPP that want to
specify the 'fmax' tables of a device instead of every single frequency
that they need'

so the 'fmax' tables basically say what the max frequency the device can
operate at for a given performance state/voltage level.

so in your example it would be for instance

500M, Perf state = 2
1G, Perf state = 3
1.2G, Perf state = 4

Now when the device wants to operate at say 800Mhz, you need to set the
Perf state to 3, so this patch basically avoids you having to put those additional
OPPs in the table which would otherwise look something like this

500M, Perf state = 2
800M, Perf state = 3 <-- redundant OPP
1G, Perf state = 3
1.2G, Perf state = 4

Your example had just 1 missing entry in the 'fmax' tables in reality its a lot more,
atleast on all qualcomm platforms.

Viresh Kumar June 12, 2019, 8:25 a.m. UTC | #3

On 12-06-19, 13:12, Rajendra Nayak wrote:
> so the 'fmax' tables basically say what the max frequency the device can
> operate at for a given performance state/voltage level.
> 
> so in your example it would be for instance
> 
> 500M, Perf state = 2
> 1G, Perf state = 3
> 1.2G, Perf state = 4
> 
> Now when the device wants to operate at say 800Mhz, you need to set the
> Perf state to 3, so this patch basically avoids you having to put those additional
> OPPs in the table which would otherwise look something like this
> 
> 500M, Perf state = 2
> 800M, Perf state = 3 <-- redundant OPP
> 1G, Perf state = 3
> 1.2G, Perf state = 4
> 
> Your example had just 1 missing entry in the 'fmax' tables in reality its a lot more,
> atleast on all qualcomm platforms.

Okay, I have applied this patch (alone) to the OPP tree with minor
modifications in commit log and diff.

Viresh Kumar June 13, 2019, 9:54 a.m. UTC | #4

On 12-06-19, 13:55, Viresh Kumar wrote:
> Okay, I have applied this patch (alone) to the OPP tree with minor
> modifications in commit log and diff.

And I have removed it now :)

I am confused as hell on what we should be doing and what we are doing
right now. And if we should do better.

Let me explain with an example.

- The clock provider supports following frequencies: 500, 600, 700,
  800, 900, 1000 MHz.

- The OPP table contains/supports only a subset: 500, 700, 1000 MHz.

Now, the request to change the frequency starts from cpufreq
governors, like schedutil when they calls:

__cpufreq_driver_target(policy, 599 MHz, CPUFREQ_RELATION_L);

CPUFREQ_RELATION_L means: lowest frequency at or above target. And so
I would expect the frequency to get set to 600MHz (if we look at clock
driver) or 700MHz (if we look at OPP table). I think we should decide
this thing from the OPP table only as that's what the platform guys
want us to use. So, we should end up with 700 MHz.

Then we land into dev_pm_opp_set_rate(), which does this (which is
code copied from earlier version of cpufreq-dt driver):

- clk_round_rate(clk, 599 MHz).

  clk_round_rate() returns the highest frequency lower than target. So
  it must return 500 MHz (I haven't tested this yet, all theoretical).

- _find_freq_ceil(opp_table, 500 MHz).

  This works like CPUFREQ_RELATION_L, so we find lowest frequency >=
  target freq. And so we should get: 500 MHz itself as OPP table has
  it.

- clk_set_rate(clk, 500 MHz).

  This must be doing round-rate again, but I think we will settle with
  500 MHz eventually.

Now the questionnaire:

- Is this whole exercise correct ?
- We shouldn't have landed on 500 MHz, right ?
- Is there anything wrong with the above theory (I am going to test it soon though).
- Why do we need to do the first clock_round_rate() ? Should we remove
  it ?

Now lets move to this patch, which makes it more confusing.

The OPP tables for CPUs and GPUs should already be somewhat like fmax
tables for particular voltage values and that's why both cpufreq and
OPP core try to find a frequency higher than target so we choose the
most optimum one power-efficiency wise.

For cases where the OPP table is only a subset of the clk-providers
table (almost always), if we let the clock provider to find the
nearest frequency (which is lower) we will run the CPU/GPU at a
not-so-optimal frequency. i.e. if 500, 600, 700 MHz all need voltage
to be 1.2 V, we should be running at 700 always, while we may end up
running at 500 MHz.

This kind of behavior (introduced by this patch) is important for
other devices which want to run at the nearest frequency to target
one, but not for CPUs/GPUs. So, we need to tag these IO devices
separately, maybe from DT ? So we select the closest match instead of
most optimal one.

But lets fix the existing issues first and then think about this
patch.

Viresh Kumar June 14, 2019, 5:27 a.m. UTC | #5

On 13-06-19, 15:24, Viresh Kumar wrote:
> I am confused as hell on what we should be doing and what we are doing
> right now. And if we should do better.
> 
> Let me explain with an example.
> 
> - The clock provider supports following frequencies: 500, 600, 700,
>   800, 900, 1000 MHz.
> 
> - The OPP table contains/supports only a subset: 500, 700, 1000 MHz.
> 
> Now, the request to change the frequency starts from cpufreq
> governors, like schedutil when they calls:
> 
> __cpufreq_driver_target(policy, 599 MHz, CPUFREQ_RELATION_L);
> 
> CPUFREQ_RELATION_L means: lowest frequency at or above target. And so
> I would expect the frequency to get set to 600MHz (if we look at clock
> driver) or 700MHz (if we look at OPP table). I think we should decide
> this thing from the OPP table only as that's what the platform guys
> want us to use. So, we should end up with 700 MHz.
> 
> Then we land into dev_pm_opp_set_rate(), which does this (which is
> code copied from earlier version of cpufreq-dt driver):
> 
> - clk_round_rate(clk, 599 MHz).
> 
>   clk_round_rate() returns the highest frequency lower than target. So
>   it must return 500 MHz (I haven't tested this yet, all theoretical).
> 
> - _find_freq_ceil(opp_table, 500 MHz).
> 
>   This works like CPUFREQ_RELATION_L, so we find lowest frequency >=
>   target freq. And so we should get: 500 MHz itself as OPP table has
>   it.
> 
> - clk_set_rate(clk, 500 MHz).
> 
>   This must be doing round-rate again, but I think we will settle with
>   500 MHz eventually.
> 
> 
> Now the questionnaire:
> 
> - Is this whole exercise correct ?

No, I missed the call to cpufreq_frequency_table_target() in
__cpufreq_driver_target() which finds the exact frequency from cpufreq table
(which was created using opp table) and so we never screw up here. Sorry for
confusing everyone on this :(

> Now lets move to this patch, which makes it more confusing.
> 
> The OPP tables for CPUs and GPUs should already be somewhat like fmax
> tables for particular voltage values and that's why both cpufreq and
> OPP core try to find a frequency higher than target so we choose the
> most optimum one power-efficiency wise.
> 
> For cases where the OPP table is only a subset of the clk-providers
> table (almost always), if we let the clock provider to find the
> nearest frequency (which is lower) we will run the CPU/GPU at a
> not-so-optimal frequency. i.e. if 500, 600, 700 MHz all need voltage
> to be 1.2 V, we should be running at 700 always, while we may end up
> running at 500 MHz.

This won't happen for CPUs because of the reason I explained earlier. cpufreq
core does the rounding before calling dev_pm_opp_set_rate(). And no other
devices use dev_pm_opp_set_rate() right now apart from CPUs, so we are not going
to break anything.

> This kind of behavior (introduced by this patch) is important for
> other devices which want to run at the nearest frequency to target
> one, but not for CPUs/GPUs. So, we need to tag these IO devices
> separately, maybe from DT ? So we select the closest match instead of
> most optimal one.

Hmm, so this patch won't break anything and I am inclined to apply it again :)

Does anyone see any other issues with it, which I might be missing ?

Rajendra Nayak June 14, 2019, 5:54 a.m. UTC | #6

> Now, the request to change the frequency starts from cpufreq
> governors, like schedutil when they calls:
> 
> __cpufreq_driver_target(policy, 599 MHz, CPUFREQ_RELATION_L);
> 
> CPUFREQ_RELATION_L means: lowest frequency at or above target. And so
> I would expect the frequency to get set to 600MHz (if we look at clock
> driver) or 700MHz (if we look at OPP table). I think we should decide
> this thing from the OPP table only as that's what the platform guys
> want us to use. So, we should end up with 700 MHz.
> 
> Then we land into dev_pm_opp_set_rate(), which does this (which is
> code copied from earlier version of cpufreq-dt driver):

so before we land into dev_pm_opp_set_rate() from a __cpufreq_driver_target()
I guess we do have a cpufreq driver callback that gets called in between?
which is either .target_index or .target

In case of .target_index, the cpufreq core looks for a OPP index
and we would land up with 700Mhz i guess, so we are good.

In case of .target though the 'relation' CPUFREQ_RELATION_L does get passed over
to the cpufreq driver which I am guessing is expected to handle it in some way to
make sure the target frequency set is not less than whats requested? instead of
simply passing the requested frequency over to dev_pm_opp_set_rate()?

Looking at all the existing cpufreq drivers upstream, while most support .target_index
the 3 which do support .target seem to completely ignore this 'relation' input that's
passed to them.

drivers/cpufreq/cppc_cpufreq.c:	.target = cppc_cpufreq_set_target,
drivers/cpufreq/cpufreq-nforce2.c:	.target = nforce2_target,
drivers/cpufreq/pcc-cpufreq.c:	.target = pcc_cpufreq_target,

> This kind of behavior (introduced by this patch) is important for
> other devices which want to run at the nearest frequency to target
> one, but not for CPUs/GPUs. So, we need to tag these IO devices
> separately, maybe from DT ? So we select the closest match instead of
> most optimal one.

yes we do need some way to distinguish between CPU/GPU devices and other
IO devices. CPU/GPU can always run at fmax for a given voltage, that's not true
for IO devices and I don't see how we can satisfy both cases without
clearly knowing if we are serving a processor or an IO device, unless the
higher layers (cpufreq/devfreq) are able to handle this somehow without
expecting the OPP layer to handle the differences.

Viresh Kumar June 17, 2019, 3:50 a.m. UTC | #7

On 14-06-19, 10:57, Viresh Kumar wrote:
> Hmm, so this patch won't break anything and I am inclined to apply it again :)
> 
> Does anyone see any other issues with it, which I might be missing ?

I have updated the commit log a bit more to clarify on things, please let me
know if it looks okay.

    opp: Don't overwrite rounded clk rate
    
    The OPP table normally contains 'fmax' values corresponding to the
    voltage or performance levels of each OPP, but we don't necessarily want
    all the devices to run at fmax all the time. Running at fmax makes sense
    for devices like CPU/GPU, which have a finite amount of work to do and
    since a specific amount of energy is consumed at an OPP, its better to
    run at the highest possible frequency for that voltage value.
    
    On the other hand, we have IO devices which need to run at specific
    frequencies only for their proper functioning, instead of maximum
    possible frequency.
    
    The OPP core currently roundup to the next possible OPP for a frequency
    and select the fmax value. To support the IO devices by the OPP core,
    lets do the roundup to fetch the voltage or performance state values,
    but not use the OPP frequency value. Rather use the value returned by
    clk_round_rate().
    
    The current user, cpufreq, of dev_pm_opp_set_rate() already does the
    rounding to the next OPP before calling this routine and it won't
    have any side affects because of this change.
    
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
    [ Viresh: Massaged changelog and use temp_opp variable instead ]
    Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

Rajendra Nayak June 17, 2019, 4:07 a.m. UTC | #8

On 6/17/2019 9:20 AM, Viresh Kumar wrote:
> On 14-06-19, 10:57, Viresh Kumar wrote:
>> Hmm, so this patch won't break anything and I am inclined to apply it again :)
>>
>> Does anyone see any other issues with it, which I might be missing ?
> 
> I have updated the commit log a bit more to clarify on things, please let me
> know if it looks okay.
> 
>      opp: Don't overwrite rounded clk rate
>      
>      The OPP table normally contains 'fmax' values corresponding to the
>      voltage or performance levels of each OPP, but we don't necessarily want
>      all the devices to run at fmax all the time. Running at fmax makes sense
>      for devices like CPU/GPU, which have a finite amount of work to do and
>      since a specific amount of energy is consumed at an OPP, its better to
>      run at the highest possible frequency for that voltage value.
>      
>      On the other hand, we have IO devices which need to run at specific
>      frequencies only for their proper functioning, instead of maximum
>      possible frequency.
>      
>      The OPP core currently roundup to the next possible OPP for a frequency
>      and select the fmax value. To support the IO devices by the OPP core,
>      lets do the roundup to fetch the voltage or performance state values,
>      but not use the OPP frequency value. Rather use the value returned by
>      clk_round_rate().
>      
>      The current user, cpufreq, of dev_pm_opp_set_rate() already does the
>      rounding to the next OPP before calling this routine and it won't
>      have any side affects because of this change.

Looks good to me. Should this also be documented someplace that dev_pm_opp_set_rate()
would not be able to distinguish between its users trying to scale CPU/GPU's vs IO
devices, so its the callers responsibility to round it accordingly before calling the
API?

>      
>      Signed-off-by: Stephen Boyd <swboyd@chromium.org>
>      Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
>      [ Viresh: Massaged changelog and use temp_opp variable instead ]
>      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> 
>

Viresh Kumar June 17, 2019, 4:17 a.m. UTC | #9

On 17-06-19, 09:37, Rajendra Nayak wrote:
> 
> 
> On 6/17/2019 9:20 AM, Viresh Kumar wrote:
> > On 14-06-19, 10:57, Viresh Kumar wrote:
> > > Hmm, so this patch won't break anything and I am inclined to apply it again :)
> > > 
> > > Does anyone see any other issues with it, which I might be missing ?
> > 
> > I have updated the commit log a bit more to clarify on things, please let me
> > know if it looks okay.
> > 
> >      opp: Don't overwrite rounded clk rate
> >      The OPP table normally contains 'fmax' values corresponding to the
> >      voltage or performance levels of each OPP, but we don't necessarily want
> >      all the devices to run at fmax all the time. Running at fmax makes sense
> >      for devices like CPU/GPU, which have a finite amount of work to do and
> >      since a specific amount of energy is consumed at an OPP, its better to
> >      run at the highest possible frequency for that voltage value.
> >      On the other hand, we have IO devices which need to run at specific
> >      frequencies only for their proper functioning, instead of maximum
> >      possible frequency.
> >      The OPP core currently roundup to the next possible OPP for a frequency
> >      and select the fmax value. To support the IO devices by the OPP core,
> >      lets do the roundup to fetch the voltage or performance state values,
> >      but not use the OPP frequency value. Rather use the value returned by
> >      clk_round_rate().
> >      The current user, cpufreq, of dev_pm_opp_set_rate() already does the
> >      rounding to the next OPP before calling this routine and it won't
> >      have any side affects because of this change.
> 
> Looks good to me. Should this also be documented someplace that dev_pm_opp_set_rate()
> would not be able to distinguish between its users trying to scale CPU/GPU's vs IO
> devices, so its the callers responsibility to round it accordingly before calling the
> API?

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 0fbc77f05048..bae94bfa1e96 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -751,8 +751,11 @@ static int _set_required_opps(struct device *dev,
  * @dev:        device for which we do this operation
  * @target_freq: frequency to achieve
  *
- * This configures the power-supplies and clock source to the levels specified
- * by the OPP corresponding to the target_freq.
+ * This configures the power-supplies to the levels specified by the OPP
+ * corresponding to the target_freq, and programs the clock to a value <=
+ * target_freq, as rounded by clk_round_rate(). Device wanting to run at fmax
+ * provided by the opp, should have already rounded to the target OPP's
+ * frequency.
  */

Rajendra Nayak June 17, 2019, 4:25 a.m. UTC | #10

On 6/17/2019 9:47 AM, Viresh Kumar wrote:
> On 17-06-19, 09:37, Rajendra Nayak wrote:
>>
>>
>> On 6/17/2019 9:20 AM, Viresh Kumar wrote:
>>> On 14-06-19, 10:57, Viresh Kumar wrote:
>>>> Hmm, so this patch won't break anything and I am inclined to apply it again :)
>>>>
>>>> Does anyone see any other issues with it, which I might be missing ?
>>>
>>> I have updated the commit log a bit more to clarify on things, please let me
>>> know if it looks okay.
>>>
>>>       opp: Don't overwrite rounded clk rate
>>>       The OPP table normally contains 'fmax' values corresponding to the
>>>       voltage or performance levels of each OPP, but we don't necessarily want
>>>       all the devices to run at fmax all the time. Running at fmax makes sense
>>>       for devices like CPU/GPU, which have a finite amount of work to do and
>>>       since a specific amount of energy is consumed at an OPP, its better to
>>>       run at the highest possible frequency for that voltage value.
>>>       On the other hand, we have IO devices which need to run at specific
>>>       frequencies only for their proper functioning, instead of maximum
>>>       possible frequency.
>>>       The OPP core currently roundup to the next possible OPP for a frequency
>>>       and select the fmax value. To support the IO devices by the OPP core,
>>>       lets do the roundup to fetch the voltage or performance state values,
>>>       but not use the OPP frequency value. Rather use the value returned by
>>>       clk_round_rate().
>>>       The current user, cpufreq, of dev_pm_opp_set_rate() already does the
>>>       rounding to the next OPP before calling this routine and it won't
>>>       have any side affects because of this change.
>>
>> Looks good to me. Should this also be documented someplace that dev_pm_opp_set_rate()
>> would not be able to distinguish between its users trying to scale CPU/GPU's vs IO
>> devices, so its the callers responsibility to round it accordingly before calling the
>> API?
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 0fbc77f05048..bae94bfa1e96 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -751,8 +751,11 @@ static int _set_required_opps(struct device *dev,
>    * @dev:        device for which we do this operation
>    * @target_freq: frequency to achieve
>    *
> - * This configures the power-supplies and clock source to the levels specified
> - * by the OPP corresponding to the target_freq.
> + * This configures the power-supplies to the levels specified by the OPP
> + * corresponding to the target_freq, and programs the clock to a value <=
> + * target_freq, as rounded by clk_round_rate(). Device wanting to run at fmax
> + * provided by the opp, should have already rounded to the target OPP's
> + * frequency.
>    */

Perfect, thanks.

[RFC,v2,01/11] OPP: Don't overwrite rounded clk rate

Commit Message

Comments

Patch