diff mbox

cpufreq: cpufreq-cpu0: Use a sane boot frequency when booting with a mismatched bootloader configuration

Message ID 1384568535-26611-1-git-send-email-nm@ti.com (mailing list archive)
State Changes Requested, archived
Headers show

Commit Message

Nishanth Menon Nov. 16, 2013, 2:22 a.m. UTC
On many development platforms, at boot, the bootloader configured
frequency maynot match the valid frequencies that are stated to be
supported in OPP table. This may occur due to various reasons:
a) older or default bootloader in development platform without latest
updates
b) SoC documentation update that may have occurred in kernel
c) kernel definitions are out of date Vs bootloader which is updated
etc..

In these cases, we should assume from a kernel perspective, the only
safe frequency that the system can be on is the ones available in the
OPP table. This may not handle case (c), but, that is a different
kernel bug of it's own.

Considering that in many existing or development platforms, (a) or
(b) is common, enforce a sanity check and reprogram to a conservative
start configuration at probe to allow sane operation independent of
bootloader.

Reported-by: Carlos Hernandez <ceh@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
---

based on v3.12 tag - however, a rebase with opp function name change
is needed for v3.13-rc1 if folks are ok, I can repost with updates.

Identified when during debug of https://patchwork.kernel.org/patch/3191411/
on TI vendor kernel based on v3.12 tag.

In the case discussed, bootloader booted OMAP5uEVM at 1.1GHz when the available
frequencies were 500MHz, 1GHz, 1.5GHz.

To prevent system from functioning at this invalid configuration (and hence
trigger the bug seen in stats), we should remove the dependence of the kernel
from bootloader configuration.
With this patch, in the mentioned boot scenario, we get the following log:
[   25.649736] cpufreq_cpu0: bootloader freq 1100000000 no match to table, Using 1000000000

Artificially modifying the bootloader to create other boundary conditions:
(lower bound check)
[   25.633535] cpufreq_cpu0: bootloader freq 300000000Hz no match to table, Using 500000000Hz
(upper bound check)
[   27.355837] cpufreq_cpu0: bootloader freq 1600000000Hz no match to table, Using 1500000000Hz

The other alternative(to reduce code churn) would be to just report a
mismatch and continue to function at the potentially risky OPP - but
in the cases such as using userspace governor, the system could be in
unstable state resulting in hard to debug behavior.

The last alternative is to always expect bootloader to be in sync with proper
OPP configuration, which rarely happens correctly.

 drivers/cpufreq/cpufreq-cpu0.c |   84 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 82 insertions(+), 2 deletions(-)

Comments

Shawn Guo Nov. 16, 2013, 1:44 p.m. UTC | #1
On Fri, Nov 15, 2013 at 08:22:15PM -0600, Nishanth Menon wrote:
> On many development platforms, at boot, the bootloader configured
> frequency maynot match the valid frequencies that are stated to be
> supported in OPP table. This may occur due to various reasons:
> a) older or default bootloader in development platform without latest
> updates
> b) SoC documentation update that may have occurred in kernel
> c) kernel definitions are out of date Vs bootloader which is updated
> etc..
> 
> In these cases, we should assume from a kernel perspective, the only
> safe frequency that the system can be on is the ones available in the
> OPP table. This may not handle case (c), but, that is a different
> kernel bug of it's own.

No, it's not a kernel bug.

OPP is not a definition that belongs to kernel.  Instead, it's
characteristics of hardware, and that's why we can naturally put the
definition into device tree.  Bear it in mind that device tree is a
hardware description and should be OS agnostic.  So it shouldn't be
treated as part of Linux kernel in any case, even though the device
tree source is currently maintained in kernel tree.

Device tree is designed as a way for firmware/bootloader to describe
hardware to kernel.  That said, device tree is more part of bootloader
than kernel.  If bootloader runs at a frequency that does not match the
OPP in device tree, the one should be fixed is either bootloader or
device tree but never kernel.

However, I agree we should at least have a check in cpufreq-cpu0 driver
and fail out in case that a mismatch is detected.

Shawn

> 
> Considering that in many existing or development platforms, (a) or
> (b) is common, enforce a sanity check and reprogram to a conservative
> start configuration at probe to allow sane operation independent of
> bootloader.
> 
> Reported-by: Carlos Hernandez <ceh@ti.com>
> Signed-off-by: Nishanth Menon <nm@ti.com>
> ---
> 
> based on v3.12 tag - however, a rebase with opp function name change
> is needed for v3.13-rc1 if folks are ok, I can repost with updates.
> 
> Identified when during debug of https://patchwork.kernel.org/patch/3191411/
> on TI vendor kernel based on v3.12 tag.
> 
> In the case discussed, bootloader booted OMAP5uEVM at 1.1GHz when the available
> frequencies were 500MHz, 1GHz, 1.5GHz.
> 
> To prevent system from functioning at this invalid configuration (and hence
> trigger the bug seen in stats), we should remove the dependence of the kernel
> from bootloader configuration.
> With this patch, in the mentioned boot scenario, we get the following log:
> [   25.649736] cpufreq_cpu0: bootloader freq 1100000000 no match to table, Using 1000000000
> 
> Artificially modifying the bootloader to create other boundary conditions:
> (lower bound check)
> [   25.633535] cpufreq_cpu0: bootloader freq 300000000Hz no match to table, Using 500000000Hz
> (upper bound check)
> [   27.355837] cpufreq_cpu0: bootloader freq 1600000000Hz no match to table, Using 1500000000Hz
> 
> The other alternative(to reduce code churn) would be to just report a
> mismatch and continue to function at the potentially risky OPP - but
> in the cases such as using userspace governor, the system could be in
> unstable state resulting in hard to debug behavior.
> 
> The last alternative is to always expect bootloader to be in sync with proper
> OPP configuration, which rarely happens correctly.
> 
>  drivers/cpufreq/cpufreq-cpu0.c |   84 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 82 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.c
> index c522a95..9765050 100644
> --- a/drivers/cpufreq/cpufreq-cpu0.c
> +++ b/drivers/cpufreq/cpufreq-cpu0.c
> @@ -176,7 +176,8 @@ static struct cpufreq_driver cpu0_cpufreq_driver = {
>  static int cpu0_cpufreq_probe(struct platform_device *pdev)
>  {
>  	struct device_node *np;
> -	int ret;
> +	int ret, i;
> +	long boot_freq;
>  
>  	cpu_dev = get_cpu_device(0);
>  	if (!cpu_dev) {
> @@ -232,7 +233,6 @@ static int cpu0_cpufreq_probe(struct platform_device *pdev)
>  	if (!IS_ERR(cpu_reg)) {
>  		struct opp *opp;
>  		unsigned long min_uV, max_uV;
> -		int i;
>  
>  		/*
>  		 * OPP is maintained in order of increasing frequency, and
> @@ -254,6 +254,86 @@ static int cpu0_cpufreq_probe(struct platform_device *pdev)
>  			transition_latency += ret * 1000;
>  	}
>  
> +	boot_freq = clk_get_rate(cpu_clk);
> +
> +	/* See if we have a perfect match */
> +	for (i = 0; freq_table[i].frequency != CPUFREQ_TABLE_END; i++)
> +		if (boot_freq == freq_table[i].frequency * 1000)
> +			break;
> +
> +	/* If we have a bad bootloader config, try recovery */
> +	if (freq_table[i].frequency == CPUFREQ_TABLE_END) {
> +		struct opp *opp;
> +		long new_freq = boot_freq, freq_exact;
> +		unsigned long volt = 0, tol = 0;
> +
> +		ret = 0;
> +		rcu_read_lock();
> +
> +		/* Try a conservative match */
> +		opp = opp_find_freq_floor(cpu_dev, &new_freq);
> +
> +		/* If we did not get a floor match, try least available freq */
> +		if (IS_ERR(opp)) {
> +			new_freq = freq_table[0].frequency * 1000;
> +			opp = opp_find_freq_exact(cpu_dev, new_freq, true);
> +		}
> +		if (IS_ERR(opp))
> +			ret = -ERANGE;
> +		if (!IS_ERR(opp) && !IS_ERR(cpu_reg)) {
> +			volt = opp_get_voltage(opp);
> +			tol = volt * voltage_tolerance / 100;
> +		}
> +		rcu_read_unlock();
> +		if (ret) {
> +			pr_err("Fail to find match boot clock rate: %lu\n",
> +			       boot_freq);
> +			goto out_free_table;
> +		}
> +
> +		/* We dont expect to endup with same result */
> +		WARN_ON(boot_freq == new_freq);
> +
> +		freq_exact = clk_round_rate(cpu_clk, new_freq);
> +		if (freq_exact < 0) {
> +			pr_err("Fail to find valid boot clock rate: %lu\n",
> +			       freq_exact);
> +			goto out_free_table;
> +		}
> +
> +		/* Warn to get developer to fix bootloader */
> +		pr_err("Bootloader freq %luHz no match to table, Using %luHz\n",
> +		       boot_freq, new_freq);
> +
> +		/*
> +		 * For voltage sequencing we *assume* that bootloader has at
> +		 * least set the voltage appropriate for the boot_frequency
> +		 */
> +		if (!IS_ERR(cpu_reg) && boot_freq < new_freq) {
> +			ret = regulator_set_voltage_tol(cpu_reg, volt, tol);
> +			if (ret) {
> +				pr_err("Fail to scale boot voltage up: %d\n",
> +				       ret);
> +				goto out_free_table;
> +			}
> +		}
> +
> +		ret = clk_set_rate(cpu_clk, freq_exact);
> +		if (ret) {
> +			pr_err("Fail to set boot clock rate: %d\n", ret);
> +			goto out_free_table;
> +		}
> +
> +		if (!IS_ERR(cpu_reg) && boot_freq > new_freq) {
> +			ret = regulator_set_voltage_tol(cpu_reg, volt, tol);
> +			if (ret) {
> +				pr_err("Fail to scale boot voltage down: %d\n",
> +				       ret);
> +				goto out_free_table;
> +			}
> +		}
> +	}
> +
>  	ret = cpufreq_register_driver(&cpu0_cpufreq_driver);
>  	if (ret) {
>  		pr_err("failed register driver: %d\n", ret);
> -- 
> 1.7.9.5
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Nov. 17, 2013, 4:02 a.m. UTC | #2
On 16 November 2013 19:14, Shawn Guo <shawn.guo@linaro.org> wrote:
> No, it's not a kernel bug.
>
> OPP is not a definition that belongs to kernel.  Instead, it's
> characteristics of hardware, and that's why we can naturally put the
> definition into device tree.  Bear it in mind that device tree is a
> hardware description and should be OS agnostic.  So it shouldn't be
> treated as part of Linux kernel in any case, even though the device
> tree source is currently maintained in kernel tree.
>
> Device tree is designed as a way for firmware/bootloader to describe
> hardware to kernel.  That said, device tree is more part of bootloader
> than kernel.  If bootloader runs at a frequency that does not match the
> OPP in device tree, the one should be fixed is either bootloader or
> device tree but never kernel.

I agree for all that..

> However, I agree we should at least have a check in cpufreq-cpu0 driver
> and fail out in case that a mismatch is detected.

But not here.. We aren't in a non-workable state here.. and so creating
panic isn't the right approach.

At max we can print an warning but then it doesn't lie in cpufreq-cpu0's
domain. It should be done in core if required..

Though for Shawn's information we have another thread in parallel for
this issue. You might like to check that too..

https://lkml.org/lkml/2013/11/15/503
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Nov. 18, 2013, 2:45 p.m. UTC | #3
On 11/16/2013 10:02 PM, Viresh Kumar wrote:
> On 16 November 2013 19:14, Shawn Guo <shawn.guo@linaro.org> wrote:
>> No, it's not a kernel bug.
>>
>> OPP is not a definition that belongs to kernel.  Instead, it's
>> characteristics of hardware, and that's why we can naturally put the
>> definition into device tree.  Bear it in mind that device tree is a
>> hardware description and should be OS agnostic.  So it shouldn't be
>> treated as part of Linux kernel in any case, even though the device
>> tree source is currently maintained in kernel tree.
>>
>> Device tree is designed as a way for firmware/bootloader to describe
>> hardware to kernel.  That said, device tree is more part of bootloader
>> than kernel.  If bootloader runs at a frequency that does not match the
>> OPP in device tree, the one should be fixed is either bootloader or
>> device tree but never kernel.
> 
> I agree for all that..

I do not agree about this stance - device tree describes hardware
capabilities to kernel -> I agree with that statement. Kernel should
not care if that is provided by bootloader/firmware/fused into
flash/ROM etc.

If we agree with that statement, the moment kernel sees device
operating in a configuration that is not hardware capability, it is
our job to switch to the right configuration if it is possible for us
to do so (which in this case, we can). to give a comparison - if i2c
bus was configured for 3.4MHz and device tree describes the bus to
operate at 100KHz, do we ignore device tree recommended configuration?
No. we would switch to 100KHz. as far as kernel is concerned, it can
detect invalid configuration and switch back to a valid configuration
based on hardware description provided by device tree.
Shawn Guo Nov. 18, 2013, 3:57 p.m. UTC | #4
On Mon, Nov 18, 2013 at 08:45:12AM -0600, Nishanth Menon wrote:
<snip>
> I do not agree about this stance - device tree describes hardware
> capabilities to kernel -> I agree with that statement. Kernel should
> not care if that is provided by bootloader/firmware/fused into
> flash/ROM etc.

Yea, ideally kernel should just believe that the device tree is always
the correct one.  But the following text in your commit log suggests
that might not be always the case.

| a) older or default bootloader in development platform without latest
| updates
| b) SoC documentation update that may have occurred in kernel
| c) kernel definitions are out of date Vs bootloader which is updated
| etc..
| 
| In these cases, we should assume from a kernel perspective, the only
| safe frequency that the system can be on is the ones available in the
| OPP table. This may not handle case (c), but, that is a different
| kernel bug of it's own.

It says that the device tree might be wrong as well.  That's why I
think the best/safest thing that kernel can do when seeing a mismatch
is to throw out an error message and fail out.

Shawn

> 
> If we agree with that statement, the moment kernel sees device
> operating in a configuration that is not hardware capability, it is
> our job to switch to the right configuration if it is possible for us
> to do so (which in this case, we can). to give a comparison - if i2c
> bus was configured for 3.4MHz and device tree describes the bus to
> operate at 100KHz, do we ignore device tree recommended configuration?
> No. we would switch to 100KHz. as far as kernel is concerned, it can
> detect invalid configuration and switch back to a valid configuration
> based on hardware description provided by device tree.
> 
> 
> 
> -- 
> Regards,
> Nishanth Menon

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Nov. 18, 2013, 4:41 p.m. UTC | #5
On 11/18/2013 09:57 AM, Shawn Guo wrote:
> On Mon, Nov 18, 2013 at 08:45:12AM -0600, Nishanth Menon wrote:
> <snip>
>> I do not agree about this stance - device tree describes hardware
>> capabilities to kernel -> I agree with that statement. Kernel should
>> not care if that is provided by bootloader/firmware/fused into
>> flash/ROM etc.
> 
> Yea, ideally kernel should just believe that the device tree is always
> the correct one.  But the following text in your commit log suggests
> that might not be always the case.
> 
> | a) older or default bootloader in development platform without latest
> | updates
> | b) SoC documentation update that may have occurred in kernel
> | c) kernel definitions are out of date Vs bootloader which is updated
> | etc..
> | 
> | In these cases, we should assume from a kernel perspective, the only
> | safe frequency that the system can be on is the ones available in the
> | OPP table. This may not handle case (c), but, that is a different
> | kernel bug of it's own.
> 
> It says that the device tree might be wrong as well.  That's why I
> think the best/safest thing that kernel can do when seeing a mismatch
> is to throw out an error message and fail out.

In the case of mismatch, to consider that device tree may be wrong in
driver is also to assume that hardware was always configured correctly
and we assume description is the flawed data. That is just plain
wrong. we need to assume device tree is the correct description of the
hardware and any mismatch must be assumed as bad configuration - and
this is what the patch does.

Now, if the description is wrong, that is a dts bug of it's own.

If the suggestion is to improve my commit message, I am more than
happy to do so - please suggest how I could improve the same.

> 
> Shawn
> 
>>
>> If we agree with that statement, the moment kernel sees device
>> operating in a configuration that is not hardware capability, it is
>> our job to switch to the right configuration if it is possible for us
>> to do so (which in this case, we can). to give a comparison - if i2c
>> bus was configured for 3.4MHz and device tree describes the bus to
>> operate at 100KHz, do we ignore device tree recommended configuration?
>> No. we would switch to 100KHz. as far as kernel is concerned, it can
>> detect invalid configuration and switch back to a valid configuration
>> based on hardware description provided by device tree.
>>
>>
>>
>> -- 
>> Regards,
>> Nishanth Menon
>
Shawn Guo Nov. 19, 2013, 2:21 a.m. UTC | #6
On Mon, Nov 18, 2013 at 10:41:36AM -0600, Nishanth Menon wrote:
> In the case of mismatch, to consider that device tree may be wrong in
> driver is also to assume that hardware was always configured correctly
> and we assume description is the flawed data.

No, I did not say that.  IMO, when cpufreq-cpu0 sees a mismatch, it has
no way to know or assume which one is correct and which is incorrect.
The best thing it can do is to fail out without changing anything about
running frequency and voltage.

Shawn

> That is just plain
> wrong. we need to assume device tree is the correct description of the
> hardware and any mismatch must be assumed as bad configuration - and
> this is what the patch does.
> 
> Now, if the description is wrong, that is a dts bug of it's own.
> 
> If the suggestion is to improve my commit message, I am more than
> happy to do so - please suggest how I could improve the same.

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Nov. 19, 2013, 3:46 a.m. UTC | #7
On 19 November 2013 07:51, Shawn Guo <shawn.guo@linaro.org> wrote:
> No, I did not say that.  IMO, when cpufreq-cpu0 sees a mismatch, it has
> no way to know or assume which one is correct and which is incorrect.
> The best thing it can do is to fail out without changing anything about
> running frequency and voltage.

Not specifically on this patch, but this is what I feel about this issue:

- As we are discussing on the other thread, there is scope of adding
"unknown" field in tables so that people would know that they were
running out of table freq at some point..
- This is a common problem for all drivers/platforms and not only
cpufreq-cpu0, so the solution has to be generic and not driver
specific.. So, atleast I don't want to get this patch in at any cost,
unless there is a generic solution present..
- There are non-dt drivers as well, and so freq table is present
with the kernel and we can't support all frequencies that bootloader
may end up with..

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Nov. 19, 2013, 2:16 p.m. UTC | #8
On 11/18/2013 09:46 PM, Viresh Kumar wrote:
> On 19 November 2013 07:51, Shawn Guo <shawn.guo@linaro.org> wrote:
>> No, I did not say that.  IMO, when cpufreq-cpu0 sees a mismatch, it has
>> no way to know or assume which one is correct and which is incorrect.
>> The best thing it can do is to fail out without changing anything about
>> running frequency and voltage.
> 
> Not specifically on this patch, but this is what I feel about this issue:
> 
> - As we are discussing on the other thread, there is scope of adding
> "unknown" field in tables so that people would know that they were
> running out of table freq at some point..

Consider something like userspace governor selection -> the device at
boot will probably remain in an unknown/"invalid" configuration till
the very first transition attempt. I am less worried about the stats
than not following what the hardware description is (as stated by
device tree/other forms).

I staunchly disagree that at a point of mismatch detection, we just
refuse to load up cpufreq governor -even though we know from device
tree/other alternative entries what the hardware behavior is supposed
to be. To refuse to loadup to a known configuration is considering the
"valid configuration" data provided to the driver is wrong - an
equivalent(considering the i2c example) is that if i2c driver sees bus
configured for 3.4MHz and was asked to use 100KHz, it just refuses to
load up!

> - This is a common problem for all drivers/platforms and not only
> cpufreq-cpu0, so the solution has to be generic and not driver
> specific.. So, atleast I don't want to get this patch in at any cost,
> unless there is a generic solution present..
> - There are non-dt drivers as well, and so freq table is present
> with the kernel and we can't support all frequencies that bootloader
> may end up with..
The above two are fair comments -> but that implies that policy->cur
population should no longer be the responsibility of cpufreq drivers
and be the responsibility of cpufreq core. are we stating we want to
move that to cpufreq core?
Viresh Kumar Nov. 19, 2013, 2:26 p.m. UTC | #9
On 19 November 2013 19:46, Nishanth Menon <nm@ti.com> wrote:
> Consider something like userspace governor selection -> the device at
> boot will probably remain in an unknown/"invalid" configuration till
> the very first transition attempt. I am less worried about the stats
> than not following what the hardware description is (as stated by
> device tree/other forms).
>
> I staunchly disagree that at a point of mismatch detection, we just
> refuse to load up cpufreq governor -even though we know from device
> tree/other alternative entries what the hardware behavior is supposed
> to be. To refuse to loadup to a known configuration is considering the
> "valid configuration" data provided to the driver is wrong - an
> equivalent(considering the i2c example) is that if i2c driver sees bus
> configured for 3.4MHz and was asked to use 100KHz, it just refuses to
> load up!

CPU looks to be a bit different in that aspect as compared to I2C. We
aren't really sure if I2C will work at the existing freq but we are 100%
sure that current freq of CPU is valid enough, otherwise we wouldn't
have reached to this point.. :)

> The above two are fair comments -> but that implies that policy->cur
> population should no longer be the responsibility of cpufreq drivers
> and be the responsibility of cpufreq core. are we stating we want to
> move that to cpufreq core?

I am sure you want to have a look at this:

commit da60ce9f2faca87013fd3cab1c3bed5183608c3d
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Thu Oct 3 20:28:30 2013 +0530

    cpufreq: call cpufreq_driver->get() after calling ->init()

    Almost all drivers set policy->cur with current CPU frequency in
their ->init()
    part. This can be done for all of them at core level and so they
wouldn't need
    to do it.

    This patch adds supporting code in cpufreq core for calling get()
after we have
    called init() for a policy.

    Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/cpufreq.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Nov. 19, 2013, 2:59 p.m. UTC | #10
On 11/19/2013 08:26 AM, Viresh Kumar wrote:
> On 19 November 2013 19:46, Nishanth Menon <nm@ti.com> wrote:
>> Consider something like userspace governor selection -> the device at
>> boot will probably remain in an unknown/"invalid" configuration till
>> the very first transition attempt. I am less worried about the stats
>> than not following what the hardware description is (as stated by
>> device tree/other forms).
>>
>> I staunchly disagree that at a point of mismatch detection, we just
>> refuse to load up cpufreq governor -even though we know from device
>> tree/other alternative entries what the hardware behavior is supposed
>> to be. To refuse to loadup to a known configuration is considering the
>> "valid configuration" data provided to the driver is wrong - an
>> equivalent(considering the i2c example) is that if i2c driver sees bus
>> configured for 3.4MHz and was asked to use 100KHz, it just refuses to
>> load up!
> 
> CPU looks to be a bit different in that aspect as compared to I2C. We
> aren't really sure if I2C will work at the existing freq but we are 100%
> sure that current freq of CPU is valid enough, otherwise we wouldn't
> have reached to this point.. :)
> 
Not completely true - reaching probe after boot in a few seconds may
not mean that system will remain stable at that frequency for longer
duration. From a silicon vendor perspective, I do know that we
gaurentee the discrete frequencies in the data manual (and that gets
populated in devicetree and hence in freq_table), but we will not
guarentee any other frequency to be functional for any length of time.
in short, if a actual product is manufactured and operational at a
frequency we do not "officially support", there is a risk associated
with that. just a boot on a few development systems do not ever
guarentee productization capability.

>> The above two are fair comments -> but that implies that policy->cur
>> population should no longer be the responsibility of cpufreq drivers
>> and be the responsibility of cpufreq core. are we stating we want to
>> move that to cpufreq core?
> 
> I am sure you want to have a look at this:

my bad. I missed this one.

So, to summarize: what is our overall strategy here? to move to a
frequency matched in freq_table OR just giveup? I can try and respin
accordingly.

> 
> commit da60ce9f2faca87013fd3cab1c3bed5183608c3d
> Author: Viresh Kumar <viresh.kumar@linaro.org>
> Date:   Thu Oct 3 20:28:30 2013 +0530
> 
>     cpufreq: call cpufreq_driver->get() after calling ->init()
> 
>     Almost all drivers set policy->cur with current CPU frequency in
> their ->init()
>     part. This can be done for all of them at core level and so they
> wouldn't need
>     to do it.
> 
>     This patch adds supporting code in cpufreq core for calling get()
> after we have
>     called init() for a policy.
> 
>     Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>     Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpufreq/cpufreq.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
Viresh Kumar Nov. 19, 2013, 3:32 p.m. UTC | #11
On 19 November 2013 20:29, Nishanth Menon <nm@ti.com> wrote:
> Not completely true - reaching probe after boot in a few seconds may
> not mean that system will remain stable at that frequency for longer
> duration. From a silicon vendor perspective, I do know that we
> gaurentee the discrete frequencies in the data manual (and that gets
> populated in devicetree and hence in freq_table), but we will not
> guarentee any other frequency to be functional for any length of time.
> in short, if a actual product is manufactured and operational at a
> frequency we do not "officially support", there is a risk associated
> with that. just a boot on a few development systems do not ever
> guarentee productization capability.

Maybe for a long duration system isn't stable enough but should be
stable for few seconds Atleast?

As soon as ->init() of driver is called we will get a call to:
cpufreq_set_policy() from cpufreq_init_policy(). And we will execute
this for sure:

ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);

On this call if the current freq is lesser or greater than policy limits,
then we will fix it straight away.. Otherwise whichever the governor
is, we will change the freq to any from freq-table very soon..

So, we need a real example of unstable system which really
requires your patch. Otherwise I feel that we will not face any problems
at all..

> So, to summarize: what is our overall strategy here? to move to a
> frequency matched in freq_table OR just giveup? I can try and respin
> accordingly.

I want cpufreq-stats to be fixed with something else, might be something
similar to the patch I have sent..

But this stuff can be left as is..
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Nov. 19, 2013, 3:48 p.m. UTC | #12
On 11/19/2013 09:32 AM, Viresh Kumar wrote:
> On 19 November 2013 20:29, Nishanth Menon <nm@ti.com> wrote:
>> Not completely true - reaching probe after boot in a few seconds may
>> not mean that system will remain stable at that frequency for longer
>> duration. From a silicon vendor perspective, I do know that we
>> gaurentee the discrete frequencies in the data manual (and that gets
>> populated in devicetree and hence in freq_table), but we will not
>> guarentee any other frequency to be functional for any length of time.
>> in short, if a actual product is manufactured and operational at a
>> frequency we do not "officially support", there is a risk associated
>> with that. just a boot on a few development systems do not ever
>> guarentee productization capability.
> 
> Maybe for a long duration system isn't stable enough but should be
> stable for few seconds Atleast?
yes.

> 
> As soon as ->init() of driver is called we will get a call to:
> cpufreq_set_policy() from cpufreq_init_policy(). And we will execute
> this for sure:
> 
> ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
> 
> On this call if the current freq is lesser or greater than policy limits,
> then we will fix it straight away.. Otherwise whichever the governor
> is, we will change the freq to any from freq-table very soon..
is that true for userspace governor
(CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE)?
/sys/devices/system/cpu/cpu0/cpufreq $ cat scaling_available_frequencies
500000 1000000 1500000

/sys/devices/system/cpu/cpu0/cpufreq $ cat scaling_cur_freq
1100000


> 
> So, we need a real example of unstable system which really
> requires your patch. Otherwise I feel that we will not face any problems
> at all..
OMAP5-UEVM will remain at this frequency for a long period of time
with AVS voltage(Adaptive Voltage Scaling technique used in OMAP to
optimize operational voltage) that was meant for 1GHz! that is
definitely not stable if there is no further transition to a valid
frequency.

> 
>> So, to summarize: what is our overall strategy here? to move to a
>> frequency matched in freq_table OR just giveup? I can try and respin
>> accordingly.
> 
> I want cpufreq-stats to be fixed with something else, might be something
> similar to the patch I have sent..
> 
> But this stuff can be left as is..
An alternative might be to ensure CPUFREQ_GOV_LIMITS takes care of that?
Viresh Kumar Nov. 19, 2013, 5:10 p.m. UTC | #13
On 19 November 2013 21:18, Nishanth Menon <nm@ti.com> wrote:
> is that true for userspace governor
> (CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE)?
> /sys/devices/system/cpu/cpu0/cpufreq $ cat scaling_available_frequencies
> 500000 1000000 1500000
>
> /sys/devices/system/cpu/cpu0/cpufreq $ cat scaling_cur_freq
> 1100000

No, but userspace governor must take care of this stuff as it want's
to change freq from userspace..

> OMAP5-UEVM will remain at this frequency for a long period of time
> with AVS voltage(Adaptive Voltage Scaling technique used in OMAP to
> optimize operational voltage) that was meant for 1GHz! that is
> definitely not stable if there is no further transition to a valid
> frequency.

I understand that point, but will this stay for a long time at that freq?
Why aren't governors coming into picture here?

> An alternative might be to ensure CPUFREQ_GOV_LIMITS takes care of that?

Again, I don't see if we really really need to do this at all..
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Nov. 19, 2013, 5:43 p.m. UTC | #14
On 11/19/2013 11:10 AM, Viresh Kumar wrote:
> On 19 November 2013 21:18, Nishanth Menon <nm@ti.com> wrote:
>> is that true for userspace governor
>> (CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE)?
>> /sys/devices/system/cpu/cpu0/cpufreq $ cat scaling_available_frequencies
>> 500000 1000000 1500000
>>
>> /sys/devices/system/cpu/cpu0/cpufreq $ cat scaling_cur_freq
>> 1100000
> 
> No, but userspace governor must take care of this stuff as it want's
> to change freq from userspace..
> 

Right, the point is that CPUFREQ_GOV_LIMITS wont help in the case when
it is within min-max bound.. the first transition will help bring it
to a sane value - that I agree.

>> OMAP5-UEVM will remain at this frequency for a long period of time
>> with AVS voltage(Adaptive Voltage Scaling technique used in OMAP to
>> optimize operational voltage) that was meant for 1GHz! that is
>> definitely not stable if there is no further transition to a valid
>> frequency.
> 
> I understand that point, but will this stay for a long time at that freq?
> Why aren't governors coming into picture here?

we depend on the first transition to take us to a sane configuration -
but we cannot predict when and if it will happen.
Viresh Kumar Nov. 20, 2013, 5:24 a.m. UTC | #15
On Tuesday 19 November 2013 11:13 PM, Nishanth Menon wrote:
> we depend on the first transition to take us to a sane configuration -
> but we cannot predict when and if it will happen.

I really believe that it happens fairly quickly, isn't it? We straight away
start the sampling of load and withing few milliseconds we must be fixing the freq..

We aren't going to stay for the unknown, might be unstable, freq for ever..
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Nov. 20, 2013, 2:59 p.m. UTC | #16
On 11/19/2013 11:24 PM, viresh kumar wrote:
> On Tuesday 19 November 2013 11:13 PM, Nishanth Menon wrote:
>> we depend on the first transition to take us to a sane configuration -
>> but we cannot predict when and if it will happen.
> 
> I really believe that it happens fairly quickly, isn't it? We straight away
> start the sampling of load and withing few milliseconds we must be fixing the freq..

that heavily depends on the governor - and we all know the variants of
ondemand governors that various distributions use. I cannot say
(having not studied all governors out there) if this will take place
in a few milliseconds or a few minutes or hours.

> 
> We aren't going to stay for the unknown, might be unstable, freq for ever..
> 
With the current governors that we have in upstream, the only one of
my concern has been userspace governor, but based on your comment
earlier in the thread, this is considered an non-issue since userspace
must trigger transition. However, this does put the SoC at risk
depending on distro and custom governors used.

If the opinion is that we dont care about these, well.. I can end my
complaints and depend on the stats to tell me if an unknown frequency
was ever attempted for debug (even though I might personally not be
too excited about it).
Viresh Kumar Nov. 21, 2013, 7:41 a.m. UTC | #17
On 20 November 2013 20:29, Nishanth Menon <nm@ti.com> wrote:
> With the current governors that we have in upstream, the only one of
> my concern has been userspace governor, but based on your comment
> earlier in the thread, this is considered an non-issue since userspace
> must trigger transition. However, this does put the SoC at risk
> depending on distro and custom governors used.
>
> If the opinion is that we dont care about these, well.. I can end my
> complaints and depend on the stats to tell me if an unknown frequency
> was ever attempted for debug (even though I might personally not be
> too excited about it).

I wouldn't say that I was against this patch, but it looked like we could live
without it as well.. But I also feel that there is some amount of uncertainty
here and specially with userspace governor..

So, just see the patch which I have floated as replacement for few of the
patches we have been talking about..
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.c
index c522a95..9765050 100644
--- a/drivers/cpufreq/cpufreq-cpu0.c
+++ b/drivers/cpufreq/cpufreq-cpu0.c
@@ -176,7 +176,8 @@  static struct cpufreq_driver cpu0_cpufreq_driver = {
 static int cpu0_cpufreq_probe(struct platform_device *pdev)
 {
 	struct device_node *np;
-	int ret;
+	int ret, i;
+	long boot_freq;
 
 	cpu_dev = get_cpu_device(0);
 	if (!cpu_dev) {
@@ -232,7 +233,6 @@  static int cpu0_cpufreq_probe(struct platform_device *pdev)
 	if (!IS_ERR(cpu_reg)) {
 		struct opp *opp;
 		unsigned long min_uV, max_uV;
-		int i;
 
 		/*
 		 * OPP is maintained in order of increasing frequency, and
@@ -254,6 +254,86 @@  static int cpu0_cpufreq_probe(struct platform_device *pdev)
 			transition_latency += ret * 1000;
 	}
 
+	boot_freq = clk_get_rate(cpu_clk);
+
+	/* See if we have a perfect match */
+	for (i = 0; freq_table[i].frequency != CPUFREQ_TABLE_END; i++)
+		if (boot_freq == freq_table[i].frequency * 1000)
+			break;
+
+	/* If we have a bad bootloader config, try recovery */
+	if (freq_table[i].frequency == CPUFREQ_TABLE_END) {
+		struct opp *opp;
+		long new_freq = boot_freq, freq_exact;
+		unsigned long volt = 0, tol = 0;
+
+		ret = 0;
+		rcu_read_lock();
+
+		/* Try a conservative match */
+		opp = opp_find_freq_floor(cpu_dev, &new_freq);
+
+		/* If we did not get a floor match, try least available freq */
+		if (IS_ERR(opp)) {
+			new_freq = freq_table[0].frequency * 1000;
+			opp = opp_find_freq_exact(cpu_dev, new_freq, true);
+		}
+		if (IS_ERR(opp))
+			ret = -ERANGE;
+		if (!IS_ERR(opp) && !IS_ERR(cpu_reg)) {
+			volt = opp_get_voltage(opp);
+			tol = volt * voltage_tolerance / 100;
+		}
+		rcu_read_unlock();
+		if (ret) {
+			pr_err("Fail to find match boot clock rate: %lu\n",
+			       boot_freq);
+			goto out_free_table;
+		}
+
+		/* We dont expect to endup with same result */
+		WARN_ON(boot_freq == new_freq);
+
+		freq_exact = clk_round_rate(cpu_clk, new_freq);
+		if (freq_exact < 0) {
+			pr_err("Fail to find valid boot clock rate: %lu\n",
+			       freq_exact);
+			goto out_free_table;
+		}
+
+		/* Warn to get developer to fix bootloader */
+		pr_err("Bootloader freq %luHz no match to table, Using %luHz\n",
+		       boot_freq, new_freq);
+
+		/*
+		 * For voltage sequencing we *assume* that bootloader has at
+		 * least set the voltage appropriate for the boot_frequency
+		 */
+		if (!IS_ERR(cpu_reg) && boot_freq < new_freq) {
+			ret = regulator_set_voltage_tol(cpu_reg, volt, tol);
+			if (ret) {
+				pr_err("Fail to scale boot voltage up: %d\n",
+				       ret);
+				goto out_free_table;
+			}
+		}
+
+		ret = clk_set_rate(cpu_clk, freq_exact);
+		if (ret) {
+			pr_err("Fail to set boot clock rate: %d\n", ret);
+			goto out_free_table;
+		}
+
+		if (!IS_ERR(cpu_reg) && boot_freq > new_freq) {
+			ret = regulator_set_voltage_tol(cpu_reg, volt, tol);
+			if (ret) {
+				pr_err("Fail to scale boot voltage down: %d\n",
+				       ret);
+				goto out_free_table;
+			}
+		}
+	}
+
 	ret = cpufreq_register_driver(&cpu0_cpufreq_driver);
 	if (ret) {
 		pr_err("failed register driver: %d\n", ret);