Message ID | 1384568535-26611-1-git-send-email-nm@ti.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
On Fri, Nov 15, 2013 at 08:22:15PM -0600, Nishanth Menon wrote: > On many development platforms, at boot, the bootloader configured > frequency maynot match the valid frequencies that are stated to be > supported in OPP table. This may occur due to various reasons: > a) older or default bootloader in development platform without latest > updates > b) SoC documentation update that may have occurred in kernel > c) kernel definitions are out of date Vs bootloader which is updated > etc.. > > In these cases, we should assume from a kernel perspective, the only > safe frequency that the system can be on is the ones available in the > OPP table. This may not handle case (c), but, that is a different > kernel bug of it's own. No, it's not a kernel bug. OPP is not a definition that belongs to kernel. Instead, it's characteristics of hardware, and that's why we can naturally put the definition into device tree. Bear it in mind that device tree is a hardware description and should be OS agnostic. So it shouldn't be treated as part of Linux kernel in any case, even though the device tree source is currently maintained in kernel tree. Device tree is designed as a way for firmware/bootloader to describe hardware to kernel. That said, device tree is more part of bootloader than kernel. If bootloader runs at a frequency that does not match the OPP in device tree, the one should be fixed is either bootloader or device tree but never kernel. However, I agree we should at least have a check in cpufreq-cpu0 driver and fail out in case that a mismatch is detected. Shawn > > Considering that in many existing or development platforms, (a) or > (b) is common, enforce a sanity check and reprogram to a conservative > start configuration at probe to allow sane operation independent of > bootloader. > > Reported-by: Carlos Hernandez <ceh@ti.com> > Signed-off-by: Nishanth Menon <nm@ti.com> > --- > > based on v3.12 tag - however, a rebase with opp function name change > is needed for v3.13-rc1 if folks are ok, I can repost with updates. > > Identified when during debug of https://patchwork.kernel.org/patch/3191411/ > on TI vendor kernel based on v3.12 tag. > > In the case discussed, bootloader booted OMAP5uEVM at 1.1GHz when the available > frequencies were 500MHz, 1GHz, 1.5GHz. > > To prevent system from functioning at this invalid configuration (and hence > trigger the bug seen in stats), we should remove the dependence of the kernel > from bootloader configuration. > With this patch, in the mentioned boot scenario, we get the following log: > [ 25.649736] cpufreq_cpu0: bootloader freq 1100000000 no match to table, Using 1000000000 > > Artificially modifying the bootloader to create other boundary conditions: > (lower bound check) > [ 25.633535] cpufreq_cpu0: bootloader freq 300000000Hz no match to table, Using 500000000Hz > (upper bound check) > [ 27.355837] cpufreq_cpu0: bootloader freq 1600000000Hz no match to table, Using 1500000000Hz > > The other alternative(to reduce code churn) would be to just report a > mismatch and continue to function at the potentially risky OPP - but > in the cases such as using userspace governor, the system could be in > unstable state resulting in hard to debug behavior. > > The last alternative is to always expect bootloader to be in sync with proper > OPP configuration, which rarely happens correctly. > > drivers/cpufreq/cpufreq-cpu0.c | 84 +++++++++++++++++++++++++++++++++++++++- > 1 file changed, 82 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.c > index c522a95..9765050 100644 > --- a/drivers/cpufreq/cpufreq-cpu0.c > +++ b/drivers/cpufreq/cpufreq-cpu0.c > @@ -176,7 +176,8 @@ static struct cpufreq_driver cpu0_cpufreq_driver = { > static int cpu0_cpufreq_probe(struct platform_device *pdev) > { > struct device_node *np; > - int ret; > + int ret, i; > + long boot_freq; > > cpu_dev = get_cpu_device(0); > if (!cpu_dev) { > @@ -232,7 +233,6 @@ static int cpu0_cpufreq_probe(struct platform_device *pdev) > if (!IS_ERR(cpu_reg)) { > struct opp *opp; > unsigned long min_uV, max_uV; > - int i; > > /* > * OPP is maintained in order of increasing frequency, and > @@ -254,6 +254,86 @@ static int cpu0_cpufreq_probe(struct platform_device *pdev) > transition_latency += ret * 1000; > } > > + boot_freq = clk_get_rate(cpu_clk); > + > + /* See if we have a perfect match */ > + for (i = 0; freq_table[i].frequency != CPUFREQ_TABLE_END; i++) > + if (boot_freq == freq_table[i].frequency * 1000) > + break; > + > + /* If we have a bad bootloader config, try recovery */ > + if (freq_table[i].frequency == CPUFREQ_TABLE_END) { > + struct opp *opp; > + long new_freq = boot_freq, freq_exact; > + unsigned long volt = 0, tol = 0; > + > + ret = 0; > + rcu_read_lock(); > + > + /* Try a conservative match */ > + opp = opp_find_freq_floor(cpu_dev, &new_freq); > + > + /* If we did not get a floor match, try least available freq */ > + if (IS_ERR(opp)) { > + new_freq = freq_table[0].frequency * 1000; > + opp = opp_find_freq_exact(cpu_dev, new_freq, true); > + } > + if (IS_ERR(opp)) > + ret = -ERANGE; > + if (!IS_ERR(opp) && !IS_ERR(cpu_reg)) { > + volt = opp_get_voltage(opp); > + tol = volt * voltage_tolerance / 100; > + } > + rcu_read_unlock(); > + if (ret) { > + pr_err("Fail to find match boot clock rate: %lu\n", > + boot_freq); > + goto out_free_table; > + } > + > + /* We dont expect to endup with same result */ > + WARN_ON(boot_freq == new_freq); > + > + freq_exact = clk_round_rate(cpu_clk, new_freq); > + if (freq_exact < 0) { > + pr_err("Fail to find valid boot clock rate: %lu\n", > + freq_exact); > + goto out_free_table; > + } > + > + /* Warn to get developer to fix bootloader */ > + pr_err("Bootloader freq %luHz no match to table, Using %luHz\n", > + boot_freq, new_freq); > + > + /* > + * For voltage sequencing we *assume* that bootloader has at > + * least set the voltage appropriate for the boot_frequency > + */ > + if (!IS_ERR(cpu_reg) && boot_freq < new_freq) { > + ret = regulator_set_voltage_tol(cpu_reg, volt, tol); > + if (ret) { > + pr_err("Fail to scale boot voltage up: %d\n", > + ret); > + goto out_free_table; > + } > + } > + > + ret = clk_set_rate(cpu_clk, freq_exact); > + if (ret) { > + pr_err("Fail to set boot clock rate: %d\n", ret); > + goto out_free_table; > + } > + > + if (!IS_ERR(cpu_reg) && boot_freq > new_freq) { > + ret = regulator_set_voltage_tol(cpu_reg, volt, tol); > + if (ret) { > + pr_err("Fail to scale boot voltage down: %d\n", > + ret); > + goto out_free_table; > + } > + } > + } > + > ret = cpufreq_register_driver(&cpu0_cpufreq_driver); > if (ret) { > pr_err("failed register driver: %d\n", ret); > -- > 1.7.9.5 > -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 16 November 2013 19:14, Shawn Guo <shawn.guo@linaro.org> wrote: > No, it's not a kernel bug. > > OPP is not a definition that belongs to kernel. Instead, it's > characteristics of hardware, and that's why we can naturally put the > definition into device tree. Bear it in mind that device tree is a > hardware description and should be OS agnostic. So it shouldn't be > treated as part of Linux kernel in any case, even though the device > tree source is currently maintained in kernel tree. > > Device tree is designed as a way for firmware/bootloader to describe > hardware to kernel. That said, device tree is more part of bootloader > than kernel. If bootloader runs at a frequency that does not match the > OPP in device tree, the one should be fixed is either bootloader or > device tree but never kernel. I agree for all that.. > However, I agree we should at least have a check in cpufreq-cpu0 driver > and fail out in case that a mismatch is detected. But not here.. We aren't in a non-workable state here.. and so creating panic isn't the right approach. At max we can print an warning but then it doesn't lie in cpufreq-cpu0's domain. It should be done in core if required.. Though for Shawn's information we have another thread in parallel for this issue. You might like to check that too.. https://lkml.org/lkml/2013/11/15/503 -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/16/2013 10:02 PM, Viresh Kumar wrote: > On 16 November 2013 19:14, Shawn Guo <shawn.guo@linaro.org> wrote: >> No, it's not a kernel bug. >> >> OPP is not a definition that belongs to kernel. Instead, it's >> characteristics of hardware, and that's why we can naturally put the >> definition into device tree. Bear it in mind that device tree is a >> hardware description and should be OS agnostic. So it shouldn't be >> treated as part of Linux kernel in any case, even though the device >> tree source is currently maintained in kernel tree. >> >> Device tree is designed as a way for firmware/bootloader to describe >> hardware to kernel. That said, device tree is more part of bootloader >> than kernel. If bootloader runs at a frequency that does not match the >> OPP in device tree, the one should be fixed is either bootloader or >> device tree but never kernel. > > I agree for all that.. I do not agree about this stance - device tree describes hardware capabilities to kernel -> I agree with that statement. Kernel should not care if that is provided by bootloader/firmware/fused into flash/ROM etc. If we agree with that statement, the moment kernel sees device operating in a configuration that is not hardware capability, it is our job to switch to the right configuration if it is possible for us to do so (which in this case, we can). to give a comparison - if i2c bus was configured for 3.4MHz and device tree describes the bus to operate at 100KHz, do we ignore device tree recommended configuration? No. we would switch to 100KHz. as far as kernel is concerned, it can detect invalid configuration and switch back to a valid configuration based on hardware description provided by device tree.
On Mon, Nov 18, 2013 at 08:45:12AM -0600, Nishanth Menon wrote: <snip> > I do not agree about this stance - device tree describes hardware > capabilities to kernel -> I agree with that statement. Kernel should > not care if that is provided by bootloader/firmware/fused into > flash/ROM etc. Yea, ideally kernel should just believe that the device tree is always the correct one. But the following text in your commit log suggests that might not be always the case. | a) older or default bootloader in development platform without latest | updates | b) SoC documentation update that may have occurred in kernel | c) kernel definitions are out of date Vs bootloader which is updated | etc.. | | In these cases, we should assume from a kernel perspective, the only | safe frequency that the system can be on is the ones available in the | OPP table. This may not handle case (c), but, that is a different | kernel bug of it's own. It says that the device tree might be wrong as well. That's why I think the best/safest thing that kernel can do when seeing a mismatch is to throw out an error message and fail out. Shawn > > If we agree with that statement, the moment kernel sees device > operating in a configuration that is not hardware capability, it is > our job to switch to the right configuration if it is possible for us > to do so (which in this case, we can). to give a comparison - if i2c > bus was configured for 3.4MHz and device tree describes the bus to > operate at 100KHz, do we ignore device tree recommended configuration? > No. we would switch to 100KHz. as far as kernel is concerned, it can > detect invalid configuration and switch back to a valid configuration > based on hardware description provided by device tree. > > > > -- > Regards, > Nishanth Menon -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/18/2013 09:57 AM, Shawn Guo wrote: > On Mon, Nov 18, 2013 at 08:45:12AM -0600, Nishanth Menon wrote: > <snip> >> I do not agree about this stance - device tree describes hardware >> capabilities to kernel -> I agree with that statement. Kernel should >> not care if that is provided by bootloader/firmware/fused into >> flash/ROM etc. > > Yea, ideally kernel should just believe that the device tree is always > the correct one. But the following text in your commit log suggests > that might not be always the case. > > | a) older or default bootloader in development platform without latest > | updates > | b) SoC documentation update that may have occurred in kernel > | c) kernel definitions are out of date Vs bootloader which is updated > | etc.. > | > | In these cases, we should assume from a kernel perspective, the only > | safe frequency that the system can be on is the ones available in the > | OPP table. This may not handle case (c), but, that is a different > | kernel bug of it's own. > > It says that the device tree might be wrong as well. That's why I > think the best/safest thing that kernel can do when seeing a mismatch > is to throw out an error message and fail out. In the case of mismatch, to consider that device tree may be wrong in driver is also to assume that hardware was always configured correctly and we assume description is the flawed data. That is just plain wrong. we need to assume device tree is the correct description of the hardware and any mismatch must be assumed as bad configuration - and this is what the patch does. Now, if the description is wrong, that is a dts bug of it's own. If the suggestion is to improve my commit message, I am more than happy to do so - please suggest how I could improve the same. > > Shawn > >> >> If we agree with that statement, the moment kernel sees device >> operating in a configuration that is not hardware capability, it is >> our job to switch to the right configuration if it is possible for us >> to do so (which in this case, we can). to give a comparison - if i2c >> bus was configured for 3.4MHz and device tree describes the bus to >> operate at 100KHz, do we ignore device tree recommended configuration? >> No. we would switch to 100KHz. as far as kernel is concerned, it can >> detect invalid configuration and switch back to a valid configuration >> based on hardware description provided by device tree. >> >> >> >> -- >> Regards, >> Nishanth Menon >
On Mon, Nov 18, 2013 at 10:41:36AM -0600, Nishanth Menon wrote: > In the case of mismatch, to consider that device tree may be wrong in > driver is also to assume that hardware was always configured correctly > and we assume description is the flawed data. No, I did not say that. IMO, when cpufreq-cpu0 sees a mismatch, it has no way to know or assume which one is correct and which is incorrect. The best thing it can do is to fail out without changing anything about running frequency and voltage. Shawn > That is just plain > wrong. we need to assume device tree is the correct description of the > hardware and any mismatch must be assumed as bad configuration - and > this is what the patch does. > > Now, if the description is wrong, that is a dts bug of it's own. > > If the suggestion is to improve my commit message, I am more than > happy to do so - please suggest how I could improve the same. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 19 November 2013 07:51, Shawn Guo <shawn.guo@linaro.org> wrote: > No, I did not say that. IMO, when cpufreq-cpu0 sees a mismatch, it has > no way to know or assume which one is correct and which is incorrect. > The best thing it can do is to fail out without changing anything about > running frequency and voltage. Not specifically on this patch, but this is what I feel about this issue: - As we are discussing on the other thread, there is scope of adding "unknown" field in tables so that people would know that they were running out of table freq at some point.. - This is a common problem for all drivers/platforms and not only cpufreq-cpu0, so the solution has to be generic and not driver specific.. So, atleast I don't want to get this patch in at any cost, unless there is a generic solution present.. - There are non-dt drivers as well, and so freq table is present with the kernel and we can't support all frequencies that bootloader may end up with.. -- viresh -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/18/2013 09:46 PM, Viresh Kumar wrote: > On 19 November 2013 07:51, Shawn Guo <shawn.guo@linaro.org> wrote: >> No, I did not say that. IMO, when cpufreq-cpu0 sees a mismatch, it has >> no way to know or assume which one is correct and which is incorrect. >> The best thing it can do is to fail out without changing anything about >> running frequency and voltage. > > Not specifically on this patch, but this is what I feel about this issue: > > - As we are discussing on the other thread, there is scope of adding > "unknown" field in tables so that people would know that they were > running out of table freq at some point.. Consider something like userspace governor selection -> the device at boot will probably remain in an unknown/"invalid" configuration till the very first transition attempt. I am less worried about the stats than not following what the hardware description is (as stated by device tree/other forms). I staunchly disagree that at a point of mismatch detection, we just refuse to load up cpufreq governor -even though we know from device tree/other alternative entries what the hardware behavior is supposed to be. To refuse to loadup to a known configuration is considering the "valid configuration" data provided to the driver is wrong - an equivalent(considering the i2c example) is that if i2c driver sees bus configured for 3.4MHz and was asked to use 100KHz, it just refuses to load up! > - This is a common problem for all drivers/platforms and not only > cpufreq-cpu0, so the solution has to be generic and not driver > specific.. So, atleast I don't want to get this patch in at any cost, > unless there is a generic solution present.. > - There are non-dt drivers as well, and so freq table is present > with the kernel and we can't support all frequencies that bootloader > may end up with.. The above two are fair comments -> but that implies that policy->cur population should no longer be the responsibility of cpufreq drivers and be the responsibility of cpufreq core. are we stating we want to move that to cpufreq core?
On 19 November 2013 19:46, Nishanth Menon <nm@ti.com> wrote: > Consider something like userspace governor selection -> the device at > boot will probably remain in an unknown/"invalid" configuration till > the very first transition attempt. I am less worried about the stats > than not following what the hardware description is (as stated by > device tree/other forms). > > I staunchly disagree that at a point of mismatch detection, we just > refuse to load up cpufreq governor -even though we know from device > tree/other alternative entries what the hardware behavior is supposed > to be. To refuse to loadup to a known configuration is considering the > "valid configuration" data provided to the driver is wrong - an > equivalent(considering the i2c example) is that if i2c driver sees bus > configured for 3.4MHz and was asked to use 100KHz, it just refuses to > load up! CPU looks to be a bit different in that aspect as compared to I2C. We aren't really sure if I2C will work at the existing freq but we are 100% sure that current freq of CPU is valid enough, otherwise we wouldn't have reached to this point.. :) > The above two are fair comments -> but that implies that policy->cur > population should no longer be the responsibility of cpufreq drivers > and be the responsibility of cpufreq core. are we stating we want to > move that to cpufreq core? I am sure you want to have a look at this: commit da60ce9f2faca87013fd3cab1c3bed5183608c3d Author: Viresh Kumar <viresh.kumar@linaro.org> Date: Thu Oct 3 20:28:30 2013 +0530 cpufreq: call cpufreq_driver->get() after calling ->init() Almost all drivers set policy->cur with current CPU frequency in their ->init() part. This can be done for all of them at core level and so they wouldn't need to do it. This patch adds supporting code in cpufreq core for calling get() after we have called init() for a policy. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> --- drivers/cpufreq/cpufreq.c | 11 +++++++++++ 1 file changed, 11 insertions(+) -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/19/2013 08:26 AM, Viresh Kumar wrote: > On 19 November 2013 19:46, Nishanth Menon <nm@ti.com> wrote: >> Consider something like userspace governor selection -> the device at >> boot will probably remain in an unknown/"invalid" configuration till >> the very first transition attempt. I am less worried about the stats >> than not following what the hardware description is (as stated by >> device tree/other forms). >> >> I staunchly disagree that at a point of mismatch detection, we just >> refuse to load up cpufreq governor -even though we know from device >> tree/other alternative entries what the hardware behavior is supposed >> to be. To refuse to loadup to a known configuration is considering the >> "valid configuration" data provided to the driver is wrong - an >> equivalent(considering the i2c example) is that if i2c driver sees bus >> configured for 3.4MHz and was asked to use 100KHz, it just refuses to >> load up! > > CPU looks to be a bit different in that aspect as compared to I2C. We > aren't really sure if I2C will work at the existing freq but we are 100% > sure that current freq of CPU is valid enough, otherwise we wouldn't > have reached to this point.. :) > Not completely true - reaching probe after boot in a few seconds may not mean that system will remain stable at that frequency for longer duration. From a silicon vendor perspective, I do know that we gaurentee the discrete frequencies in the data manual (and that gets populated in devicetree and hence in freq_table), but we will not guarentee any other frequency to be functional for any length of time. in short, if a actual product is manufactured and operational at a frequency we do not "officially support", there is a risk associated with that. just a boot on a few development systems do not ever guarentee productization capability. >> The above two are fair comments -> but that implies that policy->cur >> population should no longer be the responsibility of cpufreq drivers >> and be the responsibility of cpufreq core. are we stating we want to >> move that to cpufreq core? > > I am sure you want to have a look at this: my bad. I missed this one. So, to summarize: what is our overall strategy here? to move to a frequency matched in freq_table OR just giveup? I can try and respin accordingly. > > commit da60ce9f2faca87013fd3cab1c3bed5183608c3d > Author: Viresh Kumar <viresh.kumar@linaro.org> > Date: Thu Oct 3 20:28:30 2013 +0530 > > cpufreq: call cpufreq_driver->get() after calling ->init() > > Almost all drivers set policy->cur with current CPU frequency in > their ->init() > part. This can be done for all of them at core level and so they > wouldn't need > to do it. > > This patch adds supporting code in cpufreq core for calling get() > after we have > called init() for a policy. > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > drivers/cpufreq/cpufreq.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) >
On 19 November 2013 20:29, Nishanth Menon <nm@ti.com> wrote: > Not completely true - reaching probe after boot in a few seconds may > not mean that system will remain stable at that frequency for longer > duration. From a silicon vendor perspective, I do know that we > gaurentee the discrete frequencies in the data manual (and that gets > populated in devicetree and hence in freq_table), but we will not > guarentee any other frequency to be functional for any length of time. > in short, if a actual product is manufactured and operational at a > frequency we do not "officially support", there is a risk associated > with that. just a boot on a few development systems do not ever > guarentee productization capability. Maybe for a long duration system isn't stable enough but should be stable for few seconds Atleast? As soon as ->init() of driver is called we will get a call to: cpufreq_set_policy() from cpufreq_init_policy(). And we will execute this for sure: ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); On this call if the current freq is lesser or greater than policy limits, then we will fix it straight away.. Otherwise whichever the governor is, we will change the freq to any from freq-table very soon.. So, we need a real example of unstable system which really requires your patch. Otherwise I feel that we will not face any problems at all.. > So, to summarize: what is our overall strategy here? to move to a > frequency matched in freq_table OR just giveup? I can try and respin > accordingly. I want cpufreq-stats to be fixed with something else, might be something similar to the patch I have sent.. But this stuff can be left as is.. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/19/2013 09:32 AM, Viresh Kumar wrote: > On 19 November 2013 20:29, Nishanth Menon <nm@ti.com> wrote: >> Not completely true - reaching probe after boot in a few seconds may >> not mean that system will remain stable at that frequency for longer >> duration. From a silicon vendor perspective, I do know that we >> gaurentee the discrete frequencies in the data manual (and that gets >> populated in devicetree and hence in freq_table), but we will not >> guarentee any other frequency to be functional for any length of time. >> in short, if a actual product is manufactured and operational at a >> frequency we do not "officially support", there is a risk associated >> with that. just a boot on a few development systems do not ever >> guarentee productization capability. > > Maybe for a long duration system isn't stable enough but should be > stable for few seconds Atleast? yes. > > As soon as ->init() of driver is called we will get a call to: > cpufreq_set_policy() from cpufreq_init_policy(). And we will execute > this for sure: > > ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); > > On this call if the current freq is lesser or greater than policy limits, > then we will fix it straight away.. Otherwise whichever the governor > is, we will change the freq to any from freq-table very soon.. is that true for userspace governor (CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE)? /sys/devices/system/cpu/cpu0/cpufreq $ cat scaling_available_frequencies 500000 1000000 1500000 /sys/devices/system/cpu/cpu0/cpufreq $ cat scaling_cur_freq 1100000 > > So, we need a real example of unstable system which really > requires your patch. Otherwise I feel that we will not face any problems > at all.. OMAP5-UEVM will remain at this frequency for a long period of time with AVS voltage(Adaptive Voltage Scaling technique used in OMAP to optimize operational voltage) that was meant for 1GHz! that is definitely not stable if there is no further transition to a valid frequency. > >> So, to summarize: what is our overall strategy here? to move to a >> frequency matched in freq_table OR just giveup? I can try and respin >> accordingly. > > I want cpufreq-stats to be fixed with something else, might be something > similar to the patch I have sent.. > > But this stuff can be left as is.. An alternative might be to ensure CPUFREQ_GOV_LIMITS takes care of that?
On 19 November 2013 21:18, Nishanth Menon <nm@ti.com> wrote: > is that true for userspace governor > (CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE)? > /sys/devices/system/cpu/cpu0/cpufreq $ cat scaling_available_frequencies > 500000 1000000 1500000 > > /sys/devices/system/cpu/cpu0/cpufreq $ cat scaling_cur_freq > 1100000 No, but userspace governor must take care of this stuff as it want's to change freq from userspace.. > OMAP5-UEVM will remain at this frequency for a long period of time > with AVS voltage(Adaptive Voltage Scaling technique used in OMAP to > optimize operational voltage) that was meant for 1GHz! that is > definitely not stable if there is no further transition to a valid > frequency. I understand that point, but will this stay for a long time at that freq? Why aren't governors coming into picture here? > An alternative might be to ensure CPUFREQ_GOV_LIMITS takes care of that? Again, I don't see if we really really need to do this at all.. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/19/2013 11:10 AM, Viresh Kumar wrote: > On 19 November 2013 21:18, Nishanth Menon <nm@ti.com> wrote: >> is that true for userspace governor >> (CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE)? >> /sys/devices/system/cpu/cpu0/cpufreq $ cat scaling_available_frequencies >> 500000 1000000 1500000 >> >> /sys/devices/system/cpu/cpu0/cpufreq $ cat scaling_cur_freq >> 1100000 > > No, but userspace governor must take care of this stuff as it want's > to change freq from userspace.. > Right, the point is that CPUFREQ_GOV_LIMITS wont help in the case when it is within min-max bound.. the first transition will help bring it to a sane value - that I agree. >> OMAP5-UEVM will remain at this frequency for a long period of time >> with AVS voltage(Adaptive Voltage Scaling technique used in OMAP to >> optimize operational voltage) that was meant for 1GHz! that is >> definitely not stable if there is no further transition to a valid >> frequency. > > I understand that point, but will this stay for a long time at that freq? > Why aren't governors coming into picture here? we depend on the first transition to take us to a sane configuration - but we cannot predict when and if it will happen.
On Tuesday 19 November 2013 11:13 PM, Nishanth Menon wrote: > we depend on the first transition to take us to a sane configuration - > but we cannot predict when and if it will happen. I really believe that it happens fairly quickly, isn't it? We straight away start the sampling of load and withing few milliseconds we must be fixing the freq.. We aren't going to stay for the unknown, might be unstable, freq for ever.. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/19/2013 11:24 PM, viresh kumar wrote: > On Tuesday 19 November 2013 11:13 PM, Nishanth Menon wrote: >> we depend on the first transition to take us to a sane configuration - >> but we cannot predict when and if it will happen. > > I really believe that it happens fairly quickly, isn't it? We straight away > start the sampling of load and withing few milliseconds we must be fixing the freq.. that heavily depends on the governor - and we all know the variants of ondemand governors that various distributions use. I cannot say (having not studied all governors out there) if this will take place in a few milliseconds or a few minutes or hours. > > We aren't going to stay for the unknown, might be unstable, freq for ever.. > With the current governors that we have in upstream, the only one of my concern has been userspace governor, but based on your comment earlier in the thread, this is considered an non-issue since userspace must trigger transition. However, this does put the SoC at risk depending on distro and custom governors used. If the opinion is that we dont care about these, well.. I can end my complaints and depend on the stats to tell me if an unknown frequency was ever attempted for debug (even though I might personally not be too excited about it).
On 20 November 2013 20:29, Nishanth Menon <nm@ti.com> wrote: > With the current governors that we have in upstream, the only one of > my concern has been userspace governor, but based on your comment > earlier in the thread, this is considered an non-issue since userspace > must trigger transition. However, this does put the SoC at risk > depending on distro and custom governors used. > > If the opinion is that we dont care about these, well.. I can end my > complaints and depend on the stats to tell me if an unknown frequency > was ever attempted for debug (even though I might personally not be > too excited about it). I wouldn't say that I was against this patch, but it looked like we could live without it as well.. But I also feel that there is some amount of uncertainty here and specially with userspace governor.. So, just see the patch which I have floated as replacement for few of the patches we have been talking about.. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/cpufreq/cpufreq-cpu0.c b/drivers/cpufreq/cpufreq-cpu0.c index c522a95..9765050 100644 --- a/drivers/cpufreq/cpufreq-cpu0.c +++ b/drivers/cpufreq/cpufreq-cpu0.c @@ -176,7 +176,8 @@ static struct cpufreq_driver cpu0_cpufreq_driver = { static int cpu0_cpufreq_probe(struct platform_device *pdev) { struct device_node *np; - int ret; + int ret, i; + long boot_freq; cpu_dev = get_cpu_device(0); if (!cpu_dev) { @@ -232,7 +233,6 @@ static int cpu0_cpufreq_probe(struct platform_device *pdev) if (!IS_ERR(cpu_reg)) { struct opp *opp; unsigned long min_uV, max_uV; - int i; /* * OPP is maintained in order of increasing frequency, and @@ -254,6 +254,86 @@ static int cpu0_cpufreq_probe(struct platform_device *pdev) transition_latency += ret * 1000; } + boot_freq = clk_get_rate(cpu_clk); + + /* See if we have a perfect match */ + for (i = 0; freq_table[i].frequency != CPUFREQ_TABLE_END; i++) + if (boot_freq == freq_table[i].frequency * 1000) + break; + + /* If we have a bad bootloader config, try recovery */ + if (freq_table[i].frequency == CPUFREQ_TABLE_END) { + struct opp *opp; + long new_freq = boot_freq, freq_exact; + unsigned long volt = 0, tol = 0; + + ret = 0; + rcu_read_lock(); + + /* Try a conservative match */ + opp = opp_find_freq_floor(cpu_dev, &new_freq); + + /* If we did not get a floor match, try least available freq */ + if (IS_ERR(opp)) { + new_freq = freq_table[0].frequency * 1000; + opp = opp_find_freq_exact(cpu_dev, new_freq, true); + } + if (IS_ERR(opp)) + ret = -ERANGE; + if (!IS_ERR(opp) && !IS_ERR(cpu_reg)) { + volt = opp_get_voltage(opp); + tol = volt * voltage_tolerance / 100; + } + rcu_read_unlock(); + if (ret) { + pr_err("Fail to find match boot clock rate: %lu\n", + boot_freq); + goto out_free_table; + } + + /* We dont expect to endup with same result */ + WARN_ON(boot_freq == new_freq); + + freq_exact = clk_round_rate(cpu_clk, new_freq); + if (freq_exact < 0) { + pr_err("Fail to find valid boot clock rate: %lu\n", + freq_exact); + goto out_free_table; + } + + /* Warn to get developer to fix bootloader */ + pr_err("Bootloader freq %luHz no match to table, Using %luHz\n", + boot_freq, new_freq); + + /* + * For voltage sequencing we *assume* that bootloader has at + * least set the voltage appropriate for the boot_frequency + */ + if (!IS_ERR(cpu_reg) && boot_freq < new_freq) { + ret = regulator_set_voltage_tol(cpu_reg, volt, tol); + if (ret) { + pr_err("Fail to scale boot voltage up: %d\n", + ret); + goto out_free_table; + } + } + + ret = clk_set_rate(cpu_clk, freq_exact); + if (ret) { + pr_err("Fail to set boot clock rate: %d\n", ret); + goto out_free_table; + } + + if (!IS_ERR(cpu_reg) && boot_freq > new_freq) { + ret = regulator_set_voltage_tol(cpu_reg, volt, tol); + if (ret) { + pr_err("Fail to scale boot voltage down: %d\n", + ret); + goto out_free_table; + } + } + } + ret = cpufreq_register_driver(&cpu0_cpufreq_driver); if (ret) { pr_err("failed register driver: %d\n", ret);
On many development platforms, at boot, the bootloader configured frequency maynot match the valid frequencies that are stated to be supported in OPP table. This may occur due to various reasons: a) older or default bootloader in development platform without latest updates b) SoC documentation update that may have occurred in kernel c) kernel definitions are out of date Vs bootloader which is updated etc.. In these cases, we should assume from a kernel perspective, the only safe frequency that the system can be on is the ones available in the OPP table. This may not handle case (c), but, that is a different kernel bug of it's own. Considering that in many existing or development platforms, (a) or (b) is common, enforce a sanity check and reprogram to a conservative start configuration at probe to allow sane operation independent of bootloader. Reported-by: Carlos Hernandez <ceh@ti.com> Signed-off-by: Nishanth Menon <nm@ti.com> --- based on v3.12 tag - however, a rebase with opp function name change is needed for v3.13-rc1 if folks are ok, I can repost with updates. Identified when during debug of https://patchwork.kernel.org/patch/3191411/ on TI vendor kernel based on v3.12 tag. In the case discussed, bootloader booted OMAP5uEVM at 1.1GHz when the available frequencies were 500MHz, 1GHz, 1.5GHz. To prevent system from functioning at this invalid configuration (and hence trigger the bug seen in stats), we should remove the dependence of the kernel from bootloader configuration. With this patch, in the mentioned boot scenario, we get the following log: [ 25.649736] cpufreq_cpu0: bootloader freq 1100000000 no match to table, Using 1000000000 Artificially modifying the bootloader to create other boundary conditions: (lower bound check) [ 25.633535] cpufreq_cpu0: bootloader freq 300000000Hz no match to table, Using 500000000Hz (upper bound check) [ 27.355837] cpufreq_cpu0: bootloader freq 1600000000Hz no match to table, Using 1500000000Hz The other alternative(to reduce code churn) would be to just report a mismatch and continue to function at the potentially risky OPP - but in the cases such as using userspace governor, the system could be in unstable state resulting in hard to debug behavior. The last alternative is to always expect bootloader to be in sync with proper OPP configuration, which rarely happens correctly. drivers/cpufreq/cpufreq-cpu0.c | 84 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 82 insertions(+), 2 deletions(-)