diff mbox

[v7,3/8] cpufreq: kirkwood: Remove use of the clk provider API

Message ID 1408375833-10703-4-git-send-email-tomeu.vizoso@collabora.com (mailing list archive)
State New, archived
Headers show

Commit Message

Tomeu Vizoso Aug. 18, 2014, 3:30 p.m. UTC
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
---
 drivers/cpufreq/kirkwood-cpufreq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Tomeu Vizoso Aug. 21, 2014, 7:53 a.m. UTC | #1
On 08/21/2014 12:55 AM, Mike Turquette wrote:
> Quoting Tomeu Vizoso (2014-08-18 08:30:29)
>> Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
>> ---
>>   drivers/cpufreq/kirkwood-cpufreq.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/cpufreq/kirkwood-cpufreq.c b/drivers/cpufreq/kirkwood-cpufreq.c
>> index 37a4806..f3d087f 100644
>> --- a/drivers/cpufreq/kirkwood-cpufreq.c
>> +++ b/drivers/cpufreq/kirkwood-cpufreq.c
>> @@ -12,7 +12,6 @@
>>   #include <linux/kernel.h>
>>   #include <linux/module.h>
>>   #include <linux/clk.h>
>> -#include <linux/clk-provider.h>
>>   #include <linux/cpufreq.h>
>>   #include <linux/of_device.h>
>>   #include <linux/platform_device.h>
>> @@ -50,7 +49,7 @@ static struct cpufreq_frequency_table kirkwood_freq_table[] = {
>>
>>   static unsigned int kirkwood_cpufreq_get_cpu_frequency(unsigned int cpu)
>>   {
>> -       if (__clk_is_enabled(priv.powersave_clk))
>> +       if (clk_is_enabled(priv.powersave_clk))
>>                  return kirkwood_freq_table[1].frequency;
>>          return kirkwood_freq_table[0].frequency;
>>   }
>> --
>> 1.9.3
>>
>
> Tomeu,
>
> After taking a second look at clk_is_enabled and the Kirkwood driver, I
> would prefer to not implement clk_is_enabled. The main reason is that it
> is racey, since the clock's status could of course change as soon as as
> that call completes. Furthermore I am worried that drivers might do
> something like:
>
> if (!clk_is_enabled(clk))
> 	clk_enable(clk);
>
> Which is crap and the driver should just call clk_enable any time it
> needs the clock. To that end I propose to drop "clk: provide public
> clk_is_enabled function" and replace your update to kirkwood-cpufreq.c
> with the following patch. Let me know what you think.

Yep, this looks much better to me. I will replace 2/8 and 3/8 with this 
one for v8.

Reviewed-By: Tomeu Vizoso <tomeu.vizoso@collabora.com>

Thanks!

Tomeu

> Regards,
> Mike
>
>
>
>  From 713b46fd66abd26a4e4d584a0b8a2a28c0459a60 Mon Sep 17 00:00:00 2001
> From: Mike Turquette <mturquette@linaro.org>
> Date: Mon, 18 Aug 2014 17:30:29 +0200
> Subject: [PATCH] cpufreq: kirkwood: Remove use of the clk provider API
>
> Instead of checking the clock framework to see if the clock is enabled,
> track the cpu frequency within the driver as the powersave_clk is
> enabled and disabled.
>
> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Mike Turquette <mturquette@linaro.org>
> ---
>   drivers/cpufreq/kirkwood-cpufreq.c | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpufreq/kirkwood-cpufreq.c b/drivers/cpufreq/kirkwood-cpufreq.c
> index 37a4806..88f4641 100644
> --- a/drivers/cpufreq/kirkwood-cpufreq.c
> +++ b/drivers/cpufreq/kirkwood-cpufreq.c
> @@ -12,7 +12,6 @@
>   #include <linux/kernel.h>
>   #include <linux/module.h>
>   #include <linux/clk.h>
> -#include <linux/clk-provider.h>
>   #include <linux/cpufreq.h>
>   #include <linux/of_device.h>
>   #include <linux/platform_device.h>
> @@ -33,6 +32,8 @@ static struct priv
>   #define STATE_CPU_FREQ 0x01
>   #define STATE_DDR_FREQ 0x02
>
> +static unsigned long cpu_frequency = 0;
> +
>   /*
>    * Kirkwood can swap the clock to the CPU between two clocks:
>    *
> @@ -50,9 +51,7 @@ static struct cpufreq_frequency_table kirkwood_freq_table[] = {
>
>   static unsigned int kirkwood_cpufreq_get_cpu_frequency(unsigned int cpu)
>   {
> -	if (__clk_is_enabled(priv.powersave_clk))
> -		return kirkwood_freq_table[1].frequency;
> -	return kirkwood_freq_table[0].frequency;
> +	return cpu_frequency;
>   }
>
>   static int kirkwood_cpufreq_target(struct cpufreq_policy *policy,
> @@ -71,9 +70,11 @@ static int kirkwood_cpufreq_target(struct cpufreq_policy *policy,
>   	switch (state) {
>   	case STATE_CPU_FREQ:
>   		clk_disable(priv.powersave_clk);
> +		cpu_frequency = kirkwood_freq_table[0].frequency;
>   		break;
>   	case STATE_DDR_FREQ:
>   		clk_enable(priv.powersave_clk);
> +		cpu_frequency = kirkwood_freq_table[1].frequency;
>   		break;
>   	}
>
> @@ -133,6 +134,7 @@ static int kirkwood_cpufreq_probe(struct platform_device *pdev)
>
>   	clk_prepare_enable(priv.cpu_clk);
>   	kirkwood_freq_table[0].frequency = clk_get_rate(priv.cpu_clk) / 1000;
> +	cpu_frequency = kirkwood_freq_table[0].frequency;
>
>   	priv.ddr_clk = of_clk_get_by_name(np, "ddrclk");
>   	if (IS_ERR(priv.ddr_clk)) {
>
Andrew Lunn Aug. 21, 2014, 1:38 p.m. UTC | #2
On Thu, Aug 21, 2014 at 09:53:43AM +0200, Tomeu Vizoso wrote:
> On 08/21/2014 12:55 AM, Mike Turquette wrote:
> >Quoting Tomeu Vizoso (2014-08-18 08:30:29)
> >>Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> >>---
> >>  drivers/cpufreq/kirkwood-cpufreq.c | 3 +--
> >>  1 file changed, 1 insertion(+), 2 deletions(-)
> >>
> >>diff --git a/drivers/cpufreq/kirkwood-cpufreq.c b/drivers/cpufreq/kirkwood-cpufreq.c
> >>index 37a4806..f3d087f 100644
> >>--- a/drivers/cpufreq/kirkwood-cpufreq.c
> >>+++ b/drivers/cpufreq/kirkwood-cpufreq.c
> >>@@ -12,7 +12,6 @@
> >>  #include <linux/kernel.h>
> >>  #include <linux/module.h>
> >>  #include <linux/clk.h>
> >>-#include <linux/clk-provider.h>
> >>  #include <linux/cpufreq.h>
> >>  #include <linux/of_device.h>
> >>  #include <linux/platform_device.h>
> >>@@ -50,7 +49,7 @@ static struct cpufreq_frequency_table kirkwood_freq_table[] = {
> >>
> >>  static unsigned int kirkwood_cpufreq_get_cpu_frequency(unsigned int cpu)
> >>  {
> >>-       if (__clk_is_enabled(priv.powersave_clk))
> >>+       if (clk_is_enabled(priv.powersave_clk))
> >>                 return kirkwood_freq_table[1].frequency;
> >>         return kirkwood_freq_table[0].frequency;
> >>  }
> >>--
> >>1.9.3
> >>
> >
> >Tomeu,
> >
> >After taking a second look at clk_is_enabled and the Kirkwood driver, I
> >would prefer to not implement clk_is_enabled. The main reason is that it
> >is racey, since the clock's status could of course change as soon as as
> >that call completes. Furthermore I am worried that drivers might do
> >something like:
> >
> >if (!clk_is_enabled(clk))
> >	clk_enable(clk);
> >
> >Which is crap and the driver should just call clk_enable any time it
> >needs the clock. To that end I propose to drop "clk: provide public
> >clk_is_enabled function" and replace your update to kirkwood-cpufreq.c
> >with the following patch. Let me know what you think.

Hi Mike, Tomeu

> >+static unsigned long cpu_frequency = 0;


This has a problem. You are making an assumption about the initial
state. The way the hardware works, is you change the state of the
clock and then perform a Wait For Interrupt. Once the hardware has
finished adjusting its PLL, it raises an interrupt and things
continue.

However, if you don't cause an actual state change, the WFI never
returns. If this assumption is wrong, your box is dead the first time
it tries to change cpu frequency.

This is why the code reads the hardware register to find the real
current state, rather than assume it.

	Andrew

> >+
> >  /*
> >   * Kirkwood can swap the clock to the CPU between two clocks:
> >   *
> >@@ -50,9 +51,7 @@ static struct cpufreq_frequency_table kirkwood_freq_table[] = {
> >
> >  static unsigned int kirkwood_cpufreq_get_cpu_frequency(unsigned int cpu)
> >  {
> >-	if (__clk_is_enabled(priv.powersave_clk))
> >-		return kirkwood_freq_table[1].frequency;
> >-	return kirkwood_freq_table[0].frequency;
> >+	return cpu_frequency;
> >  }
> >
> >  static int kirkwood_cpufreq_target(struct cpufreq_policy *policy,
> >@@ -71,9 +70,11 @@ static int kirkwood_cpufreq_target(struct cpufreq_policy *policy,
> >  	switch (state) {
> >  	case STATE_CPU_FREQ:
> >  		clk_disable(priv.powersave_clk);
> >+		cpu_frequency = kirkwood_freq_table[0].frequency;
> >  		break;
> >  	case STATE_DDR_FREQ:
> >  		clk_enable(priv.powersave_clk);
> >+		cpu_frequency = kirkwood_freq_table[1].frequency;
> >  		break;
> >  	}
> >
> >@@ -133,6 +134,7 @@ static int kirkwood_cpufreq_probe(struct platform_device *pdev)
> >
> >  	clk_prepare_enable(priv.cpu_clk);
> >  	kirkwood_freq_table[0].frequency = clk_get_rate(priv.cpu_clk) / 1000;
> >+	cpu_frequency = kirkwood_freq_table[0].frequency;
> >
> >  	priv.ddr_clk = of_clk_get_by_name(np, "ddrclk");
> >  	if (IS_ERR(priv.ddr_clk)) {
> >
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Mike Turquette Aug. 22, 2014, 7:29 p.m. UTC | #3
Quoting Andrew Lunn (2014-08-21 06:38:25)
> On Thu, Aug 21, 2014 at 09:53:43AM +0200, Tomeu Vizoso wrote:
> > On 08/21/2014 12:55 AM, Mike Turquette wrote:
> > >Quoting Tomeu Vizoso (2014-08-18 08:30:29)
> > >>Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> > >>---
> > >>  drivers/cpufreq/kirkwood-cpufreq.c | 3 +--
> > >>  1 file changed, 1 insertion(+), 2 deletions(-)
> > >>
> > >>diff --git a/drivers/cpufreq/kirkwood-cpufreq.c b/drivers/cpufreq/kirkwood-cpufreq.c
> > >>index 37a4806..f3d087f 100644
> > >>--- a/drivers/cpufreq/kirkwood-cpufreq.c
> > >>+++ b/drivers/cpufreq/kirkwood-cpufreq.c
> > >>@@ -12,7 +12,6 @@
> > >>  #include <linux/kernel.h>
> > >>  #include <linux/module.h>
> > >>  #include <linux/clk.h>
> > >>-#include <linux/clk-provider.h>
> > >>  #include <linux/cpufreq.h>
> > >>  #include <linux/of_device.h>
> > >>  #include <linux/platform_device.h>
> > >>@@ -50,7 +49,7 @@ static struct cpufreq_frequency_table kirkwood_freq_table[] = {
> > >>
> > >>  static unsigned int kirkwood_cpufreq_get_cpu_frequency(unsigned int cpu)
> > >>  {
> > >>-       if (__clk_is_enabled(priv.powersave_clk))
> > >>+       if (clk_is_enabled(priv.powersave_clk))
> > >>                 return kirkwood_freq_table[1].frequency;
> > >>         return kirkwood_freq_table[0].frequency;
> > >>  }
> > >>--
> > >>1.9.3
> > >>
> > >
> > >Tomeu,
> > >
> > >After taking a second look at clk_is_enabled and the Kirkwood driver, I
> > >would prefer to not implement clk_is_enabled. The main reason is that it
> > >is racey, since the clock's status could of course change as soon as as
> > >that call completes. Furthermore I am worried that drivers might do
> > >something like:
> > >
> > >if (!clk_is_enabled(clk))
> > >     clk_enable(clk);
> > >
> > >Which is crap and the driver should just call clk_enable any time it
> > >needs the clock. To that end I propose to drop "clk: provide public
> > >clk_is_enabled function" and replace your update to kirkwood-cpufreq.c
> > >with the following patch. Let me know what you think.
> 
> Hi Mike, Tomeu
> 
> > >+static unsigned long cpu_frequency = 0;
> 
> 
> This has a problem. You are making an assumption about the initial
> state. The way the hardware works, is you change the state of the
> clock and then perform a Wait For Interrupt. Once the hardware has
> finished adjusting its PLL, it raises an interrupt and things
> continue.

Andrew,

Thanks for reviewing. I think my patch is equivalent to the old way of
doing things, with one exception that I will address later below.

struct cpufreq_frequency_table kirkwood_freq_table has clock rates
initialized to zero, so there is no regression compared to my unsigned
long cpu_frequency variable used for tracking the clock rate. Both
implementations start with unknown rates in hardware and initialize a
variable to zero until that rate can be discovered later on in
kirkwood_cpufreq_probe().

kirkwood_cpufreq_get_cpu_frequency() returns the frequency based on the
state of the clock. As best I can tell, this clock is only touched by
this cpufreq driver and nowhere else, so the driver knows the state of
the clock implicitly and doesn't need to read any hardware registers to
see if it is enabled or not. Every time we enable or disable the clock
we can know the cpu frequency.

> 
> However, if you don't cause an actual state change, the WFI never
> returns. If this assumption is wrong, your box is dead the first time
> it tries to change cpu frequency.

So if a state change in hardware never occurs, the cpu will not wake up?
That sounds like a bad situation but I do not understand how it relates
to the changes I made in the driver. Could you explain how tracking
cpu_frequency in the driver would result in the cpu not waking up from
wfi?

> 
> This is why the code reads the hardware register to find the real
> current state, rather than assume it.

OK, so this is the point that I referenced above that needs to be
addressed. The *only* difference between my implementation and yours is
that you do read the enable bit on the powersave clock every time you
query the frequency. Note that this driver controls the state of the
powersave clock directly via calls to clk_enable & clk_disable.

Is there ever a case where hardware will change the state of the clock
behind our backs? If the driver calls clk_enable(priv.powersave_clk),
then is there ever a possibility that the clock will in fact be
disabled? Likewise if we disable the clock with a call to
clk_disable(priv.powersave_clk), is there ever an instance where the
hardware will re-enable that clock without telling us? Can the driver's
view of the clock status be out of sync with the actual hardware?

Thanks,
Mike

> 
>         Andrew
> 
> > >+
> > >  /*
> > >   * Kirkwood can swap the clock to the CPU between two clocks:
> > >   *
> > >@@ -50,9 +51,7 @@ static struct cpufreq_frequency_table kirkwood_freq_table[] = {
> > >
> > >  static unsigned int kirkwood_cpufreq_get_cpu_frequency(unsigned int cpu)
> > >  {
> > >-    if (__clk_is_enabled(priv.powersave_clk))
> > >-            return kirkwood_freq_table[1].frequency;
> > >-    return kirkwood_freq_table[0].frequency;
> > >+    return cpu_frequency;
> > >  }
> > >
> > >  static int kirkwood_cpufreq_target(struct cpufreq_policy *policy,
> > >@@ -71,9 +70,11 @@ static int kirkwood_cpufreq_target(struct cpufreq_policy *policy,
> > >     switch (state) {
> > >     case STATE_CPU_FREQ:
> > >             clk_disable(priv.powersave_clk);
> > >+            cpu_frequency = kirkwood_freq_table[0].frequency;
> > >             break;
> > >     case STATE_DDR_FREQ:
> > >             clk_enable(priv.powersave_clk);
> > >+            cpu_frequency = kirkwood_freq_table[1].frequency;
> > >             break;
> > >     }
> > >
> > >@@ -133,6 +134,7 @@ static int kirkwood_cpufreq_probe(struct platform_device *pdev)
> > >
> > >     clk_prepare_enable(priv.cpu_clk);
> > >     kirkwood_freq_table[0].frequency = clk_get_rate(priv.cpu_clk) / 1000;
> > >+    cpu_frequency = kirkwood_freq_table[0].frequency;
> > >
> > >     priv.ddr_clk = of_clk_get_by_name(np, "ddrclk");
> > >     if (IS_ERR(priv.ddr_clk)) {
> > >
> > 
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Andrew Lunn Aug. 22, 2014, 8:11 p.m. UTC | #4
> Thanks for reviewing. I think my patch is equivalent to the old way of
> doing things, with one exception that I will address later below.
> 
> struct cpufreq_frequency_table kirkwood_freq_table has clock rates
> initialized to zero, so there is no regression compared to my unsigned
> long cpu_frequency variable used for tracking the clock rate. Both
> implementations start with unknown rates in hardware and initialize a
> variable to zero until that rate can be discovered later on in
> kirkwood_cpufreq_probe().
> 
> kirkwood_cpufreq_get_cpu_frequency() returns the frequency based on the
> state of the clock. As best I can tell, this clock is only touched by
> this cpufreq driver and nowhere else

Not quite true. u-boot might of touch the clock. Weird things happen
with some kirkwood boards. Some don't have the ability to control
there power supplies. So some boards implement "power off" by
rebooting, and letting u-boot spin until a button is pressed. I hope
such a u-boot powers off as much as possible, and e.g. drops the CPU
clock to the lower frequency. One would also hope it puts it back to
high speed before calling the kernel.

There is also the somewhat unlikely case that somebody uses kexec to
go from one kernel to another. You have no idea what state the
previous kernel left the clock in.

> , so the driver knows the state of
> the clock implicitly and doesn't need to read any hardware registers to
> see if it is enabled or not. Every time we enable or disable the clock
> we can know the cpu frequency.
> 
> > 
> > However, if you don't cause an actual state change, the WFI never
> > returns. If this assumption is wrong, your box is dead the first time
> > it tries to change cpu frequency.
> 
> So if a state change in hardware never occurs, the cpu will not wake up?

Correct. I had that situation a few times while developing this
driver. And it is not obvious what has happened, since it does not
happen immediately, but when the governor decides the CPU load passes
a threshold and the frequency can be changed. I had it stop dead while
i assume it executed some sleep in an /etc/init.d script.

> That sounds like a bad situation but I do not understand how it relates
> to the changes I made in the driver. Could you explain how tracking
> cpu_frequency in the driver would result in the cpu not waking up from
> wfi?

It was clearer in earlier versions of the driver, but code has been
refactored into the cpufreq core. The core should call
kirkwood_cpufreq_get_cpu_frequency() in order to get the current
frequency, and only perform a change if the requested frequency is
different. In the current code, kirkwood_cpufreq_get_cpu_frequency()
reads from the hardware what the current frequency is. So we are
guaranteed to only call kirkwood_cpufreq_target() when there is a real
change.

[snip]

> Can the driver's
> view of the clock status be out of sync with the actual hardware?

Yes, at startup, as explained above, we have no idea what the current
state of the hardware is. Once we do know the real state, we can track
it within the driver, but we need to get that initial state correct,
or we WFI forever on the first frequency change.

   Andrew
Andrew Lunn Aug. 22, 2014, 8:27 p.m. UTC | #5
> It was clearer in earlier versions of the driver, but code has been
> refactored into the cpufreq core. The core should call
> kirkwood_cpufreq_get_cpu_frequency() in order to get the current
> frequency, and only perform a change if the requested frequency is
> different. In the current code, kirkwood_cpufreq_get_cpu_frequency()
> reads from the hardware what the current frequency is. So we are
> guaranteed to only call kirkwood_cpufreq_target() when there is a real
> change.

Hi Mike

I went looking at the core.

drivers/cpufreq/cpufreq.c:__cpufreq_add_dev() contains:

        if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
                policy->cur = cpufreq_driver->get(policy->cpu);
                if (!policy->cur) {
                        pr_err("%s: ->get() failed\n", __func__);
                        goto err_get_freq;
                }
        }

So this gets the current frequency from the driver, when the driver is
added. So for the current code, this gets the real state of the
hardware.

and

drivers/cpufreq/cpufreq.c:__cpufreq_driver_target() contains:

        /*
         * This might look like a redundant call as we are checking it again
         * after finding index. But it is left intentionally for cases where
         * exactly same freq is called again and so we can save on few function
         * calls.
         */
        if (target_freq == policy->cur)
                return 0;

        /* Save last value to restore later on errors */
        policy->restore_freq = policy->cur;

        if (cpufreq_driver->target)
                retval = cpufreq_driver->target(policy, target_freq, relation);

and here it will only call the function to change the frequency, if it
is different from the current frequency.

   Andrew
Mike Turquette Aug. 26, 2014, 9:46 p.m. UTC | #6
Quoting Andrew Lunn (2014-08-22 13:11:12)
> > Thanks for reviewing. I think my patch is equivalent to the old way of
> > doing things, with one exception that I will address later below.
> > 
> > struct cpufreq_frequency_table kirkwood_freq_table has clock rates
> > initialized to zero, so there is no regression compared to my unsigned
> > long cpu_frequency variable used for tracking the clock rate. Both
> > implementations start with unknown rates in hardware and initialize a
> > variable to zero until that rate can be discovered later on in
> > kirkwood_cpufreq_probe().
> > 
> > kirkwood_cpufreq_get_cpu_frequency() returns the frequency based on the
> > state of the clock. As best I can tell, this clock is only touched by
> > this cpufreq driver and nowhere else
> 
> Not quite true. u-boot might of touch the clock. Weird things happen
> with some kirkwood boards. Some don't have the ability to control
> there power supplies. So some boards implement "power off" by
> rebooting, and letting u-boot spin until a button is pressed. I hope
> such a u-boot powers off as much as possible, and e.g. drops the CPU
> clock to the lower frequency. One would also hope it puts it back to
> high speed before calling the kernel.

I have a doubt about this.

The powersave clock in drivers/clk/mvebu/kirkwood.c does not set
CLK_IGNORE_UNUSED, nor do any of the clocks in kirkwood_gating_desc[].

So regardless of what U-boot does, if no driver has called clk_enable() on
powersave_clk by late_initcall-time then clk_disable_unused() will
disable it as a power-saving mechanism.

So are kirkwood systems that use cpufreq simply getting lucky and not
hanging?

Regards,
Mike

> 
> There is also the somewhat unlikely case that somebody uses kexec to
> go from one kernel to another. You have no idea what state the
> previous kernel left the clock in.
> 
> > , so the driver knows the state of
> > the clock implicitly and doesn't need to read any hardware registers to
> > see if it is enabled or not. Every time we enable or disable the clock
> > we can know the cpu frequency.
> > 
> > > 
> > > However, if you don't cause an actual state change, the WFI never
> > > returns. If this assumption is wrong, your box is dead the first time
> > > it tries to change cpu frequency.
> > 
> > So if a state change in hardware never occurs, the cpu will not wake up?
> 
> Correct. I had that situation a few times while developing this
> driver. And it is not obvious what has happened, since it does not
> happen immediately, but when the governor decides the CPU load passes
> a threshold and the frequency can be changed. I had it stop dead while
> i assume it executed some sleep in an /etc/init.d script.
> 
> > That sounds like a bad situation but I do not understand how it relates
> > to the changes I made in the driver. Could you explain how tracking
> > cpu_frequency in the driver would result in the cpu not waking up from
> > wfi?
> 
> It was clearer in earlier versions of the driver, but code has been
> refactored into the cpufreq core. The core should call
> kirkwood_cpufreq_get_cpu_frequency() in order to get the current
> frequency, and only perform a change if the requested frequency is
> different. In the current code, kirkwood_cpufreq_get_cpu_frequency()
> reads from the hardware what the current frequency is. So we are
> guaranteed to only call kirkwood_cpufreq_target() when there is a real
> change.
> 
> [snip]
> 
> > Can the driver's
> > view of the clock status be out of sync with the actual hardware?
> 
> Yes, at startup, as explained above, we have no idea what the current
> state of the hardware is. Once we do know the real state, we can track
> it within the driver, but we need to get that initial state correct,
> or we WFI forever on the first frequency change.
> 
>    Andrew
Andrew Lunn Aug. 26, 2014, 10:36 p.m. UTC | #7
> > Not quite true. u-boot might of touch the clock. Weird things happen
> > with some kirkwood boards. Some don't have the ability to control
> > there power supplies. So some boards implement "power off" by
> > rebooting, and letting u-boot spin until a button is pressed. I hope
> > such a u-boot powers off as much as possible, and e.g. drops the CPU
> > clock to the lower frequency. One would also hope it puts it back to
> > high speed before calling the kernel.
> 
> I have a doubt about this.
> 
> The powersave clock in drivers/clk/mvebu/kirkwood.c does not set
> CLK_IGNORE_UNUSED, nor do any of the clocks in kirkwood_gating_desc[].
> 
> So regardless of what U-boot does, if no driver has called clk_enable() on
> powersave_clk by late_initcall-time then clk_disable_unused() will
> disable it as a power-saving mechanism.
> 
> So are kirkwood systems that use cpufreq simply getting lucky and not
> hanging?

Hi Mike

Its a good question.

However, the reset value of the clock is off. off means the CPU is
running at its high speed. Turning this clock on, actually reduces the
clock speed! So for 99% of the time, the late_initcall does nothing.

It gets more interesting when uboot, or a previous kernel has turned
the clock on. I admit, i don't expect this to happen very often, but
if it does, and there is no cpufreq driver, interesting things could
happen. The cpufreq driver can only be builtin, not a module. So if it
is available, it should be guaranteed to claim the clock before the
late_initcall could turn it off. And since it reads the hardware
state, it will do the right thing.

       Andrew
Mike Turquette Aug. 26, 2014, 11:30 p.m. UTC | #8
Quoting Andrew Lunn (2014-08-26 15:36:37)
> > > Not quite true. u-boot might of touch the clock. Weird things happen
> > > with some kirkwood boards. Some don't have the ability to control
> > > there power supplies. So some boards implement "power off" by
> > > rebooting, and letting u-boot spin until a button is pressed. I hope
> > > such a u-boot powers off as much as possible, and e.g. drops the CPU
> > > clock to the lower frequency. One would also hope it puts it back to
> > > high speed before calling the kernel.
> > 
> > I have a doubt about this.
> > 
> > The powersave clock in drivers/clk/mvebu/kirkwood.c does not set
> > CLK_IGNORE_UNUSED, nor do any of the clocks in kirkwood_gating_desc[].
> > 
> > So regardless of what U-boot does, if no driver has called clk_enable() on
> > powersave_clk by late_initcall-time then clk_disable_unused() will
> > disable it as a power-saving mechanism.
> > 
> > So are kirkwood systems that use cpufreq simply getting lucky and not
> > hanging?
> 
> Hi Mike
> 
> Its a good question.
> 
> However, the reset value of the clock is off. off means the CPU is
> running at its high speed. Turning this clock on, actually reduces the
> clock speed! So for 99% of the time, the late_initcall does nothing.
> 
> It gets more interesting when uboot, or a previous kernel has turned
> the clock on. I admit, i don't expect this to happen very often, but
> if it does, and there is no cpufreq driver, interesting things could
> happen. The cpufreq driver can only be builtin, not a module. So if it
> is available, it should be guaranteed to claim the clock before the
> late_initcall could turn it off. And since it reads the hardware
> state, it will do the right thing.

Hi Andrew,

Thanks for the response. That all makes sense. I still think my solution
is OK for both of your cases:

1) for the U-boot-enables-powersave-clk case, if we do not have driver
that claims the clock and does something with it then that is out of
scope of our concerns

2) for the kexec-kernel-case, the responsibility is on the first kernel
to set things up in a good state for the second kernel, with the
exception of using kexec to debug/examime/recover from a kernel crash,
in which case you likely don't care about this stuff as much

But anyways I do understand the case you are trying to prevent.

One final thought I have had is that it might be a good idea to have a
mux clock which represents the clock signal that feeds into the cpu. It
seems that a mux is exactly what is going on here with cpuclk rate and
ddrclk rate.

I even wonder if it is even appropriate to model this transition with a
clock enable operation? Maybe it is only a multiplex operation, or
perhaps a combination of enabling the powersave clock and changing the
parent input to the cpu?

My idea is instead of relying on a cpufreq driver to parse the state of
your clocks and understand the multiplexing, you can use the clock
framework for that. In fact that might help you get one step closer to
using the cpufreq-cpu0.c/cpufreq-generic.c implementation.

Regards,
Mike

> 
>        Andrew
Andrew Lunn Aug. 27, 2014, 12:35 a.m. UTC | #9
> One final thought I have had is that it might be a good idea to have a
> mux clock which represents the clock signal that feeds into the cpu. It
> seems that a mux is exactly what is going on here with cpuclk rate and
> ddrclk rate.

I did think of this when i implemented the cpufreq driver. What makes
it hard is that this bit is mixed in the same 32 bit register as the
gate clocks. It would mean two clock providers sharing the same
register, sharing a spinlock, etc. And the gating provider is shared
with all mvebu platforms, dove, kirkword, orion5x, and four different
armada SoCs. So i'm pushing complexity into this shared code, which
none of the others need. Does the standard mux clock do what is
needed? Or would i have to write a new clock provider?

> I even wonder if it is even appropriate to model this transition with a
> clock enable operation? Maybe it is only a multiplex operation, or
> perhaps a combination of enabling the powersave clock and changing the
> parent input to the cpu?
> 
> My idea is instead of relying on a cpufreq driver to parse the state of
> your clocks and understand the multiplexing, you can use the clock
> framework for that. In fact that might help you get one step closer to
> using the cpufreq-cpu0.c/cpufreq-generic.c implementation.

So you want the whole disabling of interrupt delivery to the cpu,
flipping the mux, wait for interrupt and re-enabling of interrupt
delivery to the cpu inside the clock provider? That is way past a
simple mux clock.

       Andrew
Mike Turquette Aug. 27, 2014, 5:04 a.m. UTC | #10
On Tue, Aug 26, 2014 at 5:35 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>> One final thought I have had is that it might be a good idea to have a
>> mux clock which represents the clock signal that feeds into the cpu. It
>> seems that a mux is exactly what is going on here with cpuclk rate and
>> ddrclk rate.
>
> I did think of this when i implemented the cpufreq driver. What makes
> it hard is that this bit is mixed in the same 32 bit register as the
> gate clocks. It would mean two clock providers sharing the same
> register, sharing a spinlock, etc. And the gating provider is shared
> with all mvebu platforms, dove, kirkword, orion5x, and four different
> armada SoCs. So i'm pushing complexity into this shared code, which
> none of the others need. Does the standard mux clock do what is
> needed? Or would i have to write a new clock provider?

Well I think that the mux-clock type should suffice. Both the standard
gate and mux can have a spinlock passed in at registration-time, so it
could be a shared spinlock. The standard mux clock expects a bitfield
in a register, similar to the single-bit approach taken by the gate
clock. So I think it could do just fine.

>
>> I even wonder if it is even appropriate to model this transition with a
>> clock enable operation? Maybe it is only a multiplex operation, or
>> perhaps a combination of enabling the powersave clock and changing the
>> parent input to the cpu?
>>
>> My idea is instead of relying on a cpufreq driver to parse the state of
>> your clocks and understand the multiplexing, you can use the clock
>> framework for that. In fact that might help you get one step closer to
>> using the cpufreq-cpu0.c/cpufreq-generic.c implementation.
>
> So you want the whole disabling of interrupt delivery to the cpu,
> flipping the mux, wait for interrupt and re-enabling of interrupt
> delivery to the cpu inside the clock provider? That is way past a
> simple mux clock.

No way! I said, "one step closer" for a reason. The interrupt stuff is
clearly out of scope.

Regards,
Mike

>
>        Andrew
Jason Gunthorpe Aug. 27, 2014, 3:58 p.m. UTC | #11
On Tue, Aug 26, 2014 at 04:30:08PM -0700, Mike Turquette wrote:

> 2) for the kexec-kernel-case, the responsibility is on the first kernel
> to set things up in a good state for the second kernel, with the
> exception of using kexec to debug/examime/recover from a kernel crash,
> in which case you likely don't care about this stuff as much

FWIW, we frequently use a kexec flow on Kirkwood for development here
- and I have not been able to get the initial kernel to cleanly shut
down before kexec'ing the second kernel.

The flow we've had to use involved including a pre-kernel stub in the
kexec flow that goes around and cleans up all the registers enough so
that the 2nd kernel will work properly. Critically it does things like
turn off ethernet DMA, because the initial kernel won't even do that
:|

There is some kind of support for doing this, but I ran out of time
unraveling the mess of config options to actually turn it on for
kirkwood.. It is tied to PM support which was/is missing elements on
Kirkwood..

Jason
diff mbox

Patch

diff --git a/drivers/cpufreq/kirkwood-cpufreq.c b/drivers/cpufreq/kirkwood-cpufreq.c
index 37a4806..f3d087f 100644
--- a/drivers/cpufreq/kirkwood-cpufreq.c
+++ b/drivers/cpufreq/kirkwood-cpufreq.c
@@ -12,7 +12,6 @@ 
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/clk.h>
-#include <linux/clk-provider.h>
 #include <linux/cpufreq.h>
 #include <linux/of_device.h>
 #include <linux/platform_device.h>
@@ -50,7 +49,7 @@  static struct cpufreq_frequency_table kirkwood_freq_table[] = {
 
 static unsigned int kirkwood_cpufreq_get_cpu_frequency(unsigned int cpu)
 {
-	if (__clk_is_enabled(priv.powersave_clk))
+	if (clk_is_enabled(priv.powersave_clk))
 		return kirkwood_freq_table[1].frequency;
 	return kirkwood_freq_table[0].frequency;
 }