
[1/2] cpufreq: arm_big_little: check if the frequency is set correctly

Message ID 1427718438-31098-1-git-send-email-sudeep.holla@arm.com (mailing list archive)
State Not Applicable, archived

Commit Message

Sudeep Holla March 30, 2015, 12:27 p.m. UTC
The actual frequency is set through "clk_change_rate", which is a void
function. If the underlying hardware fails and returns an error, the
error is lost in the clk layer. In order to track such failures, we need
to read back the frequency (just the cached value, since clk_recalc,
which is called after clk->ops->set_rate, reads the frequency back).

This patch adds a check to see whether the frequency was set correctly
or whether there were any hardware failures, and returns the appropriate
error to the cpufreq core.

Cc: Viresh Kumar <viresh.kumar@linaro.org> 
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 drivers/cpufreq/arm_big_little.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Viresh Kumar March 30, 2015, 1:27 p.m. UTC | #1
On 30 March 2015 at 17:57, Sudeep Holla <sudeep.holla@arm.com> wrote:
> The actual frequency is set through "clk_change_rate", which is a void
> function. If the underlying hardware fails and returns an error, the
> error is lost in the clk layer. In order to track such failures, we need
> to read back the frequency (just the cached value, since clk_recalc,
> which is called after clk->ops->set_rate, reads the frequency back).
>
> This patch adds a check to see whether the frequency was set correctly
> or whether there were any hardware failures, and returns the appropriate
> error to the cpufreq core.
>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> ---
>  drivers/cpufreq/arm_big_little.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
> index e1a6ba66a7f5..3fc676c63f91 100644
> --- a/drivers/cpufreq/arm_big_little.c
> +++ b/drivers/cpufreq/arm_big_little.c
> @@ -186,6 +186,8 @@ bL_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate)
>                 mutex_unlock(&cluster_lock[old_cluster]);
>         }
>
> +       if (bL_cpufreq_get_rate(cpu) != new_rate)
> +               return -EIO;
>         return 0;
>  }

To me, this doesn't look like the right place to fix this.

@Mike ??
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudeep Holla March 30, 2015, 1:39 p.m. UTC | #2
On 30/03/15 14:27, Viresh Kumar wrote:
> On 30 March 2015 at 17:57, Sudeep Holla <sudeep.holla@arm.com> wrote:
>> The actual frequency is set through "clk_change_rate", which is a void
>> function. If the underlying hardware fails and returns an error, the
>> error is lost in the clk layer. In order to track such failures, we need
>> to read back the frequency (just the cached value, since clk_recalc,
>> which is called after clk->ops->set_rate, reads the frequency back).
>>
>> This patch adds a check to see whether the frequency was set correctly
>> or whether there were any hardware failures, and returns the appropriate
>> error to the cpufreq core.
>>
>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
>> ---
>>   drivers/cpufreq/arm_big_little.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
>> index e1a6ba66a7f5..3fc676c63f91 100644
>> --- a/drivers/cpufreq/arm_big_little.c
>> +++ b/drivers/cpufreq/arm_big_little.c
>> @@ -186,6 +186,8 @@ bL_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate)
>>                  mutex_unlock(&cluster_lock[old_cluster]);
>>          }
>>
>> +       if (bL_cpufreq_get_rate(cpu) != new_rate)
>> +               return -EIO;
>>          return 0;
>>   }
>
> To me, this doesn't look like the right place to fix this.
>

Yes, I agree. After going through clk.c, I thought the pre-/post-rate
change notifiers were designed for such a purpose. I tried using them,
but found them unnecessary when the check can be as simple as in this
patch. However, it would be good to hear from Mike, as I seem to have
assumed a lot here.

Regards,
Sudeep
Mike Turquette March 31, 2015, 1:48 a.m. UTC | #3
Quoting Sudeep Holla (2015-03-30 06:39:00)
> 
> 
> On 30/03/15 14:27, Viresh Kumar wrote:
> > On 30 March 2015 at 17:57, Sudeep Holla <sudeep.holla@arm.com> wrote:
> >> The actual frequency is set through "clk_change_rate", which is a void
> >> function. If the underlying hardware fails and returns an error, the
> >> error is lost in the clk layer. In order to track such failures, we need
> >> to read back the frequency (just the cached value, since clk_recalc,
> >> which is called after clk->ops->set_rate, reads the frequency back).
> >>
> >> This patch adds a check to see whether the frequency was set correctly
> >> or whether there were any hardware failures, and returns the appropriate
> >> error to the cpufreq core.
> >>
> >> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> >> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> >> ---
> >>   drivers/cpufreq/arm_big_little.c | 2 ++
> >>   1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
> >> index e1a6ba66a7f5..3fc676c63f91 100644
> >> --- a/drivers/cpufreq/arm_big_little.c
> >> +++ b/drivers/cpufreq/arm_big_little.c
> >> @@ -186,6 +186,8 @@ bL_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate)
> >>                  mutex_unlock(&cluster_lock[old_cluster]);
> >>          }
> >>
> >> +       if (bL_cpufreq_get_rate(cpu) != new_rate)
> >> +               return -EIO;
> >>          return 0;
> >>   }
> >
> > To me, this doesn't look like the right place to fix this.
> >
> 
> Yes, I agree. After going through clk.c, I thought the pre-/post-rate
> change notifiers were designed for such a purpose. I tried using them,
> but found them unnecessary when the check can be as simple as in this
> patch. However, it would be good to hear from Mike, as I seem to have
> assumed a lot here.

Viresh & Sudeep,

clk_set_rate returns an error (and always has), so it seems to me that
this patch is unnecessary. bL_cpufreq_set_rate checks for an error from
clk_set_rate and handles it.

clk_change_rate is static and not exposed outside of drivers/clk/clk.c.

This patch gets a NAK from me.

Regards,
Mike

> 
> Regards,
> Sudeep
Sudeep Holla March 31, 2015, 9:24 a.m. UTC | #4
On 31/03/15 02:48, Michael Turquette wrote:
> Quoting Sudeep Holla (2015-03-30 06:39:00)
>> On 30/03/15 14:27, Viresh Kumar wrote:
>>> On 30 March 2015 at 17:57, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>>> The actual frequency is set through "clk_change_rate", which is a void
>>>> function. If the underlying hardware fails and returns an error, the
>>>> error is lost in the clk layer. In order to track such failures, we need
>>>> to read back the frequency (just the cached value, since clk_recalc,
>>>> which is called after clk->ops->set_rate, reads the frequency back).

[...]
>>>
>>> To me, this doesn't look like the right place to fix this.
>>>
>>
>> Yes, I agree. After going through clk.c, I thought the pre-/post-rate
>> change notifiers were designed for such a purpose. I tried using them,
>> but found them unnecessary when the check can be as simple as in this
>> patch. However, it would be good to hear from Mike, as I seem to have
>> assumed a lot here.
>
> Viresh & Sudeep,
>
> clk_set_rate returns an error (and always has), so it seems to me that
> this patch is unnecessary. bL_cpufreq_set_rate checks for an error from
> clk_set_rate and handles it.
>

No, that's not correct; maybe I was not clear earlier. Let me explain
with the call chain.

bL_cpufreq_set_target(returns 0 even when the clock driver returned an error)
         |
         V
clk_set_rate(returns whatever it gets from clk_core_set_rate_nolock)
         |
         V
clk_core_set_rate_nolock(always returns 0 after calling clk_change_rate)
         |
         V
clk_change_rate(void function, so no return)
         |
         V
clk->ops->set_rate(i.e. <clock_driver_set_rate>)

Now, IIUC, drivers/clk/clk.c does not check the return value of
clk->ops->set_rate. So if <clock_driver_set_rate> returns an error when
the h/w fails to set the rate, I would like to know how that error is
returned to, and received by, clk_set_rate. Correct me if I am missing
anything in the above sequence.
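A minimal user-space model of the call chain above shows how the driver's error can disappear. All `model_*` names and the `hw_fails` flag are hypothetical stand-ins chosen for illustration; this is not the actual drivers/clk/clk.c code, and the `rate == 0` test is only a placeholder for the checks done before the change is attempted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model of the call chain; not the real clk code. */
struct model_clk {
	unsigned long rate;     /* cached rate, refreshed by "recalc" */
	unsigned long hw_rate;  /* rate the "hardware" is really running at */
	bool hw_fails;          /* make the driver's .set_rate fail */
};

/* Stands in for <clock_driver_set_rate>: it does report the failure. */
static int model_driver_set_rate(struct model_clk *clk, unsigned long rate)
{
	if (clk->hw_fails)
		return -EIO;
	clk->hw_rate = rate;
	return 0;
}

/* Stands in for clk_change_rate(): void, so the driver's error is dropped. */
static void model_clk_change_rate(struct model_clk *clk, unsigned long rate)
{
	(void)model_driver_set_rate(clk, rate);
	clk->rate = clk->hw_rate;   /* recalc: cache what the hardware reports */
}

/* Stands in for clk_core_set_rate_nolock(): only the pre-checks can fail. */
static int model_clk_core_set_rate_nolock(struct model_clk *clk,
					  unsigned long rate)
{
	if (rate == 0)
		return -EINVAL;     /* placeholder for the checks before the change */
	model_clk_change_rate(clk, rate);
	return 0;               /* always 0 once the change has been attempted */
}

/* Stands in for clk_set_rate(): passes on whatever the core returns. */
static int model_clk_set_rate(struct model_clk *clk, unsigned long rate)
{
	return model_clk_core_set_rate_nolock(clk, rate);
}
```

With `hw_fails` set, `model_clk_set_rate()` returns 0 while the cached rate keeps its old value; that mismatch is what a read-back check can detect.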

In the current state of the code, one could use a notifier (basically,
POST_RATE_CHANGE is sent only if the clock rate changes), but since
clk_recalc reads back the clock rate, I found this patch simpler than
the notifiers.

Regards,
Sudeep
Sudeep Holla April 1, 2015, 10:01 a.m. UTC | #5
On 31/03/15 10:24, Sudeep Holla wrote:
> On 31/03/15 02:48, Michael Turquette wrote:

[...]

>> clk_set_rate returns an error (and always has), so it seems to me that
>> this patch is unnecessary. bL_cpufreq_set_rate checks for an error from
>> clk_set_rate and handles it.
>>
>
> No, that's not correct; maybe I was not clear earlier. Let me explain
> with the call chain.
>
> bL_cpufreq_set_target(returns 0 even when the clock driver returned an error)
>           |
>           V
> clk_set_rate(returns whatever it gets from clk_core_set_rate_nolock)
>           |
>           V
> clk_core_set_rate_nolock(always returns 0 after calling clk_change_rate)
>           |
>           V
> clk_change_rate(void function, so no return)
>           |
>           V
> clk->ops->set_rate(i.e. <clock_driver_set_rate>)
>
> Now, IIUC, drivers/clk/clk.c does not check the return value of
> clk->ops->set_rate. So if <clock_driver_set_rate> returns an error when
> the h/w fails to set the rate, I would like to know how that error is
> returned to, and received by, clk_set_rate. Correct me if I am missing
> anything in the above sequence.
>

Any input on this? Or am I talking nonsense here?

Regards,
Sudeep
Mike Turquette April 1, 2015, 9:48 p.m. UTC | #6
Quoting Sudeep Holla (2015-03-31 02:24:29)
> 
> 
> On 31/03/15 02:48, Michael Turquette wrote:
> > Quoting Sudeep Holla (2015-03-30 06:39:00)
> >> On 30/03/15 14:27, Viresh Kumar wrote:
> >>> On 30 March 2015 at 17:57, Sudeep Holla <sudeep.holla@arm.com> wrote:
> >>>> The actual frequency is set through "clk_change_rate", which is a void
> >>>> function. If the underlying hardware fails and returns an error, the
> >>>> error is lost in the clk layer. In order to track such failures, we need
> >>>> to read back the frequency (just the cached value, since clk_recalc,
> >>>> which is called after clk->ops->set_rate, reads the frequency back).
> 
> [...]
> >>>
> >>> To me, this doesn't look like the right place to fix this.
> >>>
> >>
> >> Yes, I agree. After going through clk.c, I thought the pre-/post-rate
> >> change notifiers were designed for such a purpose. I tried using them,
> >> but found them unnecessary when the check can be as simple as in this
> >> patch. However, it would be good to hear from Mike, as I seem to have
> >> assumed a lot here.
> >
> > Viresh & Sudeep,
> >
> > clk_set_rate returns an error (and always has), so it seems to me that
> > this patch is unnecessary. bL_cpufreq_set_rate checks for an error from
> > clk_set_rate and handles it.
> >
> 
> No, that's not correct; maybe I was not clear earlier. Let me explain
> with the call chain.
> 
> bL_cpufreq_set_target(returns 0 even when the clock driver returned an error)
>          |
>          V
> clk_set_rate(returns whatever it gets from clk_core_set_rate_nolock)
>          |
>          V
> clk_core_set_rate_nolock(always returns 0 after calling clk_change_rate)

Ah, now I understand our misunderstanding.

clk_core_set_rate_nolock can fail BEFORE calling clk_change_rate, which
is where we do a lot of the work to see if the rate change is even
possible. That is what I was referring to in my previous mail.

What you have is a failing .set_rate callback and you need to know if it
failed. You are correct that we are not handling the return value from
.set_rate. That needs to change.
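One possible shape for that change, sketched as a hypothetical user-space model rather than an actual clk framework patch: the rate-change step returns `int` instead of `void`, so a `.set_rate` failure reaches `clk_set_rate()`'s caller. All `model_*` names are made up for illustration, and `rate == 0` again stands in for the checks done before the change.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: same layers as the call chain, but errors propagate. */
struct model_clk {
	unsigned long rate;     /* cached rate */
	unsigned long hw_rate;  /* rate the "hardware" is really running at */
	bool hw_fails;          /* make the driver's .set_rate fail */
};

static int model_driver_set_rate(struct model_clk *clk, unsigned long rate)
{
	if (clk->hw_fails)
		return -EIO;
	clk->hw_rate = rate;
	return 0;
}

/* Returns int instead of void, so the driver's error is passed up. */
static int model_clk_change_rate(struct model_clk *clk, unsigned long rate)
{
	int ret = model_driver_set_rate(clk, rate);

	clk->rate = clk->hw_rate;   /* recalc in both the success and error cases */
	return ret;
}

static int model_clk_set_rate(struct model_clk *clk, unsigned long rate)
{
	if (rate == 0)
		return -EINVAL;     /* checks before the change, as before */
	return model_clk_change_rate(clk, rate);
}
```

Recalculating the cached rate even on failure keeps it consistent with the hardware, so a read-back check like the one in this patch would agree with the returned error.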

>          |
>          V
> clk_change_rate(void function, so no return)
>          |
>          V
> clk->ops->set_rate(i.e. <clock_driver_set_rate>)
> 
> Now, IIUC, drivers/clk/clk.c does not check the return value of
> clk->ops->set_rate. So if <clock_driver_set_rate> returns an error when
> the h/w fails to set the rate, I would like to know how that error is
> returned to, and received by, clk_set_rate. Correct me if I am missing
> anything in the above sequence.
> 
> In the current state of the code, one could use a notifier (basically,
> POST_RATE_CHANGE is sent only if the clock rate changes), but since
> clk_recalc reads back the clock rate, I found this patch simpler than
> the notifiers.

Simpler, but not better. What you want is to know if the rate change
failed. We need to throw an exception when .set_rate fails and
propagate the error up the call chain to the cpufreq driver.

I'm thinking of ways to do this ... would require some surgery to the
clock framework but it might give us a more elegant way to recover from
a failure and roll back to a known good state.
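A rough sketch of what "roll back to a known good state" could mean, again as a hypothetical user-space model: on a `.set_rate` failure, the last good rate is re-applied and the original error is still returned. The `max_rate` limit and the assumption that a failed request leaves the clock at 0 exist only to give the model a failure mode; none of this describes current clk framework behaviour.

```c
#include <errno.h>

/* Hypothetical model; not the clk framework's actual behaviour. */
struct model_clk {
	unsigned long rate;      /* cached rate (last known good state) */
	unsigned long hw_rate;   /* rate the "hardware" is really running at */
	unsigned long max_rate;  /* requests above this fail in the driver */
};

static int model_driver_set_rate(struct model_clk *clk, unsigned long rate)
{
	if (rate > clk->max_rate) {
		clk->hw_rate = 0;   /* assume a failed request leaves hw undefined */
		return -EIO;
	}
	clk->hw_rate = rate;
	return 0;
}

static int model_clk_set_rate(struct model_clk *clk, unsigned long rate)
{
	unsigned long good_rate = clk->rate;
	int ret = model_driver_set_rate(clk, rate);

	if (ret)
		/* Best effort: re-apply the last known good rate. If even this
		 * fails, the recalc below exposes the hardware's state (0 here). */
		(void)model_driver_set_rate(clk, good_rate);

	clk->rate = clk->hw_rate;   /* recalc */
	return ret;                 /* report the original failure */
}
```

Returning the original error, rather than the result of the rollback, lets a caller such as the cpufreq driver know that the requested rate was not applied even when recovery succeeded.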

Regards,
Mike

> 
> Regards,
> Sudeep
Sudeep Holla April 2, 2015, 8:55 a.m. UTC | #7
On 01/04/15 22:48, Michael Turquette wrote:
> Quoting Sudeep Holla (2015-03-31 02:24:29)

[...]

>>
>> No, that's not correct; maybe I was not clear earlier. Let me explain
>> with the call chain.
>>
>> bL_cpufreq_set_target(returns 0 even when the clock driver returned an error)
>>           |
>>           V
>> clk_set_rate(returns whatever it gets from clk_core_set_rate_nolock)
>>           |
>>           V
>> clk_core_set_rate_nolock(always returns 0 after calling clk_change_rate)
>
> Ah, now I understand our misunderstanding.
>
> clk_core_set_rate_nolock can fail BEFORE calling clk_change_rate, which
> is where we do a lot of the work to see if the rate change is even
> possible. That is what I was referring to in my previous mail.
>

Ah, I guessed as much, since I was not clear in my earlier email. A
simple flow diagram did the job better :)

> What you have is a failing .set_rate callback and you need to know if it
> failed. You are correct that we are not handling the return value from
> .set_rate. That needs to change.
>

Cool. Since I had not followed the design of the clock APIs, I assumed
it had to be handled in one of two ways: notifiers or get_rate. Thanks
for the clarification.

>>           |
>>           V
>> clk_change_rate(void function, so no return)
>>           |
>>           V
>> clk->ops->set_rate(i.e. <clock_driver_set_rate>)
>>
>> Now, IIUC, drivers/clk/clk.c does not check the return value of
>> clk->ops->set_rate. So if <clock_driver_set_rate> returns an error when
>> the h/w fails to set the rate, I would like to know how that error is
>> returned to, and received by, clk_set_rate. Correct me if I am missing
>> anything in the above sequence.
>>
>> In the current state of the code, one could use a notifier (basically,
>> POST_RATE_CHANGE is sent only if the clock rate changes), but since
>> clk_recalc reads back the clock rate, I found this patch simpler than
>> the notifiers.
>
> Simpler, but not better. What you want is to know if the rate change
> failed. We need to throw an exception when .set_rate fails and
> propagate the error up the call chain to the cpufreq driver.
>

Agreed, but I was under the assumption that, since the POST_RATE_CHANGE
notifiers are not called, the failure is implicit. So are you saying
that's not the case?

> I'm thinking of ways to do this ... would require some surgery to the
> clock framework but it might give us a more elegant way to recover from
> a failure and roll back to a known good state.
>

Agreed. I avoided doing that for two reasons: first, as you said, it
needs changes in multiple places; and second, I assumed the alternative
ways of handling it were the intended design.

Regards,
Sudeep
Mike Turquette April 13, 2015, 5:08 a.m. UTC | #8
Quoting Sudeep Holla (2015-04-02 01:55:05)
> 
> 
> On 01/04/15 22:48, Michael Turquette wrote:
> > Quoting Sudeep Holla (2015-03-31 02:24:29)
> 
> [...]
> 
> >>
> >> No, that's not correct; maybe I was not clear earlier. Let me explain
> >> with the call chain.
> >>
> >> bL_cpufreq_set_target(returns 0 even when the clock driver returned an error)
> >>           |
> >>           V
> >> clk_set_rate(returns whatever it gets from clk_core_set_rate_nolock)
> >>           |
> >>           V
> >> clk_core_set_rate_nolock(always returns 0 after calling clk_change_rate)
> >
> > Ah, now I understand our misunderstanding.
> >
> > clk_core_set_rate_nolock can fail BEFORE calling clk_change_rate, which
> > is where we do a lot of the work to see if the rate change is even
> > possible. That is what I was referring to in my previous mail.
> >
> 
> Ah, I guessed as much, since I was not clear in my earlier email. A
> simple flow diagram did the job better :)
> 
> > What you have is a failing .set_rate callback and you need to know if it
> > failed. You are correct that we are not handling the return value from
> > .set_rate. That needs to change.
> >
> 
> Cool. Since I had not followed the design of the clock APIs, I assumed
> it had to be handled in one of two ways: notifiers or get_rate. Thanks
> for the clarification.
> 
> >>           |
> >>           V
> >> clk_change_rate(void function, so no return)
> >>           |
> >>           V
> >> clk->ops->set_rate(i.e. <clock_driver_set_rate>)
> >>
> >> Now, IIUC, drivers/clk/clk.c does not check the return value of
> >> clk->ops->set_rate. So if <clock_driver_set_rate> returns an error when
> >> the h/w fails to set the rate, I would like to know how that error is
> >> returned to, and received by, clk_set_rate. Correct me if I am missing
> >> anything in the above sequence.
> >>
> >> In the current state of the code, one could use a notifier (basically,
> >> POST_RATE_CHANGE is sent only if the clock rate changes), but since
> >> clk_recalc reads back the clock rate, I found this patch simpler than
> >> the notifiers.
> >
> > Simpler, but not better. What you want is to know if the rate change
> > failed. We need to throw an exception when .set_rate fails and
> > propagate the error up the call chain to the cpufreq driver.
> >
> 
> Agreed, but I was under the assumption that, since the POST_RATE_CHANGE
> notifiers are not called, the failure is implicit. So are you saying
> that's not the case?

The lack of a POST_RATE_CHANGE notification doesn't imply anything. If
clk_propagate_rate_change determines that a rate cannot be achieved,
then we fire off ABORT_RATE_CHANGE notifiers. Once we fix the deficiency
around not returning the error code from .set_rate callbacks, we will
probably also fire these notifiers in the event that a rate change
fails.
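The notifier behaviour described here can be modelled so that a listener treats ABORT as the failure signal, instead of inferring failure from a missing POST. Everything below is a hypothetical user-space model: the `MODEL_*` events only mirror PRE_RATE_CHANGE, POST_RATE_CHANGE and ABORT_RATE_CHANGE, a non-zero notifier return stands in for a veto during propagation, the -EBUSY/-EIO codes are choices of the model, and sending ABORT when `.set_rate` fails is the possible future behaviour mentioned above, not current clk behaviour.

```c
#include <errno.h>
#include <stdbool.h>

/* Model events mirroring PRE_/POST_/ABORT_RATE_CHANGE. */
enum model_event { MODEL_PRE_CHANGE, MODEL_POST_CHANGE, MODEL_ABORT_CHANGE };

/* A notifier returns 0 to allow the change, non-zero to veto it. */
typedef int (*model_notifier_fn)(enum model_event ev, unsigned long old_rate,
				 unsigned long new_rate);

struct model_clk {
	unsigned long rate;
	bool hw_fails;               /* make the driver's .set_rate fail */
	model_notifier_fn notifier;  /* optional listener */
};

static int model_notify(struct model_clk *clk, enum model_event ev,
			unsigned long old_rate, unsigned long new_rate)
{
	return clk->notifier ? clk->notifier(ev, old_rate, new_rate) : 0;
}

static int model_clk_set_rate(struct model_clk *clk, unsigned long rate)
{
	unsigned long old_rate = clk->rate;

	/* Propagation step: a vetoed change is aborted before touching hw. */
	if (model_notify(clk, MODEL_PRE_CHANGE, old_rate, rate)) {
		model_notify(clk, MODEL_ABORT_CHANGE, old_rate, rate);
		return -EBUSY;
	}

	/* Hypothetical future behaviour: a failing .set_rate also aborts. */
	if (clk->hw_fails) {
		model_notify(clk, MODEL_ABORT_CHANGE, old_rate, rate);
		return -EIO;
	}

	clk->rate = rate;
	model_notify(clk, MODEL_POST_CHANGE, old_rate, rate);
	return 0;
}
```

In this model, every attempt ends with exactly one POST or ABORT event, so a listener never has to rely on the absence of a notification.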

> 
> > I'm thinking of ways to do this ... would require some surgery to the
> > clock framework but it might give us a more elegant way to recover from
> > a failure and roll back to a known good state.
> >
> 
> Agreed. I avoided doing that for two reasons: first, as you said, it
> needs changes in multiple places; and second, I assumed the alternative
> ways of handling it were the intended design.

So your patch for cpufreq is hopefully a temporary bandage until we fix
the clk framework. Please feel free to add my Reviewed-by.

Regards,
Mike

> 
> Regards,
> Sudeep
Sudeep Holla April 13, 2015, 10:21 a.m. UTC | #9
On 13/04/15 06:08, Michael Turquette wrote:
> Quoting Sudeep Holla (2015-04-02 01:55:05)
>>
>>
>> On 01/04/15 22:48, Michael Turquette wrote:
>>> Quoting Sudeep Holla (2015-03-31 02:24:29)

[...]

>>
>>> I'm thinking of ways to do this ... would require some surgery to the
>>> clock framework but it might give us a more elegant way to recover from
>>> a failure and roll back to a known good state.
>>>
>>
>> Agreed. I avoided doing that for two reasons: first, as you said, it
>> needs changes in multiple places; and second, I assumed the alternative
>> ways of handling it were the intended design.
>
> So your patch for cpufreq is hopefully a temporary bandage until we fix
> the clk framework. Please feel free to add my Reviewed-by.
>

Thanks Mike.

Viresh, is it OK if we carry this patch until the clk framework can
handle this case? I will add a *TODO* stating that it's a temporary
change and can be dropped once the clk layer handles it, if that helps
in any way :).

This issue is seen on TC2 when the firmware is stress-tested with
continuous DVFS requests.

Regards,
Sudeep
Viresh Kumar April 13, 2015, 10:25 a.m. UTC | #10
On 13 April 2015 at 15:51, Sudeep Holla <sudeep.holla@arm.com> wrote:
> Thanks Mike.
>
> Viresh, is it OK if we carry this patch until the clk framework can
> handle this case? I will add a *TODO* stating that it's a temporary
> change and can be dropped once the clk layer handles it, if that helps
> in any way :).
>
> This issue is seen on TC2 when the firmware is stress-tested with
> continuous DVFS requests.

Sure.
Sudeep Holla April 13, 2015, 3:14 p.m. UTC | #11
On 13/04/15 11:25, Viresh Kumar wrote:
> On 13 April 2015 at 15:51, Sudeep Holla <sudeep.holla@arm.com> wrote:
>> Thanks Mike.
>>
>> Viresh, is it OK if we carry this patch until the clk framework can
>> handle this case? I will add a *TODO* stating that it's a temporary
>> change and can be dropped once the clk layer handles it, if that helps
>> in any way :).
>>
>> This issue is seen on TC2 when the firmware is stress-tested with
>> continuous DVFS requests.
>
> Sure.
>
Thanks, I will repost the patches after the merge window so that they
don't get lost during it.

Regards,
Sudeep

Patch

diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
index e1a6ba66a7f5..3fc676c63f91 100644
--- a/drivers/cpufreq/arm_big_little.c
+++ b/drivers/cpufreq/arm_big_little.c
@@ -186,6 +186,8 @@ bL_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate)
 		mutex_unlock(&cluster_lock[old_cluster]);
 	}
 
+	if (bL_cpufreq_get_rate(cpu) != new_rate)
+		return -EIO;
 	return 0;
 }
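The effect of the two added lines can be checked outside the kernel with a small user-space model. It is a sketch under stated assumptions: the `model_*` names are hypothetical, the cpufreq side is assumed to work in kHz and the clock in Hz, and the clock model simply keeps its old rate when the "hardware" fails while still returning 0, as described in the call chain in the discussion.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one cluster clock; not the kernel API. */
struct model_cluster_clk {
	unsigned long hz;   /* cached rate after recalc, in Hz */
	bool hw_fails;      /* make the clock driver fail silently */
};

/* Like clk_set_rate() in the call chain: returns 0 even if the hw failed. */
static int model_clk_set_rate(struct model_cluster_clk *clk, unsigned long hz)
{
	if (!clk->hw_fails)
		clk->hz = hz;
	return 0;
}

/* Stand-in for bL_cpufreq_get_rate(): reads the cached rate back, in kHz. */
static unsigned int model_get_rate_khz(const struct model_cluster_clk *clk)
{
	return (unsigned int)(clk->hz / 1000);
}

/* Stand-in for bL_cpufreq_set_rate(), reduced to the part the patch touches. */
static int model_cpufreq_set_rate(struct model_cluster_clk *clk,
				  unsigned int new_rate /* kHz */)
{
	int ret = model_clk_set_rate(clk, new_rate * 1000UL);

	if (ret)
		return ret;

	/* The check added by the patch: read the rate back and compare. */
	if (model_get_rate_khz(clk) != new_rate)
		return -EIO;
	return 0;
}
```

Note that an exact `!=` comparison like this assumes the clock produces the requested frequency exactly; a clock that rounds the request would make the check report -EIO even though nothing failed.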