Message ID | 1427718438-31098-1-git-send-email-sudeep.holla@arm.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
On 30 March 2015 at 17:57, Sudeep Holla <sudeep.holla@arm.com> wrote:
> The actual frequency is set through "clk_change_rate" which is a void
> function. If the underlying hardware fails and returns error, the error
> is lost in the clk layer. In order to track such failures, we need to
> read back the frequency (just the cached value as clk_recalc called after
> clk->ops->set_rate gets the frequency)
>
> This patch adds a check to see if the frequency is set correctly or if
> there were any hardware failures and sends the appropriate errors to the
> cpufreq core.
>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> ---
>  drivers/cpufreq/arm_big_little.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
> index e1a6ba66a7f5..3fc676c63f91 100644
> --- a/drivers/cpufreq/arm_big_little.c
> +++ b/drivers/cpufreq/arm_big_little.c
> @@ -186,6 +186,8 @@ bL_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate)
>  		mutex_unlock(&cluster_lock[old_cluster]);
>  	}
>
> +	if (bL_cpufreq_get_rate(cpu) != new_rate)
> +		return -EIO;
>  	return 0;
>  }

This doesn't look to me the right place for fixing this.

@Mike ??
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On 30/03/15 14:27, Viresh Kumar wrote:
> On 30 March 2015 at 17:57, Sudeep Holla <sudeep.holla@arm.com> wrote:
>> The actual frequency is set through "clk_change_rate" which is a void
>> function. If the underlying hardware fails and returns error, the error
>> is lost in the clk layer. In order to track such failures, we need to
>> read back the frequency (just the cached value as clk_recalc called after
>> clk->ops->set_rate gets the frequency)
>>
>> This patch adds a check to see if the frequency is set correctly or if
>> there were any hardware failures and sends the appropriate errors to the
>> cpufreq core.
>>
>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
>> ---
>>  drivers/cpufreq/arm_big_little.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
>> index e1a6ba66a7f5..3fc676c63f91 100644
>> --- a/drivers/cpufreq/arm_big_little.c
>> +++ b/drivers/cpufreq/arm_big_little.c
>> @@ -186,6 +186,8 @@ bL_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate)
>>  		mutex_unlock(&cluster_lock[old_cluster]);
>>  	}
>>
>> +	if (bL_cpufreq_get_rate(cpu) != new_rate)
>> +		return -EIO;
>>  	return 0;
>>  }
>
> This doesn't look to me the right place for fixing this.
>

Yes I agree, after going through clk.c, I thought pre-/post- notifiers
are designed for such purpose. I tried using them but found it
unnecessary when it can be as simple as in this patch. However it's good
to hear from Mike as I seem to have assumed a lot here.

Regards,
Sudeep

Quoting Sudeep Holla (2015-03-30 06:39:00)
>
>
> On 30/03/15 14:27, Viresh Kumar wrote:
> > On 30 March 2015 at 17:57, Sudeep Holla <sudeep.holla@arm.com> wrote:
> >> The actual frequency is set through "clk_change_rate" which is a void
> >> function. If the underlying hardware fails and returns error, the error
> >> is lost in the clk layer. In order to track such failures, we need to
> >> read back the frequency (just the cached value as clk_recalc called after
> >> clk->ops->set_rate gets the frequency)
> >>
> >> This patch adds a check to see if the frequency is set correctly or if
> >> there were any hardware failures and sends the appropriate errors to the
> >> cpufreq core.
> >>
> >> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> >> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> >> ---
> >>  drivers/cpufreq/arm_big_little.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
> >> index e1a6ba66a7f5..3fc676c63f91 100644
> >> --- a/drivers/cpufreq/arm_big_little.c
> >> +++ b/drivers/cpufreq/arm_big_little.c
> >> @@ -186,6 +186,8 @@ bL_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate)
> >>  		mutex_unlock(&cluster_lock[old_cluster]);
> >>  	}
> >>
> >> +	if (bL_cpufreq_get_rate(cpu) != new_rate)
> >> +		return -EIO;
> >>  	return 0;
> >>  }
> >
> > This doesn't look to me the right place for fixing this.
> >
>
> Yes I agree, after going through clk.c, I thought pre-/post- notifiers
> are designed for such purpose. I tried using them but found it
> unnecessary when it can be as simple as in this patch. However it's good
> to hear from Mike as I seem to have assumed a lot here.

Viresh & Sudeep,

clk_set_rate returns an error (and always has), so it seems to me that
this patch is unnecessary. bL_cpufreq_set_rate checks for an error from
clk_set_rate and handles it.

clk_change_rate is static and not exposed outside of drivers/clk/clk.c.

This patch gets a NAK from me.

Regards,
Mike

>
> Regards,
> Sudeep

On 31/03/15 02:48, Michael Turquette wrote:
> Quoting Sudeep Holla (2015-03-30 06:39:00)
>> On 30/03/15 14:27, Viresh Kumar wrote:
>>> On 30 March 2015 at 17:57, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>>> The actual frequency is set through "clk_change_rate" which is a void
>>>> function. If the underlying hardware fails and returns error, the error
>>>> is lost in the clk layer. In order to track such failures, we need to
>>>> read back the frequency (just the cached value as clk_recalc called after
>>>> clk->ops->set_rate gets the frequency)

[...]

>>>
>>> This doesn't look to me the right place for fixing this.
>>>
>>
>> Yes I agree, after going through clk.c, I thought pre-/post- notifiers
>> are designed for such purpose. I tried using them but found it
>> unnecessary when it can be as simple as in this patch. However it's good
>> to hear from Mike as I seem to have assumed a lot here.
>
> Viresh & Sudeep,
>
> clk_set_rate returns an error (and always has), so it seems to me that
> this patch is unnecessary. bL_cpufreq_set_rate checks for an error from
> clk_set_rate and handles it.
>

No that's not correct, maybe I was not clear earlier. Let me explain
with the stack trace.

bL_cpufreq_set_target(returns 0 even when clock driver returned error)
        |
        V
clk_set_rate(returns whatever it gets from clk_core_set_rate_nolock)
        |
        V
clk_core_set_rate_nolock(always returns 0 after calling clk_change_rate)
        |
        V
clk_change_rate(void function, so no return)
        |
        V
clk->ops->set_rate(i.e. <clock_driver_set_rate>)

Now for drivers/clk/clk.c IIUC, the return value from clk->ops->set_rate
is not checked. Now if <clock_driver_set_rate> returns error when h/w
fails to set the rate, I would like to know how the error returned by
<clock_driver_set_rate> is returned and received by clk_set_rate.
Correct me if I am missing anything in the above sequence.

In the current state of code, one can use notifier (basically
POST_RATE_CHANGE is called only if the clock rate changes), but since
the clk_recalc reads back the clock rate, I found this patch is simpler
compared to the notifiers.

Regards,
Sudeep

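The call chain described in this mail can be modelled in a few lines of userspace C. This is only an illustrative sketch: `fake_clk` and every `fake_*` function are hypothetical stand-ins invented for this example, not the kernel's clk API, and the read-back step compares against a cached rate in the same spirit as the proposed two-line patch.

```c
#include <errno.h>

/* Hypothetical model of a clock; NOT the kernel's struct clk. */
struct fake_clk {
	unsigned long rate;	/* cached rate, as a recalc step would leave it */
	int hw_fails;		/* non-zero: simulated hardware rejects the change */
};

/* Stand-in for clk->ops->set_rate: the driver does report the failure. */
static int fake_ops_set_rate(struct fake_clk *clk, unsigned long rate)
{
	if (clk->hw_fails)
		return -EIO;
	clk->rate = rate;
	return 0;
}

/* Stand-in for clk_change_rate: void, so the error code is dropped here. */
static void fake_change_rate(struct fake_clk *clk, unsigned long rate)
{
	(void)fake_ops_set_rate(clk, rate);
}

/* Stand-in for clk_set_rate: reports success once the change was attempted. */
static int fake_clk_set_rate(struct fake_clk *clk, unsigned long rate)
{
	fake_change_rate(clk, rate);
	return 0;
}

/* Stand-in for the patched cpufreq path: read the rate back and compare. */
static int fake_cpufreq_set_rate(struct fake_clk *clk, unsigned long rate)
{
	int ret = fake_clk_set_rate(clk, rate);

	if (ret)
		return ret;
	if (clk->rate != rate)
		return -EIO;	/* the hardware did not take the new rate */
	return 0;
}
```

With `hw_fails` set, `fake_clk_set_rate()` still returns 0 while the cached rate stays unchanged, and only the read-back in `fake_cpufreq_set_rate()` turns that into `-EIO`.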
On 31/03/15 10:24, Sudeep Holla wrote:
> On 31/03/15 02:48, Michael Turquette wrote:

[...]

>> clk_set_rate returns an error (and always has), so it seems to me that
>> this patch is unnecessary. bL_cpufreq_set_rate checks for an error from
>> clk_set_rate and handles it.
>>
>
> No that's not correct, maybe I was not clear earlier. Let me explain
> with the stack trace.
>
> bL_cpufreq_set_target(returns 0 even when clock driver returned error)
>        |
>        V
> clk_set_rate(returns whatever it gets from clk_core_set_rate_nolock)
>        |
>        V
> clk_core_set_rate_nolock(always returns 0 after calling clk_change_rate)
>        |
>        V
> clk_change_rate(void function, so no return)
>        |
>        V
> clk->ops->set_rate(i.e. <clock_driver_set_rate>)
>
> Now for drivers/clk/clk.c IIUC, the return value from clk->ops->set_rate
> is not checked. Now if <clock_driver_set_rate> returns error when h/w
> fails to set the rate, I would like to know how the error returned by
> <clock_driver_set_rate> is returned and received by clk_set_rate.
> Correct me if I am missing anything in the above sequence.
>

Any input on this? Or am I talking nonsense here?

Regards,
Sudeep

Quoting Sudeep Holla (2015-03-31 02:24:29)
>
>
> On 31/03/15 02:48, Michael Turquette wrote:
> > Quoting Sudeep Holla (2015-03-30 06:39:00)
> >> On 30/03/15 14:27, Viresh Kumar wrote:
> >>> On 30 March 2015 at 17:57, Sudeep Holla <sudeep.holla@arm.com> wrote:
> >>>> The actual frequency is set through "clk_change_rate" which is a void
> >>>> function. If the underlying hardware fails and returns error, the error
> >>>> is lost in the clk layer. In order to track such failures, we need to
> >>>> read back the frequency (just the cached value as clk_recalc called after
> >>>> clk->ops->set_rate gets the frequency)
> >
> > [...]
> >
> >>>
> >>> This doesn't look to me the right place for fixing this.
> >>>
> >>
> >> Yes I agree, after going through clk.c, I thought pre-/post- notifiers
> >> are designed for such purpose. I tried using them but found it
> >> unnecessary when it can be as simple as in this patch. However it's good
> >> to hear from Mike as I seem to have assumed a lot here.
> >
> > Viresh & Sudeep,
> >
> > clk_set_rate returns an error (and always has), so it seems to me that
> > this patch is unnecessary. bL_cpufreq_set_rate checks for an error from
> > clk_set_rate and handles it.
> >
>
> No that's not correct, maybe I was not clear earlier. Let me explain
> with the stack trace.
>
> bL_cpufreq_set_target(returns 0 even when clock driver returned error)
>        |
>        V
> clk_set_rate(returns whatever it gets from clk_core_set_rate_nolock)
>        |
>        V
> clk_core_set_rate_nolock(always returns 0 after calling clk_change_rate)

Ah, now I understand our misunderstanding.

clk_core_set_rate_nolock can fail BEFORE calling clk_change_rate, which
is where we do a lot of the work to see if the rate change is even
possible. That is what I was referring to in my previous mail.

What you have is a failing .set_rate callback and you need to know if it
failed. You are correct that we are not handling the return value from
.set_rate. That needs to change.

>        |
>        V
> clk_change_rate(void function, so no return)
>        |
>        V
> clk->ops->set_rate(i.e. <clock_driver_set_rate>)
>
> Now for drivers/clk/clk.c IIUC, the return value from clk->ops->set_rate
> is not checked. Now if <clock_driver_set_rate> returns error when h/w
> fails to set the rate, I would like to know how the error returned by
> <clock_driver_set_rate> is returned and received by clk_set_rate.
> Correct me if I am missing anything in the above sequence.
>
> In the current state of code, one can use notifier (basically
> POST_RATE_CHANGE is called only if the clock rate changes), but since
> the clk_recalc reads back the clock rate, I found this patch is simpler
> compared to the notifiers.

Simpler, but not better. What you want is to know if the rate change
failed. We need to throw an exception when .set_rate fails and
propagate the error up the call chain to the cpufreq driver.

I'm thinking of ways to do this ... would require some surgery to the
clock framework but it might give us a more elegant way to recover from
a failure and roll back to a known good state.

Regards,
Mike

>
> Regards,
> Sudeep

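The direction suggested in this reply (propagate the .set_rate failure upward and roll back to a known good state) could look roughly like the userspace sketch below. It is an assumption-laden illustration, not the eventual clk framework change: `sketch_clk` and the `sketch_*` functions are hypothetical names, C has no exceptions so an errno-style return code stands in for "throwing", and the rollback only restores a cached value rather than reprogramming hardware.

```c
#include <errno.h>

/* Hypothetical model of a clock; NOT the kernel's struct clk. */
struct sketch_clk {
	unsigned long rate;	/* cached rate */
	int hw_fails;		/* non-zero: simulated hardware rejects rate changes */
};

/*
 * Stand-in for a driver's .set_rate callback. On failure it zeroes the
 * cached rate to mimic hardware left in an indeterminate state.
 */
static int sketch_ops_set_rate(struct sketch_clk *clk, unsigned long rate)
{
	if (clk->hw_fails) {
		clk->rate = 0;
		return -EIO;
	}
	clk->rate = rate;
	return 0;
}

/* A change_rate step that returns int instead of void. */
static int sketch_change_rate(struct sketch_clk *clk, unsigned long rate)
{
	unsigned long known_good = clk->rate;
	int ret = sketch_ops_set_rate(clk, rate);

	if (ret) {
		clk->rate = known_good;	/* roll back to the last good rate */
		return ret;		/* and hand the error to the caller */
	}
	return 0;
}

/* The top-level setter simply forwards whatever the change step reports. */
static int sketch_clk_set_rate(struct sketch_clk *clk, unsigned long rate)
{
	return sketch_change_rate(clk, rate);
}
```

In this model the caller sees `-EIO` directly from `sketch_clk_set_rate()`, so no read-back comparison is needed in the cpufreq driver.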
On 01/04/15 22:48, Michael Turquette wrote:
> Quoting Sudeep Holla (2015-03-31 02:24:29)

[...]

>>
>> No that's not correct, maybe I was not clear earlier. Let me explain
>> with the stack trace.
>>
>> bL_cpufreq_set_target(returns 0 even when clock driver returned error)
>>        |
>>        V
>> clk_set_rate(returns whatever it gets from clk_core_set_rate_nolock)
>>        |
>>        V
>> clk_core_set_rate_nolock(always returns 0 after calling clk_change_rate)
>
> Ah, now I understand our misunderstanding.
>
> clk_core_set_rate_nolock can fail BEFORE calling clk_change_rate, which
> is where we do a lot of the work to see if the rate change is even
> possible. That is what I was referring to in my previous mail.
>

Ah, I guessed so as I was not clear in my earlier email. A simple flow
diagram did the job better for me :)

> What you have is a failing .set_rate callback and you need to know if it
> failed. You are correct that we are not handling the return value from
> .set_rate. That needs to change.
>

Cool, since I had not followed the design of the clock APIs, I assumed
it needs to be handled in one of two ways: notifiers or get_rate. Thanks
for the clarification.

>>        |
>>        V
>> clk_change_rate(void function, so no return)
>>        |
>>        V
>> clk->ops->set_rate(i.e. <clock_driver_set_rate>)
>>
>> Now for drivers/clk/clk.c IIUC, the return value from clk->ops->set_rate
>> is not checked. Now if <clock_driver_set_rate> returns error when h/w
>> fails to set the rate, I would like to know how the error returned by
>> <clock_driver_set_rate> is returned and received by clk_set_rate.
>> Correct me if I am missing anything in the above sequence.
>>
>> In the current state of code, one can use notifier (basically
>> POST_RATE_CHANGE is called only if the clock rate changes), but since
>> the clk_recalc reads back the clock rate, I found this patch is simpler
>> compared to the notifiers.
>
> Simpler, but not better. What you want is to know if the rate change
> failed. We need to throw an exception when .set_rate fails and
> propagate the error up the call chain to the cpufreq driver.
>

Agreed, but I was under the assumption that since the POST_RATE_CHANGE
notifiers are not called, it's implicit. So you are saying that's not the
case?

> I'm thinking of ways to do this ... would require some surgery to the
> clock framework but it might give us a more elegant way to recover from
> a failure and roll back to a known good state.
>

Agreed. I avoided doing that for 2 reasons: firstly as you said it needs
changes at multiple places and secondly I assumed alternate ways to
handle it as the designed way.

Regards,
Sudeep

Quoting Sudeep Holla (2015-04-02 01:55:05)
>
>
> On 01/04/15 22:48, Michael Turquette wrote:
> > Quoting Sudeep Holla (2015-03-31 02:24:29)
>
> [...]
>
> >>
> >> No that's not correct, maybe I was not clear earlier. Let me explain
> >> with the stack trace.
> >>
> >> bL_cpufreq_set_target(returns 0 even when clock driver returned error)
> >>        |
> >>        V
> >> clk_set_rate(returns whatever it gets from clk_core_set_rate_nolock)
> >>        |
> >>        V
> >> clk_core_set_rate_nolock(always returns 0 after calling clk_change_rate)
> >
> > Ah, now I understand our misunderstanding.
> >
> > clk_core_set_rate_nolock can fail BEFORE calling clk_change_rate, which
> > is where we do a lot of the work to see if the rate change is even
> > possible. That is what I was referring to in my previous mail.
> >
>
> Ah, I guessed so as I was not clear in my earlier email. A simple flow
> diagram did the job better for me :)
>
> > What you have is a failing .set_rate callback and you need to know if it
> > failed. You are correct that we are not handling the return value from
> > .set_rate. That needs to change.
> >
>
> Cool, since I had not followed the design of the clock APIs, I assumed
> it needs to be handled in one of two ways: notifiers or get_rate. Thanks
> for the clarification.
>
> >>        |
> >>        V
> >> clk_change_rate(void function, so no return)
> >>        |
> >>        V
> >> clk->ops->set_rate(i.e. <clock_driver_set_rate>)
> >>
> >> Now for drivers/clk/clk.c IIUC, the return value from clk->ops->set_rate
> >> is not checked. Now if <clock_driver_set_rate> returns error when h/w
> >> fails to set the rate, I would like to know how the error returned by
> >> <clock_driver_set_rate> is returned and received by clk_set_rate.
> >> Correct me if I am missing anything in the above sequence.
> >>
> >> In the current state of code, one can use notifier (basically
> >> POST_RATE_CHANGE is called only if the clock rate changes), but since
> >> the clk_recalc reads back the clock rate, I found this patch is simpler
> >> compared to the notifiers.
> >
> > Simpler, but not better. What you want is to know if the rate change
> > failed. We need to throw an exception when .set_rate fails and
> > propagate the error up the call chain to the cpufreq driver.
> >
>
> Agreed, but I was under the assumption that since the POST_RATE_CHANGE
> notifiers are not called, it's implicit. So you are saying that's not the
> case?

The lack of POST_RATE_CHANGE notifier doesn't imply anything. If we
calculate that a rate cannot be achieved via clk_propagate_rate_change
then we fire off ABORT_RATE_CHANGE notifiers. Once we fix up the
deficiency around not returning the error code for .set_rate callbacks
then we will probably fire these notifiers off in the event that a rate
change fails.

>
> > I'm thinking of ways to do this ... would require some surgery to the
> > clock framework but it might give us a more elegant way to recover from
> > a failure and roll back to a known good state.
> >
>
> Agreed. I avoided doing that for 2 reasons: firstly as you said it needs
> changes at multiple places and secondly I assumed alternate ways to
> handle it as the designed way.

So your patch for cpufreq is hopefully a temporary bandage until we fix
the clk framework. Please feel free to add my Reviewed-by.

Regards,
Mike

>
> Regards,
> Sudeep

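The notifier behaviour discussed in this reply can be pictured with the userspace sketch below. It assumes the possible future behaviour mentioned above, where a failing rate change also fires an abort event; the `DEMO_*` event names only mirror PRE_RATE_CHANGE, POST_RATE_CHANGE and ABORT_RATE_CHANGE, and `demo_clk`, `demo_set_rate` and `demo_record_event` are hypothetical names for illustration, not kernel symbols.

```c
#include <errno.h>

/* Event names chosen to mirror the clk notifier events; illustrative only. */
enum demo_event {
	DEMO_PRE_RATE_CHANGE,
	DEMO_POST_RATE_CHANGE,
	DEMO_ABORT_RATE_CHANGE
};

typedef void (*demo_notifier_fn)(enum demo_event ev, unsigned long old_rate,
				 unsigned long new_rate, void *data);

/* Hypothetical model of a clock with a single listener. */
struct demo_clk {
	unsigned long rate;		/* cached rate */
	int hw_fails;			/* non-zero: simulated hardware failure */
	demo_notifier_fn notify;	/* optional listener, may be NULL */
	void *notify_data;		/* opaque pointer handed to the listener */
};

static void demo_notify(struct demo_clk *clk, enum demo_event ev,
			unsigned long old_rate, unsigned long new_rate)
{
	if (clk->notify)
		clk->notify(ev, old_rate, new_rate, clk->notify_data);
}

/* Fires PRE, then POST on success or ABORT on failure, and returns the error. */
static int demo_set_rate(struct demo_clk *clk, unsigned long new_rate)
{
	unsigned long old_rate = clk->rate;

	demo_notify(clk, DEMO_PRE_RATE_CHANGE, old_rate, new_rate);
	if (clk->hw_fails) {
		demo_notify(clk, DEMO_ABORT_RATE_CHANGE, old_rate, new_rate);
		return -EIO;
	}
	clk->rate = new_rate;
	demo_notify(clk, DEMO_POST_RATE_CHANGE, old_rate, new_rate);
	return 0;
}

/* Example listener: stores the most recent event in *data. */
static void demo_record_event(enum demo_event ev, unsigned long old_rate,
			      unsigned long new_rate, void *data)
{
	(void)old_rate;
	(void)new_rate;
	*(enum demo_event *)data = ev;
}
```

A listener registered this way sees an explicit abort event instead of having to infer failure from a missing post-change event, which is the distinction drawn in the reply above.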
On 13/04/15 06:08, Michael Turquette wrote:
> Quoting Sudeep Holla (2015-04-02 01:55:05)
>>
>>
>> On 01/04/15 22:48, Michael Turquette wrote:
>>> Quoting Sudeep Holla (2015-03-31 02:24:29)

[...]

>>
>>> I'm thinking of ways to do this ... would require some surgery to the
>>> clock framework but it might give us a more elegant way to recover from
>>> a failure and roll back to a known good state.
>>>
>>
>> Agreed. I avoided doing that for 2 reasons: firstly as you said it needs
>> changes at multiple places and secondly I assumed alternate ways to
>> handle it as the designed way.
>
> So your patch for cpufreq is hopefully a temporary bandage until we fix
> the clk framework. Please feel free to add my Reviewed-by.
>

Thanks Mike.

Viresh, is it OK if we carry this patch until the clk framework can
handle this case? I will add a *TODO* stating it's a temporary change
and can be dropped once the clk layer handles it if that helps in any way
:).

This issue is seen on TC2 when firmware is stress tested with continuous
DVFS requests.

Regards,
Sudeep

On 13 April 2015 at 15:51, Sudeep Holla <sudeep.holla@arm.com> wrote:
> Thanks Mike.
>
> Viresh, is it OK if we carry this patch until the clk framework can
> handle this case? I will add a *TODO* stating it's a temporary change
> and can be dropped once the clk layer handles it if that helps in any way
> :).
>
> This issue is seen on TC2 when firmware is stress tested with continuous
> DVFS requests.

Sure.

On 13/04/15 11:25, Viresh Kumar wrote:
> On 13 April 2015 at 15:51, Sudeep Holla <sudeep.holla@arm.com> wrote:
>> Thanks Mike.
>>
>> Viresh, is it OK if we carry this patch until the clk framework can
>> handle this case? I will add a *TODO* stating it's a temporary change
>> and can be dropped once the clk layer handles it if that helps in any way
>> :).
>>
>> This issue is seen on TC2 when firmware is stress tested with continuous
>> DVFS requests.
>
> Sure.
>

Thanks, will repost the patches after the merge window to avoid them
getting lost during it.

Regards,
Sudeep

diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
index e1a6ba66a7f5..3fc676c63f91 100644
--- a/drivers/cpufreq/arm_big_little.c
+++ b/drivers/cpufreq/arm_big_little.c
@@ -186,6 +186,8 @@ bL_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate)
 		mutex_unlock(&cluster_lock[old_cluster]);
 	}
 
+	if (bL_cpufreq_get_rate(cpu) != new_rate)
+		return -EIO;
 	return 0;
 }

The actual frequency is set through "clk_change_rate" which is a void
function. If the underlying hardware fails and returns error, the error
is lost in the clk layer. In order to track such failures, we need to
read back the frequency (just the cached value as clk_recalc called after
clk->ops->set_rate gets the frequency)

This patch adds a check to see if the frequency is set correctly or if
there were any hardware failures and sends the appropriate errors to the
cpufreq core.

Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 drivers/cpufreq/arm_big_little.c | 2 ++
 1 file changed, 2 insertions(+)