diff mbox

cpufreq: Stop BUGing the system

Message ID 1418831495-26958-1-git-send-email-nm@ti.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Nishanth Menon Dec. 17, 2014, 3:51 p.m. UTC
CPUFRreq subsystem is not a system catastrophic failure point.
Failures in these cases DONOT need complete system shutdown with BUG.
just refuse to let cpufreq function should be good enough.

Signed-off-by: Nishanth Menon <nm@ti.com>
---
 drivers/cpufreq/cpufreq.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

Comments

Kevin Hilman Dec. 17, 2014, 7:16 p.m. UTC | #1
Nishanth Menon <nm@ti.com> writes:

> CPUFRreq subsystem is not a system catastrophic failure point.
> Failures in these cases DONOT need complete system shutdown with BUG.
> just refuse to let cpufreq function should be good enough.
>
> Signed-off-by: Nishanth Menon <nm@ti.com>

Acked-by: Kevin Hilman <khilman@linaro.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Dec. 18, 2014, 2:08 a.m. UTC | #2
On 17 December 2014 at 21:21, Nishanth Menon <nm@ti.com> wrote:
> CPUFRreq subsystem is not a system catastrophic failure point.
> Failures in these cases DONOT need complete system shutdown with BUG.
> just refuse to let cpufreq function should be good enough.
>
> Signed-off-by: Nishanth Menon <nm@ti.com>
> ---
>  drivers/cpufreq/cpufreq.c |   17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index a09a29c..a5aa2fa 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -281,7 +281,10 @@ static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci)
>  static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
>                 struct cpufreq_freqs *freqs, unsigned int state)
>  {
> -       BUG_ON(irqs_disabled());
> +       if (irqs_disabled()) {
> +               WARN(1, "IRQs disabled!\n");
> +               return;
> +       }

What about:

> +               if (WARN(irqs_disabled(), "IRQs disabled!\n")
> +                       return;

Same for the last change as well..

>
>         if (cpufreq_disabled())
>                 return;
> @@ -1253,9 +1256,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
>                         /*
>                          * Reaching here after boot in a few seconds may not
>                          * mean that system will remain stable at "unknown"
> -                        * frequency for longer duration. Hence, a BUG_ON().
> +                        * frequency for longer duration. Hence, a WARN().
>                          */
> -                       BUG_ON(ret);
> +                       if (ret) {
> +                               WARN(1, "SYSTEM operating at invalid freq %u", policy->cur);
> +                               goto err_out_unregister;
> +                       }

And I still don't agree for this one. We shouldn't keep on working on a
potentially unstable frequency.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Dec. 18, 2014, 2:49 p.m. UTC | #3
On 07:38-20141218, Viresh Kumar wrote:
> On 17 December 2014 at 21:21, Nishanth Menon <nm@ti.com> wrote:
> > CPUFRreq subsystem is not a system catastrophic failure point.
> > Failures in these cases DONOT need complete system shutdown with BUG.
> > just refuse to let cpufreq function should be good enough.
> >
> > Signed-off-by: Nishanth Menon <nm@ti.com>
> > ---
> >  drivers/cpufreq/cpufreq.c |   17 +++++++++++++----
> >  1 file changed, 13 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index a09a29c..a5aa2fa 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -281,7 +281,10 @@ static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci)
> >  static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
> >                 struct cpufreq_freqs *freqs, unsigned int state)
> >  {
> > -       BUG_ON(irqs_disabled());
> > +       if (irqs_disabled()) {
> > +               WARN(1, "IRQs disabled!\n");
> > +               return;
> > +       }
> 
> What about:
> 
> > +               if (WARN(irqs_disabled(), "IRQs disabled!\n")
> > +                       return;
> 
> Same for the last change as well..

k.
> 
> >
> >         if (cpufreq_disabled())
> >                 return;
> > @@ -1253,9 +1256,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
> >                         /*
> >                          * Reaching here after boot in a few seconds may not
> >                          * mean that system will remain stable at "unknown"
> > -                        * frequency for longer duration. Hence, a BUG_ON().
> > +                        * frequency for longer duration. Hence, a WARN().
> >                          */
> > -                       BUG_ON(ret);
> > +                       if (ret) {
> > +                               WARN(1, "SYSTEM operating at invalid freq %u", policy->cur);
> > +                               goto err_out_unregister;
> > +                       }
> 
> And I still don't agree for this one. We shouldn't keep on working on a
> potentially unstable frequency.

I can add "could be unstable" -> the point being there can be psuedo
errors reported in the system - example - clock framework bugs. Dont
just stop the boot. example: what if cpufreq was a driver module - it
would not have rescued the system because cpufreq had'nt detected the
logic - if we are going to force this on the system, we should probably
not do this in cpufreq code, instead should be somewhere generic.

While I do empathise (and had infact advocated in the past) of not
favouring system attempting to continue at an invalid configuration and
our attempt to rescue has failed - given that we cannot provide a
consistent behavior (it is not a core system behavior) and potential of
a false-postive (example clk framework or underlying bug), it should be
good enough to "enhance" WARN to be "severe sounding enough" to
flag it for developer and continue while keeping the system alive as
much as possible.
Viresh Kumar Dec. 19, 2014, 1:41 a.m. UTC | #4
On 18 December 2014 at 20:19, Nishanth Menon <nm@ti.com> wrote:
> I can add "could be unstable" -> the point being there can be psuedo
> errors reported in the system - example - clock framework bugs. Dont
> just stop the boot. example: what if cpufreq was a driver module - it
> would not have rescued the system because cpufreq had'nt detected the
> logic - if we are going to force this on the system, we should probably
> not do this in cpufreq code, instead should be somewhere generic.
>
> While I do empathise (and had infact advocated in the past) of not
> favouring system attempting to continue at an invalid configuration and
> our attempt to rescue has failed - given that we cannot provide a
> consistent behavior (it is not a core system behavior) and potential of
> a false-postive (example clk framework or underlying bug), it should be
> good enough to "enhance" WARN to be "severe sounding enough" to
> flag it for developer and continue while keeping the system alive as
> much as possible.

There is no way out for the kernel to know if its a false positive or a real
bug. And in the worst case, it can screw up a platform completely.

I am still not sure if changing it to a WARN would be good idea.

@Rafael: Thoughts ?
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Dec. 19, 2014, 2:08 a.m. UTC | #5
On Friday, December 19, 2014 07:11:19 AM Viresh Kumar wrote:
> On 18 December 2014 at 20:19, Nishanth Menon <nm@ti.com> wrote:
> > I can add "could be unstable" -> the point being there can be psuedo
> > errors reported in the system - example - clock framework bugs. Dont
> > just stop the boot. example: what if cpufreq was a driver module - it
> > would not have rescued the system because cpufreq had'nt detected the
> > logic - if we are going to force this on the system, we should probably
> > not do this in cpufreq code, instead should be somewhere generic.
> >
> > While I do empathise (and had infact advocated in the past) of not
> > favouring system attempting to continue at an invalid configuration and
> > our attempt to rescue has failed - given that we cannot provide a
> > consistent behavior (it is not a core system behavior) and potential of
> > a false-postive (example clk framework or underlying bug), it should be
> > good enough to "enhance" WARN to be "severe sounding enough" to
> > flag it for developer and continue while keeping the system alive as
> > much as possible.
> 
> There is no way out for the kernel to know if its a false positive or a real
> bug. And in the worst case, it can screw up a platform completely.
> 
> I am still not sure if changing it to a WARN would be good idea.
> 
> @Rafael: Thoughts ?

I'm a bit divided here.  On the one hand I don't like BUG_ON() as a rule and it
is used in too many places where it doesn't have to be used.

On the other hand, in this particular case, I'm not sure if allowing the system
to run without cpufreq when it might rely on it for CPU cooling, for one example,
is a good idea.
Nishanth Menon Dec. 24, 2014, 3:06 p.m. UTC | #6
On 12/18/2014 08:08 PM, Rafael J. Wysocki wrote:
> On Friday, December 19, 2014 07:11:19 AM Viresh Kumar wrote:
>> On 18 December 2014 at 20:19, Nishanth Menon <nm@ti.com> wrote:
>>> I can add "could be unstable" -> the point being there can be psuedo
>>> errors reported in the system - example - clock framework bugs. Dont
>>> just stop the boot. example: what if cpufreq was a driver module - it
>>> would not have rescued the system because cpufreq had'nt detected the
>>> logic - if we are going to force this on the system, we should probably
>>> not do this in cpufreq code, instead should be somewhere generic.
>>>
>>> While I do empathise (and had infact advocated in the past) of not
>>> favouring system attempting to continue at an invalid configuration and
>>> our attempt to rescue has failed - given that we cannot provide a
>>> consistent behavior (it is not a core system behavior) and potential of
>>> a false-postive (example clk framework or underlying bug), it should be
>>> good enough to "enhance" WARN to be "severe sounding enough" to
>>> flag it for developer and continue while keeping the system alive as
>>> much as possible.
>>
>> There is no way out for the kernel to know if its a false positive or a real
>> bug. And in the worst case, it can screw up a platform completely.
>>
>> I am still not sure if changing it to a WARN would be good idea.
>>
>> @Rafael: Thoughts ?
> 
> I'm a bit divided here.  On the one hand I don't like BUG_ON() as a rule and it
> is used in too many places where it doesn't have to be used.
> 
> On the other hand, in this particular case, I'm not sure if allowing the system
> to run without cpufreq when it might rely on it for CPU cooling, for one example,
> is a good idea.

but then, CPUFReq is not a mandatory feature - we could as well do the
same with CPU_FREQ disabled.
Rafael J. Wysocki Dec. 27, 2014, 8:37 p.m. UTC | #7
On Wednesday, December 24, 2014 09:06:05 AM Nishanth Menon wrote:
> On 12/18/2014 08:08 PM, Rafael J. Wysocki wrote:
> > On Friday, December 19, 2014 07:11:19 AM Viresh Kumar wrote:
> >> On 18 December 2014 at 20:19, Nishanth Menon <nm@ti.com> wrote:
> >>> I can add "could be unstable" -> the point being there can be psuedo
> >>> errors reported in the system - example - clock framework bugs. Dont
> >>> just stop the boot. example: what if cpufreq was a driver module - it
> >>> would not have rescued the system because cpufreq had'nt detected the
> >>> logic - if we are going to force this on the system, we should probably
> >>> not do this in cpufreq code, instead should be somewhere generic.
> >>>
> >>> While I do empathise (and had infact advocated in the past) of not
> >>> favouring system attempting to continue at an invalid configuration and
> >>> our attempt to rescue has failed - given that we cannot provide a
> >>> consistent behavior (it is not a core system behavior) and potential of
> >>> a false-postive (example clk framework or underlying bug), it should be
> >>> good enough to "enhance" WARN to be "severe sounding enough" to
> >>> flag it for developer and continue while keeping the system alive as
> >>> much as possible.
> >>
> >> There is no way out for the kernel to know if its a false positive or a real
> >> bug. And in the worst case, it can screw up a platform completely.
> >>
> >> I am still not sure if changing it to a WARN would be good idea.
> >>
> >> @Rafael: Thoughts ?
> > 
> > I'm a bit divided here.  On the one hand I don't like BUG_ON() as a rule and it
> > is used in too many places where it doesn't have to be used.
> > 
> > On the other hand, in this particular case, I'm not sure if allowing the system
> > to run without cpufreq when it might rely on it for CPU cooling, for one example,
> > is a good idea.
> 
> but then, CPUFReq is not a mandatory feature - we could as well do the
> same with CPU_FREQ disabled.

Some platforms pretty much require CPU_FREQ and will always have it set, but
with the $subject patch they may end up not using it.

So this isn't a valid argument to me, sorry.
diff mbox

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index a09a29c..a5aa2fa 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -281,7 +281,10 @@  static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci)
 static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs, unsigned int state)
 {
-	BUG_ON(irqs_disabled());
+	if (irqs_disabled()) {
+		WARN(1, "IRQs disabled!\n");
+		return;
+	}
 
 	if (cpufreq_disabled())
 		return;
@@ -1253,9 +1256,12 @@  static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 			/*
 			 * Reaching here after boot in a few seconds may not
 			 * mean that system will remain stable at "unknown"
-			 * frequency for longer duration. Hence, a BUG_ON().
+			 * frequency for longer duration. Hence, a WARN().
 			 */
-			BUG_ON(ret);
+			if (ret) {
+				WARN(1, "SYSTEM operating at invalid freq %u", policy->cur);
+				goto err_out_unregister;
+			}
 			pr_warn("%s: CPU%d: Unlisted initial frequency changed to: %u KHz\n",
 				__func__, policy->cpu, policy->cur);
 		}
@@ -2556,7 +2562,10 @@  static int __init cpufreq_core_init(void)
 		return -ENODEV;
 
 	cpufreq_global_kobject = kobject_create();
-	BUG_ON(!cpufreq_global_kobject);
+	if (!cpufreq_global_kobject) {
+		WARN(1, "No memory for cpufreq_global_kobject\n");
+		return -ENOMEM;
+	}
 
 	return 0;
 }