x86: Downgrade clock throttling thermal event critical error

Message ID 20181009113754.20888-1-chris@chris-wilson.co.uk
State New, archived
Series x86: Downgrade clock throttling thermal event critical error

Commit Message

Chris Wilson Oct. 9, 2018, 11:37 a.m. UTC
Under CI testing, it is common for the cpus to overheat with the
continuous workloads and end up being throttled. As the cpus still
function, it is less of a critical error meriting urgent action, but an
expected yet significant condition (pr_notice).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Petri Latvala <petri.latvala@intel.com>
---
 arch/x86/kernel/cpu/mcheck/therm_throt.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
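
To illustrate what the downgrade changes in practice, here is a minimal userspace sketch of the console filtering rule, not kernel code. The numeric levels match the kernel's KERN_* severities; the helper names and the "quiet" console loglevel of 4 are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Kernel log severities; a lower number means a more severe message. */
enum {
	LOGLEVEL_EMERG   = 0,
	LOGLEVEL_ALERT   = 1,
	LOGLEVEL_CRIT    = 2,	/* pr_crit(): critical conditions */
	LOGLEVEL_ERR     = 3,
	LOGLEVEL_WARNING = 4,
	LOGLEVEL_NOTICE  = 5,	/* pr_notice(): normal but significant */
	LOGLEVEL_INFO    = 6,
	LOGLEVEL_DEBUG   = 7,
};

/*
 * Rule modelled here: a message is printed on the console only if its
 * level is numerically lower than the console loglevel.
 */
bool shown_on_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}

/* Hypothetical helper: print the outcome for one message level. */
void report(const char *name, int msg_level, int console_loglevel)
{
	printf("%-9s at console loglevel %d: %s\n", name, console_loglevel,
	       shown_on_console(msg_level, console_loglevel) ?
	       "printed" : "suppressed");
}

/* Walk through the before/after behaviour of the throttling message. */
void loglevel_demo(void)
{
	const int quiet_loglevel = 4;	/* assumed: console level with "quiet" */

	/* Before the patch: pr_crit() gets through a quiet console. */
	assert(shown_on_console(LOGLEVEL_CRIT, quiet_loglevel));
	/* After the patch: pr_notice() is filtered from a quiet console. */
	assert(!shown_on_console(LOGLEVEL_NOTICE, quiet_loglevel));

	report("pr_crit", LOGLEVEL_CRIT, quiet_loglevel);
	report("pr_notice", LOGLEVEL_NOTICE, quiet_loglevel);
}
```

The filter only governs what reaches the console; the message is still stored in the kernel log buffer, so tools reading that buffer can see it together with its lower priority.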

Comments

Tvrtko Ursulin Oct. 10, 2018, 11:59 a.m. UTC | #1
On 09/10/2018 12:37, Chris Wilson wrote:
> Under CI testing, it is common for the cpus to overheat with the
> continuous workloads and end up being throttled. As the cpus still
> function, it is less of a critical error meriting urgent action, but an
> expected yet significant condition (pr_notice).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Petri Latvala <petri.latvala@intel.com>
> ---
>   arch/x86/kernel/cpu/mcheck/therm_throt.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> index 2da67b70ba98..bc57b5988589 100644
> --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> @@ -184,10 +184,10 @@ static void therm_throt_process(bool new_event, int event, int level)
>   	/* if we just entered the thermal event */
>   	if (new_event) {
>   		if (event == THERMAL_THROTTLING_EVENT)
> -			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> -				this_cpu,
> -				level == CORE_LEVEL ? "Core" : "Package",
> -				state->count);
> +			pr_notice("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> +				  this_cpu,
> +				  level == CORE_LEVEL ? "Core" : "Package",
> +				  state->count);
>   		return;
>   	}
>   	if (old_event) {
> 

It even sounds like it wouldn't be far-fetched to argue that, these days,
notice is the correct log level for thermal throttling, unless there are
more sources of throttling messages. TBC when I get back to my Skull Canyon.
That one certainly logs something like this shortly after invoking make -j8.

Regards,

Tvrtko

Chris Wilson Oct. 10, 2018, 12:10 p.m. UTC | #2
Quoting Tvrtko Ursulin (2018-10-10 12:59:59)
> 
> On 09/10/2018 12:37, Chris Wilson wrote:
> > Under CI testing, it is common for the cpus to overheat with the
> > continuous workloads and end up being throttled. As the cpus still
> > function, it is less of a critical error meriting urgent action, but an
> > expected yet significant condition (pr_notice).
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Petri Latvala <petri.latvala@intel.com>
> > ---
> >   arch/x86/kernel/cpu/mcheck/therm_throt.c | 8 ++++----
> >   1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > index 2da67b70ba98..bc57b5988589 100644
> > --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > @@ -184,10 +184,10 @@ static void therm_throt_process(bool new_event, int event, int level)
> >       /* if we just entered the thermal event */
> >       if (new_event) {
> >               if (event == THERMAL_THROTTLING_EVENT)
> > -                     pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> > -                             this_cpu,
> > -                             level == CORE_LEVEL ? "Core" : "Package",
> > -                             state->count);
> > +                     pr_notice("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> > +                               this_cpu,
> > +                               level == CORE_LEVEL ? "Core" : "Package",
> > +                               state->count);
> >               return;
> >       }
> >       if (old_event) {
> > 
> 
> It even sounds it wouldn't be far fetched to argue these days notice is 
> the correct log level for thermal throttling. Unless there are more 
> sources of throttling messages. TBC when I get back to my Skull Canyon. 
> That one certainly logs something like this shortly after invoking make -j8.

I was thinking of tarting up the language to say most processors
nowadays can easily exceed their Thermal Design Point and are built with
that in mind. The caveat is making sure that the shutdown limit is still
reported as a critical event; iirc that comes as an MCE.
-Chris

Patch

diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 2da67b70ba98..bc57b5988589 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -184,10 +184,10 @@  static void therm_throt_process(bool new_event, int event, int level)
 	/* if we just entered the thermal event */
 	if (new_event) {
 		if (event == THERMAL_THROTTLING_EVENT)
-			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
-				this_cpu,
-				level == CORE_LEVEL ? "Core" : "Package",
-				state->count);
+			pr_notice("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
+				  this_cpu,
+				  level == CORE_LEVEL ? "Core" : "Package",
+				  state->count);
 		return;
 	}
 	if (old_event) {
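
Since the patch leaves the message text untouched and only changes its priority, a log scanner that matches on the text should keep working. Below is a rough userspace sketch of such a matcher, built from the format string in the hunk above. The struct and function names are hypothetical, and the input is assumed to be the bare message with any timestamp or priority prefix already stripped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields carried by the message. */
struct throttle_event {
	int cpu;		/* from "CPU%d" */
	char level[8];		/* "Core" or "Package" (7 chars + NUL) */
	unsigned long count;	/* from "total events = %lu" */
};

/*
 * Match one log line against the format used by the pr_notice() call.
 * Returns true only if all three fields were converted. The closing
 * ")" of the message is not verified by this sketch.
 */
bool parse_throttle_line(const char *line, struct throttle_event *ev)
{
	int n = sscanf(line,
		       "CPU%d: %7s temperature above threshold, "
		       "cpu clock throttled (total events = %lu)",
		       &ev->cpu, ev->level, &ev->count);

	return n == 3;
}

/* True if the event was reported for the package rather than a core. */
bool is_package_event(const struct throttle_event *ev)
{
	return strcmp(ev->level, "Package") == 0;
}

/* Example usage with a made-up line in the message's format. */
void parse_demo(void)
{
	struct throttle_event ev;
	const char *line = "CPU3: Package temperature above threshold, "
			   "cpu clock throttled (total events = 12)";

	assert(parse_throttle_line(line, &ev));
	assert(ev.cpu == 3 && ev.count == 12UL);
	assert(is_package_event(&ev));
}
```

The sample line in parse_demo() is invented for the example and is not taken from a real log.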