Message ID | 1432251947-13335-1-git-send-email-tharvey@gateworks.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, May 21, 2015 at 04:45:47PM -0700, Tim Harvey wrote: > The IMX6Q/IMX6DL SoC's have a 2-bit temperature grade stored in OTP which > is valid for all IMX6 SoC's (despite the fact that the IMXSDLRM and > IMXSXRM do not document this - this has been proven via tests as well as > verified by Freescale FAE). > > Instead of assuming a fixed 85C for passive cooling threshold and 105C for > critical use the thermal grade for these configurations. > > We will set the critical to maxT - 5C and passive to maxT - 10C. > > Cc: Anson Huang <b20788@freescale.com> > Cc: Fabio Estevam <fabio.estevam@freescale.com> > Cc: Eduardo Valentin <edubezval@gmail.com> > Cc: Zhang Rui <rui.zhang@intel.com> > Signed-off-by: Tim Harvey <tharvey@gateworks.com> Acked-by: Shawn Guo <shawn.guo@linaro.org>
On Sun, May 24, 2015 at 4:48 AM, Shawn Guo <shawn.guo@linaro.org> wrote: > On Thu, May 21, 2015 at 04:45:47PM -0700, Tim Harvey wrote: >> The IMX6Q/IMX6DL SoC's have a 2-bit temperature grade stored in OTP which >> is valid for all IMX6 SoC's (despite the fact that the IMXSDLRM and >> IMXSXRM do not document this - this has been proven via tests as well as >> verified by Freescale FAE). >> >> Instead of assuming a fixed 85C for passive cooling threshold and 105C for >> critical use the thermal grade for these configurations. >> >> We will set the critical to maxT - 5C and passive to maxT - 10C. I would like to chime in here if you don't mind. I have been carrying a patch similar to this in the SolidRun repo to fix cooling issues that we have had. I would recommend keeping the passive temp at maxT - 20C due to the thermal properties of the chip. I have found that around 85-90C we can maintain a relatively steady thermal state with only passive cooling. Generally with a hard non-NEON based cpu workload the iMX6 will level off at about 87C with all the cores clocked to 1Ghz, and sometimes dipping down to 800Mhz periodically. With a NEON based workload on all the cores it will push beyond this and generally end up finding steady state at about 800Mhz right around 90C. If you raise the initial passive threshold by 10C it will allow enough heat to build up in the chip that the only way to avoid reaching critical temps is by dropping the CPU down to its lowest frequency. This is not the best experience as then you have a much warmer chip and if the workload doesn't change you will just be switching between running at the highest cpu frequency or lowest which makes for a choppy experience. A longer passive cooling zone allows the temperature of the chip to be regulated using only passive methods but without drastic performance drops. I am doing things a bit differently in my implementation as I setup a passive cooling zone for each cpu frequency, but that is just so you can have more control from userspace by changing the different passive trip points. -Jon
On Sat, May 23, 2015 at 10:19 PM, Jon Nettleton <jon.nettleton@gmail.com> wrote: > On Sun, May 24, 2015 at 4:48 AM, Shawn Guo <shawn.guo@linaro.org> wrote: >> On Thu, May 21, 2015 at 04:45:47PM -0700, Tim Harvey wrote: >>> The IMX6Q/IMX6DL SoC's have a 2-bit temperature grade stored in OTP which >>> is valid for all IMX6 SoC's (despite the fact that the IMXSDLRM and >>> IMXSXRM do not document this - this has been proven via tests as well as >>> verified by Freescale FAE). >>> >>> Instead of assuming a fixed 85C for passive cooling threshold and 105C for >>> critical use the thermal grade for these configurations. >>> >>> We will set the critical to maxT - 5C and passive to maxT - 10C. > > I would like to chime in here if you don't mind. I have been carrying > a patch similar to this in the SolidRun repo to fix cooling issues > that we have had. I would recommend keeping the passive temp at maxT > - 20C due to the thermal properties of the chip. I have found that > around 85-90C we can maintain a relatively steady thermal state with > only passive cooling. Generally with a hard non-NEON based cpu > workload the iMX6 will level off at about 87C with all the cores > clocked to 1Ghz, and sometimes dipping down to 800Mhz periodically. > With a NEON based workload on all the cores it will push beyond this > and generally end up finding steady state at about 800Mhz right around > 90C. > > If you raise the initial passive threshold by 10C it will allow enough > heat to build up in the chip that the only way to avoid reaching > critical temps is by dropping the CPU down to its lowest frequency. > This is not the best experience as then you have a much warmer chip > and if the workload doesn't change you will just be switching between > running at the highest cpu frequency or lowest which makes for a > choppy experience. A longer passive cooling zone allows the > temperature of the chip to be regulated using only passive methods but > without drastic performance drops. > > I am doing things a bit differently in my implementation as I setup a > passive cooling zone for each cpu frequency, but that is just so you > can have more control from userspace by changing the different passive > trip points. > > -Jon Jon, I can agree with leaving a Max-20C passive delta. What do you think about the critical threshold of Max-5C and rule of not allowing it to be changed? Tim
On Tue, May 26, 2015 at 11:24 PM, Tim Harvey <tharvey@gateworks.com> wrote: > On Sat, May 23, 2015 at 10:19 PM, Jon Nettleton <jon.nettleton@gmail.com> wrote: >> On Sun, May 24, 2015 at 4:48 AM, Shawn Guo <shawn.guo@linaro.org> wrote: >>> On Thu, May 21, 2015 at 04:45:47PM -0700, Tim Harvey wrote: >>>> The IMX6Q/IMX6DL SoC's have a 2-bit temperature grade stored in OTP which >>>> is valid for all IMX6 SoC's (despite the fact that the IMXSDLRM and >>>> IMXSXRM do not document this - this has been proven via tests as well as >>>> verified by Freescale FAE). >>>> >>>> Instead of assuming a fixed 85C for passive cooling threshold and 105C for >>>> critical use the thermal grade for these configurations. >>>> >>>> We will set the critical to maxT - 5C and passive to maxT - 10C. >> >> I would like to chime in here if you don't mind. I have been carrying >> a patch similar to this in the SolidRun repo to fix cooling issues >> that we have had. I would recommend keeping the passive temp at maxT >> - 20C due to the thermal properties of the chip. I have found that >> around 85-90C we can maintain a relatively steady thermal state with >> only passive cooling. Generally with a hard non-NEON based cpu >> workload the iMX6 will level off at about 87C with all the cores >> clocked to 1Ghz, and sometimes dipping down to 800Mhz periodically. >> With a NEON based workload on all the cores it will push beyond this >> and generally end up finding steady state at about 800Mhz right around >> 90C. >> >> If you raise the initial passive threshold by 10C it will allow enough >> heat to build up in the chip that the only way to avoid reaching >> critical temps is by dropping the CPU down to its lowest frequency. >> This is not the best experience as then you have a much warmer chip >> and if the workload doesn't change you will just be switching between >> running at the highest cpu frequency or lowest which makes for a >> choppy experience. A longer passive cooling zone allows the >> temperature of the chip to be regulated using only passive methods but >> without drastic performance drops. >> >> I am doing things a bit differently in my implementation as I setup a >> passive cooling zone for each cpu frequency, but that is just so you >> can have more control from userspace by changing the different passive >> trip points. >> >> -Jon > > Jon, > > I can agree with leaving a Max-20C passive delta. What do you think > about the critical threshold of Max-5C and rule of not allowing it to > be changed? > Tim, I definitely agree that the Critical temp should be a fixed point. Is the purpose of lowering the critical threshold from the hardware default, to allow Linux to shutdown more cleanly rather than just have the hardware shutting down? If that is the case then I think that is fine. If it is to protect the SOC then that is unnecessary. We have heated the SOCs to well beyond the critical threshold and they have survived just fine. This is a bit out of context but here is the formula I am using to figure out my trip points. By default I use a linear set of trip points for passive cooling. https://github.com/linux4kix/linux-linaro-stable-mx6/commit/212c17d543739a5fe0bd75b66c10f05177e8bcb0 The short of it is I set a trip delta of 6C and then figure out the lowest passive trip point as Critical - (#passive trip points * trip delta), where each cpu frequency stage is a passive trip point. This will allow an 800Mhz SOC with 2 trip points to run at full speed longer than a 1.2Ghz with 4 trip points. The idea being that the higher the clock rate means we will generate more heat and have more passive cooling levels so it is better to drop the top speed of the CPU earlier in order to let the passive cooling be effective and find a steady state. This may be a bit over the top but has fixed problems where long running processes would build up heat and eventually cause a thermal shutdown, but doesn't completely cripple the faster SOCs.
On Tue, May 26, 2015 at 11:08 PM, Jon Nettleton <jon.nettleton@gmail.com> wrote: > On Tue, May 26, 2015 at 11:24 PM, Tim Harvey <tharvey@gateworks.com> wrote: >> On Sat, May 23, 2015 at 10:19 PM, Jon Nettleton <jon.nettleton@gmail.com> wrote: >>> On Sun, May 24, 2015 at 4:48 AM, Shawn Guo <shawn.guo@linaro.org> wrote: >>>> On Thu, May 21, 2015 at 04:45:47PM -0700, Tim Harvey wrote: >>>>> The IMX6Q/IMX6DL SoC's have a 2-bit temperature grade stored in OTP which >>>>> is valid for all IMX6 SoC's (despite the fact that the IMXSDLRM and >>>>> IMXSXRM do not document this - this has been proven via tests as well as >>>>> verified by Freescale FAE). >>>>> >>>>> Instead of assuming a fixed 85C for passive cooling threshold and 105C for >>>>> critical use the thermal grade for these configurations. >>>>> >>>>> We will set the critical to maxT - 5C and passive to maxT - 10C. >>> >>> I would like to chime in here if you don't mind. I have been carrying >>> a patch similar to this in the SolidRun repo to fix cooling issues >>> that we have had. I would recommend keeping the passive temp at maxT >>> - 20C due to the thermal properties of the chip. I have found that >>> around 85-90C we can maintain a relatively steady thermal state with >>> only passive cooling. Generally with a hard non-NEON based cpu >>> workload the iMX6 will level off at about 87C with all the cores >>> clocked to 1Ghz, and sometimes dipping down to 800Mhz periodically. >>> With a NEON based workload on all the cores it will push beyond this >>> and generally end up finding steady state at about 800Mhz right around >>> 90C. >>> >>> If you raise the initial passive threshold by 10C it will allow enough >>> heat to build up in the chip that the only way to avoid reaching >>> critical temps is by dropping the CPU down to its lowest frequency. >>> This is not the best experience as then you have a much warmer chip >>> and if the workload doesn't change you will just be switching between >>> running at the highest cpu frequency or lowest which makes for a >>> choppy experience. A longer passive cooling zone allows the >>> temperature of the chip to be regulated using only passive methods but >>> without drastic performance drops. >>> >>> I am doing things a bit differently in my implementation as I setup a >>> passive cooling zone for each cpu frequency, but that is just so you >>> can have more control from userspace by changing the different passive >>> trip points. >>> >>> -Jon >> >> Jon, >> >> I can agree with leaving a Max-20C passive delta. What do you think >> about the critical threshold of Max-5C and rule of not allowing it to >> be changed? >> > > Tim, > > I definitely agree that the Critical temp should be a fixed point. Is > the purpose of lowering the critical threshold from the hardware > default, to allow Linux to shutdown more cleanly rather than just have > the hardware shutting down? If that is the case then I think that is > fine. If it is to protect the SOC then that is unnecessary. We have > heated the SOCs to well beyond the critical threshold and they have > survived just fine. > > This is a bit out of context but here is the formula I am using to > figure out my trip points. By default I use a linear set of trip > points for passive cooling. > https://github.com/linux4kix/linux-linaro-stable-mx6/commit/212c17d543739a5fe0bd75b66c10f05177e8bcb0 > > The short of it is I set a trip delta of 6C and then figure out the > lowest passive trip point as Critical - (#passive trip points * trip > delta), where each cpu frequency stage is a passive trip point. This > will allow an 800Mhz SOC with 2 trip points to run at full speed > longer than a 1.2Ghz with 4 trip points. The idea being that the > higher the clock rate means we will generate more heat and have more > passive cooling levels so it is better to drop the top speed of the > CPU earlier in order to let the passive cooling be effective and find > a steady state. > > This may be a bit over the top but has fixed problems where long > running processes would build up heat and eventually cause a thermal > shutdown, but doesn't completely cripple the faster SOCs. Jon, Yes - the purpose of lowering the critical threshold from the hardware default is to allow Linux to shutdown more cleanly. If you agree with the fact that the patch here offers the improvement of using OTG temperature grade as a basis can you ack it and if you feel that the thresholds need to be adjusted perhaps propose a follow-on patch? I feel people can debate the temperature delta's endlessly but what I was really after here was to fix the fact that all the processors are not temperature graded equally because they are packaged differently (metal case on automotive offering better thermal conductivity vs plastic case on consumer) Regards, Tim
Tim, Okay that all sounds fine. I came into this half way through so I missed the core of what you were trying to accomplish, sorry to de-rail the conversation a bit. I will gladly ACK the patch and will follow up with patches we can discuss for additional changes. -Jon On Tue, Jul 28, 2015 at 4:50 PM, Tim Harvey <tharvey@gateworks.com> wrote: > On Tue, May 26, 2015 at 11:08 PM, Jon Nettleton <jon.nettleton@gmail.com> wrote: >> On Tue, May 26, 2015 at 11:24 PM, Tim Harvey <tharvey@gateworks.com> wrote: >>> On Sat, May 23, 2015 at 10:19 PM, Jon Nettleton <jon.nettleton@gmail.com> wrote: >>>> On Sun, May 24, 2015 at 4:48 AM, Shawn Guo <shawn.guo@linaro.org> wrote: >>>>> On Thu, May 21, 2015 at 04:45:47PM -0700, Tim Harvey wrote: >>>>>> The IMX6Q/IMX6DL SoC's have a 2-bit temperature grade stored in OTP which >>>>>> is valid for all IMX6 SoC's (despite the fact that the IMXSDLRM and >>>>>> IMXSXRM do not document this - this has been proven via tests as well as >>>>>> verified by Freescale FAE). >>>>>> >>>>>> Instead of assuming a fixed 85C for passive cooling threshold and 105C for >>>>>> critical use the thermal grade for these configurations. >>>>>> >>>>>> We will set the critical to maxT - 5C and passive to maxT - 10C. >>>> >>>> I would like to chime in here if you don't mind. I have been carrying >>>> a patch similar to this in the SolidRun repo to fix cooling issues >>>> that we have had. I would recommend keeping the passive temp at maxT >>>> - 20C due to the thermal properties of the chip. I have found that >>>> around 85-90C we can maintain a relatively steady thermal state with >>>> only passive cooling. Generally with a hard non-NEON based cpu >>>> workload the iMX6 will level off at about 87C with all the cores >>>> clocked to 1Ghz, and sometimes dipping down to 800Mhz periodically. >>>> With a NEON based workload on all the cores it will push beyond this >>>> and generally end up finding steady state at about 800Mhz right around >>>> 90C. >>>> >>>> If you raise the initial passive threshold by 10C it will allow enough >>>> heat to build up in the chip that the only way to avoid reaching >>>> critical temps is by dropping the CPU down to its lowest frequency. >>>> This is not the best experience as then you have a much warmer chip >>>> and if the workload doesn't change you will just be switching between >>>> running at the highest cpu frequency or lowest which makes for a >>>> choppy experience. A longer passive cooling zone allows the >>>> temperature of the chip to be regulated using only passive methods but >>>> without drastic performance drops. >>>> >>>> I am doing things a bit differently in my implementation as I setup a >>>> passive cooling zone for each cpu frequency, but that is just so you >>>> can have more control from userspace by changing the different passive >>>> trip points. >>>> >>>> -Jon >>> >>> Jon, >>> >>> I can agree with leaving a Max-20C passive delta. What do you think >>> about the critical threshold of Max-5C and rule of not allowing it to >>> be changed? >>> >> >> Tim, >> >> I definitely agree that the Critical temp should be a fixed point. Is >> the purpose of lowering the critical threshold from the hardware >> default, to allow Linux to shutdown more cleanly rather than just have >> the hardware shutting down? If that is the case then I think that is >> fine. If it is to protect the SOC then that is unnecessary. We have >> heated the SOCs to well beyond the critical threshold and they have >> survived just fine. >> >> This is a bit out of context but here is the formula I am using to >> figure out my trip points. By default I use a linear set of trip >> points for passive cooling. >> https://github.com/linux4kix/linux-linaro-stable-mx6/commit/212c17d543739a5fe0bd75b66c10f05177e8bcb0 >> >> The short of it is I set a trip delta of 6C and then figure out the >> lowest passive trip point as Critical - (#passive trip points * trip >> delta), where each cpu frequency stage is a passive trip point. This >> will allow an 800Mhz SOC with 2 trip points to run at full speed >> longer than a 1.2Ghz with 4 trip points. The idea being that the >> higher the clock rate means we will generate more heat and have more >> passive cooling levels so it is better to drop the top speed of the >> CPU earlier in order to let the passive cooling be effective and find >> a steady state. >> >> This may be a bit over the top but has fixed problems where long >> running processes would build up heat and eventually cause a thermal >> shutdown, but doesn't completely cripple the faster SOCs. > > Jon, > > Yes - the purpose of lowering the critical threshold from the hardware > default is to allow Linux to shutdown more cleanly. > > If you agree with the fact that the patch here offers the improvement > of using OTG temperature grade as a basis can you ack it and if you > feel that the thresholds need to be adjusted perhaps propose a > follow-on patch? I feel people can debate the temperature delta's > endlessly but what I was really after here was to fix the fact that > all the processors are not temperature graded equally because they are > packaged differently (metal case on automotive offering better thermal > conductivity vs plastic case on consumer) > > Regards, > > Tim
diff --git a/drivers/thermal/imx_thermal.c b/drivers/thermal/imx_thermal.c index 2ccbc07..d349d02 100644 --- a/drivers/thermal/imx_thermal.c +++ b/drivers/thermal/imx_thermal.c @@ -55,6 +55,7 @@ #define TEMPSENSE2_PANIC_VALUE_SHIFT 16 #define TEMPSENSE2_PANIC_VALUE_MASK 0xfff0000 +#define OCOTP_MEM0 0x0480 #define OCOTP_ANA1 0x04e0 /* The driver supports 1 passive trip point and 1 critical trip point */ @@ -64,12 +65,6 @@ enum imx_thermal_trip { IMX_TRIP_NUM, }; -/* - * It defines the temperature in millicelsius for passive trip point - * that will trigger cooling action when crossed. - */ -#define IMX_TEMP_PASSIVE 85000 - #define IMX_POLLING_DELAY 2000 /* millisecond */ #define IMX_PASSIVE_DELAY 1000 @@ -100,12 +95,14 @@ struct imx_thermal_data { u32 c1, c2; /* See formula in imx_get_sensor_data() */ unsigned long temp_passive; unsigned long temp_critical; + unsigned long temp_max; unsigned long alarm_temp; unsigned long last_temp; bool irq_enabled; int irq; struct clk *thermal_clk; const struct thermal_soc_data *socdata; + const char *temp_grade; }; static void imx_set_panic_temp(struct imx_thermal_data *data, @@ -286,10 +283,12 @@ static int imx_set_trip_temp(struct thermal_zone_device *tz, int trip, { struct imx_thermal_data *data = tz->devdata; + /* do not allow changing critical threshold */ if (trip == IMX_TRIP_CRITICAL) return -EPERM; - if (temp > IMX_TEMP_PASSIVE) + /* do not allow passive to be set higher than critical */ + if (temp > data->temp_critical) return -EINVAL; data->temp_passive = temp; @@ -404,17 +403,39 @@ static int imx_get_sensor_data(struct platform_device *pdev) data->c1 = temp64; data->c2 = n1 * data->c1 + 1000 * t1; - /* - * Set the default passive cooling trip point, - * can be changed from userspace. - */ - data->temp_passive = IMX_TEMP_PASSIVE; + /* use OTP for thermal grade */ + ret = regmap_read(map, OCOTP_MEM0, &val); + if (ret) { + dev_err(&pdev->dev, "failed to read temp grade: %d\n", ret); + return ret; + } + + /* The maximum die temp is specified by the Temperature Grade */ + switch ((val >> 6) & 0x3) { + case 0: /* Commercial (0 to 95C) */ + data->temp_grade = "Commercial"; + data->temp_max = 95000; + break; + case 1: /* Extended Commercial (-20 to 105C) */ + data->temp_grade = "Extended Commercial"; + data->temp_max = 105000; + break; + case 2: /* Industrial (-40 to 105C) */ + data->temp_grade = "Industrial"; + data->temp_max = 105000; + break; + case 3: /* Automotive (-40 to 125C) */ + data->temp_grade = "Automotive"; + data->temp_max = 125000; + break; + } /* - * The maximum die temperature set to 20 C higher than - * IMX_TEMP_PASSIVE. + * Set the critical trip point at 5C under max + * Set the passive trip point at 10C under max (can change via sysfs) */ - data->temp_critical = 1000 * 20 + data->temp_passive; + data->temp_critical = data->temp_max - (1000 * 5); + data->temp_passive = data->temp_max - (1000 * 10); return 0; } @@ -559,6 +580,11 @@ static int imx_thermal_probe(struct platform_device *pdev) return ret; } + dev_info(&pdev->dev, "%s CPU temperature grade - max:%ldC" + " critical:%ldC passive:%ldC\n", data->temp_grade, + data->temp_max / 1000, data->temp_critical / 1000, + data->temp_passive / 1000); + /* Enable measurements at ~ 10 Hz */ regmap_write(map, TEMPSENSE1 + REG_CLR, TEMPSENSE1_MEASURE_FREQ); measure_freq = DIV_ROUND_UP(32768, 10); /* 10 Hz */