diff mbox series

thermal: ti-soc-thermal: Fix bogus thermal shutdowns for omap4430

Message ID 20200706183338.25622-1-tony@atomide.com (mailing list archive)
State New, archived
Headers show
Series thermal: ti-soc-thermal: Fix bogus thermal shutdowns for omap4430 | expand

Commit Message

Tony Lindgren July 6, 2020, 6:33 p.m. UTC
We can sometimes get bogus thermal shutdowns on omap4430 at least with
droid4 running idle with a battery charger connected:

thermal thermal_zone0: critical temperature reached (143 C), shutting down

Dumping out the register values shows we can occasionally get a 0x7f value
that is outside the TRM listed values in the ADC conversion table. And then
we get a normal value when reading again after that. Reading the register
multiple times does not seem help avoiding the bogus values as they stay
until the next sample is ready.

Looking at the TRM chapter "18.4.10.2.3 ADC Codes Versus Temperature", we
should have values from 13 to 107 listed with a total of 95 values. But
looking at the omap4430_adc_to_temp array, the values are off, and the
end values are missing. And it seems that the 4430 ADC table is similar
to omap3630 rather than omap4460.

Let's fix the issue by using values based on the omap3630 table and just
ignoring invalid values. Compared to the 4430 TRM, the omap3630 table has
the missing values added while the TRM table only shows every second
value.

Note that sometimes the ADC register values within the valid table can
also be way off for about 1 out of 10 values. But it seems that those
just show about 25 C too low values rather than too high values. So those
do not cause a bogus thermal shutdown.

Fixes: 1a31270e54d7 ("staging: omap-thermal: add OMAP4 data structures")
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
 .../ti-soc-thermal/omap4-thermal-data.c       | 23 ++++++++++---------
 .../thermal/ti-soc-thermal/omap4xxx-bandgap.h | 10 +++++---
 2 files changed, 19 insertions(+), 14 deletions(-)

Comments

Pavel Machek Aug. 23, 2020, 9:12 p.m. UTC | #1
Hi!

> We can sometimes get bogus thermal shutdowns on omap4430 at least with
> droid4 running idle with a battery charger connected:
> 
> thermal thermal_zone0: critical temperature reached (143 C), shutting down
> 
> Dumping out the register values shows we can occasionally get a 0x7f value
> that is outside the TRM listed values in the ADC conversion table. And then
> we get a normal value when reading again after that. Reading the register
> multiple times does not seem help avoiding the bogus values as they stay
> until the next sample is ready.
> 
> Looking at the TRM chapter "18.4.10.2.3 ADC Codes Versus Temperature", we
> should have values from 13 to 107 listed with a total of 95 values. But
> looking at the omap4430_adc_to_temp array, the values are off, and the
> end values are missing. And it seems that the 4430 ADC table is similar
> to omap3630 rather than omap4460.
> 
> Let's fix the issue by using values based on the omap3630 table and just
> ignoring invalid values. Compared to the 4430 TRM, the omap3630 table has
> the missing values added while the TRM table only shows every second
> value.
> 
> Note that sometimes the ADC register values within the valid table can
> also be way off for about 1 out of 10 values. But it seems that those
> just show about 25 C too low values rather than too high values. So those
> do not cause a bogus thermal shutdown.

This does not seem to be in recent -next. Ping?

Best regards,
								Pavel
Daniel Lezcano Aug. 24, 2020, 10:45 a.m. UTC | #2
On 23/08/2020 23:12, Pavel Machek wrote:
> Hi!
> 
>> We can sometimes get bogus thermal shutdowns on omap4430 at least with
>> droid4 running idle with a battery charger connected:
>>
>> thermal thermal_zone0: critical temperature reached (143 C), shutting down
>>
>> Dumping out the register values shows we can occasionally get a 0x7f value
>> that is outside the TRM listed values in the ADC conversion table. And then
>> we get a normal value when reading again after that. Reading the register
>> multiple times does not seem help avoiding the bogus values as they stay
>> until the next sample is ready.
>>
>> Looking at the TRM chapter "18.4.10.2.3 ADC Codes Versus Temperature", we
>> should have values from 13 to 107 listed with a total of 95 values. But
>> looking at the omap4430_adc_to_temp array, the values are off, and the
>> end values are missing. And it seems that the 4430 ADC table is similar
>> to omap3630 rather than omap4460.
>>
>> Let's fix the issue by using values based on the omap3630 table and just
>> ignoring invalid values. Compared to the 4430 TRM, the omap3630 table has
>> the missing values added while the TRM table only shows every second
>> value.
>>
>> Note that sometimes the ADC register values within the valid table can
>> also be way off for about 1 out of 10 values. But it seems that those
>> just show about 25 C too low values rather than too high values. So those
>> do not cause a bogus thermal shutdown.
> 
> This does not seem to be in recent -next. Ping?

Pong.

Going back from vacation. Will be in next very soon.
diff mbox series

Patch

diff --git a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
--- a/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
+++ b/drivers/thermal/ti-soc-thermal/omap4-thermal-data.c
@@ -37,20 +37,21 @@  static struct temp_sensor_data omap4430_mpu_temp_sensor_data = {
 
 /*
  * Temperature values in milli degree celsius
- * ADC code values from 530 to 923
+ * ADC code values from 13 to 107, see TRM
+ * "18.4.10.2.3 ADC Codes Versus Temperature".
  */
 static const int
 omap4430_adc_to_temp[OMAP4430_ADC_END_VALUE - OMAP4430_ADC_START_VALUE + 1] = {
-	-38000, -35000, -34000, -32000, -30000, -28000, -26000, -24000, -22000,
-	-20000, -18000, -17000, -15000, -13000, -12000, -10000, -8000, -6000,
-	-5000, -3000, -1000, 0, 2000, 3000, 5000, 6000, 8000, 10000, 12000,
-	13000, 15000, 17000, 19000, 21000, 23000, 25000, 27000, 28000, 30000,
-	32000, 33000, 35000, 37000, 38000, 40000, 42000, 43000, 45000, 47000,
-	48000, 50000, 52000, 53000, 55000, 57000, 58000, 60000, 62000, 64000,
-	66000, 68000, 70000, 71000, 73000, 75000, 77000, 78000, 80000, 82000,
-	83000, 85000, 87000, 88000, 90000, 92000, 93000, 95000, 97000, 98000,
-	100000, 102000, 103000, 105000, 107000, 109000, 111000, 113000, 115000,
-	117000, 118000, 120000, 122000, 123000,
+	-40000, -38000, -35000, -34000, -32000, -30000, -28000, -26000, -24000,
+	-22000,	-20000, -18500, -17000, -15000, -13500, -12000, -10000, -8000,
+	-6500, -5000, -3500, -1500, 0, 2000, 3500, 5000, 6500, 8500, 10000,
+	12000, 13500, 15000, 17000, 19000, 21000, 23000, 25000, 27000, 28500,
+	30000, 32000, 33500, 35000, 37000, 38500, 40000, 42000, 43500, 45000,
+	47000, 48500, 50000, 52000, 53500, 55000, 57000, 58500, 60000, 62000,
+	64000, 66000, 68000, 70000, 71500, 73500, 75000, 77000, 78500, 80000,
+	82000, 83500, 85000, 87000, 88500, 90000, 92000, 93500, 95000, 97000,
+	98500, 100000, 102000, 103500, 105000, 107000, 109000, 111000, 113000,
+	115000, 117000, 118500, 120000, 122000, 123500, 125000,
 };
 
 /* OMAP4430 data */
diff --git a/drivers/thermal/ti-soc-thermal/omap4xxx-bandgap.h b/drivers/thermal/ti-soc-thermal/omap4xxx-bandgap.h
--- a/drivers/thermal/ti-soc-thermal/omap4xxx-bandgap.h
+++ b/drivers/thermal/ti-soc-thermal/omap4xxx-bandgap.h
@@ -53,9 +53,13 @@ 
  * and thresholds for OMAP4430.
  */
 
-/* ADC conversion table limits */
-#define OMAP4430_ADC_START_VALUE			0
-#define OMAP4430_ADC_END_VALUE				127
+/*
+ * ADC conversion table limits. Ignore values outside the TRM listed
+ * range to avoid bogus thermal shutdowns. See omap4430 TRM chapter
+ * "18.4.10.2.3 ADC Codes Versus Temperature".
+ */
+#define OMAP4430_ADC_START_VALUE			13
+#define OMAP4430_ADC_END_VALUE				107
 /* bandgap clock limits (no control on 4430) */
 #define OMAP4430_MAX_FREQ				32768
 #define OMAP4430_MIN_FREQ				32768