thermal: ti-soc-thermal: Disable the CPU PM notifier for OMAP4430

Message ID 20201029100335.27665-1-peter.ujfalusi@ti.com
State New
Delegated to: Daniel Lezcano

Commit Message

Peter Ujfalusi Oct. 29, 2020, 10:03 a.m. UTC
It has been observed that on OMAP4430 (ES2.0, ES2.1 and ES2.3) the enabled
notifier causes errors on the DTEMP readout values:

ti-soc-thermal 4a002260.bandgap: in range ADC val: 52
ti-soc-thermal 4a002260.bandgap: in range ADC val: 64
ti-soc-thermal 4a002260.bandgap: in range ADC val: 64
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 0
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 0
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: out of range ADC val: 4
thermal thermal_zone0: failed to read out thermal zone (-5)
ti-soc-thermal 4a002260.bandgap: in range ADC val: 100

raw 100 translates to 133 Celsius on omap4-sdp, triggering shutdown due to
critical temperature.

When the notifier is disabled for OMAP4430, the DTEMP values are stable:
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56
ti-soc-thermal 4a002260.bandgap: in range ADC val: 57
ti-soc-thermal 4a002260.bandgap: in range ADC val: 57
ti-soc-thermal 4a002260.bandgap: in range ADC val: 56

Fixes: 5093402e5b44 ("thermal: ti-soc-thermal: Enable addition power management")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
Hi,

my omap4-sdp (Blaze) was shutting down randomly due to critical temperature with
5.10-rc1 and I have bisected it back to 5093402e5b44.

Disabling the notifier fixes the random shutdowns on OMAP4430 (ES2.0 and ES2.1)
and it does not cause any issues on OMAP4460 (PandaES) or OMAP3630 (BeagleXM).
Tony's duovero with OMAP4430 ES2.3 did not ninja-shutdown, but he also has a
constant and steady stream of:
thermal thermal_zone0: failed to read out thermal zone (-5)

pointing to a similar issue.

Regards,
Peter

 drivers/thermal/ti-soc-thermal/ti-bandgap.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

Comments

Tony Lindgren Oct. 29, 2020, 10:51 a.m. UTC | #1
* Peter Ujfalusi <peter.ujfalusi@ti.com> [201029 10:03]:
> Disabling the notifier fixes the random shutdowns on OMAP4430 (ES2.0 and ES2.1)
> but it does not cause any issues on OMAP4460 (PandaES) or OMAP3630 (BeagleXM).
> Tony's duovero with OMAP4430 ES2.3 did not ninja-shutdown, but he also have
> constant and steady stream of:
> thermal thermal_zone0: failed to read out thermal zone (-5)

Works for me and I've verified duovero still keeps hitting core ret idle:

Tested-by: Tony Lindgren <tony@atomide.com>

Regards,

Tony
Peter Ujfalusi Nov. 3, 2020, 6:42 a.m. UTC | #2
Eduardo, Keerthy,

On 29/10/2020 12.51, Tony Lindgren wrote:
> * Peter Ujfalusi <peter.ujfalusi@ti.com> [201029 10:03]:
>> Disabling the notifier fixes the random shutdowns on OMAP4430 (ES2.0 and ES2.1)
>> but it does not cause any issues on OMAP4460 (PandaES) or OMAP3630 (BeagleXM).
>> Tony's duovero with OMAP4430 ES2.3 did not ninja-shutdown, but he also have
>> constant and steady stream of:
>> thermal thermal_zone0: failed to read out thermal zone (-5)
> 
> Works for me and I've verified duovero still keeps hitting core ret idle:

Can you pick this one up for 5.10 to make omap4430-sdp usable (so that it
does not shut down randomly)?
The regression was introduced in 5.10-rc1.

> Tested-by: Tony Lindgren <tony@atomide.com>
> 
> Regards,
> 
> Tony
> 

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
J, KEERTHY Nov. 3, 2020, 6:50 a.m. UTC | #3
On 11/3/2020 12:12 PM, Peter Ujfalusi wrote:
> Eduardo, Keerthy,
> 
> On 29/10/2020 12.51, Tony Lindgren wrote:
>> * Peter Ujfalusi <peter.ujfalusi@ti.com> [201029 10:03]:
>>> Disabling the notifier fixes the random shutdowns on OMAP4430 (ES2.0 and ES2.1)
>>> but it does not cause any issues on OMAP4460 (PandaES) or OMAP3630 (BeagleXM).
>>> Tony's duovero with OMAP4430 ES2.3 did not ninja-shutdown, but he also have
>>> constant and steady stream of:
>>> thermal thermal_zone0: failed to read out thermal zone (-5)
>>
>> Works for me and I've verified duovero still keeps hitting core ret idle:
> 
> Can you pick this one up for 5.10 to make omap4430-sdp to be usable (to
> not shut down randomly).
> The regression was introduced in 5.10-rc1.

Peter,

Thanks for the fix.

Acked-by: Keerthy <j-keerthy@ti.com>

Best Regards,
Keerthy

> 
>> Tested-by: Tony Lindgren <tony@atomide.com>
>>
>> Regards,
>>
>> Tony
>>
> 
> - Péter
> 
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
>
Daniel Lezcano Nov. 12, 2020, 11:31 a.m. UTC | #4
On 03/11/2020 07:42, Peter Ujfalusi wrote:
> Eduardo, Keerthy,
> 
> On 29/10/2020 12.51, Tony Lindgren wrote:
>> * Peter Ujfalusi <peter.ujfalusi@ti.com> [201029 10:03]:
>>> Disabling the notifier fixes the random shutdowns on OMAP4430 (ES2.0 and ES2.1)
>>> but it does not cause any issues on OMAP4460 (PandaES) or OMAP3630 (BeagleXM).
>>> Tony's duovero with OMAP4430 ES2.3 did not ninja-shutdown, but he also have
>>> constant and steady stream of:
>>> thermal thermal_zone0: failed to read out thermal zone (-5)
>>
>> Works for me and I've verified duovero still keeps hitting core ret idle:
> 
> Can you pick this one up for 5.10 to make omap4430-sdp to be usable (to
> not shut down randomly).
> The regression was introduced in 5.10-rc1.
> 
>> Tested-by: Tony Lindgren <tony@atomide.com>

Applied as a fix for v5.10-rc

Patch

diff --git a/drivers/thermal/ti-soc-thermal/ti-bandgap.c b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
index 5e596168ba73..dcac99f327b0 100644
--- a/drivers/thermal/ti-soc-thermal/ti-bandgap.c
+++ b/drivers/thermal/ti-soc-thermal/ti-bandgap.c
@@ -20,6 +20,7 @@ 
 #include <linux/err.h>
 #include <linux/types.h>
 #include <linux/spinlock.h>
+#include <linux/sys_soc.h>
 #include <linux/reboot.h>
 #include <linux/of_device.h>
 #include <linux/of_platform.h>
@@ -864,6 +865,17 @@  static struct ti_bandgap *ti_bandgap_build(struct platform_device *pdev)
 	return bgp;
 }
 
+/*
+ * List of SoCs on which the CPU PM notifier can cause errors on the DTEMP
+ * readout.
+ * An enabled notifier on these machines results in erroneous, random values
+ * which could trigger an unexpected thermal shutdown.
+ */
+static const struct soc_device_attribute soc_no_cpu_notifier[] = {
+	{ .machine = "OMAP4430" },
+	{ /* sentinel */ },
+};
+
 /***   Device driver call backs   ***/
 
 static
@@ -1020,7 +1032,8 @@  int ti_bandgap_probe(struct platform_device *pdev)
 
 #ifdef CONFIG_PM_SLEEP
 	bgp->nb.notifier_call = bandgap_omap_cpu_notifier;
-	cpu_pm_register_notifier(&bgp->nb);
+	if (!soc_device_match(soc_no_cpu_notifier))
+		cpu_pm_register_notifier(&bgp->nb);
 #endif
 
 	return 0;
@@ -1056,7 +1069,8 @@  int ti_bandgap_remove(struct platform_device *pdev)
 	struct ti_bandgap *bgp = platform_get_drvdata(pdev);
 	int i;
 
-	cpu_pm_unregister_notifier(&bgp->nb);
+	if (!soc_device_match(soc_no_cpu_notifier))
+		cpu_pm_unregister_notifier(&bgp->nb);
 
 	/* Remove sensor interfaces */
 	for (i = 0; i < bgp->conf->sensor_count; i++) {
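
For readers unfamiliar with the pattern used in the hunks above, the following is a minimal user-space sketch of a sentinel-terminated SoC deny list guarding both registration and unregistration. It is not the driver code: `struct soc_attr`, `soc_match()`, `struct fake_bandgap`, `fake_probe()` and `fake_remove()` are hypothetical stand-ins, the machine strings passed in are illustrative, and the matcher does an exact string compare only, whereas the kernel's `soc_device_match()` matches against the registered SoC device and accepts glob patterns.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct soc_device_attribute. */
struct soc_attr {
	const char *machine;
};

/* Sentinel-terminated deny list, mirroring soc_no_cpu_notifier in the patch. */
static const struct soc_attr no_cpu_notifier[] = {
	{ .machine = "OMAP4430" },
	{ NULL } /* sentinel */
};

/*
 * Simplified matcher (assumption): exact compare on the machine name.
 * Returns the matching entry, or NULL when the table has no match.
 */
static const struct soc_attr *soc_match(const struct soc_attr *table,
					const char *machine)
{
	for (; table->machine; table++)
		if (strcmp(table->machine, machine) == 0)
			return table;
	return NULL;
}

/* Hypothetical driver state; the flag models the registered notifier. */
struct fake_bandgap {
	bool notifier_registered;
};

/* Register the notifier only when the SoC is not on the deny list. */
static void fake_probe(struct fake_bandgap *bgp, const char *machine)
{
	bgp->notifier_registered = false;
	if (!soc_match(no_cpu_notifier, machine))
		bgp->notifier_registered = true;
}

/* Use the same condition on removal so the two paths stay symmetric. */
static void fake_remove(struct fake_bandgap *bgp, const char *machine)
{
	if (!soc_match(no_cpu_notifier, machine))
		bgp->notifier_registered = false;
}
```

Calling `fake_probe()` with "OMAP4430" leaves the notifier unregistered, while any machine name not in the table registers it. Using the same check in both paths avoids unregistering a notifier that was never registered.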