Message ID | 20250306093900.2199442-8-o.rempel@pengutronix.de (mailing list archive) |
---|---|
State | Handled Elsewhere, archived |
Headers | show |
Series | Introduction of PSCR Framework and Related Components | expand |
Hi Oleksij, On 06/03/2025 10:38, Oleksij Rempel wrote: > Enhance the thermal core to record the Power State Change Reason (PSCR) > prior to invoking hw_protection_shutdown(). This change integrates the > PSCR framework with the thermal subsystem, ensuring that reasons for > power state changes, such as overtemperature events, are stored in a > dedicated non-volatile memory (NVMEM) cell. > > This 'black box' recording is crucial for post-mortem analysis, enabling > a deeper understanding of system failures and abrupt shutdowns, > especially in scenarios where PMICs or watchdog timers are incapable of > logging such events. The recorded data can be utilized during system > recovery routines in the bootloader or early kernel stages of subsequent > boots, significantly enhancing system diagnostics, reliability, and > debugging capabilities. > > Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> > --- > drivers/thermal/thermal_core.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c > index 2328ac0d8561..af4e9cf22bf6 100644 > --- a/drivers/thermal/thermal_core.c > +++ b/drivers/thermal/thermal_core.c > @@ -16,6 +16,7 @@ > #include <linux/kdev_t.h> > #include <linux/idr.h> > #include <linux/thermal.h> > +#include <linux/pscrr.h> > #include <linux/reboot.h> > #include <linux/string.h> > #include <linux/of.h> > @@ -380,6 +381,8 @@ static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdo > > dev_emerg(&tz->device, "%s: critical temperature reached\n", tz->type); > > + set_power_state_change_reason(PSCR_OVERTEMPERATURE); > + > if (shutdown) > hw_protection_shutdown(msg, poweroff_delay_ms); > else In the future could you add me as recipient to the series instead of this one ? so I can get more context. Given there are no so much hw_protection_shutdown() users in the kernel, it could be more interesting to change the function to receive a enum pscr_reason and then in the hw_protection_shutdown() call pscrr_reason_to_str().
Hi Daniel, On Wed, Mar 12, 2025 at 04:35:51PM +0100, Daniel Lezcano wrote: > > @@ -380,6 +381,8 @@ static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdo > > dev_emerg(&tz->device, "%s: critical temperature reached\n", tz->type); > > + set_power_state_change_reason(PSCR_OVERTEMPERATURE); > > + > > if (shutdown) > > hw_protection_shutdown(msg, poweroff_delay_ms); > > else > > In the future could you add me as recipient to the series instead of this > one ? so I can get more context. ack. > Given there are no so much hw_protection_shutdown() users in the kernel, it > could be more interesting to change the function to receive a enum > pscr_reason and then in the hw_protection_shutdown() call > pscrr_reason_to_str(). Do you mean, make it work with CONFIG_PSCRR=n? Beside, the latest version is v5: https://lore.kernel.org/all/20250310103732.423542-1-o.rempel@pengutronix.de/
On 12/03/2025 17:51, Oleksij Rempel wrote: > Hi Daniel, > > On Wed, Mar 12, 2025 at 04:35:51PM +0100, Daniel Lezcano wrote: >>> @@ -380,6 +381,8 @@ static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdo >>> dev_emerg(&tz->device, "%s: critical temperature reached\n", tz->type); >>> + set_power_state_change_reason(PSCR_OVERTEMPERATURE); >>> + >>> if (shutdown) >>> hw_protection_shutdown(msg, poweroff_delay_ms); >>> else >> >> In the future could you add me as recipient to the series instead of this >> one ? so I can get more context. > > ack. > >> Given there are no so much hw_protection_shutdown() users in the kernel, it >> could be more interesting to change the function to receive a enum >> pscr_reason and then in the hw_protection_shutdown() call >> pscrr_reason_to_str(). > > Do you mean, make it work with CONFIG_PSCRR=n? No I meant instead of doing: + set_power_state_change_reason(PSCR_OVERTEMPERATURE); + if (shutdown) hw_protection_shutdown(msg, poweroff_delay_ms); Replace it by: if (shutdown) hw_protection_shutdown(PSCR_OVERTEMPERATURE, poweroff_delay_ms); and in hw_protection_shutdown() use pscrr_reason_to_str() to display a msg. That can work with CONFIG_PSCRR=n > Beside, the latest version is v5: > https://lore.kernel.org/all/20250310103732.423542-1-o.rempel@pengutronix.de/ Ah ok thanks for the pointer
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 2328ac0d8561..af4e9cf22bf6 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -16,6 +16,7 @@ #include <linux/kdev_t.h> #include <linux/idr.h> #include <linux/thermal.h> +#include <linux/pscrr.h> #include <linux/reboot.h> #include <linux/string.h> #include <linux/of.h> @@ -380,6 +381,8 @@ static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdo dev_emerg(&tz->device, "%s: critical temperature reached\n", tz->type); + set_power_state_change_reason(PSCR_OVERTEMPERATURE); + if (shutdown) hw_protection_shutdown(msg, poweroff_delay_ms); else
Enhance the thermal core to record the Power State Change Reason (PSCR) prior to invoking hw_protection_shutdown(). This change integrates the PSCR framework with the thermal subsystem, ensuring that reasons for power state changes, such as overtemperature events, are stored in a dedicated non-volatile memory (NVMEM) cell. This 'black box' recording is crucial for post-mortem analysis, enabling a deeper understanding of system failures and abrupt shutdowns, especially in scenarios where PMICs or watchdog timers are incapable of logging such events. The recorded data can be utilized during system recovery routines in the bootloader or early kernel stages of subsequent boots, significantly enhancing system diagnostics, reliability, and debugging capabilities. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> --- drivers/thermal/thermal_core.c | 3 +++ 1 file changed, 3 insertions(+)