From patchwork Tue May 18 11:25:31 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Vaittinen, Matti"
X-Patchwork-Id: 12264613
Date: Tue, 18 May 2021 14:25:31 +0300
From: Matti Vaittinen
To: Matti Vaittinen , Matti Vaittinen
Cc: Mark Brown , Kees Cook , Andy Shevchenko , Zhang Rui ,
 Guenter Roeck , "agross@kernel.org" , "devicetree@vger.kernel.org" ,
 linux-power , "linux-kernel@vger.kernel.org" ,
 "linux-renesas-soc@vger.kernel.org" , "linux-arm-msm@vger.kernel.org" ,
 "bjorn.andersson@linaro.org" , "lgirdwood@gmail.com" , "robh+dt@kernel.org" ,
 Daniel Lezcano , Amit Kucheria , Matteo Croce , Andrew Morton ,
 Petr Mladek , "Rafael J. Wysocki" , Mike Rapoport , Josef Bacik ,
 Kai-Heng Feng , linux-pm@vger.kernel.org
Subject: [PATCH v10 02/11] reboot: Add hardware protection power-off
Message-ID: <1278eb55ebf2de82b08761c69ab1c65a05c20b37.1621333893.git.matti.vaittinen@fi.rohmeurope.com>
Content-Disposition: inline
X-Mailing-List: linux-pm@vger.kernel.org

There are a few cases where the system must be shut down in order to
protect the hardware. Currently this is done at least by the thermal core
when the temperature rises above a certain limit. Some PMICs can also
generate interrupts, for example for over-current, over-voltage, voltage
drops or short circuits. On some systems these are a sign of hardware
failure, and the only thing to do is to try to protect the rest of the
hardware by shutting down the system.

Add shutdown logic that all subsystems can use instead of each subsystem
implementing its own. The logic is taken from thermal_core with two
differences: an atomic_t is used instead of a mutex so that the function
can be called directly from IRQ context, and WARN() is replaced by
pr_emerg(), as discussed here:
https://lore.kernel.org/lkml/YJuPwAZroVZ%2Fw633@alley/
and here:
https://lore.kernel.org/linux-iommu/20210331093104.383705-4-geert+renesas@glider.be/

Signed-off-by: Matti Vaittinen
---
Changelog:
 v10: (changes suggested by Petr Mladek and Geert)
  - replace WARN() with pr_emerg()
 v8: (changes suggested by Daniel Lezcano)
  - replace a protection implemented by a flag + spin_lock_irqsave()
    with a simple atomic_dec_and_test().
  - split the thermal-core changes and the new API into separate patches
 v7:
  - new patch
---
 include/linux/reboot.h |  1 +
 kernel/reboot.c        | 79 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index 3734cd8f38a8..af907a3d68d1 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -79,6 +79,7 @@ extern char poweroff_cmd[POWEROFF_CMD_PATH_LEN];
 
 extern void orderly_poweroff(bool force);
 extern void orderly_reboot(void);
+void hw_protection_shutdown(const char *reason, int ms_until_forced);
 
 /*
  * Emergency restart, callable from an interrupt handler.
diff --git a/kernel/reboot.c b/kernel/reboot.c
index a6ad5eb2fa73..f7440c0c7e43 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -7,6 +7,7 @@
 
 #define pr_fmt(fmt)	"reboot: " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -518,6 +519,84 @@ void orderly_reboot(void)
 }
 EXPORT_SYMBOL_GPL(orderly_reboot);
 
+/**
+ * hw_failure_emergency_poweroff_func - emergency poweroff work after a known delay
+ * @work: work_struct associated with the emergency poweroff function
+ *
+ * This function is called in very critical situations to force
+ * a kernel poweroff after a configurable timeout value.
+ */
+static void hw_failure_emergency_poweroff_func(struct work_struct *work)
+{
+	/*
+	 * We have reached here after the emergency shutdown waiting period has
+	 * expired. This means orderly_poweroff has not been able to shut off
+	 * the system for some reason.
+	 *
+	 * Try to shut down the system immediately using kernel_power_off
+	 * if populated
+	 */
+	pr_emerg("Hardware protection timed-out. Trying forced poweroff\n");
+	kernel_power_off();
+
+	/*
+	 * Worst of the worst case: trigger an emergency restart
+	 */
+	pr_emerg("Hardware protection shutdown failed. Trying emergency restart\n");
+	emergency_restart();
+}
+
+static DECLARE_DELAYED_WORK(hw_failure_emergency_poweroff_work,
+			    hw_failure_emergency_poweroff_func);
+
+/**
+ * hw_failure_emergency_poweroff - Trigger an emergency system poweroff
+ *
+ * This may be called from any critical situation to trigger a system shutdown
+ * after @poweroff_delay_ms milliseconds. A delay <= 0 schedules nothing.
+ */
+static void hw_failure_emergency_poweroff(int poweroff_delay_ms)
+{
+	if (poweroff_delay_ms <= 0)
+		return;
+	schedule_delayed_work(&hw_failure_emergency_poweroff_work,
+			      msecs_to_jiffies(poweroff_delay_ms));
+}
+
+/**
+ * hw_protection_shutdown - Trigger an emergency system poweroff
+ *
+ * @reason:		Reason for the emergency shutdown, to be printed.
+ * @ms_until_forced:	Time (ms) to wait for orderly shutdown before triggering
+ *			a forced shutdown. A zero or negative value disables
+ *			the forced shutdown.
+ *
+ * Initiate an emergency system shutdown in order to protect hardware from
+ * further damage. Usage examples include thermal protection and voltage or
+ * current regulator failures.
+ * NOTE: A request is ignored if a protection shutdown is already pending,
+ * even if the previous request gave a large timeout for the forced shutdown.
+ * Can be called from any context.
+ */
+void hw_protection_shutdown(const char *reason, int ms_until_forced)
+{
+	static atomic_t allow_proceed = ATOMIC_INIT(1);
+
+	pr_emerg("HARDWARE PROTECTION shutdown (%s)\n", reason);
+
+	/* Shutdown should be initiated only once. */
+	if (!atomic_dec_and_test(&allow_proceed))
+		return;
+
+	/*
+	 * Queue a backup emergency shutdown in the event of
+	 * orderly_poweroff failure
+	 */
+	hw_failure_emergency_poweroff(ms_until_forced);
+	orderly_poweroff(true);
+}
+EXPORT_SYMBOL_GPL(hw_protection_shutdown);
+
 static int __init reboot_setup(char *str)
 {
 	for (;;) {

From patchwork Tue May 18 11:26:04 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Vaittinen, Matti"
X-Patchwork-Id: 12264615
Date: Tue, 18 May 2021 14:26:04 +0300
From: Matti Vaittinen
To: Matti Vaittinen , Matti Vaittinen
Cc: Mark Brown , Kees Cook , Andy Shevchenko , Zhang Rui ,
 Guenter Roeck , "agross@kernel.org" , "devicetree@vger.kernel.org" ,
 linux-power , "linux-kernel@vger.kernel.org" ,
 "linux-renesas-soc@vger.kernel.org" , "linux-arm-msm@vger.kernel.org" ,
 "bjorn.andersson@linaro.org" , "lgirdwood@gmail.com" , "robh+dt@kernel.org" ,
 Daniel Lezcano , Amit Kucheria , Matteo Croce , Andrew Morton ,
 Petr Mladek , "Rafael J. Wysocki" , Mike Rapoport , Josef Bacik ,
 Kai-Heng Feng , linux-pm@vger.kernel.org
Subject: [PATCH v10 03/11] thermal: Use generic HW-protection shutdown API
Message-ID: <2cf331403a881c523c3dda463f66c9ab7a080755.1621333893.git.matti.vaittinen@fi.rohmeurope.com>
Content-Disposition: inline
X-Mailing-List: linux-pm@vger.kernel.org

The hardware protection shutdown function is now exported from
kernel/reboot.c for other subsystems to use. Its logic was copied from
thermal_core with two changes: the protecting mutex was replaced by an
atomic_t so that the function can also be called from IRQ context, and
WARN() was replaced by pr_emerg() based on the discussions here:
https://lore.kernel.org/lkml/YJuPwAZroVZ%2Fw633@alley/
and here:
https://lore.kernel.org/linux-iommu/20210331093104.383705-4-geert+renesas@glider.be/

Use the exported API instead of a thermal_core-specific implementation.

Signed-off-by: Matti Vaittinen
---
Changelog:
 v10:
  - update commit message to mention changing WARN() to pr_emerg()
 v9:
  - update the thermal documentation
 v8:
  - new patch (change added in v7, split into its own patch in v8)
---
 .../driver-api/thermal/sysfs-api.rst | 24 +++----
 drivers/thermal/thermal_core.c       | 63 ++-----------------
 2 files changed, 13 insertions(+), 74 deletions(-)

diff --git a/Documentation/driver-api/thermal/sysfs-api.rst b/Documentation/driver-api/thermal/sysfs-api.rst
index 4b638c14bc16..c93fa5e961a0 100644
--- a/Documentation/driver-api/thermal/sysfs-api.rst
+++ b/Documentation/driver-api/thermal/sysfs-api.rst
@@ -740,21 +740,15 @@ possible.
 5. thermal_emergency_poweroff
 =============================
 
-On an event of critical trip temperature crossing. Thermal framework
-allows the system to shutdown gracefully by calling orderly_poweroff().
-In the event of a failure of orderly_poweroff() to shut down the system
-we are in danger of keeping the system alive at undesirably high
-temperatures. To mitigate this high risk scenario we program a work
-queue to fire after a pre-determined number of seconds to start
-an emergency shutdown of the device using the kernel_power_off()
-function. In case kernel_power_off() fails then finally
-emergency_restart() is called in the worst case.
+When a critical trip temperature is crossed, the thermal framework
+shuts down the system by calling hw_protection_shutdown(). The
+hw_protection_shutdown() function first attempts an orderly shutdown,
+but accepts a delay after which it falls back to a forced power-off
+or, as a last resort, emergency_restart().
 
 The delay should be carefully profiled so as to give adequate time for
-orderly_poweroff(). In case of failure of an orderly_poweroff() the
-emergency poweroff kicks in after the delay has elapsed and shuts down
-the system.
+orderly poweroff.
 
-If set to 0 emergency poweroff will not be supported. So a carefully
-profiled non-zero positive value is a must for emergency poweroff to be
-triggered.
+If the delay is set to 0, emergency poweroff will not be supported. A
+carefully profiled positive value is therefore required for emergency
+poweroff to be triggered.
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index d20b25f40d19..10a2d8e1cacf 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -36,10 +36,8 @@ static LIST_HEAD(thermal_governor_list);
 
 static DEFINE_MUTEX(thermal_list_lock);
 static DEFINE_MUTEX(thermal_governor_lock);
-static DEFINE_MUTEX(poweroff_lock);
 
 static atomic_t in_suspend;
-static bool power_off_triggered;
 
 static struct thermal_governor *def_governor;
 
@@ -327,70 +325,18 @@ static void handle_non_critical_trips(struct thermal_zone_device *tz, int trip)
 		       def_governor->throttle(tz, trip);
 }
 
-/**
- * thermal_emergency_poweroff_func - emergency poweroff work after a known delay
- * @work: work_struct associated with the emergency poweroff function
- *
- * This function is called in very critical situations to force
- * a kernel poweroff after a configurable timeout value.
- */
-static void thermal_emergency_poweroff_func(struct work_struct *work)
-{
-	/*
-	 * We have reached here after the emergency thermal shutdown
-	 * Waiting period has expired. This means orderly_poweroff has
-	 * not been able to shut off the system for some reason.
-	 * Try to shut down the system immediately using kernel_power_off
-	 * if populated
-	 */
-	WARN(1, "Attempting kernel_power_off: Temperature too high\n");
-	kernel_power_off();
-
-	/*
-	 * Worst of the worst case trigger emergency restart
-	 */
-	WARN(1, "Attempting emergency_restart: Temperature too high\n");
-	emergency_restart();
-}
-
-static DECLARE_DELAYED_WORK(thermal_emergency_poweroff_work,
-			    thermal_emergency_poweroff_func);
-
-/**
- * thermal_emergency_poweroff - Trigger an emergency system poweroff
- *
- * This may be called from any critical situation to trigger a system shutdown
- * after a known period of time. By default this is not scheduled.
- */
-static void thermal_emergency_poweroff(void)
+void thermal_zone_device_critical(struct thermal_zone_device *tz)
 {
-	int poweroff_delay_ms = CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS;
 	/*
 	 * poweroff_delay_ms must be a carefully profiled positive value.
-	 * Its a must for thermal_emergency_poweroff_work to be scheduled
+	 * It is required for hw_failure_emergency_poweroff_work to be scheduled.
 	 */
-	if (poweroff_delay_ms <= 0)
-		return;
-	schedule_delayed_work(&thermal_emergency_poweroff_work,
-			      msecs_to_jiffies(poweroff_delay_ms));
-}
+	int poweroff_delay_ms = CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS;
 
-void thermal_zone_device_critical(struct thermal_zone_device *tz)
-{
 	dev_emerg(&tz->device, "%s: critical temperature reached, "
 		  "shutting down\n", tz->type);
 
-	mutex_lock(&poweroff_lock);
-	if (!power_off_triggered) {
-		/*
-		 * Queue a backup emergency shutdown in the event of
-		 * orderly_poweroff failure
-		 */
-		thermal_emergency_poweroff();
-		orderly_poweroff(true);
-		power_off_triggered = true;
-	}
-	mutex_unlock(&poweroff_lock);
+	hw_protection_shutdown("Temperature too high", poweroff_delay_ms);
 }
 EXPORT_SYMBOL(thermal_zone_device_critical);
 
@@ -1538,7 +1484,6 @@ static int __init thermal_init(void)
 	ida_destroy(&thermal_cdev_ida);
 	mutex_destroy(&thermal_list_lock);
 	mutex_destroy(&thermal_governor_lock);
-	mutex_destroy(&poweroff_lock);
 	return result;
 }
 postcore_initcall(thermal_init);
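The once-only guard that thermal_zone_device_critical() now inherits from hw_protection_shutdown() can be exercised outside the kernel. Below is a minimal userspace sketch, assuming C11 <stdatomic.h> as a stand-in for the kernel's atomic_t and atomic_dec_and_test(); sim_hw_protection_shutdown(), arm_backup_poweroff(), orderly_poweroff_calls and backup_delay_ms are hypothetical names made up for this illustration, not kernel APIs. Nothing is actually powered off: the side effects are only recorded so the behaviour can be checked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counters recording what a real shutdown would have done. */
static int orderly_poweroff_calls;
static int backup_delay_ms = -1;	/* -1: no forced power-off armed */

/* Models hw_failure_emergency_poweroff(): a delay <= 0 arms nothing. */
static void arm_backup_poweroff(int delay_ms)
{
	if (delay_ms <= 0)
		return;
	backup_delay_ms = delay_ms;	/* kernel: schedule_delayed_work() */
}

/*
 * Models hw_protection_shutdown(). The kernel's atomic_dec_and_test()
 * returns true when the decremented value reaches 0; atomic_fetch_sub()
 * returns the value before the decrement, so "old == 1" is the same test.
 * Returns true only for the caller that actually starts the shutdown.
 */
static bool sim_hw_protection_shutdown(const char *reason, int ms_until_forced)
{
	static atomic_int allow_proceed = 1;

	assert(reason != NULL);
	printf("HARDWARE PROTECTION shutdown (%s)\n", reason);

	/* Only the first request proceeds; later ones are logged and dropped. */
	if (atomic_fetch_sub(&allow_proceed, 1) != 1)
		return false;

	arm_backup_poweroff(ms_until_forced);
	orderly_poweroff_calls++;	/* kernel: orderly_poweroff(true) */
	return true;
}
```

Calling sim_hw_protection_shutdown("Temperature too high", 10000) and then sim_hw_protection_shutdown("Over-current", 0) leaves orderly_poweroff_calls at 1 and backup_delay_ms at 10000: the second request is printed but otherwise ignored, matching the NOTE in the hw_protection_shutdown() kernel-doc. Using an atomic counter rather than a mutex is what allows the real function to be called from IRQ context, where sleeping locks cannot be taken.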