[PATCHv2] watchdog: Add stop_on_reboot parameter to control reboot policy

Message ID: 20200214162209.129107-1-dima@arista.com
State: Changes Requested

Commit Message

Dmitry Safonov Feb. 14, 2020, 4:22 p.m. UTC
Many watchdog drivers use the watchdog_stop_on_reboot() helper to
stop the watchdog on system reboot. Unfortunately, this logic is
coded in the driver's probe function and doesn't allow the user to decide
what to do during shutdown/reboot.

On the other hand, the Xen and QEMU watchdog drivers (xen_wdt and i6300esb)
may be configured to either send an NMI or power off/reboot the VM as
the watchdog action. As the kernel may get stuck in any state, sending NMIs
can't reliably reboot the VM.

At Arista, we benefited from the following setup: the emulated watchdogs
trigger a VM reset, and softdog is set to catch less severe conditions and
generate a vmcore. Just before reboot, the watchdog's timeout is increased
to a sufficiently large value (3 minutes). That keeps the watchdog running
at all times and guarantees that the VM doesn't get stuck.

Provide a new stop_on_reboot module parameter to let the user control
the watchdog's reboot policy.

Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: linux-watchdog@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
Changes v1 => v2: Add module parameter instead of ioctl()

 drivers/watchdog/watchdog_core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Guenter Roeck Feb. 22, 2020, 4:06 p.m. UTC | #1
On Fri, Feb 14, 2020 at 04:22:09PM +0000, Dmitry Safonov wrote:
> Many watchdog drivers use the watchdog_stop_on_reboot() helper to
> stop the watchdog on system reboot. Unfortunately, this logic is
> coded in the driver's probe function and doesn't allow the user to decide
> what to do during shutdown/reboot.
>
> On the other hand, the Xen and QEMU watchdog drivers (xen_wdt and i6300esb)
> may be configured to either send an NMI or power off/reboot the VM as
> the watchdog action. As the kernel may get stuck in any state, sending NMIs
> can't reliably reboot the VM.
>
> At Arista, we benefited from the following setup: the emulated watchdogs
> trigger a VM reset, and softdog is set to catch less severe conditions and
> generate a vmcore. Just before reboot, the watchdog's timeout is increased
> to a sufficiently large value (3 minutes). That keeps the watchdog running
> at all times and guarantees that the VM doesn't get stuck.
>
> Provide a new stop_on_reboot module parameter to let the user control
> the watchdog's reboot policy.
> 
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
> Cc: linux-watchdog@vger.kernel.org
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
> Changes v1 => v2: Add module parameter instead of ioctl()
> 
>  drivers/watchdog/watchdog_core.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
> index 861daf4f37b2..5ead96199a0b 100644
> --- a/drivers/watchdog/watchdog_core.c
> +++ b/drivers/watchdog/watchdog_core.c
> @@ -39,6 +39,10 @@
>  
>  static DEFINE_IDA(watchdog_ida);
>  
> +static int stop_on_reboot = -1;
> +module_param(stop_on_reboot, int, 0644);
> +MODULE_PARM_DESC(stop_on_reboot, "Stop watchdogs on reboot (0=keep watching, 1=stop)");
> +

My major concern is that this is writable at runtime.
Changing the value won't change the behavior of already loaded
drivers, but unloading and reloading a driver after the value has
changed will change that driver's behavior. This would be confusing,
and it is hard to imagine anyone expecting such behavior. Does this
have to be writable?

Guenter

>  /*
>   * Deferred Registration infrastructure.
>   *
> @@ -254,6 +258,14 @@ static int __watchdog_register_device(struct watchdog_device *wdd)
>  		}
>  	}
>  
> +	/* Module parameter to force watchdog policy on reboot. */
> +	if (stop_on_reboot != -1) {
> +		if (stop_on_reboot)
> +			set_bit(WDOG_STOP_ON_REBOOT, &wdd->status);
> +		else
> +			clear_bit(WDOG_STOP_ON_REBOOT, &wdd->status);
> +	}
> +
>  	if (test_bit(WDOG_STOP_ON_REBOOT, &wdd->status)) {
>  		wdd->reboot_nb.notifier_call = watchdog_reboot_notifier;
>

Dmitry Safonov Feb. 23, 2020, 11:21 a.m. UTC | #2
Hi Guenter,

On 2/22/20 4:06 PM, Guenter Roeck wrote:
> On Fri, Feb 14, 2020 at 04:22:09PM +0000, Dmitry Safonov wrote:
[..]
>> +static int stop_on_reboot = -1;
>> +module_param(stop_on_reboot, int, 0644);
>> +MODULE_PARM_DESC(stop_on_reboot, "Stop watchdogs on reboot (0=keep watching, 1=stop)");
>> +
> 
> My major concern is that this is writable at runtime.
> Changing the value won't change the behavior of already loaded
> drivers, but unloading and reloading a driver after the value has
> changed will change that driver's behavior. This would be confusing,
> and it is hard to imagine anyone expecting such behavior. Does this
> have to be writable?

No, it doesn't have to be. I messed it up: I was thinking about the fours
in 0644 and for some reason failed to notice that the mode allows root
to write.

I'll follow up with v3, sorry for the simple-minded typo.

Thanks,
          Dmitry
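
For readers following the exchange above, the permission argument can be read bit by bit. The following is a small userspace sketch, not kernel code; mode_is_writable() is a hypothetical helper introduced only for illustration. It uses the POSIX S_IWUSR, S_IWGRP and S_IWOTH constants to show why 0644 is writable by the file's owner (root, for module parameter files in sysfs) while 0444 is read-only for everyone. That v3 would switch to 0444 is an assumption; the thread only says that a v3 will follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of the patch): reports whether a
 * permission mode grants write access to the owner, the group, or
 * others. Only the low twelve mode bits are meaningful here.
 */
static bool mode_is_writable(unsigned int mode)
{
	assert(mode <= 07777);
	return (mode & (S_IWUSR | S_IWGRP | S_IWOTH)) != 0;
}
```

With this helper, mode_is_writable(0644) is true because the 6 carries the owner write bit (0200), whereas mode_is_writable(0444) is false.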

Patch

diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
index 861daf4f37b2..5ead96199a0b 100644
--- a/drivers/watchdog/watchdog_core.c
+++ b/drivers/watchdog/watchdog_core.c
@@ -39,6 +39,10 @@ 
 
 static DEFINE_IDA(watchdog_ida);
 
+static int stop_on_reboot = -1;
+module_param(stop_on_reboot, int, 0644);
+MODULE_PARM_DESC(stop_on_reboot, "Stop watchdogs on reboot (0=keep watching, 1=stop)");
+
 /*
  * Deferred Registration infrastructure.
  *
@@ -254,6 +258,14 @@  static int __watchdog_register_device(struct watchdog_device *wdd)
 		}
 	}
 
+	/* Module parameter to force watchdog policy on reboot. */
+	if (stop_on_reboot != -1) {
+		if (stop_on_reboot)
+			set_bit(WDOG_STOP_ON_REBOOT, &wdd->status);
+		else
+			clear_bit(WDOG_STOP_ON_REBOOT, &wdd->status);
+	}
+
 	if (test_bit(WDOG_STOP_ON_REBOOT, &wdd->status)) {
 		wdd->reboot_nb.notifier_call = watchdog_reboot_notifier;
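
The behavior under review can be modelled outside the kernel. The sketch below is a userspace approximation of the hunk in __watchdog_register_device(); struct wdd_model and model_register() are hypothetical stand-ins, and only the WDOG_STOP_ON_REBOOT decision is modelled. It illustrates the tri-state parameter (-1 keeps the driver's choice, 0 or 1 overrides it) and the fact that the value is sampled once, at registration time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-in for struct watchdog_device (hypothetical; the real
 * structure keeps this state as a bit in wdd->status).
 */
struct wdd_model {
	bool stop_on_reboot_flag;	/* models WDOG_STOP_ON_REBOOT */
	bool notifier_registered;	/* models the reboot notifier being set up */
};

/* Models the module parameter: -1 means "keep the driver's choice". */
static int stop_on_reboot = -1;

/* Models the registration path: the parameter is read here and only here. */
static void model_register(struct wdd_model *wdd)
{
	assert(wdd != NULL);

	/* Override the driver's flag only when the parameter was set. */
	if (stop_on_reboot != -1)
		wdd->stop_on_reboot_flag = (stop_on_reboot != 0);

	/* The notifier is installed only if the flag ends up set. */
	wdd->notifier_registered = wdd->stop_on_reboot_flag;
}
```

In this model, registering a device while the parameter is -1 and then setting the parameter to 0 leaves that device unchanged; only devices registered afterwards pick up the new value, which is the runtime-write confusion raised in the review.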