[1/6] watchdog: watchdog_dev: WATCHDOG_KEEP_ON feature

Message ID 1411419350-1297-1-git-send-email-j.uzycki@elproma.com.pl (mailing list archive)
State New, archived

Commit Message

j.uzycki@elproma.com.pl Sept. 22, 2014, 8:55 p.m. UTC
Some applications require the watchdog to be started before userspace
software runs. This patch adds that feature. Setting a single flag is
enough to enable it.
Moreover, the kernel's ping is re-enabled when userspace software closes
the watchdog using the magic character. These features improve the kernel's
reliability if a hardware watchdog is available.

Signed-off-by: Janusz Uzycki <j.uzycki@elproma.com.pl>
---
 drivers/watchdog/watchdog_dev.c | 58 ++++++++++++++++++++++++++-
 include/linux/watchdog.h        |  5 +++
 2 files changed, 61 insertions(+), 2 deletions(-)

Comments

Guenter Roeck Sept. 26, 2014, 4:01 a.m. UTC | #1
On 09/22/2014 01:55 PM, Janusz Uzycki wrote:
> Some applications require to start watchdog before userspace software.
> This patch enables such feature. Only the flag is necessary
> to enable it.
> Moreover kernel's ping is re-enabled when userspace software closed
> watchdog using the magic character. The features improves kernel's
> reliability if hardware watchdog is available.
>
> Signed-off-by: Janusz Uzycki <j.uzycki@elproma.com.pl>

Hi Janusz,

This patch set is trying to solve four problems at once:

1) Auto-start watchdog when its driver registers
2) Keep watchdog running when its driver registers until userspace opens it
3) Handle watchdogs which can not be stopped after being started
4) Keep watchdog running with kernel timer after it has been closed,
    even if it can be stopped.

The next patch adds 'boot time protection', which is really another term
for an initial timeout, and case 5).

That is a bit too much for a single patch and, even more so, a single flag.
Let's look at one case after another.

Auto-start watchdog when its driver registers - this makes sense as a
feature just by itself. A good name for its flag might be something like
WDT_AUTOSTART. A matching module parameter might also make sense.

autostart:
	Set to 0 to disable, -1 to enable with unlimited timeout,
	or <n> for an initial timeout of <n> seconds.

This could be accompanied by a variable in watchdog_device:
	int init_timeout;	/* initial timeout in seconds */

An API function such as watchdog_set_autostart() with the initial timeout
as parameter would also be helpful. This function could then be used to
implement 2).

	if (autostart || (keep_running && this_watchdog_is_running()))
		watchdog_set_autostart(&wdd, autostart ? : keep_running);

keep_running could then be another module parameter with the same meaning
as autostart.

Together this would also solve problem 5) while at the same time keeping
the use cases separate.
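
As a rough illustration of the above, here is a minimal userspace sketch of
how the two parameters could be resolved when a driver registers. It is not
kernel code. The helper name resolve_initial_timeout() and its return
convention are assumptions made for this example; only autostart,
keep_running and their 0 / -1 / <n> meaning come from the proposal.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not an existing kernel API.
 *
 * autostart, keep_running: 0 = disabled, -1 = enabled with unlimited
 * timeout, n > 0 = initial timeout of n seconds.
 * hw_running: whether the hardware watchdog was already running when
 * the driver registered (for example, started by the bootloader).
 *
 * Returns 0 if the watchdog should be left alone, otherwise the value
 * that would be handed to something like watchdog_set_autostart().
 */
int resolve_initial_timeout(int autostart, int keep_running, bool hw_running)
{
	if (autostart)
		return autostart;	/* autostart takes precedence */
	if (keep_running && hw_running)
		return keep_running;	/* only keep an already running watchdog */
	return 0;
}
```

With autostart=0 and keep_running=30, a watchdog that is not yet running
stays untouched, while one that is already running gets an initial timeout
of 30 seconds.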

For 3) we really need another flag. Actually, it might be sufficient to have
watchdog drivers with this condition simply not provide a 'stop' function.
If we use a flag, something like WDOG_HW_NO_WAY_OUT with matching
WATCHDOG_HW_NOWAYOUT might make sense. Its functionality is slightly
different to the other conditions: It would not auto-start a watchdog,
but keep it running with the internal timer when the watchdog file is closed.

As for 4), I don't really know if it makes sense to have this functionality.

Does this all make sense ?

Some more comments below.

Thanks,
Guenter

> ---
>   drivers/watchdog/watchdog_dev.c | 58 ++++++++++++++++++++++++++-
>   include/linux/watchdog.h        |  5 +++
>   2 files changed, 61 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index 6aaefba..51a65f6 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -41,6 +41,7 @@
>   #include <linux/miscdevice.h>	/* For handling misc devices */
>   #include <linux/init.h>		/* For __init/__exit/... */
>   #include <linux/uaccess.h>	/* For copy_to_user/put_user/... */
> +#include <linux/jiffies.h>	/* for ping timer */
>
>   #include "watchdog_core.h"
>
> @@ -277,6 +278,27 @@ out_ioctl:
>   	return err;
>   }
>
> +/* 'keep on' feature */
> +static void watchdog_ping_timer_cb(unsigned long data)
> +{
> +	struct watchdog_device *wdd = (struct watchdog_device *)data;
> +	watchdog_ping(wdd);
> +	/* call next ping half the timeout value */
> +	mod_timer(&wdd->ping_timer,
> +			jiffies + msecs_to_jiffies(wdd->timeout * 500));
> +}
> +
> +static void watchdog_keepon_start(struct watchdog_device *wdd)

This name reflects the intended use case, but not the functionality.
Something like "watchdog_timer_start" might make more sense here.

> +{
> +	watchdog_start(wdd);

This should probably be handled by the caller, to let us separate
cases where the watchdog is already running (for example on close).

> +	watchdog_ping_timer_cb((unsigned long)wdd);
> +}
> +
> +static void watchdog_keepon_stop(struct watchdog_device *wdd)

Same name comment as above. keepon -> timer.

> +{
> +	del_timer_sync(&wdd->ping_timer);
> +}
> +
>   /*
>    *	watchdog_write: writes to the watchdog.
>    *	@file: file from VFS
> @@ -430,6 +452,9 @@ static int watchdog_open(struct inode *inode, struct file *file)
>   	if (!try_module_get(wdd->ops->owner))
>   		goto out;
>
> +	if (test_bit(WDOG_KEEP_ON, &wdd->status))
> +		watchdog_keepon_stop(wdd);
> +
>   	err = watchdog_start(wdd);
>   	if (err < 0)
>   		goto out_mod;
> @@ -472,8 +497,13 @@ static int watchdog_release(struct inode *inode, struct file *file)
>   	if (!test_bit(WDOG_ACTIVE, &wdd->status))
>   		err = 0;
>   	else if (test_and_clear_bit(WDOG_ALLOW_RELEASE, &wdd->status) ||
> -		 !(wdd->info->options & WDIOF_MAGICCLOSE))
> -		err = watchdog_stop(wdd);
> +		 !(wdd->info->options & WDIOF_MAGICCLOSE)) {
> +		if (test_bit(WDOG_KEEP_ON, &wdd->status)) {
> +			watchdog_keepon_start(wdd);
> +			err = 0;
> +		} else
> +			err = watchdog_stop(wdd);
> +	}

I think this all should probably go into some helper function.
The conditions here are getting a bit complicated.
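
One possible shape for such a helper, sketched as plain userspace C so the
decision can be read in isolation. The enum and the function name
watchdog_release_action() are hypothetical. The booleans stand in for
WDOG_ACTIVE, WDOG_ALLOW_RELEASE, WDIOF_MAGICCLOSE and WDOG_KEEP_ON, and
the clearing of WDOG_ALLOW_RELEASE done by test_and_clear_bit() is left
out of the model.

```c
#include <stdbool.h>

/* Hypothetical outcome of closing the watchdog device file. */
enum release_action {
	RELEASE_NOTHING,	/* watchdog was not active */
	RELEASE_STOP,		/* stop the hardware watchdog */
	RELEASE_KERNEL_PING,	/* keep it running, kernel timer pings */
	RELEASE_REFUSED,	/* unexpected close, watchdog is not stopped */
};

enum release_action watchdog_release_action(bool active, bool allow_release,
					    bool has_magicclose, bool keep_on)
{
	if (!active)
		return RELEASE_NOTHING;
	/* Without magic close support, any close counts as an orderly close. */
	if (!allow_release && has_magicclose)
		return RELEASE_REFUSED;
	return keep_on ? RELEASE_KERNEL_PING : RELEASE_STOP;
}
```

The caller would then switch on the result, which keeps the release path
free of nested conditions.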

>
>   	/* If the watchdog was not stopped, send a keepalive ping */
>   	if (err < 0) {
> @@ -524,6 +554,14 @@ int watchdog_dev_register(struct watchdog_device *watchdog)
>   {
>   	int err, devno;
>
> +	if (test_bit(WDOG_KEEP_ON, &watchdog->status)) {
> +		if (!try_module_get(watchdog->ops->owner))
> +			return -ENODEV;
> +		setup_timer(&watchdog->ping_timer, watchdog_ping_timer_cb,
> +				(unsigned long)watchdog);
> +		watchdog_keepon_start(watchdog);
> +	}
> +
This should be the last action in the probe function.
Reason is that we might have a watchdog which can not be stopped after
it was started once. If one of the error cases below happens, we would
be stuck with a running watchdog and no means to stop it or to keep
it alive.

>   	if (watchdog->id == 0) {
>   		old_wdd = watchdog;
>   		watchdog_miscdev.parent = watchdog->parent;
> @@ -535,6 +573,11 @@ int watchdog_dev_register(struct watchdog_device *watchdog)
>   				pr_err("%s: a legacy watchdog module is probably present.\n",
>   					watchdog->info->identity);
>   			old_wdd = NULL;
> +			if (test_bit(WDOG_KEEP_ON, &watchdog->status)) {
> +				watchdog_keepon_stop(watchdog);
> +				watchdog_stop(watchdog);
> +				module_put(watchdog->ops->owner);
> +			}

Won't apply after moving above functions, but in general
complex error cleanup like this should be handled with a goto
to an error handler at the end of the function.

>   			return err;
>   		}
>   	}
> @@ -553,6 +596,11 @@ int watchdog_dev_register(struct watchdog_device *watchdog)
>   			misc_deregister(&watchdog_miscdev);
>   			old_wdd = NULL;
>   		}
> +		if (test_bit(WDOG_KEEP_ON, &watchdog->status)) {
> +			watchdog_keepon_stop(watchdog);
> +			watchdog_stop(watchdog);
> +			module_put(watchdog->ops->owner);
> +		}

... to avoid situations with duplicated error handling code like this,
but also to make code easier to read. See CodingStyle.
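
To illustrate both points (start the watchdog last, and unwind errors
through labels), here is a toy userspace model. struct fake_wdd,
fake_dev_register() and the fail_step parameter are invented for this
sketch and do not correspond to real kernel interfaces.

```c
#include <stdbool.h>

/* Toy state standing in for the real registration steps. */
struct fake_wdd {
	bool misc_registered;
	bool cdev_added;
	bool running;
};

/*
 * Hypothetical registration flow. fail_step selects which step fails:
 * 0 = none, 1 = misc device registration, 2 = cdev add.
 */
int fake_dev_register(struct fake_wdd *wdd, int fail_step, bool keep_on)
{
	int err;

	if (fail_step == 1) {
		err = -1;
		goto out;
	}
	wdd->misc_registered = true;

	if (fail_step == 2) {
		err = -2;
		goto out_misc;
	}
	wdd->cdev_added = true;

	/* Last action: nothing after this point can fail. */
	if (keep_on)
		wdd->running = true;
	return 0;

out_misc:
	wdd->misc_registered = false;
out:
	return err;
}
```

Because the watchdog is only started once every fallible step has
succeeded, no error path has to stop it again.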

>   	}
>   	return err;
>   }
> @@ -575,6 +623,12 @@ int watchdog_dev_unregister(struct watchdog_device *watchdog)
>   		misc_deregister(&watchdog_miscdev);
>   		old_wdd = NULL;
>   	}
> +
> +	if (test_bit(WDOG_KEEP_ON, &watchdog->status)) {
> +		watchdog_keepon_stop(watchdog);
> +		watchdog_stop(watchdog);
> +		module_put(watchdog->ops->owner);
> +	}
>   	return 0;
>   }
>
> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
> index 2a3038e..650e0d5 100644
> --- a/include/linux/watchdog.h
> +++ b/include/linux/watchdog.h
> @@ -12,6 +12,7 @@
>   #include <linux/bitops.h>
>   #include <linux/device.h>
>   #include <linux/cdev.h>
> +#include <linux/timer.h>		/* for ping timer */
>   #include <uapi/linux/watchdog.h>
>
>   struct watchdog_ops;
> @@ -95,6 +96,8 @@ struct watchdog_device {
>   #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
>   #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
>   #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
> +#define WDOG_KEEP_ON		5	/* Is 'keep on' feature set? */
> +	struct timer_list ping_timer;	/* timer to keep on hardware ping */
>   };
>
>   #ifdef CONFIG_WATCHDOG_NOWAYOUT
> @@ -104,6 +107,8 @@ struct watchdog_device {
>   #define WATCHDOG_NOWAYOUT		0
>   #define WATCHDOG_NOWAYOUT_INIT_STATUS	0
>   #endif
> +/* other proposal: WATCHDOG_ALWAYS_ACTIVE */
> +#define WATCHDOG_KEEP_ON		(1 << WDOG_KEEP_ON)
>
>   /* Use the following function to check whether or not the watchdog is active */
>   static inline bool watchdog_active(struct watchdog_device *wdd)
>
j.uzycki@elproma.com.pl Sept. 29, 2014, 4:25 p.m. UTC | #2
On 2014-09-26 06:01, Guenter Roeck wrote:
> On 09/22/2014 01:55 PM, Janusz Uzycki wrote:
>> Some applications require to start watchdog before userspace software.
>> This patch enables such feature. Only the flag is necessary
>> to enable it.
>> Moreover kernel's ping is re-enabled when userspace software closed
>> watchdog using the magic character. The features improves kernel's
>> reliability if hardware watchdog is available.
>>
>> Signed-off-by: Janusz Uzycki <j.uzycki@elproma.com.pl>
>

Hi Guenter,

>
> This patch set is trying to solve four problems at once:
>
> 1) Auto-start watchdog when its driver registers
> 2) Keep watchdog running when its driver registers until userspace 
> opens it
> 3) Handle watchdogs which can not be stopped after being started
> 4) Keep watchdog running with kernel timer after it has been closed,
>    even if it can be stopped.
>
> The next time adds 'boot time protection', which is really another term
> for an initial timeout, and case 5).
>
> That is a bit too much for a single patch and, even more so, a single 
> flag.

OK, but I think [PATCH 3/6] could be applied.
Do you agree? Should I resend it separately?
I omitted the sentence
"The patch adds suspend/resume PM support to stmp3xxx_rtc_wdt
watchdog driver" from the commit message because the subject is almost
the same.

> Let's look at one case after another.
>
> Auto-start watchdog when its driver registers - this makes sense as a
> feature just by itself. A good name for its flag might be something like
> WDT_AUTOSTART. A matching module parameter might also make sense.
>
> autostart:
>     Set to 0 to disable, -1 to enable with unlimited timeout,
>     or <n> for an initial timeout of <n> seconds.

Current start(1) + keep-on(2,3,4) + boottime(5) combined. It looks OK.

>
> This could be accompanied by a variable in watchdog_device:
>     int init_timeout;    /* initial timeout in seconds */

As a module parameter, instead of "boottime" in watchdog_core?

>
> An API function such as watchdog_set_autostart() with the initial timeout
> as parameter would also be helpful. This function could then be used to
> implement 2).
>
>     if (autostart || (keep_running && this_watchdog_is_running())
>         watchdog_set_autostart(&wdd, autostart ? : keep_running);
>
I don't understand the difference exactly. Why check whether the watchdog
is running? Does that mean the watchdog is active, or something new?

> keep_running could then be a another module parameter with the same 
> meaning
> as autostart.
But autostart and keep_running aren't in conflict,
so I also don't understand "autostart ? : keep_running".

>
> Together this would also solve problem 5) while at the same time keeping
> the use cases separate.

It is solved by current code.

>
> For 3) we really need another flag. Actually, it might be sufficient 
> to have
> watchdog drivers with this condition simply not provide a 'stop' 
> function.
or use a "NOT SUPPORTED" error code in stop.
Stop could be called on register to set the new flag.

> If we use a flag, something like WDOG_HW_NO_WAY_OUT with matching
> WATCHDOG_HW_NOWAYOUT might make sense. Its functionality is slightly

What is the difference between WDOG_HW_NO_WAY_OUT
and WATCHDOG_HW_NOWAYOUT?

> different to the other conditions: It would not auto-start a watchdog,
> but keep it running with the internal timer when the watchdog file is 
> closed.
>
> As for 4), I don't really know if it makes sense to have this 
> functionality.

Yes, it is a rootfs-specific need. Script-based code starts the watchdog
before a critical function and afterwards exits the watchdog using the
magic char. The critical section then has a timeout equal to the watchdog
timeout value. The feature makes it possible to avoid a userland watchdog
application and does not cost much in the kernel.

>
> Does this all make sense ?

I need more details because not everything is clear to me.

>
> Some more comments below.
thanks

>
> Thanks,
> Guenter
>
>> ---
>>   drivers/watchdog/watchdog_dev.c | 58 ++++++++++++++++++++++++++-
>>   include/linux/watchdog.h        |  5 +++
>>   2 files changed, 61 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/watchdog/watchdog_dev.c 
>> b/drivers/watchdog/watchdog_dev.c
>> index 6aaefba..51a65f6 100644
>> --- a/drivers/watchdog/watchdog_dev.c
>> +++ b/drivers/watchdog/watchdog_dev.c
>> @@ -41,6 +41,7 @@
>>   #include <linux/miscdevice.h>    /* For handling misc devices */
>>   #include <linux/init.h>        /* For __init/__exit/... */
>>   #include <linux/uaccess.h>    /* For copy_to_user/put_user/... */
>> +#include <linux/jiffies.h>    /* for ping timer */
>>
>>   #include "watchdog_core.h"
>>
>> @@ -277,6 +278,27 @@ out_ioctl:
>>       return err;
>>   }
>>
>> +/* 'keep on' feature */
>> +static void watchdog_ping_timer_cb(unsigned long data)
>> +{
>> +    struct watchdog_device *wdd = (struct watchdog_device *)data;
>> +    watchdog_ping(wdd);
>> +    /* call next ping half the timeout value */
>> +    mod_timer(&wdd->ping_timer,
>> +            jiffies + msecs_to_jiffies(wdd->timeout * 500));
>> +}
>> +
>> +static void watchdog_keepon_start(struct watchdog_device *wdd)
>
> This name reflects the intended use case, but not the functionality.
> Something like "watchdog_timer_start" might make more sense here.

I see

>
>> +{
>> +    watchdog_start(wdd);
>
> This should probably be handled by the caller, to let us separate
> cases where the watchdog is already running (for example on close).

The goal is to have the watchdog always enabled.
This option could be disabled only if the watchdog can't be stopped,
because then suspend is also impossible.

>
>> +    watchdog_ping_timer_cb((unsigned long)wdd);
>> +}
>> +
>> +static void watchdog_keepon_stop(struct watchdog_device *wdd)
>
> Same name comment as above. keepon -> timer.

sure

>
>> +{
>> +    del_timer_sync(&wdd->ping_timer);
>> +}
>> +
>>   /*
>>    *    watchdog_write: writes to the watchdog.
>>    *    @file: file from VFS
>> @@ -430,6 +452,9 @@ static int watchdog_open(struct inode *inode, 
>> struct file *file)
>>       if (!try_module_get(wdd->ops->owner))
>>           goto out;
>>
>> +    if (test_bit(WDOG_KEEP_ON, &wdd->status))
>> +        watchdog_keepon_stop(wdd);
>> +
>>       err = watchdog_start(wdd);
>>       if (err < 0)
>>           goto out_mod;
>> @@ -472,8 +497,13 @@ static int watchdog_release(struct inode *inode, 
>> struct file *file)
>>       if (!test_bit(WDOG_ACTIVE, &wdd->status))
>>           err = 0;
>>       else if (test_and_clear_bit(WDOG_ALLOW_RELEASE, &wdd->status) ||
>> -         !(wdd->info->options & WDIOF_MAGICCLOSE))
>> -        err = watchdog_stop(wdd);
>> +         !(wdd->info->options & WDIOF_MAGICCLOSE)) {
>> +        if (test_bit(WDOG_KEEP_ON, &wdd->status)) {
>> +            watchdog_keepon_start(wdd);
>> +            err = 0;
>> +        } else
>> +            err = watchdog_stop(wdd);
>> +    }
>
> I think this all should probably go into some helper function.
> The conditions here are getting a bit complicated.

I will think about it.

>
>>
>>       /* If the watchdog was not stopped, send a keepalive ping */
>>       if (err < 0) {
>> @@ -524,6 +554,14 @@ int watchdog_dev_register(struct watchdog_device 
>> *watchdog)
>>   {
>>       int err, devno;
>>
>> +    if (test_bit(WDOG_KEEP_ON, &watchdog->status)) {
>> +        if (!try_module_get(watchdog->ops->owner))
>> +            return -ENODEV;
>> +        setup_timer(&watchdog->ping_timer, watchdog_ping_timer_cb,
>> +                (unsigned long)watchdog);
>> +        watchdog_keepon_start(watchdog);
>> +    }
>> +
> This should be the last action in the probe function.
> Reason is that we might have a watchdog which can not be stopped after
> it was started once. If one of the error cases below happens, we would
> be stuck with a running watchdog and no means to stop it or to keep
> it alive.
Right. But why not the last action in the register function?
Isn't the register function the last call in probe?

>
>>       if (watchdog->id == 0) {
>>           old_wdd = watchdog;
>>           watchdog_miscdev.parent = watchdog->parent;
>> @@ -535,6 +573,11 @@ int watchdog_dev_register(struct watchdog_device 
>> *watchdog)
>>                   pr_err("%s: a legacy watchdog module is probably 
>> present.\n",
>>                       watchdog->info->identity);
>>               old_wdd = NULL;
>> +            if (test_bit(WDOG_KEEP_ON, &watchdog->status)) {
>> +                watchdog_keepon_stop(watchdog);
>> +                watchdog_stop(watchdog);
>> +                module_put(watchdog->ops->owner);
>> +            }
>
> Won't apply after moving above functions, but in general
> complex error cleanup like this should be handled with a goto
> to an error handler at the end of the function.

OK

>
>>               return err;
>>           }
>>       }
>> @@ -553,6 +596,11 @@ int watchdog_dev_register(struct watchdog_device 
>> *watchdog)
>>               misc_deregister(&watchdog_miscdev);
>>               old_wdd = NULL;
>>           }
>> +        if (test_bit(WDOG_KEEP_ON, &watchdog->status)) {
>> +            watchdog_keepon_stop(watchdog);
>> +            watchdog_stop(watchdog);
>> +            module_put(watchdog->ops->owner);
>> +        }
>
> ... to avoid situations with duplicated error handling code like this,
> but also to make code easier to read. See CodingStyle.

of course

>
>>       }
>>       return err;
>>   }
>> @@ -575,6 +623,12 @@ int watchdog_dev_unregister(struct 
>> watchdog_device *watchdog)
>>           misc_deregister(&watchdog_miscdev);
>>           old_wdd = NULL;
>>       }
>> +
>> +    if (test_bit(WDOG_KEEP_ON, &watchdog->status)) {
>> +        watchdog_keepon_stop(watchdog);
>> +        watchdog_stop(watchdog);
>> +        module_put(watchdog->ops->owner);
>> +    }
>>       return 0;
>>   }
>>
>> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
>> index 2a3038e..650e0d5 100644
>> --- a/include/linux/watchdog.h
>> +++ b/include/linux/watchdog.h
>> @@ -12,6 +12,7 @@
>>   #include <linux/bitops.h>
>>   #include <linux/device.h>
>>   #include <linux/cdev.h>
>> +#include <linux/timer.h>        /* for ping timer */
>>   #include <uapi/linux/watchdog.h>
>>
>>   struct watchdog_ops;
>> @@ -95,6 +96,8 @@ struct watchdog_device {
>>   #define WDOG_ALLOW_RELEASE    2    /* Did we receive the magic char 
>> ? */
>>   #define WDOG_NO_WAY_OUT        3    /* Is 'nowayout' feature set ? */
>>   #define WDOG_UNREGISTERED    4    /* Has the device been 
>> unregistered */
>> +#define WDOG_KEEP_ON        5    /* Is 'keep on' feature set? */
>> +    struct timer_list ping_timer;    /* timer to keep on hardware 
>> ping */
>>   };
>>
>>   #ifdef CONFIG_WATCHDOG_NOWAYOUT
>> @@ -104,6 +107,8 @@ struct watchdog_device {
>>   #define WATCHDOG_NOWAYOUT        0
>>   #define WATCHDOG_NOWAYOUT_INIT_STATUS    0
>>   #endif
>> +/* other proposal: WATCHDOG_ALWAYS_ACTIVE */
>> +#define WATCHDOG_KEEP_ON        (1 << WDOG_KEEP_ON)
>>
>>   /* Use the following function to check whether or not the watchdog 
>> is active */
>>   static inline bool watchdog_active(struct watchdog_device *wdd)
>>
>
Guenter Roeck Sept. 30, 2014, 4:37 a.m. UTC | #3
On 09/29/2014 09:25 AM, Janusz Użycki wrote:
>
> On 2014-09-26 06:01, Guenter Roeck wrote:
>> On 09/22/2014 01:55 PM, Janusz Uzycki wrote:
>>> Some applications require to start watchdog before userspace software.
>>> This patch enables such feature. Only the flag is necessary
>>> to enable it.
>>> Moreover kernel's ping is re-enabled when userspace software closed
>>> watchdog using the magic character. The features improves kernel's
>>> reliability if hardware watchdog is available.
>>>
>>> Signed-off-by: Janusz Uzycki <j.uzycki@elproma.com.pl>
>>
>
> Hi Guenter,
>
>>
>> This patch set is trying to solve four problems at once:
>>
>> 1) Auto-start watchdog when its driver registers
>> 2) Keep watchdog running when its driver registers until userspace opens it
>> 3) Handle watchdogs which can not be stopped after being started
>> 4) Keep watchdog running with kernel timer after it has been closed,
>>    even if it can be stopped.
>>
>> The next time adds 'boot time protection', which is really another term
>> for an initial timeout, and case 5).
>>
>> That is a bit too much for a single patch and, even more so, a single flag.
>
> OK, but I think [PATCH 3/6] could be applied.
> Do you agree? Should I resent it separately?

Yes, it looks ok.

> I omited in the comment
> "The patch adds suspend/resume PM support to stmp3xxx_rtc_wdt
> watchdog driver" because the subject is almost the same.
>
>> Let's look at one case after another.
>>
>> Auto-start watchdog when its driver registers - this makes sense as a
>> feature just by itself. A good name for its flag might be something like
>> WDT_AUTOSTART. A matching module parameter might also make sense.
>>
>> autostart:
>>     Set to 0 to disable, -1 to enable with unlimited timeout,
>>     or <n> for an initial timeout of <n> seconds.
>
> Current start(1) + keep-on(2,3,4) + boottime(5) combined. It looks OK.
>
Maybe for you. For me they are different cases.

>>
>> This could be accompanied by a variable in watchdog_device:
>>     int init_timeout;    /* initial timeout in seconds */
>
> As the module parameter, instead of "boottime" in watchdog_core?
>
>>
>> An API function such as watchdog_set_autostart() with the initial timeout
>> as parameter would also be helpful. This function could then be used to
>> implement 2).
>>
>>     if (autostart || (keep_running && this_watchdog_is_running())
>>         watchdog_set_autostart(&wdd, autostart ? : keep_running);
>>
> I don't understand the difference exactly and why to check the watchdog
> is running? This means watchdog is active or something new?
>
>> keep_running could then be a another module parameter with the same meaning
>> as autostart.
> But autostart and keep_running aren't in conflict.
> So I don't understand also "autostart ? : keep_running".
>

The functionality is distinctively different.

autostart: start watchdog on module load
keep_running: If the watchdog is already running, keep it running. Otherwise do nothing.

Both have different use cases which should not be combined.

>>
>> Together this would also solve problem 5) while at the same time keeping
>> the use cases separate.
>
> It is solved by current code.
>
>>
>> For 3) we really need another flag. Actually, it might be sufficient to have
>> watchdog drivers with this condition simply not provide a 'stop' function.
> or use "NOT SUPPORTED" error code in stop,
> Stop could be called on register and new flag is set.
>

Seems to add complexity for no real benefit. Please explain why you think
it is a good idea to have multiple drivers implement the same function
just to return an error and do nothing else.

>> If we use a flag, something like WDOG_HW_NO_WAY_OUT with matching
>> WATCHDOG_HW_NOWAYOUT might make sense. Its functionality is slightly
>
> What is difference between WDOG_HW_NO_WAY_OUT
> and WATCHDOG_HW_NOWAYOUT?
>
Similar to other flags:

#define WDOG_HW_NO_WAY_OUT	5
#define WATCHDOG_HW_NOWAYOUT	(1 << WDOG_HW_NO_WAY_OUT)
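
For illustration, a small userspace sketch of how the bit number and the
matching mask relate. status_test_bit() is a stand-in written for this
example. The kernel's test_bit() takes a pointer to the status word rather
than its value, and the sketch uses 1UL where the defines above use 1.

```c
#include <stdbool.h>

#define WDOG_HW_NO_WAY_OUT	5	/* bit number in the status word */
#define WATCHDOG_HW_NOWAYOUT	(1UL << WDOG_HW_NO_WAY_OUT)	/* matching mask */

/* Userspace stand-in for test_bit(): is bit 'nr' set in 'status'? */
bool status_test_bit(int nr, unsigned long status)
{
	return (status >> nr) & 1UL;
}
```

A driver would put the mask into its initial status, and the core would
check the bit number later.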

>> different to the other conditions: It would not auto-start a watchdog,
>> but keep it running with the internal timer when the watchdog file is closed.
>>
>> As for 4), I don't really know if it makes sense to have this functionality.
>
> Yes, it is rootfs specific need. Script based code runs watchdog before
> critical function and after exit the watchdog using magic char.
> Critical section has timeout equal watchdog timeout value.
> The feature allow to avoid userland application for watchdog
> and does not cost much in the kernel.
>
Sorry, I can not follow your logic here. A basic userland implementation
doesn't cost much either, is much safer, and even init systems such as
systemd implement it nowadays.

Guenter
j.uzycki@elproma.com.pl Sept. 30, 2014, 10:22 a.m. UTC | #4
On 2014-09-30 06:37, Guenter Roeck wrote:
> On 09/29/2014 09:25 AM, Janusz Użycki wrote:
>>
>>> This patch set is trying to solve four problems at once:
>>>
>>> 1) Auto-start watchdog when its driver registers
>>> 2) Keep watchdog running when its driver registers until userspace 
>>> opens it
>>> 3) Handle watchdogs which can not be stopped after being started
>>> 4) Keep watchdog running with kernel timer after it has been closed,
>>>    even if it can be stopped.
>>>
>>> The next time adds 'boot time protection', which is really another term
>>> for an initial timeout, and case 5).
>>>
>>> That is a bit too much for a single patch and, even more so, a 
>>> single flag.
>>
>> OK, but I think [PATCH 3/6] could be applied.
>> Do you agree? Should I resent it separately?
>
> Yes, it looks ok.

Can you apply it?

>
>> I omited in the comment
>> "The patch adds suspend/resume PM support to stmp3xxx_rtc_wdt
>> watchdog driver" because the subject is almost the same.
>>
>>> Let's look at one case after another.
>>>
>>> Auto-start watchdog when its driver registers - this makes sense as a
>>> feature just by itself. A good name for its flag might be something 
>>> like
>>> WDT_AUTOSTART. A matching module parameter might also make sense.
>>>
>>> autostart:
>>>     Set to 0 to disable, -1 to enable with unlimited timeout,
>>>     or <n> for an initial timeout of <n> seconds.
>>
>> Current start(1) + keep-on(2,3,4) + boottime(5) combined. It looks OK.
>>
> Maybe for you. For me they are different cases.

They are different. I left out some words in my first sentence :)
However, autostart automatically combines them again :)
The problem is the common timer, so the patches depend on each other.
There is no reason to use more timers.

>
>>>
>>> This could be accompanied by a variable in watchdog_device:
>>>     int init_timeout;    /* initial timeout in seconds */
>>
>> As the module parameter, instead of "boottime" in watchdog_core?
>>
>>>
>>> An API function such as watchdog_set_autostart() with the initial 
>>> timeout
>>> as parameter would also be helpful. This function could then be used to
>>> implement 2).
>>>
>>>     if (autostart || (keep_running && this_watchdog_is_running())
>>>         watchdog_set_autostart(&wdd, autostart ? : keep_running);
>>>
>> I don't understand the difference exactly and why to check the watchdog
>> is running? This means watchdog is active or something new?
>>
>>> keep_running could then be a another module parameter with the same 
>>> meaning
>>> as autostart.
>> But autostart and keep_running aren't in conflict.
>> So I don't understand also "autostart ? : keep_running".
>>
>
> The functionality is distinctively different.
>
> autostart: start watchdog on module load
> keep_running: If the watchdog is already running, keep it running. 
> Otherwise do nothing.
>
> Both have different use cases which should not be combined.

According to the autostart value:
-1: current "keep-on" feature
 > 0: current "boottime" feature

Does watchdog_set_autostart() start/activate the watchdog?
Does keep_running differ from autostart=-1 only by the
"this_watchdog_is_running()" check?
How/where is this_watchdog_is_running() implemented?

What type is keep_running?
* bool
* 0/1
* -1/0 for watchdog_set_autostart()?

>
>>>
>>> Together this would also solve problem 5) while at the same time 
>>> keeping
>>> the use cases separate.
>>
>> It is solved by current code.
>>
>>>
>>> For 3) we really need another flag. Actually, it might be sufficient 
>>> to have
>>> watchdog drivers with this condition simply not provide a 'stop' 
>>> function.
>> or use "NOT SUPPORTED" error code in stop,
>> Stop could be called on register and new flag is set.
>>
>
> Seems to add complexity for no real benefit. Please explain why you think
> it is a good idea to have multiple drivers implement the same function
> just to return an error and do nothing else,

The specific driver knows whether the watchdog is stoppable.
wddev->ops->stop() is called from watchdog_dev, and if stop() returns
an error, watchdog_dev prints a warning. It assumes wddev->ops->stop() is
always implemented, even if stopping is not supported, i.e. there is no
check for it. So either an error code such as EOPNOTSUPP can be used,
or a check can be added to make wddev->ops->stop() optional (not mandatory).
Today a lot of drivers just return 0 instead of an error.
So stop() could be called before autostart() to set the flag (more code
complexity), or, as you wrote, the flag can just be used directly.
The latter is indeed better.
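
A rough userspace model of the "optional stop()" variant, only to make the
idea concrete. struct demo_ops, struct demo_wdd and demo_watchdog_stop()
are invented names. The real struct watchdog_ops has more callbacks, and
whether the core should react this way is exactly what is being discussed.

```c
#include <stdbool.h>

struct demo_wdd;

/* Reduced stand-in for the driver callbacks; stop may be left unset. */
struct demo_ops {
	int (*stop)(struct demo_wdd *wdd);
};

struct demo_wdd {
	const struct demo_ops *ops;
	bool hw_running;
	bool kernel_ping;	/* kernel timer keeps pinging the hardware */
};

/* Hypothetical core helper: stop if possible, otherwise keep pinging. */
int demo_watchdog_stop(struct demo_wdd *wdd)
{
	if (!wdd->ops->stop) {
		/* Driver cannot stop the hardware: take over the ping. */
		wdd->kernel_ping = true;
		return 0;
	}
	return wdd->ops->stop(wdd);
}

/* Example stop callback of a stoppable watchdog. */
int demo_hw_stop(struct demo_wdd *wdd)
{
	wdd->hw_running = false;
	return 0;
}
```

Drivers for hardware that cannot be stopped would then simply leave the
callback unset instead of each returning the same error.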

>
>>> If we use a flag, something like WDOG_HW_NO_WAY_OUT with matching
>>> WATCHDOG_HW_NOWAYOUT might make sense. Its functionality is slightly
>>
>> What is difference between WDOG_HW_NO_WAY_OUT
>> and WATCHDOG_HW_NOWAYOUT?
>>
> Similar to other flags
>
> #define WDOG_HW_NO_WAY_OUT    5
> #define WATCHDOG_HW_NOWAYOUT    (1 << WDOG_HW_NO_WAY_OUT)

oh, right

>
>>> different to the other conditions: It would not auto-start a watchdog,
>>> but keep it running with the internal timer when the watchdog file 
>>> is closed.
>>>
>>> As for 4), I don't really know if it makes sense to have this 
>>> functionality.
>>
>> Yes, it is rootfs specific need. Script based code runs watchdog before
>> critical function and after exit the watchdog using magic char.
>> Critical section has timeout equal watchdog timeout value.
>> The feature allow to avoid userland application for watchdog
>> and does not cost much in the kernel.
>>
> Sorry, I can not follow your logic here. A basic userland implementation
> doesn't cost much either, is much safer, and even init systems such as
> systemd implement it nowadays.

True, but not always for embedded systems, where systemd is too heavy today.
Note that 4) works only if the magic char is sent on exit.
Thanks to 4), the magic char also works for non-stoppable watchdogs.
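
For reference, the userspace side of a magic close can be sketched as
below. The function name watchdog_magic_close() is made up for this
example. The sketch assumes the driver advertises WDIOF_MAGICCLOSE and
that fd was obtained by opening the watchdog device, typically
/dev/watchdog.

```c
#include <unistd.h>

/*
 * Write the magic character 'V' and close the watchdog file descriptor.
 * Returns 0 on success, -1 if the write or the close failed.
 */
int watchdog_magic_close(int fd)
{
	int ret = 0;

	if (write(fd, "V", 1) != 1)
		ret = -1;
	if (close(fd) != 0)
		ret = -1;
	return ret;
}
```

A shell script can achieve the same by writing V to the device as the
last thing it does before its file descriptor is closed.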

Janusz
j.uzycki@elproma.com.pl Sept. 30, 2014, 12:46 p.m. UTC | #5
TODO: change WDOG_KEEP_ON to autostart module parameter

changelog:
[PATCH 1/6] watchdog: watchdog_dev: WATCHDOG_KEEP_ON feature
* clean up old comments
* watchdog_keepon_start() renamed to watchdog_timer_start()
* watchdog_keepon_stop() renamed to watchdog_timer_stop()
* watchdog_timer_register() added
* watchdog_timer_unregister() added
* watchdog_timer_register() is the last action
  in watchdog_dev_register().
  FIXME: Really should be in probe()?
* TODO: should watchdog_timer_start() call watchdog_start()?
* watchdog_release() uses watchdog_timer_restart() helper function
Guenter Roeck Sept. 30, 2014, 1:47 p.m. UTC | #6
On 09/30/2014 03:22 AM, Janusz Użycki wrote:
>
> On 2014-09-30 06:37, Guenter Roeck wrote:
>> On 09/29/2014 09:25 AM, Janusz Użycki wrote:
>>>
>>>> This patch set is trying to solve four problems at once:
>>>>
>>>> 1) Auto-start watchdog when its driver registers
>>>> 2) Keep watchdog running when its driver registers until userspace opens it
>>>> 3) Handle watchdogs which can not be stopped after being started
>>>> 4) Keep watchdog running with kernel timer after it has been closed,
>>>>    even if it can be stopped.
>>>>
>>>> The next patch adds 'boot time protection', which is really another term
>>>> for an initial timeout, and case 5).
>>>>
>>>> That is a bit too much for a single patch and, even more so, a single flag.
>>>
>>> OK, but I think [PATCH 3/6] could be applied.
>>> Do you agree? Should I resend it separately?
>>
>> Yes, it looks ok.
>
> Can you apply it?
>
I don't apply watchdog patches; I only provide review feedback.

>>
>>> I omitted from the comment
>>> "The patch adds suspend/resume PM support to stmp3xxx_rtc_wdt
>>> watchdog driver" because the subject is almost the same.
>>>
>>>> Let's look at one case after another.
>>>>
>>>> Auto-start watchdog when its driver registers - this makes sense as a
>>>> feature just by itself. A good name for its flag might be something like
>>>> WDT_AUTOSTART. A matching module parameter might also make sense.
>>>>
>>>> autostart:
>>>>     Set to 0 to disable, -1 to enable with unlimited timeout,
>>>>     or <n> for an initial timeout of <n> seconds.
>>>
>>> That combines the current start (1), keep-on (2, 3, 4) and boot-time (5) cases. It looks OK.
>>>
>> Maybe for you. For me they are different cases.
>
> They are different. I missed some words in the first sentence :)
> However, autostart automatically combines them again :)
> The problem is the common timer, so the patches are dependent.
> There is no reason to use more timers.
>
I did not say use multiple timers. I said use multiple flags.

Guenter
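
The autostart semantics quoted above (0 to disable, -1 for an unlimited timeout, <n> for an initial timeout of <n> seconds) can be sketched as a small decoder. This is only an illustration under assumptions: `struct autostart_cfg` and `autostart_decode()` are hypothetical names, and nothing here is taken from the patch set itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoded form of the proposed 'autostart' parameter. */
struct autostart_cfg {
	bool enabled;		/* start the watchdog when the driver registers */
	bool unlimited;		/* -1: no limit on the initial timeout */
	unsigned int initial_s;	/* <n>: initial timeout in seconds */
};

/* Map the raw parameter value onto the three cases from the proposal. */
static struct autostart_cfg autostart_decode(int autostart)
{
	struct autostart_cfg cfg = { false, false, 0 };

	assert(autostart >= -1);
	if (autostart == 0)
		return cfg;		/* 0: feature disabled */
	cfg.enabled = true;
	if (autostart == -1)
		cfg.unlimited = true;
	else
		cfg.initial_s = (unsigned int)autostart;
	return cfg;
}
```

Keeping the decoded fields separate mirrors the point about using multiple flags: each case stays individually testable even if one timer serves all of them.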

Patch

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 6aaefba..51a65f6 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -41,6 +41,7 @@ 
 #include <linux/miscdevice.h>	/* For handling misc devices */
 #include <linux/init.h>		/* For __init/__exit/... */
 #include <linux/uaccess.h>	/* For copy_to_user/put_user/... */
+#include <linux/jiffies.h>	/* for ping timer */
 
 #include "watchdog_core.h"
 
@@ -277,6 +278,27 @@  out_ioctl:
 	return err;
 }
 
+/* 'keep on' feature */
+static void watchdog_ping_timer_cb(unsigned long data)
+{
+	struct watchdog_device *wdd = (struct watchdog_device *)data;
+	watchdog_ping(wdd);
+	/* call next ping half the timeout value */
+	mod_timer(&wdd->ping_timer,
+			jiffies + msecs_to_jiffies(wdd->timeout * 500));
+}
+
+static void watchdog_keepon_start(struct watchdog_device *wdd)
+{
+	watchdog_start(wdd);
+	watchdog_ping_timer_cb((unsigned long)wdd);
+}
+
+static void watchdog_keepon_stop(struct watchdog_device *wdd)
+{
+	del_timer_sync(&wdd->ping_timer);
+}
+
 /*
  *	watchdog_write: writes to the watchdog.
  *	@file: file from VFS
@@ -430,6 +452,9 @@  static int watchdog_open(struct inode *inode, struct file *file)
 	if (!try_module_get(wdd->ops->owner))
 		goto out;
 
+	if (test_bit(WDOG_KEEP_ON, &wdd->status))
+		watchdog_keepon_stop(wdd);
+
 	err = watchdog_start(wdd);
 	if (err < 0)
 		goto out_mod;
@@ -472,8 +497,13 @@  static int watchdog_release(struct inode *inode, struct file *file)
 	if (!test_bit(WDOG_ACTIVE, &wdd->status))
 		err = 0;
 	else if (test_and_clear_bit(WDOG_ALLOW_RELEASE, &wdd->status) ||
-		 !(wdd->info->options & WDIOF_MAGICCLOSE))
-		err = watchdog_stop(wdd);
+		 !(wdd->info->options & WDIOF_MAGICCLOSE)) {
+		if (test_bit(WDOG_KEEP_ON, &wdd->status)) {
+			watchdog_keepon_start(wdd);
+			err = 0;
+		} else
+			err = watchdog_stop(wdd);
+	}
 
 	/* If the watchdog was not stopped, send a keepalive ping */
 	if (err < 0) {
@@ -524,6 +554,14 @@  int watchdog_dev_register(struct watchdog_device *watchdog)
 {
 	int err, devno;
 
+	if (test_bit(WDOG_KEEP_ON, &watchdog->status)) {
+		if (!try_module_get(watchdog->ops->owner))
+			return -ENODEV;
+		setup_timer(&watchdog->ping_timer, watchdog_ping_timer_cb,
+				(unsigned long)watchdog);
+		watchdog_keepon_start(watchdog);
+	}
+
 	if (watchdog->id == 0) {
 		old_wdd = watchdog;
 		watchdog_miscdev.parent = watchdog->parent;
@@ -535,6 +573,11 @@  int watchdog_dev_register(struct watchdog_device *watchdog)
 				pr_err("%s: a legacy watchdog module is probably present.\n",
 					watchdog->info->identity);
 			old_wdd = NULL;
+			if (test_bit(WDOG_KEEP_ON, &watchdog->status)) {
+				watchdog_keepon_stop(watchdog);
+				watchdog_stop(watchdog);
+				module_put(watchdog->ops->owner);
+			}
 			return err;
 		}
 	}
@@ -553,6 +596,11 @@  int watchdog_dev_register(struct watchdog_device *watchdog)
 			misc_deregister(&watchdog_miscdev);
 			old_wdd = NULL;
 		}
+		if (test_bit(WDOG_KEEP_ON, &watchdog->status)) {
+			watchdog_keepon_stop(watchdog);
+			watchdog_stop(watchdog);
+			module_put(watchdog->ops->owner);
+		}
 	}
 	return err;
 }
@@ -575,6 +623,12 @@  int watchdog_dev_unregister(struct watchdog_device *watchdog)
 		misc_deregister(&watchdog_miscdev);
 		old_wdd = NULL;
 	}
+
+	if (test_bit(WDOG_KEEP_ON, &watchdog->status)) {
+		watchdog_keepon_stop(watchdog);
+		watchdog_stop(watchdog);
+		module_put(watchdog->ops->owner);
+	}
 	return 0;
 }
 
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index 2a3038e..650e0d5 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -12,6 +12,7 @@ 
 #include <linux/bitops.h>
 #include <linux/device.h>
 #include <linux/cdev.h>
+#include <linux/timer.h>		/* for ping timer */
 #include <uapi/linux/watchdog.h>
 
 struct watchdog_ops;
@@ -95,6 +96,8 @@  struct watchdog_device {
 #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
 #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
 #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
+#define WDOG_KEEP_ON		5	/* Is 'keep on' feature set? */
+	struct timer_list ping_timer;	/* timer to keep on hardware ping */
 };
 
 #ifdef CONFIG_WATCHDOG_NOWAYOUT
@@ -104,6 +107,8 @@  struct watchdog_device {
 #define WATCHDOG_NOWAYOUT		0
 #define WATCHDOG_NOWAYOUT_INIT_STATUS	0
 #endif
+/* other proposal: WATCHDOG_ALWAYS_ACTIVE */
+#define WATCHDOG_KEEP_ON		(1 << WDOG_KEEP_ON)
 
 /* Use the following function to check whether or not the watchdog is active */
 static inline bool watchdog_active(struct watchdog_device *wdd)
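
To make the control flow of the patch above easier to follow, here is a rough userspace model of the 'keep on' behaviour across register, open and release. It is a sketch under stated assumptions, not kernel code: `struct wdt_model` and the `model_*()` functions are hypothetical, the watchdog is assumed to be active at release time, and start/stop operations are assumed to succeed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; field names are not kernel identifiers. */
struct wdt_model {
	bool keep_on;		/* models WDOG_KEEP_ON */
	bool magic_close;	/* models WDIOF_MAGICCLOSE */
	bool allow_release;	/* models WDOG_ALLOW_RELEASE */
	bool hw_running;	/* hardware watchdog is counting down */
	bool kernel_pinging;	/* models an armed ping_timer */
};

/* The patch re-arms the timer at half the timeout: timeout * 500 ms. */
static unsigned int model_ping_interval_ms(unsigned int timeout_s)
{
	return timeout_s * 500;
}

/* Register: with keep-on set, start the hardware and the kernel pings. */
static void model_register(struct wdt_model *w)
{
	assert(w);
	if (w->keep_on) {
		w->hw_running = true;
		w->kernel_pinging = true;
	}
}

/* Open: userspace takes over, so the kernel timer is stopped. */
static void model_open(struct wdt_model *w)
{
	if (w->keep_on)
		w->kernel_pinging = false;
	w->hw_running = true;
}

/*
 * Release: on a permitted close, keep-on re-arms the kernel timer
 * instead of stopping the hardware.
 */
static void model_release(struct wdt_model *w)
{
	bool may_stop = w->allow_release || !w->magic_close;

	w->allow_release = false;
	if (!may_stop)
		return;		/* hardware keeps running, nobody pings */
	if (w->keep_on)
		w->kernel_pinging = true;
	else
		w->hw_running = false;
}
```

In this model a close without the magic char leaves the hardware running with no pinger, which matches the earlier remark that case 4) only helps when the magic char is sent on exit.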