diff mbox

[PATCHv7,1/8] watchdog: Extend kernel API to know about HW limitations

Message ID 1429701102-22320-2-git-send-email-timo.kokkonen@offcode.fi (mailing list archive)
State New, archived
Headers show

Commit Message

Timo Kokkonen April 22, 2015, 11:11 a.m. UTC
There is a great deal of diversity in the watchdog hardware found on
different devices. Differen hardware have different contstraints on
them, many of the constraints that are excessively difficult for the
user space to satisfy.

One such constraint is the length of the timeout value, which in many
cases can be just a few seconds. Drivers are creating ad hoc solutions
with timers and workqueues to extend the timeout in order to give user
space more time between updates. Looking at the drivers it is clear
that this has resulted to a lot of duplicate code.

Add an extension to the watchdog kernel API that allows the driver to
describe tis HW constraints to the watchdog code. A kernel worker in
the core is then used to extend the watchdog timeout on behalf of the
user space. This allows the drivers to be simplified as core takes
over the timer extending.

Tested-by: Wenyou Yang <wenyou.yang@atmel.com>
Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
---
 drivers/watchdog/watchdog_core.c | 101 +++++++++++++++++++++++++++++++++++++--
 drivers/watchdog/watchdog_dev.c  |  75 ++++++++++++++++++++++++++---
 include/linux/watchdog.h         |  23 +++++++++
 3 files changed, 189 insertions(+), 10 deletions(-)

Comments

Guenter Roeck April 24, 2015, 5:08 p.m. UTC | #1
On Wed, Apr 22, 2015 at 02:11:35PM +0300, Timo Kokkonen wrote:
> There is a great deal of diversity in the watchdog hardware found on
> different devices. Differen hardware have different contstraints on
> them, many of the constraints that are excessively difficult for the
> user space to satisfy.
> 
> One such constraint is the length of the timeout value, which in many
> cases can be just a few seconds. Drivers are creating ad hoc solutions
> with timers and workqueues to extend the timeout in order to give user
> space more time between updates. Looking at the drivers it is clear
> that this has resulted to a lot of duplicate code.
> 
> Add an extension to the watchdog kernel API that allows the driver to
> describe tis HW constraints to the watchdog code. A kernel worker in
> the core is then used to extend the watchdog timeout on behalf of the
> user space. This allows the drivers to be simplified as core takes
> over the timer extending.
> 
> Tested-by: Wenyou Yang <wenyou.yang@atmel.com>
> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>

I started twice to review this series, and each time there is a new version
before I can finish the review. Guess I'll wait until it settles down a bit
before trying again :-(. Just a quick comment below.

> ---
>  drivers/watchdog/watchdog_core.c | 101 +++++++++++++++++++++++++++++++++++++--
>  drivers/watchdog/watchdog_dev.c  |  75 ++++++++++++++++++++++++++---
>  include/linux/watchdog.h         |  23 +++++++++
>  3 files changed, 189 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
> index cec9b55..fd12489 100644
> --- a/drivers/watchdog/watchdog_core.c
> +++ b/drivers/watchdog/watchdog_core.c
> @@ -99,6 +99,89 @@ int watchdog_init_timeout(struct watchdog_device *wdd,
>  EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>  
>  /**
> + * watchdog_init_parms() - initialize generic watchdog parameters
> + * @wdd: Watchdog device to operate
> + * @dev: Device that stores the device tree properties
> + *
> + * Initialize the generic timeout parameters. The driver needs to set
> + * hw_features bitmask from @wdd prior calling this function in order
> + * to ensure the core knows how to handle the HW.
> + *
> + * A zero is returned on success and -EINVAL for failure.
> + */
> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
> +{
> +	int ret = 0;
> +
> +	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);

You are changing the semantics of watchdog_init_timeout here;
for all practical purposes it no longer accepts the timeout passed
as parameter, but expects the timeout to be configured in wdd->timeout
instead. Please don't do that.

Guenter
Timo Kokkonen April 27, 2015, 5:41 a.m. UTC | #2
On 24.04.2015 20:08, Guenter Roeck wrote:
> On Wed, Apr 22, 2015 at 02:11:35PM +0300, Timo Kokkonen wrote:
>> There is a great deal of diversity in the watchdog hardware found on
>> different devices. Differen hardware have different contstraints on
>> them, many of the constraints that are excessively difficult for the
>> user space to satisfy.
>>
>> One such constraint is the length of the timeout value, which in many
>> cases can be just a few seconds. Drivers are creating ad hoc solutions
>> with timers and workqueues to extend the timeout in order to give user
>> space more time between updates. Looking at the drivers it is clear
>> that this has resulted to a lot of duplicate code.
>>
>> Add an extension to the watchdog kernel API that allows the driver to
>> describe tis HW constraints to the watchdog code. A kernel worker in
>> the core is then used to extend the watchdog timeout on behalf of the
>> user space. This allows the drivers to be simplified as core takes
>> over the timer extending.
>>
>> Tested-by: Wenyou Yang <wenyou.yang@atmel.com>
>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
>
> I started twice to review this series, and each time there is a new version
> before I can finish the review. Guess I'll wait until it settles down a bit
> before trying again :-(. Just a quick comment below.
>

Yeah, I didn't quite know how much time I had available to work with 
these patches, so I kept on sending fixed series out in a hope I get 
some feedback before making too big steps in any false direction.. And I 
ended up doing a  bit more than I thought at first. But I'm not going to 
make a new one until I have got enough feedback from this version. So 
please go ahead and review this version when you have time.

>> ---
>>   drivers/watchdog/watchdog_core.c | 101 +++++++++++++++++++++++++++++++++++++--
>>   drivers/watchdog/watchdog_dev.c  |  75 ++++++++++++++++++++++++++---
>>   include/linux/watchdog.h         |  23 +++++++++
>>   3 files changed, 189 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
>> index cec9b55..fd12489 100644
>> --- a/drivers/watchdog/watchdog_core.c
>> +++ b/drivers/watchdog/watchdog_core.c
>> @@ -99,6 +99,89 @@ int watchdog_init_timeout(struct watchdog_device *wdd,
>>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>>
>>   /**
>> + * watchdog_init_parms() - initialize generic watchdog parameters
>> + * @wdd: Watchdog device to operate
>> + * @dev: Device that stores the device tree properties
>> + *
>> + * Initialize the generic timeout parameters. The driver needs to set
>> + * hw_features bitmask from @wdd prior calling this function in order
>> + * to ensure the core knows how to handle the HW.
>> + *
>> + * A zero is returned on success and -EINVAL for failure.
>> + */
>> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
>> +{
>> +	int ret = 0;
>> +
>> +	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
>
> You are changing the semantics of watchdog_init_timeout here;
> for all practical purposes it no longer accepts the timeout passed
> as parameter, but expects the timeout to be configured in wdd->timeout
> instead. Please don't do that.

hmm.. Yes, this isn't quite right. What I thought it should be is that 
the driver initializes  the values with sane value (either from module 
parameter or some default) and then if that's not set (wdd->timeout is 
zero), watchdog_init_params() gets the value from device tree. If we 
still don't get any preference for the timeout, then use some reasonable 
default, such as 60 seconds. How's that?

Right now the module parameter basically gets ignored altogether, which 
is wrong.

-Timo
Uwe Kleine-König May 4, 2015, 7:58 a.m. UTC | #3
Hello Timo,

On Wed, Apr 22, 2015 at 02:11:35PM +0300, Timo Kokkonen wrote:
> There is a great deal of diversity in the watchdog hardware found on
> different devices. Differen hardware have different contstraints on
s/Differen/Different/; s/contstraints/constraints/

> them, many of the constraints that are excessively difficult for the
> user space to satisfy.
> 
> One such constraint is the length of the timeout value, which in many
> cases can be just a few seconds. Drivers are creating ad hoc solutions
> with timers and workqueues to extend the timeout in order to give user
> space more time between updates. Looking at the drivers it is clear
> that this has resulted to a lot of duplicate code.
> 
> Add an extension to the watchdog kernel API that allows the driver to
> describe tis HW constraints to the watchdog code. A kernel worker in
s/tis/its/

> the core is then used to extend the watchdog timeout on behalf of the
> user space. This allows the drivers to be simplified as core takes
> over the timer extending.
In general a nice idea.

> Tested-by: Wenyou Yang <wenyou.yang@atmel.com>
> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> ---
>  drivers/watchdog/watchdog_core.c | 101 +++++++++++++++++++++++++++++++++++++--
>  drivers/watchdog/watchdog_dev.c  |  75 ++++++++++++++++++++++++++---
>  include/linux/watchdog.h         |  23 +++++++++
>  3 files changed, 189 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
> index cec9b55..fd12489 100644
> --- a/drivers/watchdog/watchdog_core.c
> +++ b/drivers/watchdog/watchdog_core.c
> @@ -99,6 +99,89 @@ int watchdog_init_timeout(struct watchdog_device *wdd,
>  EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>  
>  /**
> + * watchdog_init_parms() - initialize generic watchdog parameters
> + * @wdd: Watchdog device to operate
> + * @dev: Device that stores the device tree properties
> + *
> + * Initialize the generic timeout parameters. The driver needs to set
> + * hw_features bitmask from @wdd prior calling this function in order
> + * to ensure the core knows how to handle the HW.
> + *
> + * A zero is returned on success and -EINVAL for failure.
> + */
> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
> +{
> +	int ret = 0;
> +
> +	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
> +	if (ret < 0)
> +		return ret;
As Guenter pointed out, using wdd->timeout is unfortunate here.

> +
> +	/*
> +	 * Max HW timeout needs to be set so that core knows when to
> +	 * use a kernel worker to support longer watchdog timeouts
> +	 */
> +	if (!wdd->hw_max_timeout)
> +		return -EINVAL;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(watchdog_init_params);
> +
> +static void watchdog_worker(struct work_struct *work)
> +{
> +	struct watchdog_device *wdd = container_of(to_delayed_work(work),
> +						struct watchdog_device, work);
please align the follow-up line to the opening (.

> +	bool boot_keepalive;
> +	bool active_keepalive;
> +
> +	mutex_lock(&wdd->lock);
> +
> +	boot_keepalive = !watchdog_active(wdd) &&
I'm not sure about the semantic of watchdog_active here. Would be nice
to clearify this. Does it mean /dev/watchdog is open, or the watchdog is
running? Maybe introduce watchdog_running(wdd) and watchdog_opened(wdd)?
Is this related to the topic of this patch? You didn't mention anything
in the commit log about the time until userspace comes up.

> +		!watchdog_is_stoppable(wdd);
> +
> +	active_keepalive = watchdog_active(wdd) &&
> +		wdd->hw_max_timeout < wdd->timeout * HZ;
> +
> +	if (time_after(jiffies, wdd->expires) && watchdog_active(wdd)) {
> +		pr_crit("I will reset your machine !\n");
> +		mutex_unlock(&wdd->lock);
> +		return;
> +	}
I'd replace the pr_crit by a code comment. Also "I will reset your
machine!" is wrong, the core just stops petting the watchdog. If
userspace comes in in time there is no harm, right?

> +	if (boot_keepalive || active_keepalive) {
> +		if (wdd->ops->ping)
> +			wdd->ops->ping(wdd);
> +		else
> +			wdd->ops->start(wdd);
if (boot_keepalive || active_keepalive)
	_watchdog_ping(wdd);

> +
> +		schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
s/  / /; Would it be sensible to assume hw_heartbeat to be timeout / 2
always?

> +	}
> +
> +	mutex_unlock(&wdd->lock);
> +}
> +
> +static int prepare_watchdog(struct watchdog_device *wdd)
> +{
> +	int err = 0;
> +
> +	/* Stop the watchdog now before user space opens the device */
> +	if (watchdog_is_stoppable(wdd) &&
> +		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> +		err = wdd->ops->stop(wdd);
> +
> +	} else if (!watchdog_is_stoppable(wdd) &&
> +		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> +		/*
> +		 * Can't stop it, use a delayed worker to tick it
> +		 * until it's open by user space
> +		 */
I'd add a ping here because if the driver comes up a big part of the
initial timeout setup by the bootloader might already be over.

Note that at least for the omap watchdog driver the assignment to
timeout doesn't relate in any way to the timeout actually setup in
hardware.

> +		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> +	}
> +	return err;
> +}
> +
> +/**
>   * watchdog_register_device() - register a watchdog device
>   * @wdd: watchdog device
>   *
> @@ -156,13 +239,24 @@ int watchdog_register_device(struct watchdog_device *wdd)
>  	wdd->dev = device_create(watchdog_class, wdd->parent, devno,
>  					NULL, "watchdog%d", wdd->id);
>  	if (IS_ERR(wdd->dev)) {
> -		watchdog_dev_unregister(wdd);
> -		ida_simple_remove(&watchdog_ida, id);
>  		ret = PTR_ERR(wdd->dev);
> -		return ret;
> +		goto dev_create_fail;
> +	}
> +
> +	INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
> +
> +	if (wdd->hw_max_timeout) {
> +		ret = prepare_watchdog(wdd);
> +		if (ret)
> +			goto dev_create_fail;
	} else {
		dev_warn("Incomplete watchdog driver implementation, please report or fix\n")
>  	}
>  
>  	return 0;
> +
> +dev_create_fail:
> +	watchdog_dev_unregister(wdd);
> +	ida_simple_remove(&watchdog_ida, id);
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(watchdog_register_device);
>  
> @@ -181,6 +275,7 @@ void watchdog_unregister_device(struct watchdog_device *wdd)
>  	if (wdd == NULL)
>  		return;
>  
> +	cancel_delayed_work_sync(&wdd->work);
>  	devno = wdd->cdev.dev;
>  	ret = watchdog_dev_unregister(wdd);
>  	if (ret)
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index 6aaefba..04ac68c 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -49,6 +49,14 @@ static dev_t watchdog_devt;
>  /* the watchdog device behind /dev/watchdog */
>  static struct watchdog_device *old_wdd;
>  
> +static int _watchdog_ping(struct watchdog_device *wddev)
> +{
> +	if (wddev->ops->ping)
> +		return wddev->ops->ping(wddev);  /* ping the watchdog */
> +	else
> +		return wddev->ops->start(wddev); /* restart watchdog */
> +}
> +
>  /*
>   *	watchdog_ping: ping the watchdog.
>   *	@wddev: the watchdog device to ping
> @@ -73,10 +81,13 @@ static int watchdog_ping(struct watchdog_device *wddev)
>  	if (!watchdog_active(wddev))
>  		goto out_ping;
>  
> -	if (wddev->ops->ping)
> -		err = wddev->ops->ping(wddev);  /* ping the watchdog */
> -	else
> -		err = wddev->ops->start(wddev); /* restart watchdog */
> +	err = _watchdog_ping(wddev);
> +
> +	if (wddev->hw_max_timeout &&
> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
> +		wddev->expires = jiffies + wddev->timeout * HZ;
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	}
>  
>  out_ping:
>  	mutex_unlock(&wddev->lock);
> @@ -110,6 +121,13 @@ static int watchdog_start(struct watchdog_device *wddev)
>  	if (err == 0)
>  		set_bit(WDOG_ACTIVE, &wddev->status);
>  
> +	if (wddev->hw_max_timeout &&
> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
> +		wddev->expires = jiffies + wddev->timeout * HZ;
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	} else
> +		cancel_delayed_work(&wddev->work);
> +
>  out_start:
>  	mutex_unlock(&wddev->lock);
>  	return err;
> @@ -145,9 +163,21 @@ static int watchdog_stop(struct watchdog_device *wddev)
>  		goto out_stop;
>  	}
>  
> -	err = wddev->ops->stop(wddev);
> -	if (err == 0)
> +	if (!wddev->hw_max_timeout || watchdog_is_stoppable(wddev)) {
> +		cancel_delayed_work(&wddev->work);
> +		err = wddev->ops->stop(wddev);
> +		if (err == 0)
> +			clear_bit(WDOG_ACTIVE, &wddev->status);
> +	} else {
> +		/* Unstoppable watchdogs need the worker to keep them alive */
>  		clear_bit(WDOG_ACTIVE, &wddev->status);
> +		/*
> +		 * Ping it once as we don't know how much time there
> +		 * is left in the watchdog timer.
> +		 */
> +		err = _watchdog_ping(wddev);
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	}
Why not do:

	err = wddev->ops->stop(wddev);
	clear_bit(WDOG_ACTIVE, &wddev->status);
	if (err != 0) {
		err = _watchdog_ping(wddev);
		if (err) {
			dev_crit(...);
		schedule_delayed_work(...);
	}

This way you (maybe?) don't need to expand all drivers to tell you if
the watchdog timer is stoppable, just assert that the stop callback
fails which IMHO is easier and more natural.

>  
>  out_stop:
>  	mutex_unlock(&wddev->lock);
> @@ -194,7 +224,38 @@ out_status:
>  static int watchdog_set_timeout(struct watchdog_device *wddev,
>  							unsigned int timeout)
>  {
> -	int err;
> +	int err = 0;
> +
> +	if (wddev->hw_max_timeout) {
> +		int hw_timeout;
> +		/*
> +		 * We can't support too short timeout values. There is
> +		 * really no maximu however, anything longer than HW
s/maximu/maximum/

> +		 * maximum will be supported by the watchdog core on
> +		 * behalf of the actual HW.
> +		 */
> +		if (timeout < wddev->min_timeout)
> +			return -EINVAL;
Is it save to read min_timout without holding the lock?
> +
> +		mutex_lock(&wddev->lock);
> +		if (test_bit(WDOG_UNREGISTERED, &wddev->status)) {
> +			err = -ENODEV;
> +			goto out_timeout;
> +		}
> +
> +		if (timeout * HZ > wddev->hw_max_timeout)
> +			schedule_delayed_work(&wddev->work,
> +					wddev->hw_heartbeat);
> +
> +		hw_timeout = min(timeout, wddev->hw_max_timeout / HZ);
If you'd keep hw_max_timeout in seconds (i.e. what is used for timeout)
you wouldn't need to divide here. If my watchdog has hw_max_timeout =
0.5s then you do wddev->ops->set_timeout(0) here.

> +		if (wddev->info->options & WDIOF_SETTIMEOUT)
> +			err = wddev->ops->set_timeout(wddev, hw_timeout);
The case of not having WDIOF_SETTIMEOUT isn't handled here, right? Other
places check for WDIOF_SETTIMEOUT and wddev->ops->set_timeout?!

> +
> +		if (hw_timeout < timeout)
> +			wddev->timeout = timeout;
What is the semantic of wddev->timeout? Does it hold the value
set by userspace or the value the hardware is set to? You'd need to
clarify that by adding more comments. Is this if correct?

> +
> +		goto out_timeout;
> +	}
>  
>  	if ((wddev->ops->set_timeout == NULL) ||
>  	    !(wddev->info->options & WDIOF_SETTIMEOUT))
> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
> index 395b70e..027c99d 100644
> --- a/include/linux/watchdog.h
> +++ b/include/linux/watchdog.h
> @@ -12,6 +12,7 @@
>  #include <linux/bitops.h>
>  #include <linux/device.h>
>  #include <linux/cdev.h>
> +#include <linux/workqueue.h>
>  #include <uapi/linux/watchdog.h>
>  
>  struct watchdog_ops;
> @@ -62,9 +63,13 @@ struct watchdog_ops {
>   * @timeout:	The watchdog devices timeout value.
>   * @min_timeout:The watchdog devices minimum timeout value.
>   * @max_timeout:The watchdog devices maximum timeout value.
> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
> + * @hw_heartbeat:Time interval in HW between timer pings.
>   * @driver-data:Pointer to the drivers private data.
>   * @lock:	Lock for watchdog core internal use only.
> + * @work:	Worker used to provide longer timeouts than hardware supports.
>   * @status:	Field that contains the devices internal status bits.
> + * @hw_features:Feature bits describing how the watchdog HW works.
>   *
>   * The watchdog_device structure contains all information about a
>   * watchdog timer device.
> @@ -86,8 +91,12 @@ struct watchdog_device {
>  	unsigned int timeout;
>  	unsigned int min_timeout;
>  	unsigned int max_timeout;
> +	unsigned int hw_max_timeout; /* in jiffies */
> +	unsigned int hw_heartbeat; /* in jiffies */
> +	unsigned long int expires; /* for keepalive worker */
>  	void *driver_data;
>  	struct mutex lock;
> +	struct delayed_work work;
I'd group things together here that belong together.

>  	unsigned long status;
>  /* Bit numbers for status flags */
>  #define WDOG_ACTIVE		0	/* Is the watchdog running/active */
> @@ -95,6 +104,14 @@ struct watchdog_device {
>  #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
>  #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
>  #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
> +
> +/* Bits describing features supported by the HW */
> +	unsigned long hw_features;
> +
> +/* Can the watchdog be stopped and started */
> +#define WDOG_HW_IS_STOPPABLE		BIT(0)
> +/* Is the watchdog already running when the driver starts up */
> +#define WDOG_HW_RUNNING_AT_BOOT		BIT(1)
>  };
>  
>  #define WATCHDOG_NOWAYOUT		IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
> @@ -120,6 +137,11 @@ static inline bool watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
>  		(t < wdd->min_timeout || t > wdd->max_timeout));
>  }
>  
> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
> +{
> +	return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
> +}
> +
>  /* Use the following functions to manipulate watchdog driver specific data */
>  static inline void watchdog_set_drvdata(struct watchdog_device *wdd, void *data)
>  {
> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct watchdog_device *wdd)
>  /* drivers/watchdog/watchdog_core.c */
>  extern int watchdog_init_timeout(struct watchdog_device *wdd,
>  				  unsigned int timeout_parm, struct device *dev);
> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev);
This needs commenting and probably embrace watchdog_init_timeout
properly. i.e. remove watchdog_init_timeout.

Best regards
Uwe
Timo Kokkonen May 4, 2015, 9:40 a.m. UTC | #4
Hi Uwe

On 04.05.2015 10:58, Uwe Kleine-König wrote:
> Hello Timo,
>
> On Wed, Apr 22, 2015 at 02:11:35PM +0300, Timo Kokkonen wrote:
>> There is a great deal of diversity in the watchdog hardware found on
>> different devices. Differen hardware have different contstraints on
> s/Differen/Different/; s/contstraints/constraints/
>
>> them, many of the constraints that are excessively difficult for the
>> user space to satisfy.
>>
>> One such constraint is the length of the timeout value, which in many
>> cases can be just a few seconds. Drivers are creating ad hoc solutions
>> with timers and workqueues to extend the timeout in order to give user
>> space more time between updates. Looking at the drivers it is clear
>> that this has resulted to a lot of duplicate code.
>>
>> Add an extension to the watchdog kernel API that allows the driver to
>> describe tis HW constraints to the watchdog code. A kernel worker in
> s/tis/its/
>

I thought that I ran ispell at least on the commit message, but maybe 
not on all commits.. I'll try to collect all these fixes to the next 
version.

>> the core is then used to extend the watchdog timeout on behalf of the
>> user space. This allows the drivers to be simplified as core takes
>> over the timer extending.
> In general a nice idea.
>
>> Tested-by: Wenyou Yang <wenyou.yang@atmel.com>
>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
>> ---
>>   drivers/watchdog/watchdog_core.c | 101 +++++++++++++++++++++++++++++++++++++--
>>   drivers/watchdog/watchdog_dev.c  |  75 ++++++++++++++++++++++++++---
>>   include/linux/watchdog.h         |  23 +++++++++
>>   3 files changed, 189 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
>> index cec9b55..fd12489 100644
>> --- a/drivers/watchdog/watchdog_core.c
>> +++ b/drivers/watchdog/watchdog_core.c
>> @@ -99,6 +99,89 @@ int watchdog_init_timeout(struct watchdog_device *wdd,
>>   EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>>
>>   /**
>> + * watchdog_init_parms() - initialize generic watchdog parameters
>> + * @wdd: Watchdog device to operate
>> + * @dev: Device that stores the device tree properties
>> + *
>> + * Initialize the generic timeout parameters. The driver needs to set
>> + * hw_features bitmask from @wdd prior calling this function in order
>> + * to ensure the core knows how to handle the HW.
>> + *
>> + * A zero is returned on success and -EINVAL for failure.
>> + */
>> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
>> +{
>> +	int ret = 0;
>> +
>> +	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
>> +	if (ret < 0)
>> +		return ret;
> As Guenter pointed out, using wdd->timeout is unfortunate here.
>

Yeah, I'll come up with something better for the next version.

>> +
>> +	/*
>> +	 * Max HW timeout needs to be set so that core knows when to
>> +	 * use a kernel worker to support longer watchdog timeouts
>> +	 */
>> +	if (!wdd->hw_max_timeout)
>> +		return -EINVAL;
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(watchdog_init_params);
>> +
>> +static void watchdog_worker(struct work_struct *work)
>> +{
>> +	struct watchdog_device *wdd = container_of(to_delayed_work(work),
>> +						struct watchdog_device, work);
> please align the follow-up line to the opening (.
>

ok

>> +	bool boot_keepalive;
>> +	bool active_keepalive;
>> +
>> +	mutex_lock(&wdd->lock);
>> +
>> +	boot_keepalive = !watchdog_active(wdd) &&
> I'm not sure about the semantic of watchdog_active here. Would be nice
> to clearify this. Does it mean /dev/watchdog is open, or the watchdog is
> running? Maybe introduce watchdog_running(wdd) and watchdog_opened(wdd)?

watchdog_active() simply means the device is open by user and we expect 
the HW to be running. What is a bit confusing here is that if watchdog 
is closed by the user, according the variable names here, it goes back 
to "boot_keepalive" mode. Maybe it would be better to rename the 
variable as inactive_keepalive instead.

 > Is this related to the topic of this patch? You didn't mention anything
 > in the commit log about the time until userspace comes up.
 >

It's the next patch in this series that introduces the early_timeout_sec 
logic. I deliberately splitted that functionality out from this patch to 
not confuse the new feature with the change of the existing API.

>> +		!watchdog_is_stoppable(wdd);
>> +
>> +	active_keepalive = watchdog_active(wdd) &&
>> +		wdd->hw_max_timeout < wdd->timeout * HZ;
>> +
>> +	if (time_after(jiffies, wdd->expires) && watchdog_active(wdd)) {
>> +		pr_crit("I will reset your machine !\n");
>> +		mutex_unlock(&wdd->lock);
>> +		return;
>> +	}
> I'd replace the pr_crit by a code comment. Also "I will reset your
> machine!" is wrong, the core just stops petting the watchdog.

Yeah, this is something that kept repeating with some of the drivers 
that had custom keepalive code with them. I'm fine with replacing it 
with a commect.

> If userspace comes in in time there is no harm, right?

Yep, indeed.

>
>> +	if (boot_keepalive || active_keepalive) {
>> +		if (wdd->ops->ping)
>> +			wdd->ops->ping(wdd);
>> +		else
>> +			wdd->ops->start(wdd);
> if (boot_keepalive || active_keepalive)
> 	_watchdog_ping(wdd);
>

ack.

>> +
>> +		schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
> s/  / /; Would it be sensible to assume hw_heartbeat to be timeout / 2
> always?
>

Timeout variable stores the user space timeout value, which might be 
different to what hardware is capable of. I'd better document that 
change in the headers at least.

>> +	}
>> +
>> +	mutex_unlock(&wdd->lock);
>> +}
>> +
>> +static int prepare_watchdog(struct watchdog_device *wdd)
>> +{
>> +	int err = 0;
>> +
>> +	/* Stop the watchdog now before user space opens the device */
>> +	if (watchdog_is_stoppable(wdd) &&
>> +		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
>> +		err = wdd->ops->stop(wdd);
>> +
>> +	} else if (!watchdog_is_stoppable(wdd) &&
>> +		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
>> +		/*
>> +		 * Can't stop it, use a delayed worker to tick it
>> +		 * until it's open by user space
>> +		 */
> I'd add a ping here because if the driver comes up a big part of the
> initial timeout setup by the bootloader might already be over.

Agreed.

> Note that at least for the omap watchdog driver the assignment to
> timeout doesn't relate in any way to the timeout actually setup in
> hardware.

The timeout value should match what is setup in hardware, unless user is 
requesting something beyond hardware limits, in which case we use the 
worker to extend the limit. I'm not sure how that is relevant at here 
though..

>> +		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
>> +	}
>> +	return err;
>> +}
>> +
>> +/**
>>    * watchdog_register_device() - register a watchdog device
>>    * @wdd: watchdog device
>>    *
>> @@ -156,13 +239,24 @@ int watchdog_register_device(struct watchdog_device *wdd)
>>   	wdd->dev = device_create(watchdog_class, wdd->parent, devno,
>>   					NULL, "watchdog%d", wdd->id);
>>   	if (IS_ERR(wdd->dev)) {
>> -		watchdog_dev_unregister(wdd);
>> -		ida_simple_remove(&watchdog_ida, id);
>>   		ret = PTR_ERR(wdd->dev);
>> -		return ret;
>> +		goto dev_create_fail;
>> +	}
>> +
>> +	INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
>> +
>> +	if (wdd->hw_max_timeout) {
>> +		ret = prepare_watchdog(wdd);
>> +		if (ret)
>> +			goto dev_create_fail;
> 	} else {
> 		dev_warn("Incomplete watchdog driver implementation, please report or fix\n")

Yes, this would probably speed up the rate drivers are converting to use 
the new API extensions. I'll add it.

>>   	}
>>
>>   	return 0;
>> +
>> +dev_create_fail:
>> +	watchdog_dev_unregister(wdd);
>> +	ida_simple_remove(&watchdog_ida, id);
>> +	return ret;
>>   }
>>   EXPORT_SYMBOL_GPL(watchdog_register_device);
>>
>> @@ -181,6 +275,7 @@ void watchdog_unregister_device(struct watchdog_device *wdd)
>>   	if (wdd == NULL)
>>   		return;
>>
>> +	cancel_delayed_work_sync(&wdd->work);
>>   	devno = wdd->cdev.dev;
>>   	ret = watchdog_dev_unregister(wdd);
>>   	if (ret)
>> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
>> index 6aaefba..04ac68c 100644
>> --- a/drivers/watchdog/watchdog_dev.c
>> +++ b/drivers/watchdog/watchdog_dev.c
>> @@ -49,6 +49,14 @@ static dev_t watchdog_devt;
>>   /* the watchdog device behind /dev/watchdog */
>>   static struct watchdog_device *old_wdd;
>>
>> +static int _watchdog_ping(struct watchdog_device *wddev)
>> +{
>> +	if (wddev->ops->ping)
>> +		return wddev->ops->ping(wddev);  /* ping the watchdog */
>> +	else
>> +		return wddev->ops->start(wddev); /* restart watchdog */
>> +}
>> +
>>   /*
>>    *	watchdog_ping: ping the watchdog.
>>    *	@wddev: the watchdog device to ping
>> @@ -73,10 +81,13 @@ static int watchdog_ping(struct watchdog_device *wddev)
>>   	if (!watchdog_active(wddev))
>>   		goto out_ping;
>>
>> -	if (wddev->ops->ping)
>> -		err = wddev->ops->ping(wddev);  /* ping the watchdog */
>> -	else
>> -		err = wddev->ops->start(wddev); /* restart watchdog */
>> +	err = _watchdog_ping(wddev);
>> +
>> +	if (wddev->hw_max_timeout &&
>> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
>> +		wddev->expires = jiffies + wddev->timeout * HZ;
>> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>> +	}
>>
>>   out_ping:
>>   	mutex_unlock(&wddev->lock);
>> @@ -110,6 +121,13 @@ static int watchdog_start(struct watchdog_device *wddev)
>>   	if (err == 0)
>>   		set_bit(WDOG_ACTIVE, &wddev->status);
>>
>> +	if (wddev->hw_max_timeout &&
>> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
>> +		wddev->expires = jiffies + wddev->timeout * HZ;
>> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>> +	} else
>> +		cancel_delayed_work(&wddev->work);
>> +
>>   out_start:
>>   	mutex_unlock(&wddev->lock);
>>   	return err;
>> @@ -145,9 +163,21 @@ static int watchdog_stop(struct watchdog_device *wddev)
>>   		goto out_stop;
>>   	}
>>
>> -	err = wddev->ops->stop(wddev);
>> -	if (err == 0)
>> +	if (!wddev->hw_max_timeout || watchdog_is_stoppable(wddev)) {
>> +		cancel_delayed_work(&wddev->work);
>> +		err = wddev->ops->stop(wddev);
>> +		if (err == 0)
>> +			clear_bit(WDOG_ACTIVE, &wddev->status);
>> +	} else {
>> +		/* Unstoppable watchdogs need the worker to keep them alive */
>>   		clear_bit(WDOG_ACTIVE, &wddev->status);
>> +		/*
>> +		 * Ping it once as we don't know how much time there
>> +		 * is left in the watchdog timer.
>> +		 */
>> +		err = _watchdog_ping(wddev);
>> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
>> +	}
> Why not do:
>
> 	err = wddev->ops->stop(wddev);
> 	clear_bit(WDOG_ACTIVE, &wddev->status);
> 	if (err != 0) {
> 		err = _watchdog_ping(wddev);
> 		if (err) {
> 			dev_crit(...);
> 		schedule_delayed_work(...);
> 	}
>
> This way you (maybe?) don't need to expand all drivers to tell you if
> the watchdog timer is stoppable, just assert that the stop callback
> fails which IMHO is easier and more natural.
>

Currently the watchdog API mandates that stop() function must be 
implemented and the drivers that can't stop the actual HW just set up 
their own tricks to emulate stopped hardware. What we could do is to 
require that drivers using the new API return error on their stop 
function in case they can't stop the hardware. I never thought it this 
way. This would indeed reduce 50% of the proposed HW feature bits.

I'll look into this. The idea is good.

>>
>>   out_stop:
>>   	mutex_unlock(&wddev->lock);
>> @@ -194,7 +224,38 @@ out_status:
>>   static int watchdog_set_timeout(struct watchdog_device *wddev,
>>   							unsigned int timeout)
>>   {
>> -	int err;
>> +	int err = 0;
>> +
>> +	if (wddev->hw_max_timeout) {
>> +		int hw_timeout;
>> +		/*
>> +		 * We can't support too short timeout values. There is
>> +		 * really no maximu however, anything longer than HW
> s/maximu/maximum/
>
>> +		 * maximum will be supported by the watchdog core on
>> +		 * behalf of the actual HW.
>> +		 */
>> +		if (timeout < wddev->min_timeout)
>> +			return -EINVAL;
> Is it save to read min_timout without holding the lock?

No, I'd say it is a bug, even though I don't think we are likely to race 
here. I see no reason why this shouldn't be covered with the lock.

>> +
>> +		mutex_lock(&wddev->lock);
>> +		if (test_bit(WDOG_UNREGISTERED, &wddev->status)) {
>> +			err = -ENODEV;
>> +			goto out_timeout;
>> +		}
>> +
>> +		if (timeout * HZ > wddev->hw_max_timeout)
>> +			schedule_delayed_work(&wddev->work,
>> +					wddev->hw_heartbeat);
>> +
>> +		hw_timeout = min(timeout, wddev->hw_max_timeout / HZ);
> If you'd keep hw_max_timeout in seconds (i.e. what is used for timeout)
> you wouldn't need to divide here. If my watchdog has hw_max_timeout =
> 0.5s then you do wddev->ops->set_timeout(0) here.
>

If the maximum supported timeout is less than a second, there is no 
really point in having WDIOF_SETTIMEOUT set in the driver at all. There 
is no way user can set up the timeout through the current API. The 
hw_max_timeout needs to be in jiffies in case the HW maximum is less 
than a second.

>> +		if (wddev->info->options & WDIOF_SETTIMEOUT)
>> +			err = wddev->ops->set_timeout(wddev, hw_timeout);
> The case of not having WDIOF_SETTIMEOUT isn't handled here, right? Other
> places check for WDIOF_SETTIMEOUT and wddev->ops->set_timeout?!
>

Good point. I didn't actually test this with any drivers that didn't 
have WDIOF_SETTIMEOUT set. For example with at91sam9_wdt 
WDIOF_SETTIMEOUT is set but only because the driver implemented its own 
fake timeouts as the hardware cannot be set. So I need to remove 
WDIOF_SETTIMEOUT bit from the driver and then handle the timeout 
handling here properly. And in that case, what we can do here is to 
either have a timeout that is exactly same what is supported by the 
hardware or use the worker to extend the timeout. Can't do anything else.

Good catch!

>> +
>> +		if (hw_timeout < timeout)
>> +			wddev->timeout = timeout;
> What is the semantic of wddev->timeout? Does it hold the value
> set by userspace or the value the hardware is set to? You'd need to
> clarify that by adding more comments. Is this if correct?
>

It stores the timeout interval we are expecting from the user space. 
Hardware timeout can be shorter. I'll document this better to the next 
patch version.

>> +
>> +		goto out_timeout;
>> +	}
>>
>>   	if ((wddev->ops->set_timeout == NULL) ||
>>   	    !(wddev->info->options & WDIOF_SETTIMEOUT))
>> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
>> index 395b70e..027c99d 100644
>> --- a/include/linux/watchdog.h
>> +++ b/include/linux/watchdog.h
>> @@ -12,6 +12,7 @@
>>   #include <linux/bitops.h>
>>   #include <linux/device.h>
>>   #include <linux/cdev.h>
>> +#include <linux/workqueue.h>
>>   #include <uapi/linux/watchdog.h>
>>
>>   struct watchdog_ops;
>> @@ -62,9 +63,13 @@ struct watchdog_ops {
>>    * @timeout:	The watchdog devices timeout value.
>>    * @min_timeout:The watchdog devices minimum timeout value.
>>    * @max_timeout:The watchdog devices maximum timeout value.
>> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
>> + * @hw_heartbeat:Time interval in HW between timer pings.
>>    * @driver-data:Pointer to the drivers private data.
>>    * @lock:	Lock for watchdog core internal use only.
>> + * @work:	Worker used to provide longer timeouts than hardware supports.
>>    * @status:	Field that contains the devices internal status bits.
>> + * @hw_features:Feature bits describing how the watchdog HW works.
>>    *
>>    * The watchdog_device structure contains all information about a
>>    * watchdog timer device.
>> @@ -86,8 +91,12 @@ struct watchdog_device {
>>   	unsigned int timeout;
>>   	unsigned int min_timeout;
>>   	unsigned int max_timeout;
>> +	unsigned int hw_max_timeout; /* in jiffies */
>> +	unsigned int hw_heartbeat; /* in jiffies */
>> +	unsigned long int expires; /* for keepalive worker */
>>   	void *driver_data;
>>   	struct mutex lock;
>> +	struct delayed_work work;
> I'd group things together here that belong together.
>

How would you like to have the "belong together" criteria to be defined? 
I now have timeout values together and then other structures in another 
group as I found that logical. I'm fine with other criteria too, if you 
specify one to me.

>>   	unsigned long status;
>>   /* Bit numbers for status flags */
>>   #define WDOG_ACTIVE		0	/* Is the watchdog running/active */
>> @@ -95,6 +104,14 @@ struct watchdog_device {
>>   #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
>>   #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
>>   #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
>> +
>> +/* Bits describing features supported by the HW */
>> +	unsigned long hw_features;
>> +
>> +/* Can the watchdog be stopped and started */
>> +#define WDOG_HW_IS_STOPPABLE		BIT(0)
>> +/* Is the watchdog already running when the driver starts up */
>> +#define WDOG_HW_RUNNING_AT_BOOT		BIT(1)
>>   };
>>
>>   #define WATCHDOG_NOWAYOUT		IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
>> @@ -120,6 +137,11 @@ static inline bool watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
>>   		(t < wdd->min_timeout || t > wdd->max_timeout));
>>   }
>>
>> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
>> +{
>> +	return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
>> +}
>> +
>>   /* Use the following functions to manipulate watchdog driver specific data */
>>   static inline void watchdog_set_drvdata(struct watchdog_device *wdd, void *data)
>>   {
>> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct watchdog_device *wdd)
>>   /* drivers/watchdog/watchdog_core.c */
>>   extern int watchdog_init_timeout(struct watchdog_device *wdd,
>>   				  unsigned int timeout_parm, struct device *dev);
>> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev);
> This needs commenting and probably embrace watchdog_init_timeout
> properly. i.e. remove watchdog_init_timeout.

Yes, the comments are now mostly above the function implementation. I 
can add more there if you feel its inadequate.

watchdog_init_timeout() can be removed once all drivers are converted to 
use init_watchdog_params()


Thank you for your valuable review.

-Timo

>
> Best regards
> Uwe
>
Guenter Roeck May 4, 2015, 3:43 p.m. UTC | #5
On Wed, Apr 22, 2015 at 02:11:35PM +0300, Timo Kokkonen wrote:
> There is a great deal of diversity in the watchdog hardware found on
> different devices. Differen hardware have different contstraints on
> them, many of the constraints that are excessively difficult for the
> user space to satisfy.
> 
> One such constraint is the length of the timeout value, which in many
> cases can be just a few seconds. Drivers are creating ad hoc solutions
> with timers and workqueues to extend the timeout in order to give user
> space more time between updates. Looking at the drivers it is clear
> that this has resulted to a lot of duplicate code.
> 
> Add an extension to the watchdog kernel API that allows the driver to
> describe tis HW constraints to the watchdog code. A kernel worker in
> the core is then used to extend the watchdog timeout on behalf of the
> user space. This allows the drivers to be simplified as core takes
> over the timer extending.
> 
> Tested-by: Wenyou Yang <wenyou.yang@atmel.com>
> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>

Hi Timo,

Time to add my thoughts before you create the next version.
It isn't complete, so I won't try a line by line review.

The init_params function isn't the best choice for a name. Ultimately
we have other concepts to add - such as generic pretimeout support -
which makes this more and more difficult to handle. A separate function to
initialize the early timeout separately might make more sense. Either case,
any new API must not provide less functionality than the old one.

The current semantics of timeout and max_timeout is that those _are_ the
hardware limits. We can not just change that. Also, the new functionality
should, as much as possible, be automatic and should not require changes in
existing drivers to be useful for them.

I understand we need something line hw_timeout and hw_max_timeout, but that
doesn't mean that those _have_ to be specified to be used. For a generic use
case, it is sufficient to assume that timeout == hw_timeout and max_timeout ==
hw_max_timeout, and the hw_ values can be set automatically if not specified.
The code should be more permissive to not bail out if , for example,
hw_max_timeout is not specified (then hw_max_timeout == max_timeout).

Similar, it should not be necessary to have to specify hw_timeout and
hw_max_timeout in the first place for those to be useful. We can instead do
something like

	#define MIN_TIMEOUT	(60 * 10)	/* in seconds */

	if (!hw_max_timeout && max_timeout < MIN_TIMEOUT) {
		hw_max_timeout = max_timeout;
		max_timeout = MIN_TIMEOUT;
	}

with similar adjustments for hw_timeout and timeout. I am not sure if we should
keep the current meaning of timeout and max_timeout (ie it means hw limits) and
introduce sw_max_timeout and sw_timeout, or extend the meaning of max_timeout
and timeout to mean both and introduce the hw_variables. The latter would let us
specify more granularity for the hw_ values, so maybe the latter approach is
better.

Limits should not be in jiffies unless only used internally by the watchdog
subsystem. If more granularity is needed, use milliseconds and convert to
jiffies where needed.

Consider adding helper functions such as _watchdog_ping as inline in
watchdog_core.h.

hw_heartbeat should be set from the watchdog core; I don't think it needs to
be provided by the driver. It can be calculated from hw_max_timeout and timeout
as needed (to something like min(hw_max_timeout, timeout) / 2).

You don't need boot_kepalive and active_keepalive as separate flags in
watchdog_worker.

hw_features is unfortunate, and I am not sure that it is needed. That the
watchdog is running at boot time is not really a hardware feature, but a state.
Either add a bit to 'status' in watchdog_device, or add a callback function. 
WDOG_HW_IS_STOPPABLE can be added to 'status' as well. It should probably be
reversed, though, to something like WDOG_NOT_STOPPABLE, so that it does not have
to be set to existing drivers.

Guenter

> ---
>  drivers/watchdog/watchdog_core.c | 101 +++++++++++++++++++++++++++++++++++++--
>  drivers/watchdog/watchdog_dev.c  |  75 ++++++++++++++++++++++++++---
>  include/linux/watchdog.h         |  23 +++++++++
>  3 files changed, 189 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
> index cec9b55..fd12489 100644
> --- a/drivers/watchdog/watchdog_core.c
> +++ b/drivers/watchdog/watchdog_core.c
> @@ -99,6 +99,89 @@ int watchdog_init_timeout(struct watchdog_device *wdd,
>  EXPORT_SYMBOL_GPL(watchdog_init_timeout);
>  
>  /**
> + * watchdog_init_parms() - initialize generic watchdog parameters
> + * @wdd: Watchdog device to operate
> + * @dev: Device that stores the device tree properties
> + *
> + * Initialize the generic timeout parameters. The driver needs to set
> + * hw_features bitmask from @wdd prior calling this function in order
> + * to ensure the core knows how to handle the HW.
> + *
> + * A zero is returned on success and -EINVAL for failure.
> + */
> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
> +{
> +	int ret = 0;
> +
> +	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
> +	if (ret < 0)
> +		return ret;
> +
> +	/*
> +	 * Max HW timeout needs to be set so that core knows when to
> +	 * use a kernel worker to support longer watchdog timeouts
> +	 */
> +	if (!wdd->hw_max_timeout)
> +		return -EINVAL;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(watchdog_init_params);
> +
> +static void watchdog_worker(struct work_struct *work)
> +{
> +	struct watchdog_device *wdd = container_of(to_delayed_work(work),
> +						struct watchdog_device, work);
> +	bool boot_keepalive;
> +	bool active_keepalive;
> +
> +	mutex_lock(&wdd->lock);
> +
> +	boot_keepalive = !watchdog_active(wdd) &&
> +		!watchdog_is_stoppable(wdd);
> +
> +	active_keepalive = watchdog_active(wdd) &&
> +		wdd->hw_max_timeout < wdd->timeout * HZ;
> +
> +	if (time_after(jiffies, wdd->expires) && watchdog_active(wdd)) {
> +		pr_crit("I will reset your machine !\n");
> +		mutex_unlock(&wdd->lock);
> +		return;
> +	}
> +
> +	if (boot_keepalive || active_keepalive) {
> +		if (wdd->ops->ping)
> +			wdd->ops->ping(wdd);
> +		else
> +			wdd->ops->start(wdd);
> +
> +		schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
> +	}
> +
> +	mutex_unlock(&wdd->lock);
> +}
> +
> +static int prepare_watchdog(struct watchdog_device *wdd)
> +{
> +	int err = 0;
> +
> +	/* Stop the watchdog now before user space opens the device */
> +	if (watchdog_is_stoppable(wdd) &&
> +		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> +		err = wdd->ops->stop(wdd);
> +
> +	} else if (!watchdog_is_stoppable(wdd) &&
> +		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> +		/*
> +		 * Can't stop it, use a delayed worker to tick it
> +		 * until it's open by user space
> +		 */
> +		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> +	}
> +	return err;
> +}
> +
> +/**
>   * watchdog_register_device() - register a watchdog device
>   * @wdd: watchdog device
>   *
> @@ -156,13 +239,24 @@ int watchdog_register_device(struct watchdog_device *wdd)
>  	wdd->dev = device_create(watchdog_class, wdd->parent, devno,
>  					NULL, "watchdog%d", wdd->id);
>  	if (IS_ERR(wdd->dev)) {
> -		watchdog_dev_unregister(wdd);
> -		ida_simple_remove(&watchdog_ida, id);
>  		ret = PTR_ERR(wdd->dev);
> -		return ret;
> +		goto dev_create_fail;
> +	}
> +
> +	INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
> +
> +	if (wdd->hw_max_timeout) {
> +		ret = prepare_watchdog(wdd);
> +		if (ret)
> +			goto dev_create_fail;
>  	}
>  
>  	return 0;
> +
> +dev_create_fail:
> +	watchdog_dev_unregister(wdd);
> +	ida_simple_remove(&watchdog_ida, id);
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(watchdog_register_device);
>  
> @@ -181,6 +275,7 @@ void watchdog_unregister_device(struct watchdog_device *wdd)
>  	if (wdd == NULL)
>  		return;
>  
> +	cancel_delayed_work_sync(&wdd->work);
>  	devno = wdd->cdev.dev;
>  	ret = watchdog_dev_unregister(wdd);
>  	if (ret)
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index 6aaefba..04ac68c 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -49,6 +49,14 @@ static dev_t watchdog_devt;
>  /* the watchdog device behind /dev/watchdog */
>  static struct watchdog_device *old_wdd;
>  
> +static int _watchdog_ping(struct watchdog_device *wddev)
> +{
> +	if (wddev->ops->ping)
> +		return wddev->ops->ping(wddev);  /* ping the watchdog */
> +	else
> +		return wddev->ops->start(wddev); /* restart watchdog */
> +}
> +
>  /*
>   *	watchdog_ping: ping the watchdog.
>   *	@wddev: the watchdog device to ping
> @@ -73,10 +81,13 @@ static int watchdog_ping(struct watchdog_device *wddev)
>  	if (!watchdog_active(wddev))
>  		goto out_ping;
>  
> -	if (wddev->ops->ping)
> -		err = wddev->ops->ping(wddev);  /* ping the watchdog */
> -	else
> -		err = wddev->ops->start(wddev); /* restart watchdog */
> +	err = _watchdog_ping(wddev);
> +
> +	if (wddev->hw_max_timeout &&
> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
> +		wddev->expires = jiffies + wddev->timeout * HZ;
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	}
>  
>  out_ping:
>  	mutex_unlock(&wddev->lock);
> @@ -110,6 +121,13 @@ static int watchdog_start(struct watchdog_device *wddev)
>  	if (err == 0)
>  		set_bit(WDOG_ACTIVE, &wddev->status);
>  
> +	if (wddev->hw_max_timeout &&
> +		wddev->timeout * HZ > wddev->hw_max_timeout) {
> +		wddev->expires = jiffies + wddev->timeout * HZ;
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	} else
> +		cancel_delayed_work(&wddev->work);
> +
>  out_start:
>  	mutex_unlock(&wddev->lock);
>  	return err;
> @@ -145,9 +163,21 @@ static int watchdog_stop(struct watchdog_device *wddev)
>  		goto out_stop;
>  	}
>  
> -	err = wddev->ops->stop(wddev);
> -	if (err == 0)
> +	if (!wddev->hw_max_timeout || watchdog_is_stoppable(wddev)) {
> +		cancel_delayed_work(&wddev->work);
> +		err = wddev->ops->stop(wddev);
> +		if (err == 0)
> +			clear_bit(WDOG_ACTIVE, &wddev->status);
> +	} else {
> +		/* Unstoppable watchdogs need the worker to keep them alive */
>  		clear_bit(WDOG_ACTIVE, &wddev->status);
> +		/*
> +		 * Ping it once as we don't know how much time there
> +		 * is left in the watchdog timer.
> +		 */
> +		err = _watchdog_ping(wddev);
> +		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> +	}
>  
>  out_stop:
>  	mutex_unlock(&wddev->lock);
> @@ -194,7 +224,38 @@ out_status:
>  static int watchdog_set_timeout(struct watchdog_device *wddev,
>  							unsigned int timeout)
>  {
> -	int err;
> +	int err = 0;
> +
> +	if (wddev->hw_max_timeout) {
> +		int hw_timeout;
> +		/*
> +		 * We can't support too short timeout values. There is
> +		 * really no maximu however, anything longer than HW
> +		 * maximum will be supported by the watchdog core on
> +		 * behalf of the actual HW.
> +		 */
> +		if (timeout < wddev->min_timeout)
> +			return -EINVAL;
> +
> +		mutex_lock(&wddev->lock);
> +		if (test_bit(WDOG_UNREGISTERED, &wddev->status)) {
> +			err = -ENODEV;
> +			goto out_timeout;
> +		}
> +
> +		if (timeout * HZ > wddev->hw_max_timeout)
> +			schedule_delayed_work(&wddev->work,
> +					wddev->hw_heartbeat);
> +
> +		hw_timeout = min(timeout, wddev->hw_max_timeout / HZ);
> +		if (wddev->info->options & WDIOF_SETTIMEOUT)
> +			err = wddev->ops->set_timeout(wddev, hw_timeout);
> +
> +		if (hw_timeout < timeout)
> +			wddev->timeout = timeout;
> +
> +		goto out_timeout;
> +	}
>  
>  	if ((wddev->ops->set_timeout == NULL) ||
>  	    !(wddev->info->options & WDIOF_SETTIMEOUT))
> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
> index 395b70e..027c99d 100644
> --- a/include/linux/watchdog.h
> +++ b/include/linux/watchdog.h
> @@ -12,6 +12,7 @@
>  #include <linux/bitops.h>
>  #include <linux/device.h>
>  #include <linux/cdev.h>
> +#include <linux/workqueue.h>
>  #include <uapi/linux/watchdog.h>
>  
>  struct watchdog_ops;
> @@ -62,9 +63,13 @@ struct watchdog_ops {
>   * @timeout:	The watchdog devices timeout value.
>   * @min_timeout:The watchdog devices minimum timeout value.
>   * @max_timeout:The watchdog devices maximum timeout value.
> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
> + * @hw_heartbeat:Time interval in HW between timer pings.
>   * @driver-data:Pointer to the drivers private data.
>   * @lock:	Lock for watchdog core internal use only.
> + * @work:	Worker used to provide longer timeouts than hardware supports.
>   * @status:	Field that contains the devices internal status bits.
> + * @hw_features:Feature bits describing how the watchdog HW works.
>   *
>   * The watchdog_device structure contains all information about a
>   * watchdog timer device.
> @@ -86,8 +91,12 @@ struct watchdog_device {
>  	unsigned int timeout;
>  	unsigned int min_timeout;
>  	unsigned int max_timeout;
> +	unsigned int hw_max_timeout; /* in jiffies */
> +	unsigned int hw_heartbeat; /* in jiffies */
> +	unsigned long int expires; /* for keepalive worker */
>  	void *driver_data;
>  	struct mutex lock;
> +	struct delayed_work work;
>  	unsigned long status;
>  /* Bit numbers for status flags */
>  #define WDOG_ACTIVE		0	/* Is the watchdog running/active */
> @@ -95,6 +104,14 @@ struct watchdog_device {
>  #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
>  #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
>  #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
> +
> +/* Bits describing features supported by the HW */
> +	unsigned long hw_features;
> +
> +/* Can the watchdog be stopped and started */
> +#define WDOG_HW_IS_STOPPABLE		BIT(0)
> +/* Is the watchdog already running when the driver starts up */
> +#define WDOG_HW_RUNNING_AT_BOOT		BIT(1)
>  };
>  
>  #define WATCHDOG_NOWAYOUT		IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
> @@ -120,6 +137,11 @@ static inline bool watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
>  		(t < wdd->min_timeout || t > wdd->max_timeout));
>  }
>  
> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
> +{
> +	return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
> +}
> +
>  /* Use the following functions to manipulate watchdog driver specific data */
>  static inline void watchdog_set_drvdata(struct watchdog_device *wdd, void *data)
>  {
> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct watchdog_device *wdd)
>  /* drivers/watchdog/watchdog_core.c */
>  extern int watchdog_init_timeout(struct watchdog_device *wdd,
>  				  unsigned int timeout_parm, struct device *dev);
> +int watchdog_init_params(struct watchdog_device *wdd, struct device *dev);
>  extern int watchdog_register_device(struct watchdog_device *);
>  extern void watchdog_unregister_device(struct watchdog_device *);
>  
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-watchdog" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Kleine-Budde May 4, 2015, 9:17 p.m. UTC | #6
On 04/22/2015 01:11 PM, Timo Kokkonen wrote:
> There is a great deal of diversity in the watchdog hardware found on
> different devices. Differen hardware have different contstraints on
> them, many of the constraints that are excessively difficult for the
> user space to satisfy.
> 
> One such constraint is the length of the timeout value, which in many
> cases can be just a few seconds. Drivers are creating ad hoc solutions
> with timers and workqueues to extend the timeout in order to give user
> space more time between updates. Looking at the drivers it is clear
> that this has resulted to a lot of duplicate code.
> 
> Add an extension to the watchdog kernel API that allows the driver to
> describe tis HW constraints to the watchdog code. A kernel worker in
> the core is then used to extend the watchdog timeout on behalf of the
> user space. This allows the drivers to be simplified as core takes
> over the timer extending.
> 
> Tested-by: Wenyou Yang <wenyou.yang@atmel.com>
> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> ---
>  drivers/watchdog/watchdog_core.c | 101 +++++++++++++++++++++++++++++++++++++--
>  drivers/watchdog/watchdog_dev.c  |  75 ++++++++++++++++++++++++++---
>  include/linux/watchdog.h         |  23 +++++++++
>  3 files changed, 189 insertions(+), 10 deletions(-)
> 
[...]

>  static int watchdog_set_timeout(struct watchdog_device *wddev,
>  							unsigned int timeout)
>  {
> -	int err;
> +	int err = 0;
> +
> +	if (wddev->hw_max_timeout) {
> +		int hw_timeout;
> +		/*
> +		 * We can't support too short timeout values. There is
> +		 * really no maximu however, anything longer than HW
> +		 * maximum will be supported by the watchdog core on
> +		 * behalf of the actual HW.
> +		 */
> +		if (timeout < wddev->min_timeout)
> +			return -EINVAL;
> +
> +		mutex_lock(&wddev->lock);

The locking in this function looks weird.

> +		if (test_bit(WDOG_UNREGISTERED, &wddev->status)) {
> +			err = -ENODEV;
> +			goto out_timeout;
> +		}
> +
> +		if (timeout * HZ > wddev->hw_max_timeout)
> +			schedule_delayed_work(&wddev->work,
> +					wddev->hw_heartbeat);
> +
> +		hw_timeout = min(timeout, wddev->hw_max_timeout / HZ);
> +		if (wddev->info->options & WDIOF_SETTIMEOUT)
> +			err = wddev->ops->set_timeout(wddev, hw_timeout);
> +
> +		if (hw_timeout < timeout)
> +			wddev->timeout = timeout;
> +
> +		goto out_timeout;
> +	}
>  
>  	if ((wddev->ops->set_timeout == NULL) ||
>  	    !(wddev->info->options & WDIOF_SETTIMEOUT))

Marc
Timo Kokkonen May 5, 2015, 6:26 a.m. UTC | #7
Hi Guenter,

On 04.05.2015 18:43, Guenter Roeck wrote:
> On Wed, Apr 22, 2015 at 02:11:35PM +0300, Timo Kokkonen wrote:
>> There is a great deal of diversity in the watchdog hardware found on
>> different devices. Differen hardware have different contstraints on
>> them, many of the constraints that are excessively difficult for the
>> user space to satisfy.
>>
>> One such constraint is the length of the timeout value, which in many
>> cases can be just a few seconds. Drivers are creating ad hoc solutions
>> with timers and workqueues to extend the timeout in order to give user
>> space more time between updates. Looking at the drivers it is clear
>> that this has resulted to a lot of duplicate code.
>>
>> Add an extension to the watchdog kernel API that allows the driver to
>> describe tis HW constraints to the watchdog code. A kernel worker in
>> the core is then used to extend the watchdog timeout on behalf of the
>> user space. This allows the drivers to be simplified as core takes
>> over the timer extending.
>>
>> Tested-by: Wenyou Yang <wenyou.yang@atmel.com>
>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
>
> Hi Timo,
>
> Time to add my thoughts before you create the next version.
> It isn't complete, so I won't try a line by line review.
>
> The init_params function isn't the best choice for a name. Ultimately
> we have other concepts to add - such as generic pretimeout support -
> which makes this more and more difficult to handle. A separate function to
> initialize the early timeout separately might make more sense. Either case,
> any new API must not provide less functionality than the old one.

Did you have something specific on your mind, something else besides of 
the early-timeout-sec parameter I'm adding with this series? I would 
like to learn more about it so that I can ensure the core API is as 
generic as possible.

> The current semantics of timeout and max_timeout is that those _are_ the
> hardware limits. We can not just change that. Also, the new functionality
> should, as much as possible, be automatic and should not require changes in
> existing drivers to be useful for them.
>
> I understand we need something line hw_timeout and hw_max_timeout, but that
> doesn't mean that those _have_ to be specified to be used. For a generic use
> case, it is sufficient to assume that timeout == hw_timeout and max_timeout ==
> hw_max_timeout, and the hw_ values can be set automatically if not specified.
> The code should be more permissive to not bail out if , for example,
> hw_max_timeout is not specified (then hw_max_timeout == max_timeout).

There are two requirements here that made me add the new hw_timeout 
variables:

1) Some of the hardware need higher precision in the timeout parameters 
as their limits are very short
2) There are too many watchdog drivers for me to change all at once, so 
old drivers need to remain functional while they are converted to the 
new core API

Also once the drivers are using the new API, there are actually two 
timeout values, one that is limited by the HW and one that is requested 
by the user space. The core code then intermediates between these two so 
that user space doesn't need to know about the HW limitations.

The max_timeout actually becomes irrelevant from user space perspective 
as the core code is capable of extending the timeout indefinitely, if 
that is what user space wants. We could add some logical maximum here, 
such as one day, what is already used by some of the drivers..

> Similar, it should not be necessary to have to specify hw_timeout and
> hw_max_timeout in the first place for those to be useful. We can instead do
> something like
>
> 	#define MIN_TIMEOUT	(60 * 10)	/* in seconds */
>
> 	if (!hw_max_timeout && max_timeout < MIN_TIMEOUT) {
> 		hw_max_timeout = max_timeout;
> 		max_timeout = MIN_TIMEOUT;
> 	}
>
> with similar adjustments for hw_timeout and timeout. I am not sure if we should
> keep the current meaning of timeout and max_timeout (ie it means hw limits) and
> introduce sw_max_timeout and sw_timeout, or extend the meaning of max_timeout
> and timeout to mean both and introduce the hw_variables. The latter would let us
> specify more granularity for the hw_ values, so maybe the latter approach is
> better.

I would like to be clear with the variable naming that others are for SW 
and the other for HW, but as we can't change all drivers at once, I'm 
trying to make it obvious that the new set of limits are purely for 
hardware. Once all drivers using the core API are filling the hw limits, 
the core is the only place that is taking care of the sw limits.

> Limits should not be in jiffies unless only used internally by the watchdog
> subsystem. If more granularity is needed, use milliseconds and convert to
> jiffies where needed.

I can change the limits in milliseconds.

> Consider adding helper functions such as _watchdog_ping as inline in
> watchdog_core.h.

I was wondering whether it should be in there instead of in 
watchdog_dev.c, so I'll move it there in the next version.

> hw_heartbeat should be set from the watchdog core; I don't think it needs to
> be provided by the driver. It can be calculated from hw_max_timeout and timeout
> as needed (to something like min(hw_max_timeout, timeout) / 2).

The HW might have limitations on what is the proper or ideal heartbeat 
value, which is why I thought driver should probably decide about how it 
should be set. But we could let core make a guess of its own if 
hw_heartbeat is left zero. Now I'm assuming the driver is changing the 
hw_heartbeat when timeout is changed.

> You don't need boot_kepalive and active_keepalive as separate flags in
> watchdog_worker.

You are right, I'll fix it.

> hw_features is unfortunate, and I am not sure that it is needed. That the
> watchdog is running at boot time is not really a hardware feature, but a state.
> Either add a bit to 'status' in watchdog_device, or add a callback function.
> WDOG_HW_IS_STOPPABLE can be added to 'status' as well. It should probably be
> reversed, though, to something like WDOG_NOT_STOPPABLE, so that it does not have
> to be set to existing drivers.

We can avoid WDOG_HW_IS_STOPPABLE entirely by mandating that driver 
stop() returns failure if hardware can't be stopped. This way core can 
start the worker whenever the HW can't be stopped.

Then we only have the boot time status of the watchdog left. For some HW 
that's a feature (eg. at91sam9_wdt) and some other it's a state (eg. 
omap_wdt). But I can add a new status bit for it, that's a better 
approach than having an entirely new set of HW flags.

Thanks for your review.

-Timo
diff mbox

Patch

diff --git a/drivers/watchdog/watchdog_core.c b/drivers/watchdog/watchdog_core.c
index cec9b55..fd12489 100644
--- a/drivers/watchdog/watchdog_core.c
+++ b/drivers/watchdog/watchdog_core.c
@@ -99,6 +99,89 @@  int watchdog_init_timeout(struct watchdog_device *wdd,
 EXPORT_SYMBOL_GPL(watchdog_init_timeout);
 
 /**
+ * watchdog_init_parms() - initialize generic watchdog parameters
+ * @wdd: Watchdog device to operate
+ * @dev: Device that stores the device tree properties
+ *
+ * Initialize the generic timeout parameters. The driver needs to set
+ * hw_features bitmask from @wdd prior calling this function in order
+ * to ensure the core knows how to handle the HW.
+ *
+ * A zero is returned on success and -EINVAL for failure.
+ */
+int watchdog_init_params(struct watchdog_device *wdd, struct device *dev)
+{
+	int ret = 0;
+
+	ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
+	if (ret < 0)
+		return ret;
+
+	/*
+	 * Max HW timeout needs to be set so that core knows when to
+	 * use a kernel worker to support longer watchdog timeouts
+	 */
+	if (!wdd->hw_max_timeout)
+		return -EINVAL;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(watchdog_init_params);
+
+static void watchdog_worker(struct work_struct *work)
+{
+	struct watchdog_device *wdd = container_of(to_delayed_work(work),
+						struct watchdog_device, work);
+	bool boot_keepalive;
+	bool active_keepalive;
+
+	mutex_lock(&wdd->lock);
+
+	boot_keepalive = !watchdog_active(wdd) &&
+		!watchdog_is_stoppable(wdd);
+
+	active_keepalive = watchdog_active(wdd) &&
+		wdd->hw_max_timeout < wdd->timeout * HZ;
+
+	if (time_after(jiffies, wdd->expires) && watchdog_active(wdd)) {
+		pr_crit("I will reset your machine !\n");
+		mutex_unlock(&wdd->lock);
+		return;
+	}
+
+	if (boot_keepalive || active_keepalive) {
+		if (wdd->ops->ping)
+			wdd->ops->ping(wdd);
+		else
+			wdd->ops->start(wdd);
+
+		schedule_delayed_work(&wdd->work,  wdd->hw_heartbeat);
+	}
+
+	mutex_unlock(&wdd->lock);
+}
+
+static int prepare_watchdog(struct watchdog_device *wdd)
+{
+	int err = 0;
+
+	/* Stop the watchdog now before user space opens the device */
+	if (watchdog_is_stoppable(wdd) &&
+		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
+		err = wdd->ops->stop(wdd);
+
+	} else if (!watchdog_is_stoppable(wdd) &&
+		wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
+		/*
+		 * Can't stop it, use a delayed worker to tick it
+		 * until it's open by user space
+		 */
+		schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
+	}
+	return err;
+}
+
+/**
  * watchdog_register_device() - register a watchdog device
  * @wdd: watchdog device
  *
@@ -156,13 +239,24 @@  int watchdog_register_device(struct watchdog_device *wdd)
 	wdd->dev = device_create(watchdog_class, wdd->parent, devno,
 					NULL, "watchdog%d", wdd->id);
 	if (IS_ERR(wdd->dev)) {
-		watchdog_dev_unregister(wdd);
-		ida_simple_remove(&watchdog_ida, id);
 		ret = PTR_ERR(wdd->dev);
-		return ret;
+		goto dev_create_fail;
+	}
+
+	INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
+
+	if (wdd->hw_max_timeout) {
+		ret = prepare_watchdog(wdd);
+		if (ret)
+			goto dev_create_fail;
 	}
 
 	return 0;
+
+dev_create_fail:
+	watchdog_dev_unregister(wdd);
+	ida_simple_remove(&watchdog_ida, id);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(watchdog_register_device);
 
@@ -181,6 +275,7 @@  void watchdog_unregister_device(struct watchdog_device *wdd)
 	if (wdd == NULL)
 		return;
 
+	cancel_delayed_work_sync(&wdd->work);
 	devno = wdd->cdev.dev;
 	ret = watchdog_dev_unregister(wdd);
 	if (ret)
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 6aaefba..04ac68c 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -49,6 +49,14 @@  static dev_t watchdog_devt;
 /* the watchdog device behind /dev/watchdog */
 static struct watchdog_device *old_wdd;
 
+static int _watchdog_ping(struct watchdog_device *wddev)
+{
+	if (wddev->ops->ping)
+		return wddev->ops->ping(wddev);  /* ping the watchdog */
+	else
+		return wddev->ops->start(wddev); /* restart watchdog */
+}
+
 /*
  *	watchdog_ping: ping the watchdog.
  *	@wddev: the watchdog device to ping
@@ -73,10 +81,13 @@  static int watchdog_ping(struct watchdog_device *wddev)
 	if (!watchdog_active(wddev))
 		goto out_ping;
 
-	if (wddev->ops->ping)
-		err = wddev->ops->ping(wddev);  /* ping the watchdog */
-	else
-		err = wddev->ops->start(wddev); /* restart watchdog */
+	err = _watchdog_ping(wddev);
+
+	if (wddev->hw_max_timeout &&
+		wddev->timeout * HZ > wddev->hw_max_timeout) {
+		wddev->expires = jiffies + wddev->timeout * HZ;
+		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
+	}
 
 out_ping:
 	mutex_unlock(&wddev->lock);
@@ -110,6 +121,13 @@  static int watchdog_start(struct watchdog_device *wddev)
 	if (err == 0)
 		set_bit(WDOG_ACTIVE, &wddev->status);
 
+	if (wddev->hw_max_timeout &&
+		wddev->timeout * HZ > wddev->hw_max_timeout) {
+		wddev->expires = jiffies + wddev->timeout * HZ;
+		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
+	} else
+		cancel_delayed_work(&wddev->work);
+
 out_start:
 	mutex_unlock(&wddev->lock);
 	return err;
@@ -145,9 +163,21 @@  static int watchdog_stop(struct watchdog_device *wddev)
 		goto out_stop;
 	}
 
-	err = wddev->ops->stop(wddev);
-	if (err == 0)
+	if (!wddev->hw_max_timeout || watchdog_is_stoppable(wddev)) {
+		cancel_delayed_work(&wddev->work);
+		err = wddev->ops->stop(wddev);
+		if (err == 0)
+			clear_bit(WDOG_ACTIVE, &wddev->status);
+	} else {
+		/* Unstoppable watchdogs need the worker to keep them alive */
 		clear_bit(WDOG_ACTIVE, &wddev->status);
+		/*
+		 * Ping it once as we don't know how much time there
+		 * is left in the watchdog timer.
+		 */
+		err = _watchdog_ping(wddev);
+		schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
+	}
 
 out_stop:
 	mutex_unlock(&wddev->lock);
@@ -194,7 +224,38 @@  out_status:
 static int watchdog_set_timeout(struct watchdog_device *wddev,
 							unsigned int timeout)
 {
-	int err;
+	int err = 0;
+
+	if (wddev->hw_max_timeout) {
+		int hw_timeout;
+		/*
+		 * We can't support too short timeout values. There is
+		 * really no maximu however, anything longer than HW
+		 * maximum will be supported by the watchdog core on
+		 * behalf of the actual HW.
+		 */
+		if (timeout < wddev->min_timeout)
+			return -EINVAL;
+
+		mutex_lock(&wddev->lock);
+		if (test_bit(WDOG_UNREGISTERED, &wddev->status)) {
+			err = -ENODEV;
+			goto out_timeout;
+		}
+
+		if (timeout * HZ > wddev->hw_max_timeout)
+			schedule_delayed_work(&wddev->work,
+					wddev->hw_heartbeat);
+
+		hw_timeout = min(timeout, wddev->hw_max_timeout / HZ);
+		if (wddev->info->options & WDIOF_SETTIMEOUT)
+			err = wddev->ops->set_timeout(wddev, hw_timeout);
+
+		if (hw_timeout < timeout)
+			wddev->timeout = timeout;
+
+		goto out_timeout;
+	}
 
 	if ((wddev->ops->set_timeout == NULL) ||
 	    !(wddev->info->options & WDIOF_SETTIMEOUT))
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index 395b70e..027c99d 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -12,6 +12,7 @@ 
 #include <linux/bitops.h>
 #include <linux/device.h>
 #include <linux/cdev.h>
+#include <linux/workqueue.h>
 #include <uapi/linux/watchdog.h>
 
 struct watchdog_ops;
@@ -62,9 +63,13 @@  struct watchdog_ops {
  * @timeout:	The watchdog devices timeout value.
  * @min_timeout:The watchdog devices minimum timeout value.
  * @max_timeout:The watchdog devices maximum timeout value.
+ * @hw_max_timeout:The watchdog hardware maximum timeout value.
+ * @hw_heartbeat:Time interval in HW between timer pings.
  * @driver-data:Pointer to the drivers private data.
  * @lock:	Lock for watchdog core internal use only.
+ * @work:	Worker used to provide longer timeouts than hardware supports.
  * @status:	Field that contains the devices internal status bits.
+ * @hw_features:Feature bits describing how the watchdog HW works.
  *
  * The watchdog_device structure contains all information about a
  * watchdog timer device.
@@ -86,8 +91,12 @@  struct watchdog_device {
 	unsigned int timeout;
 	unsigned int min_timeout;
 	unsigned int max_timeout;
+	unsigned int hw_max_timeout; /* in jiffies */
+	unsigned int hw_heartbeat; /* in jiffies */
+	unsigned long int expires; /* for keepalive worker */
 	void *driver_data;
 	struct mutex lock;
+	struct delayed_work work;
 	unsigned long status;
 /* Bit numbers for status flags */
 #define WDOG_ACTIVE		0	/* Is the watchdog running/active */
@@ -95,6 +104,14 @@  struct watchdog_device {
 #define WDOG_ALLOW_RELEASE	2	/* Did we receive the magic char ? */
 #define WDOG_NO_WAY_OUT		3	/* Is 'nowayout' feature set ? */
 #define WDOG_UNREGISTERED	4	/* Has the device been unregistered */
+
+/* Bits describing features supported by the HW */
+	unsigned long hw_features;
+
+/* Can the watchdog be stopped and started */
+#define WDOG_HW_IS_STOPPABLE		BIT(0)
+/* Is the watchdog already running when the driver starts up */
+#define WDOG_HW_RUNNING_AT_BOOT		BIT(1)
 };
 
 #define WATCHDOG_NOWAYOUT		IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
@@ -120,6 +137,11 @@  static inline bool watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
 		(t < wdd->min_timeout || t > wdd->max_timeout));
 }
 
+static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
+{
+	return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
+}
+
 /* Use the following functions to manipulate watchdog driver specific data */
 static inline void watchdog_set_drvdata(struct watchdog_device *wdd, void *data)
 {
@@ -134,6 +156,7 @@  static inline void *watchdog_get_drvdata(struct watchdog_device *wdd)
 /* drivers/watchdog/watchdog_core.c */
 extern int watchdog_init_timeout(struct watchdog_device *wdd,
 				  unsigned int timeout_parm, struct device *dev);
+int watchdog_init_params(struct watchdog_device *wdd, struct device *dev);
 extern int watchdog_register_device(struct watchdog_device *);
 extern void watchdog_unregister_device(struct watchdog_device *);