diff mbox

[2/2] at91sam9_wdt: Allow watchdog to reset device at early boot

Message ID 6c0a3a5bcd93d18437eeed04712b4aeff201a16f.1424262664.git.timo.kokkonen@offcode.fi (mailing list archive)
State New, archived
Headers show

Commit Message

Timo Kokkonen Feb. 18, 2015, 12:57 p.m. UTC
By default the driver will start a kernel timer which keeps on kicking
the watchdog HW until user space has opened the watchdog
device. Usually this is desirable as the watchdog HW is running by
default and the user space may not have any watchdog daemon running at
all.

However, on production systems it may be mandatory that also early
crashes and lockups will lead to a watchdog reset, even if they happen
before the user space has opened the watchdog device.

To resolve the issue, add a new device tree property
"early-timeout-sec" which will let the kernel timer to ping the
watchdog HW only as long as the specified timeout permits. The default
is still to use kernel timer, but more strict behavior can be enabled
via the device tree property.

Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
---
 Documentation/devicetree/bindings/watchdog/watchdog.txt | 7 +++++++
 drivers/watchdog/at91sam9_wdt.c                         | 9 ++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

Comments

Boris BREZILLON Feb. 18, 2015, 1:21 p.m. UTC | #1
Hi Timo,

On Wed, 18 Feb 2015 14:57:22 +0200
Timo Kokkonen <timo.kokkonen@offcode.fi> wrote:

> By default the driver will start a kernel timer which keeps on kicking
> the watchdog HW until user space has opened the watchdog
> device. Usually this is desirable as the watchdog HW is running by
> default and the user space may not have any watchdog daemon running at
> all.
> 
> However, on production systems it may be mandatory that also early
> crashes and lockups will lead to a watchdog reset, even if they happen
> before the user space has opened the watchdog device.
> 
> To resolve the issue, add a new device tree property
> "early-timeout-sec" which will let the kernel timer to ping the
> watchdog HW only as long as the specified timeout permits. The default
> is still to use kernel timer, but more strict behavior can be enabled
> via the device tree property.
> 
> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> ---
>  Documentation/devicetree/bindings/watchdog/watchdog.txt | 7 +++++++
>  drivers/watchdog/at91sam9_wdt.c                         | 9 ++++++++-
>  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/watchdog/watchdog.txt b/Documentation/devicetree/bindings/watchdog/watchdog.txt
> index 7e3686c..32647cf 100644
> --- a/Documentation/devicetree/bindings/watchdog/watchdog.txt
> +++ b/Documentation/devicetree/bindings/watchdog/watchdog.txt
> @@ -4,9 +4,16 @@ using these definitions.
>  
>  Optional properties:
>  - timeout-sec: Contains the watchdog timeout in seconds.
> +- early-timeout-sec: If present, specifies a timeout value in seconds
> +  that the driver keeps on ticking the watchdog HW on behalf of user
> +  space. Once this timeout expires watchdog is left to expire in
> +  timeout-sec seconds. If this propery is set to zero, watchdog is
> +  started (or left running) so that a reset occurs in timeout-sec
> +  since the watchdog was started.
>  
>  Example:
>  
>  watchdog {
>  	 timeout-sec = <60>;
> +	 early-timeout-sec = <120>;
>  };

IMHO the documentation update should be self-contained (it should be
done in a separate patch).

Best Regards,

Boris
Guenter Roeck Feb. 18, 2015, 1:59 p.m. UTC | #2
On 02/18/2015 04:57 AM, Timo Kokkonen wrote:
> By default the driver will start a kernel timer which keeps on kicking
> the watchdog HW until user space has opened the watchdog
> device. Usually this is desirable as the watchdog HW is running by
> default and the user space may not have any watchdog daemon running at
> all.
>
> However, on production systems it may be mandatory that also early
> crashes and lockups will lead to a watchdog reset, even if they happen
> before the user space has opened the watchdog device.
>
> To resolve the issue, add a new device tree property
> "early-timeout-sec" which will let the kernel timer to ping the
> watchdog HW only as long as the specified timeout permits. The default
> is still to use kernel timer, but more strict behavior can be enabled
> via the device tree property.
>
> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> ---
>   Documentation/devicetree/bindings/watchdog/watchdog.txt | 7 +++++++
>   drivers/watchdog/at91sam9_wdt.c                         | 9 ++++++++-
>   2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/devicetree/bindings/watchdog/watchdog.txt b/Documentation/devicetree/bindings/watchdog/watchdog.txt
> index 7e3686c..32647cf 100644
> --- a/Documentation/devicetree/bindings/watchdog/watchdog.txt
> +++ b/Documentation/devicetree/bindings/watchdog/watchdog.txt
> @@ -4,9 +4,16 @@ using these definitions.
>
>   Optional properties:
>   - timeout-sec: Contains the watchdog timeout in seconds.
> +- early-timeout-sec: If present, specifies a timeout value in seconds
> +  that the driver keeps on ticking the watchdog HW on behalf of user
> +  space. Once this timeout expires watchdog is left to expire in
> +  timeout-sec seconds. If this propery is set to zero, watchdog is
> +  started (or left running) so that a reset occurs in timeout-sec
> +  since the watchdog was started.
>
>   Example:
>
>   watchdog {
>   	 timeout-sec = <60>;
> +	 early-timeout-sec = <120>;

That is not a generic property as you defined it; if so,
it would have to be implemented in the watchdog core code,
not in the at91 code. You'll have to document it in the bindings
description for at91sam9_wdt.

Guenter


>   };
> diff --git a/drivers/watchdog/at91sam9_wdt.c b/drivers/watchdog/at91sam9_wdt.c
> index 6df9405..1b40bfa 100644
> --- a/drivers/watchdog/at91sam9_wdt.c
> +++ b/drivers/watchdog/at91sam9_wdt.c
> @@ -89,6 +89,8 @@ struct at91wdt {
>   	u32 mr_mask;
>   	unsigned long heartbeat;	/* WDT heartbeat in jiffies */
>   	bool nowayout;
> +	/* Timeout in jiffies for stopping the early timer */
> +	unsigned long early_timer;
>   	unsigned int irq;
>   };
>
> @@ -122,7 +124,8 @@ static void at91_ping(unsigned long data)
>   {
>   	struct at91wdt *wdt = (struct at91wdt *)data;
>   	if (time_before(jiffies, wdt->next_heartbeat) ||
> -	    !watchdog_active(&wdt->wdd)) {
> +		(time_before(jiffies, wdt->early_timer) &&
> +			!watchdog_active(&wdt->wdd))) {
>   		at91_wdt_reset(wdt);
>   		mod_timer(&wdt->timer, jiffies + wdt->heartbeat);
>   	} else {
> @@ -316,6 +319,10 @@ static int of_at91wdt_init(struct device_node *np, struct at91wdt *wdt)
>
>   	wdt->mr |= max | ((max - min) << 16);
>
> +	if (!of_property_read_u32_index(np, "early-timeout-sec", 0,
> +					(u32 *)&wdt->early_timer))
> +		wdt->early_timer = wdt->early_timer * HZ + jiffies;
> +
>   	return 0;
>   }
>   #else
>
Boris BREZILLON Feb. 18, 2015, 2:17 p.m. UTC | #3
Hi Guenter,

On Wed, 18 Feb 2015 05:59:01 -0800
Guenter Roeck <linux@roeck-us.net> wrote:

> On 02/18/2015 04:57 AM, Timo Kokkonen wrote:
> > By default the driver will start a kernel timer which keeps on kicking
> > the watchdog HW until user space has opened the watchdog
> > device. Usually this is desirable as the watchdog HW is running by
> > default and the user space may not have any watchdog daemon running at
> > all.
> >
> > However, on production systems it may be mandatory that also early
> > crashes and lockups will lead to a watchdog reset, even if they happen
> > before the user space has opened the watchdog device.
> >
> > To resolve the issue, add a new device tree property
> > "early-timeout-sec" which will let the kernel timer to ping the
> > watchdog HW only as long as the specified timeout permits. The default
> > is still to use kernel timer, but more strict behavior can be enabled
> > via the device tree property.
> >
> > Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> > ---
> >   Documentation/devicetree/bindings/watchdog/watchdog.txt | 7 +++++++
> >   drivers/watchdog/at91sam9_wdt.c                         | 9 ++++++++-
> >   2 files changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/devicetree/bindings/watchdog/watchdog.txt b/Documentation/devicetree/bindings/watchdog/watchdog.txt
> > index 7e3686c..32647cf 100644
> > --- a/Documentation/devicetree/bindings/watchdog/watchdog.txt
> > +++ b/Documentation/devicetree/bindings/watchdog/watchdog.txt
> > @@ -4,9 +4,16 @@ using these definitions.
> >
> >   Optional properties:
> >   - timeout-sec: Contains the watchdog timeout in seconds.
> > +- early-timeout-sec: If present, specifies a timeout value in seconds
> > +  that the driver keeps on ticking the watchdog HW on behalf of user
> > +  space. Once this timeout expires watchdog is left to expire in
> > +  timeout-sec seconds. If this propery is set to zero, watchdog is
> > +  started (or left running) so that a reset occurs in timeout-sec
> > +  since the watchdog was started.
> >
> >   Example:
> >
> >   watchdog {
> >   	 timeout-sec = <60>;
> > +	 early-timeout-sec = <120>;
> 
> That is not a generic property as you defined it; if so,
> it would have to be implemented in the watchdog core code,
> not in the at91 code. You'll have to document it in the bindings
> description for at91sam9_wdt.

Then, if this is a controller specific property, it should be defined
with the 'atmel,' prefix.
We're kind of looping here: the initial discussion was "is there a need
for this property to be a generic one ?", and now you're saying no,
while you previously left the door opened.

Tomi is proposing a generic approach, as you asked him to. I agree that
parsing the property in core code and making its value part of the
generic watchdog struct makes sense (that's what I proposed to Tomi a
few weeks ago).

Anyway, I'm fine with both approaches (generic and controller specific),
so I'll let you decide in the end.

Best Regards,

Boris
Guenter Roeck Feb. 18, 2015, 2:50 p.m. UTC | #4
On 02/18/2015 06:17 AM, Boris Brezillon wrote:
> Hi Guenter,
>
> On Wed, 18 Feb 2015 05:59:01 -0800
> Guenter Roeck <linux@roeck-us.net> wrote:
>
>> On 02/18/2015 04:57 AM, Timo Kokkonen wrote:
>>> By default the driver will start a kernel timer which keeps on kicking
>>> the watchdog HW until user space has opened the watchdog
>>> device. Usually this is desirable as the watchdog HW is running by
>>> default and the user space may not have any watchdog daemon running at
>>> all.
>>>
>>> However, on production systems it may be mandatory that also early
>>> crashes and lockups will lead to a watchdog reset, even if they happen
>>> before the user space has opened the watchdog device.
>>>
>>> To resolve the issue, add a new device tree property
>>> "early-timeout-sec" which will let the kernel timer to ping the
>>> watchdog HW only as long as the specified timeout permits. The default
>>> is still to use kernel timer, but more strict behavior can be enabled
>>> via the device tree property.
>>>
>>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
>>> ---
>>>    Documentation/devicetree/bindings/watchdog/watchdog.txt | 7 +++++++
>>>    drivers/watchdog/at91sam9_wdt.c                         | 9 ++++++++-
>>>    2 files changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/devicetree/bindings/watchdog/watchdog.txt b/Documentation/devicetree/bindings/watchdog/watchdog.txt
>>> index 7e3686c..32647cf 100644
>>> --- a/Documentation/devicetree/bindings/watchdog/watchdog.txt
>>> +++ b/Documentation/devicetree/bindings/watchdog/watchdog.txt
>>> @@ -4,9 +4,16 @@ using these definitions.
>>>
>>>    Optional properties:
>>>    - timeout-sec: Contains the watchdog timeout in seconds.
>>> +- early-timeout-sec: If present, specifies a timeout value in seconds
>>> +  that the driver keeps on ticking the watchdog HW on behalf of user
>>> +  space. Once this timeout expires watchdog is left to expire in
>>> +  timeout-sec seconds. If this propery is set to zero, watchdog is
>>> +  started (or left running) so that a reset occurs in timeout-sec
>>> +  since the watchdog was started.
>>>
>>>    Example:
>>>
>>>    watchdog {
>>>    	 timeout-sec = <60>;
>>> +	 early-timeout-sec = <120>;
>>
>> That is not a generic property as you defined it; if so,
>> it would have to be implemented in the watchdog core code,
>> not in the at91 code. You'll have to document it in the bindings
>> description for at91sam9_wdt.
>
> Then, if this is a controller specific property, it should be defined
> with the 'atmel,' prefix.
> We're kind of looping here: the initial discussion was "is there a need
> for this property to be a generic one ?", and now you're saying no,
> while you previously left the door opened.
>
> Tomi is proposing a generic approach, as you asked him to. I agree that
> parsing the property in core code and making its value part of the
> generic watchdog struct makes sense (that's what I proposed to Tomi a
> few weeks ago).
>
Hmm ... the problem here is that the property description creates the
assumption or expectation that the property is used if defined,
which is not the case.

I am not sure how to best resolve this. Maybe a comment in the property
description stating that implementation of is device (driver) dependent ?
After all, that is true for the timeout-sec property as well.

Thanks,
Guenter
Alexandre Belloni Feb. 18, 2015, 4 p.m. UTC | #5
Hi,

On 18/02/2015 at 06:50:44 -0800, Guenter Roeck wrote :
> >>>   Optional properties:
> >>>   - timeout-sec: Contains the watchdog timeout in seconds.
> >>>+- early-timeout-sec: If present, specifies a timeout value in seconds
> >>>+  that the driver keeps on ticking the watchdog HW on behalf of user
> >>>+  space. Once this timeout expires watchdog is left to expire in
> >>>+  timeout-sec seconds. If this propery is set to zero, watchdog is
> >>>+  started (or left running) so that a reset occurs in timeout-sec
> >>>+  since the watchdog was started.
> >>>
> >>>   Example:
> >>>
> >>>   watchdog {
> >>>   	 timeout-sec = <60>;
> >>>+	 early-timeout-sec = <120>;
> >>
> >>That is not a generic property as you defined it; if so,
> >>it would have to be implemented in the watchdog core code,
> >>not in the at91 code. You'll have to document it in the bindings
> >>description for at91sam9_wdt.
> >
> >Then, if this is a controller specific property, it should be defined
> >with the 'atmel,' prefix.
> >We're kind of looping here: the initial discussion was "is there a need
> >for this property to be a generic one ?", and now you're saying no,
> >while you previously left the door opened.
> >
> >Tomi is proposing a generic approach, as you asked him to. I agree that
> >parsing the property in core code and making its value part of the
> >generic watchdog struct makes sense (that's what I proposed to Tomi a
> >few weeks ago).
> >
> Hmm ... the problem here is that the property description creates the
> assumption or expectation that the property is used if defined,
> which is not the case.
> 
> I am not sure how to best resolve this. Maybe a comment in the property
> description stating that implementation of is device (driver) dependent ?
> After all, that is true for the timeout-sec property as well.
> 

I would leave it in the generic file and state that it may not be
implemented in the driver. That way, the property is documented for new
driver writers.
Guenter Roeck Feb. 18, 2015, 5:50 p.m. UTC | #6
On Wed, Feb 18, 2015 at 05:00:33PM +0100, Alexandre Belloni wrote:
> Hi,
> 
> On 18/02/2015 at 06:50:44 -0800, Guenter Roeck wrote :
> > >>>   Optional properties:
> > >>>   - timeout-sec: Contains the watchdog timeout in seconds.
> > >>>+- early-timeout-sec: If present, specifies a timeout value in seconds
> > >>>+  that the driver keeps on ticking the watchdog HW on behalf of user
> > >>>+  space. Once this timeout expires watchdog is left to expire in
> > >>>+  timeout-sec seconds. If this propery is set to zero, watchdog is
> > >>>+  started (or left running) so that a reset occurs in timeout-sec
> > >>>+  since the watchdog was started.
> > >>>
> > >>>   Example:
> > >>>
> > >>>   watchdog {
> > >>>   	 timeout-sec = <60>;
> > >>>+	 early-timeout-sec = <120>;
> > >>
> > >>That is not a generic property as you defined it; if so,
> > >>it would have to be implemented in the watchdog core code,
> > >>not in the at91 code. You'll have to document it in the bindings
> > >>description for at91sam9_wdt.
> > >
> > >Then, if this is a controller specific property, it should be defined
> > >with the 'atmel,' prefix.
> > >We're kind of looping here: the initial discussion was "is there a need
> > >for this property to be a generic one ?", and now you're saying no,
> > >while you previously left the door opened.
> > >
> > >Tomi is proposing a generic approach, as you asked him to. I agree that
> > >parsing the property in core code and making its value part of the
> > >generic watchdog struct makes sense (that's what I proposed to Tomi a
> > >few weeks ago).
> > >
> > Hmm ... the problem here is that the property description creates the
> > assumption or expectation that the property is used if defined,
> > which is not the case.
> > 
> > I am not sure how to best resolve this. Maybe a comment in the property
> > description stating that implementation of is device (driver) dependent ?
> > After all, that is true for the timeout-sec property as well.
> > 
> 
> I would leave it in the generic file and state that it may not be
> implemented in the driver. That way, the property is documented for new
> driver writers.
> 
Yes, that would be fine ok me.

Thanks,
Guenter
Boris BREZILLON Feb. 18, 2015, 8:21 p.m. UTC | #7
On Wed, 18 Feb 2015 09:50:02 -0800
Guenter Roeck <linux@roeck-us.net> wrote:

> On Wed, Feb 18, 2015 at 05:00:33PM +0100, Alexandre Belloni wrote:
> > Hi,
> > 
> > On 18/02/2015 at 06:50:44 -0800, Guenter Roeck wrote :
> > > >>>   Optional properties:
> > > >>>   - timeout-sec: Contains the watchdog timeout in seconds.
> > > >>>+- early-timeout-sec: If present, specifies a timeout value in seconds
> > > >>>+  that the driver keeps on ticking the watchdog HW on behalf of user
> > > >>>+  space. Once this timeout expires watchdog is left to expire in
> > > >>>+  timeout-sec seconds. If this propery is set to zero, watchdog is
> > > >>>+  started (or left running) so that a reset occurs in timeout-sec
> > > >>>+  since the watchdog was started.
> > > >>>
> > > >>>   Example:
> > > >>>
> > > >>>   watchdog {
> > > >>>   	 timeout-sec = <60>;
> > > >>>+	 early-timeout-sec = <120>;
> > > >>
> > > >>That is not a generic property as you defined it; if so,
> > > >>it would have to be implemented in the watchdog core code,
> > > >>not in the at91 code. You'll have to document it in the bindings
> > > >>description for at91sam9_wdt.
> > > >
> > > >Then, if this is a controller specific property, it should be defined
> > > >with the 'atmel,' prefix.
> > > >We're kind of looping here: the initial discussion was "is there a need
> > > >for this property to be a generic one ?", and now you're saying no,
> > > >while you previously left the door opened.
> > > >
> > > >Tomi is proposing a generic approach, as you asked him to. I agree that
> > > >parsing the property in core code and making its value part of the
> > > >generic watchdog struct makes sense (that's what I proposed to Tomi a
> > > >few weeks ago).
> > > >
> > > Hmm ... the problem here is that the property description creates the
> > > assumption or expectation that the property is used if defined,
> > > which is not the case.
> > > 
> > > I am not sure how to best resolve this. Maybe a comment in the property
> > > description stating that implementation of is device (driver) dependent ?
> > > After all, that is true for the timeout-sec property as well.
> > > 
> > 
> > I would leave it in the generic file and state that it may not be
> > implemented in the driver. That way, the property is documented for new
> > driver writers.
> > 
> Yes, that would be fine ok me.

Great!
Timo can you change the documentation accordingly ?
Rob Herring Feb. 18, 2015, 9:11 p.m. UTC | #8
On Wed, Feb 18, 2015 at 10:00 AM, Alexandre Belloni
<alexandre.belloni@free-electrons.com> wrote:
> Hi,
>
> On 18/02/2015 at 06:50:44 -0800, Guenter Roeck wrote :
>> >>>   Optional properties:
>> >>>   - timeout-sec: Contains the watchdog timeout in seconds.
>> >>>+- early-timeout-sec: If present, specifies a timeout value in seconds
>> >>>+  that the driver keeps on ticking the watchdog HW on behalf of user
>> >>>+  space. Once this timeout expires watchdog is left to expire in
>> >>>+  timeout-sec seconds. If this propery is set to zero, watchdog is
>> >>>+  started (or left running) so that a reset occurs in timeout-sec
>> >>>+  since the watchdog was started.
>> >>>
>> >>>   Example:
>> >>>
>> >>>   watchdog {
>> >>>            timeout-sec = <60>;
>> >>>+   early-timeout-sec = <120>;
>> >>
>> >>That is not a generic property as you defined it; if so,
>> >>it would have to be implemented in the watchdog core code,
>> >>not in the at91 code. You'll have to document it in the bindings
>> >>description for at91sam9_wdt.
>> >
>> >Then, if this is a controller specific property, it should be defined
>> >with the 'atmel,' prefix.
>> >We're kind of looping here: the initial discussion was "is there a need
>> >for this property to be a generic one ?", and now you're saying no,
>> >while you previously left the door opened.
>> >
>> >Tomi is proposing a generic approach, as you asked him to. I agree that
>> >parsing the property in core code and making its value part of the
>> >generic watchdog struct makes sense (that's what I proposed to Tomi a
>> >few weeks ago).
>> >
>> Hmm ... the problem here is that the property description creates the
>> assumption or expectation that the property is used if defined,
>> which is not the case.
>>
>> I am not sure how to best resolve this. Maybe a comment in the property
>> description stating that implementation of is device (driver) dependent ?
>> After all, that is true for the timeout-sec property as well.
>>
>
> I would leave it in the generic file and state that it may not be
> implemented in the driver. That way, the property is documented for new
> driver writers.

That is pretty much true of any optional property. Whether implemented
in the driver or core is an implementation detail that does not belong
in the binding.

I find this property pretty questionable. Certainly having the kernel
service a watchdog either enabled at reset, in the bootloader, or by
the kernel is a useful feature. A timeout for "how long userspace
watchdog daemon takes to start" does not belong in DT. timeout-sec
should be the default/initial timeout and long enough for userspace to
start. Userspace can then change it to a more suitable value.

Rob
Timo Kokkonen Feb. 19, 2015, 6:02 a.m. UTC | #9
Hi,

On 18.02.2015 22:21, Boris Brezillon wrote:
> On Wed, 18 Feb 2015 09:50:02 -0800
> Guenter Roeck <linux@roeck-us.net> wrote:
>
>> On Wed, Feb 18, 2015 at 05:00:33PM +0100, Alexandre Belloni wrote:
>>> Hi,
>>>
>>> On 18/02/2015 at 06:50:44 -0800, Guenter Roeck wrote :
>>>>>>>    Optional properties:
>>>>>>>    - timeout-sec: Contains the watchdog timeout in seconds.
>>>>>>> +- early-timeout-sec: If present, specifies a timeout value in seconds
>>>>>>> +  that the driver keeps on ticking the watchdog HW on behalf of user
>>>>>>> +  space. Once this timeout expires watchdog is left to expire in
>>>>>>> +  timeout-sec seconds. If this propery is set to zero, watchdog is
>>>>>>> +  started (or left running) so that a reset occurs in timeout-sec
>>>>>>> +  since the watchdog was started.
>>>>>>>
>>>>>>>    Example:
>>>>>>>
>>>>>>>    watchdog {
>>>>>>>    	 timeout-sec = <60>;
>>>>>>> +	 early-timeout-sec = <120>;
>>>>>>
>>>>>> That is not a generic property as you defined it; if so,
>>>>>> it would have to be implemented in the watchdog core code,
>>>>>> not in the at91 code. You'll have to document it in the bindings
>>>>>> description for at91sam9_wdt.
>>>>>
>>>>> Then, if this is a controller specific property, it should be defined
>>>>> with the 'atmel,' prefix.
>>>>> We're kind of looping here: the initial discussion was "is there a need
>>>>> for this property to be a generic one ?", and now you're saying no,
>>>>> while you previously left the door opened.
>>>>>
>>>>> Tomi is proposing a generic approach, as you asked him to. I agree that
>>>>> parsing the property in core code and making its value part of the
>>>>> generic watchdog struct makes sense (that's what I proposed to Tomi a
>>>>> few weeks ago).
>>>>>
>>>> Hmm ... the problem here is that the property description creates the
>>>> assumption or expectation that the property is used if defined,
>>>> which is not the case.
>>>>
>>>> I am not sure how to best resolve this. Maybe a comment in the property
>>>> description stating that implementation of is device (driver) dependent ?
>>>> After all, that is true for the timeout-sec property as well.
>>>>
>>>
>>> I would leave it in the generic file and state that it may not be
>>> implemented in the driver. That way, the property is documented for new
>>> driver writers.
>>>
>> Yes, that would be fine ok me.
>
> Great!
> Timo can you change the documentation accordingly ?

Yes, sure. Will send v4 soon.

-Timo
Timo Kokkonen Feb. 19, 2015, 6:14 a.m. UTC | #10
Hi,

On 18.02.2015 23:11, Rob Herring wrote:
> On Wed, Feb 18, 2015 at 10:00 AM, Alexandre Belloni
> <alexandre.belloni@free-electrons.com> wrote:
>> Hi,
>>
>> On 18/02/2015 at 06:50:44 -0800, Guenter Roeck wrote :
>>>>>>    Optional properties:
>>>>>>    - timeout-sec: Contains the watchdog timeout in seconds.
>>>>>> +- early-timeout-sec: If present, specifies a timeout value in seconds
>>>>>> +  that the driver keeps on ticking the watchdog HW on behalf of user
>>>>>> +  space. Once this timeout expires watchdog is left to expire in
>>>>>> +  timeout-sec seconds. If this propery is set to zero, watchdog is
>>>>>> +  started (or left running) so that a reset occurs in timeout-sec
>>>>>> +  since the watchdog was started.
>>>>>>
>>>>>>    Example:
>>>>>>
>>>>>>    watchdog {
>>>>>>             timeout-sec = <60>;
>>>>>> +   early-timeout-sec = <120>;
>>>>>
>>>>> That is not a generic property as you defined it; if so,
>>>>> it would have to be implemented in the watchdog core code,
>>>>> not in the at91 code. You'll have to document it in the bindings
>>>>> description for at91sam9_wdt.
>>>>
>>>> Then, if this is a controller specific property, it should be defined
>>>> with the 'atmel,' prefix.
>>>> We're kind of looping here: the initial discussion was "is there a need
>>>> for this property to be a generic one ?", and now you're saying no,
>>>> while you previously left the door opened.
>>>>
>>>> Tomi is proposing a generic approach, as you asked him to. I agree that
>>>> parsing the property in core code and making its value part of the
>>>> generic watchdog struct makes sense (that's what I proposed to Tomi a
>>>> few weeks ago).
>>>>
>>> Hmm ... the problem here is that the property description creates the
>>> assumption or expectation that the property is used if defined,
>>> which is not the case.
>>>
>>> I am not sure how to best resolve this. Maybe a comment in the property
>>> description stating that implementation of is device (driver) dependent ?
>>> After all, that is true for the timeout-sec property as well.
>>>
>>
>> I would leave it in the generic file and state that it may not be
>> implemented in the driver. That way, the property is documented for new
>> driver writers.
>
> That is pretty much true of any optional property. Whether implemented
> in the driver or core is an implementation detail that does not belong
> in the binding.
>
> I find this property pretty questionable. Certainly having the kernel
> service a watchdog either enabled at reset, in the bootloader, or by
> the kernel is a useful feature. A timeout for "how long userspace
> watchdog daemon takes to start" does not belong in DT. timeout-sec
> should be the default/initial timeout and long enough for userspace to
> start. Userspace can then change it to a more suitable value.

That would be a good workaround if we had enough time in the watchdog HW 
to wait long enough for the user space to start up. For example in atmel 
HW the maximum is 16 seconds, which may not be enough for the kernel to 
boot up and the user space to start the watchdog daemon.

But even that is not enough as all of the watchdog drivers attempt to 
*disable* the watchdog device before user space opens it. What good is a 
watchdog if it is disabled by the kernel and we got stuck before the 
daemon wakes up and re-enables it? This the problem with all of the 
watchdog drivers right now. There are plenty of products out there that 
can't deal with this kind of limitation. They are all hacking around it 
one way or another. If there is a crash, the watchdog must reset the 
device. I can't think of any other run time way to configure the 
watchdog for this kind of situation than having a device tree property 
for it.

What I am proposing here is a way to solve this without hacking. I was 
told to think also a way to defer starting the watchdog for a given 
timeout so that user space would have more time to wake up, which 
sounded like a good idea. And this obviously needs to be implemented so 
that the watchdog is guaranteed to reset the device should there be a 
crash of any kind that prevents the watchdog daemon from starting up. 
There are a lot of details that need to be taken care of properly and 
therefore watchdog core can't do much about it, which is why I thought 
there is no much point trying to do it in watchdog core.

But never the less I can try to state this in the documentation just to 
make clear what we are trying to solve here.

Thanks for all the good comments!

-Timo
Jean-Christophe PLAGNIOL-VILLARD Feb. 20, 2015, 7:48 a.m. UTC | #11
> On Feb 18, 2015, at 8:57 PM, Timo Kokkonen <timo.kokkonen@offcode.fi> wrote:
> 
> By default the driver will start a kernel timer which keeps on kicking
> the watchdog HW until user space has opened the watchdog
> device. Usually this is desirable as the watchdog HW is running by
> default and the user space may not have any watchdog daemon running at
> all.
> 
> However, on production systems it may be mandatory that also early
> crashes and lockups will lead to a watchdog reset, even if they happen
> before the user space has opened the watchdog device.
> 
> To resolve the issue, add a new device tree property
> "early-timeout-sec" which will let the kernel timer to ping the
> watchdog HW only as long as the specified timeout permits. The default
> is still to use kernel timer, but more strict behavior can be enabled
> via the device tree property.
> 
> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> ---
> Documentation/devicetree/bindings/watchdog/watchdog.txt | 7 +++++++
> drivers/watchdog/at91sam9_wdt.c                         | 9 ++++++++-

This should not be handled by the driver but the kernel in a generic way

Best Regards,
J.
> 2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/watchdog/watchdog.txt b/Documentation/devicetree/bindings/watchdog/watchdog.txt
> index 7e3686c..32647cf 100644
> --- a/Documentation/devicetree/bindings/watchdog/watchdog.txt
> +++ b/Documentation/devicetree/bindings/watchdog/watchdog.txt
> @@ -4,9 +4,16 @@ using these definitions.
> 
> Optional properties:
> - timeout-sec: Contains the watchdog timeout in seconds.
> +- early-timeout-sec: If present, specifies a timeout value in seconds
> +  that the driver keeps on ticking the watchdog HW on behalf of user
> +  space. Once this timeout expires watchdog is left to expire in
> +  timeout-sec seconds. If this propery is set to zero, watchdog is
> +  started (or left running) so that a reset occurs in timeout-sec
> +  since the watchdog was started.
> 
> Example:
> 
> watchdog {
> 	 timeout-sec = <60>;
> +	 early-timeout-sec = <120>;
> };
> diff --git a/drivers/watchdog/at91sam9_wdt.c b/drivers/watchdog/at91sam9_wdt.c
> index 6df9405..1b40bfa 100644
> --- a/drivers/watchdog/at91sam9_wdt.c
> +++ b/drivers/watchdog/at91sam9_wdt.c
> @@ -89,6 +89,8 @@ struct at91wdt {
> 	u32 mr_mask;
> 	unsigned long heartbeat;	/* WDT heartbeat in jiffies */
> 	bool nowayout;
> +	/* Timeout in jiffies for stopping the early timer */
> +	unsigned long early_timer;
> 	unsigned int irq;
> };
> 
> @@ -122,7 +124,8 @@ static void at91_ping(unsigned long data)
> {
> 	struct at91wdt *wdt = (struct at91wdt *)data;
> 	if (time_before(jiffies, wdt->next_heartbeat) ||
> -	    !watchdog_active(&wdt->wdd)) {
> +		(time_before(jiffies, wdt->early_timer) &&
> +			!watchdog_active(&wdt->wdd))) {
> 		at91_wdt_reset(wdt);
> 		mod_timer(&wdt->timer, jiffies + wdt->heartbeat);
> 	} else {
> @@ -316,6 +319,10 @@ static int of_at91wdt_init(struct device_node *np, struct at91wdt *wdt)
> 
> 	wdt->mr |= max | ((max - min) << 16);
> 
> +	if (!of_property_read_u32_index(np, "early-timeout-sec", 0,
> +					(u32 *)&wdt->early_timer))
> +		wdt->early_timer = wdt->early_timer * HZ + jiffies;
> +
> 	return 0;
> }
> #else
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Boris BREZILLON Feb. 20, 2015, 7:51 a.m. UTC | #12
Hi Jean-Christophe,

On Fri, 20 Feb 2015 15:48:22 +0800
Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> wrote:

> 
> > On Feb 18, 2015, at 8:57 PM, Timo Kokkonen <timo.kokkonen@offcode.fi> wrote:
> > 
> > By default the driver will start a kernel timer which keeps on kicking
> > the watchdog HW until user space has opened the watchdog
> > device. Usually this is desirable as the watchdog HW is running by
> > default and the user space may not have any watchdog daemon running at
> > all.
> > 
> > However, on production systems it may be mandatory that also early
> > crashes and lockups will lead to a watchdog reset, even if they happen
> > before the user space has opened the watchdog device.
> > 
> > To resolve the issue, add a new device tree property
> > "early-timeout-sec" which will let the kernel timer to ping the
> > watchdog HW only as long as the specified timeout permits. The default
> > is still to use kernel timer, but more strict behavior can be enabled
> > via the device tree property.
> > 
> > Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> > ---
> > Documentation/devicetree/bindings/watchdog/watchdog.txt | 7 +++++++
> > drivers/watchdog/at91sam9_wdt.c                         | 9 ++++++++-
> 
> This should not be handled by the driver but the kernel in a generic way

Could you detail a bit more what you have in mind ?

Best Regards,

Boris
Timo Kokkonen Feb. 20, 2015, 8 a.m. UTC | #13
Hi,

On 20.02.2015 09:48, Jean-Christophe PLAGNIOL-VILLARD wrote:
>
>> On Feb 18, 2015, at 8:57 PM, Timo Kokkonen <timo.kokkonen@offcode.fi> wrote:
>>
>> By default the driver will start a kernel timer which keeps on kicking
>> the watchdog HW until user space has opened the watchdog
>> device. Usually this is desirable as the watchdog HW is running by
>> default and the user space may not have any watchdog daemon running at
>> all.
>>
>> However, on production systems it may be mandatory that also early
>> crashes and lockups will lead to a watchdog reset, even if they happen
>> before the user space has opened the watchdog device.
>>
>> To resolve the issue, add a new device tree property
>> "early-timeout-sec" which will let the kernel timer to ping the
>> watchdog HW only as long as the specified timeout permits. The default
>> is still to use kernel timer, but more strict behavior can be enabled
>> via the device tree property.
>>
>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
>> ---
>> Documentation/devicetree/bindings/watchdog/watchdog.txt | 7 +++++++
>> drivers/watchdog/at91sam9_wdt.c                         | 9 ++++++++-
>
> This should not be handled by the driver but the kernel in a generic way
>

Any idea how to do that? The generic watchdog code doesn't really know 
anything about how the actual HW works. Eg. it can't know whether the 
watchdog is already running or not, does it need to be started or should 
we just change the expiration timeout or what. The best the core can do 
about this is to parse the timeout value off the device tree and give 
that to the driver. And even that requires that the driver asks the core 
to do that, because the driver needs to know how to configure the HW 
properly before exiting probe.

This is why I thought it is quite pointless trying to do anything about 
it in the watchdog core. We could add more generic of parser in the core 
and change the core API and all drivers, but I don't really see what we 
would accomplish with that. Each driver that wants to support this needs 
to change, because all drivers currently try to stop watchdog on their 
probe function. This is not right if we are about to catch a crash that 
might happen in the kernel right after the watchdog HW has been stopped.

Of course, I might be missing something, please elaborate if you had 
some plan in your mind.

Thanks,
-Timo


>> 2 files changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/watchdog/watchdog.txt b/Documentation/devicetree/bindings/watchdog/watchdog.txt
>> index 7e3686c..32647cf 100644
>> --- a/Documentation/devicetree/bindings/watchdog/watchdog.txt
>> +++ b/Documentation/devicetree/bindings/watchdog/watchdog.txt
>> @@ -4,9 +4,16 @@ using these definitions.
>>
>> Optional properties:
>> - timeout-sec: Contains the watchdog timeout in seconds.
>> +- early-timeout-sec: If present, specifies a timeout value in seconds
>> +  that the driver keeps on ticking the watchdog HW on behalf of user
>> +  space. Once this timeout expires watchdog is left to expire in
>> +  timeout-sec seconds. If this propery is set to zero, watchdog is
>> +  started (or left running) so that a reset occurs in timeout-sec
>> +  since the watchdog was started.
>>
>> Example:
>>
>> watchdog {
>> 	 timeout-sec = <60>;
>> +	 early-timeout-sec = <120>;
>> };
>> diff --git a/drivers/watchdog/at91sam9_wdt.c b/drivers/watchdog/at91sam9_wdt.c
>> index 6df9405..1b40bfa 100644
>> --- a/drivers/watchdog/at91sam9_wdt.c
>> +++ b/drivers/watchdog/at91sam9_wdt.c
>> @@ -89,6 +89,8 @@ struct at91wdt {
>> 	u32 mr_mask;
>> 	unsigned long heartbeat;	/* WDT heartbeat in jiffies */
>> 	bool nowayout;
>> +	/* Timeout in jiffies for stopping the early timer */
>> +	unsigned long early_timer;
>> 	unsigned int irq;
>> };
>>
>> @@ -122,7 +124,8 @@ static void at91_ping(unsigned long data)
>> {
>> 	struct at91wdt *wdt = (struct at91wdt *)data;
>> 	if (time_before(jiffies, wdt->next_heartbeat) ||
>> -	    !watchdog_active(&wdt->wdd)) {
>> +		(time_before(jiffies, wdt->early_timer) &&
>> +			!watchdog_active(&wdt->wdd))) {
>> 		at91_wdt_reset(wdt);
>> 		mod_timer(&wdt->timer, jiffies + wdt->heartbeat);
>> 	} else {
>> @@ -316,6 +319,10 @@ static int of_at91wdt_init(struct device_node *np, struct at91wdt *wdt)
>>
>> 	wdt->mr |= max | ((max - min) << 16);
>>
>> +	if (!of_property_read_u32_index(np, "early-timeout-sec", 0,
>> +					(u32 *)&wdt->early_timer))
>> +		wdt->early_timer = wdt->early_timer * HZ + jiffies;
>> +
>> 	return 0;
>> }
>> #else
>> --
>> 2.1.0
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Rob Herring Feb. 20, 2015, 2:06 p.m. UTC | #14
On Thu, Feb 19, 2015 at 12:14 AM, Timo Kokkonen
<timo.kokkonen@offcode.fi> wrote:
> Hi,
>
>
> On 18.02.2015 23:11, Rob Herring wrote:
>>
>> On Wed, Feb 18, 2015 at 10:00 AM, Alexandre Belloni
>> <alexandre.belloni@free-electrons.com> wrote:
>>>
>>> Hi,
>>>
>>> On 18/02/2015 at 06:50:44 -0800, Guenter Roeck wrote :
>>>>>>>
>>>>>>>    Optional properties:
>>>>>>>    - timeout-sec: Contains the watchdog timeout in seconds.
>>>>>>> +- early-timeout-sec: If present, specifies a timeout value in
>>>>>>> seconds
>>>>>>> +  that the driver keeps on ticking the watchdog HW on behalf of user
>>>>>>> +  space. Once this timeout expires watchdog is left to expire in
>>>>>>> +  timeout-sec seconds. If this propery is set to zero, watchdog is
>>>>>>> +  started (or left running) so that a reset occurs in timeout-sec
>>>>>>> +  since the watchdog was started.
>>>>>>>
>>>>>>>    Example:
>>>>>>>
>>>>>>>    watchdog {
>>>>>>>             timeout-sec = <60>;
>>>>>>> +   early-timeout-sec = <120>;
>>>>>>
>>>>>>
>>>>>> That is not a generic property as you defined it; if so,
>>>>>> it would have to be implemented in the watchdog core code,
>>>>>> not in the at91 code. You'll have to document it in the bindings
>>>>>> description for at91sam9_wdt.
>>>>>
>>>>>
>>>>> Then, if this is a controller specific property, it should be defined
>>>>> with the 'atmel,' prefix.
>>>>> We're kind of looping here: the initial discussion was "is there a need
>>>>> for this property to be a generic one ?", and now you're saying no,
>>>>> while you previously left the door opened.
>>>>>
>>>>> Tomi is proposing a generic approach, as you asked him to. I agree that
>>>>> parsing the property in core code and making its value part of the
>>>>> generic watchdog struct makes sense (that's what I proposed to Tomi a
>>>>> few weeks ago).
>>>>>
>>>> Hmm ... the problem here is that the property description creates the
>>>> assumption or expectation that the property is used if defined,
>>>> which is not the case.
>>>>
>>>> I am not sure how to best resolve this. Maybe a comment in the property
>>>> description stating that implementation of is device (driver) dependent
>>>> ?
>>>> After all, that is true for the timeout-sec property as well.
>>>>
>>>
>>> I would leave it in the generic file and state that it may not be
>>> implemented in the driver. That way, the property is documented for new
>>> driver writers.
>>
>>
>> That is pretty much true of any optional property. Whether implemented
>> in the driver or core is an implementation detail that does not belong
>> in the binding.
>>
>> I find this property pretty questionable. Certainly having the kernel
>> service a watchdog either enabled at reset, in the bootloader, or by
>> the kernel is a useful feature. A timeout for "how long userspace
>> watchdog daemon takes to start" does not belong in DT. timeout-sec
>> should be the default/initial timeout and long enough for userspace to
>> start. Userspace can then change it to a more suitable value.
>
>
> That would be a good workaround if we had enough time in the watchdog HW to
> wait long enough for the user space to start up. For example in atmel HW the
> maximum is 16 seconds, which may not be enough for the kernel to boot up and
> the user space to start the watchdog daemon.

Well, the 16 sec maximum may be something useful to put into the DT as
that actually is a property of the h/w.

> But even that is not enough as all of the watchdog drivers attempt to
> *disable* the watchdog device before user space opens it. What good is a
> watchdog if it is disabled by the kernel and we got stuck before the daemon
> wakes up and re-enables it? This the problem with all of the watchdog
> drivers right now. There are plenty of products out there that can't deal
> with this kind of limitation. They are all hacking around it one way or
> another. If there is a crash, the watchdog must reset the device. I can't
> think of any other run time way to configure the watchdog for this kind of
> situation than having a device tree property for it.

I fully agree the current design is broken. We should fix that in a
generic way.

> What I am proposing here is a way to solve this without hacking. I was told
> to think also a way to defer starting the watchdog for a given timeout so
> that user space would have more time to wake up, which sounded like a good
> idea. And this obviously needs to be implemented so that the watchdog is
> guaranteed to reset the device should there be a crash of any kind that
> prevents the watchdog daemon from starting up. There are a lot of details
> that need to be taken care of properly and therefore watchdog core can't do
> much about it, which is why I thought there is no much point trying to do it
> in watchdog core.

Deferring would assume that the watchdog is not already enabled.

Putting in how long the kernel should service the watchdog in DT is
like putting soft or hard lockup detection times into DT. These are
kernel settings. If you need to change this, you should update your
kernel or kernel settings, not the DT.

Rob

>
> But never the less I can try to state this in the documentation just to make
> clear what we are trying to solve here.
>
> Thanks for all the good comments!
>
> -Timo
Guenter Roeck Feb. 20, 2015, 4:09 p.m. UTC | #15
On Fri, Feb 20, 2015 at 03:48:22PM +0800, Jean-Christophe PLAGNIOL-VILLARD wrote:
> 
> > On Feb 18, 2015, at 8:57 PM, Timo Kokkonen <timo.kokkonen@offcode.fi> wrote:
> > 
> > By default the driver will start a kernel timer which keeps on kicking
> > the watchdog HW until user space has opened the watchdog
> > device. Usually this is desirable as the watchdog HW is running by
> > default and the user space may not have any watchdog daemon running at
> > all.
> > 
> > However, on production systems it may be mandatory that also early
> > crashes and lockups will lead to a watchdog reset, even if they happen
> > before the user space has opened the watchdog device.
> > 
> > To resolve the issue, add a new device tree property
> > "early-timeout-sec" which will let the kernel timer to ping the
> > watchdog HW only as long as the specified timeout permits. The default
> > is still to use kernel timer, but more strict behavior can be enabled
> > via the device tree property.
> > 
> > Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> > ---
> > Documentation/devicetree/bindings/watchdog/watchdog.txt | 7 +++++++
> > drivers/watchdog/at91sam9_wdt.c                         | 9 ++++++++-
> 
> This should not be handled by the driver but the kernel in a generic way
> 
Sure, and it has been in the back of my mind for a couple of years.
Care to submit a set of patches to implement it ?

Guenter
Guenter Roeck Feb. 20, 2015, 4:28 p.m. UTC | #16
On Fri, Feb 20, 2015 at 08:06:23AM -0600, Rob Herring wrote:
> On Thu, Feb 19, 2015 at 12:14 AM, Timo Kokkonen
> <timo.kokkonen@offcode.fi> wrote:
> > Hi,
> >
> >
> > On 18.02.2015 23:11, Rob Herring wrote:
> >>
> >> On Wed, Feb 18, 2015 at 10:00 AM, Alexandre Belloni
> >> <alexandre.belloni@free-electrons.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> On 18/02/2015 at 06:50:44 -0800, Guenter Roeck wrote :
> >>>>>>>
> >>>>>>>    Optional properties:
> >>>>>>>    - timeout-sec: Contains the watchdog timeout in seconds.
> >>>>>>> +- early-timeout-sec: If present, specifies a timeout value in
> >>>>>>> seconds
> >>>>>>> +  that the driver keeps on ticking the watchdog HW on behalf of user
> >>>>>>> +  space. Once this timeout expires watchdog is left to expire in
> >>>>>>> +  timeout-sec seconds. If this propery is set to zero, watchdog is
> >>>>>>> +  started (or left running) so that a reset occurs in timeout-sec
> >>>>>>> +  since the watchdog was started.
> >>>>>>>
> >>>>>>>    Example:
> >>>>>>>
> >>>>>>>    watchdog {
> >>>>>>>             timeout-sec = <60>;
> >>>>>>> +   early-timeout-sec = <120>;
> >>>>>>
> >>>>>>
> >>>>>> That is not a generic property as you defined it; if so,
> >>>>>> it would have to be implemented in the watchdog core code,
> >>>>>> not in the at91 code. You'll have to document it in the bindings
> >>>>>> description for at91sam9_wdt.
> >>>>>
> >>>>>
> >>>>> Then, if this is a controller specific property, it should be defined
> >>>>> with the 'atmel,' prefix.
> >>>>> We're kind of looping here: the initial discussion was "is there a need
> >>>>> for this property to be a generic one ?", and now you're saying no,
> >>>>> while you previously left the door opened.
> >>>>>
> >>>>> Tomi is proposing a generic approach, as you asked him to. I agree that
> >>>>> parsing the property in core code and making its value part of the
> >>>>> generic watchdog struct makes sense (that's what I proposed to Tomi a
> >>>>> few weeks ago).
> >>>>>
> >>>> Hmm ... the problem here is that the property description creates the
> >>>> assumption or expectation that the property is used if defined,
> >>>> which is not the case.
> >>>>
> >>>> I am not sure how to best resolve this. Maybe a comment in the property
> >>>> description stating that implementation of is device (driver) dependent
> >>>> ?
> >>>> After all, that is true for the timeout-sec property as well.
> >>>>
> >>>
> >>> I would leave it in the generic file and state that it may not be
> >>> implemented in the driver. That way, the property is documented for new
> >>> driver writers.
> >>
> >>
> >> That is pretty much true of any optional property. Whether implemented
> >> in the driver or core is an implementation detail that does not belong
> >> in the binding.
> >>
> >> I find this property pretty questionable. Certainly having the kernel
> >> service a watchdog either enabled at reset, in the bootloader, or by
> >> the kernel is a useful feature. A timeout for "how long userspace
> >> watchdog daemon takes to start" does not belong in DT. timeout-sec
> >> should be the default/initial timeout and long enough for userspace to
> >> start. Userspace can then change it to a more suitable value.
> >
> >
> > That would be a good workaround if we had enough time in the watchdog HW to
> > wait long enough for the user space to start up. For example in atmel HW the
> > maximum is 16 seconds, which may not be enough for the kernel to boot up and
> > the user space to start the watchdog daemon.
> 
> Well, the 16 sec maximum may be something useful to put into the DT as
> that actually is a property of the h/w.
> 
> > But even that is not enough as all of the watchdog drivers attempt to
> > *disable* the watchdog device before user space opens it. What good is a
> > watchdog if it is disabled by the kernel and we got stuck before the daemon
> > wakes up and re-enables it? This the problem with all of the watchdog
> > drivers right now. There are plenty of products out there that can't deal
> > with this kind of limitation. They are all hacking around it one way or
> > another. If there is a crash, the watchdog must reset the device. I can't
> > think of any other run time way to configure the watchdog for this kind of
> > situation than having a device tree property for it.
> 
> I fully agree the current design is broken. We should fix that in a
> generic way.
> 
> > What I am proposing here is a way to solve this without hacking. I was told
> > to think also a way to defer starting the watchdog for a given timeout so
> > that user space would have more time to wake up, which sounded like a good
> > idea. And this obviously needs to be implemented so that the watchdog is
> > guaranteed to reset the device should there be a crash of any kind that
> > prevents the watchdog daemon from starting up. There are a lot of details
> > that need to be taken care of properly and therefore watchdog core can't do
> > much about it, which is why I thought there is no much point trying to do it
> > in watchdog core.
> 
> Deferring would assume that the watchdog is not already enabled.
> 
> Putting in how long the kernel should service the watchdog in DT is
> like putting soft or hard lockup detection times into DT. These are
> kernel settings. If you need to change this, you should update your
> kernel or kernel settings, not the DT.
> 
The time to userspace handover may differ from HW variant to HW variant.
Some may load faster, some may load slower.

Similar, the runtime watchdog timeout may be different from system
to system.  On a system with faster CPU, and/or one with faster io,
one may want (or need) a faster watchdog timeout. I assumed that
was accepted and understood when the timeout-sec property was
introduced a long time ago, but maybe not.

Yes, the problem should be resolved in a generic way. This has been
on my mind for a long time, including the problem if a watchdog should
or should not stay or become enabled during early boot. All those are
system properties which should be addressed generically, and there
should be a means to express those properties in devicetree.

Problem goes back into the old back-and-forth of what can be in devicetree
or not. Sorting that out always takes a long time and a substantial amount
of effort. Unfortunately, I don't have that time (writing the code would
probably be trivial in comparison).

If someone is willing and able to spend the necessary time to negotiate 
acceptable devicetree properties and to write the necessary code, I'll be
more than happy to review and as much as possible test the resulting patches,
and I am sure that Wim will be happy to accept them. Until then we'll have
to live with what we have today.

Thanks,
Guenter
Jean-Christophe PLAGNIOL-VILLARD Feb. 20, 2015, 4:33 p.m. UTC | #17
> On Feb 20, 2015, at 3:51 PM, Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
> 
> Hi Jean-Christophe,
> 
> On Fri, 20 Feb 2015 15:48:22 +0800
> Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> wrote:
> 
>> 
>>> On Feb 18, 2015, at 8:57 PM, Timo Kokkonen <timo.kokkonen@offcode.fi> wrote:
>>> 
>>> By default the driver will start a kernel timer which keeps on kicking
>>> the watchdog HW until user space has opened the watchdog
>>> device. Usually this is desirable as the watchdog HW is running by
>>> default and the user space may not have any watchdog daemon running at
>>> all.
>>> 
>>> However, on production systems it may be mandatory that also early
>>> crashes and lockups will lead to a watchdog reset, even if they happen
>>> before the user space has opened the watchdog device.
>>> 
>>> To resolve the issue, add a new device tree property
>>> "early-timeout-sec" which will let the kernel timer to ping the
>>> watchdog HW only as long as the specified timeout permits. The default
>>> is still to use kernel timer, but more strict behavior can be enabled
>>> via the device tree property.
>>> 
>>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
>>> ---
>>> Documentation/devicetree/bindings/watchdog/watchdog.txt | 7 +++++++
>>> drivers/watchdog/at91sam9_wdt.c                         | 9 ++++++++-
>> 
>> This should not be handled by the driver but the kernel in a generic way
> 
> Could you detail a bit more what you have in mind ?

move this timeout on the linux thread that keep alive the watchdog not in the driver

Best Regards,
J.
> 
> Best Regards,
> 
> Boris
> 
> -- 
> Boris Brezillon, Free Electrons
> Embedded Linux and Kernel engineering
> http://free-electrons.com
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Boris BREZILLON Feb. 20, 2015, 5:16 p.m. UTC | #18
Hi Jean-Christophe,

On Sat, 21 Feb 2015 00:33:17 +0800
Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> wrote:

> 
> > On Feb 20, 2015, at 3:51 PM, Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
> > 
> > Hi Jean-Christophe,
> > 
> > On Fri, 20 Feb 2015 15:48:22 +0800
> > Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> wrote:
> > 
> >> 
> >>> On Feb 18, 2015, at 8:57 PM, Timo Kokkonen <timo.kokkonen@offcode.fi> wrote:
> >>> 
> >>> By default the driver will start a kernel timer which keeps on kicking
> >>> the watchdog HW until user space has opened the watchdog
> >>> device. Usually this is desirable as the watchdog HW is running by
> >>> default and the user space may not have any watchdog daemon running at
> >>> all.
> >>> 
> >>> However, on production systems it may be mandatory that also early
> >>> crashes and lockups will lead to a watchdog reset, even if they happen
> >>> before the user space has opened the watchdog device.
> >>> 
> >>> To resolve the issue, add a new device tree property
> >>> "early-timeout-sec" which will let the kernel timer to ping the
> >>> watchdog HW only as long as the specified timeout permits. The default
> >>> is still to use kernel timer, but more strict behavior can be enabled
> >>> via the device tree property.
> >>> 
> >>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> >>> ---
> >>> Documentation/devicetree/bindings/watchdog/watchdog.txt | 7 +++++++
> >>> drivers/watchdog/at91sam9_wdt.c                         | 9 ++++++++-
> >> 
> >> This should not be handled by the driver but the kernel in a generic way
> > 
> > Could you detail a bit more what you have in mind ?
> 
> move this timeout on the linux thread that keep alive the watchdog not in the driver

AFAIK there's no such thing (if there is, could you point me to the
source file where this thread is defined ?), and each driver are
registering their own timer (if they need one).
If you're suggesting to add such common logic to watchdog core, why
don't you propose something ?

Timo's need is quite generic, but nobody seemed to bother with that
before.
Moreover, using an at91 specific implementation does not prevent
migrating to a more generic implementation when it's available.
Actually, it's rather difficult to design a generic infrastructure until
you have dealt with several devices requiring the same feature, and
that's obviously not the case here.

Best Regards,

Boris
Guenter Roeck Feb. 20, 2015, 6:06 p.m. UTC | #19
On Fri, Feb 20, 2015 at 06:16:40PM +0100, Boris Brezillon wrote:
> Hi Jean-Christophe,
> 
> On Sat, 21 Feb 2015 00:33:17 +0800
> Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> wrote:
> 
> > 
> > > On Feb 20, 2015, at 3:51 PM, Boris Brezillon <boris.brezillon@free-electrons.com> wrote:
> > > 
> > > Hi Jean-Christophe,
> > > 
> > > On Fri, 20 Feb 2015 15:48:22 +0800
> > > Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> wrote:
> > > 
> > >> 
> > >>> On Feb 18, 2015, at 8:57 PM, Timo Kokkonen <timo.kokkonen@offcode.fi> wrote:
> > >>> 
> > >>> By default the driver will start a kernel timer which keeps on kicking
> > >>> the watchdog HW until user space has opened the watchdog
> > >>> device. Usually this is desirable as the watchdog HW is running by
> > >>> default and the user space may not have any watchdog daemon running at
> > >>> all.
> > >>> 
> > >>> However, on production systems it may be mandatory that also early
> > >>> crashes and lockups will lead to a watchdog reset, even if they happen
> > >>> before the user space has opened the watchdog device.
> > >>> 
> > >>> To resolve the issue, add a new device tree property
> > >>> "early-timeout-sec" which will let the kernel timer to ping the
> > >>> watchdog HW only as long as the specified timeout permits. The default
> > >>> is still to use kernel timer, but more strict behavior can be enabled
> > >>> via the device tree property.
> > >>> 
> > >>> Signed-off-by: Timo Kokkonen <timo.kokkonen@offcode.fi>
> > >>> ---
> > >>> Documentation/devicetree/bindings/watchdog/watchdog.txt | 7 +++++++
> > >>> drivers/watchdog/at91sam9_wdt.c                         | 9 ++++++++-
> > >> 
> > >> This should not be handled by the driver but the kernel in a generic way
> > > 
> > > Could you detail a bit more what you have in mind ?
> > 
> > move this timeout on the linux thread that keep alive the watchdog not in the driver
> 
> AFAIK there's no such thing (if there is, could you point me to the
> source file where this thread is defined ?), and each driver are
> registering their own timer (if they need one).
> If you're suggesting to add such common logic to watchdog core, why
> don't you propose something ?
> 
> Timo's need is quite generic, but nobody seemed to bother with that
> before.

The problem has been discussed before. There are even some patches,
but they were too specific and limited in scope for my liking.

As I said in my other reply, to move forward we would need
someone who has the time and energy to get an agreement with the
DT folks about an acceptable means to express the properties needed
for a specific hardware, and to actually implement the necessary code.

> Moreover, using an at91 specific implementation does not prevent
> migrating to a more generic implementation when it's available.
> Actually, it's rather difficult to design a generic infrastructure until
> you have dealt with several devices requiring the same feature, and
> that's obviously not the case here.
> 
Absolutely agree. If we can not even get a property like the one suggested
here accepted, it is completely pointless to even think about a more
generic solution that would work for all watchdog drivers.

Guenter
Boris BREZILLON Feb. 20, 2015, 7:43 p.m. UTC | #20
On Fri, 20 Feb 2015 08:28:55 -0800
Guenter Roeck <linux@roeck-us.net> wrote:

> On Fri, Feb 20, 2015 at 08:06:23AM -0600, Rob Herring wrote:
> > On Thu, Feb 19, 2015 at 12:14 AM, Timo Kokkonen
> > <timo.kokkonen@offcode.fi> wrote:
> > > Hi,
> > >

[...]

> > > What I am proposing here is a way to solve this without hacking. I was told
> > > to think also a way to defer starting the watchdog for a given timeout so
> > > that user space would have more time to wake up, which sounded like a good
> > > idea. And this obviously needs to be implemented so that the watchdog is
> > > guaranteed to reset the device should there be a crash of any kind that
> > > prevents the watchdog daemon from starting up. There are a lot of details
> > > that need to be taken care of properly and therefore watchdog core can't do
> > > much about it, which is why I thought there is no much point trying to do it
> > > in watchdog core.
> > 
> > Deferring would assume that the watchdog is not already enabled.
> > 
> > Putting in how long the kernel should service the watchdog in DT is
> > like putting soft or hard lockup detection times into DT. These are
> > kernel settings. If you need to change this, you should update your
> > kernel or kernel settings, not the DT.
> > 
> The time to userspace handover may differ from HW variant to HW variant.
> Some may load faster, some may load slower.
> 
> Similar, the runtime watchdog timeout may be different from system
> to system.  On a system with faster CPU, and/or one with faster io,
> one may want (or need) a faster watchdog timeout. I assumed that
> was accepted and understood when the timeout-sec property was
> introduced a long time ago, but maybe not.
> 
> Yes, the problem should be resolved in a generic way. This has been
> on my mind for a long time, including the problem if a watchdog should
> or should not stay or become enabled during early boot. All those are
> system properties which should be addressed generically, and there
> should be a means to express those properties in devicetree.
> 
> Problem goes back into the old back-and-forth of what can be in devicetree
> or not. Sorting that out always takes a long time and a substantial amount
> of effort. Unfortunately, I don't have that time (writing the code would
> probably be trivial in comparison).

I once promised myself to propose this stupid idea, so here it is:
How about creating a new concept called CT (for Config Tree), that
would reuse the DT syntax and tree representation and add system config
information to the appropriate devices (something like an overlay
you'll put on top of your DT).

This way nobody could ever complain about config information being
stored in the DT, and people needing system config (like the one
described here) could put them in this CT.
CT files would be stored in files with a .cts or .ctsi extension, ...
I'm stopping here, I think you got my point.

Joke apart, I've had the same kind of discussion quite often lately.
While I understand DT guys concern to control which property should go
into the DT which ones should not, some system config are better
described next to the device they are attached to rather than in the
cmdline.

Best Regards,

Boris
Guenter Roeck Feb. 20, 2015, 8:04 p.m. UTC | #21
On Fri, Feb 20, 2015 at 08:43:58PM +0100, Boris Brezillon wrote:
> On Fri, 20 Feb 2015 08:28:55 -0800
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On Fri, Feb 20, 2015 at 08:06:23AM -0600, Rob Herring wrote:
> > > On Thu, Feb 19, 2015 at 12:14 AM, Timo Kokkonen
> > > <timo.kokkonen@offcode.fi> wrote:
> > > > Hi,
> > > >
> 
> [...]
> 
> > > > What I am proposing here is a way to solve this without hacking. I was told
> > > > to think also a way to defer starting the watchdog for a given timeout so
> > > > that user space would have more time to wake up, which sounded like a good
> > > > idea. And this obviously needs to be implemented so that the watchdog is
> > > > guaranteed to reset the device should there be a crash of any kind that
> > > > prevents the watchdog daemon from starting up. There are a lot of details
> > > > that need to be taken care of properly and therefore watchdog core can't do
> > > > much about it, which is why I thought there is no much point trying to do it
> > > > in watchdog core.
> > > 
> > > Deferring would assume that the watchdog is not already enabled.
> > > 
> > > Putting in how long the kernel should service the watchdog in DT is
> > > like putting soft or hard lockup detection times into DT. These are
> > > kernel settings. If you need to change this, you should update your
> > > kernel or kernel settings, not the DT.
> > > 
> > The time to userspace handover may differ from HW variant to HW variant.
> > Some may load faster, some may load slower.
> > 
> > Similar, the runtime watchdog timeout may be different from system
> > to system.  On a system with faster CPU, and/or one with faster io,
> > one may want (or need) a faster watchdog timeout. I assumed that
> > was accepted and understood when the timeout-sec property was
> > introduced a long time ago, but maybe not.
> > 
> > Yes, the problem should be resolved in a generic way. This has been
> > on my mind for a long time, including the problem if a watchdog should
> > or should not stay or become enabled during early boot. All those are
> > system properties which should be addressed generically, and there
> > should be a means to express those properties in devicetree.
> > 
> > Problem goes back into the old back-and-forth of what can be in devicetree
> > or not. Sorting that out always takes a long time and a substantial amount
> > of effort. Unfortunately, I don't have that time (writing the code would
> > probably be trivial in comparison).
> 
> I once promised myself to propose this stupid idea, so here it is:
> How about creating a new concept called CT (for Config Tree), that
> would reuse the DT syntax and tree representation and add system config
> information to the appropriate devices (something like an overlay
> you'll put on top of your DT).
> 

We actually use that approach at work - essentially a system configuration
using dt syntax. It even uses the dt compiler to create dtb files,
only the dtb files are kept as part of the application software.

Problem here though is that we are not talking about configuration,
but about system properties. Secondary problem is that those system
properties can be abused, and "it can be abused, let's reject it"
is always easy. This is made more complicated by the fact that very
few people are willing or even able to go through the exercise of
clearly distinguishing system configuration vs. system properties
when proposing new dt properties, especially since this can be a very
tricky slope and may require 90% communication skills and 10% engineering
skills (use the term 'configuration' once and you are out). Not really
sure how to solve that problem, or if it is even possible to solve it.

> Joke apart, I've had the same kind of discussion quite often lately.
> While I understand DT guys concern to control which property should go
> into the DT which ones should not, some system config are better
> described next to the device they are attached to rather than in the
> cmdline.
> 
I think we are very much in agreement here.

Thanks,
Guenter
Timo Kokkonen Feb. 23, 2015, 7:29 a.m. UTC | #22
Hi,

On 20.02.2015 20:06, Guenter Roeck wrote:
> On Fri, Feb 20, 2015 at 06:16:40PM +0100, Boris Brezillon wrote:
>> Hi Jean-Christophe,
>>
>> On Sat, 21 Feb 2015 00:33:17 +0800
>> Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> wrote:
>>
>>
>> Timo's need is quite generic, but nobody seemed to bother with that
>> before.
>
> The problem has been discussed before. There are even some patches,
> but they were too specific and limited in scope for my liking.
>
> As I said in my other reply, to move forward we would need
> someone who has the time and energy to get an agreement with the
> DT folks about an acceptable means to express the properties needed
> for a specific hardware, and to actually implement the necessary code.
>
>> Moreover, using an at91 specific implementation does not prevent
>> migrating to a more generic implementation when it's available.
>> Actually, it's rather difficult to design a generic infrastructure until
>> you have dealt with several devices requiring the same feature, and
>> that's obviously not the case here.
>>
> Absolutely agree. If we can not even get a property like the one suggested
> here accepted, it is completely pointless to even think about a more
> generic solution that would work for all watchdog drivers.
>

I'm not really sure that I understand what we are really arguing here, 
but seems that this is not getting any forward unless we get in touch 
with the DT people who get to decide whether this kind of property 
belongs to device tree or not. Who are these people anyway? Which list 
should I write an email to get in touch with them? Why are we not CC'ing 
them already?

Anyway, the way I tried to express it in the v4 patch set, we have a 
generic device tree property that does not try to imply any sort of 
implementation or HW details. The description in watchdog.txt tries to 
state the purpose of the property clearly so that other driver writers 
could make other drivers to support it correctly. And then there is at91 
specific implementation because that's the only watchdog hardware 
currently on my desk that I can easily test it with.

I can't think of how I could make this more generic, not without making 
more changes to the way drivers initialize itself with the watchdog 
core. And that would require changing a lot of drivers, at least if we 
intend to make it work so that the watchdog core takes care of this 
instead of the driver. It's a nice idea but I think it's overly complex 
given the amount of functionality there needs to be in different drivers 
and the diversity between drivers.

To me there is nothing wrong with having this done also via a kernel 
configuration option. We could simply have 
CONFIG_WATCHDOG_EARLY_TIMEOUT_SEC option that works exactly the same way 
as the proposed device tree property. To make it clear only some drivers 
support this option, we could let each driver select an enabler config 
option CONFIG_WATCHDOG_HAS_DEFERRABLE_EARLY_RESET or such that is used 
to hide the config option unless we are building a watchdog that 
supports the given option. Or something like that. But that would be 
less flexible, as if we want to run the same kernel binary on different 
arm boards we can't make these values per board any more. The use case I 
am dealing with doesn't care about this, but I was thinking someone else 
might care. Which is why I thought it should be run time configurable 
via device tree instead of a compile time option.

But now that I have mentioned arm boards, I noticed we are talking about 
generic behaviour and there are also watchdogs running on architectures 
where device trees are not supported. That said, it might be better to 
make it compile time configurable now and add other configuration 
options later.

Any thoughts about that?

Thanks,
-Timo
Boris BREZILLON Feb. 23, 2015, 8:51 a.m. UTC | #23
Hi Timo,

On Mon, 23 Feb 2015 09:29:41 +0200
Timo Kokkonen <timo.kokkonen@offcode.fi> wrote:

> Hi,
> 
> On 20.02.2015 20:06, Guenter Roeck wrote:
> > On Fri, Feb 20, 2015 at 06:16:40PM +0100, Boris Brezillon wrote:
> >> Hi Jean-Christophe,
> >>
> >> On Sat, 21 Feb 2015 00:33:17 +0800
> >> Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> wrote:
> >>
> >>
> >> Timo's need is quite generic, but nobody seemed to bother with that
> >> before.
> >
> > The problem has been discussed before. There are even some patches,
> > but they were too specific and limited in scope for my liking.
> >
> > As I said in my other reply, to move forward we would need
> > someone who has the time and energy to get an agreement with the
> > DT folks about an acceptable means to express the properties needed
> > for a specific hardware, and to actually implement the necessary code.
> >
> >> Moreover, using an at91 specific implementation does not prevent
> >> migrating to a more generic implementation when it's available.
> >> Actually, it's rather difficult to design a generic infrastructure until
> >> you have dealt with several devices requiring the same feature, and
> >> that's obviously not the case here.
> >>
> > Absolutely agree. If we can not even get a property like the one suggested
> > here accepted, it is completely pointless to even think about a more
> > generic solution that would work for all watchdog drivers.
> >
> 
> I'm not really sure that I understand what we are really arguing here, 
> but seems that this is not getting any forward unless we get in touch 
> with the DT people who get to decide whether this kind of property 
> belongs to device tree or not. Who are these people anyway? Which list 
> should I write an email to get in touch with them? Why are we not CC'ing 
> them already?

I thought they were in the recipient list, but they are not.
If you want to be sure to include all the relevant maintainers and MLs
you can use scripts/get_maintainer.pl, this would have given you the DT
maintainers and DT ML.

BTW, the discussion has already started (Rob is a DT maintainer) ;-).

> 
> Anyway, the way I tried to express it in the v4 patch set, we have a 
> generic device tree property that does not try to imply any sort of 
> implementation or HW details. The description in watchdog.txt tries to 
> state the purpose of the property clearly so that other driver writers 
> could make other drivers to support it correctly. And then there is at91 
> specific implementation because that's the only watchdog hardware 
> currently on my desk that I can easily test it with.

The debate here is not whether the property is generic enough or not,
but whether it is hardware or software related.
DTs are supposed to describe HW and not the use you make of it (HW
configuration), the problem is: sometime you have to describe that to
get your hardware to work properly.

> 
> I can't think of how I could make this more generic, not without making 
> more changes to the way drivers initialize itself with the watchdog 
> core. And that would require changing a lot of drivers, at least if we 
> intend to make it work so that the watchdog core takes care of this 
> instead of the driver. It's a nice idea but I think it's overly complex 
> given the amount of functionality there needs to be in different drivers 
> and the diversity between drivers.
> 
> To me there is nothing wrong with having this done also via a kernel 
> configuration option. We could simply have 
> CONFIG_WATCHDOG_EARLY_TIMEOUT_SEC option that works exactly the same way 
> as the proposed device tree property. To make it clear only some drivers 
> support this option, we could let each driver select an enabler config 
> option CONFIG_WATCHDOG_HAS_DEFERRABLE_EARLY_RESET or such that is used 
> to hide the config option unless we are building a watchdog that 
> supports the given option. Or something like that. But that would be 
> less flexible, as if we want to run the same kernel binary on different 
> arm boards we can't make these values per board any more. The use case I 
> am dealing with doesn't care about this, but I was thinking someone else 
> might care. Which is why I thought it should be run time configurable 
> via device tree instead of a compile time option.

IMHO, a viable alternative to the early-timeout-sec property would be a
module parameter passed through the kernel cmdline.
But as I said, attaching such parameters to the appropriate watchdog
device (say you have several watchdogs in your system) is not as easy as
using the DT representation.

Is early-timeout-sec generic enough to be global to the watchdog
framework ?

> 
> But now that I have mentioned arm boards, I noticed we are talking about 
> generic behaviour and there are also watchdogs running on architectures 
> where device trees are not supported. That said, it might be better to 
> make it compile time configurable now and add other configuration 
> options later.

As long as the early_timeout_sec is part of the generic watchdog
struct, your driver shouldn't care where the value comes from (pdata,
DT, ACPI, ...), this is why making DT parsing code part of the core is
a better approach IMO.

> 
> Any thoughts about that?

As I said, the Kconfig option does not seem the right approach to
me, using a module parameter would be more appropriate IMO.


Best Regards,

Boris
Timo Kokkonen Feb. 23, 2015, 9:11 a.m. UTC | #24
On 23.02.2015 10:51, Boris Brezillon wrote:
> Hi Timo,
>
> On Mon, 23 Feb 2015 09:29:41 +0200
> Timo Kokkonen <timo.kokkonen@offcode.fi> wrote:
>
>> Hi,
>>
>> On 20.02.2015 20:06, Guenter Roeck wrote:
>>> On Fri, Feb 20, 2015 at 06:16:40PM +0100, Boris Brezillon wrote:
>>>> Hi Jean-Christophe,
>>>>
>>>> On Sat, 21 Feb 2015 00:33:17 +0800
>>>> Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> wrote:
>>>>
>>>>
>>>> Timo's need is quite generic, but nobody seemed to bother with that
>>>> before.
>>>
>>> The problem has been discussed before. There are even some patches,
>>> but they were too specific and limited in scope for my liking.
>>>
>>> As I said in my other reply, to move forward we would need
>>> someone who has the time and energy to get an agreement with the
>>> DT folks about an acceptable means to express the properties needed
>>> for a specific hardware, and to actually implement the necessary code.
>>>
>>>> Moreover, using an at91 specific implementation does not prevent
>>>> migrating to a more generic implementation when it's available.
>>>> Actually, it's rather difficult to design a generic infrastructure until
>>>> you have dealt with several devices requiring the same feature, and
>>>> that's obviously not the case here.
>>>>
>>> Absolutely agree. If we can not even get a property like the one suggested
>>> here accepted, it is completely pointless to even think about a more
>>> generic solution that would work for all watchdog drivers.
>>>
>>
>> I'm not really sure that I understand what we are really arguing here,
>> but seems that this is not getting any forward unless we get in touch
>> with the DT people who get to decide whether this kind of property
>> belongs to device tree or not. Who are these people anyway? Which list
>> should I write an email to get in touch with them? Why are we not CC'ing
>> them already?
>
> I thought they were in the recipient list, but they are not.
> If you want to be sure to include all the relevant maintainers and MLs
> you can use scripts/get_maintainer.pl, this would have given you the DT
> maintainers and DT ML.
>
> BTW, the discussion has already started (Rob is a DT maintainer) ;-).
>

Right, indeed. Good. :)

>>
>> Anyway, the way I tried to express it in the v4 patch set, we have a
>> generic device tree property that does not try to imply any sort of
>> implementation or HW details. The description in watchdog.txt tries to
>> state the purpose of the property clearly so that other driver writers
>> could make other drivers to support it correctly. And then there is at91
>> specific implementation because that's the only watchdog hardware
>> currently on my desk that I can easily test it with.
>
> The debate here is not whether the property is generic enough or not,
> but whether it is hardware or software related.
> DTs are supposed to describe HW and not the use you make of it (HW
> configuration), the problem is: sometime you have to describe that to
> get your hardware to work properly.
>
>>
>> I can't think of how I could make this more generic, not without making
>> more changes to the way drivers initialize itself with the watchdog
>> core. And that would require changing a lot of drivers, at least if we
>> intend to make it work so that the watchdog core takes care of this
>> instead of the driver. It's a nice idea but I think it's overly complex
>> given the amount of functionality there needs to be in different drivers
>> and the diversity between drivers.
>>
>> To me there is nothing wrong with having this done also via a kernel
>> configuration option. We could simply have
>> CONFIG_WATCHDOG_EARLY_TIMEOUT_SEC option that works exactly the same way
>> as the proposed device tree property. To make it clear only some drivers
>> support this option, we could let each driver select an enabler config
>> option CONFIG_WATCHDOG_HAS_DEFERRABLE_EARLY_RESET or such that is used
>> to hide the config option unless we are building a watchdog that
>> supports the given option. Or something like that. But that would be
>> less flexible, as if we want to run the same kernel binary on different
>> arm boards we can't make these values per board any more. The use case I
>> am dealing with doesn't care about this, but I was thinking someone else
>> might care. Which is why I thought it should be run time configurable
>> via device tree instead of a compile time option.
>
> IMHO, a viable alternative to the early-timeout-sec property would be a
> module parameter passed through the kernel cmdline.
> But as I said, attaching such parameters to the appropriate watchdog
> device (say you have several watchdogs in your system) is not as easy as
> using the DT representation.
>
> Is early-timeout-sec generic enough to be global to the watchdog
> framework ?
>

If we have multiple watchdogs running, each of the watchdog might have 
different meaning, so it may not be enough to have one global watchdog 
property that is global to all watchdogs. We'd need someone to comment 
who has multiple watchdogs..

>>
>> But now that I have mentioned arm boards, I noticed we are talking about
>> generic behaviour and there are also watchdogs running on architectures
>> where device trees are not supported. That said, it might be better to
>> make it compile time configurable now and add other configuration
>> options later.
>
> As long as the early_timeout_sec is part of the generic watchdog
> struct, your driver shouldn't care where the value comes from (pdata,
> DT, ACPI, ...), this is why making DT parsing code part of the core is
> a better approach IMO.
>
>>
>> Any thoughts about that?
>
> As I said, the Kconfig option does not seem the right approach to
> me, using a module parameter would be more appropriate IMO.
>

After all, we do have already watchdog_init_timeout function that parses 
the watchdog timeout argument from either device tree or from command 
line. How about if we expanded that interface? Maybe have a more generic 
watchdog_init_parmas() or something that parses the generic watchdog 
arguments, either from command line, device tree or ACPI or something. 
The device driver could replace call from watchdog_init_timeout() to 
watchdog_init_params() once it is ready to support other generic 
parameters, such as early-timeout-sec. Then the watchdog driver could do 
the right thing about whether watchdog should be left running or stopped 
and how long time should be given.

Alternatively, we could also let the watchdog core know a little more 
about the actual watchdog hardware, such as whether the HW is stoppable, 
whether it needs manual pinging by the kernel until user space has taken 
over. Or maybe we can just extend the timeout values until the user 
space has first opened it and then shorten the timeout automatically so 
that it doesn't take that long for the device to reset after a crash. Or 
some other behaviour that is common to many users. Suggestions are welcome.

Anyway, that is something that needs to be done to make watchdog core 
take more control over more of the generic watchdog behaviour. It just 
needs to be done so that we don't need to convert all drivers at once.

Thanks,
-Timo
Guenter Roeck Feb. 23, 2015, 4:19 p.m. UTC | #25
On Mon, Feb 23, 2015 at 11:11:38AM +0200, Timo Kokkonen wrote:
[ ... ]

> 
> After all, we do have already watchdog_init_timeout function that parses the
> watchdog timeout argument from either device tree or from command line. How
> about if we expanded that interface? Maybe have a more generic
> watchdog_init_parmas() or something that parses the generic watchdog
> arguments, either from command line, device tree or ACPI or something. The
> device driver could replace call from watchdog_init_timeout() to
> watchdog_init_params() once it is ready to support other generic parameters,
> such as early-timeout-sec. Then the watchdog driver could do the right thing
> about whether watchdog should be left running or stopped and how long time
> should be given.
> 
Good idea.

> Alternatively, we could also let the watchdog core know a little more about
> the actual watchdog hardware, such as whether the HW is stoppable, whether
> it needs manual pinging by the kernel until user space has taken over. Or

Yes, all that will be needed. But, still, the stop-gap is that we'll need to
get buyin from the DT folks for the necessary properties. I have had the
outline for the necessary watchdog core implementation in my mind for a long
time, but I just don't have the time (nor the patience, quite frankly)
to get DT buyin.

> maybe we can just extend the timeout values until the user space has first
> opened it and then shorten the timeout automatically so that it doesn't take
> that long for the device to reset after a crash. Or some other behaviour
> that is common to many users. Suggestions are welcome.
> 
> Anyway, that is something that needs to be done to make watchdog core take
> more control over more of the generic watchdog behaviour. It just needs to
> be done so that we don't need to convert all drivers at once.
> 
Correct. It should not be that difficult do do that if designed properly.

Guenter
Rob Herring Feb. 23, 2015, 5:10 p.m. UTC | #26
On Mon, Feb 23, 2015 at 10:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> On Mon, Feb 23, 2015 at 11:11:38AM +0200, Timo Kokkonen wrote:
> [ ... ]
>
>>
>> After all, we do have already watchdog_init_timeout function that parses the
>> watchdog timeout argument from either device tree or from command line. How
>> about if we expanded that interface? Maybe have a more generic
>> watchdog_init_parmas() or something that parses the generic watchdog
>> arguments, either from command line, device tree or ACPI or something. The
>> device driver could replace call from watchdog_init_timeout() to
>> watchdog_init_params() once it is ready to support other generic parameters,
>> such as early-timeout-sec. Then the watchdog driver could do the right thing
>> about whether watchdog should be left running or stopped and how long time
>> should be given.
>>
> Good idea.
>
>> Alternatively, we could also let the watchdog core know a little more about
>> the actual watchdog hardware, such as whether the HW is stoppable, whether
>> it needs manual pinging by the kernel until user space has taken over. Or

Whether the h/w is stoppable or not is certainly a reasonable DT
property. The maximum h/w timeout would be too (although that may just
be a property of counter size and clock rate).

> Yes, all that will be needed. But, still, the stop-gap is that we'll need to
> get buyin from the DT folks for the necessary properties. I have had the
> outline for the necessary watchdog core implementation in my mind for a long
> time, but I just don't have the time (nor the patience, quite frankly)
> to get DT buyin.

Defining wdog core functionality and behavior has little to do with DT
and you don't need buy in. Presumably you care about non-DT platforms
as well. Moving more of the intelligence to the core would be a good
thing and for the most part should be independent of DT.

Rob

>> maybe we can just extend the timeout values until the user space has first
>> opened it and then shorten the timeout automatically so that it doesn't take
>> that long for the device to reset after a crash. Or some other behaviour
>> that is common to many users. Suggestions are welcome.
>>
>> Anyway, that is something that needs to be done to make watchdog core take
>> more control over more of the generic watchdog behaviour. It just needs to
>> be done so that we don't need to convert all drivers at once.
>>
> Correct. It should not be that difficult do do that if designed properly.
>
> Guenter
Guenter Roeck Feb. 23, 2015, 5:43 p.m. UTC | #27
On Mon, Feb 23, 2015 at 11:10:13AM -0600, Rob Herring wrote:
> On Mon, Feb 23, 2015 at 10:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> > On Mon, Feb 23, 2015 at 11:11:38AM +0200, Timo Kokkonen wrote:
> > [ ... ]
> >
> >>
> >> After all, we do have already watchdog_init_timeout function that parses the
> >> watchdog timeout argument from either device tree or from command line. How
> >> about if we expanded that interface? Maybe have a more generic
> >> watchdog_init_parmas() or something that parses the generic watchdog
> >> arguments, either from command line, device tree or ACPI or something. The
> >> device driver could replace call from watchdog_init_timeout() to
> >> watchdog_init_params() once it is ready to support other generic parameters,
> >> such as early-timeout-sec. Then the watchdog driver could do the right thing
> >> about whether watchdog should be left running or stopped and how long time
> >> should be given.
> >>
> > Good idea.
> >
> >> Alternatively, we could also let the watchdog core know a little more about
> >> the actual watchdog hardware, such as whether the HW is stoppable, whether
> >> it needs manual pinging by the kernel until user space has taken over. Or
> 
> Whether the h/w is stoppable or not is certainly a reasonable DT
> property. The maximum h/w timeout would be too (although that may just
> be a property of counter size and clock rate).
> 
> > Yes, all that will be needed. But, still, the stop-gap is that we'll need to
> > get buyin from the DT folks for the necessary properties. I have had the
> > outline for the necessary watchdog core implementation in my mind for a long
> > time, but I just don't have the time (nor the patience, quite frankly)
> > to get DT buyin.
> 
> Defining wdog core functionality and behavior has little to do with DT
> and you don't need buy in. Presumably you care about non-DT platforms
> as well. Moving more of the intelligence to the core would be a good
> thing and for the most part should be independent of DT.
> 

Even so, if we can not express the HW with DT as needed, we would end up
having to work around DT limitations (like using module parameters on top
of DT). I just don't have the time to do that. Maybe someone else does.
As mentioned earlier, I'll be happy to review and as much as possible test
patches.

Guenter
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/watchdog/watchdog.txt b/Documentation/devicetree/bindings/watchdog/watchdog.txt
index 7e3686c..32647cf 100644
--- a/Documentation/devicetree/bindings/watchdog/watchdog.txt
+++ b/Documentation/devicetree/bindings/watchdog/watchdog.txt
@@ -4,9 +4,16 @@  using these definitions.
 
 Optional properties:
 - timeout-sec: Contains the watchdog timeout in seconds.
+- early-timeout-sec: If present, specifies a timeout value in seconds
+  that the driver keeps on ticking the watchdog HW on behalf of user
+  space. Once this timeout expires watchdog is left to expire in
+  timeout-sec seconds. If this propery is set to zero, watchdog is
+  started (or left running) so that a reset occurs in timeout-sec
+  since the watchdog was started.
 
 Example:
 
 watchdog {
 	 timeout-sec = <60>;
+	 early-timeout-sec = <120>;
 };
diff --git a/drivers/watchdog/at91sam9_wdt.c b/drivers/watchdog/at91sam9_wdt.c
index 6df9405..1b40bfa 100644
--- a/drivers/watchdog/at91sam9_wdt.c
+++ b/drivers/watchdog/at91sam9_wdt.c
@@ -89,6 +89,8 @@  struct at91wdt {
 	u32 mr_mask;
 	unsigned long heartbeat;	/* WDT heartbeat in jiffies */
 	bool nowayout;
+	/* Timeout in jiffies for stopping the early timer */
+	unsigned long early_timer;
 	unsigned int irq;
 };
 
@@ -122,7 +124,8 @@  static void at91_ping(unsigned long data)
 {
 	struct at91wdt *wdt = (struct at91wdt *)data;
 	if (time_before(jiffies, wdt->next_heartbeat) ||
-	    !watchdog_active(&wdt->wdd)) {
+		(time_before(jiffies, wdt->early_timer) &&
+			!watchdog_active(&wdt->wdd))) {
 		at91_wdt_reset(wdt);
 		mod_timer(&wdt->timer, jiffies + wdt->heartbeat);
 	} else {
@@ -316,6 +319,10 @@  static int of_at91wdt_init(struct device_node *np, struct at91wdt *wdt)
 
 	wdt->mr |= max | ((max - min) << 16);
 
+	if (!of_property_read_u32_index(np, "early-timeout-sec", 0,
+					(u32 *)&wdt->early_timer))
+		wdt->early_timer = wdt->early_timer * HZ + jiffies;
+
 	return 0;
 }
 #else