diff mbox

[RFC] Watchdog: sbsa_gwdt: Enhance timeout range

Message ID 20da73bb9bdf27993514c1da80fead13dc92932d.1462262900.git.panand@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Pratyush Anand May 3, 2016, 8:20 a.m. UTC
Currently only WOR is used to program both first and second stage which
provided very limited range of timeout.

This patch uses WCV as well to achieve higher range of timeout. This patch
programs max_timeout as 255, but that can be increased further as well.

Following testing shows that we can happily achieve 40 second default timeout.

 # modprobe sbsa_gwdt action=1
 [  131.187562] sbsa-gwdt sbsa-gwdt.0: Initialized with 40s timeout @ 250000000 Hz, action=1.
 # cd /sys/class/watchdog/watchdog0/
 # cat state
 inactive
 # cat /dev/watchdog0
 cat: /dev/watchdog0: Invalid argument
 [  161.710593] watchdog: watchdog0: watchdog did not stop!
 # cat state
 active
 # cat timeout
 40
 # cat timeleft
 38
 # cat timeleft
 25
 # cat /dev/watchdog0
 cat: /dev/watchdog0: Invalid argument
 [  184.931030] watchdog: watchdog0: watchdog did not stop!
 # cat timeleft
 37
 # cat timeleft
 21
 ...
 ...
 # cat timeleft
 1

panic() is called upon timeout of 40s. See timestamp of last kick (cat) and
next panic() message.

 [  224.939065] Kernel panic - not syncing: SBSA Watchdog timeout

Signed-off-by: Pratyush Anand <panand@redhat.com>
---
 drivers/watchdog/sbsa_gwdt.c | 83 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 62 insertions(+), 21 deletions(-)

Comments

Timur Tabi May 3, 2016, 12:12 p.m. UTC | #1
Pratyush Anand wrote:
> + * Note: This watchdog timer has two stages. If action is 0, first stage is
> + * determined by directly programming WCV and second by WOR. When first
> + * timeout is reached, WS0 is triggered and WCV is reloaded with value in
> + * WOR. WS0 interrupt will be ignored, then the second watch period starts;
> + * when second timeout is reached, then WS1 is triggered, system resets. WCV
> + * and WOR are programmed in such a way that total time corresponding to
> + * WCV+WOR becomes equivalent to user programmed "timeout".
> + * If action is 1, then we expect to call panic() at user programmed
> + * "timeout". Therefore, we program both first and second stage using WCV
> + * only.

So I'm not sure I understand how this works yet, but there was an 
earlier version of Fu's driver that did something similar.  It depended 
on being able to reprogram the hardware during the WS0 interrupt, and 
that was rejected by the community.

How is what you are doing different?
Pratyush Anand May 3, 2016, 1:24 p.m. UTC | #2
On 03/05/2016:07:12:04 AM, Timur Tabi wrote:
> Pratyush Anand wrote:
> >+ * Note: This watchdog timer has two stages. If action is 0, first stage is
> >+ * determined by directly programming WCV and second by WOR. When first
> >+ * timeout is reached, WS0 is triggered and WCV is reloaded with value in
> >+ * WOR. WS0 interrupt will be ignored, then the second watch period starts;
> >+ * when second timeout is reached, then WS1 is triggered, system resets. WCV
> >+ * and WOR are programmed in such a way that total time corresponding to
> >+ * WCV+WOR becomes equivalent to user programmed "timeout".
> >+ * If action is 1, then we expect to call panic() at user programmed
> >+ * "timeout". Therefore, we program both first and second stage using WCV
> >+ * only.
> 
> So I'm not sure I understand how this works yet, but there was an earlier
> version of Fu's driver that did something similar.  It depended on being
> able to reprogram the hardware during the WS0 interrupt, and that was
> rejected by the community.
> 
> How is what you are doing different?

* Following was the comment for Fu Wei's primitive version of patch [1], because
* of which community rejected it.

> The triggering of the hardware reset should never depend on an interrupt being
> handled properly.  You should always program WCV correctly in advance.

Now, there are couple of things different:

(1) There is an important difference in upstreamed version than the version [1]
which was rejected on above ground. In upstreamed version,  there would be no
interrupt handler when we are in normal mode ie action=0.  So, there is no
possibility of doing any thing in ISR for all normal usage of this timer. In
this mode WCV is always programmed well in advance now.

(2)action=1 mechanism was introduced to implement a dump saving mechanism if
watchdog timeout expires before next kick. So, the current upstream version
calls panic() in ISR. When action=1, then we do write WCV now in ISR, but there
too some precaution have been taken. 

When action=1, and we land into isr handler sbsa_gwdt_interrupt() we can not
trust watchdog data structure any more. That might have been corrupted.
(i) So it might happen that gwdt or wdd pointers have a corrupted value and as
soon as we access gwdt->wdd or wdd->timeout, kernel panics. *No harm*, just
panic() is called a bit early, which dump saving mechanism would be able to
find. So, in fact it will give an extra information to dump saving mechanism
that watchdog structure was corrupted as well.
(ii) Another case, It might happen that wdd->timeout has been corrupted with
large values. This patch does a protection while programming WCV in ISR. It
checks wdd->timeout against MAX_TIMEOUT value and reprograms WCV only when
wdd->timeout is lesser than MAX_TIMEOUT. So, here too, there would be watchdog
reset for sure if dump saving mechanism hangs.

~Pratyush

[1] https://lists.linaro.org/pipermail/linaro-acpi/2015-June/004956.html
Guenter Roeck May 3, 2016, 1:29 p.m. UTC | #3
On 05/03/2016 01:20 AM, Pratyush Anand wrote:
> Currently only WOR is used to program both first and second stage which
> provided very limited range of timeout.
>
> This patch uses WCV as well to achieve higher range of timeout. This patch
> programs max_timeout as 255, but that can be increased further as well.
>
> Following testing shows that we can happily achieve 40 second default timeout.
>
>   # modprobe sbsa_gwdt action=1
>   [  131.187562] sbsa-gwdt sbsa-gwdt.0: Initialized with 40s timeout @ 250000000 Hz, action=1.
>   # cd /sys/class/watchdog/watchdog0/
>   # cat state
>   inactive
>   # cat /dev/watchdog0
>   cat: /dev/watchdog0: Invalid argument
>   [  161.710593] watchdog: watchdog0: watchdog did not stop!
>   # cat state
>   active
>   # cat timeout
>   40
>   # cat timeleft
>   38
>   # cat timeleft
>   25
>   # cat /dev/watchdog0
>   cat: /dev/watchdog0: Invalid argument
>   [  184.931030] watchdog: watchdog0: watchdog did not stop!
>   # cat timeleft
>   37
>   # cat timeleft
>   21
>   ...
>   ...
>   # cat timeleft
>   1
>
> panic() is called upon timeout of 40s. See timestamp of last kick (cat) and
> next panic() message.
>
>   [  224.939065] Kernel panic - not syncing: SBSA Watchdog timeout
>
> Signed-off-by: Pratyush Anand <panand@redhat.com>

You could also use the new infrastructure (specify max_hw_heartbeat_ms instead
of max_timeout), and not depend on the correct implementation of WCV.

Overall this adds a lot of complexity for something that could by now easily
be handled by the infrastructure. Is this really necessary ?

Guenter

> ---
>   drivers/watchdog/sbsa_gwdt.c | 83 +++++++++++++++++++++++++++++++++-----------
>   1 file changed, 62 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/watchdog/sbsa_gwdt.c b/drivers/watchdog/sbsa_gwdt.c
> index ad383f6f15fc..529dd5e99fcd 100644
> --- a/drivers/watchdog/sbsa_gwdt.c
> +++ b/drivers/watchdog/sbsa_gwdt.c
> @@ -35,17 +35,23 @@
>    *
>    * SBSA GWDT:
>    * if action is 1 (the two stages mode):
> - * |--------WOR-------WS0--------WOR-------WS1
> + * |--------WCV-------WS0--------WCV-------WS1
>    * |----timeout-----(panic)----timeout-----reset
>    *
>    * if action is 0 (the single stage mode):
> - * |------WOR-----WS0(ignored)-----WOR------WS1
> + * |------WCV-----WS0(ignored)-----WOR------WS1
>    * |--------------timeout-------------------reset
>    *
> - * Note: Since this watchdog timer has two stages, and each stage is determined
> - * by WOR, in the single stage mode, the timeout is (WOR * 2); in the two
> - * stages mode, the timeout is WOR. The maximum timeout in the two stages mode
> - * is half of that in the single stage mode.
> + * Note: This watchdog timer has two stages. If action is 0, first stage is
> + * determined by directly programming WCV and second by WOR. When first
> + * timeout is reached, WS0 is triggered and WCV is reloaded with value in
> + * WOR. WS0 interrupt will be ignored, then the second watch period starts;
> + * when second timeout is reached, then WS1 is triggered, system resets. WCV
> + * and WOR are programmed in such a way that total time corresponding to
> + * WCV+WOR becomes equivalent to user programmed "timeout".
> + * If action is 1, then we expect to call panic() at user programmed
> + * "timeout". Therefore, we program both first and second stage using WCV
> + * only.
>    *
>    */
>
> @@ -95,7 +101,17 @@ struct sbsa_gwdt {
>   	void __iomem		*control_base;
>   };
>
> -#define DEFAULT_TIMEOUT		10 /* seconds */
> +/*
> + * Max Timeout Can be in days, but 255 seconds seems reasonable for all use
> + * cases
> + */
> +#define MAX_TIMEOUT		255

We don't usually define such arbitrary limits.

> +
> +/* Default timeout is 40 seconds, which is the 1st + 2nd watch periods when
> + * action is 0. When action is 1 then both 1st and 2nd watch periods will
> + * be of 40 seconds.
> + */
> +#define DEFAULT_TIMEOUT		40 /* seconds */
>
>   static unsigned int timeout;
>   module_param(timeout, uint, 0);
> @@ -127,20 +143,21 @@ static int sbsa_gwdt_set_timeout(struct watchdog_device *wdd,
>   				 unsigned int timeout)
>   {
>   	struct sbsa_gwdt *gwdt = watchdog_get_drvdata(wdd);
> +	u64 timeout_1, timeout_2;
>
>   	wdd->timeout = timeout;
>
>   	if (action)
> -		writel(gwdt->clk * timeout,
> -		       gwdt->control_base + SBSA_GWDT_WOR);
> +		timeout_1 = (u64)gwdt->clk * timeout;
>   	else
> -		/*
> -		 * In the single stage mode, The first signal (WS0) is ignored,
> -		 * the timeout is (WOR * 2), so the WOR should be configured
> -		 * to half value of timeout.
> -		 */
> -		writel(gwdt->clk / 2 * timeout,
> -		       gwdt->control_base + SBSA_GWDT_WOR);
> +		timeout_1 = (u64)gwdt->clk * (timeout - wdd->min_timeout);
> +
> +	/* when action=1, timeout_2 will be overwritten in ISR */
> +	timeout_2 = (u64)gwdt->clk * wdd->min_timeout;
> +
Why min_timeout ? Also, where is it overwritten in the interrupt handler,
and to which value ?

> +	writel(timeout_2, gwdt->control_base + SBSA_GWDT_WOR);
> +	writeq(timeout_1 + arch_counter_get_cntvct(),
> +		gwdt->control_base + SBSA_GWDT_WCV);
>
>   	return 0;
>   }
> @@ -172,12 +189,17 @@ static int sbsa_gwdt_keepalive(struct watchdog_device *wdd)
>   	struct sbsa_gwdt *gwdt = watchdog_get_drvdata(wdd);
>
>   	/*
> -	 * Writing WRR for an explicit watchdog refresh.
> -	 * You can write anyting (like 0).
> +	 * play safe: program WOR with max value so that we have sufficient
> +	 * time to overwrite them after explicit refresh
>   	 */
> +	writel(U32_MAX, gwdt->control_base + SBSA_GWDT_WOR);
> +	/*
> +	* Writing WRR for an explicit watchdog refresh.
> +	* You can write anyting (like 0).
> +	*/

Please stick with standard multi-line comments.

>   	writel(0, gwdt->refresh_base + SBSA_GWDT_WRR);
>
> -	return 0;
> +	return sbsa_gwdt_set_timeout(wdd, wdd->timeout);;
>   }
>
>   static unsigned int sbsa_gwdt_status(struct watchdog_device *wdd)
> @@ -193,10 +215,15 @@ static int sbsa_gwdt_start(struct watchdog_device *wdd)
>   {
>   	struct sbsa_gwdt *gwdt = watchdog_get_drvdata(wdd);
>
> +	/*
> +	 * play safe: program WOR with max value so that we have sufficient
> +	 * time to overwrite them after explicit refresh
> +	 */
> +	writel(U32_MAX, gwdt->control_base + SBSA_GWDT_WOR);
>   	/* writing WCS will cause an explicit watchdog refresh */
>   	writel(SBSA_GWDT_WCS_EN, gwdt->control_base + SBSA_GWDT_WCS);
>
> -	return 0;
> +	return sbsa_gwdt_set_timeout(wdd, wdd->timeout);;
>   }
>
>   static int sbsa_gwdt_stop(struct watchdog_device *wdd)
> @@ -211,6 +238,20 @@ static int sbsa_gwdt_stop(struct watchdog_device *wdd)
>
>   static irqreturn_t sbsa_gwdt_interrupt(int irq, void *dev_id)
>   {
> +	struct sbsa_gwdt *gwdt = (struct sbsa_gwdt *)dev_id;
> +	struct watchdog_device *wdd = &gwdt->wdd;
> +	u64 timeout_2 = (u64)gwdt->clk * wdd->timeout;
> +
> +	/*
> +	 * Since we can not trust system at this moment, therefore re-write
> +	 * WCV only if wdd->timeout <= MAX_TIMEOUT to avoid a corner
> +	 * scenario when we might have corrupted wdd->timeout values at
> +	 * this point.
> +	 */

This is quite vague. What is this corner scenario where wdd->timeout
would be corrupted ? How can wdd->timeout ever be larger than MAX_TIMEOUT ?

> +	if (wdd->timeout <= MAX_TIMEOUT)
> +		writeq(timeout_2 + arch_counter_get_cntvct(),
> +			gwdt->control_base + SBSA_GWDT_WCV);
> +
>   	panic(WATCHDOG_NAME " timeout");
>
>   	return IRQ_HANDLED;
> @@ -273,7 +314,7 @@ static int sbsa_gwdt_probe(struct platform_device *pdev)
>   	wdd->info = &sbsa_gwdt_info;
>   	wdd->ops = &sbsa_gwdt_ops;
>   	wdd->min_timeout = 1;
> -	wdd->max_timeout = U32_MAX / gwdt->clk;
> +	wdd->max_timeout = MAX_TIMEOUT;
>   	wdd->timeout = DEFAULT_TIMEOUT;
>   	watchdog_set_drvdata(wdd, gwdt);
>   	watchdog_set_nowayout(wdd, nowayout);
>
Guenter Roeck May 3, 2016, 1:47 p.m. UTC | #4
On 05/03/2016 06:24 AM, Pratyush Anand wrote:
> On 03/05/2016:07:12:04 AM, Timur Tabi wrote:
>> Pratyush Anand wrote:
>>> + * Note: This watchdog timer has two stages. If action is 0, first stage is
>>> + * determined by directly programming WCV and second by WOR. When first
>>> + * timeout is reached, WS0 is triggered and WCV is reloaded with value in
>>> + * WOR. WS0 interrupt will be ignored, then the second watch period starts;
>>> + * when second timeout is reached, then WS1 is triggered, system resets. WCV
>>> + * and WOR are programmed in such a way that total time corresponding to
>>> + * WCV+WOR becomes equivalent to user programmed "timeout".
>>> + * If action is 1, then we expect to call panic() at user programmed
>>> + * "timeout". Therefore, we program both first and second stage using WCV
>>> + * only.
>>
>> So I'm not sure I understand how this works yet, but there was an earlier
>> version of Fu's driver that did something similar.  It depended on being
>> able to reprogram the hardware during the WS0 interrupt, and that was
>> rejected by the community.
>>
>> How is what you are doing different?
>
> * Following was the comment for Fu Wei's primitive version of patch [1], because
> * of which community rejected it.
>
>> The triggering of the hardware reset should never depend on an interrupt being
>> handled properly.  You should always program WCV correctly in advance.
>
> Now, there are couple of things different:
>
> (1) There is an important difference in upstreamed version than the version [1]
> which was rejected on above ground. In upstreamed version,  there would be no
> interrupt handler when we are in normal mode ie action=0.  So, there is no
> possibility of doing any thing in ISR for all normal usage of this timer. In
> this mode WCV is always programmed well in advance now.
>
> (2)action=1 mechanism was introduced to implement a dump saving mechanism if
> watchdog timeout expires before next kick. So, the current upstream version
> calls panic() in ISR. When action=1, then we do write WCV now in ISR, but there
> too some precaution have been taken.
>
> When action=1, and we land into isr handler sbsa_gwdt_interrupt() we can not
> trust watchdog data structure any more. That might have been corrupted.

Why ?

> (i) So it might happen that gwdt or wdd pointers have a corrupted value and as
> soon as we access gwdt->wdd or wdd->timeout, kernel panics. *No harm*, just
> panic() is called a bit early, which dump saving mechanism would be able to
> find. So, in fact it will give an extra information to dump saving mechanism
> that watchdog structure was corrupted as well.
> (ii) Another case, It might happen that wdd->timeout has been corrupted with
> large values. This patch does a protection while programming WCV in ISR. It

How would wdd->timeout be corrupted ?

If what you are say is correct, the interrupt handler can not be trusted, period,
and should be disabled entirely.

Guenter

> checks wdd->timeout against MAX_TIMEOUT value and reprograms WCV only when
> wdd->timeout is lesser than MAX_TIMEOUT. So, here too, there would be watchdog
> reset for sure if dump saving mechanism hangs.
>
> ~Pratyush
>
> [1] https://lists.linaro.org/pipermail/linaro-acpi/2015-June/004956.html
>
Pratyush Anand May 3, 2016, 2:17 p.m. UTC | #5
Hi Guenter,

On 03/05/2016:06:47:12 AM, Guenter Roeck wrote:
> On 05/03/2016 06:24 AM, Pratyush Anand wrote:
> >On 03/05/2016:07:12:04 AM, Timur Tabi wrote:
> >>Pratyush Anand wrote:
> >>>+ * Note: This watchdog timer has two stages. If action is 0, first stage is
> >>>+ * determined by directly programming WCV and second by WOR. When first
> >>>+ * timeout is reached, WS0 is triggered and WCV is reloaded with value in
> >>>+ * WOR. WS0 interrupt will be ignored, then the second watch period starts;
> >>>+ * when second timeout is reached, then WS1 is triggered, system resets. WCV
> >>>+ * and WOR are programmed in such a way that total time corresponding to
> >>>+ * WCV+WOR becomes equivalent to user programmed "timeout".
> >>>+ * If action is 1, then we expect to call panic() at user programmed
> >>>+ * "timeout". Therefore, we program both first and second stage using WCV
> >>>+ * only.
> >>
> >>So I'm not sure I understand how this works yet, but there was an earlier
> >>version of Fu's driver that did something similar.  It depended on being
> >>able to reprogram the hardware during the WS0 interrupt, and that was
> >>rejected by the community.
> >>
> >>How is what you are doing different?
> >
> >* Following was the comment for Fu Wei's primitive version of patch [1], because
> >* of which community rejected it.
> >
> >>The triggering of the hardware reset should never depend on an interrupt being
> >>handled properly.  You should always program WCV correctly in advance.
> >
> >Now, there are couple of things different:
> >
> >(1) There is an important difference in upstreamed version than the version [1]
> >which was rejected on above ground. In upstreamed version,  there would be no
> >interrupt handler when we are in normal mode ie action=0.  So, there is no
> >possibility of doing any thing in ISR for all normal usage of this timer. In
> >this mode WCV is always programmed well in advance now.
> >
> >(2)action=1 mechanism was introduced to implement a dump saving mechanism if
> >watchdog timeout expires before next kick. So, the current upstream version
> >calls panic() in ISR. When action=1, then we do write WCV now in ISR, but there
> >too some precaution have been taken.
> >
> >When action=1, and we land into isr handler sbsa_gwdt_interrupt() we can not
> >trust watchdog data structure any more. That might have been corrupted.
> 
> Why ?
> 
> >(i) So it might happen that gwdt or wdd pointers have a corrupted value and as
> >soon as we access gwdt->wdd or wdd->timeout, kernel panics. *No harm*, just
> >panic() is called a bit early, which dump saving mechanism would be able to
> >find. So, in fact it will give an extra information to dump saving mechanism
> >that watchdog structure was corrupted as well.
> >(ii) Another case, It might happen that wdd->timeout has been corrupted with
> >large values. This patch does a protection while programming WCV in ISR. It
> 
> How would wdd->timeout be corrupted ?

Most likely it will not be corrupted. But, if ISR has been called it means
something went wrong, and watchdog was not kicked for the time programmed as
"timeout". So, probably we should be extra careful.

Anyway, there is no side effect of checking wdd->timeout against MAX_TIMEOUT.

Moreover, I was unable to envisage any regression with this patch other than
that it introduces a bit complexity. But, then it gives us an advantage of
having any timeout value.

> If what you are say is correct, the interrupt handler can not be trusted, period,
> and should be disabled entirely.

Yes, When system is hang, then probably we can not trust even interrupt handler.
We can not be 100% sure that handler will be called, but that is the case with
existing upstream code as well. In such situation upstream code will also
misbehave. There is little we can do for such cases.

If one wants to be fully assured for a hardware watchdog reset then the choice
is action=0. Even after current patch, action=0 will ensure that there would be
a hardware reset upon timeout.

~Pratyush
Pratyush Anand May 3, 2016, 2:38 p.m. UTC | #6
Hi Guenter,

On 03/05/2016:06:29:39 AM, Guenter Roeck wrote:
> On 05/03/2016 01:20 AM, Pratyush Anand wrote:
> >Currently only WOR is used to program both first and second stage which
> >provided very limited range of timeout.
> >
> >This patch uses WCV as well to achieve higher range of timeout. This patch
> >programs max_timeout as 255, but that can be increased further as well.
> >
> >Following testing shows that we can happily achieve 40 second default timeout.
> >
> >  # modprobe sbsa_gwdt action=1
> >  [  131.187562] sbsa-gwdt sbsa-gwdt.0: Initialized with 40s timeout @ 250000000 Hz, action=1.
> >  # cd /sys/class/watchdog/watchdog0/
> >  # cat state
> >  inactive
> >  # cat /dev/watchdog0
> >  cat: /dev/watchdog0: Invalid argument
> >  [  161.710593] watchdog: watchdog0: watchdog did not stop!
> >  # cat state
> >  active
> >  # cat timeout
> >  40
> >  # cat timeleft
> >  38
> >  # cat timeleft
> >  25
> >  # cat /dev/watchdog0
> >  cat: /dev/watchdog0: Invalid argument
> >  [  184.931030] watchdog: watchdog0: watchdog did not stop!
> >  # cat timeleft
> >  37
> >  # cat timeleft
> >  21
> >  ...
> >  ...
> >  # cat timeleft
> >  1
> >
> >panic() is called upon timeout of 40s. See timestamp of last kick (cat) and
> >next panic() message.
> >
> >  [  224.939065] Kernel panic - not syncing: SBSA Watchdog timeout
> >
> >Signed-off-by: Pratyush Anand <panand@redhat.com>
> 
> You could also use the new infrastructure (specify max_hw_heartbeat_ms instead
> of max_timeout), and not depend on the correct implementation of WCV.

Thanks for pointing to max_hw_heartbeat_ms. Just gone through it. Certainly it
would be helpful, and some part of this patch will go away. 

In fact after supporting max_hw_heartbeat_ms, there should be no change for
action=0 functionally. However, we would still need some changes for action=1.

When action=1, isr is called, which calls panic(). Calling panic() will further
trigger a dump saving mechanism, which can cause to execute a secondary kernel.
Now, it might happen that with the limited timeout (max_hw_heartbeat_ms)
programmed in first kernel, we land into a reset before secondary kernel could
start kicking it again or would complete dump save. 
So, in my opinion:
(1) We should use max_hw_heartbeat_ms.
(2) Then we should overwrite WCV in ISR so that it ensures a timeout of user
programmed "timeout" value for hardware reset.

~Pratyush
Guenter Roeck May 3, 2016, 2:46 p.m. UTC | #7
On 05/03/2016 07:17 AM, Pratyush Anand wrote:
> Hi Guenter,
>
> On 03/05/2016:06:47:12 AM, Guenter Roeck wrote:
>> On 05/03/2016 06:24 AM, Pratyush Anand wrote:
>>> On 03/05/2016:07:12:04 AM, Timur Tabi wrote:
>>>> Pratyush Anand wrote:
>>>>> + * Note: This watchdog timer has two stages. If action is 0, first stage is
>>>>> + * determined by directly programming WCV and second by WOR. When first
>>>>> + * timeout is reached, WS0 is triggered and WCV is reloaded with value in
>>>>> + * WOR. WS0 interrupt will be ignored, then the second watch period starts;
>>>>> + * when second timeout is reached, then WS1 is triggered, system resets. WCV
>>>>> + * and WOR are programmed in such a way that total time corresponding to
>>>>> + * WCV+WOR becomes equivalent to user programmed "timeout".
>>>>> + * If action is 1, then we expect to call panic() at user programmed
>>>>> + * "timeout". Therefore, we program both first and second stage using WCV
>>>>> + * only.
>>>>
>>>> So I'm not sure I understand how this works yet, but there was an earlier
>>>> version of Fu's driver that did something similar.  It depended on being
>>>> able to reprogram the hardware during the WS0 interrupt, and that was
>>>> rejected by the community.
>>>>
>>>> How is what you are doing different?
>>>
>>> * Following was the comment for Fu Wei's primitive version of patch [1], because
>>> * of which community rejected it.
>>>
>>>> The triggering of the hardware reset should never depend on an interrupt being
>>>> handled properly.  You should always program WCV correctly in advance.
>>>
>>> Now, there are couple of things different:
>>>
>>> (1) There is an important difference in upstreamed version than the version [1]
>>> which was rejected on above ground. In upstreamed version,  there would be no
>>> interrupt handler when we are in normal mode ie action=0.  So, there is no
>>> possibility of doing any thing in ISR for all normal usage of this timer. In
>>> this mode WCV is always programmed well in advance now.
>>>
>>> (2)action=1 mechanism was introduced to implement a dump saving mechanism if
>>> watchdog timeout expires before next kick. So, the current upstream version
>>> calls panic() in ISR. When action=1, then we do write WCV now in ISR, but there
>>> too some precaution have been taken.
>>>
>>> When action=1, and we land into isr handler sbsa_gwdt_interrupt() we can not
>>> trust watchdog data structure any more. That might have been corrupted.
>>
>> Why ?
>>
>>> (i) So it might happen that gwdt or wdd pointers have a corrupted value and as
>>> soon as we access gwdt->wdd or wdd->timeout, kernel panics. *No harm*, just
>>> panic() is called a bit early, which dump saving mechanism would be able to
>>> find. So, in fact it will give an extra information to dump saving mechanism
>>> that watchdog structure was corrupted as well.
>>> (ii) Another case, It might happen that wdd->timeout has been corrupted with
>>> large values. This patch does a protection while programming WCV in ISR. It
>>
>> How would wdd->timeout be corrupted ?
>
> Most likely it will not be corrupted. But, if ISR has been called it means
> something went wrong, and watchdog was not kicked for the time programmed as
> "timeout". So, probably we should be extra careful.
>
This logic would apply to _every_ watchdog driver implementing interrupts.
Actually, it would apply to _all_ kernel code, and the logic could be used
to introduce hyper-defensive programming all over the place, bloat the kernel
and ultimately make it all but unusable. I do not believe in such programming
in an operating system kernel.

On top of that, the assumption that the kernel would be still sane enough
to call the interrupt handler, but not sane enough to actually execute it,
seems to be a bit far-fetched.

> Anyway, there is no side effect of checking wdd->timeout against MAX_TIMEOUT.
>
The side effect is increased code size.

> Moreover, I was unable to envisage any regression with this patch other than
> that it introduces a bit complexity. But, then it gives us an advantage of
> having any timeout value.
>
>> If what you are say is correct, the interrupt handler can not be trusted, period,
>> and should be disabled entirely.
>
> Yes, When system is hang, then probably we can not trust even interrupt handler.
> We can not be 100% sure that handler will be called, but that is the case with
> existing upstream code as well. In such situation upstream code will also
> misbehave. There is little we can do for such cases.
>
... other than wait for the hard reset, which will be just fine, at least
with the current code.

> If one wants to be fully assured for a hardware watchdog reset then the choice
> is action=0. Even after current patch, action=0 will ensure that there would be
> a hardware reset upon timeout.
>
Sorry, I don't buy into this. You'll have to convince Wim, or simply
change the driver to use the infrastructure to achieve larger timeouts.

Thanks,
Guenter
Timur Tabi May 3, 2016, 3:04 p.m. UTC | #8
Guenter Roeck wrote:
>> Most likely it will not be corrupted. But, if ISR has been called it
>> means
>> something went wrong, and watchdog was not kicked for the time
>> programmed as
>> "timeout". So, probably we should be extra careful.
>>
> This logic would apply to _every_ watchdog driver implementing interrupts.
> Actually, it would apply to _all_ kernel code, and the logic could be used
> to introduce hyper-defensive programming all over the place, bloat the
> kernel
> and ultimately make it all but unusable. I do not believe in such
> programming
> in an operating system kernel.
>
> On top of that, the assumption that the kernel would be still sane enough
> to call the interrupt handler, but not sane enough to actually execute it,
> seems to be a bit far-fetched.

Agreed.  Pratyush, have you ever seen any watchdog register get 
corrupted, like you describe?  It just seems like you're imagining a 
problem that has never occurred.
Timur Tabi May 3, 2016, 3:07 p.m. UTC | #9
Pratyush Anand wrote:
> In fact after supporting max_hw_heartbeat_ms, there should be no change for
> action=0 functionally. However, we would still need some changes for action=1.

IMHO, action=1 is more of a debugging option, and not something that 
would be used normally.  I would need to see some evidence that real 
users want to have action=1 and a longer timeout.

I've never been a fan of the action=1 option, and I'm certainly not keen 
any patches that make action=1 more complicated than it already is.
Pratyush Anand May 3, 2016, 3:51 p.m. UTC | #10
On 03/05/2016:10:07:48 AM, Timur Tabi wrote:
> Pratyush Anand wrote:
> >In fact after supporting max_hw_heartbeat_ms, there should be no change for
> >action=0 functionally. However, we would still need some changes for action=1.
> 
> IMHO, action=1 is more of a debugging option, and not something that would
> be used normally.  I would need to see some evidence that real users want to
> have action=1 and a longer timeout.
> 
If action=1 need to be used effectively, then we should have something which
would help to increase timeout values.

Currently you have only 10 second to execute secondary kernel, which might not
be sufficient.

> I've never been a fan of the action=1 option, and I'm certainly not keen any
> patches that make action=1 more complicated than it already is.

I think, with max_hw_heartbeat_ms it would be far more simpler. Will attempt and
send another RFC.

~Pratyush
Guenter Roeck May 3, 2016, 5:16 p.m. UTC | #11
On Tue, May 03, 2016 at 09:21:41PM +0530, Pratyush Anand wrote:
> On 03/05/2016:10:07:48 AM, Timur Tabi wrote:
> > Pratyush Anand wrote:
> > >In fact after supporting max_hw_heartbeat_ms, there should be no change for
> > >action=0 functionally. However, we would still need some changes for action=1.
> > 
> > IMHO, action=1 is more of a debugging option, and not something that would
> > be used normally.  I would need to see some evidence that real users want to
> > have action=1 and a longer timeout.
> > 
> If action=1 need to be used effectively, then we should have something which
> would help to increase timeout values.
> 
> Currently you have only 10 second to execute secondary kernel, which might not
> be sufficient.
> 
Previously the argument was that the 10 seconds (assuming the clock runs at
maximum speed) would not be sufficient to load the watchdog application. Now it
seems the 10 seconds are deemed insufficient to load the watchdog driver (since
the infrastructure can handle the heartbeats). Is there actual evidence that
this is the case ?

Guenter

> > I've never been a fan of the action=1 option, and I'm certainly not keen any
> > patches that make action=1 more complicated than it already is.
> 
> I think, with max_hw_heartbeat_ms it would be far more simpler. Will attempt and
> send another RFC.
> 
> ~Pratyush
diff mbox

Patch

diff --git a/drivers/watchdog/sbsa_gwdt.c b/drivers/watchdog/sbsa_gwdt.c
index ad383f6f15fc..529dd5e99fcd 100644
--- a/drivers/watchdog/sbsa_gwdt.c
+++ b/drivers/watchdog/sbsa_gwdt.c
@@ -35,17 +35,23 @@ 
  *
  * SBSA GWDT:
  * if action is 1 (the two stages mode):
- * |--------WOR-------WS0--------WOR-------WS1
+ * |--------WCV-------WS0--------WCV-------WS1
  * |----timeout-----(panic)----timeout-----reset
  *
  * if action is 0 (the single stage mode):
- * |------WOR-----WS0(ignored)-----WOR------WS1
+ * |------WCV-----WS0(ignored)-----WOR------WS1
  * |--------------timeout-------------------reset
  *
- * Note: Since this watchdog timer has two stages, and each stage is determined
- * by WOR, in the single stage mode, the timeout is (WOR * 2); in the two
- * stages mode, the timeout is WOR. The maximum timeout in the two stages mode
- * is half of that in the single stage mode.
+ * Note: This watchdog timer has two stages. If action is 0, first stage is
+ * determined by directly programming WCV and second by WOR. When first
+ * timeout is reached, WS0 is triggered and WCV is reloaded with value in
+ * WOR. WS0 interrupt will be ignored, then the second watch period starts;
+ * when second timeout is reached, then WS1 is triggered, system resets. WCV
+ * and WOR are programmed in such a way that total time corresponding to
+ * WCV+WOR becomes equivalent to user programmed "timeout".
+ * If action is 1, then we expect to call panic() at user programmed
+ * "timeout". Therefore, we program both first and second stage using WCV
+ * only.
  *
  */
 
@@ -95,7 +101,17 @@  struct sbsa_gwdt {
 	void __iomem		*control_base;
 };
 
-#define DEFAULT_TIMEOUT		10 /* seconds */
+/*
+ * Max Timeout Can be in days, but 255 seconds seems reasonable for all use
+ * cases
+ */
+#define MAX_TIMEOUT		255
+
+/* Default timeout is 40 seconds, which is the 1st + 2nd watch periods when
+ * action is 0. When action is 1 then both 1st and 2nd watch periods will
+ * be of 40 seconds.
+ */
+#define DEFAULT_TIMEOUT		40 /* seconds */
 
 static unsigned int timeout;
 module_param(timeout, uint, 0);
@@ -127,20 +143,21 @@  static int sbsa_gwdt_set_timeout(struct watchdog_device *wdd,
 				 unsigned int timeout)
 {
 	struct sbsa_gwdt *gwdt = watchdog_get_drvdata(wdd);
+	u64 timeout_1, timeout_2;
 
 	wdd->timeout = timeout;
 
 	if (action)
-		writel(gwdt->clk * timeout,
-		       gwdt->control_base + SBSA_GWDT_WOR);
+		timeout_1 = (u64)gwdt->clk * timeout;
 	else
-		/*
-		 * In the single stage mode, The first signal (WS0) is ignored,
-		 * the timeout is (WOR * 2), so the WOR should be configured
-		 * to half value of timeout.
-		 */
-		writel(gwdt->clk / 2 * timeout,
-		       gwdt->control_base + SBSA_GWDT_WOR);
+		timeout_1 = (u64)gwdt->clk * (timeout - wdd->min_timeout);
+
+	/* when action=1, timeout_2 will be overwritten in ISR */
+	timeout_2 = (u64)gwdt->clk * wdd->min_timeout;
+
+	writel(timeout_2, gwdt->control_base + SBSA_GWDT_WOR);
+	writeq(timeout_1 + arch_counter_get_cntvct(),
+		gwdt->control_base + SBSA_GWDT_WCV);
 
 	return 0;
 }
@@ -172,12 +189,17 @@  static int sbsa_gwdt_keepalive(struct watchdog_device *wdd)
 	struct sbsa_gwdt *gwdt = watchdog_get_drvdata(wdd);
 
 	/*
-	 * Writing WRR for an explicit watchdog refresh.
-	 * You can write anyting (like 0).
+	 * play safe: program WOR with max value so that we have sufficient
+	 * time to overwrite them after explicit refresh
 	 */
+	writel(U32_MAX, gwdt->control_base + SBSA_GWDT_WOR);
+	/*
+	* Writing WRR for an explicit watchdog refresh.
+	* You can write anyting (like 0).
+	*/
 	writel(0, gwdt->refresh_base + SBSA_GWDT_WRR);
 
-	return 0;
+	return sbsa_gwdt_set_timeout(wdd, wdd->timeout);;
 }
 
 static unsigned int sbsa_gwdt_status(struct watchdog_device *wdd)
@@ -193,10 +215,15 @@  static int sbsa_gwdt_start(struct watchdog_device *wdd)
 {
 	struct sbsa_gwdt *gwdt = watchdog_get_drvdata(wdd);
 
+	/*
+	 * play safe: program WOR with max value so that we have sufficient
+	 * time to overwrite them after explicit refresh
+	 */
+	writel(U32_MAX, gwdt->control_base + SBSA_GWDT_WOR);
 	/* writing WCS will cause an explicit watchdog refresh */
 	writel(SBSA_GWDT_WCS_EN, gwdt->control_base + SBSA_GWDT_WCS);
 
-	return 0;
+	return sbsa_gwdt_set_timeout(wdd, wdd->timeout);;
 }
 
 static int sbsa_gwdt_stop(struct watchdog_device *wdd)
@@ -211,6 +238,20 @@  static int sbsa_gwdt_stop(struct watchdog_device *wdd)
 
 static irqreturn_t sbsa_gwdt_interrupt(int irq, void *dev_id)
 {
+	struct sbsa_gwdt *gwdt = (struct sbsa_gwdt *)dev_id;
+	struct watchdog_device *wdd = &gwdt->wdd;
+	u64 timeout_2 = (u64)gwdt->clk * wdd->timeout;
+
+	/*
+	 * Since we can not trust system at this moment, therefore re-write
+	 * WCV only if wdd->timeout <= MAX_TIMEOUT to avoid a corner
+	 * scenario when we might have corrupted wdd->timeout values at
+	 * this point.
+	 */
+	if (wdd->timeout <= MAX_TIMEOUT)
+		writeq(timeout_2 + arch_counter_get_cntvct(),
+			gwdt->control_base + SBSA_GWDT_WCV);
+
 	panic(WATCHDOG_NAME " timeout");
 
 	return IRQ_HANDLED;
@@ -273,7 +314,7 @@  static int sbsa_gwdt_probe(struct platform_device *pdev)
 	wdd->info = &sbsa_gwdt_info;
 	wdd->ops = &sbsa_gwdt_ops;
 	wdd->min_timeout = 1;
-	wdd->max_timeout = U32_MAX / gwdt->clk;
+	wdd->max_timeout = MAX_TIMEOUT;
 	wdd->timeout = DEFAULT_TIMEOUT;
 	watchdog_set_drvdata(wdd, gwdt);
 	watchdog_set_nowayout(wdd, nowayout);