diff mbox

[RFC] Watchdog: sbsa_gwdt: Enhance timeout range

Message ID 20160504141449.GG13045@dhcppc6.redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Pratyush Anand May 4, 2016, 2:14 p.m. UTC
Hi Guenter,

On 03/05/2016:10:16:02 AM, Guenter Roeck wrote:
> On Tue, May 03, 2016 at 09:21:41PM +0530, Pratyush Anand wrote:
> > On 03/05/2016:10:07:48 AM, Timur Tabi wrote:
> > > Pratyush Anand wrote:
> > > >In fact after supporting max_hw_heartbeat_ms, there should be no change for
> > > >action=0 functionally. However, we would still need some changes for action=1.
> > > 
> > > IMHO, action=1 is more of a debugging option, and not something that would
> > > be used normally.  I would need to see some evidence that real users want to
> > > have action=1 and a longer timeout.
> > > 
> > If action=1 need to be used effectively, then we should have something which
> > would help to increase timeout values.
> > 
> > Currently you have only 10 second to execute secondary kernel, which might not
> > be sufficient.
> > 
> Previously the argument was that the 10 seconds (assuming the clock runs at
> maximum speed) would not be sufficient to load the watchdog application. Now it

May be you meant "would be sufficient". OK..let me clarify on it.

Currently it takes 7-8 second from the point 1st kernel panics to the point
second kernel boots. (Given, we have D-cache enabled in kexec-tools, for which
community is not yet agreed), anyway..so, it is safe for me as of now. But,
there is only 2-3 second margin. So, I am not sure if all sort of secondary
kernel will be able to make it in that time.

Following minimal code will be able to extend timeout for secondary kernel, and
I do not see anything wrong in it. We are anyway, panicking in ISR, so what
could be disadvantage if we write a wdt register just before panicking?


~Pratyush

Comments

Timur Tabi May 4, 2016, 2:21 p.m. UTC | #1
Pratyush Anand wrote:
> static irqreturn_t sbsa_gwdt_interrupt(int irq, void *dev_id)
> {
> +    struct sbsa_gwdt *gwdt = (struct sbsa_gwdt *)dev_id;
> +    struct watchdog_device *wdd = &gwdt->wdd;
> +    u64 timeout = (u64)gwdt->clk * wdd->timeout;
> +
> +    writeq(timeout + arch_counter_get_cntvct(),
> +                    gwdt->control_base + SBSA_GWDT_WCV);
> +
>       panic(WATCHDOG_NAME " timeout");

I'm on the fence about this.

On one hand, I have always opposed the idea that the interrupt handler 
needs to function properly in order for the timeout to be correct.  Fu's 
original patch required this for every timeout.

The current code, however, only uses the interrupt when action=1.  In 
this case, WCV is only reprogrammed in order to prevent the system from 
resetting during the kexec.  Technically, the watchdog timeout has 
already been handled.

However, this should be unnecessary, because it can't be a problem 
that's unique to the SBSA watchdog.  Every system that kexecs another 
kernel needs to be able to handle a watchdog timeout.  Shouldn't the 
kexec code already ping or disable the watchdog?  We need a 
cross-platform solution.  Drivers should not need to do this.
Pratyush Anand May 4, 2016, 3:59 p.m. UTC | #2
+Dave

Hi Timur,

On 04/05/2016:09:21:43 AM, Timur Tabi wrote:
> Pratyush Anand wrote:
> >static irqreturn_t sbsa_gwdt_interrupt(int irq, void *dev_id)
> >{
> >+    struct sbsa_gwdt *gwdt = (struct sbsa_gwdt *)dev_id;
> >+    struct watchdog_device *wdd = &gwdt->wdd;
> >+    u64 timeout = (u64)gwdt->clk * wdd->timeout;
> >+
> >+    writeq(timeout + arch_counter_get_cntvct(),
> >+                    gwdt->control_base + SBSA_GWDT_WCV);
> >+
> >      panic(WATCHDOG_NAME " timeout");
> 
> I'm on the fence about this.
> 
> On one hand, I have always opposed the idea that the interrupt handler needs
> to function properly in order for the timeout to be correct.  Fu's original
> patch required this for every timeout.
> 
> The current code, however, only uses the interrupt when action=1.  In this
> case, WCV is only reprogrammed in order to prevent the system from resetting
> during the kexec.  Technically, the watchdog timeout has already been
> handled.

Yes.

> 
> However, this should be unnecessary, because it can't be a problem that's
> unique to the SBSA watchdog.  Every system that kexecs another kernel needs
> to be able to handle a watchdog timeout.  Shouldn't the kexec code already
> ping or disable the watchdog?  We need a cross-platform solution.  Drivers
> should not need to do this.

Its unique to SBSA because you have very little timeout here. kexec-tools
upstream does not have any mechanism to handle watchdog timeout. Lets say even
if we implement a framework there, the best it can do is to ping the watchdog
again. Disabling should not be an option in kexec-tools, because in that case if
kexec-tools or secondary kernel stuck, we won't have a way out.
Now, even if we ping it once in kexec tools, we will have to make sure that
watchdog driver's probe is called before timeout. Therefore, user must have a
way to specify this timeout, so that if a particular kernel take more time to
boot then he can increase the timeout. Given, these variable conditions I do not
see much advantage of implementing it in kexec-tools.

However fedora/rhel kedumpctl mechanism does some  best case correction. It
makes sure that watchdog module is loaded in second kernel if watchdog was
active during first kernel, and loaded as early as possible [1].

~Pratyush

[1] https://github.com/pratyushanand/kexec-tools/commits/watchdog_fmaster
Timur Tabi May 4, 2016, 4:17 p.m. UTC | #3
Pratyush Anand wrote:
> Its unique to SBSA because you have very little timeout here. kexec-tools
> upstream does not have any mechanism to handle watchdog timeout. Lets say even
> if we implement a framework there, the best it can do is to ping the watchdog
> again.

Ok, so it's more accurate to say that kexec has a minimum watchdog 
timeout requirement.  What happens if the system admin sets the timeout 
to 5 seconds arbitrarily?  The system will reset during kexec, no matter 
which hardware is used.

This still sounds like a band-aid to me.  We're just assuming that we 
need a timeout of at least 20 seconds to support kexec.  Frankly, this 
still sounds like a problem the kexec developers needs to acknowledge 
and deal with.

Still I'm okay with a patch that extends the timeout by programming WCV, 
but it has to be commented as a hack specifically to support kexec 
because the timeout might be too short.  Then Wim can decide whether he 
supports such changes.
Guenter Roeck May 5, 2016, 4:43 p.m. UTC | #4
On Wed, May 04, 2016 at 11:17:29AM -0500, Timur Tabi wrote:
> Pratyush Anand wrote:
> >Its unique to SBSA because you have very little timeout here. kexec-tools
> >upstream does not have any mechanism to handle watchdog timeout. Lets say even
> >if we implement a framework there, the best it can do is to ping the watchdog
> >again.
> 
> Ok, so it's more accurate to say that kexec has a minimum watchdog timeout
> requirement.  What happens if the system admin sets the timeout to 5 seconds
> arbitrarily?  The system will reset during kexec, no matter which hardware
> is used.
> 
> This still sounds like a band-aid to me.  We're just assuming that we need a
> timeout of at least 20 seconds to support kexec.  Frankly, this still sounds
> like a problem the kexec developers needs to acknowledge and deal with.
> 
> Still I'm okay with a patch that extends the timeout by programming WCV, but
> it has to be commented as a hack specifically to support kexec because the
> timeout might be too short.  Then Wim can decide whether he supports such
> changes.
> 
I don't even understand how kexec-tools is involved in the first place.
kexec-tools sounds like user space, which should execute _after_ the kernel
and its modules are loaded (assuming modules are loaded from initramfs).
If kexec-tools can somehow ping the watchdog (presumably by writing into
the HW directly), I don't understand why it doesn't simply load the watchdog
driver instead and let the watchdog core handle the heartbeats.

I am really missing something here. How can kexec-tools do anything before
the kernel is loaded ?

Guenter
Pratyush Anand May 5, 2016, 6:20 p.m. UTC | #5
+kexec list

Hi Guenter,

On 05/05/2016:09:43:00 AM, Guenter Roeck wrote:
> On Wed, May 04, 2016 at 11:17:29AM -0500, Timur Tabi wrote:
> > Pratyush Anand wrote:
> > >Its unique to SBSA because you have very little timeout here. kexec-tools
> > >upstream does not have any mechanism to handle watchdog timeout. Lets say even
> > >if we implement a framework there, the best it can do is to ping the watchdog
> > >again.
> > 
> > Ok, so it's more accurate to say that kexec has a minimum watchdog timeout
> > requirement.  What happens if the system admin sets the timeout to 5 seconds
> > arbitrarily?  The system will reset during kexec, no matter which hardware
> > is used.
> > 
> > This still sounds like a band-aid to me.  We're just assuming that we need a
> > timeout of at least 20 seconds to support kexec.  Frankly, this still sounds
> > like a problem the kexec developers needs to acknowledge and deal with.
> > 
> > Still I'm okay with a patch that extends the timeout by programming WCV, but
> > it has to be commented as a hack specifically to support kexec because the
> > timeout might be too short.  Then Wim can decide whether he supports such
> > changes.
> > 
> I don't even understand how kexec-tools is involved in the first place.
> kexec-tools sounds like user space, which should execute _after_ the kernel

So _after_ the 1st kernel and _before_ the second kernel. It is an application
for the 1st kernel, which creates a tiny boot loader for 2nd kernel. After the
1st kernel is loaded, kexec-tools is executed in user space, which provides a
sane 2nd kernel and initramfs to the kernel using kexec() system call. Now 1st
kernel keep these information loaded into a specific memory called "Crash
Kernel" memory. When 1st kernel crashes, kernel kexec code passes control to
kexec boot loader, which does sha verification of 2nd kernel and initramfs and
passes control to 2nd kernel.

> and its modules are loaded (assuming modules are loaded from initramfs).
> If kexec-tools can somehow ping the watchdog (presumably by writing into
> the HW directly), I don't understand why it doesn't simply load the watchdog
> driver instead and let the watchdog core handle the heartbeats.

Because that tiny boot loader (which called purgatory) does not have any
knowledge about driver.

> 
> I am really missing something here. How can kexec-tools do anything before
> the kernel is loaded ?

So, if we _do_  _not_ go with the current version of patch, probably this could
be the only available option. However, Even when we would kick watchdog once in
kexec boot loader, we will have to make sure the 2nd kernel is light enough to
load watchdog module before timeout.  I think, in the long run we must have SBSA
watchdog specification improvement to keep WOR as 64 bit.

~Pratyush
Timur Tabi May 5, 2016, 6:22 p.m. UTC | #6
Pratyush Anand wrote:
> I think, in the long run we must have SBSA
> watchdog specification improvement to keep WOR as 64 bit.

I agree with this 100%.  IMHO, using a 32-bit WOR was just a bad decision.
Guenter Roeck May 5, 2016, 11:36 p.m. UTC | #7
On 05/05/2016 11:22 AM, Timur Tabi wrote:
> Pratyush Anand wrote:
>> I think, in the long run we must have SBSA
>> watchdog specification improvement to keep WOR as 64 bit.
>
> I agree with this 100%.  IMHO, using a 32-bit WOR was just a bad decision.
>

A 32-bit counter is absolutely fine. Letting it run with a 400MHz clock
(or was it 200 MHz ?) is the problem. A resolution of 2.5ns for a watchdog
timer does not really make any sense.

Guenter
Timur Tabi May 5, 2016, 11:38 p.m. UTC | #8
Guenter Roeck wrote:
> A 32-bit counter is absolutely fine. Letting it run with a 400MHz clock
> (or was it 200 MHz ?) is the problem. A resolution of 2.5ns for a watchdog
> timer does not really make any sense.

The 10 second limit is based on a 20MHz clock.
Timur Tabi May 5, 2016, 11:45 p.m. UTC | #9
Timur Tabi wrote:
>
>> A 32-bit counter is absolutely fine. Letting it run with a 400MHz clock
>> (or was it 200 MHz ?) is the problem. A resolution of 2.5ns for a
>> watchdog
>> timer does not really make any sense.
>
> The 10 second limit is based on a 20MHz clock.

No, that's not true.  I misread the code.  I knew something was wrong, 
but it didn't click until just now.

The default timeout is 10 seconds.  The max timeout on a 20MHz system 
(which is what we're running) is over 200 seconds.

The problem is that Pratyush's system is running at a clock that's way 
too fast:

  [  131.187562] sbsa-gwdt sbsa-gwdt.0: Initialized with 40s timeout @ 
250000000 Hz, action=1.

250MHz is unreasonable.  Pratyush, why is your system counter so high? 
On our ARM64 system, it's set to 20MHz.
Guenter Roeck May 5, 2016, 11:51 p.m. UTC | #10
On 05/05/2016 04:38 PM, Timur Tabi wrote:
> Guenter Roeck wrote:
>> A 32-bit counter is absolutely fine. Letting it run with a 400MHz clock
>> (or was it 200 MHz ?) is the problem. A resolution of 2.5ns for a watchdog
>> timer does not really make any sense.
>
> The 10 second limit is based on a 20MHz clock.
>
Not that a resolution of 50 ns makes sense either, but 4294967296 / 20971520 = 204,
and 20971520 * 10 = 209715200 = 0x0c800000. Where does the resolution get lost ?

Guenter
Guenter Roeck May 6, 2016, 12:18 a.m. UTC | #11
On 05/05/2016 04:45 PM, Timur Tabi wrote:
> Timur Tabi wrote:
>>
>>> A 32-bit counter is absolutely fine. Letting it run with a 400MHz clock
>>> (or was it 200 MHz ?) is the problem. A resolution of 2.5ns for a
>>> watchdog
>>> timer does not really make any sense.
>>
>> The 10 second limit is based on a 20MHz clock.
>
> No, that's not true.  I misread the code.  I knew something was wrong, but it didn't click until just now.
>
> The default timeout is 10 seconds.  The max timeout on a 20MHz system (which is what we're running) is over 200 seconds.
>
> The problem is that Pratyush's system is running at a clock that's way too fast:
>
>   [  131.187562] sbsa-gwdt sbsa-gwdt.0: Initialized with 40s timeout @ 250000000 Hz, action=1.
>
> 250MHz is unreasonable.  Pratyush, why is your system counter so high? On our ARM64 system, it's set to 20MHz.
>

Guess that answers my earlier question. Problem is that the specification
_permits_ those unreasonable frequencies, and quite obviously they are
being used, no matter if it makes sense or not.

With a (still unreasonable) maximum frequency of 100 MHz, the problem
would not exist. So, if anything, someone with influence on the standard
might suggest to reduce the maximum permitted frequency.

Guenter
diff mbox

Patch

--- a/drivers/watchdog/sbsa_gwdt.c
+++ b/drivers/watchdog/sbsa_gwdt.c
@@ -221,6 +221,13 @@  static int sbsa_gwdt_stop(struct watchdog_device *wdd)
 
static irqreturn_t sbsa_gwdt_interrupt(int irq, void *dev_id)
{
+    struct sbsa_gwdt *gwdt = (struct sbsa_gwdt *)dev_id;
+    struct watchdog_device *wdd = &gwdt->wdd;
+    u64 timeout = (u64)gwdt->clk * wdd->timeout;
+
+    writeq(timeout + arch_counter_get_cntvct(),
+                    gwdt->control_base + SBSA_GWDT_WCV);
+
     panic(WATCHDOG_NAME " timeout");
      
             return IRQ_HANDLED;