
x86: adjust initial setting of watchdog kind

Message ID fe289ca1-aa3a-49af-b7d7-70949237464b@suse.com (mailing list archive)
State Superseded

Commit Message

Jan Beulich Jan. 25, 2024, 2:12 p.m. UTC
"watchdog_timeout=0" is documented to disable the watchdog. Make sure
this also is true when there's a subsequent "watchdog" command line
option (and no further "watchdog_timeout=" one).

While there also switch watchdog_setup() to returning void, bringing it
in line with the !CONFIG_WATCHDOG case. Further amend command line
documentation to also mention the implicit effect of specifying a non-
zero timeout.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Alternatively "watchdog" following "watchdog_timeout=0" could be taken
to mean to use the default timeout again.

Really I think the comment in watchdog_setup() is wrong, and the
function would hence better go away. The CPU notifier registration can
surely be done in a pre-SMP initcall, which would have the benefit of
boot-time AP bringup then working the same as runtime CPU-onlining. (In
particular the set_timer() out of CPU_UP_PREPARE is a little suspicious,
as the timer can't possibly be run right away when a CPU isn't online
yet.) Which would leave __start_xen() to call watchdog_enable() in the
place it's calling watchdog_setup() now.
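
The interaction described in the commit message can be modelled in a few lines of standalone C. This is only a sketch: the `model_*` names and the token handling are hypothetical simplifications (in particular, "watchdog=force" and boolean forms are ignored), and only `model_configure()` is meant to mirror the `watchdog_configure()` added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the Xen variables touched by the patch. */
enum watchdog_kind { NMI_NONE, NMI_LOCAL_APIC };

struct watchdog_model {
    bool opt_watchdog;                 /* set by "watchdog" */
    unsigned int opt_watchdog_timeout; /* set by "watchdog_timeout=" */
    enum watchdog_kind nmi_watchdog;   /* result of configuration */
};

/*
 * Hypothetical helper: apply one command line token. Assumption, based on
 * the documentation text: a non-zero timeout implicitly enables the
 * watchdog, a zero timeout disables it, plain "watchdog" enables it.
 */
static void model_parse_token(struct watchdog_model *m, const char *tok)
{
    static const char prefix[] = "watchdog_timeout=";

    assert(m != NULL && tok != NULL);

    if ( !strcmp(tok, "watchdog") )
        m->opt_watchdog = true;
    else if ( !strncmp(tok, prefix, sizeof(prefix) - 1) )
    {
        m->opt_watchdog_timeout =
            (unsigned int)strtoul(tok + sizeof(prefix) - 1, NULL, 0);
        m->opt_watchdog = m->opt_watchdog_timeout != 0;
    }
}

/* Mirrors the logic of watchdog_configure() in the patch. */
static void model_configure(struct watchdog_model *m)
{
    if ( !m->opt_watchdog_timeout )
        m->opt_watchdog = false;

    if ( m->opt_watchdog )
        m->nmi_watchdog = NMI_LOCAL_APIC;
}

/* Parse all tokens in command line order, then configure. */
static enum watchdog_kind model_boot(const char *const toks[], size_t n)
{
    /* Watchdog off by default, documented default timeout of 5. */
    struct watchdog_model m = { false, 5, NMI_NONE };
    size_t i;

    for ( i = 0; i < n; i++ )
        model_parse_token(&m, toks[i]);

    model_configure(&m);
    return m.nmi_watchdog;
}
```

In this model, "watchdog_timeout=0 watchdog" leaves the watchdog kind at NMI_NONE, whereas dropping the first check in `model_configure()` would select NMI_LOCAL_APIC with a zero timeout.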

Comments

Andrew Cooper March 19, 2024, 8:35 p.m. UTC | #1
On 25/01/2024 2:12 pm, Jan Beulich wrote:
> "watchdog_timeout=0" is documented to disable the watchdog. Make sure
> this also is true when there's a subsequent "watchdog" command line
> option (and no further "watchdog_timeout=" one).

We also document that latest takes precedence, at which point "watchdog"
would re-activate.

>
> While there also switch watchdog_setup() to returning void, bringing it
> in line with the !CONFIG_WATCHDOG case. Further amend command line
> documentation to also mention the implicit effect of specifying a non-
> zero timeout.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Alternatively "watchdog" following "watchdog_timeout=0" could be taken
> to mean to use the default timeout again.

I realise that watchdog_timeout is my fault, but in fairness it was an
early change of mine in Xen and didn't exactly get the kind of review it
would get these days.  It also wasn't used by XenServer in the end - we
just stayed at a default 5s.

I'm very tempted to suggest deleting watchdog_timeout, and extending
watchdog= to have `force | <bool> | <int>s` so you could specify e.g.
`watchdog=10s`.

The watchdog is off by default so I don't expect this will impact
people.  It is also more convenient for the end user, and means that we
don't have the current split approach of two separate options
fighting for control over each other.

It also means we can in principle support non-integer-second units of
time in a theoretical future when the NMI handler can count time properly.

> Really I think the comment in watchdog_setup() is wrong, and the
> function would hence better go away.

That comment dates from 2006.  I highly suspect it's not true any more,
and it certainly is odd to be running over all CPUs like that.

~Andrew
Jan Beulich March 20, 2024, 8:59 a.m. UTC | #2
On 19.03.2024 21:35, Andrew Cooper wrote:
> On 25/01/2024 2:12 pm, Jan Beulich wrote:
>> "watchdog_timeout=0" is documented to disable the watchdog. Make sure
>> this also is true when there's a subsequent "watchdog" command line
>> option (and no further "watchdog_timeout=" one).
> 
> We also document that latest takes precedence, at which point "watchdog"
> would re-activate.

True, so perhaps ...

>> While there also switch watchdog_setup() to returning void, bringing it
>> in line with the !CONFIG_WATCHDOG case. Further amend command line
>> documentation to also mention the implicit effect of specifying a non-
>> zero timeout.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Alternatively "watchdog" following "watchdog_timeout=0" could be taken
>> to mean to use the default timeout again.

... this alternative wants following.

> I realise that watchdog_timeout is my fault, but in fairness it was an
> early change of mine in Xen and didn't exactly get the kind of review it
> would get these days.  It also wasn't used by XenServer in the end - we
> just stayed at a default 5s.
> 
> I'm very tempted to suggest deleting watchdog_timeout, and extending
> watchdog= to have `force | <bool> | <int>s` so you could specify e.g.
> `watchdog=10s`.
> 
> The watchdog is off by default so I don't expect this will impact
> people.  It is also more convenient for the end user, and means that we
> don't have the current split approach of two separate options
> fighting for control over each other.

While I'd be happy to fold the two options, I don't think the watchdog
being off by default is relevant here. People using just the
watchdog_timeout= option with a non-zero value will already have the
watchdog enabled. They'd need to pay attention to an eventual CHANGELOG
entry and change their command line.

Furthermore consolidating the two options isn't going to remove any
of the problems. What effect would e.g. "watchdog=off,10s" have? The
principle of "latest takes precedence" assigns clear meaning to
"watchdog=off watchdog=10s", but the above remains as ambiguous as
e.g. "watchdog=force,0s". I'd be inclined to follow those to the
letter, i.e. "watchdog=off,10s" sets the timeout to 10 but disables
the watchdog while "watchdog=force,0s" simply results in a non-
functioning watchdog (due to 0s effectively meaning 4 billion seconds
and hence for all practical purposes "never").

Jan
Jan Beulich April 4, 2024, 10:32 a.m. UTC | #3
On 20.03.2024 09:59, Jan Beulich wrote:
> On 19.03.2024 21:35, Andrew Cooper wrote:
>> On 25/01/2024 2:12 pm, Jan Beulich wrote:
>>> "watchdog_timeout=0" is documented to disable the watchdog. Make sure
>>> this also is true when there's a subsequent "watchdog" command line
>>> option (and no further "watchdog_timeout=" one).
>>
>> We also document that latest takes precedence, at which point "watchdog"
>> would re-activate.
> 
> True,

Actually - no. Latest takes precedence doesn't matter here. "watchdog"
following "watchdog_timeout=0" is simply asking to enable the watchdog
with a timeout of 0, meaning infinity in practice. Which still is as
good as "watchdog=off".
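
The "timeout of 0 means infinity in practice" remark (put earlier in the thread as "effectively 4 billion seconds") can be illustrated with a small model. It assumes that expiry is detected by an equality comparison against a 32-bit counter incremented once per watchdog tick; that mechanism and the name `ticks_until_fire()` are assumptions made for illustration, not taken from nmi.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not Xen code: a 32-bit counter starts at 0, is
 * incremented on every tick, and the watchdog fires when the counter
 * equals timeout * hz. Returns the number of ticks until the first match.
 */
static uint64_t ticks_until_fire(uint32_t timeout, uint32_t hz)
{
    uint32_t target;

    assert(hz != 0);

    /* The product wraps modulo 2^32, just like the counter does. */
    target = timeout * hz;

    /*
     * The increment happens before the comparison, so a target of 0 is
     * only matched after the counter has wrapped all the way around.
     */
    return target ? target : (UINT64_C(1) << 32);
}
```

Under these assumptions, and with one tick per second, a timeout of 0 would only fire after 2^32 ticks, i.e. about 4.29 billion seconds or roughly 136 years.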

> so perhaps ...
> 
>>> While there also switch watchdog_setup() to returning void, bringing it
>>> in line with the !CONFIG_WATCHDOG case. Further amend command line
>>> documentation to also mention the implicit effect of specifying a non-
>>> zero timeout.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> Alternatively "watchdog" following "watchdog_timeout=0" could be taken
>>> to mean to use the default timeout again.
> 
> ... this alternative wants following.
> 
>> I realise that watchdog_timeout is my fault, but in fairness it was an
>> early change of mine in Xen and didn't exactly get the kind of review it
>> would get these days.  It also wasn't used by XenServer in the end - we
>> just stayed at a default 5s.
>>
>> I'm very tempted to suggest deleting watchdog_timeout, and extending
>> watchdog= to have `force | <bool> | <int>s` so you could specify e.g.
>> `watchdog=10s`.

This being a set of alternatives also isn't quite right. "force" needs to
be possible to combine with a timeout value. Yet if we make it "List of",
which I was ...

>> The watchdog is off by default so I don't expect this will impact
>> people.  It is also more convenient for the end user, and means that we
>> don't have the current split approach of two separate options
>> fighting for control over each other.
> 
> While I'd be happy to fold the two options, I don't think the watchdog
> being off by default is relevant here. People using just the
> watchdog_timeout= option with a non-zero value will already have the
> watchdog enabled. They'd need to pay attention to an eventual CHANGELOG
> entry and change their command line.
> 
> Furthermore consolidating the two options isn't going to remove any
> of the problems. What effect would e.g. "watchdog=off,10s" have? The
> principle of "latest takes precedence" assigns clear meaning to
> "watchdog=off watchdog=10s", but the above remains as ambiguous as
> e.g. "watchdog=force,0s". I'd be inclined to follow those to the
> letter, i.e. "watchdog=off,10s" sets the timeout to 10 but disables
> the watchdog while "watchdog=force,0s" simply results in a non-
> functioning watchdog (due to 0s effectively meaning 4 billion seconds
> and hence for all practical purposes "never").

... assuming anyway (despite you having it written differently), we'll
have said problems again. So perhaps

<bool> | List of [ force | <int>s ]

with a timeout of 0 disabling the watchdog and a non-zero one enabling it?

Jan
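
One possible reading of the "<bool> | List of [ force | <int>s ]" syntax proposed above is sketched below as standalone C. Everything here is hypothetical: `struct wd_opts`, `parse_bool_tok()` and `parse_watchdog()` are illustrative names, the accepted boolean spellings are an assumption, "force" is assumed to imply enabling, and overflow of the timeout value is not checked. The final check applies the "timeout of 0 disables" rule regardless of element order.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct wd_opts {
    bool enabled;
    bool force;
    unsigned long timeout; /* seconds */
};

/* Hypothetical boolean parser: 1 for true, 0 for false, -1 otherwise. */
static int parse_bool_tok(const char *s)
{
    static const char *const yes[] = { "yes", "on", "true", "enable", "1" };
    static const char *const no[] = { "no", "off", "false", "disable", "0" };
    size_t i;

    if ( !*s ) /* assumption: a bare "watchdog" means "enable" */
        return 1;

    for ( i = 0; i < sizeof(yes) / sizeof(yes[0]); i++ )
        if ( !strcmp(s, yes[i]) )
            return 1;

    for ( i = 0; i < sizeof(no) / sizeof(no[0]); i++ )
        if ( !strcmp(s, no[i]) )
            return 0;

    return -1;
}

/* Returns 0 on success, -1 on a malformed value. */
static int parse_watchdog(const char *s, struct wd_opts *o)
{
    int b;

    assert(s != NULL && o != NULL);

    b = parse_bool_tok(s);
    if ( b >= 0 )
        o->enabled = b != 0;        /* first alternative: a lone <bool> */
    else
    {
        while ( *s )                /* second alternative: the list form */
        {
            size_t len = strcspn(s, ",");

            if ( len == 5 && !strncmp(s, "force", 5) )
                o->force = o->enabled = true;
            else
            {
                char *end;
                unsigned long val;

                if ( !isdigit((unsigned char)*s) )
                    return -1;

                val = strtoul(s, &end, 10);
                if ( *end != 's' || end + 1 != s + len )
                    return -1;      /* the "s" suffix is mandatory */

                o->timeout = val;
                o->enabled = val != 0;
            }

            s += len;
            if ( *s == ',' )
                s++;
        }
    }

    if ( !o->timeout )              /* a timeout of 0 always disables */
        o->enabled = false;

    return 0;
}
```

With this reading, "watchdog=off,10s" is rejected as malformed because a boolean cannot be mixed into the list, and "watchdog=force,0s" parses but leaves the watchdog disabled.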

Patch

--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2795,7 +2795,7 @@  unknown NMIs will still be processed.
 > Default: `5`
 
 Set the NMI watchdog timeout in seconds.  Specifying `0` will turn off
-the watchdog.
+the watchdog.  Specifying a non-zero value enables the watchdog.
 
 ### x2apic (x86)
 > `= <boolean>`
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -473,7 +473,16 @@  bool watchdog_enabled(void)
     return !atomic_read(&watchdog_disable_count);
 }
 
-int __init watchdog_setup(void)
+void __init watchdog_configure(void)
+{
+    if ( !opt_watchdog_timeout )
+        opt_watchdog = false;
+
+    if ( opt_watchdog )
+        nmi_watchdog = NMI_LOCAL_APIC;
+}
+
+void __init watchdog_setup(void)
 {
     unsigned int cpu;
 
@@ -486,7 +495,6 @@  int __init watchdog_setup(void)
     register_cpu_notifier(&cpu_nmi_nfb);
 
     watchdog_enable();
-    return 0;
 }
 
 /* Returns false if this was not a watchdog NMI, true otherwise */
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1783,8 +1783,7 @@  void asmlinkage __init noreturn __start_
 
     open_softirq(NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ, new_tlbflush_clock_period);
 
-    if ( opt_watchdog ) 
-        nmi_watchdog = NMI_LOCAL_APIC;
+    watchdog_configure();
 
     find_smp_config();
 
--- a/xen/include/xen/watchdog.h
+++ b/xen/include/xen/watchdog.h
@@ -11,8 +11,11 @@ 
 
 #ifdef CONFIG_WATCHDOG
 
+/* Configure what, if any, watchdog to (try to) use. */
+void watchdog_configure(void);
+
 /* Try to set up a watchdog. */
-int watchdog_setup(void);
+void watchdog_setup(void);
 
 /* Enable the watchdog. */
 void watchdog_enable(void);