diff mbox

[PATCHv2] x86: only check for two watchdog NMIs

Message ID 1454346592-27067-1-git-send-email-david.vrabel@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

David Vrabel Feb. 1, 2016, 5:09 p.m. UTC
Since the NMI handler can now recognize watchdog NMIs, make
check_nmi_watchdog() only check for at least two watchdog NMIs.  This
prevents false negatives caused by other processors (which may be
being power managed by the BIOS) running at reduced clock frequencies.

We check for more than one NMI since there are apparently systems
where the NMI works only once.

This will also slightly speed up boot times since we only wait the
full 10 ticks if the NMI watchdog on one or more CPUs is not working.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
v2:
- Check for two watchdog NMIs.
---
 xen/arch/x86/nmi.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

Comments

Andrew Cooper Feb. 1, 2016, 5:33 p.m. UTC | #1
On 01/02/16 17:09, David Vrabel wrote:
> Since the NMI handler can now recognize watchdog NMIs, make
> check_nmi_watchdog() only check for at least two watchdog NMIs.  This
> prevents false negatives caused by other processors (which may be
> being power managed by the BIOS) running at reduced clock frequencies.
>
> We check for more than one NMI since there are apparently systems
> where the NMI works only once.
>
> This will also slightly speed up boot times since we only wait the
> full 10 ticks if the NMI watchdog on one or more CPUs is not working.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
diff mbox

Patch

diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
index b1195a1..426c24e 100644
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -139,7 +139,18 @@  int nmi_active;
 
 static void __init wait_for_nmis(void *p)
 {
-    mdelay((10*1000)/nmi_hz); /* wait 10 ticks */
+    unsigned int cpu = smp_processor_id();
+    unsigned int start_count = nmi_count(cpu);
+    unsigned long ticks = 10 * 1000 * cpu_khz / nmi_hz;
+    unsigned long s, e;
+
+    s = rdtsc();
+    do {
+        cpu_relax();
+        if ( nmi_count(cpu) >= start_count + 2 )
+            break;
+        e = rdtsc();
+    } while( e - s < ticks );
 }
 
 int __init check_nmi_watchdog (void)
@@ -156,15 +167,16 @@  int __init check_nmi_watchdog (void)
     for_each_online_cpu ( cpu )
         prev_nmi_count[cpu] = nmi_count(cpu);
 
-    /* Wait for 10 ticks.  Busy-wait on all CPUs: the LAPIC counter that
-     * the NMI watchdog uses only runs while the core's not halted */
-    if ( nmi_watchdog == NMI_LOCAL_APIC )
-        smp_call_function(wait_for_nmis, NULL, 0);
-    wait_for_nmis(NULL);
+    /*
+     * Wait at most 10 ticks for 2 watchdog NMIs on each CPU.
+     * Busy-wait on all CPUs: the LAPIC counter that the NMI watchdog
+     * uses only runs while the core's not halted
+     */
+    on_selected_cpus(&cpu_online_map, wait_for_nmis, NULL, 1);
 
     for_each_online_cpu ( cpu )
     {
-        if ( nmi_count(cpu) - prev_nmi_count[cpu] <= 5 )
+        if ( nmi_count(cpu) - prev_nmi_count[cpu] < 2 )
         {
             printk(" %d", cpu);
             ok = 0;