
hardlockup: detect hard lockups without NMIs using secondary cpus

Message ID 1357783059-13923-1-git-send-email-ccross@android.com (mailing list archive)
State New, archived

Commit Message

Colin Cross Jan. 10, 2013, 1:57 a.m. UTC
Emulate NMIs on systems where they are not available by using timer
interrupts on other cpus.  Each cpu will use its softlockup hrtimer
to check that the next cpu is processing hrtimer interrupts by
verifying that a counter is increasing.

This patch is useful on systems where the hardlockup detector is not
available due to a lack of NMIs, for example most ARM SoCs.
Without this patch any cpu stuck with interrupts disabled can
cause a hardware watchdog reset with no debugging information,
but with this patch the kernel can detect the lockup and panic,
which can result in useful debugging info.

Signed-off-by: Colin Cross <ccross@android.com>
---
 include/linux/nmi.h |    5 ++-
 kernel/watchdog.c   |   98 ++++++++++++++++++++++++++++++++++++++++++++++++---
 lib/Kconfig.debug   |   14 +++++++-
 3 files changed, 110 insertions(+), 7 deletions(-)

Comments

Don Zickus Jan. 10, 2013, 2:02 p.m. UTC | #1
On Wed, Jan 09, 2013 at 05:57:39PM -0800, Colin Cross wrote:
> Emulate NMIs on systems where they are not available by using timer
> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
> to check that the next cpu is processing hrtimer interrupts by
> verifying that a counter is increasing.
> 
> This patch is useful on systems where the hardlockup detector is not
> available due to a lack of NMIs, for example most ARM SoCs.

I have seen other cpus, like Sparc I think, create a 'virtual NMI' by
reserving an IRQ line as 'special' (can not be masked).  Not sure if that
is something worth looking at here (or even possible).

> Without this patch any cpu stuck with interrupts disabled can
> cause a hardware watchdog reset with no debugging information,
> but with this patch the kernel can detect the lockup and panic,
> which can result in useful debugging info.

<SNIP>
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
> +static int is_hardlockup_other_cpu(int cpu)
> +{
> +	unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
> +
> +	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> +		return 1;
> +
> +	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> +	return 0;

Will this race with the other cpu you are checking?  For example if cpuA
just updated its hrtimer_interrupts_saved and cpuB goes to check cpuA's
hrtimer_interrupts_saved, it seems possible that cpuB could falsely assume
cpuA is stuck?


> +}
> +
> +static void watchdog_check_hardlockup_other_cpu(void)
> +{
> +	int cpu;
> +	cpumask_t cpus = watchdog_cpus;
> +
> +	/*
> +	 * Test for hardlockups every 3 samples.  The sample period is
> +	 *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> +	 *  watchdog_thresh (over by 20%).
> +	 */
> +	if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
> +		return;
> +
> +	/* check for a hardlockup on the next cpu */
> +	cpu = cpumask_next(smp_processor_id(), &cpus);
> +	if (cpu >= nr_cpu_ids)
> +		cpu = cpumask_first(&cpus);
> +	if (cpu == smp_processor_id())
> +		return;
> +
> +	smp_rmb();
> +
> +	if (per_cpu(watchdog_nmi_touch, cpu) == true) {
> +		per_cpu(watchdog_nmi_touch, cpu) = false;
> +		return;
> +	}

Same race here.  Usually touch_nmi_watchdog is reserved for those
functions that plan on disabling interrupts for a while.  cpuB could set
cpuA's watchdog_nmi_touch to false here expecting not to revisit this
variable for another couple of seconds.  While cpuA could read this
variable milliseconds later after cpuB sets it and falsely assume there is
a lockup?

Perhaps I am misreading the code?

If not, I don't have a good idea on how to solve those races off the top of my
head unfortunately.

Cheers,
Don
Russell King - ARM Linux Jan. 10, 2013, 2:22 p.m. UTC | #2
On Thu, Jan 10, 2013 at 09:02:15AM -0500, Don Zickus wrote:
> On Wed, Jan 09, 2013 at 05:57:39PM -0800, Colin Cross wrote:
> > Emulate NMIs on systems where they are not available by using timer
> > interrupts on other cpus.  Each cpu will use its softlockup hrtimer
> > to check that the next cpu is processing hrtimer interrupts by
> > verifying that a counter is increasing.
> > 
> > This patch is useful on systems where the hardlockup detector is not
> > available due to a lack of NMIs, for example most ARM SoCs.
> 
> I have seen other cpus, like Sparc I think, create a 'virtual NMI' by
> reserving an IRQ line as 'special' (can not be masked).  Not sure if that
> is something worth looking at here (or even possible).

No it isn't, because that assumes that things like spin_lock_irqsave()
won't mask that interrupt.  We don't have the facility to do that.
Frederic Weisbecker Jan. 10, 2013, 4:18 p.m. UTC | #3
2013/1/10 Russell King - ARM Linux <linux@arm.linux.org.uk>:
> On Thu, Jan 10, 2013 at 09:02:15AM -0500, Don Zickus wrote:
>> On Wed, Jan 09, 2013 at 05:57:39PM -0800, Colin Cross wrote:
>> > Emulate NMIs on systems where they are not available by using timer
>> > interrupts on other cpus.  Each cpu will use its softlockup hrtimer
>> > to check that the next cpu is processing hrtimer interrupts by
>> > verifying that a counter is increasing.
>> >
>> > This patch is useful on systems where the hardlockup detector is not
>> > available due to a lack of NMIs, for example most ARM SoCs.
>>
>> I have seen other cpus, like Sparc I think, create a 'virtual NMI' by
>> reserving an IRQ line as 'special' (can not be masked).  Not sure if that
>> is something worth looking at here (or even possible).
>
> No it isn't, because that assumes that things like spin_lock_irqsave()
> won't mask that interrupt.  We don't have the facility to do that.

I believe sparc is doing something like this though. Look at
arch/sparc/include/asm/irqflags_64.h, it seems NMIs are implemented
there using an irq number that is not masked by this function.

Not all archs can do that so easily I guess.
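
(For reference, the sparc64 trick looks roughly like this, paraphrased from
arch/sparc/include/asm/irqflags_64.h and asm/pil.h with the constants and
asm constraints simplified: local_irq_disable() only raises the processor
interrupt level, so a level-15 interrupt is never masked and behaves as an
NMI.

#define PIL_NORMAL_MAX	14	/* level written by normal irq-off paths */
#define PIL_NMI		15	/* never masked by arch_local_irq_disable() */

static inline void arch_local_irq_disable(void)
{
	/* Raise %pil to PIL_NORMAL_MAX; PIL_NMI stays deliverable. */
	__asm__ __volatile__(
		"wrpr	%0, %%pil"
		: /* no outputs */
		: "i" (PIL_NORMAL_MAX)
		: "memory");
}
)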
Russell King - ARM Linux Jan. 10, 2013, 5 p.m. UTC | #4
On Thu, Jan 10, 2013 at 05:18:40PM +0100, Frederic Weisbecker wrote:
> 2013/1/10 Russell King - ARM Linux <linux@arm.linux.org.uk>:
> > On Thu, Jan 10, 2013 at 09:02:15AM -0500, Don Zickus wrote:
> >> On Wed, Jan 09, 2013 at 05:57:39PM -0800, Colin Cross wrote:
> >> > Emulate NMIs on systems where they are not available by using timer
> >> > interrupts on other cpus.  Each cpu will use its softlockup hrtimer
> >> > to check that the next cpu is processing hrtimer interrupts by
> >> > verifying that a counter is increasing.
> >> >
> >> > This patch is useful on systems where the hardlockup detector is not
> >> > available due to a lack of NMIs, for example most ARM SoCs.
> >>
> >> I have seen other cpus, like Sparc I think, create a 'virtual NMI' by
> >> reserving an IRQ line as 'special' (can not be masked).  Not sure if that
> >> is something worth looking at here (or even possible).
> >
> > No it isn't, because that assumes that things like spin_lock_irqsave()
> > won't mask that interrupt.  We don't have the facility to do that.
> 
> I believe sparc is doing something like this though. Look at
> arch/sparc/include/asm/irqflags_64.h, it seems NMIs are implemented
> there using an irq number that is not masked by this function.

As I said, we don't have a facility to do that.

The CPU doesn't know about interrupt levels.  It's either all-IRQs-masked
or no-IRQs-masked.  If you want anything in between, you have to go outside
the CPU and fiddle with the IRQ controller, which may be one of _many_
different types, and some platforms even have a shadow IRQ controller.
Plus, doing such manipulation may in itself also require locking.
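
To illustrate: on ARMv6+ the whole of arch_local_irq_disable() is a single
CPSR I-bit flip (roughly as in arch/arm/include/asm/irqflags.h; earlier
cores write the CPSR via msr instead), so there is no priority level the
core itself could leave unmasked.

static inline void arch_local_irq_disable(void)
{
	/* Set the CPSR I bit: masks all IRQs, with no notion of levels. */
	asm volatile(
		"	cpsid i			@ arch_local_irq_disable"
		:
		:
		: "memory", "cc");
}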
Colin Cross Jan. 10, 2013, 5:27 p.m. UTC | #5
On Thu, Jan 10, 2013 at 6:02 AM, Don Zickus <dzickus@redhat.com> wrote:
> On Wed, Jan 09, 2013 at 05:57:39PM -0800, Colin Cross wrote:
>> Emulate NMIs on systems where they are not available by using timer
>> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
>> to check that the next cpu is processing hrtimer interrupts by
>> verifying that a counter is increasing.
>>
>> This patch is useful on systems where the hardlockup detector is not
>> available due to a lack of NMIs, for example most ARM SoCs.
>
> I have seen other cpus, like Sparc I think, create a 'virtual NMI' by
> reserving an IRQ line as 'special' (can not be masked).  Not sure if that
> is something worth looking at here (or even possible).
>
>> Without this patch any cpu stuck with interrupts disabled can
>> cause a hardware watchdog reset with no debugging information,
>> but with this patch the kernel can detect the lockup and panic,
>> which can result in useful debugging info.
>
> <SNIP>
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
>> +static int is_hardlockup_other_cpu(int cpu)
>> +{
>> +     unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
>> +
>> +     if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
>> +             return 1;
>> +
>> +     per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
>> +     return 0;
>
> Will this race with the other cpu you are checking?  For example if cpuA
> just updated its hrtimer_interrupts_saved and cpuB goes to check cpuA's
> hrtimer_interrupts_saved, it seems possible that cpuB could falsely assume
> cpuA is stuck?

cpuA doesn't update its own hrtimer_interrupts_saved, cpuB does.
However, there may be a similar race condition during hotplug if cpuB
updates hrtimer_interrupts_saved for cpuA, then goes offline, then
cpuC may try to check cpuA and see that hrtimer_interrupts_saved ==
hrtimer_interrupts.  I think this can be solved by setting
watchdog_nmi_touch for the next cpu when a cpu goes online or offline.

>> +}
>> +
>> +static void watchdog_check_hardlockup_other_cpu(void)
>> +{
>> +     int cpu;
>> +     cpumask_t cpus = watchdog_cpus;
>> +
>> +     /*
>> +      * Test for hardlockups every 3 samples.  The sample period is
>> +      *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
>> +      *  watchdog_thresh (over by 20%).
>> +      */
>> +     if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
>> +             return;
>> +
>> +     /* check for a hardlockup on the next cpu */
>> +     cpu = cpumask_next(smp_processor_id(), &cpus);
>> +     if (cpu >= nr_cpu_ids)
>> +             cpu = cpumask_first(&cpus);
>> +     if (cpu == smp_processor_id())
>> +             return;
>> +
>> +     smp_rmb();
>> +
>> +     if (per_cpu(watchdog_nmi_touch, cpu) == true) {
>> +             per_cpu(watchdog_nmi_touch, cpu) = false;
>> +             return;
>> +     }
>
> Same race here.  Usually touch_nmi_watchdog is reserved for those
> functions that plan on disabling interrupts for a while.  cpuB could set
> cpuA's watchdog_nmi_touch to false here expecting not to revisit this
> variable for another couple of seconds.  While cpuA could read this
> variable milliseconds later after cpuB sets it and falsely assume there is
> a lockup?
>
> Perhaps I am misreading the code?

Again, cpuA won't ever read its own watchdog_nmi_touch variable, only
cpuB will.  The only things cpuA updates for itself are
hrtimer_interrupts and setting watchdog_nmi_touch to true.
hrtimer_interrupts_saved and setting watchdog_nmi_touch to false are
done by the cpu watching over cpuA, so the only races here are when a
cpu goes offline and a different cpu starts watching over cpuA.

> If not, I don't have a good idea on how to solve those races off the top of my
> head unfortunately.
>
> Cheers,
> Don
Don Zickus Jan. 10, 2013, 6:17 p.m. UTC | #6
On Thu, Jan 10, 2013 at 09:27:28AM -0800, Colin Cross wrote:
> On Thu, Jan 10, 2013 at 6:02 AM, Don Zickus <dzickus@redhat.com> wrote:
> > On Wed, Jan 09, 2013 at 05:57:39PM -0800, Colin Cross wrote:
> >> Emulate NMIs on systems where they are not available by using timer
> >> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
> >> to check that the next cpu is processing hrtimer interrupts by
> >> verifying that a counter is increasing.
> >>
> >> This patch is useful on systems where the hardlockup detector is not
> >> available due to a lack of NMIs, for example most ARM SoCs.
> >
> > I have seen other cpus, like Sparc I think, create a 'virtual NMI' by
> > reserving an IRQ line as 'special' (can not be masked).  Not sure if that
> > is something worth looking at here (or even possible).
> >
> >> Without this patch any cpu stuck with interrupts disabled can
> >> cause a hardware watchdog reset with no debugging information,
> >> but with this patch the kernel can detect the lockup and panic,
> >> which can result in useful debugging info.
> >
> > <SNIP>
> >> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
> >> +static int is_hardlockup_other_cpu(int cpu)
> >> +{
> >> +     unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
> >> +
> >> +     if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> >> +             return 1;
> >> +
> >> +     per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> >> +     return 0;
> >
> > Will this race with the other cpu you are checking?  For example if cpuA
> > just updated its hrtimer_interrupts_saved and cpuB goes to check cpuA's
> > hrtimer_interrupts_saved, it seems possible that cpuB could falsely assume
> > cpuA is stuck?
> 
> cpuA doesn't update its own hrtimer_interrupts_saved, cpuB does.
> However, there may be a similar race condition during hotplug if cpuB
> updates hrtimer_interrupts_saved for cpuA, then goes offline, then
> cpuC may try to check cpuA and see that hrtimer_interrupts_saved ==
> hrtimer_interrupts.  I think this can be solved by setting
> watchdog_nmi_touch for the next cpu when a cpu goes online or offline.

Ah, that is where my misunderstanding was.  I overlooked the fact that it
was only updated by the other cpu.  Sorry about that.

I'll re-review it again with that in mind.

Cheers,
Don
Tony Lindgren Jan. 10, 2013, 8:38 p.m. UTC | #7
* Colin Cross <ccross@android.com> [130109 18:05]:
> +static void watchdog_check_hardlockup_other_cpu(void)
> +{
> +	int cpu;
> +	cpumask_t cpus = watchdog_cpus;
> +
> +	/*
> +	 * Test for hardlockups every 3 samples.  The sample period is
> +	 *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> +	 *  watchdog_thresh (over by 20%).
> +	 */
> +	if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
> +		return;
> +
> +	/* check for a hardlockup on the next cpu */
> +	cpu = cpumask_next(smp_processor_id(), &cpus);

Hmm, don't you want to check cpu_online_mask here and
return if the other CPU is offline?

> +	if (cpu >= nr_cpu_ids)
> +		cpu = cpumask_first(&cpus);
> +	if (cpu == smp_processor_id())
> +		return;

Regards,

Tony
Colin Cross Jan. 10, 2013, 10:34 p.m. UTC | #8
On Thu, Jan 10, 2013 at 12:38 PM, Tony Lindgren <tony@atomide.com> wrote:
>
> * Colin Cross <ccross@android.com> [130109 18:05]:
> > +static void watchdog_check_hardlockup_other_cpu(void)
> > +{
> > +     int cpu;
> > +     cpumask_t cpus = watchdog_cpus;
> > +
> > +     /*
> > +      * Test for hardlockups every 3 samples.  The sample period is
> > +      *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> > +      *  watchdog_thresh (over by 20%).
> > +      */
> > +     if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
> > +             return;
> > +
> > +     /* check for a hardlockup on the next cpu */
> > +     cpu = cpumask_next(smp_processor_id(), &cpus);
>
> Hmm, don't you want to check cpu_online_mask here and
> return if the other CPU is offline?

watchdog_cpus is effectively a local copy of cpu_online_mask, but
updated after the watchdog_nmi_touch in watchdog_nmi_enable.  This
avoids a false positive after hotplugging in a cpu when
cpu_online_mask is true but that cpu hasn't yet run its first
hrtimer.

> > +     if (cpu >= nr_cpu_ids)
> > +             cpu = cpumask_first(&cpus);
> > +     if (cpu == smp_processor_id())
> > +             return;
>
> Regards,
>
> Tony
Tony Lindgren Jan. 10, 2013, 11:42 p.m. UTC | #9
* Colin Cross <ccross@android.com> [130110 14:37]:
> On Thu, Jan 10, 2013 at 12:38 PM, Tony Lindgren <tony@atomide.com> wrote:
> >
> > * Colin Cross <ccross@android.com> [130109 18:05]:
> > > +static void watchdog_check_hardlockup_other_cpu(void)
> > > +{
> > > +     int cpu;
> > > +     cpumask_t cpus = watchdog_cpus;
> > > +
> > > +     /*
> > > +      * Test for hardlockups every 3 samples.  The sample period is
> > > +      *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> > > +      *  watchdog_thresh (over by 20%).
> > > +      */
> > > +     if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
> > > +             return;
> > > +
> > > +     /* check for a hardlockup on the next cpu */
> > > +     cpu = cpumask_next(smp_processor_id(), &cpus);
> >
> > Hmm, don't you want to check cpu_online_mask here and
> > return if the other CPU is offline?
> 
> watchdog_cpus is effectively a local copy of cpu_online_mask, but
> updated after the watchdog_nmi_touch in watchdog_nmi_enable.  This
> avoids a false positive after hotplugging in a cpu when
> cpu_online_mask is true but that cpu hasn't yet run its first
> hrtimer.

OK thanks for clarifying that.

Tony
Chuansheng Liu Jan. 11, 2013, 1:39 a.m. UTC | #10
> -----Original Message-----
> From: Colin Cross [mailto:ccross@android.com]
> Sent: Thursday, January 10, 2013 9:58 AM
> To: linux-kernel@vger.kernel.org
> Cc: Andrew Morton; Don Zickus; Ingo Molnar; Thomas Gleixner; Liu,
> Chuansheng; linux-arm-kernel@lists.infradead.org; Colin Cross
> Subject: [PATCH] hardlockup: detect hard lockups without NMIs using
> secondary cpus
> 
> Emulate NMIs on systems where they are not available by using timer
> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
> to check that the next cpu is processing hrtimer interrupts by
> verifying that a counter is increasing.
> 
> This patch is useful on systems where the hardlockup detector is not
> available due to a lack of NMIs, for example most ARM SoCs.
> Without this patch any cpu stuck with interrupts disabled can
> cause a hardware watchdog reset with no debugging information,
> but with this patch the kernel can detect the lockup and panic,
> which can result in useful debugging info.
> 
> Signed-off-by: Colin Cross <ccross@android.com>
> +static void watchdog_check_hardlockup_other_cpu(void)
> +{
> +	int cpu;
> +	cpumask_t cpus = watchdog_cpus;
> +
> +	/*
> +	 * Test for hardlockups every 3 samples.  The sample period is
> +	 *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> +	 *  watchdog_thresh (over by 20%).
> +	 */
> +	if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
> +		return;
> +
> +	/* check for a hardlockup on the next cpu */
> +	cpu = cpumask_next(smp_processor_id(), &cpus);
> +	if (cpu >= nr_cpu_ids)
> +		cpu = cpumask_first(&cpus);
> +	if (cpu == smp_processor_id())
> +		return;
> +
> +	smp_rmb();
> +
> +	if (per_cpu(watchdog_nmi_touch, cpu) == true) {
> +		per_cpu(watchdog_nmi_touch, cpu) = false;
> +		return;
> +	}
> +
> +	if (is_hardlockup_other_cpu(cpu)) {
> +		/* only warn once */
One possible case where a newly hotplugged CPU causes a false hardlockup:
1/ Assume CPU1 and CPU2 are online, and CPU3 is being hotplugged:
CPU3:                                            CPU2:
watchdog_nmi_enable()
 per_cpu(watchdog_nmi_touch, cpu) = true;
 cpumask_set_cpu(cpu, &watchdog_cpus);
                                                 watchdog_check_hardlockup_other_cpu()
                                                   per_cpu(watchdog_nmi_touch, cpu) = false; == > Here cpu is CPU3

2/ Before CPU3's first hrtimer interrupt arrives, CPU2 is offlined.
  Then CPU1's next CPU is CPU3, but CPU3's watchdog_nmi_touch can no longer
  protect against a false CPU3 hardlockup.  When CPU1's hrtimer interrupt
  arrives, it may falsely report a CPU3 hard lockup.

Is it the case?
Colin Cross Jan. 11, 2013, 5:34 a.m. UTC | #11
On Thu, Jan 10, 2013 at 5:39 PM, Liu, Chuansheng
<chuansheng.liu@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: Colin Cross [mailto:ccross@android.com]
>> Sent: Thursday, January 10, 2013 9:58 AM
>> To: linux-kernel@vger.kernel.org
>> Cc: Andrew Morton; Don Zickus; Ingo Molnar; Thomas Gleixner; Liu,
>> Chuansheng; linux-arm-kernel@lists.infradead.org; Colin Cross
>> Subject: [PATCH] hardlockup: detect hard lockups without NMIs using
>> secondary cpus
>>
>> Emulate NMIs on systems where they are not available by using timer
>> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
>> to check that the next cpu is processing hrtimer interrupts by
>> verifying that a counter is increasing.
>>
>> This patch is useful on systems where the hardlockup detector is not
>> available due to a lack of NMIs, for example most ARM SoCs.
>> Without this patch any cpu stuck with interrupts disabled can
>> cause a hardware watchdog reset with no debugging information,
>> but with this patch the kernel can detect the lockup and panic,
>> which can result in useful debugging info.
>>
>> Signed-off-by: Colin Cross <ccross@android.com>
>> +static void watchdog_check_hardlockup_other_cpu(void)
>> +{
>> +     int cpu;
>> +     cpumask_t cpus = watchdog_cpus;
>> +
>> +     /*
>> +      * Test for hardlockups every 3 samples.  The sample period is
>> +      *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
>> +      *  watchdog_thresh (over by 20%).
>> +      */
>> +     if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
>> +             return;
>> +
>> +     /* check for a hardlockup on the next cpu */
>> +     cpu = cpumask_next(smp_processor_id(), &cpus);
>> +     if (cpu >= nr_cpu_ids)
>> +             cpu = cpumask_first(&cpus);
>> +     if (cpu == smp_processor_id())
>> +             return;
>> +
>> +     smp_rmb();
>> +
>> +     if (per_cpu(watchdog_nmi_touch, cpu) == true) {
>> +             per_cpu(watchdog_nmi_touch, cpu) = false;
>> +             return;
>> +     }
>> +
>> +     if (is_hardlockup_other_cpu(cpu)) {
>> +             /* only warn once */
> One possible case where a newly hotplugged CPU causes a false hardlockup:
> 1/ Assume CPU1 and CPU2 are online, and CPU3 is being hotplugged:
> CPU3:                                            CPU2:
> watchdog_nmi_enable()
>  per_cpu(watchdog_nmi_touch, cpu) = true;
>  cpumask_set_cpu(cpu, &watchdog_cpus);
>                                                  watchdog_check_hardlockup_other_cpu()
>                                                    per_cpu(watchdog_nmi_touch, cpu) = false; == > Here cpu is CPU3
>
> 2/ Before CPU3's first hrtimer interrupt arrives, CPU2 is offlined.
>   Then CPU1's next CPU is CPU3, but CPU3's watchdog_nmi_touch can no longer
>   protect against a false CPU3 hardlockup.  When CPU1's hrtimer interrupt
>   arrives, it may falsely report a CPU3 hard lockup.
>
> Is it the case?

Yes, this is the same race condition I pointed out in reply to Don
Zickus earlier in the thread.  I think the easiest solution is to set
per_cpu(watchdog_nmi_touch, next_cpu) = true during cpu offlining.
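
Concretely, a sketch of that fix (not part of the patch as posted;
watchdog_next_cpu() is just a helper introduced here for illustration)
could look like:

static unsigned int watchdog_next_cpu(unsigned int cpu)
{
	cpumask_t cpus = watchdog_cpus;
	unsigned int next_cpu;

	next_cpu = cpumask_next(cpu, &cpus);
	if (next_cpu >= nr_cpu_ids)
		next_cpu = cpumask_first(&cpus);
	if (next_cpu == cpu)
		return nr_cpu_ids;

	return next_cpu;
}

static void watchdog_nmi_disable(unsigned int cpu)
{
	unsigned int next_cpu = watchdog_next_cpu(cpu);

	/*
	 * Offlining this cpu makes the previous online cpu start checking
	 * the cpu this one was watching.  Touch that cpu's watchdog so the
	 * new watcher skips one check instead of tripping over a
	 * just-updated (or never-updated) hrtimer_interrupts_saved.
	 */
	if (next_cpu < nr_cpu_ids)
		per_cpu(watchdog_nmi_touch, next_cpu) = true;

	smp_wmb();
	cpumask_clear_cpu(cpu, &watchdog_cpus);
}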
Chuansheng Liu Jan. 11, 2013, 5:57 a.m. UTC | #12
> -----Original Message-----
> From: ccross@google.com [mailto:ccross@google.com] On Behalf Of Colin
> Cross
> Sent: Friday, January 11, 2013 1:34 PM
> To: Liu, Chuansheng
> Cc: linux-kernel@vger.kernel.org; Andrew Morton; Don Zickus; Ingo Molnar;
> Thomas Gleixner; linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH] hardlockup: detect hard lockups without NMIs using
> secondary cpus
> 
> On Thu, Jan 10, 2013 at 5:39 PM, Liu, Chuansheng
> <chuansheng.liu@intel.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Colin Cross [mailto:ccross@android.com]
> >> Sent: Thursday, January 10, 2013 9:58 AM
> >> To: linux-kernel@vger.kernel.org
> >> Cc: Andrew Morton; Don Zickus; Ingo Molnar; Thomas Gleixner; Liu,
> >> Chuansheng; linux-arm-kernel@lists.infradead.org; Colin Cross
> >> Subject: [PATCH] hardlockup: detect hard lockups without NMIs using
> >> secondary cpus
> >>
> >> Emulate NMIs on systems where they are not available by using timer
> >> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
> >> to check that the next cpu is processing hrtimer interrupts by
> >> verifying that a counter is increasing.
> >>
> >> This patch is useful on systems where the hardlockup detector is not
> >> available due to a lack of NMIs, for example most ARM SoCs.
> >> Without this patch any cpu stuck with interrupts disabled can
> >> cause a hardware watchdog reset with no debugging information,
> >> but with this patch the kernel can detect the lockup and panic,
> >> which can result in useful debugging info.
> >>
> >> Signed-off-by: Colin Cross <ccross@android.com>
> >> +static void watchdog_check_hardlockup_other_cpu(void)
> >> +{
> >> +     int cpu;
> >> +     cpumask_t cpus = watchdog_cpus;
> >> +
> >> +     /*
> >> +      * Test for hardlockups every 3 samples.  The sample period is
> >> +      *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly
> over
> >> +      *  watchdog_thresh (over by 20%).
> >> +      */
> >> +     if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
> >> +             return;
> >> +
Another concern is the __this_cpu_read(hrtimer_interrupts) % 3 != 0 check:
it makes the actual timeout for hard lockup detection not fixed, and
sometimes very short.  Sometimes detecting the lockup takes 3 samples, but
sometimes only 1.  Is that the case?

And in the NMI case, the NMI interrupt comes at least every watchdog_thresh seconds.
Colin Cross Jan. 11, 2013, 6:17 a.m. UTC | #13
On Thu, Jan 10, 2013 at 9:57 PM, Liu, Chuansheng
<chuansheng.liu@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: ccross@google.com [mailto:ccross@google.com] On Behalf Of Colin
>> Cross
>> Sent: Friday, January 11, 2013 1:34 PM
>> To: Liu, Chuansheng
>> Cc: linux-kernel@vger.kernel.org; Andrew Morton; Don Zickus; Ingo Molnar;
>> Thomas Gleixner; linux-arm-kernel@lists.infradead.org
>> Subject: Re: [PATCH] hardlockup: detect hard lockups without NMIs using
>> secondary cpus
>>
>> On Thu, Jan 10, 2013 at 5:39 PM, Liu, Chuansheng
>> <chuansheng.liu@intel.com> wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Colin Cross [mailto:ccross@android.com]
>> >> Sent: Thursday, January 10, 2013 9:58 AM
>> >> To: linux-kernel@vger.kernel.org
>> >> Cc: Andrew Morton; Don Zickus; Ingo Molnar; Thomas Gleixner; Liu,
>> >> Chuansheng; linux-arm-kernel@lists.infradead.org; Colin Cross
>> >> Subject: [PATCH] hardlockup: detect hard lockups without NMIs using
>> >> secondary cpus
>> >>
>> >> Emulate NMIs on systems where they are not available by using timer
>> >> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
>> >> to check that the next cpu is processing hrtimer interrupts by
>> >> verifying that a counter is increasing.
>> >>
>> >> This patch is useful on systems where the hardlockup detector is not
>> >> available due to a lack of NMIs, for example most ARM SoCs.
>> >> Without this patch any cpu stuck with interrupts disabled can
>> >> cause a hardware watchdog reset with no debugging information,
>> >> but with this patch the kernel can detect the lockup and panic,
>> >> which can result in useful debugging info.
>> >>
>> >> Signed-off-by: Colin Cross <ccross@android.com>
>> >> +static void watchdog_check_hardlockup_other_cpu(void)
>> >> +{
>> >> +     int cpu;
>> >> +     cpumask_t cpus = watchdog_cpus;
>> >> +
>> >> +     /*
>> >> +      * Test for hardlockups every 3 samples.  The sample period is
>> >> +      *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly
>> over
>> >> +      *  watchdog_thresh (over by 20%).
>> >> +      */
>> >> +     if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
>> >> +             return;
>> >> +
> Another feeling is about __this_cpu_read(hrtimer_interrupts) % 3 != 0,
> It will cause the actual timeout value for hard lockup detection is not very fix, or even
> very short.
> Sometimes using 3 samples can detect the lockup case, but sometimes 1 sample.
> Is it the case?

I'm not sure what you mean.  The mod 3 will cause every 3rd timer (12
seconds, assuming watchdog_thresh = 10) to check hrtimer_interrupts
vs. hrtimer_interrupts_saved, and then update it.  The sampling should
be fixed and very accurate.  It will cause a panic/warning between 12
and 24 seconds after a cpu stops processing timer interrupts,
depending on the alignment of the hrtimers between the two cpus.

> And in NMI case, the NMI interrupt is coming at least every watchdog_thresh.

NMI interrupt will happen every 10 seconds instead of 12, meaning the
panic/warning will occur between 10 and 20 seconds after a cpu stops
processing timer interrupts, depending on the alignment of the NMI
with the hrtimer, but otherwise my patch should be very similar.
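
Spelling out the numbers (assuming the default watchdog_thresh of 10
seconds):

  sample period      = watchdog_thresh * 2 / 5 = 10 * 2 / 5 =  4 s
  check interval     = 3 samples                            = 12 s
  detection latency  = 1 to 2 check intervals               = 12 s to 24 s

  (perf/NMI detector: one check every 10 s, so 10 s to 20 s)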
Chuansheng Liu Jan. 11, 2013, 6:27 a.m. UTC | #14
> -----Original Message-----
> From: ccross@google.com [mailto:ccross@google.com] On Behalf Of Colin
> Cross
> Sent: Friday, January 11, 2013 2:18 PM
> To: Liu, Chuansheng
> Cc: linux-kernel@vger.kernel.org; Andrew Morton; Don Zickus; Ingo Molnar;
> Thomas Gleixner; linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH] hardlockup: detect hard lockups without NMIs using
> secondary cpus
> 
> On Thu, Jan 10, 2013 at 9:57 PM, Liu, Chuansheng
> <chuansheng.liu@intel.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: ccross@google.com [mailto:ccross@google.com] On Behalf Of Colin
> >> Cross
> >> Sent: Friday, January 11, 2013 1:34 PM
> >> To: Liu, Chuansheng
> >> Cc: linux-kernel@vger.kernel.org; Andrew Morton; Don Zickus; Ingo Molnar;
> >> Thomas Gleixner; linux-arm-kernel@lists.infradead.org
> >> Subject: Re: [PATCH] hardlockup: detect hard lockups without NMIs using
> >> secondary cpus
> >>
> >> On Thu, Jan 10, 2013 at 5:39 PM, Liu, Chuansheng
> >> <chuansheng.liu@intel.com> wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Colin Cross [mailto:ccross@android.com]
> >> >> Sent: Thursday, January 10, 2013 9:58 AM
> >> >> To: linux-kernel@vger.kernel.org
> >> >> Cc: Andrew Morton; Don Zickus; Ingo Molnar; Thomas Gleixner; Liu,
> >> >> Chuansheng; linux-arm-kernel@lists.infradead.org; Colin Cross
> >> >> Subject: [PATCH] hardlockup: detect hard lockups without NMIs using
> >> >> secondary cpus
> >> >>
> >> >> Emulate NMIs on systems where they are not available by using timer
> >> >> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
> >> >> to check that the next cpu is processing hrtimer interrupts by
> >> >> verifying that a counter is increasing.
> >> >>
> >> >> This patch is useful on systems where the hardlockup detector is not
> >> >> available due to a lack of NMIs, for example most ARM SoCs.
> >> >> Without this patch any cpu stuck with interrupts disabled can
> >> >> cause a hardware watchdog reset with no debugging information,
> >> >> but with this patch the kernel can detect the lockup and panic,
> >> >> which can result in useful debugging info.
> >> >>
> >> >> Signed-off-by: Colin Cross <ccross@android.com>
> >> >> +static void watchdog_check_hardlockup_other_cpu(void)
> >> >> +{
> >> >> +     int cpu;
> >> >> +     cpumask_t cpus = watchdog_cpus;
> >> >> +
> >> >> +     /*
> >> >> +      * Test for hardlockups every 3 samples.  The sample period is
> >> >> +      *  watchdog_thresh * 2 / 5, so 3 samples gets us back to
> slightly
> >> over
> >> >> +      *  watchdog_thresh (over by 20%).
> >> >> +      */
> >> >> +     if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
> >> >> +             return;
> >> >> +
> > Another feeling is about __this_cpu_read(hrtimer_interrupts) % 3 != 0,
> > It will cause the actual timeout value for hard lockup detection is not very fix,
> or even
> > very short.
> > Sometimes using 3 samples can detect the lockup case, but sometimes 1
> sample.
> > Is it the case?
> 
> I'm not sure what you mean.  The mod 3 will cause every 3rd timer (12
> seconds, assuming watchdog_thresh = 10) to check hrtimer_interrupts
> vs. hrtimer_interrupts_saved, and then update it.  The sampling should
> be fixed and very accurate.  It will cause a panic/warning between 12
> and 24 seconds after a cpu stops processing timer interrupts,
> depending on the alignment of the hrtimers between the two cpus.
> 
You are right, thanks.

Patch

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index db50840..c8f8aa0 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -14,8 +14,11 @@ 
  * may be used to reset the timeout - for code which intentionally
  * disables interrupts for a long time. This call is stateless.
  */
-#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
+#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR_NMI)
 #include <asm/nmi.h>
+#endif
+
+#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void touch_nmi_watchdog(void);
 #else
 static inline void touch_nmi_watchdog(void)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 75a2ab3..94c231e 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -44,6 +44,11 @@ 
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
+#endif
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
+static cpumask_t __read_mostly watchdog_cpus;
+#endif
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
 #endif
 
@@ -179,7 +184,7 @@  void touch_softlockup_watchdog_sync(void)
 	__raw_get_cpu_var(watchdog_touch_ts) = 0;
 }
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
 /* watchdog detector functions */
 static int is_hardlockup(void)
 {
@@ -193,6 +198,64 @@  static int is_hardlockup(void)
 }
 #endif
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
+static int is_hardlockup_other_cpu(int cpu)
+{
+	unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
+
+	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
+		return 1;
+
+	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
+	return 0;
+}
+
+static void watchdog_check_hardlockup_other_cpu(void)
+{
+	int cpu;
+	cpumask_t cpus = watchdog_cpus;
+
+	/*
+	 * Test for hardlockups every 3 samples.  The sample period is
+	 *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
+	 *  watchdog_thresh (over by 20%).
+	 */
+	if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
+		return;
+
+	/* check for a hardlockup on the next cpu */
+	cpu = cpumask_next(smp_processor_id(), &cpus);
+	if (cpu >= nr_cpu_ids)
+		cpu = cpumask_first(&cpus);
+	if (cpu == smp_processor_id())
+		return;
+
+	smp_rmb();
+
+	if (per_cpu(watchdog_nmi_touch, cpu) == true) {
+		per_cpu(watchdog_nmi_touch, cpu) = false;
+		return;
+	}
+
+	if (is_hardlockup_other_cpu(cpu)) {
+		/* only warn once */
+		if (per_cpu(hard_watchdog_warn, cpu) == true)
+			return;
+
+		if (hardlockup_panic)
+			panic("Watchdog detected hard LOCKUP on cpu %d", cpu);
+		else
+			WARN(1, "Watchdog detected hard LOCKUP on cpu %d", cpu);
+
+		per_cpu(hard_watchdog_warn, cpu) = true;
+	} else {
+		per_cpu(hard_watchdog_warn, cpu) = false;
+	}
+}
+#else
+static inline void watchdog_check_hardlockup_other_cpu(void) { return; }
+#endif
+
 static int is_softlockup(unsigned long touch_ts)
 {
 	unsigned long now = get_timestamp(smp_processor_id());
@@ -204,7 +267,7 @@  static int is_softlockup(unsigned long touch_ts)
 	return 0;
 }
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
 
 static struct perf_event_attr wd_hw_attr = {
 	.type		= PERF_TYPE_HARDWARE,
@@ -252,7 +315,7 @@  static void watchdog_overflow_callback(struct perf_event *event,
 	__this_cpu_write(hard_watchdog_warn, false);
 	return;
 }
-#endif /* CONFIG_HARDLOCKUP_DETECTOR */
+#endif /* CONFIG_HARDLOCKUP_DETECTOR_NMI */
 
 static void watchdog_interrupt_count(void)
 {
@@ -272,6 +335,9 @@  static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	/* kick the hardlockup detector */
 	watchdog_interrupt_count();
 
+	/* test for hardlockups on the next cpu */
+	watchdog_check_hardlockup_other_cpu();
+
 	/* kick the softlockup detector */
 	wake_up_process(__this_cpu_read(softlockup_watchdog));
 
@@ -396,7 +462,7 @@  static void watchdog(unsigned int cpu)
 	__touch_watchdog();
 }
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
 /*
  * People like the simple clean cpu node info on boot.
  * Reduce the watchdog noise by only printing messages
@@ -472,9 +538,31 @@  static void watchdog_nmi_disable(unsigned int cpu)
 	return;
 }
 #else
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
+static int watchdog_nmi_enable(unsigned int cpu)
+{
+	/*
+	 * The new cpu will be marked online before the first hrtimer interrupt
+	 * runs on it.  If another cpu tests for a hardlockup on the new cpu
+	 * before it has run its first hrtimer, it will get a false positive.
+	 * Touch the watchdog on the new cpu to delay the first check for at
+	 * least 3 sampling periods to guarantee one hrtimer has run on the new
+	 * cpu.
+	 */
+	per_cpu(watchdog_nmi_touch, cpu) = true;
+	smp_wmb();
+	cpumask_set_cpu(cpu, &watchdog_cpus);
+	return 0;
+}
+
+static void watchdog_nmi_disable(unsigned int cpu) {
+	cpumask_clear_cpu(cpu, &watchdog_cpus);
+}
+#else
 static int watchdog_nmi_enable(unsigned int cpu) { return 0; }
 static void watchdog_nmi_disable(unsigned int cpu) { return; }
-#endif /* CONFIG_HARDLOCKUP_DETECTOR */
+#endif /* CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU */
+#endif /* CONFIG_HARDLOCKUP_DETECTOR_NMI */
 
 /* prepare/enable/disable routines */
 /* sysctl functions */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index aaf8baf..f7c4859 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -191,15 +191,27 @@  config LOCKUP_DETECTOR
 	  The overhead should be minimal.  A periodic hrtimer runs to
 	  generate interrupts and kick the watchdog task every 4 seconds.
 	  An NMI is generated every 10 seconds or so to check for hardlockups.
+	  If NMIs are not available on the platform, every 12 seconds the
+	  hrtimer interrupt on one cpu will be used to check for hardlockups
+	  on the next cpu.
 
 	  The frequency of hrtimer and NMI events and the soft and hard lockup
 	  thresholds can be controlled through the sysctl watchdog_thresh.
 
-config HARDLOCKUP_DETECTOR
+config HARDLOCKUP_DETECTOR_NMI
 	def_bool y
 	depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
 	depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
 
+config HARDLOCKUP_DETECTOR_OTHER_CPU
+	def_bool y
+	depends on LOCKUP_DETECTOR && SMP
+	depends on !HARDLOCKUP_DETECTOR_NMI && !HAVE_NMI_WATCHDOG
+
+config HARDLOCKUP_DETECTOR
+	def_bool y
+	depends on HARDLOCKUP_DETECTOR_NMI || HARDLOCKUP_DETECTOR_OTHER_CPU
+
 config BOOTPARAM_HARDLOCKUP_PANIC
 	bool "Panic (Reboot) On Hard Lockups"
 	depends on HARDLOCKUP_DETECTOR