[RFC] ARM: smp: Fix the CPU hotplug race with scheduler.

Message ID 20110620101438.GD2082@n2100.arm.linux.org.uk (mailing list archive)
State New, archived

Commit Message

Russell King - ARM Linux June 20, 2011, 10:14 a.m. UTC
On Mon, Jun 20, 2011 at 10:50:53AM +0100, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote:
> > The current ARM CPU hotplug code suffers from a couple of race conditions
> > with the scheduler in the CPU online path.
> > The ARM CPU hotplug code doesn't wait for the hot-plugged CPU to be marked
> > active (as part of cpu_notify() by the CPU which brought it up) before
> > enabling interrupts.
> 
> Hmm, why not just move the set_cpu_online() call before notify_cpu_starting()
> and add the wait after the set_cpu_online() ?

Actually, the race is caused by the CPU being marked online (and therefore
available for the scheduler) but not yet active (the CPU asking this one
to boot hasn't run the online notifiers yet.)

This, I feel, is a fault of generic code.  If the CPU is not ready to have
processes scheduled on it (because migration is not initialized) then we
shouldn't be scheduling processes on the new CPU yet.

In any case, this should close the window by ensuring that we don't receive
an interrupt in the online-but-not-active case.  Can you please test?

 arch/arm/kernel/smp.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

Comments

Santosh Shilimkar June 20, 2011, 10:28 a.m. UTC | #1
On 6/20/2011 3:44 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 10:50:53AM +0100, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote:
>>> The current ARM CPU hotplug code suffers from a couple of race conditions
>>> with the scheduler in the CPU online path.
>>> The ARM CPU hotplug code doesn't wait for the hot-plugged CPU to be marked
>>> active (as part of cpu_notify() by the CPU which brought it up) before
>>> enabling interrupts.
>>
>> Hmm, why not just move the set_cpu_online() call before notify_cpu_starting()
>> and add the wait after the set_cpu_online() ?
>
> Actually, the race is caused by the CPU being marked online (and therefore
> available for the scheduler) but not yet active (the CPU asking this one
> to boot hasn't run the online notifiers yet.)
>
The scheduler uses the active mask and not the online mask. For the
scheduler, a CPU is ready for migration as soon as it is marked active,
and that's the reason interrupts should never be enabled before the CPU
is marked active in the online path.
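
As an aside, a minimal sketch of the distinction being described (illustrative
only: pick_target_cpu() is a made-up name, not a scheduler function): task
placement consults cpu_active_mask, so a CPU that is online but not yet active
must not be handed any work.

#include <linux/cpumask.h>
#include <linux/errno.h>

/*
 * Illustrative only: when choosing where a task may run, only CPUs that
 * are both allowed by the task's affinity and set in cpu_active_mask are
 * eligible.  A CPU that is merely online (but not yet active) is skipped.
 */
static int pick_target_cpu(const struct cpumask *allowed)
{
	int cpu;

	for_each_cpu(cpu, allowed) {
		if (cpu_active(cpu))	/* active mask, not online mask */
			return cpu;
	}

	return -EINVAL;			/* no eligible CPU yet */
}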

> This, I feel, is a fault of generic code.  If the CPU is not ready to have
> processes scheduled on it (because migration is not initialized) then we
> shouldn't be scheduling processes on the new CPU yet.
>
> In any case, this should close the window by ensuring that we don't receive
> an interrupt in the online-but-not-active case.  Can you please test?
>
No, it doesn't work; I still get the crash. The important point
here is not to enable interrupts before the CPU is marked
as online and active.

Regards
Santosh
Russell King - ARM Linux June 20, 2011, 10:35 a.m. UTC | #2
On Mon, Jun 20, 2011 at 03:58:03PM +0530, Santosh Shilimkar wrote:
> On 6/20/2011 3:44 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 10:50:53AM +0100, Russell King - ARM Linux wrote:
>>> On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote:
>>>> The current ARM CPU hotplug code suffers from a couple of race conditions
>>>> with the scheduler in the CPU online path.
>>>> The ARM CPU hotplug code doesn't wait for the hot-plugged CPU to be marked
>>>> active (as part of cpu_notify() by the CPU which brought it up) before
>>>> enabling interrupts.
>>>
>>> Hmm, why not just move the set_cpu_online() call before notify_cpu_starting()
>>> and add the wait after the set_cpu_online() ?
>>
>> Actually, the race is caused by the CPU being marked online (and therefore
>> available for the scheduler) but not yet active (the CPU asking this one
>> to boot hasn't run the online notifiers yet.)
>>
> The scheduler uses the active mask and not the online mask. For the
> scheduler, a CPU is ready for migration as soon as it is marked active,
> and that's the reason interrupts should never be enabled before the CPU
> is marked active in the online path.
>
>> This, I feel, is a fault of generic code.  If the CPU is not ready to have
>> processes scheduled on it (because migration is not initialized) then we
>> shouldn't be scheduling processes on the new CPU yet.
>>
>> In any case, this should close the window by ensuring that we don't receive
>> an interrupt in the online-but-not-active case.  Can you please test?
>>
> No it doesn't work. I still get the crash. The important point
> here is not to enable interrupts before CPU is marked
> as online and active.

But we can't do that.
Russell King - ARM Linux June 20, 2011, 10:44 a.m. UTC | #3
On Mon, Jun 20, 2011 at 03:58:03PM +0530, Santosh Shilimkar wrote:
> No it doesn't work. I still get the crash. The important point
> here is not to enable interrupts before CPU is marked
> as online and active.

What is the crash (in full please)?

Do we know what interrupt is causing it?
Santosh Shilimkar June 20, 2011, 10:45 a.m. UTC | #4
On 6/20/2011 4:05 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 03:58:03PM +0530, Santosh Shilimkar wrote:
>> On 6/20/2011 3:44 PM, Russell King - ARM Linux wrote:
>>> On Mon, Jun 20, 2011 at 10:50:53AM +0100, Russell King - ARM Linux wrote:
>>>> On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote:
>>>>> The current ARM CPU hotplug code suffers from a couple of race conditions
>>>>> with the scheduler in the CPU online path.
>>>>> The ARM CPU hotplug code doesn't wait for the hot-plugged CPU to be marked
>>>>> active (as part of cpu_notify() by the CPU which brought it up) before
>>>>> enabling interrupts.
>>>>
>>>> Hmm, why not just move the set_cpu_online() call before notify_cpu_starting()
>>>> and add the wait after the set_cpu_online() ?
>>>
>>> Actually, the race is caused by the CPU being marked online (and therefore
>>> available for the scheduler) but not yet active (the CPU asking this one
>>> to boot hasn't run the online notifiers yet.)
>>>
>> The scheduler uses the active mask and not the online mask. For the
>> scheduler, a CPU is ready for migration as soon as it is marked active,
>> and that's the reason interrupts should never be enabled before the CPU
>> is marked active in the online path.
>>
>>> This, I feel, is a fault of generic code.  If the CPU is not ready to have
>>> processes scheduled on it (because migration is not initialized) then we
>>> shouldn't be scheduling processes on the new CPU yet.
>>>
>>> In any case, this should close the window by ensuring that we don't receive
>>> an interrupt in the online-but-not-active case.  Can you please test?
>>>
>> No it doesn't work. I still get the crash. The important point
>> here is not to enable interrupts before CPU is marked
>> as online and active.
>
> But we can't do that.
Why is that?
Is it because of calibration, or because the hotplug start notifiers need
to be called with interrupts enabled?

Regards
Santosh
Santosh Shilimkar June 20, 2011, 10:47 a.m. UTC | #5
On 6/20/2011 4:14 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 03:58:03PM +0530, Santosh Shilimkar wrote:
>> No it doesn't work. I still get the crash. The important point
>> here is not to enable interrupts before CPU is marked
>> as online and active.
>
> What is the crash (in full please)?
>
> Do we know what interrupt is causing it?
Yes. It's because of the interrupt and the CPU active-online
race.

Here is the crash log:
[   21.025451] CPU1: Booted secondary processor
[   21.025451] CPU1: Unknown IPI message 0x1
[   21.029113] Switched to NOHz mode on CPU #1
[   21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4
[   21.029235] [<c0064704>] (unwind_backtrace+0x0/0xf4) from [<c028edc8>] (do_raw_spin_lock+0xd0/0x164)
[   21.029266] [<c028edc8>] (do_raw_spin_lock+0xd0/0x164) from [<c00cc3c4>] (tick_do_update_jiffies64+0x3c/0x118)
[   21.029296] [<c00cc3c4>] (tick_do_update_jiffies64+0x3c/0x118) from [<c00ccb04>] (tick_check_idle+0xb0/0x110)
[   21.029327] [<c00ccb04>] (tick_check_idle+0xb0/0x110) from [<c00a29cc>] (irq_enter+0x68/0x70)
[   21.029327] [<c00a29cc>] (irq_enter+0x68/0x70) from [<c00623c4>] (ipi_timer+0x24/0x40)
[   21.029357] [<c00623c4>] (ipi_timer+0x24/0x40) from [<c0051368>] (do_local_timer+0x54/0x70)
[   21.029388] [<c0051368>] (do_local_timer+0x54/0x70) from [<c048a09c>] (__irq_svc+0x3c/0x120)
[   21.029388] Exception stack(0xef87bf78 to 0xef87bfc0)
[   21.029388] bf60:                                                       00000000 00026ec0
[   21.029418] bf80: c0622080 ffff7483 c0622080 ffff7483 ef87a000 00000000 c0622080 411fc092
[   21.029418] bfa0: c063a4f0 00000000 00000001 ef87bfc0 c0482e08 c0482b0c 60000113 ffffffff
[   21.029449] [<c048a09c>] (__irq_svc+0x3c/0x120) from [<c0482b0c>] (calibrate_delay+0x8c/0x1d4)
[   21.029479] [<c0482b0c>] (calibrate_delay+0x8c/0x1d4) from [<c0482e08>] (secondary_start_kernel+0x110/0x1ac)
[   21.029510] [<c0482e08>] (secondary_start_kernel+0x110/0x1ac) from [<c0070ee4>] (platform_cpu_die+0x34/0x54)
[   22.021362] CPU1: failed to come online
[   23.997955] CPU1: failed to come online
[   25.000122] BUG: spinlock lockup on CPU#0, kthreadd/663, efa27e64
[   25.006408] [<c0064704>] (unwind_backtrace+0x0/0xf4) from [<c028edc8>] (do_raw_spin_lock+0xd0/0x164)
[   25.015808] [<c028edc8>] (do_raw_spin_lock+0xd0/0x164) from [<c048985c>] (_raw_spin_lock_irqsave+0x4c/0x58)
[   25.025848] [<c048985c>] (_raw_spin_lock_irqsave+0x4c/0x58) from [<c008ba24>] (complete+0x1c/0x5c)
[   25.035095] [<c008ba24>] (complete+0x1c/0x5c) from [<c00baf78>] (kthread+0x68/0x90)
[   25.042968] [<c00baf78>] (kthread+0x68/0x90) from [<c005dfdc>] (kernel_thread_exit+0x0/0x8)
Russell King - ARM Linux June 20, 2011, 11:13 a.m. UTC | #6
On Mon, Jun 20, 2011 at 04:17:58PM +0530, Santosh Shilimkar wrote:
> Yes. It's because of interrupt and the CPU active-online
> race.

I don't see that as a conclusion from this dump.

> Here is the crash log:
> [   21.025451] CPU1: Booted secondary processor
> [   21.025451] CPU1: Unknown IPI message 0x1
> [   21.029113] Switched to NOHz mode on CPU #1
> [   21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4

That's the xtime seqlock.  We're trying to update the xtime from CPU1,
which is not yet online and not yet active.  That's fine, we're just
spinning on the spinlock here, waiting for the other CPUs to release
it.

But what this is saying is that the other CPUs aren't releasing it.
The cpu hotplug code doesn't hold the seqlock either.  So who else is
holding this lock, causing CPU1 to time out on it.
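
For reference, the locking shape involved is roughly this (a paraphrase of the
3.0-era tick code, not a copy of it; demo_xtime_lock and demo_jiffies_64 are
stand-in names): the NOHZ jiffies update runs inside a seqlock write section,
and the seqlock's internal spinlock is what CPU1 is spinning on in the
backtrace above.

#include <linux/seqlock.h>
#include <linux/types.h>

/* Stand-ins for the real xtime seqlock and jiffies_64, for illustration. */
static DEFINE_SEQLOCK(demo_xtime_lock);
static u64 demo_jiffies_64;

/*
 * Paraphrased shape of the 3.0-era tick_do_update_jiffies64(): the update
 * is serialized on the seqlock's internal spinlock, which is where
 * do_raw_spin_lock() shows up in CPU1's backtrace.
 */
static void demo_update_jiffies64(void)
{
	write_seqlock(&demo_xtime_lock);
	demo_jiffies_64++;		/* advance the tick count */
	write_sequnlock(&demo_xtime_lock);
}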

The other thing is that this is only supposed to trigger after about
one second:

        u64 loops = loops_per_jiffy * HZ;
                for (i = 0; i < loops; i++) {
                        if (arch_spin_trylock(&lock->raw_lock))
                                return;
                        __delay(1);
                }

which from the timings you have at the beginning of your printk lines
is clearly not the case - it's more like 61us.

Are you running with those h/w timer delay patches?
Santosh Shilimkar June 20, 2011, 11:25 a.m. UTC | #7
On 6/20/2011 4:43 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 04:17:58PM +0530, Santosh Shilimkar wrote:
>> Yes. It's because of interrupt and the CPU active-online
>> race.
>
> I don't see that as a conclusion from this dump.
>
>> Here is the crash log:
>> [   21.025451] CPU1: Booted secondary processor
>> [   21.025451] CPU1: Unknown IPI message 0x1
>> [   21.029113] Switched to NOHz mode on CPU #1
>> [   21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4
>
> That's the xtime seqlock.  We're trying to update the xtime from CPU1,
> which is not yet online and not yet active.  That's fine, we're just
> spinning on the spinlock here, waiting for the other CPUs to release
> it.
>
> But what this is saying is that the other CPUs aren't releasing it.
> The cpu hotplug code doesn't hold the seqlock either.  So who else is
> holding this lock, causing CPU1 to time out on it.
>
> The other thing is that this is only supposed to trigger after about
> one second:
>
>          u64 loops = loops_per_jiffy * HZ;
>                  for (i = 0; i < loops; i++) {
>                          if (arch_spin_trylock(&lock->raw_lock))
>                                  return;
>                          __delay(1);
>                  }
>
> which from the timings you have at the beginning of your printk lines
> is clearly not the case - it's more like 61us.
>
> Are you running with those h/w timer delay patches?
Nope.

Regards
Santosh
Russell King - ARM Linux June 20, 2011, 11:40 a.m. UTC | #8
On Mon, Jun 20, 2011 at 04:55:43PM +0530, Santosh Shilimkar wrote:
> On 6/20/2011 4:43 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 04:17:58PM +0530, Santosh Shilimkar wrote:
>>> Yes. It's because of interrupt and the CPU active-online
>>> race.
>>
>> I don't see that as a conclusion from this dump.
>>
>>> Here is the crash log:
>>> [   21.025451] CPU1: Booted secondary processor
>>> [   21.025451] CPU1: Unknown IPI message 0x1
>>> [   21.029113] Switched to NOHz mode on CPU #1
>>> [   21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4
>>
>> That's the xtime seqlock.  We're trying to update the xtime from CPU1,
>> which is not yet online and not yet active.  That's fine, we're just
>> spinning on the spinlock here, waiting for the other CPUs to release
>> it.
>>
>> But what this is saying is that the other CPUs aren't releasing it.
>> The cpu hotplug code doesn't hold the seqlock either.  So who else is
>> holding this lock, causing CPU1 to time out on it.
>>
>> The other thing is that this is only supposed to trigger after about
>> one second:
>>
>>          u64 loops = loops_per_jiffy * HZ;
>>                  for (i = 0; i < loops; i++) {
>>                          if (arch_spin_trylock(&lock->raw_lock))
>>                                  return;
>>                          __delay(1);
>>                  }
>>
>> which from the timings you have at the beginning of your printk lines
>> is clearly not the case - it's more like 61us.
>>
>> Are you running with those h/w timer delay patches?
> Nope.

Ok.  So loops_per_jiffy must be too small.  My guess is you're using an
older kernel without 71c696b1 (calibrate: extract fall-back calculation
into own helper).

The delay calibration code used to start out by setting:

	loops_per_jiffy = (1<<12);

This will shorten the delay right down, and that's probably causing these
false spinlock lockup bug dumps.
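
To put rough numbers on that, here is a small userspace sketch (the calibrated
loops_per_jiffy and HZ values are assumptions, not measurements from this
report): the detector spins loops_per_jiffy * HZ times, but each __delay(1)
really lasts about 1 / (calibrated_lpj * HZ) seconds, so a provisional
loops_per_jiffy of 1 << 12 collapses the intended one-second timeout to a few
milliseconds or less.

#include <stdio.h>

int main(void)
{
	/* Assumed values for illustration only, not taken from this report. */
	const double provisional_lpj = 1 << 12;	/* lpj seen by the detector before calibration finishes */
	const double calibrated_lpj = 1.0e6;	/* plausible final loops_per_jiffy for this class of CPU */
	const double hz = 128;			/* HZ; it cancels out of the result anyway */

	double iterations = provisional_lpj * hz;		 /* loop count in the lockup detector */
	double secs_per_iteration = 1.0 / (calibrated_lpj * hz);/* real cost of one __delay(1) */

	printf("intended lockup timeout : 1.000000 s\n");
	printf("effective lockup timeout: %f s\n", iterations * secs_per_iteration);
	return 0;
}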

Arranging for IRQs to be disabled across the delay calibration just avoids
the issue by preventing any spinlock being taken.

The reason that CPU#0 also complains about spinlock lockup is that for
some reason CPU#1 never finishes its calibration, and so the loop also
times out early on CPU#0.

Of course, fiddling with this global variable in this way is _not_ a good
idea while other CPUs are running and using that variable.

We could also do with implementing trigger_all_cpu_backtrace() to get
backtraces from the other CPUs when spinlock lockup happens...
Santosh Shilimkar June 20, 2011, 11:42 a.m. UTC | #9
On 6/20/2011 4:15 PM, Santosh Shilimkar wrote:
> On 6/20/2011 4:05 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 03:58:03PM +0530, Santosh Shilimkar wrote:
>>> On 6/20/2011 3:44 PM, Russell King - ARM Linux wrote:
>>>> On Mon, Jun 20, 2011 at 10:50:53AM +0100, Russell King - ARM Linux
>>>> wrote:
>>>>> On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote:
>>>>>> The current ARM CPU hotplug code suffers from a couple of race
>>>>>> conditions with the scheduler in the CPU online path.
>>>>>> The ARM CPU hotplug code doesn't wait for the hot-plugged CPU to be
>>>>>> marked active (as part of cpu_notify() by the CPU which brought it
>>>>>> up) before enabling interrupts.
>>>>>
>>>>> Hmm, why not just move the set_cpu_online() call before
>>>>> notify_cpu_starting()
>>>>> and add the wait after the set_cpu_online() ?
>>>>
>>>> Actually, the race is caused by the CPU being marked online (and
>>>> therefore
>>>> available for the scheduler) but not yet active (the CPU asking this
>>>> one
>>>> to boot hasn't run the online notifiers yet.)
>>>>
>>> The scheduler uses the active mask and not the online mask. For the
>>> scheduler, a CPU is ready for migration as soon as it is marked active,
>>> and that's the reason interrupts should never be enabled before the CPU
>>> is marked active in the online path.
>>>
>>>> This, I feel, is a fault of generic code. If the CPU is not ready to
>>>> have
>>>> processes scheduled on it (because migration is not initialized)
>>>> then we
>>>> shouldn't be scheduling processes on the new CPU yet.
>>>>
>>>> In any case, this should close the window by ensuring that we don't
>>>> receive
>>>> an interrupt in the online-but-not-active case. Can you please test?
>>>>
>>> No it doesn't work. I still get the crash. The important point
>>> here is not to enable interrupts before CPU is marked
>>> as online and active.
>>
>> But we can't do that.
> Why is that?
> Is it because of calibration, or because the hotplug start notifiers need
> to be called with interrupts enabled?
>
BTW, how is ARM different from x86 here? I mean the x86 code seems to
do something similar to what my patch is trying to fix for ARM. Some
pointers would help me to understand why we can't delay the
interrupt-enable part in the ARM hotplug code.

Regards
Santosh
Santosh Shilimkar June 20, 2011, 11:51 a.m. UTC | #10
On 6/20/2011 5:10 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 04:55:43PM +0530, Santosh Shilimkar wrote:
>> On 6/20/2011 4:43 PM, Russell King - ARM Linux wrote:
>>> On Mon, Jun 20, 2011 at 04:17:58PM +0530, Santosh Shilimkar wrote:
>>>> Yes. It's because of interrupt and the CPU active-online
>>>> race.
>>>
>>> I don't see that as a conclusion from this dump.
>>>
>>>> Here is the crash log:
>>>> [   21.025451] CPU1: Booted secondary processor
>>>> [   21.025451] CPU1: Unknown IPI message 0x1
>>>> [   21.029113] Switched to NOHz mode on CPU #1
>>>> [   21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4
>>>
>>> That's the xtime seqlock.  We're trying to update the xtime from CPU1,
>>> which is not yet online and not yet active.  That's fine, we're just
>>> spinning on the spinlock here, waiting for the other CPUs to release
>>> it.
>>>
>>> But what this is saying is that the other CPUs aren't releasing it.
>>> The cpu hotplug code doesn't hold the seqlock either.  So who else is
>>> holding this lock, causing CPU1 to time out on it.
>>>
>>> The other thing is that this is only supposed to trigger after about
>>> one second:
>>>
>>>           u64 loops = loops_per_jiffy * HZ;
>>>                   for (i = 0; i < loops; i++) {
>>>                           if (arch_spin_trylock(&lock->raw_lock))
>>>                                   return;
>>>                           __delay(1);
>>>                   }
>>>
>>> which from the timings you have at the beginning of your printk lines
>>> is clearly not the case - it's more like 61us.
>>>
>>> Are you running with those h/w timer delay patches?
>> Nope.
>
> Ok.  So loops_per_jiffy must be too small.  My guess is you're using an
> older kernel without 71c696b1 (calibrate: extract fall-back calculation
> into own helper).
>
I am on v3.0-rc3+ (latest mainline) and the above commit is already
part of it.

> The delay calibration code used to start out by setting:
>
> 	loops_per_jiffy = (1<<12);
>
> This will shorten the delay right down, and that's probably causing these
> false spinlock lockup bug dumps.
>
> Arranging for IRQs to be disabled across the delay calibration just avoids
> the issue by preventing any spinlock being taken.
>
> The reason that CPU#0 also complains about spinlock lockup is that for
> some reason CPU#1 never finishes its calibration, and so the loop also
> times out early on CPU#0.
>
I am not sure, but what I think is happening is that as soon as
interrupts start firing, as part of IRQ handling the scheduler will try
to enqueue the softirq thread for the newly booted CPU, since it sees
that it's active and ready. But that's failing, and both CPUs
eventually lock up. I may be wrong here, though.

> Of course, fiddling with this global variable in this way is _not_ a good
> idea while other CPUs are running and using that variable.
>
> We could also do with implementing trigger_all_cpu_backtrace() to get
> backtraces from the other CPUs when spinlock lockup happens...

Any pointers on the other question about "why we need to enable
interrupts before the CPU is ready?"

Regards
Santosh
Russell King - ARM Linux June 20, 2011, 12:19 p.m. UTC | #11
On Mon, Jun 20, 2011 at 05:21:48PM +0530, Santosh Shilimkar wrote:
> On 6/20/2011 5:10 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 04:55:43PM +0530, Santosh Shilimkar wrote:
>>> On 6/20/2011 4:43 PM, Russell King - ARM Linux wrote:
>>>> On Mon, Jun 20, 2011 at 04:17:58PM +0530, Santosh Shilimkar wrote:
>>>>> Yes. It's because of interrupt and the CPU active-online
>>>>> race.
>>>>
>>>> I don't see that as a conclusion from this dump.
>>>>
>>>>> Here is the crash log:
>>>>> [   21.025451] CPU1: Booted secondary processor
>>>>> [   21.025451] CPU1: Unknown IPI message 0x1
>>>>> [   21.029113] Switched to NOHz mode on CPU #1
>>>>> [   21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4
>>>>
>>>> That's the xtime seqlock.  We're trying to update the xtime from CPU1,
>>>> which is not yet online and not yet active.  That's fine, we're just
>>>> spinning on the spinlock here, waiting for the other CPUs to release
>>>> it.
>>>>
>>>> But what this is saying is that the other CPUs aren't releasing it.
>>>> The cpu hotplug code doesn't hold the seqlock either.  So who else is
>>>> holding this lock, causing CPU1 to time out on it.
>>>>
>>>> The other thing is that this is only supposed to trigger after about
>>>> one second:
>>>>
>>>>           u64 loops = loops_per_jiffy * HZ;
>>>>                   for (i = 0; i < loops; i++) {
>>>>                           if (arch_spin_trylock(&lock->raw_lock))
>>>>                                   return;
>>>>                           __delay(1);
>>>>                   }
>>>>
>>>> which from the timings you have at the beginning of your printk lines
>>>> is clearly not the case - it's more like 61us.
>>>>
>>>> Are you running with those h/w timer delay patches?
>>> Nope.
>>
>> Ok.  So loops_per_jiffy must be too small.  My guess is you're using an
>> older kernel without 71c696b1 (calibrate: extract fall-back calculation
>> into own helper).
>>
> I am on V3.0-rc3+(latest mainline) and the above commit is already
> part of it.
>
>> The delay calibration code used to start out by setting:
>>
>> 	loops_per_jiffy = (1<<12);
>>
>> This will shorten the delay right down, and that's probably causing these
>> false spinlock lockup bug dumps.
>>
>> Arranging for IRQs to be disabled across the delay calibration just avoids
>> the issue by preventing any spinlock being taken.
>>
>> The reason that CPU#0 also complains about spinlock lockup is that for
>> some reason CPU#1 never finishes its calibration, and so the loop also
>> times out early on CPU#0.
>>
> I am not sure, but what I think is happening is that as soon as
> interrupts start firing, as part of IRQ handling the scheduler will try
> to enqueue the softirq thread for the newly booted CPU, since it sees
> that it's active and ready. But that's failing, and both CPUs
> eventually lock up. I may be wrong here, though.

Even if that happens, there is NO WAY that the spinlock lockup detector
should report lockup in anything under 1s.

>> Of course, fiddling with this global variable in this way is _not_ a good
>> idea while other CPUs are running and using that variable.
>>
>> We could also do with implementing trigger_all_cpu_backtrace() to get
>> backtraces from the other CPUs when spinlock lockup happens...
>
> Any pointers on the other question about "why we need to enable
> interrupts before the CPU is ready?"

To ensure that things like the delay loop calibration and twd calibration
can run, though that looks like it'll run happily enough with the boot
CPU updating jiffies.
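
For context, the software delay calibration has roughly this shape (paraphrased
from memory of init/calibrate.c; ticks_for_candidate_lpj() is an invented name,
not the real function): it only needs to observe jiffies advancing, which the
boot CPU's timer interrupt provides even while the calibrating CPU has its own
interrupts masked.

#include <linux/delay.h>
#include <linux/jiffies.h>

/*
 * Rough shape of the classic software delay calibration step: time how many
 * jiffies a candidate loop count takes.  Nothing here requires the local
 * CPU's own timer interrupt, only that jiffies keeps ticking somewhere.
 */
static unsigned long ticks_for_candidate_lpj(unsigned long candidate_lpj)
{
	unsigned long ticks;

	ticks = jiffies;
	while (ticks == jiffies)
		;			/* align to a tick edge */

	ticks = jiffies;
	__delay(candidate_lpj);		/* spin for the candidate loop count */

	return jiffies - ticks;		/* 1 means the candidate is ~1 jiffy */
}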

However, I'm still not taking your patch because I believe it's just
papering over the real issue, which is not as you describe.

You first need to work out why the spinlock lockup detection is firing
after just 61us rather than the full 1s and fix that.

You then need to work out whether you really do have spinlock lockup,
and if so, why.  Implementing trigger_all_cpu_backtrace() may help to
find out what CPU#0 is doing, though we can only do that with IRQs on,
and so would be fragile.

We can test whether CPU#0 is going off to do something else while CPU#1
is being brought up, by adding a preempt_disable() / preempt_enable()
in __cpu_up() to prevent the wait-for-cpu#1-online being preempted by
other threads - I suspect you'll still see spinlock lockup on the
xtime seqlock on CPU#1 though.  That would suggest a coherency issue.
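
A rough sketch of that experiment (wait_for_secondary_online() is an invented
name, and the loop only paraphrases the wait in __cpu_up(); it is not a
verbatim copy of arch/arm/kernel/smp.c):

#include <linux/cpumask.h>
#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/preempt.h>

/*
 * Keep the boot CPU from being preempted while it waits for the secondary
 * to show up online, so nothing else gets scheduled on CPU#0 meanwhile.
 */
static int wait_for_secondary_online(unsigned int cpu)
{
	unsigned long timeout = jiffies + HZ;
	int ret = -ENXIO;

	preempt_disable();		/* CPU#0 stays on this task */
	while (time_before(jiffies, timeout)) {
		if (cpu_online(cpu)) {
			ret = 0;
			break;
		}
		udelay(10);
	}
	preempt_enable();

	return ret;
}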

Finally, how are you provoking this - and what kernel configuration are
you using?
Santosh Shilimkar June 20, 2011, 12:27 p.m. UTC | #12
On 6/20/2011 5:49 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 05:21:48PM +0530, Santosh Shilimkar wrote:
>> On 6/20/2011 5:10 PM, Russell King - ARM Linux wrote:

[...]

>>
>> Any pointers on the other question about "why we need to enable
>> interrupts before the CPU is ready?"
>
> To ensure that things like the delay loop calibration and twd calibration
> can run, though that looks like it'll run happily enough with the boot
> CPU updating jiffies.
>
I guessed it and had the same point as above. Calibration will still
work.

> However, I'm still not taking your patch because I believe it's just
> papering over the real issue, which is not as you describe.
>
> You first need to work out why the spinlock lockup detection is firing
> after just 61us rather than the full 1s and fix that.
>
This is possibly because of my script, which doesn't wait for 1
second.

> You then need to work out whether you really do have spinlock lockup,
> and if so, why.  Implementing trigger_all_cpu_backtrace() may help to
> find out what CPU#0 is doing, though we can only do that with IRQs on,
> and so would be fragile.
>
> We can test whether CPU#0 is going off to do something else while CPU#1
> is being brought up, by adding a preempt_disable() / preempt_enable()
> in __cpu_up() to prevent the wait-for-cpu#1-online being preempted by
> other threads - I suspect you'll still see spinlock lockup on the
> xtime seqlock on CPU#1 though.  That would suggest a coherency issue.
>
> Finally, how are you provoking this - and what kernel configuration are
> you using?
Latest mainline kernel with omap2plus_defconfig, and the simple script
below to trigger the failure.

-------------
while true
do
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online
done


Regards
Santosh
Russell King - ARM Linux June 20, 2011, 12:57 p.m. UTC | #13
On Mon, Jun 20, 2011 at 05:57:01PM +0530, Santosh Shilimkar wrote:
> On 6/20/2011 5:49 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 05:21:48PM +0530, Santosh Shilimkar wrote:
>>> On 6/20/2011 5:10 PM, Russell King - ARM Linux wrote:
>
> [...]
>
>>>
>>> Any pointers on the other question about "why we need to enable
>>> interrupts before the CPU is ready?"
>>
>> To ensure that things like the delay loop calibration and twd calibration
>> can run, though that looks like it'll run happily enough with the boot
>> CPU updating jiffies.
>>
> I guessed it and had same point as above. Calibration will still
> work.
>
>> However, I'm still not taking your patch because I believe it's just
>> papering over the real issue, which is not as you describe.
>>
>> You first need to work out why the spinlock lockup detection is firing
>> after just 61us rather than the full 1s and fix that.
>>
> This is possibly because of my script which doesn't wait for 1
> second.

How could a userspace script affect the internal behaviour of
spin_lock() and the spinlock lockup detector?

> Latest mainline kernel with omap2plus_defconfig and below simple script
> to trigger the failure.
>
> -------------
> while true
> do
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 1 > /sys/devices/system/cpu/cpu1/online
> done

Thanks, I'll give it a go here and see if I can debug it further.

Patch

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 344e52b..e34d750 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -318,9 +318,15 @@  asmlinkage void __cpuinit secondary_start_kernel(void)
 	smp_store_cpu_info(cpu);
 
 	/*
-	 * OK, now it's safe to let the boot CPU continue
+	 * OK, now it's safe to let the boot CPU continue.  Wait for
+	 * the CPU migration code to notice that the CPU is online
+	 * before we continue.
 	 */
+	local_irq_disable();
 	set_cpu_online(cpu, true);
+	while (!cpumask_test_cpu(cpu, cpu_active_mask))
+		cpu_relax();
+	local_irq_enable();
 
 	/*
 	 * OK, it's off to the idle thread for us