
arm: Set hardirq tracing to on when idling

Message ID 1401045323-7116-1-git-send-email-minyard@acm.org (mailing list archive)
State New, archived

Commit Message

Corey Minyard May 25, 2014, 7:15 p.m. UTC
From: Corey Minyard <cminyard@mvista.com>

The CPU will go to idle with interrupts off, but the interrupts
will wake the CPU from idle.  This was causing very long irqsoff trace
values because, basically, the whole idle time was traced with
irqs off, even though they weren't really off.  Rework the idle
code to turn hardirq tracing on before calling idle.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
---
 arch/arm/kernel/process.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

I'm not sure this is correct for all ARM boards, but it fixes the
issue for the Vexpress board I have.  No more 4 second irqsoff
times.

Comments

Arnd Bergmann May 26, 2014, 9:26 a.m. UTC | #1
On Sunday 25 May 2014 14:15:23 minyard@acm.org wrote:
> From: Corey Minyard <cminyard@mvista.com>
> 
> The CPU will go to idle with interrupts off, but the interrupts
> will wake the CPU from idle.  This was causing very long irqsoff trace
> values because, basically, the whole idle time was traced with
> irqs off, even though they weren't really off.  Rework the idle
> code to turn hardirq tracing on before calling idle.
> 
> Signed-off-by: Corey Minyard <cminyard@mvista.com>
> ---
>  arch/arm/kernel/process.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> I'm not sure this is correct for all ARM boards, but it fixes the
> issue for the Vexpress board I have.  No more 4 second irqsoff
> times.

If this patch is correct for ARM, I wonder if it should be done
in architecture independent code instead. Do you see the same
thing on x86? If not, do you know what's different?

It seems like the right thing to do, I just don't understand
why nobody hit this before.

	Arnd
Corey Minyard May 27, 2014, 1:21 p.m. UTC | #2
On 05/26/2014 04:26 AM, Arnd Bergmann wrote:
> On Sunday 25 May 2014 14:15:23 minyard@acm.org wrote:
>> From: Corey Minyard <cminyard@mvista.com>
>>
>> The CPU will go to idle with interrupts off, but the interrupts
>> will wake the CPU from idle.  This was causing very long irqsoff trace
>> values because, basically, the whole idle time was traced with
>> irqs off, even though they weren't really off.  Rework the idle
>> code to turn hardirq tracing on before calling idle.
>>
>> Signed-off-by: Corey Minyard <cminyard@mvista.com>
>> ---
>>  arch/arm/kernel/process.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> I'm not sure this is correct for all ARM boards, but it fixes the
>> issue for the Vexpress board I have.  No more 4 second irqsoff
>> times.
> If this patch is correct for ARM, I wonder if it should be done
> in architecture independent code instead. Do you see the same
> thing on x86? If not, do you know what's different?

MIPS doesn't idle with IRQs disabled.

x86 calls trace_cpu_idle_rcuidle, but it doesn't re-enable interrupts at
the end of the idle routine.  Adding those trace calls to ARM doesn't
seem to make any difference, though.

> It seems like the right thing to do, I just don't understand
> why nobody hit this before.

Yeah, I'm a little confused by that, too.  The RT guys use ARM
extensively, why haven't they seen this?  Of course, if you run your RT
tests under heavy load, you won't see this since you will never go idle.

-corey
Arnd Bergmann May 27, 2014, 4:16 p.m. UTC | #3
On Tuesday 27 May 2014, Corey Minyard wrote:
> On 05/26/2014 04:26 AM, Arnd Bergmann wrote:
> > On Sunday 25 May 2014 14:15:23 minyard@acm.org wrote:
> >> From: Corey Minyard <cminyard@mvista.com>
> >>
> >> The CPU will go to idle with interrupts off, but the interrupts
> >> will wake the CPU from idle.  This was causing very long irqsoff trace
> >> values because, basically, the whole idle time was traced with
> >> irqs off, even though they weren't really off.  Rework the idle
> >> code to turn hardirq tracing on before calling idle.
> >>
> >> Signed-off-by: Corey Minyard <cminyard@mvista.com>
> >> ---
> >>  arch/arm/kernel/process.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> I'm not sure this is correct for all ARM boards, but it fixes the
> >> issue for the Vexpress board I have.  No more 4 second irqsoff
> >> times.
> > If this patch is correct for ARM, I wonder if it should be done
> > in architecture independent code instead. Do you see the same
> > thing on x86? If not, do you know what's different?
> 
> MIPS doesn't idle with IRQs disabled.

Looking at the mips code, I see a couple of variants of the idle function:

* r3081_wait(), r39xx_wait(), and rm7k_wait_irqoff() correctly sleep
  before enabling the IRQs

* r4k_wait first enables the IRQs and then waits, while a second
  r4k_wait_irqoff() variant does the opposite, and a comment
  explains that the first is wrong but the second may be broken
  on some implementations.

* I can't tell what au1k_wait() does, but it also seems to sleep first.

> x86 calls trace_cpu_idle_rcuidle, but it doesn't re-enable interrupts at
> the end of the idle routine.  Adding those trace calls to ARM doesn't
> seem to make any difference, though.

trace_cpu_idle_rcuidle() is unrelated, I also checked that.

The comment in x86 arch_safe_halt() says "Used in the idle loop; sti
takes one instruction cycle" and the implementation is "sti; hlt",
where "sti" enables the interrupts at the same time as "hlt"
puts the cpu in low-power mode.

I don't understand what happens when ACPI is used, but I would
assume it also ends up calling mwait or hlt with IRQs off. Can you try
booting an x86 machine with acpi=off?

> > It seems like the right thing to do, I just don't understand
> > why nobody hit this before.
> 
> Yeah, I'm a little confused by that, too.  The RT guys use ARM
> extensively, why haven't they seen this?  Of course, if you run your RT
> tests under heavy load, you won't see this since you will never go idle.

How exactly do you see this manifest? If it's clear that the
trace comes from the idle function, maybe everybody just ignores it?

	Arnd
Stanislav Meduna May 27, 2014, 4:38 p.m. UTC | #4
On 26.05.2014 11:26, Arnd Bergmann wrote:

> It seems like the right thing to do, I just don't understand
> why nobody hit this before.

Looks like this is what I did hit a month ago and
was not able to find the culprit:
http://www.spinics.net/lists/linux-rt-users/msg11656.html

> How exactly do you see this manifest? If it's clear
> that the trace comes from the idle function, maybe
> everybody just ignores it?

That was definitely what I did ;)
Arnd Bergmann May 27, 2014, 6:49 p.m. UTC | #5
On Tuesday 27 May 2014 18:38:33 Stanislav Meduna wrote:
> On 26.05.2014 11:26, Arnd Bergmann wrote:
> 
> > It seems like the right thing to do, I just don't understand
> > why nobody hit this before.
> 
> Looks like this is what I did hit a month ago and
> was not able to find the culprit:
> http://www.spinics.net/lists/linux-rt-users/msg11656.html

Ok, so you get a 2.2ms latency on i.mx28 here, which comes
from exactly the same code location, and is quite alarming
by itself, but much less so than the 4 second latency that
Corey is seeing. This could just be an artifact of i.mx28
not staying idle for as long as vexpress.

> > How exactly do you see this manifest? If it's clear
> > that the trace comes from the idle function, maybe
> > everybody just ignores it?
> 
> That was definitely what I did 

You also commented in that thread about stop_critical_timings()/
start_critical_timings(). Corey, can you look at that, too? I
think it's designed to avoid the issue you are seeing but
for some reason doesn't.

Maybe the problem is that the tracing infrastructure gets confused
when the irq_disable() happens in a different function from
irq_enable().

	Arnd
Corey Minyard May 27, 2014, 6:50 p.m. UTC | #6
On 05/27/2014 11:16 AM, Arnd Bergmann wrote:
> On Tuesday 27 May 2014, Corey Minyard wrote:
>> On 05/26/2014 04:26 AM, Arnd Bergmann wrote:
>>> On Sunday 25 May 2014 14:15:23 minyard@acm.org wrote:
>>>> From: Corey Minyard <cminyard@mvista.com>
>>>>
>>>> The CPU will go to idle with interrupts off, but the interrupts
>>>> will wake the CPU from idle.  This was causing very long irqsoff trace
>>>> values because, basically, the whole idle time was traced with
>>>> irqs off, even though they weren't really off.  Rework the idle
>>>> code to turn hardirq tracing on before calling idle.
>>>>
>>>> Signed-off-by: Corey Minyard <cminyard@mvista.com>
>>>> ---
>>>>  arch/arm/kernel/process.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> I'm not sure this is correct for all ARM boards, but it fixes the
>>>> issue for the Vexpress board I have.  No more 4 second irqsoff
>>>> times.
>>> If this patch is correct for ARM, I wonder if it should be done
>>> in architecture independent code instead. Do you see the same
>>> thing on x86? If not, do you know what's different?
>> MIPS doesn't idle with IRQs disabled.
> Looking at the mips code, I see a couple of variants of the idle function:
>
> * r3081_wait(), r39xx_wait(), and rm7k_wait_irqoff() correctly sleep
>   before enabling the IRQs
>
> * r4k_wait first enables the IRQs and then waits, while a second
>   r4k_wait_irqoff() variant does the opposite, and a comment
>   explains that the first is wrong but the second may be broken
>   on some implementations.
>
> * can't tell what au1k_wait() does, but it also seems to sleep first.

Yes, it seems so.  The system I have uses the r4k variant that has to
enable interrupts first.  That would seem to me to lead to other issues,
but neither my testers nor I have seen any problems.  All the RT
scheduling tests we have come back clean on MIPS.

>
>> x86 calls trace_cpu_idle_rcuidle, but it doesn't re-enable interrupts at
>> the end of the idle routine.  Adding those trace calls to ARM doesn't
>> seem to make any difference, though.
> trace_cpu_idle_rcuidle() is unrelated, I also checked that.
>
> The comment in x86 arch_safe_halt() says "Used in the idle loop; sti
> takes one instruction cycle" and the implementation is "sti; hlt",
> where "sti" enables the interrupts at the same time as "hlt"
> puts the cpu in low-power mode.
>
> I don't understand what happens when ACPI is used, but I would
> assume it also ends up calling mwait or hlt with IRQs off. Can you try
> booting an x86 machine with acpi=off?

I can't get my system to boot with acpi=off.  It crashes setting up the
IOAPIC.  This is current 3.10 stable with the RT patch applied.

But I obviously don't see the issue on x86 with acpi on.

>>> It seems like the right thing to do, I just don't understand
>>> why nobody hit this before.
>> Yeah, I'm a little confused by that, too.  The RT guys use ARM
>> extensively, why haven't they seen this?  Of course, if you run your RT
>> tests under heavy load, you won't see this since you will never go idle.
> How exactly do you see this manifest? If it's clear that the
> trace comes from the idle function, maybe everybody just ignores it?

It is clear that the trace comes from the idle function, but it will
quickly mask any other real irqsoff problem.

-corey
Stephen Boyd May 27, 2014, 6:53 p.m. UTC | #7
On 05/27/14 11:49, Arnd Bergmann wrote:
>
> You also commented in that thread about stop_critical_timings()/
> start_critical_timings(). Corey, can you look at that, too? I
> think it's designed to avoid the issue you are seeing but
> for some reason doesn't.

I sent a patch last week to "solve" this problem. I'm not sure if it's
right but it works for me.

https://lkml.org/lkml/2014/5/19/607
Arnd Bergmann May 27, 2014, 7:27 p.m. UTC | #8
On Tuesday 27 May 2014 11:53:59 Stephen Boyd wrote:
> On 05/27/14 11:49, Arnd Bergmann wrote:
> >
> > You also commented in that thread about stop_critical_timings()/
> > start_critical_timings(). Corey, can you look at that, too? I
> > think it's designed to avoid the issue you are seeing but
> > for some reason doesn't.
> 
> I sent a patch last week to "solve" this problem. I'm not sure if it's
> right but it works for me.
> 
> https://lkml.org/lkml/2014/5/19/607

I think that one was also wrong, as the intention of the existing
stop_critical_timings() function is already to do the same that
Corey's patch does, i.e. stop the trace before we go to idle as
if we were turning IRQs on.

Corey, does it work for you if you replace the new trace_hardirqs_on()
you added with time_hardirqs_on() or stop_critical_timing()?

	Arnd
Stephen Boyd May 27, 2014, 7:33 p.m. UTC | #9
On 05/27/14 12:27, Arnd Bergmann wrote:
> On Tuesday 27 May 2014 11:53:59 Stephen Boyd wrote:
>> On 05/27/14 11:49, Arnd Bergmann wrote:
>>> You also commented in that thread about stop_critical_timings()/
>>> start_critical_timings(). Corey, can you look at that, too? I
>>> think it's designed to avoid the issue you are seeing but
>>> for some reason doesn't.
>> I sent a patch last week to "solve" this problem. I'm not sure if it's
>> right but it works for me.
>>
>> https://lkml.org/lkml/2014/5/19/607
> I think that one was also wrong, as the intention of the existing
> stop_critical_timings() function is already to do the same that
> Corey's patch does, i.e. stop the trace before we go to idle as
> if we were turning IRQs on.

stop_critical_timings() is called in the generic idle loop. It looks
like stop_critical_timings() just isn't written correctly. All it does
is turn off the tracer, but it doesn't prevent future calls to
spinlocks, etc. from causing the tracer to turn on again between calls
to stop/start_critical_timings(). It seems better to prevent any more
tracing from happening between a call to stop_critical_timings() and
start_critical_timings() so we don't have to litter calls to that
function throughout the idle path.
Arnd Bergmann May 27, 2014, 7:39 p.m. UTC | #10
On Tuesday 27 May 2014 12:33:38 Stephen Boyd wrote:
> On 05/27/14 12:27, Arnd Bergmann wrote:
> > On Tuesday 27 May 2014 11:53:59 Stephen Boyd wrote:
> >> On 05/27/14 11:49, Arnd Bergmann wrote:
> >>> You also commented in that thread about stop_critical_timings()/
> >>> start_critical_timings(). Corey, can you look at that, too? I
> >>> think it's designed to avoid the issue you are seeing but
> >>> for some reason doesn't.
> >> I sent a patch last week to "solve" this problem. I'm not sure if it's
> >> right but it works for me.
> >>
> >> https://lkml.org/lkml/2014/5/19/607
> > I think that one was also wrong, as the intention of the existing
> > stop_critical_timings() function is already to do the same that
> > Corey's patch does, i.e. stop the trace before we go to idle as
> > if we were turning IRQs on.
> 
> stop_critical_timings() is called in the generic idle loop. It looks
> like stop_critical_timings() just isn't written correctly. All it does
> is turn off the tracer, but it doesn't prevent future calls to
> spinlocks, etc. from causing the tracer to turn on again between calls
> to stop/start_critical_timings(). It seems better to prevent any more
> tracing from happening between a call to stop_critical_timings() and
> start_critical_timings() so we don't have to litter calls to that
> function throughout the idle path.

But are there any such calls in the idle function? I understand what
you are doing in your patch, but I don't see why you have to actually
do it.

	Arnd
Stephen Boyd May 27, 2014, 8:22 p.m. UTC | #11
On 05/27/14 12:39, Arnd Bergmann wrote:
> On Tuesday 27 May 2014 12:33:38 Stephen Boyd wrote:
>> On 05/27/14 12:27, Arnd Bergmann wrote:
>>> On Tuesday 27 May 2014 11:53:59 Stephen Boyd wrote:
>>>> On 05/27/14 11:49, Arnd Bergmann wrote:
>>>>> You also commented in that thread about stop_critical_timings()/
>>>>> start_critical_timings(). Corey, can you look at that, too? I
>>>>> think it's designed to avoid the issue you are seeing but
>>>>> for some reason doesn't.
>>>> I sent a patch last week to "solve" this problem. I'm not sure if it's
>>>> right but it works for me.
>>>>
>>>> https://lkml.org/lkml/2014/5/19/607
>>> I think that one was also wrong, as the intention of the existing
>>> stop_critical_timings() function is already to do the same that
>>> Corey's patch does, i.e. stop the trace before we go to idle as
>>> if we were turning IRQs on.
>> stop_critical_timings() is called in the generic idle loop. It looks
>> like stop_critical_timings() just isn't written correctly. All it does
>> is turn off the tracer, but it doesn't prevent future calls to
>> spinlocks, etc. from causing the tracer to turn on again between calls
>> to stop/start_critical_timings(). It seems better to prevent any more
>> tracing from happening between a call to stop_critical_timings() and
>> start_critical_timings() so we don't have to litter calls to that
>> function throughout the idle path.
> But are there any such calls in the idle function? I understand what
> you are doing in your patch, but I don't see why you have to actually
> do it.
>

Yes, in my build it happens immediately afterwards, in rcu_idle_enter().  It
would also happen if you had a cpuidle driver that notified the core
that the timers stop during idle (FEAT_C3_STOP). In that case we would
call clockevents_notify() which will call raw_spin_lock_irqsave().
Corey Minyard May 28, 2014, 12:28 a.m. UTC | #12
On 05/27/2014 02:27 PM, Arnd Bergmann wrote:
> On Tuesday 27 May 2014 11:53:59 Stephen Boyd wrote:
>> On 05/27/14 11:49, Arnd Bergmann wrote:
>>> You also commented in that thread about stop_critical_timings()/
>>> start_critical_timings(). Corey, can you look at that, too? I
>>> think it's designed to avoid the issue you are seeing but
>>> for some reason doesn't.
>> I sent a patch last week to "solve" this problem. I'm not sure if it's
>> right but it works for me.
>>
>> https://lkml.org/lkml/2014/5/19/607
> I think that one was also wrong, as the intention of the existing
> stop_critical_timings() function is already to do the same that
> Corey's patch does, i.e. stop the trace before we go to idle as
> if we were turning IRQs on.
>
> Corey, does it work for you if you replace the new trace_hardirqs_on()
> you added with time_hardirqs_on() or stop_critical_timing()?

Well, more information on this.  It turns out that the generic idle loop
calls stop_critical_timings() and start_critical_timings(), so the
arch_cpu_idle() shouldn't have to.

However, the idle loop calls rcu_idle_enter() after it calls
stop_critical_timing(), and that is resetting the critical timing, it
appears.  It's disabling/enabling interrupts in rcu_idle_enter().  If I
switch the order of the rcu_idle and critical timing calls, the issue
goes away.

Stephen's patch does not seem to be necessary for my issue. I tried with
the patch applied, too.  It doesn't seem to hurt, at least.  It did not
fix the problem by itself, though.

-corey
Arnd Bergmann May 28, 2014, 6:46 a.m. UTC | #13
On Tuesday 27 May 2014 19:28:06 Corey Minyard wrote:
> On 05/27/2014 02:27 PM, Arnd Bergmann wrote:
> > On Tuesday 27 May 2014 11:53:59 Stephen Boyd wrote:
> >> On 05/27/14 11:49, Arnd Bergmann wrote:
> >>> You also commented in that thread about stop_critical_timings()/
> >>> start_critical_timings(). Corey, can you look at that, too? I
> >>> think it's designed to avoid the issue you are seeing but
> >>> for some reason doesn't.
> >> I sent a patch last week to "solve" this problem. I'm not sure if it's
> >> right but it works for me.
> >>
> >> https://lkml.org/lkml/2014/5/19/607
> > I think that one was also wrong, as the intention of the existing
> > stop_critical_timings() function is already to do the same that
> > Corey's patch does, i.e. stop the trace before we go to idle as
> > if we were turning IRQs on.
> >
> > Corey, does it work for you if you replace the new trace_hardirqs_on()
> > you added with time_hardirqs_on() or stop_critical_timing()?
> 
> Well, more information on this.  It turns out that the generic idle loop
> calls stop_critical_timings() and start_critical_timings(), so the
> arch_cpu_idle() shouldn't have to.
> 
> However, the idle loop calls rcu_idle_enter() after it calls
> stop_critical_timing(), and that is resetting the critical timing, it
> appears.  It's disabling/enabling interrupts in rcu_idle_enter().  If I
> switch the order of the rcu_idle and critical timing calls, the issue
> goes away.
> 
> Stephen's patch does not seem to be necessary for my issue. I tried with
> the patch applied, too.  It doesn't seem to hurt, at least.  It did not
> fix the problem by itself, though.

Interesting; it looked like the "big hammer" solution to me (compared to
yours), one that should deal with the problem in any case.

	Arnd

Patch

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 1ca6cf1..92413af 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -133,11 +133,12 @@  void (*arm_pm_idle)(void);
 
 static void default_idle(void)
 {
+	trace_hardirqs_on();
 	if (arm_pm_idle)
 		arm_pm_idle();
 	else
 		cpu_do_idle();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_prepare(void)