Message ID | alpine.LFD.2.02.1302221212500.22263@ionos (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Feb 22, 2013 at 12:07:30PM +0000, Thomas Gleixner wrote: > On Fri, 22 Feb 2013, Santosh Shilimkar wrote: > > > On Friday 22 February 2013 04:01 PM, Lorenzo Pieralisi wrote: > > > On Fri, Feb 22, 2013 at 10:24:00AM +0000, Thomas Gleixner wrote: > > > > On Fri, 22 Feb 2013, Santosh Shilimkar wrote: > > > > > BTW, Lorenzo off-list mentioned to me about warning in boot-up > > > > > which I missed while testing your patch. It will take bit more > > > > > time for me to look into it and hence thought of reporting it. > > > > > > > > > > [ 2.186126] ------------[ cut here ]------------ > > > > > [ 2.190979] WARNING: at kernel/time/tick-broadcast.c:501 > > > > > tick_broadcast_oneshot_control+0x1c0/0x21c() > > > > > > > > Which one is that? tick_broadcast_pending or tick_force_broadcast_mask ? > > > > > > It is the tick_force_broadcast_mask and I think that's because on all > > > systems we are testing, the broadcast timer IRQ is a thundering herd, > > > all CPUs get out of idle at once and try to get out of broadcast mode > > > at more or less the same time. > > > > > So the issue comes ups only when the idle state used where CPU wakeup > > more or less at same time as Lorenzo mentioned. I have two platforms > > where I could test the patch and see the issue only with one platform. > > > > Yesterday I didn't notice the warning because it wasn't seen on that > > platform :-) OMAP4 idle entry and exit in deep state is staggered > > between CPUs and hence the warning isn't seen. On OMAP5 though, > > there is an additional C-state where idle entry/exit for CPU > > isn't staggered and I see the issue in that case. > > > > Actually the broad-cast code doesn't expect such a behavior > > from CPUs since only the broad-cast affine CPU should wake > > up and rest of the CPU should be woken up by the broad-cast > > IPIs. > > That's what I feared. We might have the same issue on x86, depending > on the cpu model. > > But thinking more about it. It's actually not a real problem, just > pointless burning of cpu cycles. > > So on the CPU which gets woken along with the target CPU of the > broadcast the following happens: > > deep_idle() > <-- spurious wakeup > broadcast_exit() > set forced bit > > enable interrupts > > <-- Nothing happens > > disable interrupts > > broadcast_enter() > <-- Here we observe the forced bit is set > deep_idle() > > Now after that the target CPU of the broadcast runs the broadcast > handler and finds the other CPU in both the broadcast and the forced > mask, sends the IPI and stuff gets back to normal. > > So it's not actually harmful, just more evidence for the theory, that > hardware designers have access to very special drug supplies. > > Now we could make use of that and avoid going deep idle just to come > back right away via the IPI. Unfortunately the notification thingy has > no return value, but we can fix that. > > To confirm that theory, could you please try the hack below and add > some instrumentation (trace_printk)? Applied, and it looks like that's exactly why the warning triggers, at least on the platform I am testing on which is a dual-cluster ARM testchip. There is a still time window though where the CPU (the IPI target) can get back to idle (tick_broadcast_pending still not set) before the CPU target of the broadcast has a chance to run tick_handle_oneshot_broadcast (and set tick_broadcast_pending), or am I missing something ? It is a corner case, granted. Best thing would be to check pending IRQs in the idle driver back-end (or have always-on local timers :-)). Thanks, Lorenzo
On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote: > On Fri, Feb 22, 2013 at 12:07:30PM +0000, Thomas Gleixner wrote: > > Now we could make use of that and avoid going deep idle just to come > > back right away via the IPI. Unfortunately the notification thingy has > > no return value, but we can fix that. > > > > To confirm that theory, could you please try the hack below and add > > some instrumentation (trace_printk)? > > Applied, and it looks like that's exactly why the warning triggers, at least > on the platform I am testing on which is a dual-cluster ARM testchip. > > There is a still time window though where the CPU (the IPI target) can get > back to idle (tick_broadcast_pending still not set) before the CPU target of > the broadcast has a chance to run tick_handle_oneshot_broadcast (and set > tick_broadcast_pending), or am I missing something ? Well, the tick_broadcast_pending bit is uninteresting if the force_broadcast bit is set. Because if that bit is set we know for sure, that we got woken with the cpu which gets the broadcast timer and raced back to idle before the broadcast handler managed to send the IPI. If we did not get woken before the broadcast IPI then the pending bit is set when we exit the broadcast mode. > It is a corner case, granted. Best thing would be to check pending IRQs in the > idle driver back-end (or have always-on local timers :-)). The latter is definitely the only sane solution. Thanks, tglx
On Fri, Feb 22, 2013 at 03:03:02PM +0000, Thomas Gleixner wrote: > On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote: > > On Fri, Feb 22, 2013 at 12:07:30PM +0000, Thomas Gleixner wrote: > > > Now we could make use of that and avoid going deep idle just to come > > > back right away via the IPI. Unfortunately the notification thingy has > > > no return value, but we can fix that. > > > > > > To confirm that theory, could you please try the hack below and add > > > some instrumentation (trace_printk)? > > > > Applied, and it looks like that's exactly why the warning triggers, at least > > on the platform I am testing on which is a dual-cluster ARM testchip. > > > > There is a still time window though where the CPU (the IPI target) can get > > back to idle (tick_broadcast_pending still not set) before the CPU target of > > the broadcast has a chance to run tick_handle_oneshot_broadcast (and set > > tick_broadcast_pending), or am I missing something ? > > Well, the tick_broadcast_pending bit is uninteresting if the > force_broadcast bit is set. Because if that bit is set we know for > sure, that we got woken with the cpu which gets the broadcast timer > and raced back to idle before the broadcast handler managed to > send the IPI. Gah, my bad sorry, I mixed things up. I thought tick_check_broadcast_pending() was checking against the tick_broadcast_pending mask not tick_force_broadcast_mask as it correctly does. All clear now. Thanks a lot, Lorenzo
On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote: > On Fri, Feb 22, 2013 at 03:03:02PM +0000, Thomas Gleixner wrote: > > On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote: > > > On Fri, Feb 22, 2013 at 12:07:30PM +0000, Thomas Gleixner wrote: > > > > Now we could make use of that and avoid going deep idle just to come > > > > back right away via the IPI. Unfortunately the notification thingy has > > > > no return value, but we can fix that. > > > > > > > > To confirm that theory, could you please try the hack below and add > > > > some instrumentation (trace_printk)? > > > > > > Applied, and it looks like that's exactly why the warning triggers, at least > > > on the platform I am testing on which is a dual-cluster ARM testchip. > > > > > > There is a still time window though where the CPU (the IPI target) can get > > > back to idle (tick_broadcast_pending still not set) before the CPU target of > > > the broadcast has a chance to run tick_handle_oneshot_broadcast (and set > > > tick_broadcast_pending), or am I missing something ? > > > > Well, the tick_broadcast_pending bit is uninteresting if the > > force_broadcast bit is set. Because if that bit is set we know for > > sure, that we got woken with the cpu which gets the broadcast timer > > and raced back to idle before the broadcast handler managed to > > send the IPI. > > Gah, my bad sorry, I mixed things up. I thought > > tick_check_broadcast_pending() > > was checking against the tick_broadcast_pending mask not > > tick_force_broadcast_mask Yep, that's a misnomer. I just wanted to make sure that my theory is correct. I need to think about the real solution some more. We have two alternatives: 1) Make the clockevents_notify function have a return value. 2) Add something like the hack I gave you with a proper name. The latter has the beauty, that we just need to modify the platform independent idle code instead of going down to every callsite of the clockevents_notify thing. Thanks, tglx
On Saturday 23 February 2013 12:22 AM, Thomas Gleixner wrote: > On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote: >> On Fri, Feb 22, 2013 at 03:03:02PM +0000, Thomas Gleixner wrote: >>> On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote: >>>> On Fri, Feb 22, 2013 at 12:07:30PM +0000, Thomas Gleixner wrote: >>>>> Now we could make use of that and avoid going deep idle just to come >>>>> back right away via the IPI. Unfortunately the notification thingy has >>>>> no return value, but we can fix that. >>>>> >>>>> To confirm that theory, could you please try the hack below and add >>>>> some instrumentation (trace_printk)? >>>> >>>> Applied, and it looks like that's exactly why the warning triggers, at least >>>> on the platform I am testing on which is a dual-cluster ARM testchip. >>>> I too confirm that the warnings cause is same. >>>> There is a still time window though where the CPU (the IPI target) can get >>>> back to idle (tick_broadcast_pending still not set) before the CPU target of >>>> the broadcast has a chance to run tick_handle_oneshot_broadcast (and set >>>> tick_broadcast_pending), or am I missing something ? >>> >>> Well, the tick_broadcast_pending bit is uninteresting if the >>> force_broadcast bit is set. Because if that bit is set we know for >>> sure, that we got woken with the cpu which gets the broadcast timer >>> and raced back to idle before the broadcast handler managed to >>> send the IPI. >> >> Gah, my bad sorry, I mixed things up. I thought >> >> tick_check_broadcast_pending() >> >> was checking against the tick_broadcast_pending mask not >> >> tick_force_broadcast_mask > > Yep, that's a misnomer. I just wanted to make sure that my theory is > correct. I need to think about the real solution some more. > > We have two alternatives: > > 1) Make the clockevents_notify function have a return value. > > 2) Add something like the hack I gave you with a proper name. > > The latter has the beauty, that we just need to modify the platform > independent idle code instead of going down to every callsite of the > clockevents_notify thing. > I agree that 2 is better alternative to avoid multiple changes. Whichever alternative you choose, will be happy to test it :) Regards, Santosh
Thomas, 2013/2/23 Thomas Gleixner <tglx@linutronix.de>: > On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote: >> On Fri, Feb 22, 2013 at 03:03:02PM +0000, Thomas Gleixner wrote: >> > On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote: >> > > On Fri, Feb 22, 2013 at 12:07:30PM +0000, Thomas Gleixner wrote: >> > > > Now we could make use of that and avoid going deep idle just to come >> > > > back right away via the IPI. Unfortunately the notification thingy has >> > > > no return value, but we can fix that. >> > > > >> > > > To confirm that theory, could you please try the hack below and add >> > > > some instrumentation (trace_printk)? >> > > >> > > Applied, and it looks like that's exactly why the warning triggers, at least >> > > on the platform I am testing on which is a dual-cluster ARM testchip. >> > > >> > > There is a still time window though where the CPU (the IPI target) can get >> > > back to idle (tick_broadcast_pending still not set) before the CPU target of >> > > the broadcast has a chance to run tick_handle_oneshot_broadcast (and set >> > > tick_broadcast_pending), or am I missing something ? >> > >> > Well, the tick_broadcast_pending bit is uninteresting if the >> > force_broadcast bit is set. Because if that bit is set we know for >> > sure, that we got woken with the cpu which gets the broadcast timer >> > and raced back to idle before the broadcast handler managed to >> > send the IPI. >> >> Gah, my bad sorry, I mixed things up. I thought >> >> tick_check_broadcast_pending() >> >> was checking against the tick_broadcast_pending mask not >> >> tick_force_broadcast_mask > > Yep, that's a misnomer. I just wanted to make sure that my theory is > correct. I need to think about the real solution some more. I have applied your patch and tested, there is NO warning at all then. I think your theory is correct. > > We have two alternatives: > > 1) Make the clockevents_notify function have a return value. > > 2) Add something like the hack I gave you with a proper name. > > The latter has the beauty, that we just need to modify the platform > independent idle code instead of going down to every callsite of the > clockevents_notify thing. I prefer the solution 2). > > Thanks, > > tglx
On Fri, Feb 22, 2013 at 06:52:14PM +0000, Thomas Gleixner wrote: > On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote: > > On Fri, Feb 22, 2013 at 03:03:02PM +0000, Thomas Gleixner wrote: > > > On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote: > > > > On Fri, Feb 22, 2013 at 12:07:30PM +0000, Thomas Gleixner wrote: > > > > > Now we could make use of that and avoid going deep idle just to come > > > > > back right away via the IPI. Unfortunately the notification thingy has > > > > > no return value, but we can fix that. > > > > > > > > > > To confirm that theory, could you please try the hack below and add > > > > > some instrumentation (trace_printk)? > > > > > > > > Applied, and it looks like that's exactly why the warning triggers, at least > > > > on the platform I am testing on which is a dual-cluster ARM testchip. > > > > > > > > There is a still time window though where the CPU (the IPI target) can get > > > > back to idle (tick_broadcast_pending still not set) before the CPU target of > > > > the broadcast has a chance to run tick_handle_oneshot_broadcast (and set > > > > tick_broadcast_pending), or am I missing something ? > > > > > > Well, the tick_broadcast_pending bit is uninteresting if the > > > force_broadcast bit is set. Because if that bit is set we know for > > > sure, that we got woken with the cpu which gets the broadcast timer > > > and raced back to idle before the broadcast handler managed to > > > send the IPI. > > > > Gah, my bad sorry, I mixed things up. I thought > > > > tick_check_broadcast_pending() > > > > was checking against the tick_broadcast_pending mask not > > > > tick_force_broadcast_mask > > Yep, that's a misnomer. I just wanted to make sure that my theory is > correct. I need to think about the real solution some more. > > We have two alternatives: > > 1) Make the clockevents_notify function have a return value. > > 2) Add something like the hack I gave you with a proper name. > > The latter has the beauty, that we just need to modify the platform > independent idle code instead of going down to every callsite of the > clockevents_notify thing. Thank you. I am not sure (1) would buy us anything compared to (2) and as you said we would end up patching all callsites so (2) is preferred. As I mentioned, we can even just apply your fixes and leave platform specific code deal with this optimization, at the end of the day idle driver has just to check pending IRQs/wake-up sources (which would cover all IRQs not just TIMER IPI) if and when it has to start time consuming operations like cache cleaning to enter deep idle. If it goes into a shallow C-state so be it. On x86 I think it is HW/FW that prevents C-state entering if IRQs are pending, and on ARM it is likely to happen too, so I am just saying you should not bother if you think the code becomes too hairy to justify this change. Thank you very much for the fixes and your help, Lorenzo
Index: linux-2.6/arch/arm/kernel/process.c =================================================================== --- linux-2.6.orig/arch/arm/kernel/process.c +++ linux-2.6/arch/arm/kernel/process.c @@ -199,7 +199,7 @@ void cpu_idle(void) #ifdef CONFIG_PL310_ERRATA_769419 wmb(); #endif - if (hlt_counter) { + if (hlt_counter || tick_check_broadcast_pending()) { local_irq_enable(); cpu_relax(); } else if (!need_resched()) { Index: linux-2.6/include/linux/clockchips.h =================================================================== --- linux-2.6.orig/include/linux/clockchips.h +++ linux-2.6/include/linux/clockchips.h @@ -170,6 +170,12 @@ extern void tick_broadcast(const struct extern int tick_receive_broadcast(void); #endif +#if defined(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST) && defined(CONFIG_TICK_ONESHOT) +extern int tick_check_broadcast_pending(void); +#else +static inline int tick_check_broadcast_pending(void) { return 0; } +#endif + #ifdef CONFIG_GENERIC_CLOCKEVENTS extern void clockevents_notify(unsigned long reason, void *arg); #else Index: linux-2.6/kernel/time/tick-broadcast.c =================================================================== --- linux-2.6.orig/kernel/time/tick-broadcast.c +++ linux-2.6/kernel/time/tick-broadcast.c @@ -418,6 +418,15 @@ static int tick_broadcast_set_event(ktim return clockevents_program_event(bc, expires, force); } +/* + * Called before going idle with interrupts disabled. Checks whether a + * broadcast event from the other core is about to happen. + */ +int tick_check_broadcast_pending(void) +{ + return test_bit(smp_processor_id(), tick_force_broadcast_mask); +} + int tick_resume_broadcast_oneshot(struct clock_event_device *bc) { clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);