Message ID | 1399991966-14582-1-git-send-email-santosh.shilimkar@ti.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
* Santosh Shilimkar <santosh.shilimkar@ti.com> [140513 07:40]: > On OMAP4 panda board, there have been several bug reports about boot > hang and lock-ups with CPU_IDLE enabled. The root cause of the issue > is missing interrupts while in idle state. Commit cb7094e8 {cpuidle / omap4 : > use CPUIDLE_FLAG_TIMER_STOP flag} moved the broadcast notifiers to common > code for right reasons but on OMAP4 which suffers from a nasty ROM code > bug with GIC, commit ff999b8a {ARM: OMAP4460: Workaround for ROM bug ..}, > we loose interrupts which leads to issues like lock-up, hangs etc. > > Patch reverts commit cb7094 {cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP > flag} to avoid the issue. With this change, OMAP4 panda boards, the mentioned > issues are getting fixed. We no longer loose interrupts which was the cause > of the regression. Thanks for figuring this regression out. Applying into fixes with Cc stable. Regards, Tony
On 05/13/2014 04:39 PM, Santosh Shilimkar wrote: > On OMAP4 panda board, there have been several bug reports about boot > hang and lock-ups with CPU_IDLE enabled. The root cause of the issue > is missing interrupts while in idle state. Commit cb7094e8 {cpuidle / omap4 : > use CPUIDLE_FLAG_TIMER_STOP flag} moved the broadcast notifiers to common > code for right reasons but on OMAP4 which suffers from a nasty ROM code > bug with GIC, commit ff999b8a {ARM: OMAP4460: Workaround for ROM bug ..}, > we loose interrupts which leads to issues like lock-up, hangs etc. > > Patch reverts commit cb7094 {cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP > flag} to avoid the issue. With this change, OMAP4 panda boards, the mentioned > issues are getting fixed. We no longer loose interrupts which was the cause > of the regression. > > Cc: Roger Quadros <rogerq@ti.com> > Cc: Kevin Hilman <khilman@linaro.org> > Cc: Tony Lindgren <tony@atomide.com> > Cc: Daniel Lezcano <daniel.lezcano@linaro.org> > Reported-tested-by: Roger Quadros <rogerq@ti.com> > Reported-tested-by: Kevin Hilman <khilman@linaro.org> > Tested-by: Tony Lindgren <tony@atomide.com> > Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> > --- > arch/arm/mach-omap2/cpuidle44xx.c | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c > index 01fc710..3e169d9 100644 > --- a/arch/arm/mach-omap2/cpuidle44xx.c > +++ b/arch/arm/mach-omap2/cpuidle44xx.c > @@ -14,6 +14,7 @@ > #include <linux/cpuidle.h> > #include <linux/cpu_pm.h> > #include <linux/export.h> > +#include <linux/clockchips.h> > > #include <asm/cpuidle.h> > #include <asm/proc-fns.h> > @@ -83,6 +84,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, > { > struct idle_statedata *cx = state_ptr + index; > u32 mpuss_can_lose_context = 0; > + int cpu_id = smp_processor_id(); > > /* > * CPU0 has to wait and stay ON until CPU1 is OFF state. > @@ -110,6 +112,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, > mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) && > (cx->mpu_logic_state == PWRDM_POWER_OFF); > > + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); > + > /* > * Call idle CPU PM enter notifier chain so that > * VFP and per CPU interrupt context is saved. > @@ -165,6 +169,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, > if (dev->cpu == 0 && mpuss_can_lose_context) > cpu_cluster_pm_exit(); > > + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); > + > fail: > cpuidle_coupled_parallel_barrier(dev, &abort_barrier); > cpu_done[dev->cpu] = false; > @@ -189,8 +195,7 @@ static struct cpuidle_driver omap4_idle_driver = { > /* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */ > .exit_latency = 328 + 440, > .target_residency = 960, > - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | > - CPUIDLE_FLAG_TIMER_STOP, > + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, > .enter = omap_enter_idle_coupled, > .name = "C2", > .desc = "CPUx OFF, MPUSS CSWR", > @@ -199,8 +204,7 @@ static struct cpuidle_driver omap4_idle_driver = { > /* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */ > .exit_latency = 460 + 518, > .target_residency = 1100, > - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | > - CPUIDLE_FLAG_TIMER_STOP, > + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, > .enter = omap_enter_idle_coupled, > .name = "C3", > .desc = "CPUx OFF, MPUSS OSWR", Shouldn't the broadcast timer to be setup with CLOCK_EVT_NOTIFY_BROADCAST_ON also ?
On Wednesday 14 May 2014 03:44 PM, Daniel Lezcano wrote: > On 05/13/2014 04:39 PM, Santosh Shilimkar wrote: >> On OMAP4 panda board, there have been several bug reports about boot >> hang and lock-ups with CPU_IDLE enabled. The root cause of the issue >> is missing interrupts while in idle state. Commit cb7094e8 {cpuidle / omap4 : >> use CPUIDLE_FLAG_TIMER_STOP flag} moved the broadcast notifiers to common >> code for right reasons but on OMAP4 which suffers from a nasty ROM code >> bug with GIC, commit ff999b8a {ARM: OMAP4460: Workaround for ROM bug ..}, >> we loose interrupts which leads to issues like lock-up, hangs etc. >> >> Patch reverts commit cb7094 {cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP >> flag} to avoid the issue. With this change, OMAP4 panda boards, the mentioned >> issues are getting fixed. We no longer loose interrupts which was the cause >> of the regression. >> >> Cc: Roger Quadros <rogerq@ti.com> >> Cc: Kevin Hilman <khilman@linaro.org> >> Cc: Tony Lindgren <tony@atomide.com> >> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> >> Reported-tested-by: Roger Quadros <rogerq@ti.com> >> Reported-tested-by: Kevin Hilman <khilman@linaro.org> >> Tested-by: Tony Lindgren <tony@atomide.com> >> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> >> --- >> arch/arm/mach-omap2/cpuidle44xx.c | 12 ++++++++---- >> 1 file changed, 8 insertions(+), 4 deletions(-) >> >> diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c >> index 01fc710..3e169d9 100644 >> --- a/arch/arm/mach-omap2/cpuidle44xx.c >> +++ b/arch/arm/mach-omap2/cpuidle44xx.c >> @@ -14,6 +14,7 @@ >> #include <linux/cpuidle.h> >> #include <linux/cpu_pm.h> >> #include <linux/export.h> >> +#include <linux/clockchips.h> >> >> #include <asm/cpuidle.h> >> #include <asm/proc-fns.h> >> @@ -83,6 +84,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >> { >> struct idle_statedata *cx = state_ptr + index; >> u32 mpuss_can_lose_context = 0; >> + int cpu_id = smp_processor_id(); >> >> /* >> * CPU0 has to wait and stay ON until CPU1 is OFF state. >> @@ -110,6 +112,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >> mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) && >> (cx->mpu_logic_state == PWRDM_POWER_OFF); >> >> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); >> + >> /* >> * Call idle CPU PM enter notifier chain so that >> * VFP and per CPU interrupt context is saved. >> @@ -165,6 +169,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >> if (dev->cpu == 0 && mpuss_can_lose_context) >> cpu_cluster_pm_exit(); >> >> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); >> + >> fail: >> cpuidle_coupled_parallel_barrier(dev, &abort_barrier); >> cpu_done[dev->cpu] = false; >> @@ -189,8 +195,7 @@ static struct cpuidle_driver omap4_idle_driver = { >> /* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */ >> .exit_latency = 328 + 440, >> .target_residency = 960, >> - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | >> - CPUIDLE_FLAG_TIMER_STOP, >> + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, >> .enter = omap_enter_idle_coupled, >> .name = "C2", >> .desc = "CPUx OFF, MPUSS CSWR", >> @@ -199,8 +204,7 @@ static struct cpuidle_driver omap4_idle_driver = { >> /* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */ >> .exit_latency = 460 + 518, >> .target_residency = 1100, >> - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | >> - CPUIDLE_FLAG_TIMER_STOP, >> + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, >> .enter = omap_enter_idle_coupled, >> .name = "C3", >> .desc = "CPUx OFF, MPUSS OSWR", > > Shouldn't the broadcast timer to be setup with CLOCK_EVT_NOTIFY_BROADCAST_ON also ? > Which is already taken care by __cpuidle_register_driver(). Note that setup code is still used from generic code...
On 05/14/2014 09:50 PM, Santosh Shilimkar wrote: > On Wednesday 14 May 2014 03:44 PM, Daniel Lezcano wrote: >> On 05/13/2014 04:39 PM, Santosh Shilimkar wrote: >>> On OMAP4 panda board, there have been several bug reports about boot >>> hang and lock-ups with CPU_IDLE enabled. The root cause of the issue >>> is missing interrupts while in idle state. Commit cb7094e8 {cpuidle / omap4 : >>> use CPUIDLE_FLAG_TIMER_STOP flag} moved the broadcast notifiers to common >>> code for right reasons but on OMAP4 which suffers from a nasty ROM code >>> bug with GIC, commit ff999b8a {ARM: OMAP4460: Workaround for ROM bug ..}, >>> we loose interrupts which leads to issues like lock-up, hangs etc. >>> >>> Patch reverts commit cb7094 {cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP >>> flag} to avoid the issue. With this change, OMAP4 panda boards, the mentioned >>> issues are getting fixed. We no longer loose interrupts which was the cause >>> of the regression. >>> >>> Cc: Roger Quadros <rogerq@ti.com> >>> Cc: Kevin Hilman <khilman@linaro.org> >>> Cc: Tony Lindgren <tony@atomide.com> >>> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> >>> Reported-tested-by: Roger Quadros <rogerq@ti.com> >>> Reported-tested-by: Kevin Hilman <khilman@linaro.org> >>> Tested-by: Tony Lindgren <tony@atomide.com> >>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> >>> --- [ ... ] >>> >>> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); >>> + >>> /* >>> * Call idle CPU PM enter notifier chain so that >>> * VFP and per CPU interrupt context is saved. >>> @@ -165,6 +169,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >>> if (dev->cpu == 0 && mpuss_can_lose_context) >>> cpu_cluster_pm_exit(); >>> >>> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); [ ... ] >> >> Shouldn't the broadcast timer to be setup with CLOCK_EVT_NOTIFY_BROADCAST_ON also ? >> > Which is already taken care by __cpuidle_register_driver(). Note that setup code > is still used from generic code... Nope, if the flag CPUIDLE_FLAG_TIMER_STOP is not set, the cpuidle framework won't setup the timer. static void __cpuidle_driver_init(struct cpuidle_driver *drv) { ... for (i = drv->state_count - 1; i >= 0 ; i--) { if (drv->states[i].flags & CPUIDLE_FLAG_TIMER_STOP) { drv->bctimer = 1; break; } } ... } static int __cpuidle_register_driver(struct cpuidle_driver *drv) { ... if (drv->bctimer) on_each_cpu_mask(drv->cpumask, cpuidle_setup_broadcast_timer, (void *)CLOCK_EVT_NOTIFY_BROADCAST_ON, 1); ... } So the broadcast timer does not operate with this revert. Moreover, I am not sure reverting this patch is the right solution.
* Daniel Lezcano <daniel.lezcano@linaro.org> [140514 13:02]: > On 05/14/2014 09:50 PM, Santosh Shilimkar wrote: > >On Wednesday 14 May 2014 03:44 PM, Daniel Lezcano wrote: > >>On 05/13/2014 04:39 PM, Santosh Shilimkar wrote: > >>>On OMAP4 panda board, there have been several bug reports about boot > >>>hang and lock-ups with CPU_IDLE enabled. The root cause of the issue > >>>is missing interrupts while in idle state. Commit cb7094e8 {cpuidle / omap4 : > >>>use CPUIDLE_FLAG_TIMER_STOP flag} moved the broadcast notifiers to common > >>>code for right reasons but on OMAP4 which suffers from a nasty ROM code > >>>bug with GIC, commit ff999b8a {ARM: OMAP4460: Workaround for ROM bug ..}, > >>>we loose interrupts which leads to issues like lock-up, hangs etc. > >>> > >>>Patch reverts commit cb7094 {cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP > >>>flag} to avoid the issue. With this change, OMAP4 panda boards, the mentioned > >>>issues are getting fixed. We no longer loose interrupts which was the cause > >>>of the regression. > >>> > >>>Cc: Roger Quadros <rogerq@ti.com> > >>>Cc: Kevin Hilman <khilman@linaro.org> > >>>Cc: Tony Lindgren <tony@atomide.com> > >>>Cc: Daniel Lezcano <daniel.lezcano@linaro.org> > >>>Reported-tested-by: Roger Quadros <rogerq@ti.com> > >>>Reported-tested-by: Kevin Hilman <khilman@linaro.org> > >>>Tested-by: Tony Lindgren <tony@atomide.com> > >>>Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> > >>>--- > > [ ... ] > > >>> > >>>+ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); > >>>+ > >>> /* > >>> * Call idle CPU PM enter notifier chain so that > >>> * VFP and per CPU interrupt context is saved. > >>>@@ -165,6 +169,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, > >>> if (dev->cpu == 0 && mpuss_can_lose_context) > >>> cpu_cluster_pm_exit(); > >>> > >>>+ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); > > [ ... ] > > >> > >>Shouldn't the broadcast timer to be setup with CLOCK_EVT_NOTIFY_BROADCAST_ON also ? > >> > >Which is already taken care by __cpuidle_register_driver(). Note that setup code > >is still used from generic code... > > Nope, if the flag CPUIDLE_FLAG_TIMER_STOP is not set, the cpuidle framework > won't setup the timer. > > static void __cpuidle_driver_init(struct cpuidle_driver *drv) > { > > ... > > for (i = drv->state_count - 1; i >= 0 ; i--) { > if (drv->states[i].flags & CPUIDLE_FLAG_TIMER_STOP) { > drv->bctimer = 1; > break; > } > } > > ... > > } > > static int __cpuidle_register_driver(struct cpuidle_driver *drv) > { > ... > > if (drv->bctimer) > on_each_cpu_mask(drv->cpumask, > cpuidle_setup_broadcast_timer, > (void *)CLOCK_EVT_NOTIFY_BROADCAST_ON, 1); > > ... > } > > So the broadcast timer does not operate with this revert. Moreover, I am not > sure reverting this patch is the right solution. OK I'll hold on sending anything until there's some conclusion. Are you able to reproduce the problem with current Linux next? Regards, Tony
On Wednesday 14 May 2014 04:02 PM, Daniel Lezcano wrote: > On 05/14/2014 09:50 PM, Santosh Shilimkar wrote: >> On Wednesday 14 May 2014 03:44 PM, Daniel Lezcano wrote: >>> On 05/13/2014 04:39 PM, Santosh Shilimkar wrote: >>>> On OMAP4 panda board, there have been several bug reports about boot >>>> hang and lock-ups with CPU_IDLE enabled. The root cause of the issue >>>> is missing interrupts while in idle state. Commit cb7094e8 {cpuidle / omap4 : >>>> use CPUIDLE_FLAG_TIMER_STOP flag} moved the broadcast notifiers to common >>>> code for right reasons but on OMAP4 which suffers from a nasty ROM code >>>> bug with GIC, commit ff999b8a {ARM: OMAP4460: Workaround for ROM bug ..}, >>>> we loose interrupts which leads to issues like lock-up, hangs etc. >>>> >>>> Patch reverts commit cb7094 {cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP >>>> flag} to avoid the issue. With this change, OMAP4 panda boards, the mentioned >>>> issues are getting fixed. We no longer loose interrupts which was the cause >>>> of the regression. >>>> >>>> Cc: Roger Quadros <rogerq@ti.com> >>>> Cc: Kevin Hilman <khilman@linaro.org> >>>> Cc: Tony Lindgren <tony@atomide.com> >>>> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> >>>> Reported-tested-by: Roger Quadros <rogerq@ti.com> >>>> Reported-tested-by: Kevin Hilman <khilman@linaro.org> >>>> Tested-by: Tony Lindgren <tony@atomide.com> >>>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> >>>> --- > > [ ... ] > >>>> >>>> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); >>>> + >>>> /* >>>> * Call idle CPU PM enter notifier chain so that >>>> * VFP and per CPU interrupt context is saved. >>>> @@ -165,6 +169,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >>>> if (dev->cpu == 0 && mpuss_can_lose_context) >>>> cpu_cluster_pm_exit(); >>>> >>>> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); > > [ ... ] > >>> >>> Shouldn't the broadcast timer to be setup with CLOCK_EVT_NOTIFY_BROADCAST_ON also ? >>> >> Which is already taken care by __cpuidle_register_driver(). Note that setup code >> is still used from generic code... > > Nope, if the flag CPUIDLE_FLAG_TIMER_STOP is not set, the cpuidle framework won't setup the timer. > I see. I assumed it was taken care. So we might have setup the timer too. > static void __cpuidle_driver_init(struct cpuidle_driver *drv) > { > > ... > > for (i = drv->state_count - 1; i >= 0 ; i--) { > if (drv->states[i].flags & CPUIDLE_FLAG_TIMER_STOP) { May be you should start the bc timer in case 'CPUIDLE_FLAG_TIMER_STOP' or 'CPUIDLE_FLAG_COUPLED' > drv->bctimer = 1; > break; > } > } > > ... > > } > > static int __cpuidle_register_driver(struct cpuidle_driver *drv) > { > ... > > if (drv->bctimer) > on_each_cpu_mask(drv->cpumask, > cpuidle_setup_broadcast_timer, > (void *)CLOCK_EVT_NOTIFY_BROADCAST_ON, 1); > > ... > } > > So the broadcast timer does not operate with this revert. Moreover, I am not sure reverting this patch is the right solution. > With above mentioned change, it should work. Other alternatives is OMAP4 driver does its won registration where it can start the timer. The way it was before the consolidation. Ofcourse if you have better fix, then great.
* Tony Lindgren <tony@atomide.com> [140514 13:57]: > * Daniel Lezcano <daniel.lezcano@linaro.org> [140514 13:02]: > > > > So the broadcast timer does not operate with this revert. Moreover, I am not > > sure reverting this patch is the right solution. > > OK I'll hold on sending anything until there's some conclusion. > > Are you able to reproduce the problem with current Linux next? BTW, I'm also noticing now hangs on omap3 with my PM patches on v3.15-rc4 that seem similar to the panda cpuidle hang. The hangs on omap3 do not happen on my v3.14 based branch, so I'm wondering if there are some recent cpuidle regressions too in play? Regards, Tony
Daniel, On Wednesday 14 May 2014 05:18 PM, Santosh Shilimkar wrote: > On Wednesday 14 May 2014 04:02 PM, Daniel Lezcano wrote: >> On 05/14/2014 09:50 PM, Santosh Shilimkar wrote: >>> On Wednesday 14 May 2014 03:44 PM, Daniel Lezcano wrote: >>>> On 05/13/2014 04:39 PM, Santosh Shilimkar wrote: >>>>> On OMAP4 panda board, there have been several bug reports about boot >>>>> hang and lock-ups with CPU_IDLE enabled. The root cause of the issue >>>>> is missing interrupts while in idle state. Commit cb7094e8 {cpuidle / omap4 : >>>>> use CPUIDLE_FLAG_TIMER_STOP flag} moved the broadcast notifiers to common >>>>> code for right reasons but on OMAP4 which suffers from a nasty ROM code >>>>> bug with GIC, commit ff999b8a {ARM: OMAP4460: Workaround for ROM bug ..}, >>>>> we loose interrupts which leads to issues like lock-up, hangs etc. >>>>> >>>>> Patch reverts commit cb7094 {cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP >>>>> flag} to avoid the issue. With this change, OMAP4 panda boards, the mentioned >>>>> issues are getting fixed. We no longer loose interrupts which was the cause >>>>> of the regression. >>>>> >>>>> Cc: Roger Quadros <rogerq@ti.com> >>>>> Cc: Kevin Hilman <khilman@linaro.org> >>>>> Cc: Tony Lindgren <tony@atomide.com> >>>>> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> >>>>> Reported-tested-by: Roger Quadros <rogerq@ti.com> >>>>> Reported-tested-by: Kevin Hilman <khilman@linaro.org> >>>>> Tested-by: Tony Lindgren <tony@atomide.com> >>>>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> >>>>> --- >> >> [ ... ] >> >>>>> >>>>> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); >>>>> + >>>>> /* >>>>> * Call idle CPU PM enter notifier chain so that >>>>> * VFP and per CPU interrupt context is saved. >>>>> @@ -165,6 +169,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >>>>> if (dev->cpu == 0 && mpuss_can_lose_context) >>>>> cpu_cluster_pm_exit(); >>>>> >>>>> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); >> >> [ ... ] >> >>>> >>>> Shouldn't the broadcast timer to be setup with CLOCK_EVT_NOTIFY_BROADCAST_ON also ? >>>> >>> Which is already taken care by __cpuidle_register_driver(). Note that setup code >>> is still used from generic code... >> >> Nope, if the flag CPUIDLE_FLAG_TIMER_STOP is not set, the cpuidle framework won't setup the timer. >> > I see. I assumed it was taken care. So we might have setup the timer too. > >> static void __cpuidle_driver_init(struct cpuidle_driver *drv) >> { >> >> ... >> >> for (i = drv->state_count - 1; i >= 0 ; i--) { >> if (drv->states[i].flags & CPUIDLE_FLAG_TIMER_STOP) { > May be you should start the bc timer in case 'CPUIDLE_FLAG_TIMER_STOP' > or 'CPUIDLE_FLAG_COUPLED' >> drv->bctimer = 1; >> break; >> } >> } >> >> ... >> >> } >> >> static int __cpuidle_register_driver(struct cpuidle_driver *drv) >> { >> ... >> >> if (drv->bctimer) >> on_each_cpu_mask(drv->cpumask, >> cpuidle_setup_broadcast_timer, >> (void *)CLOCK_EVT_NOTIFY_BROADCAST_ON, 1); >> >> ... >> } >> >> So the broadcast timer does not operate with this revert. Moreover, I am not sure reverting this patch is the right solution. >> > With above mentioned change, it should work. Other alternatives is OMAP4 driver does > its won registration where it can start the timer. The way it was before the > consolidation. > > Ofcourse if you have better fix, then great. > What is your suggestion. We *must* fix the regression asap. I think $subject patch with an update to bctimer start under CPUIDLE_FLAG_COUPLED seems a good way forward. Do let me know. Regards, Santosh
On 05/15/2014 07:03 PM, Santosh Shilimkar wrote: > Daniel, > > On Wednesday 14 May 2014 05:18 PM, Santosh Shilimkar wrote: >> On Wednesday 14 May 2014 04:02 PM, Daniel Lezcano wrote: >>> On 05/14/2014 09:50 PM, Santosh Shilimkar wrote: >>>> On Wednesday 14 May 2014 03:44 PM, Daniel Lezcano wrote: >>>>> On 05/13/2014 04:39 PM, Santosh Shilimkar wrote: >>>>>> On OMAP4 panda board, there have been several bug reports about boot >>>>>> hang and lock-ups with CPU_IDLE enabled. The root cause of the issue >>>>>> is missing interrupts while in idle state. Commit cb7094e8 {cpuidle / omap4 : >>>>>> use CPUIDLE_FLAG_TIMER_STOP flag} moved the broadcast notifiers to common >>>>>> code for right reasons but on OMAP4 which suffers from a nasty ROM code >>>>>> bug with GIC, commit ff999b8a {ARM: OMAP4460: Workaround for ROM bug ..}, >>>>>> we loose interrupts which leads to issues like lock-up, hangs etc. >>>>>> >>>>>> Patch reverts commit cb7094 {cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP >>>>>> flag} to avoid the issue. With this change, OMAP4 panda boards, the mentioned >>>>>> issues are getting fixed. We no longer loose interrupts which was the cause >>>>>> of the regression. >>>>>> >>>>>> Cc: Roger Quadros <rogerq@ti.com> >>>>>> Cc: Kevin Hilman <khilman@linaro.org> >>>>>> Cc: Tony Lindgren <tony@atomide.com> >>>>>> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> >>>>>> Reported-tested-by: Roger Quadros <rogerq@ti.com> >>>>>> Reported-tested-by: Kevin Hilman <khilman@linaro.org> >>>>>> Tested-by: Tony Lindgren <tony@atomide.com> >>>>>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> >>>>>> --- >>> >>> [ ... ] >>> >>>>>> >>>>>> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); >>>>>> + >>>>>> /* >>>>>> * Call idle CPU PM enter notifier chain so that >>>>>> * VFP and per CPU interrupt context is saved. >>>>>> @@ -165,6 +169,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >>>>>> if (dev->cpu == 0 && mpuss_can_lose_context) >>>>>> cpu_cluster_pm_exit(); >>>>>> >>>>>> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); >>> >>> [ ... ] >>> >>>>> >>>>> Shouldn't the broadcast timer to be setup with CLOCK_EVT_NOTIFY_BROADCAST_ON also ? >>>>> >>>> Which is already taken care by __cpuidle_register_driver(). Note that setup code >>>> is still used from generic code... >>> >>> Nope, if the flag CPUIDLE_FLAG_TIMER_STOP is not set, the cpuidle framework won't setup the timer. >>> >> I see. I assumed it was taken care. So we might have setup the timer too. >> >>> static void __cpuidle_driver_init(struct cpuidle_driver *drv) >>> { >>> >>> ... >>> >>> for (i = drv->state_count - 1; i >= 0 ; i--) { >>> if (drv->states[i].flags & CPUIDLE_FLAG_TIMER_STOP) { >> May be you should start the bc timer in case 'CPUIDLE_FLAG_TIMER_STOP' >> or 'CPUIDLE_FLAG_COUPLED' >>> drv->bctimer = 1; >>> break; >>> } >>> } >>> >>> ... >>> >>> } >>> >>> static int __cpuidle_register_driver(struct cpuidle_driver *drv) >>> { >>> ... >>> >>> if (drv->bctimer) >>> on_each_cpu_mask(drv->cpumask, >>> cpuidle_setup_broadcast_timer, >>> (void *)CLOCK_EVT_NOTIFY_BROADCAST_ON, 1); >>> >>> ... >>> } >>> >>> So the broadcast timer does not operate with this revert. Moreover, I am not sure reverting this patch is the right solution. >>> >> With above mentioned change, it should work. Other alternatives is OMAP4 driver does >> its won registration where it can start the timer. The way it was before the >> consolidation. >> >> Ofcourse if you have better fix, then great. >> > What is your suggestion. We *must* fix the regression asap. I think > $subject patch with an update to bctimer start under CPUIDLE_FLAG_COUPLED > seems a good way forward. > > Do let me know. Did you see Alex Shi's email [cc'ed] ? Reverting this change makes the panda ES to hang. I am not convinced the culprit is this code you are trying to revert. I will try to reproduce the bug on my board.
On Thursday 15 May 2014 01:50 PM, Daniel Lezcano wrote: > On 05/15/2014 07:03 PM, Santosh Shilimkar wrote: >> Daniel, >> >> On Wednesday 14 May 2014 05:18 PM, Santosh Shilimkar wrote: >>> On Wednesday 14 May 2014 04:02 PM, Daniel Lezcano wrote: >>>> On 05/14/2014 09:50 PM, Santosh Shilimkar wrote: >>>>> On Wednesday 14 May 2014 03:44 PM, Daniel Lezcano wrote: >>>>>> On 05/13/2014 04:39 PM, Santosh Shilimkar wrote: >>>>>>> On OMAP4 panda board, there have been several bug reports about boot >>>>>>> hang and lock-ups with CPU_IDLE enabled. The root cause of the issue >>>>>>> is missing interrupts while in idle state. Commit cb7094e8 {cpuidle / omap4 : >>>>>>> use CPUIDLE_FLAG_TIMER_STOP flag} moved the broadcast notifiers to common >>>>>>> code for right reasons but on OMAP4 which suffers from a nasty ROM code >>>>>>> bug with GIC, commit ff999b8a {ARM: OMAP4460: Workaround for ROM bug ..}, >>>>>>> we loose interrupts which leads to issues like lock-up, hangs etc. >>>>>>> >>>>>>> Patch reverts commit cb7094 {cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP >>>>>>> flag} to avoid the issue. With this change, OMAP4 panda boards, the mentioned >>>>>>> issues are getting fixed. We no longer loose interrupts which was the cause >>>>>>> of the regression. >>>>>>> >>>>>>> Cc: Roger Quadros <rogerq@ti.com> >>>>>>> Cc: Kevin Hilman <khilman@linaro.org> >>>>>>> Cc: Tony Lindgren <tony@atomide.com> >>>>>>> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> >>>>>>> Reported-tested-by: Roger Quadros <rogerq@ti.com> >>>>>>> Reported-tested-by: Kevin Hilman <khilman@linaro.org> >>>>>>> Tested-by: Tony Lindgren <tony@atomide.com> >>>>>>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> >>>>>>> --- >>>> >>>> [ ... ] >>>> >>>>>>> >>>>>>> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); >>>>>>> + >>>>>>> /* >>>>>>> * Call idle CPU PM enter notifier chain so that >>>>>>> * VFP and per CPU interrupt context is saved. >>>>>>> @@ -165,6 +169,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >>>>>>> if (dev->cpu == 0 && mpuss_can_lose_context) >>>>>>> cpu_cluster_pm_exit(); >>>>>>> >>>>>>> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); >>>> >>>> [ ... ] >>>> >>>>>> >>>>>> Shouldn't the broadcast timer to be setup with CLOCK_EVT_NOTIFY_BROADCAST_ON also ? >>>>>> >>>>> Which is already taken care by __cpuidle_register_driver(). Note that setup code >>>>> is still used from generic code... >>>> >>>> Nope, if the flag CPUIDLE_FLAG_TIMER_STOP is not set, the cpuidle framework won't setup the timer. >>>> >>> I see. I assumed it was taken care. So we might have setup the timer too. >>> >>>> static void __cpuidle_driver_init(struct cpuidle_driver *drv) >>>> { >>>> >>>> ... >>>> >>>> for (i = drv->state_count - 1; i >= 0 ; i--) { >>>> if (drv->states[i].flags & CPUIDLE_FLAG_TIMER_STOP) { >>> May be you should start the bc timer in case 'CPUIDLE_FLAG_TIMER_STOP' >>> or 'CPUIDLE_FLAG_COUPLED' >>>> drv->bctimer = 1; >>>> break; >>>> } >>>> } >>>> >>>> ... >>>> >>>> } >>>> >>>> static int __cpuidle_register_driver(struct cpuidle_driver *drv) >>>> { >>>> ... >>>> >>>> if (drv->bctimer) >>>> on_each_cpu_mask(drv->cpumask, >>>> cpuidle_setup_broadcast_timer, >>>> (void *)CLOCK_EVT_NOTIFY_BROADCAST_ON, 1); >>>> >>>> ... >>>> } >>>> >>>> So the broadcast timer does not operate with this revert. Moreover, I am not sure reverting this patch is the right solution. >>>> >>> With above mentioned change, it should work. Other alternatives is OMAP4 driver does >>> its won registration where it can start the timer. The way it was before the >>> consolidation. >>> >>> Ofcourse if you have better fix, then great. >>> >> What is your suggestion. We *must* fix the regression asap. I think >> $subject patch with an update to bctimer start under CPUIDLE_FLAG_COUPLED >> seems a good way forward. >> >> Do let me know. > > Did you see Alex Shi's email [cc'ed] ? Reverting this change makes the panda ES to hang. > The hang is definitely due to the bctimer not started. As I said, I assumed it was and then you corrected saying it is under the flag. > I am not convinced the culprit is this code you are trying to revert. > fair enough. Thats why I said if you have an alternative fix thats great. > I will try to reproduce the bug on my board. > Sure...
* Tony Lindgren <tony@atomide.com> [140514 16:32]: > * Tony Lindgren <tony@atomide.com> [140514 13:57]: > > * Daniel Lezcano <daniel.lezcano@linaro.org> [140514 13:02]: > > > > > > So the broadcast timer does not operate with this revert. Moreover, I am not > > > sure reverting this patch is the right solution. > > > > OK I'll hold on sending anything until there's some conclusion. > > > > Are you able to reproduce the problem with current Linux next? > > BTW, I'm also noticing now hangs on omap3 with my PM patches > on v3.15-rc4 that seem similar to the panda cpuidle hang. > > The hangs on omap3 do not happen on my v3.14 based branch, so > I'm wondering if there are some recent cpuidle regressions too > in play? Looks like the omap3 idle hang is caused by commit 6ddeb6d84459 (dmaengine: omap-dma: move IRQ handling to omap-dma). This may not be related to omap4 cpu_idle hang, but it might. I'm trying to figure out if it's related to something like omap_dma_global_context_save and omap_dma_global_context_restore. Meanwhile, to test the DMA changes have an effect on the omap4 cpu_idle issue, you can try reverting the following two patches: aa4c5b962a7a dmaengine: omap-dma: more consolidation of CCR register setup 6ddeb6d84459 dmaengine: omap-dma: move IRQ handling to omap-dma Regards, Tony
* Tony Lindgren <tony@atomide.com> [140515 17:14]: > * Tony Lindgren <tony@atomide.com> [140514 16:32]: > > * Tony Lindgren <tony@atomide.com> [140514 13:57]: > > > * Daniel Lezcano <daniel.lezcano@linaro.org> [140514 13:02]: > > > > > > > > So the broadcast timer does not operate with this revert. Moreover, I am not > > > > sure reverting this patch is the right solution. > > > > > > OK I'll hold on sending anything until there's some conclusion. > > > > > > Are you able to reproduce the problem with current Linux next? > > > > BTW, I'm also noticing now hangs on omap3 with my PM patches > > on v3.15-rc4 that seem similar to the panda cpuidle hang. > > > > The hangs on omap3 do not happen on my v3.14 based branch, so > > I'm wondering if there are some recent cpuidle regressions too > > in play? > > Looks like the omap3 idle hang is caused by commit 6ddeb6d84459 > (dmaengine: omap-dma: move IRQ handling to omap-dma). This may > not be related to omap4 cpu_idle hang, but it might. > > I'm trying to figure out if it's related to something like > omap_dma_global_context_save and omap_dma_global_context_restore. > > Meanwhile, to test the DMA changes have an effect on the omap4 > cpu_idle issue, you can try reverting the following two patches: > > aa4c5b962a7a dmaengine: omap-dma: more consolidation of CCR register setup > 6ddeb6d84459 dmaengine: omap-dma: move IRQ handling to omap-dma OK with the above reverted still seeing hangs on my panda es. So it seems the cpu_idle issue is different. Regards, Tony
diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c index 01fc710..3e169d9 100644 --- a/arch/arm/mach-omap2/cpuidle44xx.c +++ b/arch/arm/mach-omap2/cpuidle44xx.c @@ -14,6 +14,7 @@ #include <linux/cpuidle.h> #include <linux/cpu_pm.h> #include <linux/export.h> +#include <linux/clockchips.h> #include <asm/cpuidle.h> #include <asm/proc-fns.h> @@ -83,6 +84,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, { struct idle_statedata *cx = state_ptr + index; u32 mpuss_can_lose_context = 0; + int cpu_id = smp_processor_id(); /* * CPU0 has to wait and stay ON until CPU1 is OFF state. @@ -110,6 +112,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) && (cx->mpu_logic_state == PWRDM_POWER_OFF); + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); + /* * Call idle CPU PM enter notifier chain so that * VFP and per CPU interrupt context is saved. @@ -165,6 +169,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, if (dev->cpu == 0 && mpuss_can_lose_context) cpu_cluster_pm_exit(); + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); + fail: cpuidle_coupled_parallel_barrier(dev, &abort_barrier); cpu_done[dev->cpu] = false; @@ -189,8 +195,7 @@ static struct cpuidle_driver omap4_idle_driver = { /* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */ .exit_latency = 328 + 440, .target_residency = 960, - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | - CPUIDLE_FLAG_TIMER_STOP, + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, .enter = omap_enter_idle_coupled, .name = "C2", .desc = "CPUx OFF, MPUSS CSWR", @@ -199,8 +204,7 @@ static struct cpuidle_driver omap4_idle_driver = { /* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */ .exit_latency = 460 + 518, .target_residency = 1100, - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | - CPUIDLE_FLAG_TIMER_STOP, + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, .enter = omap_enter_idle_coupled, .name = "C3", .desc = "CPUx OFF, MPUSS OSWR",