Message ID | 53713FCF.3000006@ti.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
* Santosh Shilimkar <santosh.shilimkar@ti.com> [140512 14:41]: > On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote: > > * Kevin Hilman <khilman@linaro.org> [140509 16:46]: > >> Roger Quadros <rogerq@ti.com> writes: > >> > >>> Kevin, > >>> > >>> On 05/09/2014 01:15 AM, Kevin Hilman wrote: > >>>> Tony Lindgren <tony@atomide.com> writes: > >>>> > >>>> [...] > >>>> > >>>>> ..but I think I found the cause for recent hangs on panda, just a wild > >>>>> guess based on looking at the recent cpuidle patches after v3.14. > >>>>> > >>>>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts > >>>>> until all coupled CPUs leave idle) makes booting work reliably again > >>>>> on panda. > >>>>> > >>>>> Can you guys confirm, so far no issues here after few boot tests, > >>>>> but it might be too early to tell. > >>>> > >>>> Reverting that makes things a bit more stable, but it still eventually > >>>> fails in the same way. For me it took 8 boots for it to eventually > >>>> fail. > >>>> > >>>> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable > >>>> (20+ boots in a row and still going.) > >>>> > >>> > >>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? > >>> It worked for me 10/10 boots. > >> > >> Yup, it worked for me too for 10/10 boots in a row. > > > > But what has caused this regression, does it work reliably with let's > > say v3.13 or v3.12? > > > IIRC things were stable till some CPUIDLE code consolidation happened. > I don't recall exactly but some one did discuss about it a while back. OK that's good to hear. > Can you re-run your test-cases with patch at end of the email. This > is just a hunch so don't blame me if I waste your time testing the > patch. Seems to work after adding "#include <linux/clockchips.h>". I did about 10 reboots and they all succeeded for me. Without your revert, I'm getting a hang (with sysrq not working) about 1/3 of the boots. Kevin, Roger, does the revert from Santosh work for you too? BTW, I think the the RCU stall was/is a separate issue. That's different where the system actually recovers after about a minute, or after sysrq ctrl-a f h or l. Sorry, I no longer know if the RCU stall is only with the older kernels around v3.10 time, or if it's still also happening. Regards, Tony > From bdd30d68f8fa659aa0e3ce436f94029a7719036b Mon Sep 17 00:00:00 2001 > From: Santosh Shilimkar <santosh.shilimkar@ti.com> > Date: Mon, 12 May 2014 17:37:59 -0400 > Subject: [PATCH] Revert "cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP flag" > > This reverts commit cb7094e848f7bcaa0a4cda3db4b232f08dbf5b78. > > Conflicts: > > arch/arm/mach-omap2/cpuidle44xx.c > --- > arch/arm/mach-omap2/cpuidle44xx.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c > index 01fc710..aae3606 100644 > --- a/arch/arm/mach-omap2/cpuidle44xx.c > +++ b/arch/arm/mach-omap2/cpuidle44xx.c > @@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, > { > struct idle_statedata *cx = state_ptr + index; > u32 mpuss_can_lose_context = 0; > + int cpu_id = smp_processor_id(); > > /* > * CPU0 has to wait and stay ON until CPU1 is OFF state. > @@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, > mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) && > (cx->mpu_logic_state == PWRDM_POWER_OFF); > > + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); > + > /* > * Call idle CPU PM enter notifier chain so that > * VFP and per CPU interrupt context is saved. > @@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, > if (dev->cpu == 0 && mpuss_can_lose_context) > cpu_cluster_pm_exit(); > > + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); > + > fail: > cpuidle_coupled_parallel_barrier(dev, &abort_barrier); > cpu_done[dev->cpu] = false; > @@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = { > /* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */ > .exit_latency = 328 + 440, > .target_residency = 960, > - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | > - CPUIDLE_FLAG_TIMER_STOP, > + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, > .enter = omap_enter_idle_coupled, > .name = "C2", > .desc = "CPUx OFF, MPUSS CSWR", > @@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = { > /* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */ > .exit_latency = 460 + 518, > .target_residency = 1100, > - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | > - CPUIDLE_FLAG_TIMER_STOP, > + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, > .enter = omap_enter_idle_coupled, > .name = "C3", > .desc = "CPUx OFF, MPUSS OSWR", > -- > 1.7.9.5 > >
Santosh Shilimkar <santosh.shilimkar@ti.com> writes: > On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote: >> * Kevin Hilman <khilman@linaro.org> [140509 16:46]: >>> Roger Quadros <rogerq@ti.com> writes: >>> >>>> Kevin, >>>> >>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote: >>>>> Tony Lindgren <tony@atomide.com> writes: >>>>> >>>>> [...] >>>>> >>>>>> ..but I think I found the cause for recent hangs on panda, just a wild >>>>>> guess based on looking at the recent cpuidle patches after v3.14. >>>>>> >>>>>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts >>>>>> until all coupled CPUs leave idle) makes booting work reliably again >>>>>> on panda. >>>>>> >>>>>> Can you guys confirm, so far no issues here after few boot tests, >>>>>> but it might be too early to tell. >>>>> >>>>> Reverting that makes things a bit more stable, but it still eventually >>>>> fails in the same way. For me it took 8 boots for it to eventually >>>>> fail. >>>>> >>>>> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable >>>>> (20+ boots in a row and still going.) >>>>> >>>> >>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? >>>> It worked for me 10/10 boots. >>> >>> Yup, it worked for me too for 10/10 boots in a row. >> >> But what has caused this regression, does it work reliably with let's >> say v3.13 or v3.12? >> > IIRC things were stable till some CPUIDLE code consolidation happened. > I don't recall exactly but some one did discuss about it a while back. > > Can you re-run your test-cases with patch at end of the email. This > is just a hunch so don't blame me if I waste your time testing the > patch. With your patch applied on top of next-20140512, my 4460 Panda-ES has booted 25 times in a row, and still going. Kevin
On 05/13/2014 01:07 AM, Tony Lindgren wrote: > * Santosh Shilimkar <santosh.shilimkar@ti.com> [140512 14:41]: >> On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote: >>> * Kevin Hilman <khilman@linaro.org> [140509 16:46]: >>>> Roger Quadros <rogerq@ti.com> writes: >>>> >>>>> Kevin, >>>>> >>>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote: >>>>>> Tony Lindgren <tony@atomide.com> writes: >>>>>> >>>>>> [...] >>>>>> >>>>>>> ..but I think I found the cause for recent hangs on panda, just a wild >>>>>>> guess based on looking at the recent cpuidle patches after v3.14. >>>>>>> >>>>>>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts >>>>>>> until all coupled CPUs leave idle) makes booting work reliably again >>>>>>> on panda. >>>>>>> >>>>>>> Can you guys confirm, so far no issues here after few boot tests, >>>>>>> but it might be too early to tell. >>>>>> >>>>>> Reverting that makes things a bit more stable, but it still eventually >>>>>> fails in the same way. For me it took 8 boots for it to eventually >>>>>> fail. >>>>>> >>>>>> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable >>>>>> (20+ boots in a row and still going.) >>>>>> >>>>> >>>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? >>>>> It worked for me 10/10 boots. >>>> >>>> Yup, it worked for me too for 10/10 boots in a row. >>> >>> But what has caused this regression, does it work reliably with let's >>> say v3.13 or v3.12? >>> >> IIRC things were stable till some CPUIDLE code consolidation happened. >> I don't recall exactly but some one did discuss about it a while back. > > OK that's good to hear. > >> Can you re-run your test-cases with patch at end of the email. This >> is just a hunch so don't blame me if I waste your time testing the >> patch. > > Seems to work after adding "#include <linux/clockchips.h>". I did about 10 > reboots and they all succeeded for me. Without your revert, I'm getting > a hang (with sysrq not working) about 1/3 of the boots. > > Kevin, Roger, does the revert from Santosh work for you too? > next-20140508 worked for me 10/10 times with Santosh's patch. The heartbeat LED behaves normally as well. So I like it :). cheers, -roger > BTW, I think the the RCU stall was/is a separate issue. That's different > where the system actually recovers after about a minute, or after sysrq > ctrl-a f h or l. Sorry, I no longer know if the RCU stall is only with the > older kernels around v3.10 time, or if it's still also happening. > > Regards, > > Tony > >> From bdd30d68f8fa659aa0e3ce436f94029a7719036b Mon Sep 17 00:00:00 2001 >> From: Santosh Shilimkar <santosh.shilimkar@ti.com> >> Date: Mon, 12 May 2014 17:37:59 -0400 >> Subject: [PATCH] Revert "cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP flag" >> >> This reverts commit cb7094e848f7bcaa0a4cda3db4b232f08dbf5b78. >> >> Conflicts: >> >> arch/arm/mach-omap2/cpuidle44xx.c >> --- >> arch/arm/mach-omap2/cpuidle44xx.c | 11 +++++++---- >> 1 file changed, 7 insertions(+), 4 deletions(-) >> >> diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c >> index 01fc710..aae3606 100644 >> --- a/arch/arm/mach-omap2/cpuidle44xx.c >> +++ b/arch/arm/mach-omap2/cpuidle44xx.c >> @@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >> { >> struct idle_statedata *cx = state_ptr + index; >> u32 mpuss_can_lose_context = 0; >> + int cpu_id = smp_processor_id(); >> >> /* >> * CPU0 has to wait and stay ON until CPU1 is OFF state. >> @@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >> mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) && >> (cx->mpu_logic_state == PWRDM_POWER_OFF); >> >> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); >> + >> /* >> * Call idle CPU PM enter notifier chain so that >> * VFP and per CPU interrupt context is saved. >> @@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, >> if (dev->cpu == 0 && mpuss_can_lose_context) >> cpu_cluster_pm_exit(); >> >> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); >> + >> fail: >> cpuidle_coupled_parallel_barrier(dev, &abort_barrier); >> cpu_done[dev->cpu] = false; >> @@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = { >> /* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */ >> .exit_latency = 328 + 440, >> .target_residency = 960, >> - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | >> - CPUIDLE_FLAG_TIMER_STOP, >> + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, >> .enter = omap_enter_idle_coupled, >> .name = "C2", >> .desc = "CPUx OFF, MPUSS CSWR", >> @@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = { >> /* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */ >> .exit_latency = 460 + 518, >> .target_residency = 1100, >> - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | >> - CPUIDLE_FLAG_TIMER_STOP, >> + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, >> .enter = omap_enter_idle_coupled, >> .name = "C3", >> .desc = "CPUx OFF, MPUSS OSWR", >> -- >> 1.7.9.5 >> >>
On Tuesday 13 May 2014 04:10 AM, Roger Quadros wrote: > On 05/13/2014 01:07 AM, Tony Lindgren wrote: >> * Santosh Shilimkar <santosh.shilimkar@ti.com> [140512 14:41]: >>> On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote: >>>> * Kevin Hilman <khilman@linaro.org> [140509 16:46]: >>>>> Roger Quadros <rogerq@ti.com> writes: >>>>> >>>>>> Kevin, >>>>>> >>>>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote: >>>>>>> Tony Lindgren <tony@atomide.com> writes: >>>>>>> >>>>>>> [...] >>>>>>> >>>>>>>> ..but I think I found the cause for recent hangs on panda, just a wild >>>>>>>> guess based on looking at the recent cpuidle patches after v3.14. >>>>>>>> >>>>>>>> Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts >>>>>>>> until all coupled CPUs leave idle) makes booting work reliably again >>>>>>>> on panda. >>>>>>>> >>>>>>>> Can you guys confirm, so far no issues here after few boot tests, >>>>>>>> but it might be too early to tell. >>>>>>> >>>>>>> Reverting that makes things a bit more stable, but it still eventually >>>>>>> fails in the same way. For me it took 8 boots for it to eventually >>>>>>> fail. >>>>>>> >>>>>>> However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable >>>>>>> (20+ boots in a row and still going.) >>>>>>> >>>>>> >>>>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? >>>>>> It worked for me 10/10 boots. >>>>> >>>>> Yup, it worked for me too for 10/10 boots in a row. >>>> >>>> But what has caused this regression, does it work reliably with let's >>>> say v3.13 or v3.12? >>>> >>> IIRC things were stable till some CPUIDLE code consolidation happened. >>> I don't recall exactly but some one did discuss about it a while back. >> >> OK that's good to hear. >> >>> Can you re-run your test-cases with patch at end of the email. This >>> is just a hunch so don't blame me if I waste your time testing the >>> patch. >> >> Seems to work after adding "#include <linux/clockchips.h>". I did about 10 >> reboots and they all succeeded for me. Without your revert, I'm getting >> a hang (with sysrq not working) about 1/3 of the boots. >> >> Kevin, Roger, does the revert from Santosh work for you too? >> > > next-20140508 worked for me 10/10 times with Santosh's patch. > The heartbeat LED behaves normally as well. So I like it :). > Great. Will post the patch with change log updated and cc you guys. Regards, Santosh
diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c index 01fc710..aae3606 100644 --- a/arch/arm/mach-omap2/cpuidle44xx.c +++ b/arch/arm/mach-omap2/cpuidle44xx.c @@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, { struct idle_statedata *cx = state_ptr + index; u32 mpuss_can_lose_context = 0; + int cpu_id = smp_processor_id(); /* * CPU0 has to wait and stay ON until CPU1 is OFF state. @@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) && (cx->mpu_logic_state == PWRDM_POWER_OFF); + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id); + /* * Call idle CPU PM enter notifier chain so that * VFP and per CPU interrupt context is saved. @@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev, if (dev->cpu == 0 && mpuss_can_lose_context) cpu_cluster_pm_exit(); + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id); + fail: cpuidle_coupled_parallel_barrier(dev, &abort_barrier); cpu_done[dev->cpu] = false; @@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = { /* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */ .exit_latency = 328 + 440, .target_residency = 960, - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | - CPUIDLE_FLAG_TIMER_STOP, + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, .enter = omap_enter_idle_coupled, .name = "C2", .desc = "CPUx OFF, MPUSS CSWR", @@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = { /* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */ .exit_latency = 460 + 518, .target_residency = 1100, - .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED | - CPUIDLE_FLAG_TIMER_STOP, + .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED, .enter = omap_enter_idle_coupled, .name = "C3", .desc = "CPUx OFF, MPUSS OSWR",