
Enable arm_global_timer for Zynq breaks boot

Message ID 712d31e9-3584-48e1-aa9f-55bc94fa62c9@DB9EHSMHS001.ehs.local (mailing list archive)
State New, archived

Commit Message

Soren Brinkmann Aug. 6, 2013, 1:28 a.m. UTC
Hi Daniel,

On Thu, Aug 01, 2013 at 07:48:04PM +0200, Daniel Lezcano wrote:
> On 08/01/2013 07:43 PM, Sören Brinkmann wrote:
> > On Thu, Aug 01, 2013 at 07:29:12PM +0200, Daniel Lezcano wrote:
> >> On 08/01/2013 01:38 AM, Sören Brinkmann wrote:
> >>> On Thu, Aug 01, 2013 at 01:01:27AM +0200, Daniel Lezcano wrote:
> >>>> On 08/01/2013 12:18 AM, Sören Brinkmann wrote:
> >>>>> On Wed, Jul 31, 2013 at 11:08:51PM +0200, Daniel Lezcano wrote:
> >>>>>> On 07/31/2013 10:58 PM, Sören Brinkmann wrote:
> >>>>>>> On Wed, Jul 31, 2013 at 10:49:06PM +0200, Daniel Lezcano wrote:
> >>>>>>>> On 07/31/2013 12:34 AM, Sören Brinkmann wrote:
> >>>>>>>>> On Tue, Jul 30, 2013 at 10:47:15AM +0200, Daniel Lezcano wrote:
> >>>>>>>>>> On 07/30/2013 02:03 AM, Sören Brinkmann wrote:
> >>>>>>>>>>> Hi Daniel,
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Jul 29, 2013 at 02:51:49PM +0200, Daniel Lezcano wrote:
> >>>>>>>>>>> (snip)
> >>>>>>>>>>>>
> >>>>>>>>>>>> the CPUIDLE_FLAG_TIMER_STOP flag tells the cpuidle framework the local
> >>>>>>>>>>>> timer will be stopped when entering the idle state. In this case, the
> >>>>>>>>>>>> cpuidle framework will call clockevents_notify(ENTER) to switch to a
> >>>>>>>>>>>> broadcast timer, and will call clockevents_notify(EXIT) when exiting the
> >>>>>>>>>>>> idle state, switching back to the local timer.
> >>>>>>>>>>>
> >>>>>>>>>>> I've been thinking about this, trying to understand how this makes my
> >>>>>>>>>>> boot attempts on Zynq hang. IIUC, the wrongly provided TIMER_STOP flag
> >>>>>>>>>>> would make the timer core switch to a broadcast device even though it
> >>>>>>>>>>> wouldn't be necessary. But shouldn't it still work? It sounds like we do
> >>>>>>>>>>> something useless, but nothing so wrong that it should result in
> >>>>>>>>>>> breakage. I guess I'm missing something obvious. This timer system will
> >>>>>>>>>>> always remain a mystery to me.
> >>>>>>>>>>>
> >>>>>>>>>>> Actually this more or less leads to the question: What is this
> >>>>>>>>>>> 'broadcast timer'. I guess that is some clockevent device which is
> >>>>>>>>>>> common to all cores? (that would be the cadence_ttc for Zynq). Is the
> >>>>>>>>>>> hang pointing to some issue with that driver?
> >>>>>>>>>>
> >>>>>>>>>> If you look at the /proc/timer_list, which timer is used for broadcasting ?
> >>>>>>>>>
> >>>>>>>>> So, here are the results of the correct run (full output attached).
> >>>>>>>>>
> >>>>>>>>> The vanilla kernel uses the twd timers as local timers and the TTC as
> >>>>>>>>> broadcast device:
> >>>>>>>>> 	Tick Device: mode:     1                                                         
> >>>>>>>>> 	Broadcast device  
> >>>>>>>>> 	Clock Event Device: ttc_clockevent
> >>>>>>>>>
> >>>>>>>>> When I remove the offending CPUIDLE flag and add the DT fragment to
> >>>>>>>>> enable the global timer, the twd timers are still used as local timers
> >>>>>>>>> and the broadcast device is the global timer:
> >>>>>>>>> 	Tick Device: mode:     1                                                         
> >>>>>>>>> 	Broadcast device                                                                 
> >>>>>>>>> 	Clock Event Device: arm_global_timer
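
(For reference, the DT fragment in question presumably resembles the
global timer node that was later merged into mainline zynq-7000.dtsi:)

	global_timer: timer@f8f00200 {
		compatible = "arm,cortex-a9-global-timer";
		reg = <0xf8f00200 0x20>;
		interrupts = <1 11 0x301>;
		interrupt-parent = <&intc>;
		clocks = <&clkc 4>;
	};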
> >>>>>>>>>
> >>>>>>>>> Again, since boot hangs in the actually broken case, I don't see a way to
> >>>>>>>>> obtain this information for that case.
> >>>>>>>>
> >>>>>>>> Can't you use the maxcpus=1 option to ensure the system boots up?
> >>>>>>>
> >>>>>>> Right, that works. I forgot about that option after you mentioned that
> >>>>>>> it is most likely not that useful.
> >>>>>>>
> >>>>>>> Anyway, these are those sysfs files with an unmodified cpuidle driver and
> >>>>>>> the gt enabled and having maxcpus=1 set.
> >>>>>>>
> >>>>>>> /proc/timer_list:
> >>>>>>> 	Tick Device: mode:     1
> >>>>>>> 	Broadcast device
> >>>>>>> 	Clock Event Device: arm_global_timer
> >>>>>>> 	 max_delta_ns:   12884902005
> >>>>>>> 	 min_delta_ns:   1000
> >>>>>>> 	 mult:           715827876
> >>>>>>> 	 shift:          31
> >>>>>>> 	 mode:           3
> >>>>>>
> >>>>>> Here the mode is 3 (CLOCK_EVT_MODE_ONESHOT)
> >>>>>>
> >>>>>> In the previous timer_list output you gave me, with the offending
> >>>>>> cpuidle flag removed, the mode was 1 (CLOCK_EVT_MODE_SHUTDOWN).
> >>>>>>
> >>>>>> Is it possible for you to get this output again right after onlining
> >>>>>> CPU1, in order to check if the broadcast device switches to SHUTDOWN?
> >>>>>
> >>>>> How do I do that? I tried to online CPU1 after booting with maxcpus=1
> >>>>> and that didn't end well:
> >>>>> 	# echo 1 > online && cat /proc/timer_list 
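
(Presumably run from /sys/devices/system/cpu/cpu1; with the full sysfs
path the equivalent is:)

	echo 1 > /sys/devices/system/cpu/cpu1/online && cat /proc/timer_list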
> >>>>
> >>>> Hmm, I was hoping to have a small delay before the kernel hangs but
> >>>> apparently this is not the case... :(
> >>>>
> >>>> I suspect the global timer is shut down at some point, but I don't
> >>>> understand why and when.
> >>>>
> >>>> Can you add a stack trace in the "clockevents_shutdown" function with
> >>>> the clockevent device name? Perhaps we may see an interesting trace at
> >>>> boot time when it hangs.
> >>>
> >>> I did this change:
> >>> 	diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> >>> 	index 38959c8..3ab11c1 100644
> >>> 	--- a/kernel/time/clockevents.c
> >>> 	+++ b/kernel/time/clockevents.c
> >>> 	@@ -92,6 +92,8 @@ void clockevents_set_mode(struct clock_event_device *dev,
> >>> 	  */
> >>> 	 void clockevents_shutdown(struct clock_event_device *dev)
> >>> 	 {
> >>> 	+       pr_info("ce->name:%s\n", dev->name);
> >>> 	+       dump_stack();
> >>> 	        clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
> >>> 	        dev->next_event.tv64 = KTIME_MAX;
> >>> 	 }
> >>>
> >>> It is hit a few times during boot, so I attach a full boot log. I really
> >>> don't know what to look for, but I hope you can spot something in it. I
> >>> really appreciate you taking the time.
> >>
> >> Thanks for the traces.
> > 
> > Sure.
> > 
> >>
> >> If you try without the ttc_clockevent configured in the kernel (but with
> >> twd and gt), does it boot?
> > 
> > Absence of the TTC doesn't seem to make any difference. It hangs at the
> > same location.
> 
> Ok, IMO there is a problem with the broadcast device registration (maybe
> vs. twd).
> 
> I will check later (kid duty) :)

I was actually waiting for an update from your side and did something
else, but I seem to have run into this again. I was overhauling the
cadence_ttc (patch attached, based on tip/timers/core), and it seems to
show the same behavior as enabling the global_timer. With cpuidle off, it
works. With cpuidle on, it hangs. Removing the TIMER_STOP flag from the
C2 state makes it boot again.
It works just fine on our 3.10 kernel.
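
For context, a minimal sketch of what declaring that flag looks like on
a 3.11-era kernel (illustrative values and names, not the actual Zynq
cpuidle driver):

	#include <linux/cpuidle.h>
	#include <asm/proc-fns.h>

	/* hypothetical enter callback: power-gating the CPU also stops
	 * its per-cpu twd clockevent */
	static int zynq_enter_power_down(struct cpuidle_device *dev,
					 struct cpuidle_driver *drv, int index)
	{
		cpu_do_idle();		/* stand-in for the real power-down */
		return index;
	}

	static struct cpuidle_state zynq_c2_state = {
		.name             = "C2",
		.desc             = "hypothetical power-gated state",
		.exit_latency     = 100,	/* us, made-up figure */
		.target_residency = 1000,	/* us, made-up figure */
		.flags            = CPUIDLE_FLAG_TIME_VALID |
				    CPUIDLE_FLAG_TIMER_STOP,
		.enter            = zynq_enter_power_down,
	};

With TIMER_STOP set, the cpuidle core brackets the state with
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER/EXIT) for that CPU,
which is exactly the switch to and from the broadcast device discussed
above.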

Another thing I noticed - probably unrelated but hard to tell: on
3.11-rc1 and later my system stops for quite some time at the hand-off
to userspace. I.e. I see the 'freeing unused kernel memory...' line and
sometimes the following 'Welcome to Buildroot...', and then it stops;
on good kernels it continues after a while and boots through, while on
bad ones it just hangs there.

	Sören
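
(For reference when reading the timer_list dumps quoted above: the
"mode" field is the raw value of enum clock_event_mode, and mult/shift
encode the nanoseconds-to-cycles conversion. A sketch, with the enum
values as in 3.11-era include/linux/clockchips.h; the frequency figure
below is derived arithmetic, not taken from the thread:)

	enum clock_event_mode {
		CLOCK_EVT_MODE_UNUSED = 0,
		CLOCK_EVT_MODE_SHUTDOWN,	/* 1: device stopped */
		CLOCK_EVT_MODE_PERIODIC,	/* 2 */
		CLOCK_EVT_MODE_ONESHOT,		/* 3: normal broadcast operation */
		CLOCK_EVT_MODE_RESUME,		/* 4 */
	};

	/*
	 * clockevents_program_event() converts a delta in ns to cycles as
	 *	clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
	 * so mult = 715827876 and shift = 31 correspond to a timer clock of
	 * about (715827876 * 1e9) >> 31 ~= 333 MHz, i.e. the Zynq global
	 * timer running at half of a 667 MHz CPU clock.
	 */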

Comments

Daniel Lezcano Aug. 6, 2013, 8:46 a.m. UTC | #1
On 08/06/2013 03:28 AM, Sören Brinkmann wrote:
> Hi Daniel,
> 
>> (snip)
>>
>> Ok, IMO there is a problem with the broadcast device registration (maybe
>> vs. twd).
>>
>> I will check later (kid duty) :)
> 
> I was actually waiting for an update from your side and did something
> else, but I seem to have run into this again. I was overhauling the
> cadence_ttc (patch attached, based on tip/timers/core), and it seems to
> show the same behavior as enabling the global_timer. With cpuidle off, it
> works. With cpuidle on, it hangs. Removing the TIMER_STOP flag from the
> C2 state makes it boot again.
> It works just fine on our 3.10 kernel.

This is not necessarily related to the bug. If the patch you sent broke
the cadence_ttc driver, then when you use it (with TIMER_STOP) you will
be stuck. Removing the flag may mean you don't use the broadcast
timer, hence the bug is not surfacing.

Going back to the bug with the arm_global_timer, what is observed is that
the broadcast timer is *shut down* when the second CPU comes online.

I have to dig into kernel/time/clockevents.c|tick-*.c because IMO
the issue is coming from there, but first I have to reproduce the bug,
i.e. find a board of mine where I can add the arm_global_timer.

> Another thing I noticed - probably unrelated but hard to tell: on
> 3.11-rc1 and later my system stops for quite some time at the hand-off
> to userspace. I.e. I see the 'freeing unused kernel memory...' line and
> sometimes the following 'Welcome to Buildroot...', and then it stops;
> on good kernels it continues after a while and boots through, while on
> bad ones it just hangs there.

Did you try to dump the stacks with magic-sysrq? Or git bisect?
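
(For reference, assuming CONFIG_MAGIC_SYSRQ=y, the stacks can be dumped
from a shell without needing a serial break:)

	# dump all task states and stack traces to the console
	echo t > /proc/sysrq-trigger
	# or only the tasks blocked in uninterruptible sleep
	echo w > /proc/sysrq-trigger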
Michal Simek Aug. 6, 2013, 9:18 a.m. UTC | #2
On 08/06/2013 10:46 AM, Daniel Lezcano wrote:
> On 08/06/2013 03:28 AM, Sören Brinkmann wrote:
>> Hi Daniel,
>>
>> (snip)
>>
>> I was actually waiting for an update from your side and did something
>> else, but I seem to have run into this again. I was overhauling the
>> cadence_ttc (patch attached, based on tip/timers/core), and it seems to
>> show the same behavior as enabling the global_timer. With cpuidle off, it
>> works. With cpuidle on, it hangs. Removing the TIMER_STOP flag from the
>> C2 state makes it boot again.
>> It works just fine on our 3.10 kernel.
> 
> This is not necessarily related to the bug. If the patch you sent broke
> the cadence_ttc driver, then when you use it (with TIMER_STOP) you will
> be stuck. Removing the flag may mean you don't use the broadcast
> timer, hence the bug is not surfacing.
> 
> Going back to the bug with the arm_global_timer, what is observed is that
> the broadcast timer is *shut down* when the second CPU comes online.
> 
> I have to dig into kernel/time/clockevents.c|tick-*.c because IMO
> the issue is coming from there, but first I have to reproduce the bug,
> i.e. find a board of mine where I can add the arm_global_timer.
> 
> (snip)

Soren: Are you able to replicate this issue on QEMU?
If yes, it would be best if you can provide QEMU, the kernel .config,
a rootfs, and a simple manual for Daniel on how to reach that fault.
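
(For instance, something along these lines is the usual way to boot a
Zynq kernel under QEMU; the machine name is real, the file names are
placeholders:)

	qemu-system-arm -M xilinx-zynq-a9 \
		-kernel zImage -dtb zynq-zc702.dtb \
		-append "console=ttyPS0,115200" \
		-serial null -serial mon:stdio -nographic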

Thanks,
Michal
Daniel Lezcano Aug. 6, 2013, 12:30 p.m. UTC | #3
On 08/06/2013 11:18 AM, Michal Simek wrote:
> (snip)
> 
> Soren: Are you able to replicate this issue on QEMU?
> If yes, it would be best if you can provide QEMU, the kernel .config,
> a rootfs, and a simple manual for Daniel on how to reach that fault.

I tried to download qemu for zynq but it fails:

git clone git://git.xilinx.com/qemu-xarm.git
Cloning into 'qemu-xarm'...
fatal: The remote end hung up unexpectedly

I am also looking for the option specified for the kernel:

"The kernel needs to be built with this feature turned on (in
menuconfig, System Type->Xilinx Specific Features -> Device Tree At
Fixed Address)."

... but I was not able to find it.

Any ideas?

Thanks
  -- Daniel

PS: apart from that, a well-documented website!
Michal Simek Aug. 6, 2013, 12:41 p.m. UTC | #4
On 08/06/2013 02:30 PM, Daniel Lezcano wrote:
> (snip)
>
>> Soren: Are you able to replicate this issue on QEMU?
>> If yes, it would be best if you can provide QEMU, the kernel .config,
>> a rootfs, and a simple manual for Daniel on how to reach that fault.
> 
> I tried to download qemu for zynq but it fails:
> 
> git clone git://git.xilinx.com/qemu-xarm.git
> Cloning into 'qemu-xarm'...
> fatal: The remote end hung up unexpectedly

Not sure which site you have found, but
it should be just qemu.git:
https://github.com/Xilinx/qemu

or github clone.

> 
> I am also looking for the option specified for the kernel:
> 
> "The kernel needs to be built with this feature turned on (in
> menuconfig, System Type->Xilinx Specific Features -> Device Tree At
> Fixed Address)."


This also sounds like a very ancient tree.
This is the latest kernel tree - master-next is the latest devel branch.
https://github.com/Xilinx/linux-xlnx

Or there should be an option to use the latest kernel from kernel.org.
(I think Soren is using it)

Zynq is part of the multiplatform kernel and the cadence ttc is there;
the dts is also in the mainline kernel.

> PS: apart from that, a well-documented website!

Can you send me the link to it?

This should be the main page for it.
http://www.wiki.xilinx.com/

Thanks,
Michal
Daniel Lezcano Aug. 6, 2013, 1:08 p.m. UTC | #5
On 08/06/2013 02:41 PM, Michal Simek wrote:
> On 08/06/2013 02:30 PM, Daniel Lezcano wrote:
>> On 08/06/2013 11:18 AM, Michal Simek wrote:
>>> On 08/06/2013 10:46 AM, Daniel Lezcano wrote:
>>>> On 08/06/2013 03:28 AM, Sören Brinkmann wrote:
>>>>> Hi Daniel,
>>>>>
>>>>> On Thu, Aug 01, 2013 at 07:48:04PM +0200, Daniel Lezcano wrote:
>>>>>> On 08/01/2013 07:43 PM, Sören Brinkmann wrote:
>>>>>>> On Thu, Aug 01, 2013 at 07:29:12PM +0200, Daniel Lezcano wrote:
>>>>>>>> On 08/01/2013 01:38 AM, Sören Brinkmann wrote:
>>>>>>>>> On Thu, Aug 01, 2013 at 01:01:27AM +0200, Daniel Lezcano wrote:
>>>>>>>>>> On 08/01/2013 12:18 AM, Sören Brinkmann wrote:
>>>>>>>>>>> On Wed, Jul 31, 2013 at 11:08:51PM +0200, Daniel Lezcano wrote:
>>>>>>>>>>>> On 07/31/2013 10:58 PM, Sören Brinkmann wrote:
>>>>>>>>>>>>> On Wed, Jul 31, 2013 at 10:49:06PM +0200, Daniel Lezcano wrote:
>>>>>>>>>>>>>> On 07/31/2013 12:34 AM, Sören Brinkmann wrote:
>>>>>>>>>>>>>>> On Tue, Jul 30, 2013 at 10:47:15AM +0200, Daniel Lezcano wrote:
>>>>>>>>>>>>>>>> On 07/30/2013 02:03 AM, Sören Brinkmann wrote:
>>>>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jul 29, 2013 at 02:51:49PM +0200, Daniel Lezcano wrote:
>>>>>>>>>>>>>>>>> (snip)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> the CPUIDLE_FLAG_TIMER_STOP flag tells the cpuidle framework the local
>>>>>>>>>>>>>>>>>> timer will be stopped when entering to the idle state. In this case, the
>>>>>>>>>>>>>>>>>> cpuidle framework will call clockevents_notify(ENTER) and switches to a
>>>>>>>>>>>>>>>>>> broadcast timer and will call clockevents_notify(EXIT) when exiting the
>>>>>>>>>>>>>>>>>> idle state, switching the local timer back in use.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've been thinking about this, trying to understand how this makes my
>>>>>>>>>>>>>>>>> boot attempts on Zynq hang. IIUC, the wrongly provided TIMER_STOP flag
>>>>>>>>>>>>>>>>> would make the timer core switch to a broadcast device even though it
>>>>>>>>>>>>>>>>> wouldn't be necessary. But shouldn't it still work? It sounds like we do
>>>>>>>>>>>>>>>>> something useless, but nothing wrong in a sense that it should result in
>>>>>>>>>>>>>>>>> breakage. I guess I'm missing something obvious. This timer system will
>>>>>>>>>>>>>>>>> always remain a mystery to me.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Actually this more or less leads to the question: What is this
>>>>>>>>>>>>>>>>> 'broadcast timer'. I guess that is some clockevent device which is
>>>>>>>>>>>>>>>>> common to all cores? (that would be the cadence_ttc for Zynq). Is the
>>>>>>>>>>>>>>>>> hang pointing to some issue with that driver?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you look at the /proc/timer_list, which timer is used for broadcasting ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So, the correct run results (full output attached).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The vanilla kernel uses the twd timers as local timers and the TTC as
>>>>>>>>>>>>>>> broadcast device:
>>>>>>>>>>>>>>> 	Tick Device: mode:     1                                                         
>>>>>>>>>>>>>>> 	Broadcast device  
>>>>>>>>>>>>>>> 	Clock Event Device: ttc_clockevent
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When I remove the offending CPUIDLE flag and add the DT fragment to
>>>>>>>>>>>>>>> enable the global timer, the twd timers are still used as local timers
>>>>>>>>>>>>>>> and the broadcast device is the global timer:
>>>>>>>>>>>>>>> 	Tick Device: mode:     1                                                         
>>>>>>>>>>>>>>> 	Broadcast device                                                                 
>>>>>>>>>>>>>>> 	Clock Event Device: arm_global_timer
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Again, since boot hangs in the actually broken case, I don't see way to
>>>>>>>>>>>>>>> obtain this information for that case.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can't you use the maxcpus=1 option to ensure the system to boot up ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Right, that works. I forgot about that option after you mentioned, that
>>>>>>>>>>>>> it is most likely not that useful.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anyway, this are those sysfs files with an unmodified cpuidle driver and
>>>>>>>>>>>>> the gt enabled and having maxcpus=1 set.
>>>>>>>>>>>>>
>>>>>>>>>>>>> /proc/timer_list:
>>>>>>>>>>>>> 	Tick Device: mode:     1
>>>>>>>>>>>>> 	Broadcast device
>>>>>>>>>>>>> 	Clock Event Device: arm_global_timer
>>>>>>>>>>>>> 	 max_delta_ns:   12884902005
>>>>>>>>>>>>> 	 min_delta_ns:   1000
>>>>>>>>>>>>> 	 mult:           715827876
>>>>>>>>>>>>> 	 shift:          31
>>>>>>>>>>>>> 	 mode:           3
>>>>>>>>>>>>
>>>>>>>>>>>> Here the mode is 3 (CLOCK_EVT_MODE_ONESHOT)
>>>>>>>>>>>>
>>>>>>>>>>>> The previous timer_list output you gave me when removing the offending
>>>>>>>>>>>> cpuidle flag, it was 1 (CLOCK_EVT_MODE_SHUTDOWN).
>>>>>>>>>>>>
>>>>>>>>>>>> Is it possible you try to get this output again right after onlining the
>>>>>>>>>>>> cpu1 in order to check if the broadcast device switches to SHUTDOWN ?
>>>>>>>>>>>
>>>>>>>>>>> How do I do that? I tried to online CPU1 after booting with maxcpus=1
>>>>>>>>>>> and that didn't end well:
>>>>>>>>>>> 	# echo 1 > online && cat /proc/timer_list 
>>>>>>>>>>
>>>>>>>>>> Hmm, I was hoping to have a small delay before the kernel hangs but
>>>>>>>>>> apparently this is not the case... :(
>>>>>>>>>>
>>>>>>>>>> I suspect the global timer is shutdown at one moment but I don't
>>>>>>>>>> understand why and when.
>>>>>>>>>>
>>>>>>>>>> Can you add a stack trace in the "clockevents_shutdown" function with
>>>>>>>>>> the clockevent device name ? Perhaps, we may see at boot time an
>>>>>>>>>> interesting trace when it hangs.
>>>>>>>>>
>>>>>>>>> I did this change:
>>>>>>>>> 	diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
>>>>>>>>> 	index 38959c8..3ab11c1 100644
>>>>>>>>> 	--- a/kernel/time/clockevents.c
>>>>>>>>> 	+++ b/kernel/time/clockevents.c
>>>>>>>>> 	@@ -92,6 +92,8 @@ void clockevents_set_mode(struct clock_event_device *dev,
>>>>>>>>> 	  */
>>>>>>>>> 	 void clockevents_shutdown(struct clock_event_device *dev)
>>>>>>>>> 	 {
>>>>>>>>> 	+       pr_info("ce->name:%s\n", dev->name);
>>>>>>>>> 	+       dump_stack();
>>>>>>>>> 	        clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
>>>>>>>>> 	        dev->next_event.tv64 = KTIME_MAX;
>>>>>>>>> 	 }
>>>>>>>>>
>>>>>>>>> It is hit a few times during boot, so I attach a full boot log. I really
>>>>>>>>> don't know what to look for, but I hope you can spot something in it. I
>>>>>>>>> really appreciate you taking the time.
>>>>>>>>
>>>>>>>> Thanks for the traces.
>>>>>>>
>>>>>>> Sure.
>>>>>>>
>>>>>>>>
>>>>>>>> If you try without the ttc_clockevent configured in the kernel (but with
>>>>>>>> twd and gt), does it boot ?
>>>>>>>
>>>>>>> Absence of the TTC doesn't seem to make any difference. It hangs at the
>>>>>>> same location.
>>>>>>
>>>>>> Ok, IMO there is a problem with the broadcast device registration (may
>>>>>> be vs twd).
>>>>>>
>>>>>> I will check later (kid duty) :)
>>>>>
>>>>> I was actually waiting for an update from your side and did something
>>>>> else, but I seem to have run into this again. I was overhauling the
>>>>> cadence_ttc (patch attached, based on tip/timers/core). And it seems to
>>>>> show the same behavior as enabling the global_timer. With cpuidle off, it
>>>>> works. With cpuidle, on it hangs. Removing the TIMER_STOP flag from the
>>>>> C2 state makes it boot again.
>>>>> It works just fine on our 3.10 kernel.
>>>>
>>>> This is not necessary related to the bug. If the patch you sent broke
>>>> the cadence_ttc driver, when you use it (with the TIMER_STOP), you will
>>>> be stuck. Removing the flag, may signifies you don't use the broadcast
>>>> timer, hence the bug is not surfacing.
>>>>
>>>> Going back to the bug with the arm_global_timer, what is observed is the
>>>> broadcast timer is *shutdown* when the second cpu is online.
>>>>
>>>> I have to dig into the kernel/time/clockevents.c|tick-*.c because IMO
>>>> the issue is coming from there but before I have to reproduce the bug,
>>>> so find a board I have where I can add the arm_global_timer.
>>>>
>>>>> Another thing I noticed - probably unrelated but hard to tell: On
>>>>> 3.11-rc1 and later my system stops for quite some time at the hand off
>>>>> to userspace. I.e. I see the 'freeing unused kernel memory...' line and
>>>>> sometimes the following 'Welcome to Buildroot...' and then it stops and
>>>>> on good kernels it continues after a while and boots through and on bad
>>>>> ones it just hangs there.
>>>>
>>>> did you try to dump the stacks with magic-sysrq ? Or git bisect ?
>>>
>>> Soren: Are you able to replicate this issue on QEMU?
>>> If yes, it would be best if you can provide QEMU, the kernel .config,
>>> a rootfs, and a simple manual for Daniel on how to reach that fault.
>>
>> I tried to download qemu for zynq but it fails:
>>
>> git clone git://git.xilinx.com/qemu-xarm.git
>> Cloning into 'qemu-xarm'...
>> fatal: The remote end hung up unexpectedly
> 
> Not sure which site you have found, but
> it should be just qemu.git:
> https://github.com/Xilinx/qemu
> 
> or github clone.

Ok, cool, I was able to clone it.

>> I am also looking for the option specified for the kernel:
>>
>> "The kernel needs to be built with this feature turned on (in
>> menuconfig, System Type->Xilinx Specific Features -> Device Tree At
>> Fixed Address)."

Ok.

> This also sounds like a very ancient tree.
> This is the latest kernel tree - master-next is the latest devel branch.
> https://github.com/Xilinx/linux-xlnx

Ok, cool. I have the right one.

> Or there should be an option to use the latest kernel from kernel.org.
> (I think Soren is using it)
> 
> Zynq is part of the multiplatform kernel and the cadence ttc is there;
> the dts is also in the mainline kernel.
> 
>> PS: apart from that, a well-documented website!
> 
> Can you send me the link to it?

http://xilinx.wikidot.com/zynq-qemu
http://xilinx.wikidot.com/zynq-linux


> This should be the main page for it.
> http://www.wiki.xilinx.com/

Thanks, Michal!

  -- Daniel
Michal Simek Aug. 6, 2013, 1:18 p.m. UTC | #6
On 08/06/2013 03:08 PM, Daniel Lezcano wrote:
> On 08/06/2013 02:41 PM, Michal Simek wrote:
>> On 08/06/2013 02:30 PM, Daniel Lezcano wrote:
>>> On 08/06/2013 11:18 AM, Michal Simek wrote:
>>>> On 08/06/2013 10:46 AM, Daniel Lezcano wrote:
>>>>> On 08/06/2013 03:28 AM, Sören Brinkmann wrote:
>>>>>> Hi Daniel,
>>>>>>
>>>>>> On Thu, Aug 01, 2013 at 07:48:04PM +0200, Daniel Lezcano wrote:
>>>>>>> On 08/01/2013 07:43 PM, Sören Brinkmann wrote:
>>>>>>>> On Thu, Aug 01, 2013 at 07:29:12PM +0200, Daniel Lezcano wrote:
>>>>>>>>> On 08/01/2013 01:38 AM, Sören Brinkmann wrote:
>>>>>>>>>> On Thu, Aug 01, 2013 at 01:01:27AM +0200, Daniel Lezcano wrote:
>>>>>>>>>>> On 08/01/2013 12:18 AM, Sören Brinkmann wrote:
>>>>>>>>>>>> On Wed, Jul 31, 2013 at 11:08:51PM +0200, Daniel Lezcano wrote:
>>>>>>>>>>>>> On 07/31/2013 10:58 PM, Sören Brinkmann wrote:
>>>>>>>>>>>>>> On Wed, Jul 31, 2013 at 10:49:06PM +0200, Daniel Lezcano wrote:
>>>>>>>>>>>>>>> On 07/31/2013 12:34 AM, Sören Brinkmann wrote:
>>>>>>>>>>>>>>>> On Tue, Jul 30, 2013 at 10:47:15AM +0200, Daniel Lezcano wrote:
>>>>>>>>>>>>>>>>> On 07/30/2013 02:03 AM, Sören Brinkmann wrote:
>>>>>>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jul 29, 2013 at 02:51:49PM +0200, Daniel Lezcano wrote:
>>>>>>>>>>>>>>>>>> (snip)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> the CPUIDLE_FLAG_TIMER_STOP flag tells the cpuidle framework the local
>>>>>>>>>>>>>>>>>>> timer will be stopped when entering to the idle state. In this case, the
>>>>>>>>>>>>>>>>>>> cpuidle framework will call clockevents_notify(ENTER) and switches to a
>>>>>>>>>>>>>>>>>>> broadcast timer and will call clockevents_notify(EXIT) when exiting the
>>>>>>>>>>>>>>>>>>> idle state, switching the local timer back in use.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've been thinking about this, trying to understand how this makes my
>>>>>>>>>>>>>>>>>> boot attempts on Zynq hang. IIUC, the wrongly provided TIMER_STOP flag
>>>>>>>>>>>>>>>>>> would make the timer core switch to a broadcast device even though it
>>>>>>>>>>>>>>>>>> wouldn't be necessary. But shouldn't it still work? It sounds like we do
>>>>>>>>>>>>>>>>>> something useless, but nothing wrong in a sense that it should result in
>>>>>>>>>>>>>>>>>> breakage. I guess I'm missing something obvious. This timer system will
>>>>>>>>>>>>>>>>>> always remain a mystery to me.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Actually this more or less leads to the question: What is this
>>>>>>>>>>>>>>>>>> 'broadcast timer'. I guess that is some clockevent device which is
>>>>>>>>>>>>>>>>>> common to all cores? (that would be the cadence_ttc for Zynq). Is the
>>>>>>>>>>>>>>>>>> hang pointing to some issue with that driver?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you look at the /proc/timer_list, which timer is used for broadcasting ?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So, the correct run results (full output attached).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The vanilla kernel uses the twd timers as local timers and the TTC as
>>>>>>>>>>>>>>>> broadcast device:
>>>>>>>>>>>>>>>> 	Tick Device: mode:     1                                                         
>>>>>>>>>>>>>>>> 	Broadcast device  
>>>>>>>>>>>>>>>> 	Clock Event Device: ttc_clockevent
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> When I remove the offending CPUIDLE flag and add the DT fragment to
>>>>>>>>>>>>>>>> enable the global timer, the twd timers are still used as local timers
>>>>>>>>>>>>>>>> and the broadcast device is the global timer:
>>>>>>>>>>>>>>>> 	Tick Device: mode:     1                                                         
>>>>>>>>>>>>>>>> 	Broadcast device                                                                 
>>>>>>>>>>>>>>>> 	Clock Event Device: arm_global_timer
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Again, since boot hangs in the actually broken case, I don't see way to
>>>>>>>>>>>>>>>> obtain this information for that case.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can't you use the maxcpus=1 option to ensure the system to boot up ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Right, that works. I forgot about that option after you mentioned, that
>>>>>>>>>>>>>> it is most likely not that useful.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Anyway, this are those sysfs files with an unmodified cpuidle driver and
>>>>>>>>>>>>>> the gt enabled and having maxcpus=1 set.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /proc/timer_list:
>>>>>>>>>>>>>> 	Tick Device: mode:     1
>>>>>>>>>>>>>> 	Broadcast device
>>>>>>>>>>>>>> 	Clock Event Device: arm_global_timer
>>>>>>>>>>>>>> 	 max_delta_ns:   12884902005
>>>>>>>>>>>>>> 	 min_delta_ns:   1000
>>>>>>>>>>>>>> 	 mult:           715827876
>>>>>>>>>>>>>> 	 shift:          31
>>>>>>>>>>>>>> 	 mode:           3
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here the mode is 3 (CLOCK_EVT_MODE_ONESHOT)
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the previous timer_list output you gave me, when removing the
>>>>>>>>>>>>> offending cpuidle flag, it was 1 (CLOCK_EVT_MODE_SHUTDOWN).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is it possible for you to get this output again right after onlining
>>>>>>>>>>>>> cpu1, in order to check if the broadcast device switches to SHUTDOWN?
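[For reference, the numeric modes in timer_list map onto enum
clock_event_mode as defined in include/linux/clockchips.h of this era:

	enum clock_event_mode {
		CLOCK_EVT_MODE_UNUSED = 0,
		CLOCK_EVT_MODE_SHUTDOWN,	/* 1 */
		CLOCK_EVT_MODE_PERIODIC,	/* 2 */
		CLOCK_EVT_MODE_ONESHOT,		/* 3 */
		CLOCK_EVT_MODE_RESUME,		/* 4 */
	};
]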
>>>>>>>>>>>>
>>>>>>>>>>>> How do I do that? I tried to online CPU1 after booting with maxcpus=1
>>>>>>>>>>>> and that didn't end well:
>>>>>>>>>>>> 	# echo 1 > online && cat /proc/timer_list 
>>>>>>>>>>>
>>>>>>>>>>> Hmm, I was hoping to have a small delay before the kernel hangs but
>>>>>>>>>>> apparently this is not the case... :(
>>>>>>>>>>>
>>>>>>>>>>> I suspect the global timer is shut down at some point but I don't
>>>>>>>>>>> understand why and when.
>>>>>>>>>>>
>>>>>>>>>>> Can you add a stack trace in the "clockevents_shutdown" function with
>>>>>>>>>>> the clockevent device name? Perhaps we may see an interesting trace at
>>>>>>>>>>> boot time when it hangs.
>>>>>>>>>>
>>>>>>>>>> I did this change:
>>>>>>>>>> 	diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
>>>>>>>>>> 	index 38959c8..3ab11c1 100644
>>>>>>>>>> 	--- a/kernel/time/clockevents.c
>>>>>>>>>> 	+++ b/kernel/time/clockevents.c
>>>>>>>>>> 	@@ -92,6 +92,8 @@ void clockevents_set_mode(struct clock_event_device *dev,
>>>>>>>>>> 	  */
>>>>>>>>>> 	 void clockevents_shutdown(struct clock_event_device *dev)
>>>>>>>>>> 	 {
>>>>>>>>>> 	+       pr_info("ce->name:%s\n", dev->name);
>>>>>>>>>> 	+       dump_stack();
>>>>>>>>>> 	        clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
>>>>>>>>>> 	        dev->next_event.tv64 = KTIME_MAX;
>>>>>>>>>> 	 }
>>>>>>>>>>
>>>>>>>>>> It is hit a few times during boot, so I attach a full boot log. I really
>>>>>>>>>> don't know what to look for, but I hope you can spot something in it. I
>>>>>>>>>> really appreciate you taking the time.
>>>>>>>>>
>>>>>>>>> Thanks for the traces.
>>>>>>>>
>>>>>>>> Sure.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> If you try without the ttc_clockevent configured in the kernel (but with
>>>>>>>>> twd and gt), does it boot?
>>>>>>>>
>>>>>>>> Absence of the TTC doesn't seem to make any difference. It hangs at the
>>>>>>>> same location.
>>>>>>>
>>>>>>> Ok, IMO there is a problem with the broadcast device registration (maybe
>>>>>>> vs the twd).
>>>>>>>
>>>>>>> I will check later (kid duty) :)
>>>>>>
>>>>>> I was actually waiting for an update from your side and did something
>>>>>> else, but I seem to have run into this again. I was overhauling the
>>>>>> cadence_ttc (patch attached, based on tip/timers/core), and it seems to
>>>>>> show the same behavior as enabling the global_timer. With cpuidle off,
>>>>>> it works. With cpuidle on, it hangs. Removing the TIMER_STOP flag from
>>>>>> the C2 state makes it boot again.
>>>>>> It works just fine on our 3.10 kernel.
>>>>>
>>>>> This is not necessarily related to the bug. If the patch you sent broke
>>>>> the cadence_ttc driver, then when you use it (with TIMER_STOP) you will
>>>>> be stuck. Removing the flag may signify you don't use the broadcast
>>>>> timer, hence the bug is not surfacing.
>>>>>
>>>>> Going back to the bug with the arm_global_timer, what is observed is that
>>>>> the broadcast timer is *shutdown* when the second cpu comes online.
>>>>>
>>>>> I have to dig into kernel/time/clockevents.c|tick-*.c because IMO the
>>>>> issue is coming from there, but first I have to reproduce the bug, so I
>>>>> need to find a board where I can add the arm_global_timer.
>>>>>
>>>>>> Another thing I noticed - probably unrelated but hard to tell: on
>>>>>> 3.11-rc1 and later my system stops for quite some time at the hand-off
>>>>>> to userspace. I.e. I see the 'freeing unused kernel memory...' line and
>>>>>> sometimes the following 'Welcome to Buildroot...', and then it stops; on
>>>>>> good kernels it continues after a while and boots through, and on bad
>>>>>> ones it just hangs there.
>>>>>
>>>>> did you try to dump the stacks with magic-sysrq? Or git bisect?
>>>>
>>>> Soren: Are you able to replicate this issue on QEMU?
>>>> If yes, it would be best if you could provide QEMU, kernel .config/
>>>> rootfs and a simple manual for Daniel on how to reach that fault.
>>>
>>> I tried to download qemu for zynq but it fails:
>>>
>>> git clone git://git.xilinx.com/qemu-xarm.git
>>> Cloning into 'qemu-xarm'...
>>> fatal: The remote end hung up unexpectedly
>>
>> Not sure which site you have found, but
>> it should be just qemu.git:
>> https://github.com/Xilinx/qemu
>>
>> or github clone.
> 
> Ok, cool I was able to clone it.
> 
>>> I am also looking for the option specified for the kernel:
>>>
>>> "The kernel needs to be built with this feature turned on (in
>>> menuconfig, System Type->Xilinx Specific Features -> Device Tree At
>>> Fixed Address)."
> 
> Ok.
> 
>> This also sounds like a very ancient tree.
>> This is the latest kernel tree - master-next is the latest devel branch.
>> https://github.com/Xilinx/linux-xlnx
> 
> Ok, cool. I have the right one.
> 
>> Or there should be an option to use the latest kernel from kernel.org.
>> (I think Soren is using it)
>>
>> Zynq is part of the multiplatform kernel and cadence ttc is there;
>> the dts is also in the mainline kernel.
>>
>>> ps: apart from that, a well documented website!
>>
>> Can you send me the link to it?
> 
> http://xilinx.wikidot.com/zynq-qemu
> http://xilinx.wikidot.com/zynq-linux

I will find out why it is still there.
I think it was moved to a new location.

> 
>> This should be the main page for it.
>> http://www.wiki.xilinx.com/
> 
> Thanks Michal !

Thank you too Daniel,
Michal
Daniel Lezcano Aug. 6, 2013, 4:09 p.m. UTC | #7
On 08/06/2013 03:18 PM, Michal Simek wrote:

[ ... ]

>>>>> Soren: Are you able to replicate this issue on QEMU?
>>>>> If yes, it would be best if you could provide QEMU, kernel .config/
>>>>> rootfs and a simple manual for Daniel on how to reach that fault.
>>>>
>>>> I tried to download qemu for zynq but it fails:
>>>>
>>>> git clone git://git.xilinx.com/qemu-xarm.git
>>>> Cloning into 'qemu-xarm'...
>>>> fatal: The remote end hung up unexpectedly
>>>
>>> Not sure which site you have found, but
>>> it should be just qemu.git:
>>> https://github.com/Xilinx/qemu
>>>
>>> or github clone.
>>
>> Ok, cool I was able to clone it.
>>
>>>> I am also looking for the option specified for the kernel:
>>>>
>>>> "The kernel needs to be built with this feature turned on (in
>>>> menuconfig, System Type->Xilinx Specific Features -> Device Tree At
>>>> Fixed Address)."
>>
>> Ok.
>>
>>> This also sounds like a very ancient tree.
>>> This is the latest kernel tree - master-next is the latest devel branch.
>>> https://github.com/Xilinx/linux-xlnx
>>
>> Ok, cool. I have the right one.

Following the documentation, I was able to boot a kernel with qemu for
the linux-xlnx and qemu-xilinx.

But this kernel is outdated compared to the upstream one, so I tried to
boot a 3.11-rc4 kernel, without success. I did the following:

I used the default config file from linux-xlnx for the upstream kernel.

I compiled the kernel with:

make -j 5 ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-
UIMAGE_LOADADDR=0x8000 uImage

I generated the dtb with:

make -j 5 ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- dtbs

For qemu, I started qemu with:

./arm-softmmu/qemu-system-arm -M arm-generic-fdt -nographic -smp 2
-machine linux=on -serial mon:stdio -dtb zynq-zed.dtb -kernel
kernel/zImage -initrd filesystem/ramdisk.img

I tried with the dtb available for the upstream kernel:

zynq-zc706.dtb, zynq-zc702.dtb and zynq-zed.dtb

Did I miss something?

Thanks
  -- Daniel



>>
>>> Or there should be an option to use the latest kernel from kernel.org.
>>> (I think Soren is using it)
>>>
>>> Zynq is part of the multiplatform kernel and cadence ttc is there;
>>> the dts is also in the mainline kernel.
>>>
>>>> ps: apart from that, a well documented website!
>>>
>>> Can you send me the link to it?
>>
>> http://xilinx.wikidot.com/zynq-qemu
>> http://xilinx.wikidot.com/zynq-linux
> 
> I will find out why it is still there.
> I think it was moved to a new location.
> 
>>
>>> This should be the main page for it.
>>> http://www.wiki.xilinx.com/
Soren Brinkmann Aug. 6, 2013, 4:13 p.m. UTC | #8
On Tue, Aug 06, 2013 at 06:09:09PM +0200, Daniel Lezcano wrote:
> On 08/06/2013 03:18 PM, Michal Simek wrote:
> 
> [ ... ]
> 
> >>>>> Soren: Are you able to replicate this issue on QEMU?
> >>>>> If yes, it would be best if you could provide QEMU, kernel .config/
> >>>>> rootfs and a simple manual for Daniel on how to reach that fault.
> >>>>
> >>>> I tried to download qemu for zynq but it fails:
> >>>>
> >>>> git clone git://git.xilinx.com/qemu-xarm.git
> >>>> Cloning into 'qemu-xarm'...
> >>>> fatal: The remote end hung up unexpectedly
> >>>
> >>> Not sure which site you have found, but
> >>> it should be just qemu.git:
> >>> https://github.com/Xilinx/qemu
> >>>
> >>> or github clone.
> >>
> >> Ok, cool I was able to clone it.
> >>
> >>>> I am also looking for the option specified for the kernel:
> >>>>
> >>>> "The kernel needs to be built with this feature turned on (in
> >>>> menuconfig, System Type->Xilinx Specific Features -> Device Tree At
> >>>> Fixed Address)."
> >>
> >> Ok.
> >>
> >>> This also sounds like a very ancient tree.
> >>> This is the latest kernel tree - master-next is the latest devel branch.
> >>> https://github.com/Xilinx/linux-xlnx
> >>
> >> Ok, cool. I have the right one.
> 
> Following the documentation, I was able to boot a kernel with qemu for
> the linux-xlnx and qemu-xilinx.
> 
> But this kernel is outdated compared to the upstream one, so I tried to
> boot a 3.11-rc4 kernel, without success. I did the following:
> 
> I used the default config file from linux-xlnx for the upstream kernel.
> 
> I compiled the kernel with:
> 
> make -j 5 ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-
> UIMAGE_LOADADDR=0x8000 uImage
> 
> I generated the dtb with:
> 
> make -j 5 ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- dtbs
> 
> For qemu, I started qemu with:
> 
> ./arm-softmmu/qemu-system-arm -M arm-generic-fdt -nographic -smp 2
> -machine linux=on -serial mon:stdio -dtb zynq-zed.dtb -kernel
> kernel/zImage -initrd filesystem/ramdisk.img
> 
> I tried with the dtb available for the upstream kernel:
> 
> zynq-zc706.dtb, zynq-zc702.dtb and zynq-zed.dtb
> 
> Did I miss something?

Unfortunately, the public github QEMU is behind our internal development.
IIRC, there were a couple of DT compatible strings QEMU didn't know.
Those are sometimes removed, making boot fail/hang.
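(To see which compatible strings a DT actually carries, so they can be
compared against what this QEMU knows, something like the following should
work; the dts paths are assumed relative to a mainline kernel tree:

	grep -ho 'compatible = "[^"]*"' \
		arch/arm/boot/dts/zynq-7000.dtsi \
		arch/arm/boot/dts/zynq-zed.dts | sort -u
)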

Are you on IRC? It's probably easier to resolve this in a more direct
way (I'm 'sorenb' on freenode). Otherwise I could give you a few more
instructions for debugging this by mail?

	Sören
Soren Brinkmann Aug. 6, 2013, 4:18 p.m. UTC | #9
On Tue, Aug 06, 2013 at 10:46:54AM +0200, Daniel Lezcano wrote:
> On 08/06/2013 03:28 AM, Sören Brinkmann wrote:
> > Hi Daniel,
> > 
> > On Thu, Aug 01, 2013 at 07:48:04PM +0200, Daniel Lezcano wrote:
> >> On 08/01/2013 07:43 PM, Sören Brinkmann wrote:
> >>> On Thu, Aug 01, 2013 at 07:29:12PM +0200, Daniel Lezcano wrote:
> >>>> On 08/01/2013 01:38 AM, Sören Brinkmann wrote:
> >>>>> On Thu, Aug 01, 2013 at 01:01:27AM +0200, Daniel Lezcano wrote:
> >>>>>> On 08/01/2013 12:18 AM, Sören Brinkmann wrote:
> >>>>>>> On Wed, Jul 31, 2013 at 11:08:51PM +0200, Daniel Lezcano wrote:
> >>>>>>>> On 07/31/2013 10:58 PM, Sören Brinkmann wrote:
> >>>>>>>>> On Wed, Jul 31, 2013 at 10:49:06PM +0200, Daniel Lezcano wrote:
> >>>>>>>>>> On 07/31/2013 12:34 AM, Sören Brinkmann wrote:
> >>>>>>>>>>> On Tue, Jul 30, 2013 at 10:47:15AM +0200, Daniel Lezcano wrote:
> >>>>>>>>>>>> On 07/30/2013 02:03 AM, Sören Brinkmann wrote:
> >>>>>>>>>>>>> Hi Daniel,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Jul 29, 2013 at 02:51:49PM +0200, Daniel Lezcano wrote:
> >>>>>>>>>>>>> (snip)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> the CPUIDLE_FLAG_TIMER_STOP flag tells the cpuidle framework the local
> >>>>>>>>>>>>>> timer will be stopped when entering the idle state. In this case, the
> >>>>>>>>>>>>>> cpuidle framework will call clockevents_notify(ENTER) and switch to a
> >>>>>>>>>>>>>> broadcast timer and will call clockevents_notify(EXIT) when exiting the
> >>>>>>>>>>>>>> idle state, switching back to the local timer.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I've been thinking about this, trying to understand how this makes my
> >>>>>>>>>>>>> boot attempts on Zynq hang. IIUC, the wrongly provided TIMER_STOP flag
> >>>>>>>>>>>>> would make the timer core switch to a broadcast device even though it
> >>>>>>>>>>>>> wouldn't be necessary. But shouldn't it still work? It sounds like we do
> >>>>>>>>>>>>> something useless, but nothing wrong in the sense that it should result in
> >>>>>>>>>>>>> breakage. I guess I'm missing something obvious. This timer system will
> >>>>>>>>>>>>> always remain a mystery to me.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Actually this more or less leads to the question: what is this
> >>>>>>>>>>>>> 'broadcast timer'? I guess that is some clockevent device which is
> >>>>>>>>>>>>> common to all cores? (that would be the cadence_ttc for Zynq). Is the
> >>>>>>>>>>>>> hang pointing to some issue with that driver?
> >>>>>>>>>>>>
> >>>>>>>>>>>> If you look at /proc/timer_list, which timer is used for broadcasting?
> >>>>>>>>>>>
> >>>>>>>>>>> So, the correct run results (full output attached).
> >>>>>>>>>>>
> >>>>>>>>>>> The vanilla kernel uses the twd timers as local timers and the TTC as
> >>>>>>>>>>> broadcast device:
> >>>>>>>>>>> 	Tick Device: mode:     1                                                         
> >>>>>>>>>>> 	Broadcast device  
> >>>>>>>>>>> 	Clock Event Device: ttc_clockevent
> >>>>>>>>>>>
> >>>>>>>>>>> When I remove the offending CPUIDLE flag and add the DT fragment to
> >>>>>>>>>>> enable the global timer, the twd timers are still used as local timers
> >>>>>>>>>>> and the broadcast device is the global timer:
> >>>>>>>>>>> 	Tick Device: mode:     1                                                         
> >>>>>>>>>>> 	Broadcast device                                                                 
> >>>>>>>>>>> 	Clock Event Device: arm_global_timer
> >>>>>>>>>>>
> >>>>>>>>>>> Again, since boot hangs in the actually broken case, I don't see a way
> >>>>>>>>>>> to obtain this information for that case.
> >>>>>>>>>>
> >>>>>>>>>> Can't you use the maxcpus=1 option to ensure the system boots up?
> >>>>>>>>>
> >>>>>>>>> Right, that works. I forgot about that option after you mentioned that
> >>>>>>>>> it is most likely not that useful.
> >>>>>>>>>
> >>>>>>>>> Anyway, these are those sysfs files with an unmodified cpuidle driver,
> >>>>>>>>> the gt enabled, and maxcpus=1 set.
> >>>>>>>>>
> >>>>>>>>> /proc/timer_list:
> >>>>>>>>> 	Tick Device: mode:     1
> >>>>>>>>> 	Broadcast device
> >>>>>>>>> 	Clock Event Device: arm_global_timer
> >>>>>>>>> 	 max_delta_ns:   12884902005
> >>>>>>>>> 	 min_delta_ns:   1000
> >>>>>>>>> 	 mult:           715827876
> >>>>>>>>> 	 shift:          31
> >>>>>>>>> 	 mode:           3
> >>>>>>>>
> >>>>>>>> Here the mode is 3 (CLOCK_EVT_MODE_ONESHOT)
> >>>>>>>>
> >>>>>>>> In the previous timer_list output you gave me, when removing the
> >>>>>>>> offending cpuidle flag, it was 1 (CLOCK_EVT_MODE_SHUTDOWN).
> >>>>>>>>
> >>>>>>>> Is it possible for you to get this output again right after onlining
> >>>>>>>> cpu1, in order to check if the broadcast device switches to SHUTDOWN?
> >>>>>>>
> >>>>>>> How do I do that? I tried to online CPU1 after booting with maxcpus=1
> >>>>>>> and that didn't end well:
> >>>>>>> 	# echo 1 > online && cat /proc/timer_list 
> >>>>>>
> >>>>>> Hmm, I was hoping to have a small delay before the kernel hangs but
> >>>>>> apparently this is not the case... :(
> >>>>>>
> >>>>>> I suspect the global timer is shut down at some point but I don't
> >>>>>> understand why and when.
> >>>>>>
> >>>>>> Can you add a stack trace in the "clockevents_shutdown" function with
> >>>>>> the clockevent device name? Perhaps we may see an interesting trace at
> >>>>>> boot time when it hangs.
> >>>>>
> >>>>> I did this change:
> >>>>> 	diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> >>>>> 	index 38959c8..3ab11c1 100644
> >>>>> 	--- a/kernel/time/clockevents.c
> >>>>> 	+++ b/kernel/time/clockevents.c
> >>>>> 	@@ -92,6 +92,8 @@ void clockevents_set_mode(struct clock_event_device *dev,
> >>>>> 	  */
> >>>>> 	 void clockevents_shutdown(struct clock_event_device *dev)
> >>>>> 	 {
> >>>>> 	+       pr_info("ce->name:%s\n", dev->name);
> >>>>> 	+       dump_stack();
> >>>>> 	        clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
> >>>>> 	        dev->next_event.tv64 = KTIME_MAX;
> >>>>> 	 }
> >>>>>
> >>>>> It is hit a few times during boot, so I attach a full boot log. I really
> >>>>> don't know what to look for, but I hope you can spot something in it. I
> >>>>> really appreciate you taking the time.
> >>>>
> >>>> Thanks for the traces.
> >>>
> >>> Sure.
> >>>
> >>>>
> >>>> If you try without the ttc_clockevent configured in the kernel (but with
> >>>> twd and gt), does it boot?
> >>>
> >>> Absence of the TTC doesn't seem to make any difference. It hangs at the
> >>> same location.
> >>
> >> Ok, IMO there is a problem with the broadcast device registration (maybe
> >> vs the twd).
> >>
> >> I will check later (kid duty) :)
> > 
> > I was actually waiting for an update from your side and did something
> > else, but I seem to have run into this again. I was overhauling the
> > cadence_ttc (patch attached, based on tip/timers/core), and it seems to
> > show the same behavior as enabling the global_timer. With cpuidle off,
> > it works. With cpuidle on, it hangs. Removing the TIMER_STOP flag from
> > the C2 state makes it boot again.
> > It works just fine on our 3.10 kernel.
> 
> This is not necessarily related to the bug. If the patch you sent broke
> the cadence_ttc driver, then when you use it (with TIMER_STOP) you will
> be stuck. Removing the flag may signify you don't use the broadcast
> timer, hence the bug is not surfacing.
> 
> Going back to the bug with the arm_global_timer, what is observed is that
> the broadcast timer is *shutdown* when the second cpu comes online.
> 
> I have to dig into kernel/time/clockevents.c|tick-*.c because IMO the
> issue is coming from there, but first I have to reproduce the bug, so I
> need to find a board where I can add the arm_global_timer.
> 
> > Another thing I noticed - probably unrelated but hard to tell: on
> > 3.11-rc1 and later my system stops for quite some time at the hand-off
> > to userspace. I.e. I see the 'freeing unused kernel memory...' line and
> > sometimes the following 'Welcome to Buildroot...', and then it stops; on
> > good kernels it continues after a while and boots through, and on bad
> > ones it just hangs there.
> 
> did you try to dump the stacks with magic-sysrq? Or git bisect?

That's my plan for today. Trying to find a reasonable approach for
bisecting some of this stuff.
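(A minimal sketch of that, assuming our 3.10 kernel is the last known-good
point as described earlier:

	git bisect start
	git bisect bad v3.11-rc1
	git bisect good v3.10

and whenever a bisect step boots far enough to reach a shell but then
stalls, 'echo t > /proc/sysrq-trigger' should dump all task stacks,
provided CONFIG_MAGIC_SYSRQ is enabled.)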

	Sören
Soren Brinkmann Aug. 6, 2013, 4:25 p.m. UTC | #10
On Tue, Aug 06, 2013 at 06:09:09PM +0200, Daniel Lezcano wrote:
> On 08/06/2013 03:18 PM, Michal Simek wrote:
> 
> [ ... ]
> 
> >>>>> Soren: Are you able to replicate this issue on QEMU?
> >>>>> If yes, it would be best if you could provide QEMU, kernel .config/
> >>>>> rootfs and a simple manual for Daniel on how to reach that fault.
> >>>>
> >>>> I tried to download qemu for zynq but it fails:
> >>>>
> >>>> git clone git://git.xilinx.com/qemu-xarm.git
> >>>> Cloning into 'qemu-xarm'...
> >>>> fatal: The remote end hung up unexpectedly
> >>>
> >>> Not sure which site you have found, but
> >>> it should be just qemu.git:
> >>> https://github.com/Xilinx/qemu
> >>>
> >>> or github clone.
> >>
> >> Ok, cool I was able to clone it.
> >>
> >>>> I am also looking for the option specified for the kernel:
> >>>>
> >>>> "The kernel needs to be built with this feature turned on (in
> >>>> menuconfig, System Type->Xilinx Specific Features -> Device Tree At
> >>>> Fixed Address)."
> >>
> >> Ok.
> >>
> >>> This also sounds like a very ancient tree.
> >>> This is the latest kernel tree - master-next is the latest devel branch.
> >>> https://github.com/Xilinx/linux-xlnx
> >>
> >> Ok, cool. I have the right one.
> 
> Following the documentation, I was able to boot a kernel with qemu for
> the linux-xlnx and qemu-xilinx.
> 
> But this kernel is outdated compared to the upstream one, so I tried to
> boot a 3.11-rc4 kernel, without success. I did the following:
> 
> I used the default config file from linux-xlnx for the upstream kernel.
> 
> I compiled the kernel with:
> 
> make -j 5 ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-
> UIMAGE_LOADADDR=0x8000 uImage
> 
> I generated the dtb with:
> 
> make -j 5 ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- dtbs
> 
> For qemu, I started qemu with:
> 
> ./arm-softmmu/qemu-system-arm -M arm-generic-fdt -nographic -smp 2
> -machine linux=on -serial mon:stdio -dtb zynq-zed.dtb -kernel
> kernel/zImage -initrd filesystem/ramdisk.img
> 
> I tried with the dtb available for the upstream kernel:
> 
> zynq-zc706.dtb, zynq-zc702.dtb and zynq-zed.dtb
> 
> Did I miss something?

Some debugging hints in case you wanna go through this.
Add this additional option to configure:
 --extra-cflags="-DFDT_GENERIC_UTIL_ERR_DEBUG=1"
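(For a typical build from the QEMU tree that would be, e.g.:

	./configure --target-list=arm-softmmu \
		--extra-cflags="-DFDT_GENERIC_UTIL_ERR_DEBUG=1"
	make

where the target list is an assumption based on the qemu-system-arm binary
used earlier in this thread.)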

That'll print out a lot of messages when the dtb is parsed. It's likely
that QEMU invalidates some vital node due to its compatible string being
unknown. In that case you can simply add it to the list of known devices
in 
	hw/core/fdt_generic_devices.c
The list is pretty much at the end of that file. I'll try to get it
running here and might be able to send you a patch.

	Sören
diff mbox

Patch

From db4ad00ab774555ba8113ca6fcb9ceace15cfbbc Mon Sep 17 00:00:00 2001
From: Soren Brinkmann <soren.brinkmann@xilinx.com>
Date: Mon, 5 Aug 2013 15:48:59 -0700
Subject: [PATCH] clocksource/cadence_ttc: Use only one counter

Currently the driver uses two of the three counters the TTC provides to
implement a clocksource and a clockevent device. By using the TTC's
match feature we can implement both use cases using a single counter
only.

Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
---
 drivers/clocksource/cadence_ttc_timer.c | 97 ++++++++++++---------------------
 1 file changed, 36 insertions(+), 61 deletions(-)

diff --git a/drivers/clocksource/cadence_ttc_timer.c b/drivers/clocksource/cadence_ttc_timer.c
index b2bb3a4b..d6ab818 100644
--- a/drivers/clocksource/cadence_ttc_timer.c
+++ b/drivers/clocksource/cadence_ttc_timer.c
@@ -46,6 +46,7 @@ 
 #define TTC_CNT_CNTRL_OFFSET		0x0C /* Counter Control Reg, RW */
 #define TTC_COUNT_VAL_OFFSET		0x18 /* Counter Value Reg, RO */
 #define TTC_INTR_VAL_OFFSET		0x24 /* Interval Count Reg, RW */
+#define TTC_MATCH1_OFFSET	0x30 /* Match reg, RW */
 #define TTC_ISR_OFFSET		0x54 /* Interrupt Status Reg, RO */
 #define TTC_IER_OFFSET		0x60 /* Interrupt Enable Reg, RW */
 
@@ -61,7 +62,10 @@ 
 #define PRESCALE		2048	/* The exponent must match this */
 #define CLK_CNTRL_PRESCALE	((PRESCALE_EXPONENT - 1) << 1)
 #define CLK_CNTRL_PRESCALE_EN	1
-#define CNT_CNTRL_RESET		(1 << 4)
+#define CNT_CNTRL_RESET		BIT(4)
+#define CNT_CNTRL_MATCH		BIT(3)
+
+#define TTC_INTERRUPT_MATCH1	BIT(1)
 
 /**
  * struct ttc_timer - This definition defines local timer structure
@@ -90,6 +94,7 @@  struct ttc_timer_clocksource {
 struct ttc_timer_clockevent {
 	struct ttc_timer		ttc;
 	struct clock_event_device	ce;
+	unsigned long			interval;
 };
 
 #define to_ttc_timer_clkevent(x) \
@@ -103,25 +108,25 @@  static void __iomem *ttc_sched_clock_val_reg;
  * @timer:	Pointer to the timer instance
  * @cycles:	Timer interval ticks
  **/
-static void ttc_set_interval(struct ttc_timer *timer,
-					unsigned long cycles)
+static void ttc_set_interval(struct ttc_timer *timer, unsigned long cycles)
 {
-	u32 ctrl_reg;
+	struct ttc_timer_clockevent *ttcce = container_of(timer,
+			struct ttc_timer_clockevent, ttc);
 
-	/* Disable the counter, set the counter value  and re-enable counter */
-	ctrl_reg = __raw_readl(timer->base_addr + TTC_CNT_CNTRL_OFFSET);
-	ctrl_reg |= TTC_CNT_CNTRL_DISABLE_MASK;
-	__raw_writel(ctrl_reg, timer->base_addr + TTC_CNT_CNTRL_OFFSET);
+	/* set interval */
+	u32 reg = __raw_readl(timer->base_addr + TTC_COUNT_VAL_OFFSET);
+	reg += cycles;
+	__raw_writel(reg, timer->base_addr + TTC_MATCH1_OFFSET);
 
-	__raw_writel(cycles, timer->base_addr + TTC_INTR_VAL_OFFSET);
+	/* enable match interrupt */
+	__raw_writel(TTC_INTERRUPT_MATCH1, timer->base_addr + TTC_IER_OFFSET);
 
-	/*
-	 * Reset the counter (0x10) so that it starts from 0, one-shot
-	 * mode makes this needed for timing to be right.
-	 */
-	ctrl_reg |= CNT_CNTRL_RESET;
-	ctrl_reg &= ~TTC_CNT_CNTRL_DISABLE_MASK;
-	__raw_writel(ctrl_reg, timer->base_addr + TTC_CNT_CNTRL_OFFSET);
+	/* enable timer */
+	reg = __raw_readl(timer->base_addr + TTC_CNT_CNTRL_OFFSET);
+	reg &= ~TTC_CNT_CNTRL_DISABLE_MASK;
+	__raw_writel(reg, timer->base_addr + TTC_CNT_CNTRL_OFFSET);
+
+	ttcce->interval = cycles;
 }
 
 /**
@@ -139,6 +144,8 @@  static irqreturn_t ttc_clock_event_interrupt(int irq, void *dev_id)
 
 	/* Acknowledge the interrupt and call event handler */
 	__raw_readl(timer->base_addr + TTC_ISR_OFFSET);
+	if (ttce->ce.mode == CLOCK_EVT_MODE_PERIODIC)
+		ttc_set_interval(timer, ttce->interval);
 
 	ttce->ce.event_handler(&ttce->ce);
 
@@ -192,7 +199,6 @@  static void ttc_set_mode(enum clock_event_mode mode,
 {
 	struct ttc_timer_clockevent *ttce = to_ttc_timer_clkevent(evt);
 	struct ttc_timer *timer = &ttce->ttc;
-	u32 ctrl_reg;
 
 	switch (mode) {
 	case CLOCK_EVT_MODE_PERIODIC:
@@ -203,18 +209,9 @@  static void ttc_set_mode(enum clock_event_mode mode,
 	case CLOCK_EVT_MODE_ONESHOT:
 	case CLOCK_EVT_MODE_UNUSED:
 	case CLOCK_EVT_MODE_SHUTDOWN:
-		ctrl_reg = __raw_readl(timer->base_addr +
-					TTC_CNT_CNTRL_OFFSET);
-		ctrl_reg |= TTC_CNT_CNTRL_DISABLE_MASK;
-		__raw_writel(ctrl_reg,
-				timer->base_addr + TTC_CNT_CNTRL_OFFSET);
+		__raw_writel(0, timer->base_addr + TTC_IER_OFFSET);
 		break;
 	case CLOCK_EVT_MODE_RESUME:
-		ctrl_reg = __raw_readl(timer->base_addr +
-					TTC_CNT_CNTRL_OFFSET);
-		ctrl_reg &= ~TTC_CNT_CNTRL_DISABLE_MASK;
-		__raw_writel(ctrl_reg,
-				timer->base_addr + TTC_CNT_CNTRL_OFFSET);
 		break;
 	}
 }
@@ -287,17 +284,6 @@  static void __init ttc_setup_clocksource(struct clk *clk, void __iomem *base)
 	ttccs->cs.mask = CLOCKSOURCE_MASK(16);
 	ttccs->cs.flags = CLOCK_SOURCE_IS_CONTINUOUS;
 
-	/*
-	 * Setup the clock source counter to be an incrementing counter
-	 * with no interrupt and it rolls over at 0xFFFF. Pre-scale
-	 * it by 32 also. Let it start running now.
-	 */
-	__raw_writel(0x0,  ttccs->ttc.base_addr + TTC_IER_OFFSET);
-	__raw_writel(CLK_CNTRL_PRESCALE | CLK_CNTRL_PRESCALE_EN,
-		     ttccs->ttc.base_addr + TTC_CLK_CNTRL_OFFSET);
-	__raw_writel(CNT_CNTRL_RESET,
-		     ttccs->ttc.base_addr + TTC_CNT_CNTRL_OFFSET);
-
 	err = clocksource_register_hz(&ttccs->cs,
 			clk_get_rate(ttccs->ttc.clk) / PRESCALE);
 	if (WARN_ON(err)) {
@@ -377,16 +363,6 @@  static void __init ttc_setup_clockevent(struct clk *clk,
 	ttcce->ce.irq = irq;
 	ttcce->ce.cpumask = cpu_possible_mask;
 
-	/*
-	 * Setup the clock event timer to be an interval timer which
-	 * is prescaled by 32 using the interval interrupt. Leave it
-	 * disabled for now.
-	 */
-	__raw_writel(0x23, ttcce->ttc.base_addr + TTC_CNT_CNTRL_OFFSET);
-	__raw_writel(CLK_CNTRL_PRESCALE | CLK_CNTRL_PRESCALE_EN,
-		     ttcce->ttc.base_addr + TTC_CLK_CNTRL_OFFSET);
-	__raw_writel(0x1,  ttcce->ttc.base_addr + TTC_IER_OFFSET);
-
 	err = request_irq(irq, ttc_clock_event_interrupt,
 			  IRQF_DISABLED | IRQF_TIMER,
 			  ttcce->ce.name, ttcce);
@@ -409,7 +385,7 @@  static void __init ttc_timer_init(struct device_node *timer)
 {
 	unsigned int irq;
 	void __iomem *timer_baseaddr;
-	struct clk *clk_cs, *clk_ce;
+	struct clk *clk;
 	static int initialized;
 	int clksel;
 
@@ -429,7 +405,7 @@  static void __init ttc_timer_init(struct device_node *timer)
 		BUG();
 	}
 
-	irq = irq_of_parse_and_map(timer, 1);
+	irq = irq_of_parse_and_map(timer, 0);
 	if (irq <= 0) {
 		pr_err("ERROR: invalid interrupt number\n");
 		BUG();
@@ -437,22 +413,21 @@  static void __init ttc_timer_init(struct device_node *timer)
 
 	clksel = __raw_readl(timer_baseaddr + TTC_CLK_CNTRL_OFFSET);
 	clksel = !!(clksel & TTC_CLK_CNTRL_CSRC_MASK);
-	clk_cs = of_clk_get(timer, clksel);
-	if (IS_ERR(clk_cs)) {
+	clk = of_clk_get(timer, clksel);
+	if (IS_ERR(clk)) {
 		pr_err("ERROR: timer input clock not found\n");
 		BUG();
 	}
 
-	clksel = __raw_readl(timer_baseaddr + 4 + TTC_CLK_CNTRL_OFFSET);
-	clksel = !!(clksel & TTC_CLK_CNTRL_CSRC_MASK);
-	clk_ce = of_clk_get(timer, clksel);
-	if (IS_ERR(clk_ce)) {
-		pr_err("ERROR: timer input clock not found\n");
-		BUG();
-	}
+	__raw_writel(CLK_CNTRL_PRESCALE | CLK_CNTRL_PRESCALE_EN,
+			timer_baseaddr + TTC_CLK_CNTRL_OFFSET);
+
+	/* start timer in overflow and match mode */
+	__raw_writel(CNT_CNTRL_RESET | CNT_CNTRL_MATCH,
+			timer_baseaddr + TTC_CNT_CNTRL_OFFSET);
 
-	ttc_setup_clocksource(clk_cs, timer_baseaddr);
-	ttc_setup_clockevent(clk_ce, timer_baseaddr + 4, irq);
+	ttc_setup_clocksource(clk, timer_baseaddr);
+	ttc_setup_clockevent(clk, timer_baseaddr, irq);
 
 	pr_info("%s #0 at %p, irq=%d\n", timer->name, timer_baseaddr, irq);
 }
-- 
1.8.3.4