trace: extend trace_clock to support arch_arm clock counter

Message ID	1480666495-26536-1-git-send-email-sramana@codeaurora.org
State	New, archived
From: Srinivas Ramana <sramana@codeaurora.org>
To: catalin.marinas@arm.com, will.deacon@arm.com
Cc: linux-arm-msm@vger.kernel.org, Srinivas Ramana <sramana@codeaurora.org>, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: [PATCH] trace: extend trace_clock to support arch_arm clock counter
Date: Fri, 2 Dec 2016 13:44:55 +0530
Message-Id: <1480666495-26536-1-git-send-email-sramana@codeaurora.org>

Srinivas Ramana Dec. 2, 2016, 8:14 a.m. UTC

Extend the trace_clock to support the arch timer cycle
counter so that we can get the monotonic cycle count
in the traces. This will help in correlating the traces with the
timestamps/events in other subsystems in the soc which share
this common counter for driving their timers.

Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
---
 arch/arm64/include/asm/Kbuild        |  1 -
 arch/arm64/include/asm/trace_clock.h | 20 ++++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/trace_clock.h
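The patch body is not reproduced in this archive view. For context, kernel/trace/trace.c keeps a table of selectable trace clocks (local, global, perf, mono_raw and others) and appends whatever an architecture defines as ARCH_TRACE_CLOCKS in asm/trace_clock.h; x86 uses this hook to add its "x86-tsc" clock. The one-line Kbuild removal in the diffstat presumably drops the generic-y fallback so the new arm64 header is picked up instead. The userspace model below only mimics that table: mock_arch_counter_read() and the "arch_counter" name are hypothetical placeholders, not what the patch defines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;

/* Hypothetical stand-in for an arm64 counter read; it simply returns a
 * fake cycle count that grows by 100 on every call. */
static u64 fake_cycles;

static u64 mock_arch_counter_read(void)
{
	fake_cycles += 100;
	return fake_cycles;
}

/* Stand-in for trace_clock_local(), which returns nanoseconds. */
static u64 mock_trace_clock_local(void)
{
	return 42;
}

/* What an arch's asm/trace_clock.h would contribute: extra table rows. */
#define ARCH_TRACE_CLOCKS \
	{ mock_arch_counter_read, "arch_counter", 0 },

/* Simplified model of the trace_clocks[] table in kernel/trace/trace.c. */
static const struct {
	u64 (*func)(void);
	const char *name;
	int in_ns;	/* 1: nanoseconds, 0: raw counts */
} trace_clocks[] = {
	{ mock_trace_clock_local, "local", 1 },
	ARCH_TRACE_CLOCKS
};

#define NR_TRACE_CLOCKS (sizeof(trace_clocks) / sizeof(trace_clocks[0]))

/* Find a clock by name, roughly what writing to tracing/trace_clock does.
 * Returns the table index, or -1 if no clock has that name. */
static int find_trace_clock(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < NR_TRACE_CLOCKS; i++)
		if (strcmp(trace_clocks[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

In the real table, in_ns = 0 makes the tracer print the timestamp as a raw count rather than as seconds, which fits a cycle counter whose frequency the tracer does not know.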

Will Deacon Dec. 2, 2016, 11:08 a.m. UTC | #1

On Fri, Dec 02, 2016 at 01:44:55PM +0530, Srinivas Ramana wrote:
> Extend the trace_clock to support the arch timer cycle
> counter so that we can get the monotonic cycle count
> in the traces. This will help in correlating the traces with the
> timestamps/events in other subsystems in the soc which share
> this common counter for driving their timers.

I'm not sure I follow this reasoning. What's wrong with nanoseconds? In
particular, the "perf" trace_clock hangs off sched_clock, which should
be backed by the architected counter anyway. What does the cycle counter in
isolation tell you, given that the frequency isn't architected?

I think I'm missing something here.

Will

Srinivas Ramana Dec. 4, 2016, 8:36 a.m. UTC | #2

On 12/02/2016 04:38 PM, Will Deacon wrote:
> On Fri, Dec 02, 2016 at 01:44:55PM +0530, Srinivas Ramana wrote:
>> Extend the trace_clock to support the arch timer cycle
>> counter so that we can get the monotonic cycle count
>> in the traces. This will help in correlating the traces with the
>> timestamps/events in other subsystems in the soc which share
>> this common counter for driving their timers.
>
> I'm not sure I follow this reasoning. What's wrong with nanoseconds? In
> particular, the "perf" trace_clock hangs off sched_clock, which should
> be backed by the architected counter anyway. What does the cycle counter in
> isolation tell you, given that the frequency isn't architected?
>
> I think I'm missing something here.
>
> Will
>

Having a cycle counter would help in cases where we want to correlate
the time with other subsystems which are outside the CPU subsystem.
local_clock, or even the perf trace_clock, uses sched_clock, which gets
suspended during system suspend. Yes, they are backed by the
architected counter, but they ignore the cycles spent in suspend. So,
when comparing with the monotonically increasing cycle counter, the other
clocks don't help. It seems x86 uses the TSC counter to help in such cases.

Thanks,
-- Srinivas R
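To make the suspend argument concrete, here is a toy model (not kernel code, names invented for illustration) of two clocks driven by the same always-on counter. One advances in every state; the other, like sched_clock as described above, is frozen while suspended and starts from zero when the kernel starts, so it also misses the cycles spent in the boot loader.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: both clocks are derived from one always-on counter. */
struct toy_clocks {
	uint64_t counter;      /* raw counter: ticks in every state */
	uint64_t sched_cycles; /* suspend-aware clock: frozen while suspended */
};

/* The CPU is running: both clocks advance. */
static void toy_run(struct toy_clocks *c, uint64_t cycles)
{
	c->counter += cycles;
	c->sched_cycles += cycles;
}

/* The system is suspended: only the raw counter advances. */
static void toy_suspend(struct toy_clocks *c, uint64_t cycles)
{
	c->counter += cycles;
}

/* Offset needed to map the suspend-aware clock back onto the counter. */
static uint64_t toy_offset(const struct toy_clocks *c)
{
	assert(c->counter >= c->sched_cycles);
	return c->counter - c->sched_cycles;
}
```

The offset grows with every suspend, so no single constant can translate sched_clock-based trace timestamps into counter values for events on both sides of a suspend.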

Will Deacon Dec. 6, 2016, 12:13 p.m. UTC | #3

On Sun, Dec 04, 2016 at 02:06:23PM +0530, Srinivas Ramana wrote:
> On 12/02/2016 04:38 PM, Will Deacon wrote:
> >On Fri, Dec 02, 2016 at 01:44:55PM +0530, Srinivas Ramana wrote:
> >>Extend the trace_clock to support the arch timer cycle
> >>counter so that we can get the monotonic cycle count
> >>in the traces. This will help in correlating the traces with the
> >>timestamps/events in other subsystems in the soc which share
> >>this common counter for driving their timers.
> >
> >I'm not sure I follow this reasoning. What's wrong with nanoseconds? In
> >particular, the "perf" trace_clock hangs off sched_clock, which should
> >be backed by the architected counter anyway. What does the cycle counter in
> >isolation tell you, given that the frequency isn't architected?
> >
> >I think I'm missing something here.
> >
> 
> Having a cycle counter would help in cases where we want to correlate the
> time with other subsystems which are outside the CPU subsystem.

Do you have an example of these subsystems? Can they be used to generate
trace data with mainline?

> local_clock, or even the perf trace_clock, uses sched_clock, which gets
> suspended during system suspend. Yes, they are backed by the
> architected counter, but they ignore the cycles spent in suspend.

Does mono_raw solve this (also hangs off the architected counter and is
supported in the vdso)?

> So, when comparing with the monotonically increasing cycle counter, the
> other clocks don't help. It seems x86 uses the TSC counter to help in such cases.

Does this mean we need a way to expose the frequency to userspace, too?

Will
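For comparison, the POSIX clocks behind this question can be read from userspace with clock_gettime(). On Linux, CLOCK_MONOTONIC_RAW is not NTP-adjusted but, like CLOCK_MONOTONIC, stops during suspend; CLOCK_BOOTTIME additionally counts suspended time. All of them report nanoseconds from a kernel-chosen origin rather than raw counter cycles. A minimal sketch:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a clock and return its value in nanoseconds, or 0 on error. */
static uint64_t read_clock_ns(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return 0;
	assert(ts.tv_sec >= 0);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

If CLOCK_MONOTONIC is read first and CLOCK_BOOTTIME second, the second value is never smaller, since boot time is monotonic time plus the total time spent suspended.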

Srinivas Ramana Dec. 12, 2016, 5:01 a.m. UTC | #4

On 12/06/2016 05:43 PM, Will Deacon wrote:
> On Sun, Dec 04, 2016 at 02:06:23PM +0530, Srinivas Ramana wrote:
>> On 12/02/2016 04:38 PM, Will Deacon wrote:
>>> On Fri, Dec 02, 2016 at 01:44:55PM +0530, Srinivas Ramana wrote:
>>>> Extend the trace_clock to support the arch timer cycle
>>>> counter so that we can get the monotonic cycle count
>>>> in the traces. This will help in correlating the traces with the
>>>> timestamps/events in other subsystems in the soc which share
>>>> this common counter for driving their timers.
>>>
>>> I'm not sure I follow this reasoning. What's wrong with nanoseconds? In
>>> particular, the "perf" trace_clock hangs off sched_clock, which should
>>> be backed by the architected counter anyway. What does the cycle counter in
>>> isolation tell you, given that the frequency isn't architected?
>>>
>>> I think I'm missing something here.
>>>
>>
>> Having a cycle counter would help in cases where we want to correlate the
>> time with other subsystems which are outside the CPU subsystem.
>
> Do you have an example of these subsystems? Can they be used to generate
> trace data with mainline?

Some of the subsystems I can list are the modem (on a mobile phone), the
GPU or video subsystem, or a DSP, among others.

>
>> local_clock, or even the perf trace_clock, uses sched_clock, which gets
>> suspended during system suspend. Yes, they are backed by the
>> architected counter, but they ignore the cycles spent in suspend.
>
> Does mono_raw solve this (also hangs off the architected counter and is
> supported in the vdso)?

It doesn't seem so. All of the existing clock sources are designed not to
show the jump when there is a suspend and resume. Even though they run off
the architected counter, they can't give an exact correlation with the
counter. Furthermore, during the initial kernel boot, these just run off
the jiffies clock source. They also do not account for the time spent in
boot loaders.

>
>> So, when comparing with the monotonically increasing cycle counter, the
>> other clocks don't help. It seems x86 uses the TSC counter to help in such cases.
>
> Does this mean we need a way to expose the frequency to userspace, too?

Not really. CNTFRQ_EL0 in the timer subsystem holds the clock frequency
of the system timer and is available to EL0.

>
> Will
>


Thanks,
-- Srinivas R
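On arm64, Linux normally lets EL0 read CNTVCT_EL0 and CNTFRQ_EL0 directly (the vDSO relies on the former). The sketch below reads both with inline assembly when built for aarch64 (returning 0 elsewhere) and converts a cycle delta to nanoseconds. As Will points out later in the thread, CNTFRQ_EL0 is only as trustworthy as the value firmware programmed, so a real tool might prefer a frequency obtained some other way. The 19.2 MHz figure in the usage note is just an example rate.

```c
#include <assert.h>
#include <stdint.h>

/* Counter frequency in Hz as programmed by firmware (0 off arm64). */
static uint64_t read_cntfrq(void)
{
#if defined(__aarch64__)
	uint64_t val;
	__asm__ volatile("mrs %0, cntfrq_el0" : "=r"(val));
	return val;
#else
	return 0;
#endif
}

/* Current virtual count; isb keeps the read from being hoisted (0 off arm64). */
static uint64_t read_cntvct(void)
{
#if defined(__aarch64__)
	uint64_t val;
	__asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(val) : : "memory");
	return val;
#else
	return 0;
#endif
}

/* Convert a cycle delta to nanoseconds, splitting off whole seconds first.
 * Assumes freq_hz is below about 18 GHz so rem * 1e9 fits in 64 bits. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t freq_hz)
{
	assert(freq_hz != 0);
	uint64_t secs = cycles / freq_hz;
	uint64_t rem = cycles % freq_hz;
	return secs * 1000000000ull + rem * 1000000000ull / freq_hz;
}
```

For example, with a 19.2 MHz counter, cycles_to_ns(192, 19200000) yields 10000, i.e. 10 µs.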

Will Deacon Dec. 12, 2016, 10:42 a.m. UTC | #5

On Mon, Dec 12, 2016 at 10:31:52AM +0530, Srinivas Ramana wrote:
> On 12/06/2016 05:43 PM, Will Deacon wrote:
> >On Sun, Dec 04, 2016 at 02:06:23PM +0530, Srinivas Ramana wrote:
> >>On 12/02/2016 04:38 PM, Will Deacon wrote:
> >>>On Fri, Dec 02, 2016 at 01:44:55PM +0530, Srinivas Ramana wrote:
> >>>>Extend the trace_clock to support the arch timer cycle
> >>>>counter so that we can get the monotonic cycle count
> >>>>in the traces. This will help in correlating the traces with the
> >>>>timestamps/events in other subsystems in the soc which share
> >>>>this common counter for driving their timers.
> >>>
> >>>I'm not sure I follow this reasoning. What's wrong with nanoseconds? In
> >>>particular, the "perf" trace_clock hangs off sched_clock, which should
> >>>be backed by the architected counter anyway. What does the cycle counter in
> >>>isolation tell you, given that the frequency isn't architected?
> >>>
> >>>I think I'm missing something here.
> >>>
> >>
> >>Having a cycle counter would help in cases where we want to correlate the
> >>time with other subsystems which are outside the CPU subsystem.
> >
> >Do you have an example of these subsystems? Can they be used to generate
> >trace data with mainline?
> 
> Some of the subsystems I can list are the modem (on a mobile phone), the
> GPU or video subsystem, or a DSP, among others.

Oh, you're talking about hardware subsystems. That makes this slightly more
compelling, but I don't think you want the virtual counter here, since
I assume those other subsystems don't take into account CNTVOFF (and I
don't really see how they could, it being a per-cpu thing). So, if you
want to expose the *physical* counter as a trace clock, I think that's
justifiable.

> >>local_clock, or even the perf trace_clock, uses sched_clock, which gets
> >>suspended during system suspend. Yes, they are backed by the
> >>architected counter, but they ignore the cycles spent in suspend.
> >
> >Does mono_raw solve this (also hangs off the architected counter and is
> >supported in the vdso)?
> 
> It doesn't seem so. All of the existing clock sources are designed not to
> show the jump when there is a suspend and resume. Even though they run off
> the architected counter, they can't give an exact correlation with the
> counter. Furthermore, during the initial kernel boot, these just run off
> the jiffies clock source. They also do not account for the time spent in
> boot loaders.

Hmm, there's a thing called CLOCK_BOOTTIME, but I don't think that helps
you when CNTVOFF comes into play.

> >>So, when comparing with the monotonically increasing cycle counter, the
> >>other clocks don't help. It seems x86 uses the TSC counter to help in such cases.
> >
> >Does this mean we need a way to expose the frequency to userspace, too?
> 
> Not really. CNTFRQ_EL0 in the timer subsystem holds the clock frequency
> of the system timer and is available to EL0.

Experience shows that CNTFRQ_EL0 is often unreliable, and the frequency
can be overridden by the device-tree. There are also systems where the
counter stops ticking across suspend. Whilst both of these can be considered
"broken", I suspect we want runtime buy-in from the arch-timer driver
before registering this trace_clock.

Will
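The CNTVOFF concern reduces to one architected relation: the virtual count is the physical count minus the offset (CNTVCT = CNTPCT - CNTVOFF), and CNTVOFF_EL2 is not visible below EL2. The toy functions below (illustrative names, not kernel APIs) assume a peripheral stamps its log with the physical system count, and show that a CPU trace stamped from the virtual counter at the same instant then appears shifted by exactly the offset.

```c
#include <assert.h>
#include <stdint.h>

/* Architected relation: CNTVCT = CNTPCT - CNTVOFF (unsigned, modulo 2^64). */
static uint64_t virt_count(uint64_t phys_count, uint64_t cntvoff)
{
	return phys_count - cntvoff;
}

/* Apparent skew between a CPU trace timestamp taken from the virtual
 * counter and a peripheral timestamp taken from the physical count. */
static int64_t apparent_skew(uint64_t cpu_virt_ts, uint64_t periph_phys_ts)
{
	return (int64_t)(cpu_virt_ts - periph_phys_ts);
}
```

With CNTVOFF equal to zero the skew vanishes, matching the remark later in the thread that the virtual counter works on such systems; reading the physical counter removes the dependency on that assumption.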

Srinivas Ramana Dec. 15, 2016, 1:16 p.m. UTC | #6

On 12/12/2016 04:12 PM, Will Deacon wrote:
> On Mon, Dec 12, 2016 at 10:31:52AM +0530, Srinivas Ramana wrote:
>> On 12/06/2016 05:43 PM, Will Deacon wrote:
>>> On Sun, Dec 04, 2016 at 02:06:23PM +0530, Srinivas Ramana wrote:
>>>> On 12/02/2016 04:38 PM, Will Deacon wrote:
>>>>> On Fri, Dec 02, 2016 at 01:44:55PM +0530, Srinivas Ramana wrote:
>>>>>> Extend the trace_clock to support the arch timer cycle
>>>>>> counter so that we can get the monotonic cycle count
>>>>>> in the traces. This will help in correlating the traces with the
>>>>>> timestamps/events in other subsystems in the soc which share
>>>>>> this common counter for driving their timers.
>>>>>
>>>>> I'm not sure I follow this reasoning. What's wrong with nanoseconds? In
>>>>> particular, the "perf" trace_clock hangs off sched_clock, which should
>>>>> be backed by the architected counter anyway. What does the cycle counter in
>>>>> isolation tell you, given that the frequency isn't architected?
>>>>>
>>>>> I think I'm missing something here.
>>>>>
>>>>
>>>> Having a cycle counter would help in cases where we want to correlate the
>>>> time with other subsystems which are outside the CPU subsystem.
>>>
>>> Do you have an example of these subsystems? Can they be used to generate
>>> trace data with mainline?
>>
>> Some of the subsystems I can list are the modem (on a mobile phone), the
>> GPU or video subsystem, or a DSP, among others.
>
> Oh, you're talking about hardware subsystems. That makes this slightly more
> compelling, but I don't think you want the virtual counter here, since
> I assume those other subsystems don't take into account CNTVOFF (and I
> don't really see how they could, it being a per-cpu thing). So, if you
> want to expose the *physical* counter as a trace clock, I think that's
> justifiable.
>
Yes, I meant HW subsystems. Sorry if I was not clear.
In ARM64, it seems access to the physical counter was removed with commit
"clocksource: arch_timer: Fix code to use physical timers when requested".
Only ARM (32-bit) is allowed to use the physical counter in the current
timer API. It seems only EL2 is supposed to access this. But yes, if there
is an offset, it seems it would be difficult to get the exact value at EL0.
However, for systems where CNTVOFF is '0', this will work seamlessly. This
clock would not be the default anyway and is optional. The local clock would
continue to be the default for traces.

>>>> local_clock, or even the perf trace_clock, uses sched_clock, which gets
>>>> suspended during system suspend. Yes, they are backed by the
>>>> architected counter, but they ignore the cycles spent in suspend.
>>>
>>> Does mono_raw solve this (also hangs off the architected counter and is
>>> supported in the vdso)?
>>
>> It doesn't seem so. All of the existing clock sources are designed not to
>> show the jump when there is a suspend and resume. Even though they run off
>> the architected counter, they can't give an exact correlation with the
>> counter. Furthermore, during the initial kernel boot, these just run off
>> the jiffies clock source. They also do not account for the time spent in
>> boot loaders.
>
> Hmm, there's a thing called CLOCK_BOOTTIME, but I don't think that helps
> you when CNTVOFF comes into play.
>
CLOCK_BOOTTIME includes the time spent in suspend. But it also doesn't
give the exact counter value since power-on. So, for the purpose of
comparing with the global counter, it would not help.

>>>> So, when comparing with the monotonically increasing cycle counter, the
>>>> other clocks don't help. It seems x86 uses the TSC counter to help in such cases.
>>>
>>> Does this mean we need a way to expose the frequency to userspace, too?
>>
>> Not really. CNTFRQ_EL0 in the timer subsystem holds the clock frequency
>> of the system timer and is available to EL0.
>
> Experience shows that CNTFRQ_EL0 is often unreliable, and the frequency
> can be overridden by the device-tree. There are also systems where the
> counter stops ticking across suspend. Whilst both of these can be considered
> "broken", I suspect we want runtime buy-in from the arch-timer driver
> before registering this trace_clock.

Agree. It doesn't seem like the architecture mandates initializing this.
For those systems where the tick would stop, if not the arch counter, I
assume there is some counter in the 'always ON' domain, without which they
can't keep track of time.

Adding Mark Rutland and Marc Zyngier for help with this.

Thanks,
-- Srinivas R

Will Deacon Dec. 20, 2016, 5:04 p.m. UTC | #7

On Thu, Dec 15, 2016 at 06:46:09PM +0530, Srinivas Ramana wrote:
> On 12/12/2016 04:12 PM, Will Deacon wrote:
> >On Mon, Dec 12, 2016 at 10:31:52AM +0530, Srinivas Ramana wrote:
> >>On 12/06/2016 05:43 PM, Will Deacon wrote:
> >>>On Sun, Dec 04, 2016 at 02:06:23PM +0530, Srinivas Ramana wrote:
> >>>>On 12/02/2016 04:38 PM, Will Deacon wrote:
> >>>>>On Fri, Dec 02, 2016 at 01:44:55PM +0530, Srinivas Ramana wrote:
> >>>>>>Extend the trace_clock to support the arch timer cycle
> >>>>>>counter so that we can get the monotonic cycle count
> >>>>>>in the traces. This will help in correlating the traces with the
> >>>>>>timestamps/events in other subsystems in the soc which share
> >>>>>>this common counter for driving their timers.
> >>>>>
> >>>>>I'm not sure I follow this reasoning. What's wrong with nanoseconds? In
> >>>>>particular, the "perf" trace_clock hangs off sched_clock, which should
> >>>>>be backed by the architected counter anyway. What does the cycle counter in
> >>>>>isolation tell you, given that the frequency isn't architected?
> >>>>>
> >>>>>I think I'm missing something here.
> >>>>>
> >>>>
> >>>>Having a cycle counter would help in cases where we want to correlate the
> >>>>time with other subsystems which are outside the CPU subsystem.
> >>>
> >>>Do you have an example of these subsystems? Can they be used to generate
> >>>trace data with mainline?
> >>
> >>Some of the subsystems I can list are the modem (on a mobile phone), the
> >>GPU or video subsystem, or a DSP, among others.
> >
> >Oh, you're talking about hardware subsystems. That makes this slightly more
> >compelling, but I don't think you want the virtual counter here, since
> >I assume those other subsystems don't take into account CNTVOFF (and I
> >don't really see how they could, it being a per-cpu thing). So, if you
> >want to expose the *physical* counter as a trace clock, I think that's
> >justifiable.
> >
> Yes, I meant HW subsystems. Sorry if I was not clear.
> In ARM64, it seems access to the physical counter was removed with commit
> "clocksource: arch_timer: Fix code to use physical timers when requested".
> Only ARM (32-bit) is allowed to use the physical counter in the current
> timer API. It seems only EL2 is supposed to access this. But yes, if there
> is an offset, it seems it would be difficult to get the exact value at EL0.
> However, for systems where CNTVOFF is '0', this will work seamlessly. This
> clock would not be the default anyway and is optional. The local clock would
> continue to be the default for traces.

That still doesn't sound useful to userspace. I think we need to expose
the clock only in the cases where it's useful, so restricting it to the
physical counter is the right thing to do.

> >>>>local_clock, or even the perf trace_clock, uses sched_clock, which gets
> >>>>suspended during system suspend. Yes, they are backed by the
> >>>>architected counter, but they ignore the cycles spent in suspend.
> >>>
> >>>Does mono_raw solve this (also hangs off the architected counter and is
> >>>supported in the vdso)?
> >>
> >>It doesn't seem so. All of the existing clock sources are designed not to
> >>show the jump when there is a suspend and resume. Even though they run off
> >>the architected counter, they can't give an exact correlation with the
> >>counter. Furthermore, during the initial kernel boot, these just run off
> >>the jiffies clock source. They also do not account for the time spent in
> >>boot loaders.
> >
> >Hmm, there's a thing called CLOCK_BOOTTIME, but I don't think that helps
> >you when CNTVOFF comes into play.
> >
> CLOCK_BOOTTIME includes the time spent in suspend. But it also doesn't
> give the exact counter value since power-on. So, for the purpose of
> comparing with the global counter, it would not help.
> 
> >>>>So, when comparing with the monotonically increasing cycle counter, the
> >>>>other clocks don't help. It seems x86 uses the TSC counter to help in such cases.
> >>>
> >>>Does this mean we need a way to expose the frequency to userspace, too?
> >>
> >>Not really. CNTFRQ_EL0 in the timer subsystem holds the clock frequency
> >>of the system timer and is available to EL0.
> >
> >Experience shows that CNTFRQ_EL0 is often unreliable, and the frequency
> >can be overridden by the device-tree. There are also systems where the
> >counter stops ticking across suspend. Whilst both of these can be considered
> >"broken", I suspect we want runtime buy-in from the arch-timer driver
> >before registering this trace_clock.
> 
> Agree. It doesn't seem like the architecture mandates initializing this.
> For those systems where the tick would stop, if not the arch counter, I
> assume there is some counter in the 'always ON' domain, without which they
> can't keep track of time.

We just need to avoid exposing this trace clock if the frequency was
provided by firmware.

Will
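Will's conditions for registering the clock (read the physical counter, do not expose it when the frequency was provided by firmware, and only trust a counter that keeps ticking across suspend) could be gathered into one check the arch-timer driver makes before registering. The struct and function below are hypothetical; here "provided by firmware" is modeled as a device-tree override, and in a real driver these facts would come from its own probe-time state, such as the "clock-frequency" and "always-on" properties of the arch timer DT node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what an arch-timer driver knows after probing. */
struct counter_facts {
	bool uses_phys_counter;  /* clock would read the physical count, not CNTVCT */
	bool freq_from_firmware; /* frequency came from a DT override, not CNTFRQ_EL0 */
	bool always_on;          /* counter keeps ticking across suspend */
};

/* Expose the raw-counter trace clock only when every condition holds. */
static bool counter_trace_clock_ok(const struct counter_facts *f)
{
	assert(f != NULL);
	return f->uses_phys_counter && !f->freq_from_firmware && f->always_on;
}
```

Stephen argues in the next reply that the frequency and suspend conditions need not block registration; under that view only the physical-counter check would remain.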

Stephen Boyd Dec. 30, 2016, 7:15 p.m. UTC | #8

On 12/20, Will Deacon wrote:
> On Thu, Dec 15, 2016 at 06:46:09PM +0530, Srinivas Ramana wrote:
> > On 12/12/2016 04:12 PM, Will Deacon wrote:
> > >On Mon, Dec 12, 2016 at 10:31:52AM +0530, Srinivas Ramana wrote:
> > >>On 12/06/2016 05:43 PM, Will Deacon wrote:
> > >>>Does this mean we need a way to expose the frequency to userspace, too?
> > >>
> > >>Not really. CNTFRQ_EL0 in the timer subsystem holds the clock frequency
> > >>of the system timer and is available to EL0.
> > >
> > >Experience shows that CNTFRQ_EL0 is often unreliable, and the frequency
> > >can be overridden by the device-tree. There are also systems where the
> > >counter stops ticking across suspend. Whilst both of these can be considered
> > >"broken", I suspect we want runtime buy-in from the arch-timer driver
> > >before registering this trace_clock.
> > 
> > Agree. It doesn't seem like the architecture mandates initializing this.
> > For those systems where the tick would stop, if not the arch counter, I
> > assume there is some counter in the 'always ON' domain, without which they
> > can't keep track of time.
> 
> We just need to avoid exposing this trace clock if the frequency was
> provided by firmware.
> 

We would need to know the frequency if we wanted to convert the
counter values into seconds. In our case, we don't really care to
do that. All we want to do is compare events in the ftrace log
with events in other hw subsystem logs. If we have the raw
counter value there, then it makes it simple to compare the two
and debug problems. Now, that isn't to say that it wouldn't be useful
to convert the counter value into seconds, but it doesn't look to
be a prerequisite of registering the trace clock.

If we want to expose the counter frequency to userspace we could
make a sysfs attribute for that and have userspace rely on it
instead of CNTFRQ_EL0. Or if we can make CNTFRQ_EL0 accesses trap
(forgive me for not looking at the ARM ARM right now) we can
emulate it based on the DT property.

And for systems where the counter stops during suspend, I imagine
the only problem would be tracing across suspend would show a
clock that doesn't keep counting while suspended. sched_clock()
already exhibits that behavior, so I'm not sure we've lost
anything here.
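Stephen's point is that raw counter values are directly comparable without knowing the frequency. Assuming both the ftrace output (with such a clock selected) and a hardware subsystem's log carry values from the same physical counter, correlating them reduces to merging two sorted lists by counter value. The struct event layout and log contents below are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct event {
	uint64_t counter; /* raw system-counter value stamped on the event */
	const char *src;  /* which log the event came from */
	const char *what; /* short description */
};

/* Merge two logs already sorted by counter into one timeline.
 * out must have room for na + nb events; returns the number written. */
static size_t merge_by_counter(const struct event *a, size_t na,
			       const struct event *b, size_t nb,
			       struct event *out)
{
	size_t i = 0, j = 0, k = 0;

	assert(out != NULL);
	while (i < na && j < nb)
		out[k++] = (a[i].counter <= b[j].counter) ? a[i++] : b[j++];
	while (i < na)
		out[k++] = a[i++];
	while (j < nb)
		out[k++] = b[j++];
	return k;
}
```

Ties go to the first log, so two events stamped with the same counter value keep a stable order.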
