[v2,00/16] coresight: Add support for CPU-wide trace scenarios

Message ID 20190325215632.17013-1-mathieu.poirier@linaro.org (mailing list archive)

Mathieu Poirier March 25, 2019, 9:56 p.m. UTC
This is the second revision of a patchset that adds support for CPU-wide
trace scenarios and as such, it is now possible to issue the following
commands:

	# perf record -e cs_etm/@20070000.etr/ -C 2,3 $COMMAND
	# perf record -e cs_etm/@20070000.etr/ -a $COMMAND

The solution is designed to work for both 1:1 and N:1 source/sink
topologies, though the former hasn't been tested for lack of access to HW.

Most of the changes revolve around allowing more than one event to use
a sink when operated from perf.  More specifically, the first event to
use a sink switches it on, while the last one is tasked with aggregating
traces and switching off the device.
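
To illustrate the idea, here is a minimal user-space sketch of that
first-on/last-off rule.  struct demo_sink and both helpers are hypothetical
names made up for this example, not the driver code in the series, and the
-EBUSY return for a sink that is still in use is an assumption.  A real
driver would also need locking around the count.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a sink device; not the kernel's structure. */
struct demo_sink {
	int refcnt;	/* number of events currently using the sink */
	bool enabled;	/* whether the (mock) hardware is switched on */
};

/* The first user switches the sink on; later users only take a reference. */
int demo_sink_enable(struct demo_sink *sink)
{
	if (sink->refcnt == 0)
		sink->enabled = true;	/* a real driver would program HW here */
	sink->refcnt++;
	return 0;
}

/*
 * Drop a reference.  Returns -EBUSY while other events still use the sink,
 * and 0 once the last user is gone and the sink has been switched off.
 */
int demo_sink_disable(struct demo_sink *sink)
{
	if (sink->refcnt == 0)
		return -EINVAL;		/* unbalanced disable */
	if (--sink->refcnt > 0)
		return -EBUSY;		/* still in use, leave it running */
	sink->enabled = false;		/* last user: stop the device */
	return 0;
}
```

With two events sharing the sink, the first demo_sink_disable() call returns
-EBUSY and leaves the sink running; only the second one switches it off.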

This is the kernel part of the solution, with the user space portion to be
released in a later set.  All patches (user and kernel) have been rebased
on v5.1-rc2 and are hosted here[1].  Everything has been tested on Juno and
410c dragonboard platforms.

Regards,
Mathieu

[1]. https://git.linaro.org/people/mathieu.poirier/coresight.git (5.1-rc2-cpu-wide-v2) 

== Changes for V2 ==
* Using the ETM4_CFG_BIT_CTXTID define rather than a hard-coded value (Suzuki).
* Moved pid out of struct etr_buf and into struct etr_perf_buffer (Suzuki).
* Removed code related to forcing double buffering (Suzuki).
* Fixed function reallocarray() for older distributions (Mike).
* Fixed counter configuration when dealing with errors (Leo).
* Automatically selecting PID_IN_CONTEXTIDR with ETMv4 driver.
* Rebased to v5.1-rc2.

Mathieu Poirier (16):
  coresight: pmu: Adding ITRACE property to cs_etm PMU
  coresight: etm4x: Add kernel configuration for CONTEXTID
  coresight: etm4x: Configure tracers to emit timestamps
  coresight: Adding return code to sink::disable() operation
  coresight: Move reference counting inside sink drivers
  coresight: Properly address errors in sink::disable() functions
  coresight: Properly address concurrency in sink::update() functions
  coresight: perf: Clean up function etm_setup_aux()
  coresight: perf: Refactor function free_event_data()
  coresight: Communicate perf event to sink buffer allocation function
  coresight: tmc-etr: Refactor function tmc_etr_setup_perf_buf()
  coresight: tmc-etr: Introduce the notion of process ID to ETR devices
  coresight: tmc-etr: Allow events to use the same ETR buffer
  coresight: tmc-etr: Add support for CPU-wide trace scenarios
  coresight: tmc-etf: Add support for CPU-wide trace scenarios
  coresight: etb10: Add support for CPU-wide trace scenarios

 drivers/hwtracing/coresight/Kconfig           |   1 +
 drivers/hwtracing/coresight/coresight-etb10.c |  83 +++++--
 .../hwtracing/coresight/coresight-etm-perf.c  |  37 +++-
 drivers/hwtracing/coresight/coresight-etm4x.c | 120 ++++++++++-
 .../hwtracing/coresight/coresight-tmc-etf.c   |  82 +++++--
 .../hwtracing/coresight/coresight-tmc-etr.c   | 204 +++++++++++++++---
 drivers/hwtracing/coresight/coresight-tmc.c   |   2 +
 drivers/hwtracing/coresight/coresight-tmc.h   |   6 +
 drivers/hwtracing/coresight/coresight-tpiu.c  |   9 +-
 drivers/hwtracing/coresight/coresight.c       |  28 +--
 include/linux/coresight-pmu.h                 |   2 +
 include/linux/coresight.h                     |   7 +-
 tools/include/linux/coresight-pmu.h           |   2 +
 13 files changed, 482 insertions(+), 101 deletions(-)

Comments

Leo Yan March 27, 2019, 7:52 a.m. UTC | #1
FWIW, I tested these patches on my Hikey620 board and they also work
well with the commands below:

  # perf record -e cs_etm/@f6402000.etf/ -a uname
  # perf record -e cs_etm/@f6402000.etf/ -C 0,4 uname

  # perf record -e cs_etm/@f6404000.etr/ -a uname
  # perf record -e cs_etm/@f6404000.etr/ -C 1,2,7 uname

P.S. A note that is off topic for this patch set: the 'perf report'
command took the time shown below to decode ~4MB of CoreSight trace
data on my Hikey board, so it seems we should look into optimizing
decoding speed later:

  # time perf report  --vmlinux /mnt/linux-kernel/linux-next/vmlinux
  real    2m11.153s
  user    1m47.979s
  sys     0m0.204s

Thanks,
Leo Yan
Mathieu Poirier March 27, 2019, 2:40 p.m. UTC | #2
On Wed, 27 Mar 2019 at 01:52, Leo Yan <leo.yan@linaro.org> wrote:
> FWIW, I tested these patches on my Hikey620 board and they also work
> well with the commands below:
>
>   # perf record -e cs_etm/@f6402000.etf/ -a uname
>   # perf record -e cs_etm/@f6402000.etf/ -C 0,4 uname
>
>   # perf record -e cs_etm/@f6404000.etr/ -a uname
>   # perf record -e cs_etm/@f6404000.etr/ -C 1,2,7 uname

Should I add your "Tested-by:" to the patches then?

Leo Yan March 27, 2019, 2:44 p.m. UTC | #3
On Wed, Mar 27, 2019 at 08:40:18AM -0600, Mathieu Poirier wrote:
> On Wed, 27 Mar 2019 at 01:52, Leo Yan <leo.yan@linaro.org> wrote:
> > FWIW, I tested these patches on my Hikey620 board and they also work
> > well with the commands below:
> >
> >   # perf record -e cs_etm/@f6402000.etf/ -a uname
> >   # perf record -e cs_etm/@f6402000.etf/ -C 0,4 uname
> >
> >   # perf record -e cs_etm/@f6404000.etr/ -a uname
> >   # perf record -e cs_etm/@f6404000.etr/ -C 1,2,7 uname
> 
> Should I add your "Tested-by:" to the patches then?

Yes.  Please add my test tag:

Tested-by: Leo Yan <leo.yan@linaro.org>

Thanks,
Leo Yan
Robert Walker April 11, 2019, 6:52 p.m. UTC | #4
Hi Mathieu,

I've tested this patch set and the associated perf patches on the HiKey 
960 - trace collection and decode appears to work OK. However, in order 
to get the timestamps in the trace stream, I needed to enable the 
CoreSight Timestamp generator before starting trace.  Without this, all 
the timestamp packets had a value of 0. This will likely affect other 
platforms.  For testing purposes, I enabled it by poking the control 
register directly via /dev/mem, but for full support you will need a 
driver for this component (it's fairly simple - just a single register 
to write to enable / disable) and entries in the device tree / ACPI 
tables - it's similar to the helper devices like CTI & CATU which aren't 
on the trace data path, but are associated with a device that is.

Also, the use of a counter to generate the timestamps periodically 
conflicts with the ETM strobing patch we've been using for AutoFDO.  
This strobing requires 2 counters and as most ETM implementations only 
have 2 counters, there is only one available if one is used for 
timestamps.  We'll have to do some investigation to work out a way 
around this.

Regards

Rob
Mathieu Poirier April 16, 2019, 7:37 p.m. UTC | #5
Hi Robert,

On Thu, 11 Apr 2019 at 12:52, Robert Walker <robert.walker@arm.com> wrote:
> I've tested this patch set and the associated perf patches on the HiKey
> 960 - trace collection and decode appears to work OK. However, in order
> to get the timestamps in the trace stream, I needed to enable the
> CoreSight Timestamp generator before starting trace.  Without this, all
> the timestamp packets had a value of 0. This will likely affect other
> platforms.  For testing purposes, I enabled it by poking the control
> register directly via /dev/mem, but for full support you will need a
> driver for this component (it's fairly simple - just a single register
> to write to enable / disable) and entries in the device tree / ACPI
> tables - it's similar to the helper devices like CTI & CATU which aren't
> on the trace data path, but are associated with a device that is.
>

Thank you for taking the time to test this.  Can I add your
"Tested-by:" to the set?

Platforms where the timestamp generator needs to be explicitly enabled
are slowly emerging - I heard of the issue on the CoreSight mailing list
about a month ago.  Since I don't have HW to test the feature, it will
not be part of this set.

> Also, the use of a counter to generate the timestamps periodically
> conflicts with the ETM strobing patch we've been using for AutoFDO.
> This strobing requires 2 counters and as most ETM implementations only
> have 2 counters, there is only one available if one is used for
> timestamps.  We'll have to do some investigation to work out a way
> around this.

I noticed that counters were in short supply and as such added an
explicit test to make sure there were enough of them before
proceeding.  As with topology issues, there isn't much we can currently do
other than prevent a trace session from moving forward if there
aren't enough counters.
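
As a rough sketch of that kind of check (struct demo_etm,
demo_claim_timestamp_counter() and the -ENOSPC error code are all
hypothetical, chosen for this example rather than taken from the patches):

```c
#include <errno.h>

#define DEMO_MAX_COUNTERS 4	/* hypothetical upper bound for this sketch */

/* Hypothetical per-tracer state: which counters are already claimed. */
struct demo_etm {
	int nr_counters;		/* counters implemented by the tracer */
	int in_use[DEMO_MAX_COUNTERS];	/* non-zero when a counter is taken */
};

/*
 * Claim a free counter for periodic timestamps.  Returns its index, or
 * -ENOSPC so that the caller can refuse to start the trace session.
 */
int demo_claim_timestamp_counter(struct demo_etm *etm)
{
	int i;

	for (i = 0; i < etm->nr_counters && i < DEMO_MAX_COUNTERS; i++) {
		if (!etm->in_use[i]) {
			etm->in_use[i] = 1;
			return i;
		}
	}
	return -ENOSPC;
}
```

On a tracer with two counters where one is already taken, the first call
returns index 1 and the next call fails, which is the point where the
session would be stopped.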

Thanks,
Mathieu

Robert Walker April 24, 2019, 4:22 p.m. UTC | #6
Hi Mathieu,

On 16/04/2019 20:37, Mathieu Poirier wrote:
> Hi Robert,
>
> On Thu, 11 Apr 2019 at 12:52, Robert Walker <robert.walker@arm.com> wrote:
>> I've tested this patch set and the associated perf patches on the HiKey
>> 960 - trace collection and decode appears to work OK. However, in order
>> to get the timestamps in the trace stream, I needed to enable the
>> CoreSight Timestamp generator before starting trace.  Without this, all
>> the timestamp packets had a value of 0. This will likely affect other
>> platforms.  For testing purposes, I enabled it by poking the control
>> register directly via /dev/mem, but for full support you will need a
>> driver for this component (it's fairly simple - just a single register
>> to write to enable / disable) and entries in the device tree / ACPI
>> tables - it's similar to the helper devices like CTI & CATU which aren't
>> on the trace data path, but are associated with a device that is.
>>
> Thank you for taking the time to test this.  Can I add your
> "Tested-by:" to the set?

Yes, please do.  I've also tested v3 of these patches.

> Platforms where the timestamp generator needs to be explicitly enabled
> are slowly emerging - I heard of the issue on the CoreSight mailing list
> about a month ago.  Since I don't have HW to test the feature, it will
> not be part of this set.
>
>> Also, the use of a counter to generate the timestamps periodically
>> conflicts with the ETM strobing patch we've been using for AutoFDO.
>> This strobing requires 2 counters and as most ETM implementations only
>> have 2 counters, there is only one available if one is used for
>> timestamps.  We'll have to do some investigation to work out a way
>> around this.
> I noticed that counters were in short supply and as such added an
> explicit test to make sure there were enough of them before
> proceeding.  As with topology issues, there isn't much we can currently do
> other than prevent a trace session from moving forward if there
> aren't enough counters.
>
My current thinking on this is that when using the strobing mode for
AutoFDO, we only get short bursts of trace from each core and are only
interested in following the program flow during that burst, so precise
alignment between the instruction streams of each core is less
important (and unlikely - we wouldn't expect the strobes of multiple
cores to coincide).  We get a timestamp as a result of the trace burst
starting, which is sufficient for coarse alignment of the bursts.  So
I've reworked my strobing patch to use both counters for the strobing
and not enable the timestamp counter when strobing is enabled.
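
Roughly, the policy looks like the sketch below.  The names
(struct demo_counter_plan, demo_plan_counters()) are hypothetical and only
illustrate the decision; this is not the strobing patch itself.

```c
#include <stdbool.h>

/* Hypothetical description of how a tracer's counters are divided up. */
struct demo_counter_plan {
	int strobe_counters;	/* counters reserved for strobing */
	bool timestamp_counter;	/* true if one counter drives timestamps */
};

/*
 * With strobing on, both strobe counters are taken and periodic timestamps
 * are dropped.  Otherwise one counter is used for timestamps, if present.
 * Strobing with fewer than two counters yields an empty plan.
 */
struct demo_counter_plan demo_plan_counters(int nr_counters, bool strobing)
{
	struct demo_counter_plan plan = { 0, false };

	if (strobing && nr_counters >= 2)
		plan.strobe_counters = 2;
	else if (!strobing && nr_counters >= 1)
		plan.timestamp_counter = true;
	return plan;
}
```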

Regards


Rob