Message ID | 20190325215632.17013-1-mathieu.poirier@linaro.org (mailing list archive) |
---|---|
Series | coresight: Add support for CPU-wide trace scenarios |
On Mon, Mar 25, 2019 at 03:56:16PM -0600, Mathieu Poirier wrote:
> This is the second revision of a patchset that adds support for CPU-wide
> trace scenarios and as such, it is now possible to issue the following
> commands:
>
>   # perf record -e cs_etm/@20070000.etr/ -C 2,3 $COMMAND
>   # perf record -e cs_etm/@20070000.etr/ -a $COMMAND
>
> The solution is designed to work for both 1:1 and N:1 source/sink
> topologies, though the former hasn't been tested for lack of access to HW.
>
> Most of the changes revolve around allowing more than one event to use
> a sink when operated from perf. More specifically the first event to
> use a sink switches it on while the last one is tasked with aggregating
> traces and switching off the device.
>
> This is the kernel part of the solution, with the user space portion to be
> released in a later set. All patches (user and kernel) have been rebased
> on v5.1-rc2 and are hosted here[1]. Everything has been tested on Juno and
> 410c dragonboard platforms.

FWIW, I tested these patches on my Hikey620 board and they also work
well with the commands below:

  # perf record -e cs_etm/@f6402000.etf/ -a uname
  # perf record -e cs_etm/@f6402000.etf/ -C 0,4 uname

  # perf record -e cs_etm/@f6404000.etr/ -a uname
  # perf record -e cs_etm/@f6404000.etr/ -C 1,2,7 uname

P.S. As an aside that is off topic for this patch set, the 'perf report'
command took the time below to decode ~4MB of CoreSight trace data on my
Hikey board, so it seems we could look at optimizing decoding speed later:

  # time perf report --vmlinux /mnt/linux-kernel/linux-next/vmlinux
  real    2m11.153s
  user    1m47.979s
  sys     0m0.204s

Thanks,
Leo Yan
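
The first-on/last-off behaviour described in the cover letter can be pictured as a reference count on the sink. What follows is only a minimal Python sketch of that idea, not the kernel implementation: the `Sink` class, its `attach`/`detach` methods and the `log` list are hypothetical names introduced for illustration.

```python
class Sink:
    """Toy model of a trace sink shared by several perf events (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.users = 0        # number of events currently using the sink
        self.enabled = False
        self.log = []         # records the hardware-level actions taken

    def attach(self, event):
        # Only the first event to use the sink switches it on.
        if self.users == 0:
            self.enabled = True
            self.log.append(("enable", event))
        self.users += 1

    def detach(self, event):
        if self.users == 0:
            raise RuntimeError("detach without a matching attach")
        self.users -= 1
        # Only the last event aggregates the trace and switches the sink off.
        if self.users == 0:
            self.log.append(("aggregate", event))
            self.enabled = False
            self.log.append(("disable", event))


sink = Sink("etr")
for cpu in (2, 3):            # e.g. a session started with -C 2,3
    sink.attach(f"cpu{cpu}")
sink.detach("cpu2")
sink.detach("cpu3")
print(sink.log)
```

In this model the event for CPU 2 enables the sink, and the event for CPU 3, being the last to detach, is the one that aggregates and disables it.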
On Wed, 27 Mar 2019 at 01:52, Leo Yan <leo.yan@linaro.org> wrote:
>
> On Mon, Mar 25, 2019 at 03:56:16PM -0600, Mathieu Poirier wrote:
> > This is the second revision of a patchset that adds support for CPU-wide
> > trace scenarios and as such, it is now possible to issue the following
> > commands:
> >
> >   # perf record -e cs_etm/@20070000.etr/ -C 2,3 $COMMAND
> >   # perf record -e cs_etm/@20070000.etr/ -a $COMMAND
> >
> > The solution is designed to work for both 1:1 and N:1 source/sink
> > topologies, though the former hasn't been tested for lack of access to HW.
> >
> > Most of the changes revolve around allowing more than one event to use
> > a sink when operated from perf. More specifically the first event to
> > use a sink switches it on while the last one is tasked with aggregating
> > traces and switching off the device.
> >
> > This is the kernel part of the solution, with the user space portion to be
> > released in a later set. All patches (user and kernel) have been rebased
> > on v5.1-rc2 and are hosted here[1]. Everything has been tested on Juno and
> > 410c dragonboard platforms.
>
> FWIW, I tested these patches on my Hikey620 board and they also work
> well with the commands below:
>
>   # perf record -e cs_etm/@f6402000.etf/ -a uname
>   # perf record -e cs_etm/@f6402000.etf/ -C 0,4 uname
>
>   # perf record -e cs_etm/@f6404000.etr/ -a uname
>   # perf record -e cs_etm/@f6404000.etr/ -C 1,2,7 uname

Should I add your "Tested-by:" to the patches then?

>
> P.S. As an aside that is off topic for this patch set, the 'perf report'
> command took the time below to decode ~4MB of CoreSight trace data on my
> Hikey board, so it seems we could look at optimizing decoding speed later:
>
>   # time perf report --vmlinux /mnt/linux-kernel/linux-next/vmlinux
>   real    2m11.153s
>   user    1m47.979s
>   sys     0m0.204s
>
> Thanks,
> Leo Yan
On Wed, Mar 27, 2019 at 08:40:18AM -0600, Mathieu Poirier wrote:
> On Wed, 27 Mar 2019 at 01:52, Leo Yan <leo.yan@linaro.org> wrote:
> >
> > On Mon, Mar 25, 2019 at 03:56:16PM -0600, Mathieu Poirier wrote:
> > > This is the second revision of a patchset that adds support for CPU-wide
> > > trace scenarios and as such, it is now possible to issue the following
> > > commands:
> > >
> > >   # perf record -e cs_etm/@20070000.etr/ -C 2,3 $COMMAND
> > >   # perf record -e cs_etm/@20070000.etr/ -a $COMMAND
> > >
> > > The solution is designed to work for both 1:1 and N:1 source/sink
> > > topologies, though the former hasn't been tested for lack of access to HW.
> > >
> > > Most of the changes revolve around allowing more than one event to use
> > > a sink when operated from perf. More specifically the first event to
> > > use a sink switches it on while the last one is tasked with aggregating
> > > traces and switching off the device.
> > >
> > > This is the kernel part of the solution, with the user space portion to be
> > > released in a later set. All patches (user and kernel) have been rebased
> > > on v5.1-rc2 and are hosted here[1]. Everything has been tested on Juno and
> > > 410c dragonboard platforms.
> >
> > FWIW, I tested these patches on my Hikey620 board and they also work
> > well with the commands below:
> >
> >   # perf record -e cs_etm/@f6402000.etf/ -a uname
> >   # perf record -e cs_etm/@f6402000.etf/ -C 0,4 uname
> >
> >   # perf record -e cs_etm/@f6404000.etr/ -a uname
> >   # perf record -e cs_etm/@f6404000.etr/ -C 1,2,7 uname
>
> Should I add your "Tested-by:" to the patches then?

Yes. Please add my test tag:

Tested-by: Leo Yan <leo.yan@linaro.org>

Thanks,
Leo Yan
Hi Mathieu,

On 25/03/2019 21:56, Mathieu Poirier wrote:
> This is the second revision of a patchset that adds support for CPU-wide
> trace scenarios and as such, it is now possible to issue the following
> commands:
>
>   # perf record -e cs_etm/@20070000.etr/ -C 2,3 $COMMAND
>   # perf record -e cs_etm/@20070000.etr/ -a $COMMAND
>
> The solution is designed to work for both 1:1 and N:1 source/sink
> topologies, though the former hasn't been tested for lack of access to HW.
>
> Most of the changes revolve around allowing more than one event to use
> a sink when operated from perf. More specifically the first event to
> use a sink switches it on while the last one is tasked with aggregating
> traces and switching off the device.
>
> This is the kernel part of the solution, with the user space portion to be
> released in a later set. All patches (user and kernel) have been rebased
> on v5.1-rc2 and are hosted here[1]. Everything has been tested on Juno and
> 410c dragonboard platforms.
>
> Regards,
> Mathieu
>
> [1]. https://git.linaro.org/people/mathieu.poirier/coresight.git (5.1-rc2-cpu-wide-v2)
>

I've tested this patch set and the associated perf patches on the HiKey
960 - trace collection and decode appear to work OK. However, in order
to get the timestamps in the trace stream, I needed to enable the
CoreSight Timestamp generator before starting trace. Without this, all
the timestamp packets had a value of 0. This will likely affect other
platforms. For testing purposes, I enabled it by poking the control
register directly via /dev/mem, but for full support you will need a
driver for this component (it's fairly simple - just a single register
to write to enable / disable) and entries in the device tree / ACPI
tables - it's similar to the helper devices like CTI & CATU which aren't
on the trace data path, but are associated with a device that is.

Also, the use of a counter to generate the timestamps periodically
conflicts with the ETM strobing patch we've been using for AutoFDO.
This strobing requires 2 counters and as most ETM implementations only
have 2 counters, there is only one available if one is used for
timestamps. We'll have to do some investigation to work out a way
around this.

Regards

Rob
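
The "single register to write to enable / disable" workaround amounts to a read-modify-write of one bit. Below is a rough Python sketch of that operation, run against a `bytearray` standing in for the mapped register page. The register offset (`0x0`) and the enable bit (bit 0) are assumptions made for illustration and should be checked against the platform's documentation; the thread gives neither these values nor the component's base address, and `set_tsgen_enabled` is a hypothetical helper name.

```python
import struct

CNTCR_OFFSET = 0x0   # ASSUMED offset of the generator's control register
CNTCR_EN = 1 << 0    # ASSUMED position of the enable bit


def set_tsgen_enabled(regs, enable):
    """Set or clear the enable bit with a 32-bit read-modify-write.

    `regs` is any writable buffer: a bytearray in this sketch, or an mmap
    of the component's register page when poking real hardware.
    """
    (value,) = struct.unpack_from("<I", regs, CNTCR_OFFSET)
    if enable:
        value |= CNTCR_EN
    else:
        value &= ~CNTCR_EN
    value &= 0xFFFFFFFF
    struct.pack_into("<I", regs, CNTCR_OFFSET, value)
    return value


regs = bytearray(0x1000)          # stand-in for a mapped 4 KiB register page
set_tsgen_enabled(regs, True)     # must happen before the trace session starts
```

On real hardware the buffer would come from mapping the generator's physical address, which is what a proper driver with device tree / ACPI entries would take care of instead.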
Hi Robert,

On Thu, 11 Apr 2019 at 12:52, Robert Walker <robert.walker@arm.com> wrote:
>
> Hi Mathieu,
>
> On 25/03/2019 21:56, Mathieu Poirier wrote:
> > This is the second revision of a patchset that adds support for CPU-wide
> > trace scenarios and as such, it is now possible to issue the following
> > commands:
> >
> >   # perf record -e cs_etm/@20070000.etr/ -C 2,3 $COMMAND
> >   # perf record -e cs_etm/@20070000.etr/ -a $COMMAND
> >
> > The solution is designed to work for both 1:1 and N:1 source/sink
> > topologies, though the former hasn't been tested for lack of access to HW.
> >
> > Most of the changes revolve around allowing more than one event to use
> > a sink when operated from perf. More specifically the first event to
> > use a sink switches it on while the last one is tasked with aggregating
> > traces and switching off the device.
> >
> > This is the kernel part of the solution, with the user space portion to be
> > released in a later set. All patches (user and kernel) have been rebased
> > on v5.1-rc2 and are hosted here[1]. Everything has been tested on Juno and
> > 410c dragonboard platforms.
> >
> > Regards,
> > Mathieu
> >
> > [1]. https://git.linaro.org/people/mathieu.poirier/coresight.git (5.1-rc2-cpu-wide-v2)
> >
> I've tested this patch set and the associated perf patches on the HiKey
> 960 - trace collection and decode appear to work OK. However, in order
> to get the timestamps in the trace stream, I needed to enable the
> CoreSight Timestamp generator before starting trace. Without this, all
> the timestamp packets had a value of 0. This will likely affect other
> platforms. For testing purposes, I enabled it by poking the control
> register directly via /dev/mem, but for full support you will need a
> driver for this component (it's fairly simple - just a single register
> to write to enable / disable) and entries in the device tree / ACPI
> tables - it's similar to the helper devices like CTI & CATU which aren't
> on the trace data path, but are associated with a device that is.
>

Thank you for taking the time to test this. Can I add your
"Tested-by:" to the set?

Platforms where the timestamp generator needs to be explicitly enabled
are slowly emerging - I heard of the issue on the CS mailing list
about a month ago. Since I don't have HW to test the feature it will
not be part of this set.

> Also, the use of a counter to generate the timestamps periodically
> conflicts with the ETM strobing patch we've been using for AutoFDO.
> This strobing requires 2 counters and as most ETM implementations only
> have 2 counters, there is only one available if one is used for
> timestamps. We'll have to do some investigation to work out a way
> around this.

I noticed that clocks were in short supply and as such added an
explicit test to make sure there were enough of them before
proceeding. Like topology issues there isn't much we can currently do
other than preventing a trace session from moving forward if there
aren't enough counters.

Thanks,
Mathieu

>
> Regards
>
> Rob
>
Hi Mathieu,

On 16/04/2019 20:37, Mathieu Poirier wrote:
> Hi Robert,
>
> On Thu, 11 Apr 2019 at 12:52, Robert Walker <robert.walker@arm.com> wrote:
>> Hi Mathieu,
>>
>> On 25/03/2019 21:56, Mathieu Poirier wrote:
>>> This is the second revision of a patchset that adds support for CPU-wide
>>> trace scenarios and as such, it is now possible to issue the following
>>> commands:
>>>
>>>   # perf record -e cs_etm/@20070000.etr/ -C 2,3 $COMMAND
>>>   # perf record -e cs_etm/@20070000.etr/ -a $COMMAND
>>>
>>> The solution is designed to work for both 1:1 and N:1 source/sink
>>> topologies, though the former hasn't been tested for lack of access to HW.
>>>
>>> Most of the changes revolve around allowing more than one event to use
>>> a sink when operated from perf. More specifically the first event to
>>> use a sink switches it on while the last one is tasked with aggregating
>>> traces and switching off the device.
>>>
>>> This is the kernel part of the solution, with the user space portion to be
>>> released in a later set. All patches (user and kernel) have been rebased
>>> on v5.1-rc2 and are hosted here[1]. Everything has been tested on Juno and
>>> 410c dragonboard platforms.
>>>
>>> Regards,
>>> Mathieu
>>>
>>> [1]. https://git.linaro.org/people/mathieu.poirier/coresight.git (5.1-rc2-cpu-wide-v2)
>>>
>> I've tested this patch set and the associated perf patches on the HiKey
>> 960 - trace collection and decode appear to work OK. However, in order
>> to get the timestamps in the trace stream, I needed to enable the
>> CoreSight Timestamp generator before starting trace. Without this, all
>> the timestamp packets had a value of 0. This will likely affect other
>> platforms. For testing purposes, I enabled it by poking the control
>> register directly via /dev/mem, but for full support you will need a
>> driver for this component (it's fairly simple - just a single register
>> to write to enable / disable) and entries in the device tree / ACPI
>> tables - it's similar to the helper devices like CTI & CATU which aren't
>> on the trace data path, but are associated with a device that is.
>>
> Thank you for taking the time to test this. Can I add your
> "Tested-by:" to the set?

Yes, please do. I've also tested v3 of these patches.

> Platforms where the timestamp generator needs to be explicitly enabled
> are slowly emerging - I heard of the issue on the CS mailing list
> about a month ago. Since I don't have HW to test the feature it will
> not be part of this set.
>
>> Also, the use of a counter to generate the timestamps periodically
>> conflicts with the ETM strobing patch we've been using for AutoFDO.
>> This strobing requires 2 counters and as most ETM implementations only
>> have 2 counters, there is only one available if one is used for
>> timestamps. We'll have to do some investigation to work out a way
>> around this.
> I noticed that clocks were in short supply and as such added an
> explicit test to make sure there were enough of them before
> proceeding. Like topology issues there isn't much we can currently do
> other than preventing a trace session from moving forward if there
> aren't enough counters.
>

My current thinking on this is that when using the strobing mode for
AutoFDO, we only get short bursts of trace from each core and are only
interested in following the program flow during that burst, so precise
alignment between the instruction streams of each core is less
important (and unlikely - we wouldn't expect the strobes of multiple
cores to coincide). We get a timestamp as a result of the trace burst
starting, which is sufficient for coarse alignment of the bursts.

So I've reworked my strobing patch to use both counters for the
strobing and not enable the timestamp counter when strobing is enabled.

Regards

Rob
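
The counter budget discussed in this exchange can be written down as a small allocation rule. The sketch below is illustrative Python, not code from either patch: `plan_counters` and the constant names are hypothetical, while the numbers (two counters for strobing, one for periodic timestamps) are taken from the thread.

```python
STROBE_COUNTERS = 2      # from the thread: strobing requires two counters
TIMESTAMP_COUNTERS = 1   # from the thread: periodic timestamps use one


def plan_counters(available, strobing, timestamps=True):
    """Return how many counters each feature gets, or raise if too few."""
    if strobing:
        # Reworked approach: strobing takes both counters and the periodic
        # timestamp counter is left disabled instead of competing for one.
        needed = {"strobe": STROBE_COUNTERS, "timestamp": 0}
    else:
        needed = {
            "strobe": 0,
            "timestamp": TIMESTAMP_COUNTERS if timestamps else 0,
        }
    if sum(needed.values()) > available:
        # Mirrors the explicit check: refuse to start the session rather
        # than run with too few counters.
        raise ValueError("not enough ETM counters for this configuration")
    return needed


print(plan_counters(available=2, strobing=True))
print(plan_counters(available=2, strobing=False))
```

With two counters available, both configurations fit; what no longer exists is a configuration that asks for strobing and periodic timestamps at the same time, which would have needed three.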