[RFC,00/11] arm64: coresight: Enable ETE and TRBE

Message ID 1605012309-24812-1-git-send-email-anshuman.khandual@arm.com (mailing list archive)

Message

Anshuman Khandual Nov. 10, 2020, 12:44 p.m. UTC
This series enables future IP trace features Embedded Trace Extension (ETE)
and Trace Buffer Extension (TRBE). This series depends on the ETM system
register instruction support series [0] and the v8.4 Self hosted tracing
support series (Jonathan Zhou) [1]. The tree is available here [2] for
quick access.

ETE is the PE (CPU) trace unit for CPUs, implementing future architecture
extensions. ETE overlaps with the ETMv4 architecture, with additions to
support the newer architecture features and some restrictions on the
supported features w.r.t ETMv4. The ETE support is added by extending the
ETMv4 driver to recognise the ETE and handle the features as exposed by the
TRCIDRx registers. ETE only supports system instruction access from the
host CPU. The ETE could be integrated with a TRBE (see below), or with the
legacy CoreSight trace bus (e.g. ETRs). Thus the ETE follows the same
firmware description as the ETMs and requires a node per instance.

Trace Buffer Extension (TRBE) implements a per-CPU trace buffer, which is
accessible via the system registers and can be combined with the ETE to
provide a 1x1 configuration of source & sink. TRBE is represented here as
a CoreSight sink. The primary reason is that the ETE source could work
with other traditional CoreSight sink devices. As TRBE captures the trace
data which is produced by ETE, it cannot work alone.

The TRBE representation here has some distinct deviations from a
traditional CoreSight sink device. The CoreSight path between ETE and TRBE
is not built during boot from the respective DT or ACPI entries. Instead,
TRBE is checked for on each available CPU and, when found, is connected to
the respective ETE source device on the same CPU, after altering its
outward connections. The ETE-TRBE path connection lasts only while the CPU
is online. However, the ETE-TRBE coupling/decoupling method implemented
here is not optimal and will be reworked later on.

Unlike traditional sinks, TRBE can generate interrupts to signal, among
many other things, that the buffer has filled. The interrupt is a PPI and
should be communicated from the platform. The DT or ACPI entry representing
TRBE should have the PPI number for a given platform. During a perf
session, the TRBE IRQ handler should capture trace for the perf auxiliary
buffer before restarting the trace. The system registers used here to
configure ETE and TRBE are described at the link below.

https://developer.arm.com/docs/ddi0601/g/aarch64-system-registers.

This adds another change where the CoreSight sink device needs to be
disabled before capturing the trace data for perf, in order to avoid a race
condition with simultaneous TRBE IRQ handling. This might cause problems
with traditional sink devices, which can be operated in both sysfs and perf
mode. This needs to be addressed correctly. One option would be to move the
update_buffer callback into the respective sink devices, e.g. into
disable().

This series is primarily looking for some early feedback on both the
proposed design and its implementation. It acknowledges that it might be
incomplete and will have scope for improvement.

Things to do:
- Improve ETE-TRBE coupling and decoupling method
- Improve TRBE IRQ handling for all possible corner cases
- Implement sysfs based trace sessions

[0] https://lore.kernel.org/linux-arm-kernel/20201028220945.3826358-1-suzuki.poulose@arm.com/
[1] https://lore.kernel.org/linux-arm-kernel/1600396210-54196-1-git-send-email-jonathan.zhouwen@huawei.com/ 
[2] https://gitlab.arm.com/linux-arm/linux-skp/-/tree/coresight/etm/v8.4-self-hosted

Anshuman Khandual (6):
  arm64: Add TRBE definitions
  coresight: sink: Add TRBE driver
  coresight: etm-perf: Truncate the perf record if handle has no space
  coresight: etm-perf: Disable the path before capturing the trace data
  coresgith: etm-perf: Connect TRBE sink with ETE source
  dts: bindings: Document device tree binding for Arm TRBE

Suzuki K Poulose (5):
  coresight: etm-perf: Allow an event to use different sinks
  coresight: Do not scan for graph if none is present
  coresight: etm4x: Add support for PE OS lock
  coresight: ete: Add support for sysreg support
  coresight: ete: Detect ETE as one of the supported ETMs

 .../devicetree/bindings/arm/coresight.txt          |   3 +
 Documentation/devicetree/bindings/arm/trbe.txt     |  20 +
 Documentation/trace/coresight/coresight-trbe.rst   |  36 +
 arch/arm64/include/asm/sysreg.h                    |  51 ++
 drivers/hwtracing/coresight/Kconfig                |  11 +
 drivers/hwtracing/coresight/Makefile               |   1 +
 drivers/hwtracing/coresight/coresight-etm-perf.c   |  85 ++-
 drivers/hwtracing/coresight/coresight-etm-perf.h   |   4 +
 drivers/hwtracing/coresight/coresight-etm4x-core.c | 144 +++-
 drivers/hwtracing/coresight/coresight-etm4x.h      |  64 +-
 drivers/hwtracing/coresight/coresight-platform.c   |   9 +-
 drivers/hwtracing/coresight/coresight-trbe.c       | 768 +++++++++++++++++++++
 drivers/hwtracing/coresight/coresight-trbe.h       | 525 ++++++++++++++
 include/linux/coresight.h                          |   2 +
 14 files changed, 1680 insertions(+), 43 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/trbe.txt
 create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
 create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h

Comments

Mathieu Poirier Nov. 10, 2020, 6:25 p.m. UTC | #1
Hi Anshuman,

On Tue, 10 Nov 2020 at 05:45, Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
> This series enables future IP trace features Embedded Trace Extension (ETE)
> and Trace Buffer Extension (TRBE). This series depends on the ETM system
> register instruction support series [0] and the v8.4 Self hosted tracing
> support series (Jonathan Zhou) [1]. The tree is available here [2] for
> quick access.
>
> ETE is the PE (CPU) trace unit for CPUs, implementing future architecture
> extensions. ETE overlaps with the ETMv4 architecture, with additions to
> support the newer architecture features and some restrictions on the
> supported features w.r.t ETMv4. The ETE support is added by extending the
> ETMv4 driver to recognise the ETE and handle the features as exposed by the
> TRCIDRx registers. ETE only supports system instructions access from the
> host CPU. The ETE could be integrated with a TRBE (see below), or with the
> legacy CoreSight trace bus (e.g, ETRs). Thus the ETE follows same firmware
> description as the ETMs and requires a node per instance.
>
> Trace Buffer Extensions (TRBE) implements a per CPU trace buffer, which is
> accessible via the system registers and can be combined with the ETE to
> provide a 1x1 configuration of source & sink. TRBE is being represented
> here as a CoreSight sink. Primary reason is that the ETE source could work
> with other traditional CoreSight sink devices. As TRBE captures the trace
> data which is produced by ETE, it cannot work alone.
>
> TRBE representation here have some distinct deviations from a traditional
> CoreSight sink device. Coresight path between ETE and TRBE are not built
> during boot looking at respective DT or ACPI entries. Instead TRBE gets
> checked on each available CPU, when found gets connected with respective
> ETE source device on the same CPU, after altering its outward connections.
> ETE TRBE path connection lasts only till the CPU is online. But ETE-TRBE
> coupling/decoupling method implemented here is not optimal and would be
> reworked later on.
>
> Unlike traditional sinks, TRBE can generate interrupts to signal including
> many other things, buffer got filled. The interrupt is a PPI and should be
> communicated from the platform. DT or ACPI entry representing TRBE should
> have the PPI number for a given platform. During perf session, the TRBE IRQ
> handler should capture trace for perf auxiliary buffer before restarting it
> back. System registers being used here to configure ETE and TRBE could be
> referred in the link below.
>
> https://developer.arm.com/docs/ddi0601/g/aarch64-system-registers.
>
> This adds another change where CoreSight sink device needs to be disabled
> before capturing the trace data for perf in order to avoid race condition
> with another simultaneous TRBE IRQ handling. This might cause problem with
> traditional sink devices which can be operated in both sysfs and perf mode.
> This needs to be addressed correctly. One option would be to move the
> update_buffer callback into the respective sink devices. e.g, disable().
>
> This series is primarily looking from some early feed back both on proposed
> design and its implementation. It acknowledges, that it might be incomplete
> and will have scopes for improvement.
>
> Things todo:
> - Improve ETE-TRBE coupling and decoupling method
> - Improve TRBE IRQ handling for all possible corner cases
> - Implement sysfs based trace sessions
>
> [0] https://lore.kernel.org/linux-arm-kernel/20201028220945.3826358-1-suzuki.poulose@arm.com/
> [1] https://lore.kernel.org/linux-arm-kernel/1600396210-54196-1-git-send-email-jonathan.zhouwen@huawei.com/
> [2] https://gitlab.arm.com/linux-arm/linux-skp/-/tree/coresight/etm/v8.4-self-hosted
>
> Anshuman Khandual (6):
>   arm64: Add TRBE definitions
>   coresight: sink: Add TRBE driver
>   coresight: etm-perf: Truncate the perf record if handle has no space
>   coresight: etm-perf: Disable the path before capturing the trace data
>   coresgith: etm-perf: Connect TRBE sink with ETE source
>   dts: bindings: Document device tree binding for Arm TRBE
>
> Suzuki K Poulose (5):
>   coresight: etm-perf: Allow an event to use different sinks
>   coresight: Do not scan for graph if none is present
>   coresight: etm4x: Add support for PE OS lock
>   coresight: ete: Add support for sysreg support
>   coresight: ete: Detect ETE as one of the supported ETMs
>
>  .../devicetree/bindings/arm/coresight.txt          |   3 +
>  Documentation/devicetree/bindings/arm/trbe.txt     |  20 +
>  Documentation/trace/coresight/coresight-trbe.rst   |  36 +
>  arch/arm64/include/asm/sysreg.h                    |  51 ++
>  drivers/hwtracing/coresight/Kconfig                |  11 +
>  drivers/hwtracing/coresight/Makefile               |   1 +
>  drivers/hwtracing/coresight/coresight-etm-perf.c   |  85 ++-
>  drivers/hwtracing/coresight/coresight-etm-perf.h   |   4 +
>  drivers/hwtracing/coresight/coresight-etm4x-core.c | 144 +++-
>  drivers/hwtracing/coresight/coresight-etm4x.h      |  64 +-
>  drivers/hwtracing/coresight/coresight-platform.c   |   9 +-
>  drivers/hwtracing/coresight/coresight-trbe.c       | 768 +++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-trbe.h       | 525 ++++++++++++++
>  include/linux/coresight.h                          |   2 +
>  14 files changed, 1680 insertions(+), 43 deletions(-)

This is to confirm that I have received your work and it is now on my
list of patchsets to review.  However, doing so likely won't happen
for a couple of weeks because of patchsets already in the queue.  I
will touch base with you again if there are further delays.

Thanks,
Mathieu

>  create mode 100644 Documentation/devicetree/bindings/arm/trbe.txt
>  create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
>
> --
> 2.7.4
>
Tingwei Zhang Nov. 14, 2020, 5:17 a.m. UTC | #2
Hi Anshuman,

On Tue, Nov 10, 2020 at 08:44:58PM +0800, Anshuman Khandual wrote:
> This series enables future IP trace features Embedded Trace Extension (ETE)
> and Trace Buffer Extension (TRBE). This series depends on the ETM system
> register instruction support series [0] and the v8.4 Self hosted tracing
> support series (Jonathan Zhou) [1]. The tree is available here [2] for
> quick access.
> 
> ETE is the PE (CPU) trace unit for CPUs, implementing future architecture
> extensions. ETE overlaps with the ETMv4 architecture, with additions to
> support the newer architecture features and some restrictions on the
> supported features w.r.t ETMv4. The ETE support is added by extending the
> ETMv4 driver to recognise the ETE and handle the features as exposed by the
> TRCIDRx registers. ETE only supports system instructions access from the
> host CPU. The ETE could be integrated with a TRBE (see below), or with the
> legacy CoreSight trace bus (e.g, ETRs). Thus the ETE follows same firmware
> description as the ETMs and requires a node per instance.
> 
> Trace Buffer Extensions (TRBE) implements a per CPU trace buffer, which is
> accessible via the system registers and can be combined with the ETE to
> provide a 1x1 configuration of source & sink. TRBE is being represented
> here as a CoreSight sink. Primary reason is that the ETE source could work
> with other traditional CoreSight sink devices. As TRBE captures the trace
> data which is produced by ETE, it cannot work alone.
> 
> TRBE representation here have some distinct deviations from a traditional
> CoreSight sink device. Coresight path between ETE and TRBE are not built
> during boot looking at respective DT or ACPI entries. Instead TRBE gets
> checked on each available CPU, when found gets connected with respective
> ETE source device on the same CPU, after altering its outward connections.
> ETE TRBE path connection lasts only till the CPU is online. But ETE-TRBE
> coupling/decoupling method implemented here is not optimal and would be
> reworked later on.

Only perf mode is supported for TRBE in the current patch set. Will you
consider supporting sysfs mode as well in following patch sets?

Thanks,
Tingwei

> 
> Unlike traditional sinks, TRBE can generate interrupts to signal including
> many other things, buffer got filled. The interrupt is a PPI and should be
> communicated from the platform. DT or ACPI entry representing TRBE should
> have the PPI number for a given platform. During perf session, the TRBE IRQ
> handler should capture trace for perf auxiliary buffer before restarting it
> back. System registers being used here to configure ETE and TRBE could be
> referred in the link below.
> 
> https://developer.arm.com/docs/ddi0601/g/aarch64-system-registers.
> 
> This adds another change where CoreSight sink device needs to be disabled
> before capturing the trace data for perf in order to avoid race condition
> with another simultaneous TRBE IRQ handling. This might cause problem with
> traditional sink devices which can be operated in both sysfs and perf mode.
> This needs to be addressed correctly. One option would be to move the
> update_buffer callback into the respective sink devices. e.g, disable().
> 
> This series is primarily looking from some early feed back both on proposed
> design and its implementation. It acknowledges, that it might be incomplete
> and will have scopes for improvement.
> 
> Things todo:
> - Improve ETE-TRBE coupling and decoupling method
> - Improve TRBE IRQ handling for all possible corner cases
> - Implement sysfs based trace sessions
> 
> [0] 
> https://lore.kernel.org/linux-arm-kernel/20201028220945.3826358-1-suzuki.poulose@arm.com/
> [1] 
> https://lore.kernel.org/linux-arm-kernel/1600396210-54196-1-git-send-email-jonathan.zhouwen@huawei.com/
> [2] 
> https://gitlab.arm.com/linux-arm/linux-skp/-/tree/coresight/etm/v8.4-self-hosted
> 
> Anshuman Khandual (6):
>   arm64: Add TRBE definitions
>   coresight: sink: Add TRBE driver
>   coresight: etm-perf: Truncate the perf record if handle has no space
>   coresight: etm-perf: Disable the path before capturing the trace data
>   coresgith: etm-perf: Connect TRBE sink with ETE source
>   dts: bindings: Document device tree binding for Arm TRBE
> 
> Suzuki K Poulose (5):
>   coresight: etm-perf: Allow an event to use different sinks
>   coresight: Do not scan for graph if none is present
>   coresight: etm4x: Add support for PE OS lock
>   coresight: ete: Add support for sysreg support
>   coresight: ete: Detect ETE as one of the supported ETMs
> 
>  .../devicetree/bindings/arm/coresight.txt          |   3 +
>  Documentation/devicetree/bindings/arm/trbe.txt     |  20 +
>  Documentation/trace/coresight/coresight-trbe.rst   |  36 +
>  arch/arm64/include/asm/sysreg.h                    |  51 ++
>  drivers/hwtracing/coresight/Kconfig                |  11 +
>  drivers/hwtracing/coresight/Makefile               |   1 +
>  drivers/hwtracing/coresight/coresight-etm-perf.c   |  85 ++-
>  drivers/hwtracing/coresight/coresight-etm-perf.h   |   4 +
>  drivers/hwtracing/coresight/coresight-etm4x-core.c | 144 +++-
>  drivers/hwtracing/coresight/coresight-etm4x.h      |  64 +-
>  drivers/hwtracing/coresight/coresight-platform.c   |   9 +-
>  drivers/hwtracing/coresight/coresight-trbe.c       | 768 +++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-trbe.h       | 525 ++++++++++++++
>  include/linux/coresight.h                          |   2 +
>  14 files changed, 1680 insertions(+), 43 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/arm/trbe.txt
>  create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
>  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
> 
> -- 
> 2.7.4
> 
> _______________________________________________
> CoreSight mailing list
> CoreSight@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/coresight
Mike Leach Nov. 16, 2020, 3 p.m. UTC | #3
Hi Anshuman,

I've not looked in detail at this set yet, but having skimmed through
it, I do have an initial question about the handling of wrapped data
buffers.

With the ETR/ETB we found an issue with the way perf concatenated data
captured from the hardware buffer into a single contiguous data
block. The issue occurs when a wrapped buffer appears after another
buffer in the data file. In a typical session perf would stop trace
and copy the hardware buffer multiple times into the auxtrace buffer.

e.g.

For ETR/ETB we have a fixed length hardware data buffer - and no way
of detecting buffer wraps using interrupts as the tracing is in
progress.

If the buffer is not full at the point that perf transfers it then the
data will look like this:-
1) <async><synced trace data>
easy to decode, we can see the async at the start of the data - which
would be the async issued at the start of trace.

If the buffer wraps we see this:-

2) <unsynced trace data><async><synced trace data>

Again no real issue, the decoder will skip to the async and trace from
there - we lose the unsynced data.

Now the problem occurs when multiple transfers of data occur. We can
see the following appearing as contiguous trace in the auxtrace
buffer:-

3) <async><synced trace data><unsynced trace data><async><synced trace data>

Now the decoder cannot spot the point that the synced data from the
first capture ends, and the unsynced data from the second capture
begins.
This means it will continue to decode into the unsynced data - which
will result in incorrect trace / outright errors. To get round this
for ETR/ETB the driver will insert barrier packets into the datafile
if a wrap event is detected.

4) <async><synced trace data><barrier><unsynced trace data><async><synced trace data>

This <barrier> has the effect of resetting the decoder into the
unsynced state so that the invalid trace is not decoded. This is a
workaround we have to do to handle the limitations of the ETR / ETB
trace hardware.

For TRBE we do have interrupts, so it should be possible to prevent
the buffer wrapping in most cases - but I did see in the code that
there are handlers for the TRBE buffer wrap management event. Are
there other factors in play that will prevent data pattern 3) from
appearing in the auxtrace buffer?
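
The difference between patterns 3) and 4) can be sketched with a toy model.
This is only an illustration under stated assumptions: real ETM/ETE trace is
a byte stream rather than tagged tokens, and every name below (enum pkt,
append_capture(), misdecoded(), demo_misdecoded()) is hypothetical and not
taken from the driver or from the decoder library. PKT_STALE stands for the
unsynced data at the start of a wrapped buffer, which the decoder cannot
tell apart from valid trace.

```c
#include <stddef.h>

/*
 * Toy packet kinds. Real ETM/ETE trace is a byte stream; the tagged
 * tokens and every name below are hypothetical, for illustration only.
 */
enum pkt {
	PKT_ASYNC,	/* synchronisation marker the decoder can lock onto */
	PKT_DATA,	/* trace that follows an async: decodable */
	PKT_STALE,	/* partial trace left at the start of a wrapped buffer */
	PKT_BARRIER,	/* driver-inserted marker: forces the decoder unsynced */
};

/* Append one capture, preceded by a barrier if the hardware buffer wrapped. */
static size_t append_capture(enum pkt *out, size_t pos,
			     const enum pkt *cap, size_t len,
			     int wrapped, int use_barrier)
{
	size_t i;

	if (wrapped && use_barrier)
		out[pos++] = PKT_BARRIER;
	for (i = 0; i < len; i++)
		out[pos++] = cap[i];
	return pos;
}

/*
 * Toy decoder: count stale packets that get decoded as if they were valid.
 * The decoder cannot tell PKT_STALE from PKT_DATA; only its sync state
 * decides whether a packet is consumed or skipped.
 */
static size_t misdecoded(const enum pkt *s, size_t len)
{
	size_t i, bad = 0;
	int synced = 0;

	for (i = 0; i < len; i++) {
		if (s[i] == PKT_ASYNC)
			synced = 1;
		else if (s[i] == PKT_BARRIER)
			synced = 0;
		else if (s[i] == PKT_STALE && synced)
			bad++;
	}
	return bad;
}

/* Build pattern 3) (use_barrier == 0) or 4) (use_barrier != 0), decode it. */
static size_t demo_misdecoded(int use_barrier)
{
	static const enum pkt first[] = { PKT_ASYNC, PKT_DATA, PKT_DATA };
	static const enum pkt second[] = {
		PKT_STALE, PKT_STALE, PKT_ASYNC, PKT_DATA
	};
	enum pkt out[16];
	size_t n = 0;

	n = append_capture(out, n, first, 3, 0, use_barrier);
	n = append_capture(out, n, second, 4, 1, use_barrier);
	return misdecoded(out, n);
}
```

In this model, demo_misdecoded(0) corresponds to pattern 3) and decodes the
stale packets of the second capture as if they were valid, while
demo_misdecoded(1) corresponds to pattern 4) and skips them until the next
<async>.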

Regards

Mike





On Sat, 14 Nov 2020 at 05:17, Tingwei Zhang <tingweiz@codeaurora.org> wrote:
>
> Hi Anshuman,
>
> On Tue, Nov 10, 2020 at 08:44:58PM +0800, Anshuman Khandual wrote:
> > This series enables future IP trace features Embedded Trace Extension (ETE)
> > and Trace Buffer Extension (TRBE). This series depends on the ETM system
> > register instruction support series [0] and the v8.4 Self hosted tracing
> > support series (Jonathan Zhou) [1]. The tree is available here [2] for
> > quick access.
> >
> > ETE is the PE (CPU) trace unit for CPUs, implementing future architecture
> > extensions. ETE overlaps with the ETMv4 architecture, with additions to
> > support the newer architecture features and some restrictions on the
> > supported features w.r.t ETMv4. The ETE support is added by extending the
> > ETMv4 driver to recognise the ETE and handle the features as exposed by the
> > TRCIDRx registers. ETE only supports system instructions access from the
> > host CPU. The ETE could be integrated with a TRBE (see below), or with the
> > legacy CoreSight trace bus (e.g, ETRs). Thus the ETE follows same firmware
> > description as the ETMs and requires a node per instance.
> >
> > Trace Buffer Extensions (TRBE) implements a per CPU trace buffer, which is
> > accessible via the system registers and can be combined with the ETE to
> > provide a 1x1 configuration of source & sink. TRBE is being represented
> > here as a CoreSight sink. Primary reason is that the ETE source could work
> > with other traditional CoreSight sink devices. As TRBE captures the trace
> > data which is produced by ETE, it cannot work alone.
> >
> > TRBE representation here have some distinct deviations from a traditional
> > CoreSight sink device. Coresight path between ETE and TRBE are not built
> > during boot looking at respective DT or ACPI entries. Instead TRBE gets
> > checked on each available CPU, when found gets connected with respective
> > ETE source device on the same CPU, after altering its outward connections.
> > ETE TRBE path connection lasts only till the CPU is online. But ETE-TRBE
> > coupling/decoupling method implemented here is not optimal and would be
> > reworked later on.
>
> Only perf mode is supported in TRBE in current path. Will you consider
> support sysfs mode as well in following patch sets?
>
> Thanks,
> Tingwei
>
> >
> > Unlike traditional sinks, TRBE can generate interrupts to signal including
> > many other things, buffer got filled. The interrupt is a PPI and should be
> > communicated from the platform. DT or ACPI entry representing TRBE should
> > have the PPI number for a given platform. During perf session, the TRBE IRQ
> > handler should capture trace for perf auxiliary buffer before restarting it
> > back. System registers being used here to configure ETE and TRBE could be
> > referred in the link below.
> >
> > https://developer.arm.com/docs/ddi0601/g/aarch64-system-registers.
> >
> > This adds another change where CoreSight sink device needs to be disabled
> > before capturing the trace data for perf in order to avoid race condition
> > with another simultaneous TRBE IRQ handling. This might cause problem with
> > traditional sink devices which can be operated in both sysfs and perf mode.
> > This needs to be addressed correctly. One option would be to move the
> > update_buffer callback into the respective sink devices. e.g, disable().
> >
> > This series is primarily looking from some early feed back both on proposed
> > design and its implementation. It acknowledges, that it might be incomplete
> > and will have scopes for improvement.
> >
> > Things todo:
> > - Improve ETE-TRBE coupling and decoupling method
> > - Improve TRBE IRQ handling for all possible corner cases
> > - Implement sysfs based trace sessions
> >
> > [0]
> > https://lore.kernel.org/linux-arm-kernel/20201028220945.3826358-1-suzuki.poulose@arm.com/
> > [1]
> > https://lore.kernel.org/linux-arm-kernel/1600396210-54196-1-git-send-email-jonathan.zhouwen@huawei.com/
> > [2]
> > https://gitlab.arm.com/linux-arm/linux-skp/-/tree/coresight/etm/v8.4-self-hosted
> >
> > Anshuman Khandual (6):
> >   arm64: Add TRBE definitions
> >   coresight: sink: Add TRBE driver
> >   coresight: etm-perf: Truncate the perf record if handle has no space
> >   coresight: etm-perf: Disable the path before capturing the trace data
> >   coresgith: etm-perf: Connect TRBE sink with ETE source
> >   dts: bindings: Document device tree binding for Arm TRBE
> >
> > Suzuki K Poulose (5):
> >   coresight: etm-perf: Allow an event to use different sinks
> >   coresight: Do not scan for graph if none is present
> >   coresight: etm4x: Add support for PE OS lock
> >   coresight: ete: Add support for sysreg support
> >   coresight: ete: Detect ETE as one of the supported ETMs
> >
> >  .../devicetree/bindings/arm/coresight.txt          |   3 +
> >  Documentation/devicetree/bindings/arm/trbe.txt     |  20 +
> >  Documentation/trace/coresight/coresight-trbe.rst   |  36 +
> >  arch/arm64/include/asm/sysreg.h                    |  51 ++
> >  drivers/hwtracing/coresight/Kconfig                |  11 +
> >  drivers/hwtracing/coresight/Makefile               |   1 +
> >  drivers/hwtracing/coresight/coresight-etm-perf.c   |  85 ++-
> >  drivers/hwtracing/coresight/coresight-etm-perf.h   |   4 +
> >  drivers/hwtracing/coresight/coresight-etm4x-core.c | 144 +++-
> >  drivers/hwtracing/coresight/coresight-etm4x.h      |  64 +-
> >  drivers/hwtracing/coresight/coresight-platform.c   |   9 +-
> >  drivers/hwtracing/coresight/coresight-trbe.c       | 768 +++++++++++++++++++++
> >  drivers/hwtracing/coresight/coresight-trbe.h       | 525 ++++++++++++++
> >  include/linux/coresight.h                          |   2 +
> >  14 files changed, 1680 insertions(+), 43 deletions(-)
> >  create mode 100644 Documentation/devicetree/bindings/arm/trbe.txt
> >  create mode 100644 Documentation/trace/coresight/coresight-trbe.rst
> >  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.c
> >  create mode 100644 drivers/hwtracing/coresight/coresight-trbe.h
> >
> > --
> > 2.7.4
> >
Anshuman Khandual Nov. 23, 2020, 2:43 a.m. UTC | #4
Hello Tingwei,

On 11/14/20 10:47 AM, Tingwei Zhang wrote:
> Hi Anshuman,
> 
> On Tue, Nov 10, 2020 at 08:44:58PM +0800, Anshuman Khandual wrote:
>> This series enables future IP trace features Embedded Trace Extension (ETE)
>> and Trace Buffer Extension (TRBE). This series depends on the ETM system
>> register instruction support series [0] and the v8.4 Self hosted tracing
>> support series (Jonathan Zhou) [1]. The tree is available here [2] for
>> quick access.
>>
>> ETE is the PE (CPU) trace unit for CPUs, implementing future architecture
>> extensions. ETE overlaps with the ETMv4 architecture, with additions to
>> support the newer architecture features and some restrictions on the
>> supported features w.r.t ETMv4. The ETE support is added by extending the
>> ETMv4 driver to recognise the ETE and handle the features as exposed by the
>> TRCIDRx registers. ETE only supports system instructions access from the
>> host CPU. The ETE could be integrated with a TRBE (see below), or with the
>> legacy CoreSight trace bus (e.g, ETRs). Thus the ETE follows same firmware
>> description as the ETMs and requires a node per instance.
>>
>> Trace Buffer Extensions (TRBE) implements a per CPU trace buffer, which is
>> accessible via the system registers and can be combined with the ETE to
>> provide a 1x1 configuration of source & sink. TRBE is being represented
>> here as a CoreSight sink. Primary reason is that the ETE source could work
>> with other traditional CoreSight sink devices. As TRBE captures the trace
>> data which is produced by ETE, it cannot work alone.
>>
>> TRBE representation here have some distinct deviations from a traditional
>> CoreSight sink device. Coresight path between ETE and TRBE are not built
>> during boot looking at respective DT or ACPI entries. Instead TRBE gets
>> checked on each available CPU, when found gets connected with respective
>> ETE source device on the same CPU, after altering its outward connections.
>> ETE TRBE path connection lasts only till the CPU is online. But ETE-TRBE
>> coupling/decoupling method implemented here is not optimal and would be
>> reworked later on.
> Only perf mode is supported in TRBE in current path. Will you consider
> support sysfs mode as well in following patch sets?

Yes, either in subsequent versions or later on, after first getting the perf
based functionality enabled. Nonetheless, sysfs is also on the todo list as
mentioned in the cover letter.

- Anshuman
Anshuman Khandual Nov. 23, 2020, 3:40 a.m. UTC | #5
Hello Mike,

On 11/16/20 8:30 PM, Mike Leach wrote:
> Hi Anshuman,
> 
> I've not looked in detail at this set yet, but having skimmed through
> it  I do have an initial question about the handling of wrapped data
> buffers.
> 
> With the ETR/ETB we found an issue with the way perf concatenated data
> captured from the hardware buffer into a single contiguous data
> block. The issue occurs when a wrapped buffer appears after another
> buffer in the data file. In a typical session perf would stop trace
> and copy the hardware buffer multiple times into the auxtrace buffer.

The hardware buffer and the perf aux trace buffer are the same for TRBE,
hence there is no actual copy involved. Trace data gets pushed to user
space via perf_aux_output_end(), either from etm_event_stop() or from
the IRQ handler, i.e. arm_trbe_irq_handler(). Data transfer to user space
happens via updates to the perf aux buffer indices, i.e. head, tail and
wakeup. But logically, the data will appear as a stream of records to
user space while parsing the perf.data file.
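
As a rough illustration of "transfer by index update", here is a minimal
sketch of a shared ring where the producer writes in place and only the
indices move. It is not the perf AUX API: struct aux_ring and the aux_*()
helpers are hypothetical names, and the free-running head/tail counters are
an assumption made to keep the model small.

```c
#include <stddef.h>

/*
 * Hypothetical model of a shared trace ring. head and tail are
 * free-running byte counters; the offset inside the buffer is
 * counter % size.
 */
struct aux_ring {
	size_t size;	/* buffer size in bytes */
	size_t head;	/* total bytes written by the producer (hardware) */
	size_t tail;	/* total bytes consumed by the reader (user space) */
};

/* Bytes published but not yet consumed. */
static size_t aux_pending(const struct aux_ring *r)
{
	return r->head - r->tail;
}

/*
 * Publish nbytes that the hardware already wrote in place: no copy, only
 * the head index moves. Returns 0 and publishes nothing when the reader
 * has not freed enough space.
 */
static int aux_publish(struct aux_ring *r, size_t nbytes)
{
	if (nbytes > r->size - aux_pending(r))
		return 0;
	r->head += nbytes;
	return 1;
}

/* Reader side: consume up to nbytes, return how many were taken. */
static size_t aux_consume(struct aux_ring *r, size_t nbytes)
{
	size_t n = nbytes < aux_pending(r) ? nbytes : aux_pending(r);

	r->tail += n;
	return n;
}

/* Offset in the buffer where the next write lands. */
static size_t aux_write_offset(const struct aux_ring *r)
{
	return r->head % r->size;
}
```

Each successful aux_publish() call in this model plays the role of one
record handed to user space; a failed call is the case where the record
would have to be truncated because the reader has not caught up.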

> 
> e.g.
> 
> For ETR/ETB we have a fixed length hardware data buffer - and no way
> of detecting buffer wraps using interrupts as the tracing is in
> progress.

TRBE has an interrupt. Hence there will be an opportunity to insert any
additional packets if required to demarcate pre and post IRQ trace data
streams. 

> 
> If the buffer is not full at the point that perf transfers it then the
> data will look like this:-
> 1) <async><synced trace data>
> easy to decode, we can see the async at the start of the data - which
> would be the async issued at the start of trace.

Just curious, what makes the tracer generate the <async> trace packet?
Is there an explicit instruction, or is that how the tracer starts when
enabled?

> 
> If the buffer wraps we see this:-
> 
> 2) <unsynced trace data><async><synced trace data>
> 
> Again no real issue, the decoder will skip to the async and trace from
> there - we lose the unsynced data.

Could you please elaborate more on the difference between sync and async
trace data?

> 
> Now the problem occurs when multiple transfers of data occur. We can
> see the following appearing as contiguous trace in the auxtrace
> buffer:-
> 
> 3) < async><synced trace data><unsynced trace data><async><synced trace data>

So there is a wrap-around event between <synced trace data> and
<unsynced trace data>? Are there any other situations where this
might happen?

> 
> Now the decoder cannot spot the point that the synced data from the
> first capture ends, and the unsynced data from the second capture
> begins.

Got it.

> This means it will continue to decode into the unsynced data - which
> will result in incorrect trace / outright errors. To get round this
> for ETR/ETB the driver will insert barrier packets into the datafile
> if a wrap event is detected.

But you mentioned there are no IRQs on ETR/ETB. So how is the wrap event
even detected?

> 
> 4) <async><synced trace data><barrier><unsynced trace
> data><async><synced trace data>
> 
> This <barrier> has the effect of resetting the decoder into the
> unsynced state so that the invalid trace is not decoded. This is a
> workaround we have to do to handle the limitations of the ETR / ETB
> trace hardware.

Got it.

> 
> For TRBE we do have interrupts, so it should be possible to prevent
> the buffer wrapping in most cases - but I did see in the code that
> there are handlers for the TRBE buffer wrap management event. Are
> there other factors in play that will prevent data pattern 3) from
> appearing in the auxtrace buffer ?

On TRBE, the buffer wrapping cannot happen without generating an IRQ. I
would assume that ETE will then start again with an <async> data packet
first when the handler returns. Otherwise we might also have to insert
a similar barrier packet for the user space tool to reset. As trace data
should not get lost during a wrap event, ETE should complete the packet
after the handler returns, hence aux buffer should still have logically
contiguous stream of <synced trace data> to decode. I am not sure right
now, but will look into this.

- Anshuman
Mike Leach Nov. 23, 2020, 12:30 p.m. UTC | #6
Hi Anshuman,

On Mon, 23 Nov 2020 at 03:40, Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
> Hello Mike,
>
> On 11/16/20 8:30 PM, Mike Leach wrote:
> > Hi Anshuman,
> >
> > I've not looked in detail at this set yet, but having skimmed through
> > it I do have an initial question about the handling of wrapped data
> > buffers.
> >
> > With the ETR/ETB we found an issue with the way perf concatenated data
> > captured from the hardware buffer into a single contiguous data
> > block. The issue occurs when a wrapped buffer appears after another
> > buffer in the data file. In a typical session perf would stop trace
> > and copy the hardware buffer multiple times into the auxtrace buffer.
>
> The hardware buffer and perf aux trace buffer are the same for TRBE and
> hence there is no actual copy involved. Trace data gets pushed into the
> user space via perf_aux_output_end() either via etm_event_stop() or via
> the IRQ handler, i.e. arm_trbe_irq_handler(). Data transfer to user space
> happens via updates to perf aux buffer indices, i.e. head, tail, wake up.
> But logically, they will appear as a stream of records to the user space
> while parsing perf.data file.
>

Understood - I suspected this would use direct write to the aux trace
buffer, but the principle is the same. TRBE determines the location of
data in the buffer so even without a copy, it is possible to get
multiple TRBE "buffers" in the auxbuffer as the TRBE is stopped and
restarted. The later copy to userspace is independent of this.

> >
> > e.g.
> >
> > For ETR/ETB we have a fixed length hardware data buffer - and no way
> > of detecting buffer wraps using interrupts as the tracing is in
> > progress.
>
> TRBE has an interrupt. Hence there will be an opportunity to insert any
> additional packets if required to demarcate pre and post IRQ trace data
> streams.
>
> >
> > If the buffer is not full at the point that perf transfers it then the
> > data will look like this:-
> > 1) <async><synced trace data>
> > easy to decode, we can see the async at the start of the data - which
> > would be the async issued at the start of trace.
>
> Just curious, what makes the tracer generate the <async> trace packet?
> Is there an explicit instruction, or is that how the tracer starts when
> enabled?

ETM / ETE will generate an async at the start of trace, and then
periodically afterwards.

>
> >
> > If the buffer wraps we see this:-
> >
> > 2) <unsynced trace data><async><synced trace data>
> >
> > Again no real issue, the decoder will skip to the async and trace from
> > there - we lose the unsynced data.
>
> Could you please elaborate more on the difference between sync and async
> trace data ?
>

The decoder will start reading trace from the start of the buffer.
Unsynced trace is trace data that appears before the first async
packet. We cannot decode this as we do not know where the packet
boundaries are.
Synced trace is any data after the first async packet - the async
enables us to determine where the packet boundaries are so we can now
determine the packets and decode the trace.

For an unwrapped buffer, we always see the first async that the ETE
generated when the trace generation was started. In a wrapped buffer
we search till we find an async generated as part of the periodic
async packets.
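
A minimal sketch of the sync search described above, written in Python
purely for illustration. It assumes the A-Sync packet is eleven 0x00
bytes followed by 0x80, which is my understanding of the ETMv4 / ETE
encoding; the function name is hypothetical and this is not how any real
decoder is structured.

```python
# Assumption: an A-Sync packet is eleven 0x00 bytes followed by 0x80.
ASYNC = bytes(11) + b"\x80"


def split_at_first_async(buf: bytes):
    """Split a captured buffer into (unsynced, synced) parts.

    Everything before the first A-Sync cannot be decoded, because the
    packet boundaries are unknown. Everything from the A-Sync onwards
    can be decoded.
    """
    idx = buf.find(ASYNC)
    if idx < 0:
        # No A-Sync anywhere: the whole buffer is undecodable.
        return buf, b""
    return buf[:idx], buf[idx:]


# Unwrapped buffer: the A-Sync issued at trace start is the first thing seen.
unwrapped = ASYNC + b"\x01\x02"
# Wrapped buffer: leading bytes are the tail of some unknown packet.
wrapped = b"\xaa\xbb" + ASYNC + b"\x03"

print(split_at_first_async(unwrapped))
print(split_at_first_async(wrapped))
```

For the unwrapped case the unsynced part is empty; for the wrapped case
the two leading bytes are discarded and decoding starts at the periodic
async.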

> >
> > Now the problem occurs when multiple transfers of data occur. We can
> > see the following appearing as contiguous trace in the auxtrace
> > buffer:-
> >
> > 3) <async><synced trace data><unsynced trace data><async><synced trace data>
>
> So there is a wrap-around event between <synced trace data> and
> <unsynced trace data>? Are there any other situations where this
> might happen?

Not that I am aware of.

>
> >
> > Now the decoder cannot spot the point that the synced data from the
> > first capture ends, and the unsynced data from the second capture
> > begins.
>
> Got it.
>
> > This means it will continue to decode into the unsynced data - which
> > will result in incorrect trace / outright errors. To get round this
> > for ETR/ETB the driver will insert barrier packets into the datafile
> > if a wrap event is detected.
>
> But you mentioned there are no IRQs on ETR/ETB. So how is the wrap event
> even detected?

A bit in the status register tells us the buffer is full - i.e. the
write pointer has wrapped around to the location it started at.
We cannot tell how far, or if multiple wraps have occurred, just that
the event has occurred.
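
As a rough model of that workaround, the sketch below appends each
capture to the aux data and puts a barrier in front of any capture whose
status word reports "full". It is written in Python for illustration
only. The bit position, the barrier byte pattern and the function name
are assumptions made for the example, and how the real driver places
the barrier may differ.

```python
# Assumption: bit 0 of the status word is the "buffer full" flag.
STS_FULL = 1 << 0

# Assumption: the barrier is four 32-bit words of 0x7fffffff (16 bytes).
BARRIER = (0x7FFFFFFF).to_bytes(4, "little") * 4


def append_capture(aux: bytearray, capture: bytes, status: int) -> None:
    """Append one hardware capture to the aux data.

    If the status word says the buffer wrapped, emit a barrier first so
    that the decoder drops back to the unsynced state before it reaches
    the wrapped data.
    """
    if status & STS_FULL:
        aux += BARRIER
    aux += capture


aux = bytearray()
append_capture(aux, b"first", 0)          # no wrap: pattern 1)
append_capture(aux, b"second", STS_FULL)  # wrapped: barrier goes in front
print(bytes(aux))
```

The status bit only says that a wrap happened, so the sketch cannot tell
how much data was overwritten; it can only mark the boundary.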

>
> >
> > 4) <async><synced trace data><barrier><unsynced trace
> > data><async><synced trace data>
> >
> > This <barrier> has the effect of resetting the decoder into the
> > unsynced state so that the invalid trace is not decoded. This is a
> > workaround we have to do to handle the limitations of the ETR / ETB
> > trace hardware.
> Got it.
>
> >
> > For TRBE we do have interrupts, so it should be possible to prevent
> > the buffer wrapping in most cases - but I did see in the code that
> > there are handlers for the TRBE buffer wrap management event. Are
> > there other factors in play that will prevent data pattern 3) from
> > appearing in the auxtrace buffer ?
>
> On TRBE, the buffer wrapping cannot happen without generating an IRQ. I
> would assume that ETE will then start again with an <async> data packet
> first when the handler returns.

This would only occur if the ETE was stopped and flushed prior to the
wrap event. Does this happen? I am assuming that the sink is
independent from the ETE, as ETM are from ETR.

> Otherwise we might also have to insert
> a similar barrier packet for the user space tool to reset. As trace data
> should not get lost during a wrap event,

My understanding is that if a wrap has even occurred, then data is already lost.


> ETE should complete the packet
> after the handler returns, hence aux buffer should still have logically
> contiguous stream of <synced trace data> to decode. I am not sure right
> now, but will look into this.
>

So you are relying on backpressure to stop ETE emitting packets? This
could result in trace being lost due to overflow if the IRQ is not
handled sufficiently quickly.

Regards

Mike

> - Anshuman


--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK