[v9,0/7] kvm/coresight: Support exclude guest and exclude host

Message ID 20250106142446.628923-1-james.clark@linaro.org (mailing list archive)

Message

James Clark Jan. 6, 2025, 2:24 p.m. UTC
FEAT_TRF is a Coresight feature that allows trace capture to be
completely filtered at different exception levels, unlike the existing
TRCVICTLR controls which may still emit target addresses of branches,
even if the following trace is filtered.

Without FEAT_TRF, it was possible to start a trace session on a host and
also collect trace from the guest as TRCVICTLR was never programmed to
exclude guests (and it could still emit target addresses even if it
was).

With FEAT_TRF, the current behavior of trace in guests depends on
whether nVHE or VHE is being used. Both of the examples below are from
the host's point of view, as Coresight isn't accessible from guests.
This patchset is only relevant when FEAT_TRF exists; otherwise there
is no change.

Current behavior:

  nVHE/pKVM:

  Because the host and the guest are both using TRFCR_EL1, trace will be
  generated in guests depending on the same filter rules the host is
  using. For example if the host is tracing userspace only, then guest
  userspace trace will also be collected.

  (This is further limited by whether TRBE is used because an issue
  with TRBE means that it's completely disabled in nVHE guests, but it's
  possible to have other tracing components.)

  VHE:

  With VHE, the host filters will be in TRFCR_EL2, but the filters in
  TRFCR_EL1 will be active when the guest is running. Because we don't
  write to TRFCR_EL1, guest trace will be completely disabled.
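
  As a rough illustration of the two cases above, the filter that is in
  effect while a guest runs can be modelled as below. This is only a
  sketch: the enum and function names are made up for this example, and
  the VHE branch assumes the untouched TRFCR_EL1 reads as zero, which is
  what "completely disabled" implies. The bit positions are the
  architectural E0TRE (bit 0) and E1TRE/E2TRE (bit 1) trace enables.

```c
#include <assert.h>
#include <stdint.h>

/* Trace-enable bits that TRFCR_EL1 and TRFCR_EL2 have in common. */
#define TRFCR_E0TRE (UINT64_C(1) << 0) /* trace EL0 */
#define TRFCR_ExTRE (UINT64_C(1) << 1) /* trace EL1 (EL2 in TRFCR_EL2) */

/* Hypothetical mode selector, for illustration only. */
enum sketch_kvm_mode { SKETCH_NVHE, SKETCH_VHE };

/*
 * Filter bits effectively applied to a running guest before this
 * series. host_filter is what the host's Perf session programmed.
 */
static uint64_t sketch_guest_filter_before(enum sketch_kvm_mode mode,
					   uint64_t host_filter)
{
	/* Only the two trace-enable bits are modelled here. */
	assert((host_filter & ~(TRFCR_E0TRE | TRFCR_ExTRE)) == 0);

	if (mode == SKETCH_NVHE)
		return host_filter; /* host and guest share TRFCR_EL1 */

	/*
	 * VHE: the host filters live in TRFCR_EL2 and nothing writes
	 * TRFCR_EL1. Assumption: the untouched register is zero.
	 */
	return 0;
}
```

  So a host tracing userspace only (E0TRE set) also collects guest
  userspace on nVHE, and nothing from the guest on VHE.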

New behavior:

The guest filtering rules from the Perf session are now honored for both
nVHE and VHE modes. For VHE this is done by writing to TRFCR_EL12 at the
start of the Perf session and doing nothing further; for nVHE the guest
value is cached and written at guest switch. In pKVM, trace is now
disabled for both protected and unprotected guests.
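
To make the intent concrete, here is a minimal sketch of how separate
host and guest filter values could be derived from the Perf exclude_*
options. It is not the code from the series: struct sketch_filter,
struct sketch_trfcr and sketch_compute_trfcr() are hypothetical names,
and only the E0TRE/ExTRE enable bits are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TRFCR_E0TRE (UINT64_C(1) << 0) /* trace EL0 */
#define TRFCR_ExTRE (UINT64_C(1) << 1) /* trace EL1/EL2 */

/* Hypothetical stand-in for the relevant perf_event_attr bits. */
struct sketch_filter {
	bool exclude_user;
	bool exclude_kernel;
	bool exclude_host;
	bool exclude_guest;
};

/* Value for the host, and value to use while a guest is running. */
struct sketch_trfcr {
	uint64_t host;
	uint64_t guest;
};

static struct sketch_trfcr sketch_compute_trfcr(struct sketch_filter f,
						bool pkvm)
{
	struct sketch_trfcr out;
	uint64_t base = 0;

	/* The user/kernel selection applies to host and guest alike. */
	if (!f.exclude_user)
		base |= TRFCR_E0TRE;
	if (!f.exclude_kernel)
		base |= TRFCR_ExTRE;

	out.host = f.exclude_host ? 0 : base;
	/* In pKVM, guest trace is disabled regardless of the request. */
	out.guest = (f.exclude_guest || pkvm) ? 0 : base;

	assert(!pkvm || out.guest == 0);
	return out;
}
```

With exclude_kernel and exclude_host set, this yields host = 0 and
guest = E0TRE, i.e. only guest userspace is traced. Where the guest
value ends up (TRFCR_EL12 once, or TRFCR_EL1 at each guest switch) is
the VHE/nVHE difference described above.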

There is also an optimization where the Coresight drivers pass their
enabled state to KVM. This means in the common case KVM doesn't have to
touch any sysregs when the feature isn't in use.
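
The "do nothing in the common case" idea can be sketched for the nVHE
path roughly as follows. This is an assumption-laden model, not the
series' implementation: the struct and function names are invented
here (the field names only loosely echo trfcr_while_in_guest and
EL1_TRACING_CONFIGURED), and the register is a plain variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU host state shared between driver and KVM. */
struct sketch_host_state {
	bool el1_tracing_configured;   /* does a guest switch need work? */
	uint64_t trfcr_while_in_guest; /* value to install for the guest */
};

/* Called by the trace driver when a session starts or stops. */
static void sketch_set_el1_configuration(struct sketch_host_state *s,
					 uint64_t host_trfcr,
					 uint64_t guest_trfcr)
{
	assert(s);

	/* Same value as the host: no swap is needed at guest switch. */
	if (guest_trfcr == host_trfcr) {
		s->el1_tracing_configured = false;
		return;
	}

	s->trfcr_while_in_guest = guest_trfcr;
	s->el1_tracing_configured = true;
}

/* Returns how many register writes the guest switch performed. */
static int sketch_switch_to_guest(const struct sketch_host_state *s,
				  uint64_t *trfcr_el1)
{
	if (!s->el1_tracing_configured)
		return 0; /* common case: no sysreg is touched */

	*trfcr_el1 = s->trfcr_while_in_guest;
	return 1;
}
```

The point of the design is that the decision is made once, when the
session is configured, so the hot guest-switch path only tests a flag.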

Applies to kvmarm/next (00163be8bb59).

---

Changes since V8 [8]:
  * Rename guest_trfcr_el1 -> trfcr_while_in_guest
  * Rename GUEST_FILTER -> EL1_TRACING_CONFIGURED
  * Rename kvm_set_trfcr() -> kvm_tracing_set_el1_configuration()
  * #include ordering
  * Reorder Coresight driver to remove need for preempt_disable()
    to avoid the warning
  * Force EL1_TRACING_CONFIGURED on for pKVM which drops an additional
    special case but still disables trace
  * Change set/clear trfcr to a single function that disables swapping
    if it has the same value as the host
  * Make the drain condition a bit clearer with __trace_needs_drain()
    instead of host trfcr != 0 (Or checking individual E*TRE bits)
  * Drain is only really required on switch to guest so move it there
  * Only for pKVM, restore the original behavior of draining whenever
    TRBE is enabled. This prevents a hypothetical case where a host has
    the filters disabled but hasn't drained yet, which we had by only
    looking at host trfcr != 0

Changes since V7 [6]:
  * Drop SPE changes
  * Change the interface to be based on intent, i.e. kvm_enable_trbe()
    rather than passing the raw register value
  * Drop change to re-use vcpu_flags mechanism in favour of [7]
  * Simplify by using the same switch function to and from guest

Changes since V6 [5]:
  * Implement a better "do nothing" case where both the SPE and Coresight
    drivers give the enabled state to KVM, allowing some register
    reads to be dropped.
  * Move the state and feature flags out of the vCPU into the per-CPU
    host_debug_state.
  * Simplify the switch logic by adding a new flag HOST_STATE_SWAP_TRFCR
    and only storing a single TRFCR value.
  * Rename vcpu flag macros to a more generic kvm_flag...

Changes since V5 [4]:
  * Sort new sysreg entries by encoding
  * Add a comment about sorting arch/arm64/tools/sysreg
  * Warn on preemptible() before calling smp_processor_id()
  * Pickup tags
  * Change TRFCR_EL2 from SysregFields to Sysreg because it was only
    used once

Changes since V4 [3]:
  * Remove all V3 changes that made it work in pKVM and just disable
    trace there instead
  * Restore PMU host/hyp state sharing back to how it was
    (kvm_pmu_update_vcpu_events())
  * Simplify some of the duplication in the comments and function docs
  * Add a WARN_ON_ONCE() if kvm_etm_set_guest_trfcr() is called when
    the trace filtering feature doesn't exist.
  * Split sysreg change into a tools update followed by the new register
    addition

Changes since V3:
  * Create a new shared area to store the host state instead of copying
    it before each VCPU run
  * Drop commit that moved SPE and trace registers from host_debug_state
    into the kvm sysregs array because the guest values were never used
  * Document kvm_etm_set_guest_trfcr()
  * Guard kvm_etm_set_guest_trfcr() with a feature check
  * Drop Mark B and Suzuki's review tags on the sysreg patch because it
    turned out that it broke the Perf build and needed some unconventional
    changes to fix it (as in: to update the tools copy of the headers in
    the same commit as the kernel changes)

Changes since V2:

  * Add a new iflag to signify presence of FEAT_TRF and keep the
    existing TRBE iflag. This fixes the issue where TRBLIMITR_EL1 was
    being accessed even if TRBE didn't exist
  * Reword a commit message

Changes since V1:

  * Squashed all the arm64/tools/sysreg changes into the first commit
  * Add a new commit to move SPE and TRBE regs into the kvm sysreg array
  * Add a comment above the TRFCR global that it's per host CPU rather
    than vcpu

Changes since nVHE RFC [1]:

 * Re-write just in terms of the register value to be written for the
   host and the guest. This removes some logic from the hyp code and
   a value of kvm_vcpu_arch:trfcr_el1 = 0 no longer means "don't
   restore".
 * Remove all the conditional compilation and new files.
 * Change the kvm_etm_update_vcpu_events macro to a function.
 * Re-use DEBUG_STATE_SAVE_TRFCR so iflags don't need to be expanded
   anymore.
 * Expand the cover letter.

Changes since VHE v3 [2]:

 * Use the same interface as nVHE mode so TRFCR_EL12 is now written by
   kvm.

[1]: https://lore.kernel.org/kvmarm/20230804101317.460697-1-james.clark@arm.com/
[2]: https://lore.kernel.org/kvmarm/20230905102117.2011094-1-james.clark@arm.com/
[3]: https://lore.kernel.org/linux-arm-kernel/20240104162714.1062610-1-james.clark@arm.com/
[4]: https://lore.kernel.org/all/20240220100924.2761706-1-james.clark@arm.com/
[5]: https://lore.kernel.org/linux-arm-kernel/20240226113044.228403-1-james.clark@arm.com/
[6]: https://lore.kernel.org/kvmarm/20241112103717.589952-1-james.clark@linaro.org/T/#t
[7]: https://lore.kernel.org/kvmarm/20241115224924.2132364-4-oliver.upton@linux.dev/
[8]: https://lore.kernel.org/linux-arm-kernel/20241127100130.1162639-1-james.clark@linaro.org/

James Clark (7):
  arm64/sysreg: Add a comment that the sysreg file should be sorted
  tools: arm64: Update sysreg.h header files
  arm64/sysreg/tools: Move TRFCR definitions to sysreg
  coresight: trbe: Remove redundant disable call
  KVM: arm64: coresight: Give TRBE enabled state to KVM
  KVM: arm64: Support trace filtering for guests
  coresight: Pass guest TRFCR value to KVM

 arch/arm64/include/asm/kvm_host.h             |  11 +
 arch/arm64/include/asm/sysreg.h               |  12 -
 arch/arm64/kvm/debug.c                        |  50 ++-
 arch/arm64/kvm/hyp/nvhe/debug-sr.c            |  63 +--
 arch/arm64/tools/sysreg                       |  38 ++
 .../coresight/coresight-etm4x-core.c          |  49 ++-
 drivers/hwtracing/coresight/coresight-etm4x.h |   2 +-
 drivers/hwtracing/coresight/coresight-priv.h  |   3 +
 .../coresight/coresight-self-hosted-trace.h   |   9 -
 drivers/hwtracing/coresight/coresight-trbe.c  |  15 +-
 tools/arch/arm64/include/asm/sysreg.h         | 410 +++++++++++++++++-
 tools/include/linux/kasan-tags.h              |  15 +
 12 files changed, 599 insertions(+), 78 deletions(-)
 create mode 100644 tools/include/linux/kasan-tags.h

Comments

Marc Zyngier Jan. 6, 2025, 2:48 p.m. UTC | #1
On Mon, 06 Jan 2025 14:24:35 +0000,
James Clark <james.clark@linaro.org> wrote:
> Applies to kvmarm/next (00163be8bb59).

Can you *PLEASE* stop this absolute nonsense of posting patches
based on top of random commits? Please look at how we integrate new
developments: they are *always* based on an early -rc tag (usually
-rc3).

If you depend on other patches, add them to your series and post the
whole thing.

Thanks,

	M.
James Clark Jan. 7, 2025, 11:37 a.m. UTC | #2
On 06/01/2025 2:48 pm, Marc Zyngier wrote:
> On Mon, 06 Jan 2025 14:24:35 +0000,
> James Clark <james.clark@linaro.org> wrote:
>> Applies to kvmarm/next (00163be8bb59).
> 
> Can you *PLEASE* stop this absolute nonsense of posting patches
> based on top of random commits? Please look at how we integrate new
> developments: they are *always* based on an early -rc tag (usually
> -rc3).
> 
> If you depend on other patches, add them to your series and post the
> whole thing.
> 
> Thanks,
> 
> 	M.
> 

Sure, I re-posted it on the latest -rc with a few commits picked up.

Thanks
James