[RFCv1,0/4] arm64: Use static key for PID in CONTEXTIDR

Message ID 20211021134530.206216-1-leo.yan@linaro.org (mailing list archive)

Leo Yan Oct. 21, 2021, 1:45 p.m. UTC
The kernel provides the configuration option PID_IN_CONTEXTIDR so that
the PID can be saved into the system register contextidr_el{1|2}.  This
means developers must build the kernel with this option; otherwise the
feature is disabled and cannot be used on the fly by hardware tracing
modules (such as Arm CoreSight, SPE, etc.).

As suggested by Stephane Eranian, this patch series introduces a static
key for PID in CONTEXTIDR.  If the config PID_IN_CONTEXTIDR is selected
when building the kernel, the static key is set to true and the kernel
always traces the PID into CONTEXTIDR, so the semantics of
PID_IN_CONTEXTIDR stay the same before and after applying this series.

If the config PID_IN_CONTEXTIDR is not selected, kernel modules can
still invoke the paired functions contextidr_enable() and
contextidr_disable() to dynamically enable and disable PID tracing in
the profiling flow.  Arm SPE is the first consumer of the static key.
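
As a rough illustration of the intended semantics, the following userspace
sketch models the enable/disable pair. Only the names contextidr_enable()
and contextidr_disable() come from this series; the reference count, the
MODEL_PID_IN_CONTEXTIDR macro, model_contextidr and model_thread_switch()
are hypothetical stand-ins for the static key, the Kconfig option, the
system register and the context-switch hook. Whether the real key is
reference counted is an assumption here, not something this cover letter
states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for CONFIG_PID_IN_CONTEXTIDR (0 = not selected). */
#ifndef MODEL_PID_IN_CONTEXTIDR
#define MODEL_PID_IN_CONTEXTIDR 0
#endif

/*
 * Models the static key as a user count (an assumption). When the config
 * is selected the count starts at 1, so PID tracing is always on.
 */
int contextidr_users = MODEL_PID_IN_CONTEXTIDR ? 1 : 0;

/* Simulated CONTEXTIDR register; the kernel would write a sysreg instead. */
uint32_t model_contextidr;

bool contextidr_in_use(void)
{
    return contextidr_users > 0;
}

/* Called by a consumer (e.g. a profiling driver) when it starts tracing. */
void contextidr_enable(void)
{
    contextidr_users++;
}

/* Called by the same consumer when it stops; must pair with an enable. */
void contextidr_disable(void)
{
    assert(contextidr_users > 0);
    contextidr_users--;
}

/* Models the context-switch hook: store the PID only while the key is on. */
void model_thread_switch(uint32_t next_pid)
{
    if (!contextidr_in_use())
        return;
    model_contextidr = next_pid;
}
```

In this model, a context switch leaves model_contextidr untouched until a
consumer calls contextidr_enable(), and stops updating it again once the
last consumer has called contextidr_disable().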

While reviewing the Arm CoreSight driver, I found that it fails to check
for the root PID namespace in its register setting.  So a dedicated
patch series will first correct the namespace checking and then apply
the static key for PID tracing.  For this reason, this patch set doesn't
contain any Arm CoreSight related enhancement.

We also need to provide an arm32 variant that uses a static key for PID
in CONTEXTIDR.  I'd like to send out this patch series first for
comment, in case this approach is not accepted by the maintainers.  If
we can conclude this is the right thing to do, I will add arm32 support
in the next spin.

This patch set has been verified on a Hisilicon D06 board with the Arm
SPE driver.

I tested the performance impact of using the static key; the results
don't show any regression.  For the testing, I used the command
'perf bench sched messaging -t -g 20 -l 1000' to measure the scheduling
latency in 4 different modes:

    'dis': Disable kernel configuration PID_IN_CONTEXTIDR.
    'enb': Enable kernel configuration PID_IN_CONTEXTIDR.
    'true': Set static key to 'TRUE'.
    'false': Set static key to 'FALSE' (so don't store PID into CONTEXTIDR)

The test runs 5 iterations for every configuration and records the run
time (in seconds):

         |  dis   |  enb   |  true  | false
    -----+--------+--------+--------+-------
    #1   | 26.568 | 26.786 | 26.056 | 26.197
    #2   | 26.442 | 26.457 | 26.458 | 26.846
    #3   | 26.719 | 26.701 | 27.119 | 26.281
    #4   | 26.448 | 27.595 | 26.953 | 27.043
    #5   | 27.017 | 27.263 | 26.638 | 26.933
    -----+--------+--------+--------+-------
    avg. | 26.638 | 26.960 | 26.644 | 26.660
    -----+--------+--------+--------+-------
    delta pct.    |  1.21% |  0.02% |  0.08%
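
For reference, the averages and the 'delta pct.' row can be recomputed
with a small helper like the one below. It assumes the delta is the
relative change of each mode's average against the 'dis' average, which
the numbers in the table are consistent with. The function and array
names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define RUNS 5

/* Run times in seconds, copied from the table above, one array per mode. */
const double run_dis[RUNS]   = { 26.568, 26.442, 26.719, 26.448, 27.017 };
const double run_enb[RUNS]   = { 26.786, 26.457, 26.701, 27.595, 27.263 };
const double run_true[RUNS]  = { 26.056, 26.458, 27.119, 26.953, 26.638 };
const double run_false[RUNS] = { 26.197, 26.846, 26.281, 27.043, 26.933 };

/* Arithmetic mean of n samples. */
double mean(const double *samples, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* Relative change of value against baseline, in percent. */
double delta_pct(double value, double baseline)
{
    return (value - baseline) / baseline * 100.0;
}
```

For example, delta_pct(mean(run_enb, RUNS), mean(run_dis, RUNS)) comes
out at roughly 1.21, matching the 'enb' column.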


Leo Yan (4):
  arm64: Use static key for tracing PID in CONTEXTIDR
  arm64: entry: Always apply workaround for contextidr_el1
  arm64: Introduce functions for controlling PID tracing
  perf: arm_spe: Dynamically switch PID tracing to contextidr

 arch/arm64/include/asm/mmu_context.h | 14 +++++++++++++-
 arch/arm64/kernel/entry.S            |  4 ----
 arch/arm64/kernel/process.c          | 11 +++++++++++
 drivers/perf/arm_spe_pmu.c           | 13 ++++++++++++-
 4 files changed, 36 insertions(+), 6 deletions(-)