[v4] arm64: kernel: implement fast refcount checking

Message ID 20170731192251.12491-1-ard.biesheuvel@linaro.org
State New

Commit Message

Ard Biesheuvel July 31, 2017, 7:22 p.m. UTC
This adds support to arm64 for fast refcount checking, as proposed by
Kees for x86 based on the implementation by grsecurity/PaX.

The general approach is identical: the existing atomic_t helpers are
cloned for refcount_t, with the arithmetic instruction modified to set
the PSTATE flags, and one or two branch instructions added that jump to
an out of line handler if overflow, decrement to zero or increment from
zero are detected.
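In plain C, the semantics those checks give each helper can be sketched as follows. This is illustrative only: the real code performs the update with LL/SC or LSE atomics and traps into an out-of-line brk handler rather than calling a function. The saturation value (INT_MIN / 2) matches the handler in this patch; the helper names are made up for the sketch.

```c
#include <limits.h>

/* Saturation value used by the brk handler in this patch. */
#define REFCOUNT_SATURATED	(INT_MIN / 2)

static int refcount_saturate(int *counter)
{
	*counter = REFCOUNT_SATURATED;	/* handler overwrites the bad value */
	return REFCOUNT_SATURATED;
}

/* Checked add: pre-check catches increment-from-zero, post-check
 * catches overflow into zero or negative territory. */
static int refcount_add_checked(int i, int *counter)
{
	int old = *counter;
	/* unsigned arithmetic so wraparound is well-defined, as on the CPU */
	int newval = (int)((unsigned int)old + (unsigned int)i);

	*counter = newval;		/* the new value is stored first... */
	if (old == 0)			/* ...then the pre-check (ccmp)... */
		return refcount_saturate(counter);
	if (newval <= 0)		/* ...and post-check (b.eq/b.mi) fire */
		return refcount_saturate(counter);
	return newval;
}
```

Note the ordering: as the v4 changelog explains, the (possibly bad) result is written back before the handler saturates it, which is what keeps the fast path branch-predictor friendly.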

One complication that we have to deal with on arm64 is the fact that
it has two atomics implementations: the original LL/SC implementation
using load/store exclusive loops, and the newer LSE one that does mostly
the same in a single instruction. So we need to clone some parts of
both for the refcount handlers, but we also need to deal with the way
LSE builds fall back to LL/SC at runtime if the hardware does not
support it. (The only exception is refcount_add_not_zero(), which
updates the refcount conditionally, so it is only implemented using
a load/store exclusive loop)

As is the case with the x86 version, the performance delta is in the
noise (Cortex-A57 @ 2 GHz, using LL/SC not LSE), even though the arm64
implementation incorporates an add-from-zero check as well:

perf stat -B -- cat <(echo ATOMIC_TIMING) >/sys/kernel/debug/provoke-crash/DIRECT

 Performance counter stats for 'cat /dev/fd/63':

      65758.632112      task-clock (msec)         #    1.000 CPUs utilized
                 2      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                47      page-faults               #    0.001 K/sec
      131421735632      cycles                    #    1.999 GHz
       36752227542      instructions              #    0.28  insn per cycle
   <not supported>      branches
            961008      branch-misses

      65.785264736 seconds time elapsed

perf stat -B -- cat <(echo REFCOUNT_TIMING) >/sys/kernel/debug/provoke-crash/DIRECT

 Performance counter stats for 'cat /dev/fd/63':

      65734.255992      task-clock (msec)         #    1.000 CPUs utilized
                 2      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                46      page-faults               #    0.001 K/sec
      131376830467      cycles                    #    1.999 GHz
       43183673156      instructions              #    0.33  insn per cycle
   <not supported>      branches
            879345      branch-misses

      65.735309648 seconds time elapsed

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
v4: Implement add-from-zero checking using a conditional compare rather than
    the conditional branch I omitted from v3 due to its 10% performance hit:
    this results in the new refcount value being written back to memory
    before the handler is invoked, which is more in line with the other
    checks, and is apparently much easier on the branch predictor, given
    that there is no measurable performance hit at all.

 arch/arm64/Kconfig                    |  1 +
 arch/arm64/include/asm/atomic.h       | 25 +++++++
 arch/arm64/include/asm/atomic_ll_sc.h | 29 ++++++++
 arch/arm64/include/asm/atomic_lse.h   | 56 +++++++++++++++
 arch/arm64/include/asm/brk-imm.h      |  1 +
 arch/arm64/include/asm/refcount.h     | 71 ++++++++++++++++++++
 arch/arm64/kernel/traps.c             | 29 ++++++++
 arch/arm64/lib/atomic_ll_sc.c         | 16 +++++
 8 files changed, 228 insertions(+)

Comments

Kees Cook July 31, 2017, 9:16 p.m. UTC | #1
On Mon, Jul 31, 2017 at 12:22 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> v4: Implement add-from-zero checking using a conditional compare rather than
>     a conditional branch, which I omitted from v3 due to the 10% performance
>     hit: this will result in the new refcount to be written back to memory
>     before invoking the handler, which is more in line with the other checks,
>     and is apparently much easier on the branch predictor, given that there
>     is no performance hit whatsoever.

So refcount_inc() and refcount_add(n, ...) will write 1 and n
respectively, then hit the handler to saturate? That seems entirely
fine to me: checking inc-from-zero is just a protection against a
possible double-free condition. It's still technically a race, but a
narrow race on a rare condition is better than being able to always
win it.

Nice!

-Kees
Ard Biesheuvel July 31, 2017, 9:21 p.m. UTC | #2
On 31 July 2017 at 22:16, Kees Cook <keescook@chromium.org> wrote:
> On Mon, Jul 31, 2017 at 12:22 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> v4: Implement add-from-zero checking using a conditional compare rather than
>>     a conditional branch, which I omitted from v3 due to the 10% performance
>>     hit: this will result in the new refcount to be written back to memory
>>     before invoking the handler, which is more in line with the other checks,
>>     and is apparently much easier on the branch predictor, given that there
>>     is no performance hit whatsoever.
>
> So refcount_inc() and refcount_add(n, ...) will write 1 and n
> respectively, then hit the handler to saturate?

Yes, but this is essentially what occurs on overflow and sub-to-zero
as well: the result is always stored before hitting the handler. Isn't
this the case for x86 as well?

> That seems entirely
> fine to me: checking inc-from-zero is just a protection against a
> possible double-free condition. It's still technically a race, but a
> narrow race on a rare condition is better than being able to always
> win it.
>

Indeed.

> Nice!
>

Thanks!
Kees Cook July 31, 2017, 9:36 p.m. UTC | #3
On Mon, Jul 31, 2017 at 2:21 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 31 July 2017 at 22:16, Kees Cook <keescook@chromium.org> wrote:
>> On Mon, Jul 31, 2017 at 12:22 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>>> v4: Implement add-from-zero checking using a conditional compare rather than
>>>     a conditional branch, which I omitted from v3 due to the 10% performance
>>>     hit: this will result in the new refcount to be written back to memory
>>>     before invoking the handler, which is more in line with the other checks,
>>>     and is apparently much easier on the branch predictor, given that there
>>>     is no performance hit whatsoever.
>>
>> So refcount_inc() and refcount_add(n, ...) will write 1 and n
>> respectively, then hit the handler to saturate?
>
> Yes, but this is essentially what occurs on overflow and sub-to-zero
> as well: the result is always stored before hitting the handler. Isn't
> this the case for x86 as well?

On x86, there's no check for inc/add-from-zero. Double-free would be:

- refcount_dec_and_test() to 0, free
- refcount_inc() to 1,
- refcount_dec_and_test() to 0, free again

Compared to the atomic_t implementation, this risk is unchanged. Also
this case is an "over decrement" which we can't actually protect
against. If the refcount_inc() above happens that means something is
still tracking the object (but it's already been freed, so the
use-after-free has already happened).
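That sequence can be made concrete with a toy model — a plain counter standing in for an unchecked atomic_t, with all names invented for the sketch:

```c
/* Toy model of an object whose lifetime is managed by an unchecked
 * reference count; 'frees' counts how often the "free" path runs. */
struct obj {
	int refs;
	int frees;
};

static void get(struct obj *o)
{
	o->refs++;	/* an inc-from-zero check would trap here */
}

static void put(struct obj *o)
{
	if (--o->refs == 0)
		o->frees++;	/* stand-in for kfree(o) */
}

/* The double-free sequence described above: with an unchecked counter
 * the free path runs twice. */
static int buggy_sequence(void)
{
	struct obj o = { .refs = 1, .frees = 0 };

	put(&o);	/* 1 -> 0, freed */
	get(&o);	/* 0 -> 1, but the object is already gone */
	put(&o);	/* 1 -> 0, freed again */
	return o.frees;
}
```

As the text notes, the get() after the first put() is already a use-after-free; the inc-from-zero check only narrows the window for turning it into a double free.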

x86 refcount_dec() to zero is checked, but this is mainly to find bad
counting in "over decrement" cases, when the code pattern around the
object is using unchecked refcount_dec() instead of
refcount_dec_and_test(). (Frankly, I'd like to see refcount_dec()
entirely removed from the refcount API...)

On overflow, though, no, since we haven't yet reached all the way
around to zero (i.e. it's caught before we can get all the way through
the negative space back through zero to 1 and have a
refcount_dec_and_test() trigger a free).

If I could find a fast way to do the precheck for zero on x86, though,
I'd like to have it, just to be extra-sure.

-Kees
Will Deacon Aug. 23, 2017, 2:58 p.m. UTC | #4
Hi Ard,

Comments of varying quality inline...

On Mon, Jul 31, 2017 at 08:22:51PM +0100, Ard Biesheuvel wrote:
> This adds support to arm64 for fast refcount checking, as proposed by
> Kees for x86 based on the implementation by grsecurity/PaX.
> 
> The general approach is identical: the existing atomic_t helpers are
> cloned for refcount_t, with the arithmetic instruction modified to set
> the PSTATE flags, and one or two branch instructions added that jump to
> an out of line handler if overflow, decrement to zero or increment from
> zero are detected.
> 
> One complication that we have to deal with on arm64 is the fact that
> it has two atomics implementations: the original LL/SC implementation
> using load/store exclusive loops, and the newer LSE one that does mostly
> the same in a single instruction. So we need to clone some parts of
> both for the refcount handlers, but we also need to deal with the way
> LSE builds fall back to LL/SC at runtime if the hardware does not
> support it. (The only exception is refcount_add_not_zero(), which
> updates the refcount conditionally, so it is only implemented using
> a load/store exclusive loop)
> 
> As is the case with the x86 version, the performance delta is in the
> noise (Cortex-A57 @ 2 GHz, using LL/SC not LSE), even though the arm64
> implementation incorporates an add-from-zero check as well:

How does this compare to versions built using cmpxchg?
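(For reference, a cmpxchg-style checked increment — sketched here with GCC's __atomic builtins, and with an illustrative saturation value and relaxed ordering rather than the exact lib/refcount.c code — would look something like the following.)

```c
#include <limits.h>
#include <stdbool.h>

/* cmpxchg-loop flavour of a checked increment: the checks run on the
 * old value before the store, so a bad value is never written back.
 * Saturation value and memory ordering are illustrative assumptions. */
static bool refcount_inc_not_zero_cmpxchg(int *counter)
{
	int old = __atomic_load_n(counter, __ATOMIC_RELAXED);

	do {
		if (old == 0)
			return false;		/* refuse inc-from-zero */
		if (old < 0 || old == INT_MAX) {
			/* already saturated, or about to overflow */
			__atomic_store_n(counter, INT_MIN / 2,
					 __ATOMIC_RELAXED);
			return true;
		}
	} while (!__atomic_compare_exchange_n(counter, &old, old + 1,
					      true, __ATOMIC_RELAXED,
					      __ATOMIC_RELAXED));
	return true;
}
```

The price of this style is a load/compare/retry loop on every operation, which is exactly what the flag-setting variants in this patch avoid.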

> 
> perf stat -B -- cat <(echo ATOMIC_TIMING) >/sys/kernel/debug/provoke-crash/DIRECT
> 
>  Performance counter stats for 'cat /dev/fd/63':
> 
>       65758.632112      task-clock (msec)         #    1.000 CPUs utilized
>                  2      context-switches          #    0.000 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 47      page-faults               #    0.001 K/sec
>       131421735632      cycles                    #    1.999 GHz
>        36752227542      instructions              #    0.28  insn per cycle
>    <not supported>      branches
>             961008      branch-misses
> 
>       65.785264736 seconds time elapsed
> 
> perf stat -B -- cat <(echo REFCOUNT_TIMING) >/sys/kernel/debug/provoke-crash/DIRECT
> 
>  Performance counter stats for 'cat /dev/fd/63':
> 
>       65734.255992      task-clock (msec)         #    1.000 CPUs utilized
>                  2      context-switches          #    0.000 K/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 46      page-faults               #    0.001 K/sec
>       131376830467      cycles                    #    1.999 GHz
>        43183673156      instructions              #    0.33  insn per cycle
>    <not supported>      branches
>             879345      branch-misses
> 
>       65.735309648 seconds time elapsed
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> v4: Implement add-from-zero checking using a conditional compare rather than
>     a conditional branch, which I omitted from v3 due to the 10% performance
>     hit: this will result in the new refcount to be written back to memory
>     before invoking the handler, which is more in line with the other checks,
>     and is apparently much easier on the branch predictor, given that there
>     is no performance hit whatsoever.
> 
>  arch/arm64/Kconfig                    |  1 +
>  arch/arm64/include/asm/atomic.h       | 25 +++++++
>  arch/arm64/include/asm/atomic_ll_sc.h | 29 ++++++++
>  arch/arm64/include/asm/atomic_lse.h   | 56 +++++++++++++++
>  arch/arm64/include/asm/brk-imm.h      |  1 +
>  arch/arm64/include/asm/refcount.h     | 71 ++++++++++++++++++++
>  arch/arm64/kernel/traps.c             | 29 ++++++++
>  arch/arm64/lib/atomic_ll_sc.c         | 16 +++++
>  8 files changed, 228 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index dfd908630631..53b9a8f5277b 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -16,6 +16,7 @@ config ARM64
>  	select ARCH_HAS_GCOV_PROFILE_ALL
>  	select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
>  	select ARCH_HAS_KCOV
> +	select ARCH_HAS_REFCOUNT
>  	select ARCH_HAS_SET_MEMORY
>  	select ARCH_HAS_SG_CHAIN
>  	select ARCH_HAS_STRICT_KERNEL_RWX
> diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
> index c0235e0ff849..ded9bde5f08f 100644
> --- a/arch/arm64/include/asm/atomic.h
> +++ b/arch/arm64/include/asm/atomic.h
> @@ -24,10 +24,35 @@
>  #include <linux/types.h>
>  
>  #include <asm/barrier.h>
> +#include <asm/brk-imm.h>
>  #include <asm/lse.h>
>  
>  #ifdef __KERNEL__
>  
> +#define REFCOUNT_CHECK_TAIL						\
> +"	.subsection	1\n"						\
> +"33:	mov		x16, %[counter]\n"				\
> +"	adr		x17, 44f\n"					\
> +"	brk		%[brk_imm]\n"					\
> +"44:	.long		22b - .\n"					\
> +"	.previous\n"
> +
> +#define REFCOUNT_POST_CHECK_NEG						\
> +"22:	b.mi		33f\n"						\
> +	REFCOUNT_CHECK_TAIL
> +
> +#define REFCOUNT_POST_CHECK_NEG_OR_ZERO					\
> +"	b.eq		33f\n"						\
> +	REFCOUNT_POST_CHECK_NEG
> +
> +#define REFCOUNT_PRE_CHECK_ZERO(reg)	"ccmp " #reg ", wzr, #8, pl\n"

How does this work for the add_lt case if we compute negative counter
value? afaict, we'll set the flags to #8 (HI), which won't get picked up
by the NEG_OR_ZERO post check.

> +#define REFCOUNT_PRE_CHECK_NONE(reg)
> +
> +#define REFCOUNT_INPUTS(r)						\
> +	[counter] "r" (&(r)->counter), [brk_imm] "i" (REFCOUNT_BRK_IMM),
> +
> +#define REFCOUNT_CLOBBERS	"cc", "x16", "x17"

It would be better to just have "cc" here, and then let the compiler
allocate the registers for the normal cases. That would allow us to use
__LL_SC_CLOBBERS for the LSE versions.

> +
>  #define __ARM64_IN_ATOMIC_IMPL
>  
>  #if defined(CONFIG_ARM64_LSE_ATOMICS) && defined(CONFIG_AS_LSE)
> diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
> index f5a2d09afb38..7037428b7efb 100644
> --- a/arch/arm64/include/asm/atomic_ll_sc.h
> +++ b/arch/arm64/include/asm/atomic_ll_sc.h
> @@ -327,4 +327,33 @@ __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
>  
>  #undef __CMPXCHG_DBL
>  
> +#define REFCOUNT_OP(op, asm_op, pre, post, l)				\
> +__LL_SC_INLINE int							\
> +__LL_SC_PREFIX(__refcount_##op(int i, atomic_t *r))			\
> +{									\
> +	unsigned long tmp;						\

This can be unsigned int.

> +	int result;							\
> +									\
> +	asm volatile("// refcount_" #op "\n"				\
> +"	prfm		pstl1strm, %2\n"				\
> +"1:	ldxr		%w1, %2\n"					\
> +"	" #asm_op "	%w[val], %w1, %w[i]\n"				\
> +	REFCOUNT_PRE_CHECK_ ## pre (%w1)				\
> +"	st" #l "xr	%w1, %w[val], %2\n"				\

Given that you don't avoid the store here, how do you ensure that the
counter saturates?

> +"	cbnz		%w1, 1b\n"					\
> +	REFCOUNT_POST_CHECK_ ## post					\
> +	: [val] "=&r" (result), "=&r" (tmp), "+Q" (r->counter)		\
> +	: REFCOUNT_INPUTS(r) [i] "Ir" (i)				\
> +	: REFCOUNT_CLOBBERS);						\
> +									\
> +	return result;							\

We're back in C here, so I don't think we can safely assume that the
compiler won't nobble the condition flags as part of the return. If it
does that, then the post-check in the caller for the case that the LL/SC
atomics are out-of-line won't necessary work correctly.

> +}									\
> +__LL_SC_EXPORT(__refcount_##op);
> +
> +REFCOUNT_OP(add_lt,     adds, ZERO, NEG_OR_ZERO,  );
> +REFCOUNT_OP(sub_lt_neg, adds, NONE, NEG,         l);
> +REFCOUNT_OP(sub_le_neg, adds, NONE, NEG_OR_ZERO, l);

Where are sub_{lt,le}_neg used?  (gah, I just spotted this!).

> +REFCOUNT_OP(sub_lt,     subs, NONE, NEG,         l);
> +REFCOUNT_OP(sub_le,     subs, NONE, NEG_OR_ZERO, l);
> +
>  #endif	/* __ASM_ATOMIC_LL_SC_H */
> diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
> index 99fa69c9c3cf..c00e02edc589 100644
> --- a/arch/arm64/include/asm/atomic_lse.h
> +++ b/arch/arm64/include/asm/atomic_lse.h
> @@ -531,4 +531,60 @@ __CMPXCHG_DBL(_mb, al, "memory")
>  #undef __LL_SC_CMPXCHG_DBL
>  #undef __CMPXCHG_DBL
>  
> +#define REFCOUNT_ADD_OP(op, rel, pre, nops, post)			\
> +static inline int __refcount_##op(int i, atomic_t *r)			\
> +{									\
> +	register int w0 asm ("w0") = i;					\
> +	register atomic_t *x1 asm ("x1") = r;				\
> +	register int w30 asm ("w30");					\
> +									\
> +	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
> +	/* LL/SC */							\
> +	__LL_SC_CALL(__refcount_##op)					\
> +	__nops(nops),							\
> +	/* LSE atomics */						\
> +	"	ldadd" #rel "	%w[i], %w[val], %[v]\n"			\
> +	"	adds		%w[i], %w[i], %w[val]\n"		\
> +	REFCOUNT_PRE_CHECK_ ## pre (%w[val]))				\
> +	REFCOUNT_POST_CHECK_ ## post					\
> +	: [i] "+r" (w0), [v] "+Q" (r->counter), [val] "=&r" (w30)	\
> +	: REFCOUNT_INPUTS(r) "r" (x1)					\
> +	: REFCOUNT_CLOBBERS);						\
> +									\
> +	return w0;							\
> +}
> +
> +REFCOUNT_ADD_OP(add_lt,      , ZERO, 2, NEG_OR_ZERO);
> +REFCOUNT_ADD_OP(sub_lt_neg, l, NONE, 1, NEG        );
> +REFCOUNT_ADD_OP(sub_le_neg, l, NONE, 1, NEG_OR_ZERO);

Hmm, this is really horrible to read. If we can get rid of the _neg variants
it would be nice, because the interaction between the choice of PRE handler
and the number of nops is subtle.

> +
> +#define REFCOUNT_SUB_OP(op, post, fbop)					\
> +static inline int __refcount_##op(int i, atomic_t *r)			\
> +{									\
> +	register int w0 asm ("w0") = i;					\
> +	register atomic_t *x1 asm ("x1") = r;				\
> +	register int w30 asm ("w30");					\
> +									\
> +	if (__builtin_constant_p(i))					\
> +		return __refcount_##fbop(-i, r);			\

Why is this worthwhile?

> +									\
> +	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
> +	/* LL/SC */							\
> +	__LL_SC_CALL(__refcount_##op)					\
> +	__nops(2),							\
> +	/* LSE atomics */						\
> +	"	neg	%w[i], %w[i]\n"					\
> +	"	ldaddl	%w[i], %w[val], %[v]\n"				\
> +	"	adds	%w[i], %w[i], %w[val]")				\
> +	REFCOUNT_POST_CHECK_ ## post					\
> +	: [i] "+r" (w0), [v] "+Q" (r->counter), [val] "=&r" (w30)	\
> +	: REFCOUNT_INPUTS(r) "r" (x1)					\
> +	: REFCOUNT_CLOBBERS);						\
> +									\
> +	return w0;							\
> +}
> +
> +REFCOUNT_SUB_OP(sub_lt, NEG,         sub_lt_neg);
> +REFCOUNT_SUB_OP(sub_le, NEG_OR_ZERO, sub_le_neg);
> +
>  #endif	/* __ASM_ATOMIC_LSE_H */
> diff --git a/arch/arm64/include/asm/brk-imm.h b/arch/arm64/include/asm/brk-imm.h
> index ed693c5bcec0..0bce57737ff1 100644
> --- a/arch/arm64/include/asm/brk-imm.h
> +++ b/arch/arm64/include/asm/brk-imm.h
> @@ -18,6 +18,7 @@
>   * 0x800: kernel-mode BUG() and WARN() traps
>   */
>  #define FAULT_BRK_IMM			0x100
> +#define REFCOUNT_BRK_IMM		0x101
>  #define KGDB_DYN_DBG_BRK_IMM		0x400
>  #define KGDB_COMPILED_DBG_BRK_IMM	0x401
>  #define BUG_BRK_IMM			0x800
> diff --git a/arch/arm64/include/asm/refcount.h b/arch/arm64/include/asm/refcount.h
> new file mode 100644
> index 000000000000..ceb2fe713b80
> --- /dev/null
> +++ b/arch/arm64/include/asm/refcount.h
> @@ -0,0 +1,71 @@
> +/*
> + * arm64-specific implementation of refcount_t. Based on x86 version and
> + * PAX_REFCOUNT from PaX/grsecurity.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef __ASM_REFCOUNT_H
> +#define __ASM_REFCOUNT_H
> +
> +#include <linux/refcount.h>
> +
> +#include <asm/atomic.h>
> +#include <asm/uaccess.h>
> +
> +static __always_inline void refcount_add(int i, refcount_t *r)
> +{
> +	__refcount_add_lt(i, &r->refs);
> +}
> +
> +static __always_inline void refcount_inc(refcount_t *r)
> +{
> +	__refcount_add_lt(1, &r->refs);
> +}
> +
> +static __always_inline void refcount_dec(refcount_t *r)
> +{
> +	__refcount_sub_le(1, &r->refs);
> +}
> +
> +static __always_inline __must_check bool refcount_sub_and_test(unsigned int i,
> +							       refcount_t *r)
> +{
> +	return __refcount_sub_lt(i, &r->refs) == 0;
> +}
> +
> +static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
> +{
> +	return __refcount_sub_lt(1, &r->refs) == 0;
> +}

Nit, but we can just follow the lib/refcount.c implementation here.

> +
> +static __always_inline __must_check bool refcount_add_not_zero(unsigned int i,
> +							       refcount_t *r)
> +{
> +	unsigned long tmp;
> +	int result;
> +
> +	asm volatile("// refcount_add_not_zero \n"
> +"	prfm		pstl1strm, %2\n"
> +"1:	ldxr		%w[val], %2\n"
> +"	cbz		%w[val], 2f\n"
> +"	adds		%w[val], %w[val], %w[i]\n"
> +"	stxr		%w1, %w[val], %2\n"
> +"	cbnz		%w1, 1b\n"
> +	REFCOUNT_POST_CHECK_NEG
> +"2:"
> +	: [val] "=&r" (result), "=&r" (tmp), "+Q" (r->refs.counter)
> +	: REFCOUNT_INPUTS(&r->refs) [i] "Ir" (i)
> +	: REFCOUNT_CLOBBERS);
> +
> +	return result != 0;
> +}

We could implement this using CAS for LSE (like we do for things like
dec_if_positive). Alternatively, define a shared version using cmpxchg.

> +
> +static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
> +{
> +	return refcount_add_not_zero(1, r);
> +}
> +
> +#endif
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index c7c7088097be..07bd026ec71d 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -758,8 +758,37 @@ int __init early_brk64(unsigned long addr, unsigned int esr,
>  	return bug_handler(regs, esr) != DBG_HOOK_HANDLED;
>  }
>  
> +static int refcount_overflow_handler(struct pt_regs *regs, unsigned int esr)
> +{
> +	bool zero = regs->pstate & PSR_Z_BIT;
> +
> +	/* First unconditionally saturate the refcount. */
> +	*(int *)regs->regs[16] = INT_MIN / 2;

Does this work even when racing against a concurrent refcount operation
that doesn't have a pre-check? I can't convince myself that something
like a sub_lt operation on a saturated counter couldn't reset the value
to zero.

Will
Ard Biesheuvel Aug. 23, 2017, 3:51 p.m. UTC | #5
On 23 August 2017 at 15:58, Will Deacon <will.deacon@arm.com> wrote:
> Hi Ard,
>
> Comments of varying quality inline...
>
> On Mon, Jul 31, 2017 at 08:22:51PM +0100, Ard Biesheuvel wrote:
>> This adds support to arm64 for fast refcount checking, as proposed by
>> Kees for x86 based on the implementation by grsecurity/PaX.
>>
>> The general approach is identical: the existing atomic_t helpers are
>> cloned for refcount_t, with the arithmetic instruction modified to set
>> the PSTATE flags, and one or two branch instructions added that jump to
>> an out of line handler if overflow, decrement to zero or increment from
>> zero are detected.
>>
>> One complication that we have to deal with on arm64 is the fact that
>> it has two atomics implementations: the original LL/SC implementation
>> using load/store exclusive loops, and the newer LSE one that does mostly
>> the same in a single instruction. So we need to clone some parts of
>> both for the refcount handlers, but we also need to deal with the way
>> LSE builds fall back to LL/SC at runtime if the hardware does not
>> support it. (The only exception is refcount_add_not_zero(), which
>> updates the refcount conditionally, so it is only implemented using
>> a load/store exclusive loop)
>>
>> As is the case with the x86 version, the performance delta is in the
>> noise (Cortex-A57 @ 2 GHz, using LL/SC not LSE), even though the arm64
>> implementation incorporates an add-from-zero check as well:
>
> How does this compare to versions built using cmpxchg?
>

Good question. I haven't actually checked, because I was focusing on
the delta with the ordinary atomics.

>>
>> perf stat -B -- cat <(echo ATOMIC_TIMING) >/sys/kernel/debug/provoke-crash/DIRECT
>>
>>  Performance counter stats for 'cat /dev/fd/63':
>>
>>       65758.632112      task-clock (msec)         #    1.000 CPUs utilized
>>                  2      context-switches          #    0.000 K/sec
>>                  0      cpu-migrations            #    0.000 K/sec
>>                 47      page-faults               #    0.001 K/sec
>>       131421735632      cycles                    #    1.999 GHz
>>        36752227542      instructions              #    0.28  insn per cycle
>>    <not supported>      branches
>>             961008      branch-misses
>>
>>       65.785264736 seconds time elapsed
>>
>> perf stat -B -- cat <(echo REFCOUNT_TIMING) >/sys/kernel/debug/provoke-crash/DIRECT
>>
>>  Performance counter stats for 'cat /dev/fd/63':
>>
>>       65734.255992      task-clock (msec)         #    1.000 CPUs utilized
>>                  2      context-switches          #    0.000 K/sec
>>                  0      cpu-migrations            #    0.000 K/sec
>>                 46      page-faults               #    0.001 K/sec
>>       131376830467      cycles                    #    1.999 GHz
>>        43183673156      instructions              #    0.33  insn per cycle
>>    <not supported>      branches
>>             879345      branch-misses
>>
>>       65.735309648 seconds time elapsed
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>> v4: Implement add-from-zero checking using a conditional compare rather than
>>     a conditional branch, which I omitted from v3 due to the 10% performance
>>     hit: this will result in the new refcount to be written back to memory
>>     before invoking the handler, which is more in line with the other checks,
>>     and is apparently much easier on the branch predictor, given that there
>>     is no performance hit whatsoever.
>>
>>  arch/arm64/Kconfig                    |  1 +
>>  arch/arm64/include/asm/atomic.h       | 25 +++++++
>>  arch/arm64/include/asm/atomic_ll_sc.h | 29 ++++++++
>>  arch/arm64/include/asm/atomic_lse.h   | 56 +++++++++++++++
>>  arch/arm64/include/asm/brk-imm.h      |  1 +
>>  arch/arm64/include/asm/refcount.h     | 71 ++++++++++++++++++++
>>  arch/arm64/kernel/traps.c             | 29 ++++++++
>>  arch/arm64/lib/atomic_ll_sc.c         | 16 +++++
>>  8 files changed, 228 insertions(+)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index dfd908630631..53b9a8f5277b 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -16,6 +16,7 @@ config ARM64
>>       select ARCH_HAS_GCOV_PROFILE_ALL
>>       select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
>>       select ARCH_HAS_KCOV
>> +     select ARCH_HAS_REFCOUNT
>>       select ARCH_HAS_SET_MEMORY
>>       select ARCH_HAS_SG_CHAIN
>>       select ARCH_HAS_STRICT_KERNEL_RWX
>> diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
>> index c0235e0ff849..ded9bde5f08f 100644
>> --- a/arch/arm64/include/asm/atomic.h
>> +++ b/arch/arm64/include/asm/atomic.h
>> @@ -24,10 +24,35 @@
>>  #include <linux/types.h>
>>
>>  #include <asm/barrier.h>
>> +#include <asm/brk-imm.h>
>>  #include <asm/lse.h>
>>
>>  #ifdef __KERNEL__
>>
>> +#define REFCOUNT_CHECK_TAIL                                          \
>> +"    .subsection     1\n"                                            \
>> +"33: mov             x16, %[counter]\n"                              \
>> +"    adr             x17, 44f\n"                                     \
>> +"    brk             %[brk_imm]\n"                                   \
>> +"44: .long           22b - .\n"                                      \
>> +"    .previous\n"
>> +
>> +#define REFCOUNT_POST_CHECK_NEG                                              \
>> +"22: b.mi            33f\n"                                          \
>> +     REFCOUNT_CHECK_TAIL
>> +
>> +#define REFCOUNT_POST_CHECK_NEG_OR_ZERO                                      \
>> +"    b.eq            33f\n"                                          \
>> +     REFCOUNT_POST_CHECK_NEG
>> +
>> +#define REFCOUNT_PRE_CHECK_ZERO(reg) "ccmp " #reg ", wzr, #8, pl\n"
>
> How does this work for the add_lt case if we compute negative counter
> value? afaict, we'll set the flags to #8 (HI), which won't get picked up
> by the NEG_OR_ZERO post check.
>

You mean when the addend is negative, making it a de facto subtract?
The refcount API does not use signed addends, so I don't think that's
an issue.

>> +#define REFCOUNT_PRE_CHECK_NONE(reg)
>> +
>> +#define REFCOUNT_INPUTS(r)                                           \
>> +     [counter] "r" (&(r)->counter), [brk_imm] "i" (REFCOUNT_BRK_IMM),
>> +
>> +#define REFCOUNT_CLOBBERS    "cc", "x16", "x17"
>
> It would be better to just have "cc" here, and then let the compiler
> allocate the registers for the normal cases. That would allow us to use
> __LL_SC_CLOBBERS for the LSE versions.
>

Well, that makes it difficult to figure out where to pull the values
from in refcount_overflow_handler(). I chose x16/x17 precisely because
the out of line ll/sc atomics will also use them as scratch registers,
so they are already non-live in many cases.

>> +
>>  #define __ARM64_IN_ATOMIC_IMPL
>>
>>  #if defined(CONFIG_ARM64_LSE_ATOMICS) && defined(CONFIG_AS_LSE)
>> diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
>> index f5a2d09afb38..7037428b7efb 100644
>> --- a/arch/arm64/include/asm/atomic_ll_sc.h
>> +++ b/arch/arm64/include/asm/atomic_ll_sc.h
>> @@ -327,4 +327,33 @@ __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
>>
>>  #undef __CMPXCHG_DBL
>>
>> +#define REFCOUNT_OP(op, asm_op, pre, post, l)                                \
>> +__LL_SC_INLINE int                                                   \
>> +__LL_SC_PREFIX(__refcount_##op(int i, atomic_t *r))                  \
>> +{                                                                    \
>> +     unsigned long tmp;                                              \
>
> This can be unsigned int.
>

OK

>> +     int result;                                                     \
>> +                                                                     \
>> +     asm volatile("// refcount_" #op "\n"                            \
>> +"    prfm            pstl1strm, %2\n"                                \
>> +"1:  ldxr            %w1, %2\n"                                      \
>> +"    " #asm_op "     %w[val], %w1, %w[i]\n"                          \
>> +     REFCOUNT_PRE_CHECK_ ## pre (%w1)                                \
>> +"    st" #l "xr      %w1, %w[val], %2\n"                             \
>
> Given that you don't avoid the store here, how do you ensure that the
> counter saturates?
>

Not sure I follow. We ensure that the counter saturates by
conditionally branching to the out of line handler, that is the whole
point. If you are asking whether the counter visibly assumes an
unwanted value before the handler overwrites it, then yes, that is
expected. Conditionally branching beforehand is what results in the
10% performance hit in the benchmark.


>> +"    cbnz            %w1, 1b\n"                                      \
>> +     REFCOUNT_POST_CHECK_ ## post                                    \
>> +     : [val] "=&r" (result), "=&r" (tmp), "+Q" (r->counter)          \
>> +     : REFCOUNT_INPUTS(r) [i] "Ir" (i)                               \
>> +     : REFCOUNT_CLOBBERS);                                           \
>> +                                                                     \
>> +     return result;                                                  \
>
> We're back in C here, so I don't think we can safely assume that the
> compiler won't nobble the condition flags as part of the return. If it
> does that, then the post-check in the caller for the case that the LL/SC
> atomics are out-of-line won't necessary work correctly.
>

Yes, so that complicated the out of line case for ll/sc. I was vaguely
aware of that, but hadn't quite figured out whether it was a real
problem.

>> +}                                                                    \
>> +__LL_SC_EXPORT(__refcount_##op);
>> +
>> +REFCOUNT_OP(add_lt,     adds, ZERO, NEG_OR_ZERO,  );
>> +REFCOUNT_OP(sub_lt_neg, adds, NONE, NEG,         l);
>> +REFCOUNT_OP(sub_le_neg, adds, NONE, NEG_OR_ZERO, l);
>
> Where are sub_{lt,le}_neg used?  (gah, I just spotted this!).
>

OK

>> +REFCOUNT_OP(sub_lt,     subs, NONE, NEG,         l);
>> +REFCOUNT_OP(sub_le,     subs, NONE, NEG_OR_ZERO, l);
>> +
>>  #endif       /* __ASM_ATOMIC_LL_SC_H */
>> diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
>> index 99fa69c9c3cf..c00e02edc589 100644
>> --- a/arch/arm64/include/asm/atomic_lse.h
>> +++ b/arch/arm64/include/asm/atomic_lse.h
>> @@ -531,4 +531,60 @@ __CMPXCHG_DBL(_mb, al, "memory")
>>  #undef __LL_SC_CMPXCHG_DBL
>>  #undef __CMPXCHG_DBL
>>
>> +#define REFCOUNT_ADD_OP(op, rel, pre, nops, post)                    \
>> +static inline int __refcount_##op(int i, atomic_t *r)                        \
>> +{                                                                    \
>> +     register int w0 asm ("w0") = i;                                 \
>> +     register atomic_t *x1 asm ("x1") = r;                           \
>> +     register int w30 asm ("w30");                                   \
>> +                                                                     \
>> +     asm volatile(ARM64_LSE_ATOMIC_INSN(                             \
>> +     /* LL/SC */                                                     \
>> +     __LL_SC_CALL(__refcount_##op)                                   \
>> +     __nops(nops),                                                   \
>> +     /* LSE atomics */                                               \
>> +     "       ldadd" #rel "   %w[i], %w[val], %[v]\n"                 \
>> +     "       adds            %w[i], %w[i], %w[val]\n"                \
>> +     REFCOUNT_PRE_CHECK_ ## pre (%w[val]))                           \
>> +     REFCOUNT_POST_CHECK_ ## post                                    \
>> +     : [i] "+r" (w0), [v] "+Q" (r->counter), [val] "=&r" (w30)       \
>> +     : REFCOUNT_INPUTS(r) "r" (x1)                                   \
>> +     : REFCOUNT_CLOBBERS);                                           \
>> +                                                                     \
>> +     return w0;                                                      \
>> +}
>> +
>> +REFCOUNT_ADD_OP(add_lt,      , ZERO, 2, NEG_OR_ZERO);
>> +REFCOUNT_ADD_OP(sub_lt_neg, l, NONE, 1, NEG        );
>> +REFCOUNT_ADD_OP(sub_le_neg, l, NONE, 1, NEG_OR_ZERO);
>
> Hmm, this is really horrible to read. If we can get rid of the _neg variants
> it would be nice, because the interaction between the choice of PRE handler
> and the number of nops is subtle.
>

Yes.

>> +
>> +#define REFCOUNT_SUB_OP(op, post, fbop)                                      \
>> +static inline int __refcount_##op(int i, atomic_t *r)                        \
>> +{                                                                    \
>> +     register int w0 asm ("w0") = i;                                 \
>> +     register atomic_t *x1 asm ("x1") = r;                           \
>> +     register int w30 asm ("w30");                                   \
>> +                                                                     \
>> +     if (__builtin_constant_p(i))                                    \
>> +             return __refcount_##fbop(-i, r);                        \
>
> Why is this worthwhile?
>

Because I thought we could end up with

mov reg, immediate
neg reg, reg
ldaddl
... etc

which I assumed we would like to avoid. But actually, as per my own
findings just now, I don't think the refcount API could end up
emitting that.

>> +                                                                     \
>> +     asm volatile(ARM64_LSE_ATOMIC_INSN(                             \
>> +     /* LL/SC */                                                     \
>> +     __LL_SC_CALL(__refcount_##op)                                   \
>> +     __nops(2),                                                      \
>> +     /* LSE atomics */                                               \
>> +     "       neg     %w[i], %w[i]\n"                                 \
>> +     "       ldaddl  %w[i], %w[val], %[v]\n"                         \
>> +     "       adds    %w[i], %w[i], %w[val]")                         \
>> +     REFCOUNT_POST_CHECK_ ## post                                    \
>> +     : [i] "+r" (w0), [v] "+Q" (r->counter), [val] "=&r" (w30)       \
>> +     : REFCOUNT_INPUTS(r) "r" (x1)                                   \
>> +     : REFCOUNT_CLOBBERS);                                           \
>> +                                                                     \
>> +     return w0;                                                      \
>> +}
>> +
>> +REFCOUNT_SUB_OP(sub_lt, NEG,         sub_lt_neg);
>> +REFCOUNT_SUB_OP(sub_le, NEG_OR_ZERO, sub_le_neg);
>> +
>>  #endif       /* __ASM_ATOMIC_LSE_H */
>> diff --git a/arch/arm64/include/asm/brk-imm.h b/arch/arm64/include/asm/brk-imm.h
>> index ed693c5bcec0..0bce57737ff1 100644
>> --- a/arch/arm64/include/asm/brk-imm.h
>> +++ b/arch/arm64/include/asm/brk-imm.h
>> @@ -18,6 +18,7 @@
>>   * 0x800: kernel-mode BUG() and WARN() traps
>>   */
>>  #define FAULT_BRK_IMM                        0x100
>> +#define REFCOUNT_BRK_IMM             0x101
>>  #define KGDB_DYN_DBG_BRK_IMM         0x400
>>  #define KGDB_COMPILED_DBG_BRK_IMM    0x401
>>  #define BUG_BRK_IMM                  0x800
>> diff --git a/arch/arm64/include/asm/refcount.h b/arch/arm64/include/asm/refcount.h
>> new file mode 100644
>> index 000000000000..ceb2fe713b80
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/refcount.h
>> @@ -0,0 +1,71 @@
>> +/*
>> + * arm64-specific implementation of refcount_t. Based on x86 version and
>> + * PAX_REFCOUNT from PaX/grsecurity.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#ifndef __ASM_REFCOUNT_H
>> +#define __ASM_REFCOUNT_H
>> +
>> +#include <linux/refcount.h>
>> +
>> +#include <asm/atomic.h>
>> +#include <asm/uaccess.h>
>> +
>> +static __always_inline void refcount_add(int i, refcount_t *r)
>> +{
>> +     __refcount_add_lt(i, &r->refs);
>> +}
>> +
>> +static __always_inline void refcount_inc(refcount_t *r)
>> +{
>> +     __refcount_add_lt(1, &r->refs);
>> +}
>> +
>> +static __always_inline void refcount_dec(refcount_t *r)
>> +{
>> +     __refcount_sub_le(1, &r->refs);
>> +}
>> +
>> +static __always_inline __must_check bool refcount_sub_and_test(unsigned int i,
>> +                                                            refcount_t *r)
>> +{
>> +     return __refcount_sub_lt(i, &r->refs) == 0;
>> +}
>> +
>> +static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
>> +{
>> +     return __refcount_sub_lt(1, &r->refs) == 0;
>> +}
>
> Nit, but we can just follow the lib/refcount.c implementation here.
>

Yes, and the same applies to Kees's version for x86, I suppose. We can
do that as a separate fix.

>> +
>> +static __always_inline __must_check bool refcount_add_not_zero(unsigned int i,
>> +                                                            refcount_t *r)
>> +{
>> +     unsigned long tmp;
>> +     int result;
>> +
>> +     asm volatile("// refcount_add_not_zero \n"
>> +"    prfm            pstl1strm, %2\n"
>> +"1:  ldxr            %w[val], %2\n"
>> +"    cbz             %w[val], 2f\n"
>> +"    adds            %w[val], %w[val], %w[i]\n"
>> +"    stxr            %w1, %w[val], %2\n"
>> +"    cbnz            %w1, 1b\n"
>> +     REFCOUNT_POST_CHECK_NEG
>> +"2:"
>> +     : [val] "=&r" (result), "=&r" (tmp), "+Q" (r->refs.counter)
>> +     : REFCOUNT_INPUTS(&r->refs) [i] "Ir" (i)
>> +     : REFCOUNT_CLOBBERS);
>> +
>> +     return result != 0;
>> +}
>
> We could implement this using CAS for LSE (like we do for things like
> dec_if_positive). Alternatively, define a shared version using cmpxchg.
>

A previous version had that, but I can't remember the exact details. I
will look into that.

I couldn't figure out how to use CAS here, btw, because we need the
old value, and we must leave the counter in memory untouched if it is zero.
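For the cmpxchg route, a generic sketch of what that fallback could look like (hypothetical, written with C11 atomics rather than the kernel's cmpxchg(); same semantics as the exclusive-loop version: bail out without writing when the counter is zero):

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical cmpxchg-style refcount_add_not_zero(): loop until we
 * either observe zero (and return without touching memory) or
 * successfully swap in the incremented value. On failure,
 * atomic_compare_exchange_weak reloads the current value into 'old'.
 */
static bool refcount_add_not_zero_cmpxchg(atomic_int *counter, int i)
{
	int old = atomic_load_explicit(counter, memory_order_relaxed);

	do {
		if (old == 0)
			return false;	/* counter left untouched */
	} while (!atomic_compare_exchange_weak(counter, &old, old + i));

	return true;
}
```

This is exactly the shape that a shared (LL/SC and LSE) implementation built on cmpxchg could take; the post-check against overflow would still need to be layered on top.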

>> +
>> +static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
>> +{
>> +     return refcount_add_not_zero(1, r);
>> +}
>> +
>> +#endif
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index c7c7088097be..07bd026ec71d 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>> @@ -758,8 +758,37 @@ int __init early_brk64(unsigned long addr, unsigned int esr,
>>       return bug_handler(regs, esr) != DBG_HOOK_HANDLED;
>>  }
>>
>> +static int refcount_overflow_handler(struct pt_regs *regs, unsigned int esr)
>> +{
>> +     bool zero = regs->pstate & PSR_Z_BIT;
>> +
>> +     /* First unconditionally saturate the refcount. */
>> +     *(int *)regs->regs[16] = INT_MIN / 2;
>
> Does this work even when racing against a concurrent refcount operation
> that doesn't have a pre-check? I can't figure out how something like a
> sub_lt operation on a saturated counter couldn't reset the value to zero.
>

I hope Kees can clarify this, but as I understand it, this value was
chosen right in the middle of the negative space so it would take many
operations to get it to a sane value again, reducing the likelihood
that a situation is created that may be exploited.
Kees Cook Aug. 23, 2017, 4:48 p.m. UTC | #6
On Wed, Aug 23, 2017 at 8:51 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 23 August 2017 at 15:58, Will Deacon <will.deacon@arm.com> wrote:
>> On Mon, Jul 31, 2017 at 08:22:51PM +0100, Ard Biesheuvel wrote:
>>> +static __always_inline void refcount_add(int i, refcount_t *r)
>>> +{
>>> +     __refcount_add_lt(i, &r->refs);
>>> +}
>>> +
>>> +static __always_inline void refcount_inc(refcount_t *r)
>>> +{
>>> +     __refcount_add_lt(1, &r->refs);
>>> +}
>>> +
>>> +static __always_inline void refcount_dec(refcount_t *r)
>>> +{
>>> +     __refcount_sub_le(1, &r->refs);
>>> +}
>>> +
>>> +static __always_inline __must_check bool refcount_sub_and_test(unsigned int i,
>>> +                                                            refcount_t *r)
>>> +{
>>> +     return __refcount_sub_lt(i, &r->refs) == 0;
>>> +}
>>> +
>>> +static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
>>> +{
>>> +     return __refcount_sub_lt(1, &r->refs) == 0;
>>> +}
>>
>> Nit, but we can just follow the lib/refcount.c implementation here.
>
> Yes, and the same applies to Kees's version for x86, I suppose. We can
> do that as a separate fix.

Sorry, I didn't follow context here. What are these comments referring
to? The dec_and_test implementation?

>>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>>> index c7c7088097be..07bd026ec71d 100644
>>> --- a/arch/arm64/kernel/traps.c
>>> +++ b/arch/arm64/kernel/traps.c
>>> @@ -758,8 +758,37 @@ int __init early_brk64(unsigned long addr, unsigned int esr,
>>>       return bug_handler(regs, esr) != DBG_HOOK_HANDLED;
>>>  }
>>>
>>> +static int refcount_overflow_handler(struct pt_regs *regs, unsigned int esr)
>>> +{
>>> +     bool zero = regs->pstate & PSR_Z_BIT;
>>> +
>>> +     /* First unconditionally saturate the refcount. */
>>> +     *(int *)regs->regs[16] = INT_MIN / 2;
>>
>> Does this work even when racing against a concurrent refcount operation
>> that doesn't have a pre-check? I can't figure out how something like a
>> sub_lt operation on a saturated counter couldn't reset the value to zero.
>
> I hope Kees can clarify this, but as I understand it, this value was
> chosen right in the middle of the negative space so it would take many
> operations to get it to a sane value again, reducing the likelihood
> that a situation is created that may be exploited.

We can't protect against over-subtraction, since a legitimate
dec-to-zero can't be distinguished from an early dec-to-zero (the
resource will always get freed and potentially abused via
use-after-free). If you mean the case of racing many increments, it
would require INT_MIN / 2 threads perfectly performing an increment
simultaneously with another thread performing a dec_and_test(), which
is unrealistic in the face of saturation happening within a couple of
instructions on all of those INT_MIN / 2 threads. So, while
theoretically possible, it is not a real-world condition. As I see it,
this is the trade off of these implementations vs REFCOUNT_FULL, which
has perfect saturation but high performance cost.

-Kees

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index dfd908630631..53b9a8f5277b 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -16,6 +16,7 @@  config ARM64
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_REFCOUNT
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index c0235e0ff849..ded9bde5f08f 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -24,10 +24,35 @@ 
 #include <linux/types.h>
 
 #include <asm/barrier.h>
+#include <asm/brk-imm.h>
 #include <asm/lse.h>
 
 #ifdef __KERNEL__
 
+#define REFCOUNT_CHECK_TAIL						\
+"	.subsection	1\n"						\
+"33:	mov		x16, %[counter]\n"				\
+"	adr		x17, 44f\n"					\
+"	brk		%[brk_imm]\n"					\
+"44:	.long		22b - .\n"					\
+"	.previous\n"
+
+#define REFCOUNT_POST_CHECK_NEG						\
+"22:	b.mi		33f\n"						\
+	REFCOUNT_CHECK_TAIL
+
+#define REFCOUNT_POST_CHECK_NEG_OR_ZERO					\
+"	b.eq		33f\n"						\
+	REFCOUNT_POST_CHECK_NEG
+
+#define REFCOUNT_PRE_CHECK_ZERO(reg)	"ccmp " #reg ", wzr, #8, pl\n"
+#define REFCOUNT_PRE_CHECK_NONE(reg)
+
+#define REFCOUNT_INPUTS(r)						\
+	[counter] "r" (&(r)->counter), [brk_imm] "i" (REFCOUNT_BRK_IMM),
+
+#define REFCOUNT_CLOBBERS	"cc", "x16", "x17"
+
 #define __ARM64_IN_ATOMIC_IMPL
 
 #if defined(CONFIG_ARM64_LSE_ATOMICS) && defined(CONFIG_AS_LSE)
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index f5a2d09afb38..7037428b7efb 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -327,4 +327,33 @@  __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
 
 #undef __CMPXCHG_DBL
 
+#define REFCOUNT_OP(op, asm_op, pre, post, l)				\
+__LL_SC_INLINE int							\
+__LL_SC_PREFIX(__refcount_##op(int i, atomic_t *r))			\
+{									\
+	unsigned long tmp;						\
+	int result;							\
+									\
+	asm volatile("// refcount_" #op "\n"				\
+"	prfm		pstl1strm, %2\n"				\
+"1:	ldxr		%w1, %2\n"					\
+"	" #asm_op "	%w[val], %w1, %w[i]\n"				\
+	REFCOUNT_PRE_CHECK_ ## pre (%w1)				\
+"	st" #l "xr	%w1, %w[val], %2\n"				\
+"	cbnz		%w1, 1b\n"					\
+	REFCOUNT_POST_CHECK_ ## post					\
+	: [val] "=&r" (result), "=&r" (tmp), "+Q" (r->counter)		\
+	: REFCOUNT_INPUTS(r) [i] "Ir" (i)				\
+	: REFCOUNT_CLOBBERS);						\
+									\
+	return result;							\
+}									\
+__LL_SC_EXPORT(__refcount_##op);
+
+REFCOUNT_OP(add_lt,     adds, ZERO, NEG_OR_ZERO,  );
+REFCOUNT_OP(sub_lt_neg, adds, NONE, NEG,         l);
+REFCOUNT_OP(sub_le_neg, adds, NONE, NEG_OR_ZERO, l);
+REFCOUNT_OP(sub_lt,     subs, NONE, NEG,         l);
+REFCOUNT_OP(sub_le,     subs, NONE, NEG_OR_ZERO, l);
+
 #endif	/* __ASM_ATOMIC_LL_SC_H */
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index 99fa69c9c3cf..c00e02edc589 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -531,4 +531,60 @@  __CMPXCHG_DBL(_mb, al, "memory")
 #undef __LL_SC_CMPXCHG_DBL
 #undef __CMPXCHG_DBL
 
+#define REFCOUNT_ADD_OP(op, rel, pre, nops, post)			\
+static inline int __refcount_##op(int i, atomic_t *r)			\
+{									\
+	register int w0 asm ("w0") = i;					\
+	register atomic_t *x1 asm ("x1") = r;				\
+	register int w30 asm ("w30");					\
+									\
+	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
+	/* LL/SC */							\
+	__LL_SC_CALL(__refcount_##op)					\
+	__nops(nops),							\
+	/* LSE atomics */						\
+	"	ldadd" #rel "	%w[i], %w[val], %[v]\n"			\
+	"	adds		%w[i], %w[i], %w[val]\n"		\
+	REFCOUNT_PRE_CHECK_ ## pre (%w[val]))				\
+	REFCOUNT_POST_CHECK_ ## post					\
+	: [i] "+r" (w0), [v] "+Q" (r->counter), [val] "=&r" (w30)	\
+	: REFCOUNT_INPUTS(r) "r" (x1)					\
+	: REFCOUNT_CLOBBERS);						\
+									\
+	return w0;							\
+}
+
+REFCOUNT_ADD_OP(add_lt,      , ZERO, 2, NEG_OR_ZERO);
+REFCOUNT_ADD_OP(sub_lt_neg, l, NONE, 1, NEG        );
+REFCOUNT_ADD_OP(sub_le_neg, l, NONE, 1, NEG_OR_ZERO);
+
+#define REFCOUNT_SUB_OP(op, post, fbop)					\
+static inline int __refcount_##op(int i, atomic_t *r)			\
+{									\
+	register int w0 asm ("w0") = i;					\
+	register atomic_t *x1 asm ("x1") = r;				\
+	register int w30 asm ("w30");					\
+									\
+	if (__builtin_constant_p(i))					\
+		return __refcount_##fbop(-i, r);			\
+									\
+	asm volatile(ARM64_LSE_ATOMIC_INSN(				\
+	/* LL/SC */							\
+	__LL_SC_CALL(__refcount_##op)					\
+	__nops(2),							\
+	/* LSE atomics */						\
+	"	neg	%w[i], %w[i]\n"					\
+	"	ldaddl	%w[i], %w[val], %[v]\n"				\
+	"	adds	%w[i], %w[i], %w[val]")				\
+	REFCOUNT_POST_CHECK_ ## post					\
+	: [i] "+r" (w0), [v] "+Q" (r->counter), [val] "=&r" (w30)	\
+	: REFCOUNT_INPUTS(r) "r" (x1)					\
+	: REFCOUNT_CLOBBERS);						\
+									\
+	return w0;							\
+}
+
+REFCOUNT_SUB_OP(sub_lt, NEG,         sub_lt_neg);
+REFCOUNT_SUB_OP(sub_le, NEG_OR_ZERO, sub_le_neg);
+
 #endif	/* __ASM_ATOMIC_LSE_H */
diff --git a/arch/arm64/include/asm/brk-imm.h b/arch/arm64/include/asm/brk-imm.h
index ed693c5bcec0..0bce57737ff1 100644
--- a/arch/arm64/include/asm/brk-imm.h
+++ b/arch/arm64/include/asm/brk-imm.h
@@ -18,6 +18,7 @@ 
  * 0x800: kernel-mode BUG() and WARN() traps
  */
 #define FAULT_BRK_IMM			0x100
+#define REFCOUNT_BRK_IMM		0x101
 #define KGDB_DYN_DBG_BRK_IMM		0x400
 #define KGDB_COMPILED_DBG_BRK_IMM	0x401
 #define BUG_BRK_IMM			0x800
diff --git a/arch/arm64/include/asm/refcount.h b/arch/arm64/include/asm/refcount.h
new file mode 100644
index 000000000000..ceb2fe713b80
--- /dev/null
+++ b/arch/arm64/include/asm/refcount.h
@@ -0,0 +1,71 @@ 
+/*
+ * arm64-specific implementation of refcount_t. Based on x86 version and
+ * PAX_REFCOUNT from PaX/grsecurity.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_REFCOUNT_H
+#define __ASM_REFCOUNT_H
+
+#include <linux/refcount.h>
+
+#include <asm/atomic.h>
+#include <asm/uaccess.h>
+
+static __always_inline void refcount_add(int i, refcount_t *r)
+{
+	__refcount_add_lt(i, &r->refs);
+}
+
+static __always_inline void refcount_inc(refcount_t *r)
+{
+	__refcount_add_lt(1, &r->refs);
+}
+
+static __always_inline void refcount_dec(refcount_t *r)
+{
+	__refcount_sub_le(1, &r->refs);
+}
+
+static __always_inline __must_check bool refcount_sub_and_test(unsigned int i,
+							       refcount_t *r)
+{
+	return __refcount_sub_lt(i, &r->refs) == 0;
+}
+
+static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
+{
+	return __refcount_sub_lt(1, &r->refs) == 0;
+}
+
+static __always_inline __must_check bool refcount_add_not_zero(unsigned int i,
+							       refcount_t *r)
+{
+	unsigned long tmp;
+	int result;
+
+	asm volatile("// refcount_add_not_zero \n"
+"	prfm		pstl1strm, %2\n"
+"1:	ldxr		%w[val], %2\n"
+"	cbz		%w[val], 2f\n"
+"	adds		%w[val], %w[val], %w[i]\n"
+"	stxr		%w1, %w[val], %2\n"
+"	cbnz		%w1, 1b\n"
+	REFCOUNT_POST_CHECK_NEG
+"2:"
+	: [val] "=&r" (result), "=&r" (tmp), "+Q" (r->refs.counter)
+	: REFCOUNT_INPUTS(&r->refs) [i] "Ir" (i)
+	: REFCOUNT_CLOBBERS);
+
+	return result != 0;
+}
+
+static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
+{
+	return refcount_add_not_zero(1, r);
+}
+
+#endif
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index c7c7088097be..07bd026ec71d 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -758,8 +758,37 @@  int __init early_brk64(unsigned long addr, unsigned int esr,
 	return bug_handler(regs, esr) != DBG_HOOK_HANDLED;
 }
 
+static int refcount_overflow_handler(struct pt_regs *regs, unsigned int esr)
+{
+	bool zero = regs->pstate & PSR_Z_BIT;
+
+	/* First unconditionally saturate the refcount. */
+	*(int *)regs->regs[16] = INT_MIN / 2;
+
+	/*
+	 * This function has been called because either a negative refcount
+	 * value was seen by any of the refcount functions, or a zero
+	 * refcount value was seen by refcount_{add,dec}().
+	 */
+
+	/* point pc to the branch instruction that brought us here */
+	regs->pc = regs->regs[17] + *(s32 *)regs->regs[17];
+	refcount_error_report(regs, zero ? "hit zero" : "overflow");
+
+	/* advance pc and proceed */
+	regs->pc += 4;
+	return DBG_HOOK_HANDLED;
+}
+
+static struct break_hook refcount_break_hook = {
+	.esr_val	= 0xf2000000 | REFCOUNT_BRK_IMM,
+	.esr_mask	= 0xffffffff,
+	.fn		= refcount_overflow_handler,
+};
+
 /* This registration must happen early, before debug_traps_init(). */
 void __init trap_init(void)
 {
 	register_break_hook(&bug_break_hook);
+	register_break_hook(&refcount_break_hook);
 }
diff --git a/arch/arm64/lib/atomic_ll_sc.c b/arch/arm64/lib/atomic_ll_sc.c
index b0c538b0da28..013a10c5461a 100644
--- a/arch/arm64/lib/atomic_ll_sc.c
+++ b/arch/arm64/lib/atomic_ll_sc.c
@@ -1,3 +1,19 @@ 
 #include <asm/atomic.h>
 #define __ARM64_IN_ATOMIC_IMPL
+
+/*
+ * If we are invoking the LL/SC versions out of line as the fallback
+ * alternatives for LSE atomics, disarm the overflow check because it
+ * would be redundant (and would identify the out of line routine as
+ * the location of the error, which would not be helpful either).
+ */
+#undef REFCOUNT_POST_CHECK_NEG
+#undef REFCOUNT_POST_CHECK_NEG_OR_ZERO
+#undef REFCOUNT_INPUTS
+#undef REFCOUNT_CLOBBERS
+#define REFCOUNT_POST_CHECK_NEG
+#define REFCOUNT_POST_CHECK_NEG_OR_ZERO
+#define REFCOUNT_INPUTS(r)
+#define REFCOUNT_CLOBBERS			"cc"
+
 #include <asm/atomic_ll_sc.h>