[v3,00/36] arm64/gcs: Provide support for GCS in userspace

Message ID: 20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org

Message

Mark Brown July 31, 2023, 1:43 p.m. UTC
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.

When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations.  When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match.  GCS operations may only be
performed on GCS pages; a data abort is generated if they are not.
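
To illustrate, the architectural behaviour can be modelled with some
C pseudocode (purely illustrative: gcspr stands in for GCSPR_EL0, lr
for the link register and pc for the program counter):

   static unsigned long *gcspr;    /* models GCSPR_EL0; the GCS grows down */
   static unsigned long lr, pc;

   static void gcs_fault(void);    /* models the GCS exception */

   static void model_bl(unsigned long target, unsigned long next_pc)
   {
           lr = next_pc;           /* BL sets LR as usual... */
           *--gcspr = lr;          /* ...and also pushes it onto the GCS */
           pc = target;
   }

   static void model_ret(void)
   {
           if (*gcspr != lr)       /* top of the GCS must match LR */
                   gcs_fault();    /* mismatch raises a GCS exception */
           gcspr++;                /* pop the entry */
           pc = lr;
   }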

This series implements support for use of GCS by userspace, along with
support for use of GCS within KVM guests.  It does not enable use of GCS
by either EL1 or EL2.  Executables are started without GCS and must use
a prctl() to enable it; it is expected that this will be done very early
in application execution by the dynamic linker or other startup code.
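
For illustration, enablement is expected to look something like this
minimal sketch, using the PR_SET_SHADOW_STACK_STATUS and
PR_SHADOW_STACK_ENABLE names from the arch-agnostic prctl patch at the
start of the series:

   #include <sys/prctl.h>  /* PR_* constants come from the series'
                            * updated uapi <linux/prctl.h> */

   /* Minimal sketch: enable GCS very early, e.g. from the dynamic
    * linker or a static binary's startup code. */
   static int enable_gcs(void)
   {
           /* Allocates a GCS for this thread and starts enforcement */
           return prctl(PR_SET_SHADOW_STACK_STATUS,
                        PR_SHADOW_STACK_ENABLE, 0, 0, 0);
   }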

x86 has an equivalent feature called shadow stacks; this series depends
on the x86 patches for generic memory management support for the new
guarded/shadow stack page type and shares APIs as much as possible.  As
there has been extensive discussion with the wider community around the
ABI for shadow stacks I have as far as practical kept implementation
decisions close to those for x86, anticipating that review would lead to
similar conclusions in the absence of strong reasoning for divergence.

The main divergence I am conscious of is that x86 allows shadow stack to
be enabled and disabled repeatedly, freeing the shadow stack for the
thread whenever disabled, while this implementation keeps the GCS
allocated after disable but refuses to reenable it.  This is to avoid
races with things actively walking the GCS during a disable; we do
anticipate that some systems will wish to disable GCS at runtime but are
not aware of any demand for subsequently reenabling it.
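
As a sketch of that divergence (same constants as above; the exact
error code is not specified here):

   #include <sys/prctl.h>

   /* Sketch: disabling GCS works, but this implementation then
    * refuses to turn it back on, unlike x86. */
   static int gcs_disable_is_one_way(void)
   {
           /* Disable: succeeds; the thread's GCS stays allocated */
           prctl(PR_SET_SHADOW_STACK_STATUS, 0, 0, 0, 0);

           /* Attempt to re-enable: expected to be refused */
           return prctl(PR_SET_SHADOW_STACK_STATUS,
                        PR_SHADOW_STACK_ENABLE, 0, 0, 0);
   }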

x86 uses an arch_prctl() to manage enable and disable.  Since only x86
and S/390 use arch_prctl(), a generic prctl() was proposed[1] as part of
a patch set for the equivalent RISC-V zisslpcfi feature; I initially
adopted it fairly directly, but following review feedback it has been
revised quite a bit.

There is an open issue with support for CRIU; on x86 this required the
ability to set the GCS mode via ptrace.  This series supports
configuring mode bits other than enable/disable via ptrace, but it needs
to be confirmed whether this is sufficient.
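
For reference, a CRIU-style mode update would look roughly like the
sketch below, assuming the NT_ARM_GCS regset and struct user_gcs
layout added by the ptrace patch later in the series:

   #include <sys/ptrace.h>
   #include <sys/types.h>
   #include <sys/uio.h>
   #include <linux/types.h>
   #include <linux/elf.h>  /* NT_ARM_GCS, with this series applied */
   #include <asm/ptrace.h> /* struct user_gcs, with this series applied */

   /* Sketch: read-modify-write the tracee's GCS mode bits.  Note the
    * open issue above: the enable/disable bit itself cannot be
    * changed this way in the current series. */
   static int set_gcs_features(pid_t pid, __u64 features)
   {
           struct user_gcs gcs;
           struct iovec iov = {
                   .iov_base = &gcs,
                   .iov_len = sizeof(gcs),
           };

           if (ptrace(PTRACE_GETREGSET, pid, (void *)NT_ARM_GCS, &iov))
                   return -1;

           gcs.features_enabled = features;

           return ptrace(PTRACE_SETREGSET, pid, (void *)NT_ARM_GCS, &iov);
   }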

There are a few bits where I'm not convinced by where I've placed
things; in particular the GCS write operation is in the GCS header
rather than in uaccess.h.  I wasn't sure what was clearest there and am
probably too close to the code to have a clear opinion.  The reporting
of GCS in /proc/PID/smaps is also a bit awkward.

The series depends on the x86 shadow stack support:

   https://lore.kernel.org/lkml/20230227222957.24501-1-rick.p.edgecombe@intel.com/

I've rebased this onto v6.5-rc3 but not included it in the series in
order to avoid confusion with Rick's work and cut down the size of the
series, you can see the branch at:

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs

[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/

Signed-off-by: Mark Brown <broonie@kernel.org>
---
Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org

Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
  requested by the caller, not the actual space allocated (see the
  sketch below this changelog).
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org
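
To illustrate the map_shadow_stack() change above, a minimal sketch
(the SHADOW_STACK_SET_TOKEN flag name follows the x86 series and is an
assumption here):

   #include <unistd.h>
   #include <sys/syscall.h>        /* __NR_map_shadow_stack, with this
                                    * series applied */

   #ifndef SHADOW_STACK_SET_TOKEN
   #define SHADOW_STACK_SET_TOKEN (1UL << 0)  /* assumed, as per x86 */
   #endif

   /* Sketch: allocate a new GCS.  The cap token is placed based on the
    * size the caller asked for, even if the underlying allocation was
    * rounded up. */
   static void *alloc_gcs(unsigned long size)
   {
           return (void *)syscall(__NR_map_shadow_stack, 0UL, size,
                                  SHADOW_STACK_SET_TOKEN);
   }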

---
Mark Brown (36):
      prctl: arch-agnostic prctl for shadow stack
      arm64: Document boot requirements for Guarded Control Stacks
      arm64/gcs: Document the ABI for Guarded Control Stacks
      arm64/sysreg: Add new system registers for GCS
      arm64/sysreg: Add definitions for architected GCS caps
      arm64/gcs: Add manual encodings of GCS instructions
      arm64/gcs: Provide copy_to_user_gcs()
      arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
      arm64/mm: Allocate PIE slots for EL0 guarded control stack
      mm: Define VM_SHADOW_STACK for arm64 when we support GCS
      arm64/mm: Map pages for guarded control stack
      KVM: arm64: Manage GCS registers for guests
      arm64/gcs: Allow GCS usage at EL0 and EL1
      arm64/idreg: Add override for GCS
      arm64/hwcap: Add hwcap for GCS
      arm64/traps: Handle GCS exceptions
      arm64/mm: Handle GCS data aborts
      arm64/gcs: Context switch GCS state for EL0
      arm64/gcs: Allocate a new GCS for threads with GCS enabled
      arm64/gcs: Implement shadow stack prctl() interface
      arm64/mm: Implement map_shadow_stack()
      arm64/signal: Set up and restore the GCS context for signal handlers
      arm64/signal: Expose GCS state in signal frames
      arm64/ptrace: Expose GCS via ptrace and core files
      arm64: Add Kconfig for Guarded Control Stack (GCS)
      kselftest/arm64: Verify the GCS hwcap
      kselftest/arm64: Add GCS as a detected feature in the signal tests
      kselftest/arm64: Add framework support for GCS to signal handling tests
      kselftest/arm64: Allow signals tests to specify an expected si_code
      kselftest/arm64: Always run signals tests with GCS enabled
      kselftest/arm64: Add very basic GCS test program
      kselftest/arm64: Add a GCS test program built with the system libc
      kselftest/arm64: Add test coverage for GCS mode locking
      selftests/arm64: Add GCS signal tests
      kselftest/arm64: Add a GCS stress test
      kselftest/arm64: Enable GCS for the FP stress tests

 Documentation/admin-guide/kernel-parameters.txt    |   3 +
 Documentation/arch/arm64/booting.rst               |  22 +
 Documentation/arch/arm64/elf_hwcaps.rst            |   3 +
 Documentation/arch/arm64/gcs.rst                   | 225 +++++++++
 Documentation/arch/arm64/index.rst                 |   1 +
 Documentation/filesystems/proc.rst                 |   2 +-
 arch/arm64/Kconfig                                 |  19 +
 arch/arm64/include/asm/cpufeature.h                |   6 +
 arch/arm64/include/asm/el2_setup.h                 |  17 +
 arch/arm64/include/asm/esr.h                       |  28 +-
 arch/arm64/include/asm/exception.h                 |   2 +
 arch/arm64/include/asm/gcs.h                       | 106 ++++
 arch/arm64/include/asm/hwcap.h                     |   1 +
 arch/arm64/include/asm/kvm_arm.h                   |   4 +-
 arch/arm64/include/asm/kvm_host.h                  |  12 +
 arch/arm64/include/asm/pgtable-prot.h              |  14 +-
 arch/arm64/include/asm/processor.h                 |   7 +
 arch/arm64/include/asm/sysreg.h                    |  20 +
 arch/arm64/include/asm/uaccess.h                   |  42 ++
 arch/arm64/include/uapi/asm/hwcap.h                |   1 +
 arch/arm64/include/uapi/asm/ptrace.h               |   8 +
 arch/arm64/include/uapi/asm/sigcontext.h           |   9 +
 arch/arm64/kernel/cpufeature.c                     |  19 +
 arch/arm64/kernel/cpuinfo.c                        |   1 +
 arch/arm64/kernel/entry-common.c                   |  23 +
 arch/arm64/kernel/idreg-override.c                 |   2 +
 arch/arm64/kernel/process.c                        |  85 ++++
 arch/arm64/kernel/ptrace.c                         |  59 +++
 arch/arm64/kernel/signal.c                         | 237 ++++++++-
 arch/arm64/kernel/traps.c                          |  11 +
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h         |  17 +
 arch/arm64/kvm/sys_regs.c                          |  22 +
 arch/arm64/mm/Makefile                             |   1 +
 arch/arm64/mm/fault.c                              |  78 ++-
 arch/arm64/mm/gcs.c                                | 226 +++++++++
 arch/arm64/mm/mmap.c                               |  17 +-
 arch/arm64/tools/cpucaps                           |   1 +
 arch/arm64/tools/sysreg                            |  55 +++
 fs/proc/task_mmu.c                                 |   3 +
 include/linux/mm.h                                 |  16 +-
 include/linux/syscalls.h                           |   1 +
 include/uapi/asm-generic/unistd.h                  |   5 +-
 include/uapi/linux/elf.h                           |   1 +
 include/uapi/linux/prctl.h                         |  22 +
 kernel/sys.c                                       |  30 ++
 kernel/sys_ni.c                                    |   1 +
 tools/testing/selftests/arm64/Makefile             |   2 +-
 tools/testing/selftests/arm64/abi/hwcap.c          |  19 +
 tools/testing/selftests/arm64/fp/assembler.h       |  15 +
 tools/testing/selftests/arm64/fp/fpsimd-test.S     |   2 +
 tools/testing/selftests/arm64/fp/sve-test.S        |   2 +
 tools/testing/selftests/arm64/fp/za-test.S         |   2 +
 tools/testing/selftests/arm64/fp/zt-test.S         |   2 +
 tools/testing/selftests/arm64/gcs/.gitignore       |   5 +
 tools/testing/selftests/arm64/gcs/Makefile         |  23 +
 tools/testing/selftests/arm64/gcs/asm-offsets.h    |   0
 tools/testing/selftests/arm64/gcs/basic-gcs.c      | 351 ++++++++++++++
 tools/testing/selftests/arm64/gcs/gcs-locking.c    | 200 ++++++++
 .../selftests/arm64/gcs/gcs-stress-thread.S        | 311 ++++++++++++
 tools/testing/selftests/arm64/gcs/gcs-stress.c     | 532 +++++++++++++++++++++
 tools/testing/selftests/arm64/gcs/gcs-util.h       |  87 ++++
 tools/testing/selftests/arm64/gcs/libc-gcs.c       | 372 ++++++++++++++
 tools/testing/selftests/arm64/signal/.gitignore    |   1 +
 .../testing/selftests/arm64/signal/test_signals.c  |  17 +-
 .../testing/selftests/arm64/signal/test_signals.h  |   6 +
 .../selftests/arm64/signal/test_signals_utils.c    |  32 +-
 .../selftests/arm64/signal/test_signals_utils.h    |  39 ++
 .../arm64/signal/testcases/gcs_exception_fault.c   |  59 +++
 .../selftests/arm64/signal/testcases/gcs_frame.c   |  78 +++
 .../arm64/signal/testcases/gcs_write_fault.c       |  67 +++
 .../selftests/arm64/signal/testcases/testcases.c   |   7 +
 .../selftests/arm64/signal/testcases/testcases.h   |   1 +
 72 files changed, 3683 insertions(+), 34 deletions(-)
---
base-commit: 730a197c555893dfad0deebcace710d5c7425ba5
change-id: 20230303-arm64-gcs-e311ab0d8729

Best regards,

Comments

Will Deacon Aug. 1, 2023, 2:13 p.m. UTC | #1
On Mon, Jul 31, 2023 at 02:43:09PM +0100, Mark Brown wrote:
> The arm64 Guarded Control Stack (GCS) feature provides support for
> hardware protected stacks of return addresses, intended to provide
> hardening against return oriented programming (ROP) attacks and to make
> it easier to gather call stacks for applications such as profiling.

Why is this better than Clang's software shadow stack implementation? It
would be nice to see some justification behind adding all this, rather
than it being an architectural tick-box exercise.

Will
Mark Brown Aug. 1, 2023, 3:09 p.m. UTC | #2
On Tue, Aug 01, 2023 at 03:13:20PM +0100, Will Deacon wrote:
> On Mon, Jul 31, 2023 at 02:43:09PM +0100, Mark Brown wrote:

> > The arm64 Guarded Control Stack (GCS) feature provides support for
> > hardware protected stacks of return addresses, intended to provide
> > hardening against return oriented programming (ROP) attacks and to make
> > it easier to gather call stacks for applications such as profiling.

> Why is this better than Clang's software shadow stack implementation? It
> would be nice to see some justification behind adding all this, rather
> than it being an architectural tick-box exercise.

Mainly that it's hardware enforced (as the quoted paragraph says).  This
makes it harder to attack, and hopefully it's also a bit faster (how
measurable that might be will be an open question, but even NOPs in
function entry/exit tend to get noticed).
Szabolcs Nagy Aug. 8, 2023, 10:27 a.m. UTC | #3
The 08/01/2023 16:09, Mark Brown wrote:
> On Tue, Aug 01, 2023 at 03:13:20PM +0100, Will Deacon wrote:
> > On Mon, Jul 31, 2023 at 02:43:09PM +0100, Mark Brown wrote:
> 
> > > The arm64 Guarded Control Stack (GCS) feature provides support for
> > > hardware protected stacks of return addresses, intended to provide
> > > hardening against return oriented programming (ROP) attacks and to make
> > > it easier to gather call stacks for applications such as profiling.
> 
> > Why is this better than Clang's software shadow stack implementation? It
> > would be nice to see some justification behind adding all this, rather
> > than it being an architectural tick-box exercise.
> 
> Mainly that it's hardware enforced (as the quoted paragraph says).  This
> makes it harder to attack, and hopefully it's also a bit faster (how
> measurable that might be will be an open question, but even NOPs in
> function entry/exit tend to get noticed).

clang shadowstack seems to use x18. this is only valid on a
platform like android that can reserve x18, not deployable
widely on linux distros.

with gcs the same binary works with gcs enabled or disabled.
and it can support disabling gcs at runtime. this is
important for incremental deployment or with late detection
of incompatibility. clang shadowstack cannot do this. (and
there is no abi marking so it is easy to create broken
binaries.)

android uses a fixed 16k shadowstack; controlling this size
from userspace is missing from the current gcs abi patches.
the default gcs size can be huge so this may be an actual
issue for gcs on android where RLIMIT_AS, RLIMIT_DATA etc
are often set i think. but the fixed size has its problems
too (e.g. there are libraries, boehm gc, that recursively
call a function until segfault to detect stack bounds).

i think the clang shadowstack design does not allow safely
switching between shadow stacks. bionic has no makecontext
so code that does userspace task scheduling presumably has
to do custom things which would need modifications and likely
introduce a security weakness where x18 is set. (this also means
sigaltstack would have the same limitations as the current
gcs patches: shadow stack overflow cannot be handled if the
signal handler itself wants to use the same shadow stack. one
advantage of the weaker software solution is that it can be
disabled per function however a signal handler may indirectly
call many other functions so i'm not sure if this helps in
practice.)

as usual with these sanitizers we cannot recommend them to
users in general: they only work in a narrow context. to be
fair shstk and gcs are only a little bit better in this case.
Will Deacon Aug. 8, 2023, 1:38 p.m. UTC | #4
On Tue, Aug 01, 2023 at 04:09:58PM +0100, Mark Brown wrote:
> On Tue, Aug 01, 2023 at 03:13:20PM +0100, Will Deacon wrote:
> > On Mon, Jul 31, 2023 at 02:43:09PM +0100, Mark Brown wrote:
> 
> > > The arm64 Guarded Control Stack (GCS) feature provides support for
> > > hardware protected stacks of return addresses, intended to provide
> > > hardening against return oriented programming (ROP) attacks and to make
> > > it easier to gather call stacks for applications such as profiling.
> 
> > Why is this better than Clang's software shadow stack implementation? It
> > would be nice to see some justification behind adding all this, rather
> > than it being an architectural tick-box exercise.
> 
> Mainly that it's hardware enforced (as the quoted paragraph says).  This
> makes it harder to attack, and hopefully it's also a bit faster (how
> measurable that might be will be an open question, but even NOPs in
> function entry/exit tend to get noticed).

I dunno, "hardware enforced" can also mean worse security nowadays ;)

But seriously, I think the question is more about what this brings us
*on top of* SCS, since for the foreseeable future folks that care about
this stuff (like Android) will be using SCS. GCS on its own doesn't make
sense to me, given the recompilation effort to remove SCS and the lack
of hardware, so then you have to look at what it brings in addition to
SCS and balance that against the performance cost.

Given that, is anybody planning to ship a distribution with this enabled?
If not, why are we bothering? If so, how much of that distribution has
been brought up and how does the "dynamic linker or other startup code"
decide what to do?

After the mess we had with BTI and mprotect(), I'm hesitant to merge
features like this without knowing that the ABI can stand real code.

Will
Mark Brown Aug. 8, 2023, 8:25 p.m. UTC | #5
On Tue, Aug 08, 2023 at 02:38:58PM +0100, Will Deacon wrote:

> But seriously, I think the question is more about what this brings us
> *on top of* SCS, since for the foreseeable future folks that care about
> this stuff (like Android) will be using SCS. GCS on its own doesn't make
> sense to me, given the recompilation effort to remove SCS and the lack
> of hardware, so then you have to look at what it brings in addition to
> SCS and balance that against the performance cost.

> Given that, is anybody planning to ship a distribution with this enabled?

I'm not sure that your assumption that the only people who would
consider deploying this are those who have deployed SCS is a valid one;
SCS users are definitely part of the mix but GCS is expected to be much
more broadly applicable.  As you say SCS is very invasive, requires a
rebuild of everything with different code generated and as Szabolcs
outlined has ABI challenges for general distros.  Any code built (or
JITed) with anything other than clang is going to require some explicit
support to do SCS (eg, the kernel's SCS support does nothing for
assembly code) and there's a bunch of runtime support.  It's very much a
specialist feature, mainly practical in well controlled somewhat
vertical systems - I've not seen any suggestion that general purpose
distros are considering using it.

In contrast in the case of GCS one of the nice features is that for most
code it's very much non-invasive, much less so than things like PAC/BTI
and SCS, which means that the audience is much wider than it is for SCS
- it's a *much* easier sell for general purpose distros to enable GCS
than to enable SCS.  For the majority of programs all the support that
is needed is in the kernel and libgcc/libc; there's no impact on the
code generation.  There are no extra instructions in the normal flow
which will impact systems without the feature, and there are no extra
registers in use, so even if the binaries are run on a system without
GCS, or for some reason someone decides that it's best to turn the
feature off on a system that is capable of using it, the fact that it's
just using the existing bl/ret pairs means that there is minimal
overhead.  This all means that it's much more practical to deploy in
general purpose distros.  On the other hand when active it affects all
code; this improves coverage, but the improved coverage can be a worry.

I can see that systems that have gone through all the effort of enabling
SCS might not rush to implement GCS, though there should be no harm in
having the two features running side by side beyond the doubled memory
requirements so you can at least have a transition plan (GCS does have
some allowances which enable hardware to mitigate some of the memory
bandwidth requirements at least).  You do still get the benefit of the
additional hardware protections GCS offers, and the coverage of all
branch and ret instructions will be of interest both for security and
for unwinders.  It definitely offers less of an incremental
improvement on top of SCS than it does without SCS, though.

GCS and SCS are comparable features in terms of the protection they aim
to add but their system integration impacts are different.

> If not, why are we bothering? If so, how much of that distribution has
> been brought up and how does the "dynamic linker or other startup code"
> decide what to do?

There is active interest in the x86 shadow stack support from distros;
GCS is a lot earlier on in the process but isn't fundamentally
different, so it is expected that this will translate.  There is also a
chicken and egg thing where upstream support gates a lot of people's
interest: what
people will consider carrying out of tree is different to what they'll
enable.  Architecture specific feedback on the implementation can also
be fed back into the still ongoing review of the ABI that is being
established for x86, there will doubtless be pushback about variations
between architectures from userspace people.

The userspace decision about enablement will primarily be driven by an
ELF marking which the dynamic linker looks at to determine if the
binaries it is loading can support GCS; a later dlopen() can either
refuse to load an additional library if the process currently has GCS
enabled, ignore the issue and hope things work out (there's a good
chance they will but obviously that's not safe) or (more complicatedly)
go round all the threads and disable GCS before proceeding.  The main
reason any sort of rebuild is required for most code is to add the ELF
marking, there will be a compiler option to select it.  Static binaries
should know if everything linked into them is GCS compatible and enable
GCS if appropriate in their startup code.
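
As a very rough sketch of that flow (the property bit name and value
here are assumptions; the precise ELF marking was still being settled
in the toolchain ABI discussions):

   #include <sys/prctl.h>

   /* Assumed GNU property bit for GCS; the real marking was still
    * under discussion when this was written. */
   #ifndef GNU_PROPERTY_AARCH64_FEATURE_1_GCS
   #define GNU_PROPERTY_AARCH64_FEATURE_1_GCS (1U << 2)
   #endif

   /* Sketch of dynamic linker or startup code: enable GCS only if
    * every object loaded so far carries the GCS marking. */
   static void maybe_enable_gcs(unsigned int merged_feature_bits)
   {
           if (!(merged_feature_bits & GNU_PROPERTY_AARCH64_FEATURE_1_GCS))
                   return;

           prctl(PR_SET_SHADOW_STACK_STATUS,
                 PR_SHADOW_STACK_ENABLE, 0, 0, 0);
   }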

The majority of the full distro work at this point is on the x86 side
given the hardware availability; we are looking at that within Arm, of
course.  I'm not aware of any huge blockers we have encountered thus
far.

It is fair to say that there's less active interest on the arm64 side
since as you say the feature is quite a way off making its way into
hardware, though there are also long lead times on getting the full
software stack to end users and kernel support becomes a blocker for
the userspace stack.

> After the mess we had with BTI and mprotect(), I'm hesitant to merge
> features like this without knowing that the ABI can stand real code.

The equivalent x86 feature is in current hardware[1] and there has been
some distro work (I believe one of the issues x86 has had is coping with
a distro which shipped an early out-of-tree ABI; that experience has
informed the current ABI, which as the cover letter says we are
following closely).  AIUI the biggest blocker on userspace work for x86
right now is landing the kernel side of things so that everyone else has
a stable ABI to work from and doesn't need to carry out-of-tree patches;
I've heard frustration expressed at the deployment being held up.  IIRC
Fedora were on the leading edge in terms of active interest; they tend
to be, given that they're one of the most quickly iterating distros.

This definitely does rely fairly heavily on the x86 experience for
confidence in the ABI, and to be honest one of the big unknowns at this
point is whether you or Catalin will have opinions on how things are
being done.

[1] https://edc.intel.com/content/www/us/en/design/ipla/software-development-platforms/client/platforms/alder-lake-desktop/12th-generation-intel-core-processors-datasheet-volume-1-of-2/009/shadow-stack/
Will Deacon Aug. 10, 2023, 9:40 a.m. UTC | #6
On Tue, Aug 08, 2023 at 09:25:11PM +0100, Mark Brown wrote:
> On Tue, Aug 08, 2023 at 02:38:58PM +0100, Will Deacon wrote:
> 
> > But seriously, I think the question is more about what this brings us
> > *on top of* SCS, since for the foreseeable future folks that care about
> > this stuff (like Android) will be using SCS. GCS on its own doesn't make
> > sense to me, given the recompilation effort to remove SCS and the lack
> > of hardware, so then you have to look at what it brings in addition to
> > SCS and balance that against the performance cost.
> 
> > Given that, is anybody planning to ship a distribution with this enabled?
> 
> I'm not sure that your assumption that the only people who would
> consider deploying this are those who have deployed SCS is a valid one;
> SCS users are definitely part of the mix but GCS is expected to be much
> more broadly applicable.  As you say SCS is very invasive, requires a
> rebuild of everything with different code generated and as Szabolcs
> outlined has ABI challenges for general distros.  Any code built (or
> JITed) with anything other than clang is going to require some explicit
> support to do SCS (eg, the kernel's SCS support does nothing for
> assembly code) and there's a bunch of runtime support.  It's very much a
> specialist feature, mainly practical in well controlled somewhat
> vertical systems - I've not seen any suggestion that general purpose
> distros are considering using it.

I've also seen no suggestion that general purpose distros are considering
GCS -- that's what I'm asking about here, and also saying that we shouldn't
rush in an ABI without confidence that it actually works beyond unit tests
(although it's great that you wrote selftests!).

> In contrast in the case of GCS one of the nice features is that for most
> code it's very much non-invasive, much less so than things like PAC/BTI
> and SCS, which means that the audience is much wider than it is for SCS
> - it's a *much* easier sell for general purpose distros to enable GCS
> than to enable SCS.

This sounds compelling, but has anybody tried running significant parts of a
distribution (e.g. running Debian source package tests, booting Android,
using a browser, running QEMU) with GCS enabled? I can well imagine
non-trivial applications violating both assumptions of the architecture and
the ABI.

> For the majority of programs all the support that is needed is in the
> kernel and libgcc/libc; there's no impact on the code generation.  There
> are no extra instructions in the normal flow which will impact systems
> without the feature, and there are no extra registers in use, so even if
> the binaries are run on a system without GCS, or for some reason someone
> decides that it's best to turn the feature off on a system that is capable
> of using it, the fact that it's just using the existing bl/ret pairs means
> that there is minimal overhead.  This all means that it's much more
> practical to deploy in general purpose distros.  On the other hand when
> active it affects all code; this improves coverage, but the improved
> coverage can be a worry.
> 
> I can see that systems that have gone through all the effort of enabling
> SCS might not rush to implement GCS, though there should be no harm in
> having the two features running side by side beyond the doubled memory
> requirements so you can at least have a transition plan (GCS does have
> some allowances which enable hardware to mitigate some of the memory
> bandwidth requirements at least).  You do still get the benefit of the
> additional hardware protections GCS offers, and the coverage of all
> branch and ret instructions will be of interest both for security and
> for unwinders.  It definitely offers less of an incremental
> improvement on top of SCS than it does without SCS, though.
> 
> GCS and SCS are comparable features in terms of the protection they aim
> to add but their system integration impacts are different.

Again, this sounds plausible but I don't see any data to back it up so I
don't really have a feeling as to how true it is.

> > If not, why are we bothering? If so, how much of that distribution has
> > been brought up and how does the "dynamic linker or other startup code"
> > decide what to do?
> 
> There is active interest in the x86 shadow stack support from distros;
> GCS is a lot earlier on in the process but isn't fundamentally
> different, so it is expected that this will translate.  There is also a
> chicken and egg thing where upstream support gates a lot of people's
> interest: what
> people will consider carrying out of tree is different to what they'll
> enable. 

I'm not saying we should wait until distros are committed, but Arm should
be able to do that work on a fork, exactly like we did for the arm64
bringup. We have the fastmodel, so running interesting stuff with GCS
enabled should be dead easy, no?

> Architecture specific feedback on the implementation can also be fed back
> into the still ongoing review of the ABI that is being established for
> x86, there will doubtless be pushback about variations between
> architectures from userspace people.
> 
> The userspace decision about enablement will primarily be driven by an
> ELF marking which the dynamic linker looks at to determine if the
> binaries it is loading can support GCS; a later dlopen() can either
> refuse to load an additional library if the process currently has GCS
> enabled, ignore the issue and hope things work out (there's a good
> chance they will but obviously that's not safe) or (more complicatedly)
> go round all the threads and disable GCS before proceeding.  The main
> reason any sort of rebuild is required for most code is to add the ELF
> marking, there will be a compiler option to select it.  Static binaries
> should know if everything linked into them is GCS compatible and enable
> GCS if appropriate in their startup code.
> 
> The majority of the full distro work at this point is on the x86 side
> given the hardware availability; we are looking at that within Arm, of
> course.  I'm not aware of any huge blockers we have encountered thus
> far.

Ok, so it sounds like you've started something then? How far have you got?

> It is fair to say that there's less active interest on the arm64 side
> since as you say the feature is quite a way off making its way into
> hardware, though there are also long lead times on getting the full
> software stack to end users and kernel support becomes a blocker for
> the userspace stack.
>
> 
> > After the mess we had with BTI and mprotect(), I'm hesitant to merge
> > features like this without knowing that the ABI can stand real code.
> 
> The equivalent x86 feature is in current hardware[1] and there has been
> some distro work (I believe one of the issues x86 has had is coping with
> a distro which shipped an early out-of-tree ABI; that experience has
> informed the current ABI, which as the cover letter says we are
> following closely).  AIUI the biggest blocker on userspace work for x86
> right now is landing the kernel side of things so that everyone else has
> a stable ABI to work from and doesn't need to carry out-of-tree patches;
> I've heard frustration expressed at the deployment being held up.  IIRC
> Fedora were on the leading edge in terms of active interest; they tend
> to be, given that they're one of the most quickly iterating distros.
> 
> This definitely does rely fairly heavily on the x86 experience for
> confidence in the ABI, and to be honest one of the big unknowns at this
> point is whether you or Catalin will have opinions on how things are
> being done.

While we'd be daft not to look at what the x86 folks are doing, I don't
think we should rely solely on them to inform the design for arm64 when
it should be relatively straightforward to prototype the distro work on
the model. There's also no rush to land the kernel changes given that
GCS hardware doesn't exist.

Will
Mark Brown Aug. 10, 2023, 4:05 p.m. UTC | #7
On Thu, Aug 10, 2023 at 10:40:16AM +0100, Will Deacon wrote:
> On Tue, Aug 08, 2023 at 09:25:11PM +0100, Mark Brown wrote:

> > I'm not sure that your assumption that the only people who would
> > consider deploying this are those who have deployed SCS is a valid one;
> > SCS users are definitely part of the mix but GCS is expected to be much
> > more broadly applicable.  As you say SCS is very invasive, requires a
> > rebuild of everything with different code generated and as Szabolcs
> > outlined has ABI challenges for general distros.  Any code built (or
> > JITed) with anything other than clang is going to require some explicit
> > support to do SCS (eg, the kernel's SCS support does nothing for
> > assembly code) and there's a bunch of runtime support.  It's very much a
> > specialist feature, mainly practical in well controlled somewhat
> > vertical systems - I've not seen any suggestion that general purpose
> > distros are considering using it.

> I've also seen no suggestion that general purpose distros are considering
> GCS -- that's what I'm asking about here, and also saying that we shouldn't
> rush in an ABI without confidence that it actually works beyond unit tests
> (although it's great that you wrote selftests!).

It definitely works substantially beyond selftests.  For the actual
distros there's definitely interest out there, gated on upstreaming.

> > In contrast in the case of GCS one of the nice features is that for most
> > code it's very much non-invasive, much less so than things like PAC/BTI
> > and SCS, which means that the audience is much wider than it is for SCS
> > - it's a *much* easier sell for general purpose distros to enable GCS
> > than to enable SCS.

> This sounds compelling, but has anybody tried running significant parts of a
> distribution (e.g. running Debian source package tests, booting Android,
> using a browser, running QEMU) with GCS enabled? I can well imagine
> non-trivial applications violating both assumptions of the architecture and
> the ABI.

Android is the main full userspace that people have been working with;
we've not run into anything ABI-related yet that I'm aware of - there is
one thing that's being chased down, but we're fairly confident that is a
bug somewhere rather than the ABI being unsuitable.

> > > If not, why are we bothering? If so, how much of that distribution has
> > > been brought up and how does the "dynamic linker or other startup code"
> > > decide what to do?

> > There is active interest in the x86 shadow stack support from distros;
> > GCS is a lot earlier on in the process but isn't fundamentally
> > different, so it is expected that this will translate.  There is also a
> > chicken and egg thing where upstream support gates a lot of people's
> > interest: what
> > people will consider carrying out of tree is different to what they'll
> > enable. 

> I'm not saying we should wait until distros are committed, but Arm should
> be able to do that work on a fork, exactly like we did for the arm64
> bringup. We have the fastmodel, so running interesting stuff with GCS
> enabled should be dead easy, no?

Right, this is happening, but your pushback seemed to be "why would
anyone even consider deploying this?" rather than "could anyone deploy
this?".  Tests on forks can help a bit with the first question, but your
concern seemed more at the level of getting people to even look at the
work rather than just rejecting it out of hand.

> > The majority of the full distro work at this point is on the x86 side
> > given the hardware availability; we are looking at that within Arm, of
> > course.  I'm not aware of any huge blockers we have encountered thus
> > far.

> Ok, so it sounds like you've started something then? How far have you got?

I'd say thus far text mode embedded/server type stuff is looking pretty
good, especially for C stuff - setjmp/longjmp and an unwinder cover a
*lot*.  We do need to do more here, especially GUI stuff, but it's
progressing well so far.

> While we'd be daft not to look at what the x86 folks are doing, I don't
> think we should rely solely on them to inform the design for arm64 when
> it should be relatively straightforward to prototype the distro work on
> the model. There's also no rush to land the kernel changes given that
> GCS hardware doesn't exist.

Sure, but we're also in the position where there's only been the very
beginnings of kernel review and obviously that's very important too;
there are often really substantial lead times on that, plus the
potential need to redo all the testing if issues are identified.  I'd
hope to at least be able to get to a point where the major concern
people have is testing.  Another goal here is to feed any concerns we do
have into what's happening with x86 and RISC-V so that we have as much
alignment as possible in how this is supposed to work on Linux; that'll
make everyone's life easier.

In terms of timescales, given that users with generic distros are a big
part of the expected audience, while we're well in advance of where it's
actually going to be used we do need to be mindful of lead times in
getting support into the software users are likely to want to run, so
they've got something they can use when they do get hardware.  We don't
need to rush into anything, but we should probably use that time for
careful consideration.