
[v3,00/30] arm64: support WXN and entry with MMU enabled

Message ID: 20220411094824.4176877-1-ardb@kernel.org

Message

Ard Biesheuvel April 11, 2022, 9:47 a.m. UTC
[ TL;DR: this series does the following:
  - move variable definitions and assignments out of early asm code
    where possible, and get rid of explicit cache maintenance;
  - convert the initial ID map so it covers the entire loaded image as
    well as the DT blob;
  - create the kernel mapping only once instead of twice (for KASLR),
    and do it with the MMU and caches on;
  - avoid mappings that are both writable and executable entirely;
  - avoid parsing the DT while the kernel text and rodata are still
    mapped writable;
  - allow WXN to be enabled (with an opt-out) so writable mappings are
    never executable;
  - create the initial ID map with the MMU and caches on if that is how
    we entered, and take advantage of this when doing EFI boot. ]

This is a follow-up to a previous series of mine [0][1]. It aims to
streamline the boot flow with respect to cache maintenance and redundant
copying of data in memory, and to eliminate writable, executable
mappings at any point during the boot.
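For context, WXN here refers to the SCTLR_ELx.WXN control bit (bit 19),
which makes every writable mapping behave as non-executable regardless
of its execute permissions. As a minimal illustration of what flipping
it involves (enable_wxn() is an illustrative name, not code from these
patches):

#include <linux/bits.h>
#include <linux/types.h>

#define SCTLR_ELx_WXN	BIT(19)	/* SCTLR_EL1.WXN: writable implies XN */

static void enable_wxn(void)
{
	u64 sctlr;

	asm volatile("mrs %0, sctlr_el1" : "=r" (sctlr));
	sctlr |= SCTLR_ELx_WXN;
	asm volatile("msr sctlr_el1, %0\n\tisb" : : "r" (sctlr));
}

In the series itself this is wired up behind a Kconfig option with a
runtime opt-out, as noted in the TL;DR above.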

Combined with my proof-of-concept firmware for QEMU/arm64 [2], this
results in a boot where both the kernel and the initrd are loaded
straight to their final locations in memory, while the physical
placement of the kernel image is still randomized by the loader. It
also removes all memory accesses performed with the MMU and caches off
(except for instruction fetches) from the moment the VM comes out of
reset.

On the kernel side, this comes down to:
- increasing the ID map to cover the entire kernel image, so we can
  build the kernel page tables with the MMU and caches enabled;
- dealing with the MMU already being on at boot, and keeping it on while
  building the ID map (see the sketch after this list);
- ensuring that all stores to memory which are now done with the MMU and
  caches on are not negated by the subsequent cache invalidation.
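On the second point, the entry code needs to observe whether the MMU is
already enabled and branch accordingly. A minimal C rendering of the
check (illustrative only; the series does this in head.S assembly, and
entered_with_mmu_on() is not a name taken from the patches):

#include <linux/bits.h>
#include <linux/types.h>

/* SCTLR_ELx.M (bit 0) is the MMU enable bit for the current EL */
static bool entered_with_mmu_on(void)
{
	u64 sctlr;

	asm volatile("mrs %0, sctlr_el1" : "=r" (sctlr));
	return sctlr & BIT(0);
}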

Additionally, this series removes the little dance we do to create a
kernel mapping, relocate the kernel, run the KASLR init code, tear down
the old mapping and create a new one, relocate the kernel again, and
finally enter the kernel proper. Instead, it invokes a minimal C
function 'kaslr_early_init()' while running from the ID map, which
includes a temporary mapping of the FDT. This change represents a
substantial chunk of the diffstat, as it requires some work to
instantiate code that can run safely from an arbitrary load address.
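To give a flavor of what that C code does, here is a sketch of pulling
the KASLR seed out of the /chosen node with libfdt (a simplified
approximation of arch/arm64/kernel/pi/kaslr_early.c, not the verbatim
patch):

#include <libfdt.h>
#include <linux/types.h>

/* read /chosen/kaslr-seed from the FDT mapped in the initial ID map */
static u64 get_kaslr_seed(void *fdt)
{
	const fdt64_t *prop;
	int node, len;

	node = fdt_path_offset(fdt, "/chosen");
	if (node < 0)
		return 0;

	prop = fdt_getprop(fdt, node, "kaslr-seed", &len);
	if (!prop || len != sizeof(*prop))
		return 0;

	return fdt64_to_cpu(*prop);
}

The interesting constraint is less the FDT parsing than the fact that
this code executes via the ID map at whatever physical address the
image was loaded at, which is why the pi/ objects are built as position
independent code.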

Changes since v2:
- create a separate, initial ID map that is discarded after boot, and
  create the permanent ID map from C code using the ordinary memory
  mapping code;
- refactor the extended ID map handling, and along with it, simplify the
  early memory mapping macros, so that we can deal with an extended ID
  map that requires multiple table entries at intermediate levels;
- eliminate all variable assignments with the MMU off from the happy
  flow;
- replace the temporary FDT mapping in TTBR1 with an FDT mapping in the
  initial ID map;
- use read-only attributes for all code mappings, so we can boot with
  WXN enabled if we elect to do so.

Changes since v1:
- Remove the dodgy handling of the KASLR seed, which was necessary to
  avoid doing two iterations of the setup/teardown of the page tables.
  This is now dealt with by creating the TTBR1 page tables while
  executing from TTBR0 (sketched below), so all memory manipulations are
  still done with the MMU and caches on.
- Only boot from EFI with the MMU and caches on if the image was not
  moved around in memory. Otherwise, we cannot rely on the firmware's ID
  map to have created an executable mapping for the copied code.
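The TTBR0/TTBR1 trick mentioned above boils down to the sequence used
by cpu_replace_ttbr1(): park TTBR1 on an empty table, flush the TLB,
then install the new tables, all while instructions are being fetched
via the TTBR0 ID map. Roughly (an illustrative rendering, not the
patch itself):

#include <linux/types.h>

static void replace_ttbr1(phys_addr_t new_pgd, phys_addr_t reserved_pgd)
{
	asm volatile(
	"	msr	ttbr1_el1, %0\n"	/* point at empty tables */
	"	isb\n"
	"	tlbi	vmalle1\n"		/* drop stale TTBR1 walks */
	"	dsb	nsh\n"
	"	isb\n"
	"	msr	ttbr1_el1, %1\n"	/* install the new tables */
	"	isb"
	: : "r" (reserved_pgd), "r" (new_pgd));
}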

[0] https://lore.kernel.org/all/20220304175657.2744400-1-ardb@kernel.org/
[1] https://lore.kernel.org/all/20220330154205.2483167-1-ardb@kernel.org/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/efilite.git/

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>

Ard Biesheuvel (30):
  arm64: head: move kimage_vaddr variable into C file
  arm64: mm: make vabits_actual a build time constant if possible
  arm64: head: move assignment of idmap_t0sz to C code
  arm64: head: drop idmap_ptrs_per_pgd
  arm64: head: simplify page table mapping macros (slightly)
  arm64: head: switch to map_memory macro for the extended ID map
  arm64: head: split off idmap creation code
  arm64: kernel: drop unnecessary PoC cache clean+invalidate
  arm64: head: pass ID map root table address to __enable_mmu()
  arm64: mm: provide idmap pointer to cpu_replace_ttbr1()
  arm64: head: add helper function to remap regions in early page tables
  arm64: head: cover entire kernel image in initial ID map
  arm64: head: use relative references to the RELA and RELR tables
  arm64: head: create a temporary FDT mapping in the initial ID map
  arm64: idreg-override: use early FDT mapping in ID map
  arm64: head: factor out TTBR1 assignment into a macro
  arm64: head: populate kernel page tables with MMU and caches on
  arm64: head: record CPU boot mode after enabling the MMU
  arm64: kaslr: deal with init called with VA randomization enabled
  arm64: head: relocate kernel only a single time if KASLR is enabled
  arm64: head: remap the kernel text/inittext region read-only
  arm64: setup: drop early FDT pointer helpers
  arm64: mm: move ro_after_init section into the data segment
  arm64: mm: add support for WXN memory translation attribute
  arm64: head: record the MMU state at primary entry
  arm64: head: avoid cache invalidation when entering with the MMU on
  arm64: head: clean the ID map page to the PoC
  efi: libstub: pass image handle to handle_kernel_image()
  efi/arm64: libstub: run image in place if randomized by the loader
  arm64: efi/libstub: enter with the MMU on if executing in place

 arch/arm64/Kconfig                        |  11 +
 arch/arm64/include/asm/assembler.h        |  14 +
 arch/arm64/include/asm/kernel-pgtable.h   |  18 +-
 arch/arm64/include/asm/memory.h           |   6 +
 arch/arm64/include/asm/mmu_context.h      |  47 +-
 arch/arm64/include/asm/setup.h            |   3 -
 arch/arm64/kernel/Makefile                |   2 +-
 arch/arm64/kernel/cpufeature.c            |   2 +-
 arch/arm64/kernel/efi-entry.S             |   4 +
 arch/arm64/kernel/head.S                  | 570 +++++++++++---------
 arch/arm64/kernel/idreg-override.c        |  33 +-
 arch/arm64/kernel/image-vars.h            |   4 +
 arch/arm64/kernel/kaslr.c                 |  83 +--
 arch/arm64/kernel/pi/Makefile             |  33 ++
 arch/arm64/kernel/pi/kaslr_early.c        | 128 +++++
 arch/arm64/kernel/setup.c                 |  27 +-
 arch/arm64/kernel/sleep.S                 |   1 +
 arch/arm64/kernel/suspend.c               |   2 +-
 arch/arm64/kernel/vmlinux.lds.S           |  60 ++-
 arch/arm64/mm/kasan_init.c                |   4 +-
 arch/arm64/mm/mmu.c                       |  84 ++-
 arch/arm64/mm/proc.S                      |   8 +-
 drivers/firmware/efi/libstub/arm32-stub.c |   3 +-
 drivers/firmware/efi/libstub/arm64-stub.c |  15 +-
 drivers/firmware/efi/libstub/efi-stub.c   |   2 +-
 drivers/firmware/efi/libstub/efistub.h    |   3 +-
 drivers/firmware/efi/libstub/riscv-stub.c |   3 +-
 include/linux/efi.h                       |  11 +
 28 files changed, 745 insertions(+), 436 deletions(-)
 create mode 100644 arch/arm64/kernel/pi/Makefile
 create mode 100644 arch/arm64/kernel/pi/kaslr_early.c

Comments

Kees Cook April 12, 2022, 4:59 p.m. UTC | #1
On Mon, Apr 11, 2022 at 11:47:54AM +0200, Ard Biesheuvel wrote:
>   - allow WXN to be enabled (with an opt-out) so writable mappings are
>     never executable;

Besides all the rest of this series's awesomeness, this really stands
out to me. I didn't even know this was a feature in aarch64. Nice! I
really like the idea of having this enabled -- anything executing out of
a writable mapping should already be considered a mistake (and tons of
work over the last two decades has already gone into making sure this
doesn't happen in both the kernel and userspace). We could even make a
new LKDTM test for this. (Right now, tests like EXEC_DATA just verify
that the .data segment doesn't have the X bit... but adding something
like EXEC_WXN, where a memory region is made explicitly W+X and it
_still_ can't be executed, would be great.)
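
Something along these lines, as a rough sketch (EXEC_WXN is just the
suggested name; none of this is existing LKDTM code, and a real test
would report the outcome via the usual LKDTM plumbing):

#include <linux/vmalloc.h>
#include <linux/string.h>
#include <linux/numa.h>
#include <asm/cacheflush.h>

static noinline void do_nothing(void)
{
}

static void lkdtm_EXEC_WXN(void)
{
	void (*func)(void);
	void *wx;

	/* explicitly ask for a writable+executable mapping */
	wx = __vmalloc_node_range(PAGE_SIZE, PAGE_SIZE, VMALLOC_START,
				  VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC,
				  0, NUMA_NO_NODE, __builtin_return_address(0));
	if (!wx)
		return;

	/* copy a trivial function into it and try to run it */
	memcpy(wx, do_nothing, 64);
	flush_icache_range((unsigned long)wx, (unsigned long)wx + 64);

	func = wx;
	func();	/* with SCTLR_ELx.WXN set, this should fault */

	vfree(wx);
}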

Cool!

-Kees