[v1,0/2] mm: In-kernel support for memory-deny-write-execute (MDWE)

Message ID 20221026150457.36957-1-joey.gouly@arm.com (mailing list archive)

Message

Joey Gouly Oct. 26, 2022, 3:04 p.m. UTC
Hi all,

This is a follow-up to the RFC that Catalin posted:
  https://lore.kernel.org/linux-arm-kernel/20220413134946.2732468-1-catalin.marinas@arm.com/

The background to this is that systemd has a configuration option called
MemoryDenyWriteExecute [1], implemented as a SECCOMP BPF filter. Its aim
is to prevent a user task from inadvertently creating an executable
mapping that is (or was) writeable. Since such a BPF filter is stateless,
it cannot detect mappings that were previously writeable but
subsequently changed to read-only. Therefore the filter simply rejects
any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI
support (Branch Target Identification), the dynamic loader cannot change
an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().
For libraries, it can resort to unmapping and re-mapping but for the
main executable it does not have a file descriptor. See the original bug
report in the Red Hat bugzilla [2] and the subsequent glibc workaround
for libraries [3].
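
To make the limitation concrete, here is a minimal userspace model of a
stateless check that only sees the arguments of a single mprotect() call.
It is not the actual systemd filter; the function names and DEMO_PROT_BTI
are hypothetical, and 0x10 is only assumed to match arm64's PROT_BTI so
that the sketch builds on any architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Assumption: stand-in for arm64's PROT_BTI so the sketch builds anywhere. */
#define DEMO_PROT_BTI 0x10

/*
 * Hypothetical model of a stateless filter. It sees only the requested
 * protection, never the mapping's current or past state, so the only
 * safe policy is to refuse every mprotect() that asks for PROT_EXEC.
 */
bool stateless_filter_denies_mprotect(int new_prot)
{
	return (new_prot & PROT_EXEC) != 0;
}

void stateless_filter_demo(void)
{
	/* Dropping to read-only is fine. */
	assert(!stateless_filter_denies_mprotect(PROT_READ));

	/* Making a (possibly once-writeable) mapping executable is refused. */
	assert(stateless_filter_denies_mprotect(PROT_READ | PROT_EXEC));

	/*
	 * The BTI upgrade is refused too, even though the mapping was
	 * already executable: the filter cannot know that.
	 */
	assert(stateless_filter_denies_mprotect(PROT_READ | PROT_EXEC |
						DEMO_PROT_BTI));
}
```

The last assertion is the case the dynamic loader runs into on arm64.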

This series adds in-kernel support for this feature as a prctl PR_SET_MDWE,
that is inherited on fork(). The prctl denies PROT_WRITE | PROT_EXEC mappings.
Like the systemd BPF filter it also denies adding PROT_EXEC to mappings.
However, unlike the BPF filter, it only denies this if the mapping didn't
previously have PROT_EXEC. This allows changing PROT_EXEC -> PROT_EXEC | PROT_BTI
with mprotect(), which is a problem with the BPF filter.
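
As an illustration of the rules described above, the following is a
userspace model of the decision, not the kernel code from the patches.
The function names and DEMO_PROT_BTI are hypothetical (0x10 is assumed
to match arm64's PROT_BTI), and the model covers only what this cover
letter states.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Assumption: stand-in for arm64's PROT_BTI so the sketch builds anywhere. */
#define DEMO_PROT_BTI 0x10

/* New mappings: writeable and executable at the same time is refused. */
bool mdwe_model_denies_mmap(int prot)
{
	return (prot & PROT_WRITE) && (prot & PROT_EXEC);
}

/*
 * Existing mappings: old_prot is the current protection, new_prot the
 * requested one. Unlike a stateless filter, the old state is known.
 */
bool mdwe_model_denies_mprotect(int old_prot, int new_prot)
{
	/* Never writeable and executable at once. */
	if ((new_prot & PROT_WRITE) && (new_prot & PROT_EXEC))
		return true;

	/* Gaining PROT_EXEC is refused; keeping it is allowed. */
	if ((new_prot & PROT_EXEC) && !(old_prot & PROT_EXEC))
		return true;

	return false;
}

void mdwe_model_demo(void)
{
	int rx = PROT_READ | PROT_EXEC;
	int rw = PROT_READ | PROT_WRITE;

	assert(mdwe_model_denies_mmap(rw | PROT_EXEC));
	assert(!mdwe_model_denies_mmap(rx));

	/* RW -> RX would make a once-writeable mapping executable. */
	assert(mdwe_model_denies_mprotect(rw, rx));

	/* RX -> RX|BTI only adds BTI, so it passes. */
	assert(!mdwe_model_denies_mprotect(rx, rx | DEMO_PROT_BTI));
}
```

The final assertion is the PROT_EXEC -> PROT_EXEC | PROT_BTI transition
that the BPF filter cannot permit.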

Thanks,
Joey

[1] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#MemoryDenyWriteExecute=
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1888842
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=26831

Joey Gouly (2):
  mm: Implement memory-deny-write-execute as a prctl
  kselftest: vm: add tests for memory-deny-write-execute

 include/linux/mman.h                   |  15 ++
 include/linux/sched/coredump.h         |   6 +-
 include/uapi/linux/prctl.h             |   6 +
 kernel/sys.c                           |  18 +++
 mm/mmap.c                              |   3 +
 mm/mprotect.c                          |   5 +
 tools/testing/selftests/vm/mdwe_test.c | 194 +++++++++++++++++++++++++
 7 files changed, 246 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/vm/mdwe_test.c
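
The selftest itself is in patch 2 and is not reproduced here. As a rough
sketch of the kind of check a userspace program could make, the code
below enables the feature in a child process and tries to create a
PROT_WRITE | PROT_EXEC mapping. The fallback constant values and the
flag name PR_MDWE_REFUSE_EXEC_GAIN are assumptions taken from the
interface as later merged into mainline; the values and names in this
v1 posting may differ. mdwe_probe() and mdwe_probe_check() are
hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: values from later mainline headers, not from this v1. */
#ifndef PR_SET_MDWE
#define PR_SET_MDWE 65
#endif
#ifndef PR_MDWE_REFUSE_EXEC_GAIN
#define PR_MDWE_REFUSE_EXEC_GAIN 1UL
#endif

/*
 * Runs the probe in a child so the caller's own process is unaffected.
 * Returns 0 if the W|X mapping was refused, 1 if it was allowed,
 * 2 if the kernel rejected the prctl, -1 on fork/wait failure.
 */
int mdwe_probe(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		void *p;

		if (prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0L, 0L, 0L) != 0)
			_exit(2);

		p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE),
			 PROT_READ | PROT_WRITE | PROT_EXEC,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		_exit(p == MAP_FAILED ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}

/* A kernel that accepts the prctl is expected to refuse the mapping. */
void mdwe_probe_check(void)
{
	assert(mdwe_probe() != 1);
}
```

On a kernel without the feature the prctl is expected to fail, so the
probe reports 2 rather than treating the missing support as an error.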

Comments

Topi Miettinen Nov. 6, 2022, 7:42 p.m. UTC | #1
On 26.10.2022 18.04, Joey Gouly wrote:
> Hi all,
> 
> This is a follow up to the RFC that Catalin posted:
>    https://lore.kernel.org/linux-arm-kernel/20220413134946.2732468-1-catalin.marinas@arm.com/
> 
> The background to this is that systemd has a configuration option called
> MemoryDenyWriteExecute [1], implemented as a SECCOMP BPF filter. Its aim
> is to prevent a user task from inadvertently creating an executable
> mapping that is (or was) writeable. Since such a BPF filter is stateless,
> it cannot detect mappings that were previously writeable but
> subsequently changed to read-only. Therefore the filter simply rejects
> any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI
> support (Branch Target Identification), the dynamic loader cannot change
> an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().
> For libraries, it can resort to unmapping and re-mapping but for the
> main executable it does not have a file descriptor. See the original bug
> report in the Red Hat bugzilla [2] and the subsequent glibc workaround
> for libraries [3].
> 
> This series adds in-kernel support for this feature as a prctl PR_SET_MDWE,
> that is inherited on fork(). The prctl denies PROT_WRITE | PROT_EXEC mappings.
> Like the systemd BPF filter it also denies adding PROT_EXEC to mappings.
> However, unlike the BPF filter, it only denies this if the mapping didn't
> previously have PROT_EXEC. This allows changing PROT_EXEC -> PROT_EXEC | PROT_BTI
> with mprotect(), which is a problem with the BPF filter.

Draft PR for systemd: https://github.com/systemd/systemd/pull/25276

-Topi
