Message ID | 20210916131816.8841-1-will@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC] fs/compat_binfmt_elf: Introduce sysctl to disable compat ELF loader |
On Thu, Sep 16, 2021 at 3:18 PM Will Deacon <will@kernel.org> wrote:
>
> Distributions such as Android which support a mixture of 32-bit (compat)
> and 64-bit (native) tasks necessarily ship with the compat ELF loader
> enabled in their kernels. However, as time goes by, an ever-increasing
> proportion of userspace consists of native applications and in some cases
> 32-bit capabilities are starting to be removed from the CPUs altogether.
>
> Inevitably, this means that the compat code becomes somewhat of a
> maintenance burden, receiving less testing coverage and exposing an
> additional kernel attack surface to userspace during the lengthy
> transitional period where some shipping devices require support for
> 32-bit binaries.
>
> Introduce a new sysctl 'fs.compat-binfmt-elf-enable' to allow the compat
> ELF loader to be disabled dynamically on devices where it is not required.
> On arm64, this is sufficient to prevent userspace from executing 32-bit
> code at all.
>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  fs/compat_binfmt_elf.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> I started off hacking this into the arch code, but then I realised it was
> just as easy doing it in the core for everybody to enjoy. Unfortunately,
> after talking to Peter, it sounds like it doesn't really help on x86
> where userspace can switch to 32-bit without involving the kernel at all.
>
> Thoughts?

I'm not sure I understand the logic behind the sysctl. Are you worried
about exposing attack surface on devices that don't support 32-bit
instructions at all but might be tricked into loading a 32-bit binary that
exploits a bug in the elf loader, or do you want to remove compat support
on some but not all devices running the same kernel?

In the first case, having the kernel make the decision based on CPU
feature flags would be easier. In the second case, I would expect this
to be a per-process setting similar to prctl, capability or seccomp.
This would make it possible to do it separately per container
and avoid ambiguity about what happens to already-running 32-bit
tasks.

      Arnd
Hi Arnd,

On Thu, Sep 16, 2021 at 04:46:15PM +0200, Arnd Bergmann wrote:
> On Thu, Sep 16, 2021 at 3:18 PM Will Deacon <will@kernel.org> wrote:
> >
> > Distributions such as Android which support a mixture of 32-bit (compat)
> > and 64-bit (native) tasks necessarily ship with the compat ELF loader
> > enabled in their kernels. However, as time goes by, an ever-increasing
> > proportion of userspace consists of native applications and in some cases
> > 32-bit capabilities are starting to be removed from the CPUs altogether.
> >
> > Inevitably, this means that the compat code becomes somewhat of a
> > maintenance burden, receiving less testing coverage and exposing an
> > additional kernel attack surface to userspace during the lengthy
> > transitional period where some shipping devices require support for
> > 32-bit binaries.
> >
> > Introduce a new sysctl 'fs.compat-binfmt-elf-enable' to allow the compat
> > ELF loader to be disabled dynamically on devices where it is not required.
> > On arm64, this is sufficient to prevent userspace from executing 32-bit
> > code at all.
> >
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  fs/compat_binfmt_elf.c | 24 +++++++++++++++++++++++-
> >  1 file changed, 23 insertions(+), 1 deletion(-)
> >
> > I started off hacking this into the arch code, but then I realised it was
> > just as easy doing it in the core for everybody to enjoy. Unfortunately,
> > after talking to Peter, it sounds like it doesn't really help on x86
> > where userspace can switch to 32-bit without involving the kernel at all.
> >
> > Thoughts?
>
> I'm not sure I understand the logic behind the sysctl. Are you worried
> about exposing attack surface on devices that don't support 32-bit
> instructions at all but might be tricked into loading a 32-bit binary that
> exploits a bug in the elf loader, or do you want to remove compat support
> on some but not all devices running the same kernel?

It's the latter case. With the GKI effort in Android, we want to run the
same kernel binary across multiple devices. However, for some devices
we may be able to determine that there is no need to support 32-bit
applications even though the hardware may support them, and we would
like to ensure that things like the compat syscall wrappers, compat vDSO,
signal handling etc are not accessible to applications.

> In the first case, having the kernel make the decision based on CPU
> feature flags would be easier. In the second case, I would expect this
> to be a per-process setting similar to prctl, capability or seccomp.
> This would make it possible to do it separately per container
> and avoid ambiguity about what happens to already-running 32-bit
> tasks.

I'm not sure I follow the per-process aspect of your suggestion -- we want
to prevent 32-bit tasks from existing at all. If it wasn't for GKI, we'd
just disable CONFIG_COMPAT altogether, but while there is a need for 32-bit
support on some devices then we're not able to do that.

Does that make more sense now?

Cheers,

Will
On Thu, Sep 16, 2021 at 04:13:37PM +0100, Will Deacon wrote:
> Hi Arnd,
>
> On Thu, Sep 16, 2021 at 04:46:15PM +0200, Arnd Bergmann wrote:
> > On Thu, Sep 16, 2021 at 3:18 PM Will Deacon <will@kernel.org> wrote:
> > >
> > > Distributions such as Android which support a mixture of 32-bit (compat)
> > > and 64-bit (native) tasks necessarily ship with the compat ELF loader
> > > enabled in their kernels. However, as time goes by, an ever-increasing
> > > proportion of userspace consists of native applications and in some cases
> > > 32-bit capabilities are starting to be removed from the CPUs altogether.
> > >
> > > Inevitably, this means that the compat code becomes somewhat of a
> > > maintenance burden, receiving less testing coverage and exposing an
> > > additional kernel attack surface to userspace during the lengthy
> > > transitional period where some shipping devices require support for
> > > 32-bit binaries.
> > >
> > > Introduce a new sysctl 'fs.compat-binfmt-elf-enable' to allow the compat
> > > ELF loader to be disabled dynamically on devices where it is not required.
> > > On arm64, this is sufficient to prevent userspace from executing 32-bit
> > > code at all.
> > >
> > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > Cc: Andy Lutomirski <luto@kernel.org>
> > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  fs/compat_binfmt_elf.c | 24 +++++++++++++++++++++++-
> > >  1 file changed, 23 insertions(+), 1 deletion(-)
> > >
> > > I started off hacking this into the arch code, but then I realised it was
> > > just as easy doing it in the core for everybody to enjoy. Unfortunately,
> > > after talking to Peter, it sounds like it doesn't really help on x86
> > > where userspace can switch to 32-bit without involving the kernel at all.
> > >
> > > Thoughts?
> >
> > I'm not sure I understand the logic behind the sysctl. Are you worried
> > about exposing attack surface on devices that don't support 32-bit
> > instructions at all but might be tricked into loading a 32-bit binary that
> > exploits a bug in the elf loader, or do you want to remove compat support
> > on some but not all devices running the same kernel?
>
> It's the latter case. With the GKI effort in Android, we want to run the
> same kernel binary across multiple devices. However, for some devices
> we may be able to determine that there is no need to support 32-bit
> applications even though the hardware may support them, and we would
> like to ensure that things like the compat syscall wrappers, compat vDSO,
> signal handling etc are not accessible to applications.

I like the idea! I wonder if the binfmts should have an "enabled" flag
instead? This would make it not compat_binfmt_elf-specific, and would
avoid a new "special" sysfs flag:

	static bool enabled = 1;
	module_param(enabled, bool, 0600);
	MODULE_PARM_DESC(enabled, "Whether this binfmt is available for loading");

Then:

	echo 0 > /sys/module/compat_binfmt_elf/parameters/enabled

>
> > In the first case, having the kernel make the decision based on CPU
> > feature flags would be easier. In the second case, I would expect this
> > to be a per-process setting similar to prctl, capability or seccomp.
> > This would make it possible to do it separately per container
> > and avoid ambiguity about what happens to already-running 32-bit
> > tasks.
>
> I'm not sure I follow the per-process aspect of your suggestion -- we want
> to prevent 32-bit tasks from existing at all. If it wasn't for GKI, we'd
> just disable CONFIG_COMPAT altogether, but while there is a need for 32-bit
> support on some devices then we're not able to do that.

It's possible to do process-hierarchy-controlled compat-restriction on
all architectures with a seccomp ARCH test. For example:

	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, arch_nr),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, AUDIT_ARCH_X86_64, 1, 0),
	BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL_PROCESS),
	BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)

This filter will have fixed tiny overhead because of the automatic
seccomp bitmaps. FWIW, systemd exposes this feature via
"SystemCallArchitectures=native".

This doesn't stop the loader attack surface, though, so I think
something to control that makes sense.
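As a rough illustration of the seccomp approach described above, the fragment can be expanded into a small userspace helper. This is only a sketch under stated assumptions: the function names `build_arch_filter` and `install_arch_filter` are hypothetical, `arch_nr` is assumed to mean the offset of the `arch` field in `struct seccomp_data`, and the caller passes whichever `AUDIT_ARCH_*` value is native for the platform (for example `AUDIT_ARCH_AARCH64` on arm64).

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/prctl.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Number of BPF instructions in the filter built below. */
#define ARCH_FILTER_LEN 4

/*
 * Fill `prog` with a classic-BPF seccomp filter that allows syscalls
 * made through `native_arch` (an AUDIT_ARCH_* value) and kills the
 * process for any other syscall architecture.
 */
void build_arch_filter(struct sock_filter prog[ARCH_FILTER_LEN],
		       uint32_t native_arch)
{
	const struct sock_filter insns[ARCH_FILTER_LEN] = {
		/* Load seccomp_data.arch into the accumulator. */
		BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
			 offsetof(struct seccomp_data, arch)),
		/* If it matches, skip one instruction (to ALLOW). */
		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, native_arch, 1, 0),
		BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_KILL_PROCESS),
		BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
	};
	size_t i;

	for (i = 0; i < ARCH_FILTER_LEN; i++)
		prog[i] = insns[i];
}

/*
 * Install the filter on the calling thread; it is inherited across
 * fork() and execve(). Returns 0 on success, -1 with errno set.
 */
int install_arch_filter(uint32_t native_arch)
{
	struct sock_filter insns[ARCH_FILTER_LEN];
	struct sock_fprog prog = {
		.len = ARCH_FILTER_LEN,
		.filter = insns,
	};

	build_arch_filter(insns, native_arch);

	/* Needed so an unprivileged task may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

An init-like process could call `install_arch_filter(AUDIT_ARCH_AARCH64)` before spawning children. As noted in the message above, this only restricts which syscall ABI may be used; the compat ELF loader itself still runs when a 32-bit binary is exec'd.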
From: Will Deacon
> Sent: 16 September 2021 16:14
...
> > I'm not sure I understand the logic behind the sysctl. Are you worried
> > about exposing attack surface on devices that don't support 32-bit
> > instructions at all but might be tricked into loading a 32-bit binary that
> > exploits a bug in the elf loader, or do you want to remove compat support
> > on some but not all devices running the same kernel?
>
> It's the latter case. With the GKI effort in Android, we want to run the
> same kernel binary across multiple devices. However, for some devices
> we may be able to determine that there is no need to support 32-bit
> applications even though the hardware may support them, and we would
> like to ensure that things like the compat syscall wrappers, compat vDSO,
> signal handling etc are not accessible to applications.

Interesting because there is the opposite requirement to run 32bit
user code under emulation on a 64bit only cpu.
This largely requires the kernel to contain the 32bit compatibility
code - even though it can't execute the instructions.

I suspect you could even embed the instruction emulator inside the
elf interpreter.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Thu, Sep 16, 2021 at 5:13 PM Will Deacon <will@kernel.org> wrote:
> On Thu, Sep 16, 2021 at 04:46:15PM +0200, Arnd Bergmann wrote:
> > On Thu, Sep 16, 2021 at 3:18 PM Will Deacon <will@kernel.org> wrote:

> > In the first case, having the kernel make the decision based on CPU
> > feature flags would be easier. In the second case, I would expect this
> > to be a per-process setting similar to prctl, capability or seccomp.
> > This would make it possible to do it separately per container
> > and avoid ambiguity about what happens to already-running 32-bit
> > tasks.
>
> I'm not sure I follow the per-process aspect of your suggestion -- we want
> to prevent 32-bit tasks from existing at all. If it wasn't for GKI, we'd
> just disable CONFIG_COMPAT altogether, but while there is a need for 32-bit
> support on some devices then we're not able to do that.
>
> Does that make more sense now?

That sounds rather specific to your use case, but others may have similar
requirements that are better served with a per-container or per-process flag.

If your init process sets the process specific flag to prevent compat mode
and non-root tasks are unable to set it back, the effect for you should be
the same, but others may also be able to use the feature.

Another option would be to make the binfmt helper a device specific module,
in that case you wouldn't need to use a runtime feature at all, you just
prevent the module from getting loaded. ;-)

On a somewhat related note, a topic that has come up in the past is to make
the syscall ABI user selectable across all architectures, and allow e.g. an
arm64 task to call normal syscalls using the arm32 compat calling conventions,
in order to simplify user space ISA emulation. This could even be done in a way
to allow using foreign architecture syscall semantics for things like fex that
emulates x86 on arm. If this gets added, having the conditional in the binfmt
loader is a bit pointless.

       Arnd
diff --git a/fs/compat_binfmt_elf.c b/fs/compat_binfmt_elf.c
index 95e72d271b95..e8ce6c8fff42 100644
--- a/fs/compat_binfmt_elf.c
+++ b/fs/compat_binfmt_elf.c
@@ -15,6 +15,8 @@
  */
 
 #include <linux/elfcore-compat.h>
+#include <linux/init.h>
+#include <linux/sysctl.h>
 #include <linux/time.h>
 
 #define ELF_COMPAT	1
@@ -63,7 +65,8 @@
  */
 
 #undef	elf_check_arch
-#define	elf_check_arch	compat_elf_check_arch
+#define elf_check_arch(ex)	\
+	(compat_binfmt_elf_enable && compat_elf_check_arch(ex))
 
 #ifdef	COMPAT_ELF_PLATFORM
 #undef	ELF_PLATFORM
@@ -136,6 +139,25 @@
 #define init_elf_binfmt		init_compat_elf_binfmt
 #define exit_elf_binfmt		exit_compat_elf_binfmt
 
+static int compat_binfmt_elf_enable = 1;
+
+static struct ctl_table compat_elf_sysctl_table[] = {
+	{
+		.procname	= "compat-binfmt-elf-enable",
+		.data		= &compat_binfmt_elf_enable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{ },
+};
+
+static int __init compat_elf_init(void)
+{
+	return register_sysctl("fs", compat_elf_sysctl_table) == NULL;
+}
+fs_initcall(compat_elf_init);
+
 /*
  * We share all the actual code with the native (64-bit) version.
  */
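To make the effect of the new `elf_check_arch(ex)` definition easier to reason about, here is a small userspace model of the gating logic. It is a simplified sketch, not the kernel code: the real `compat_elf_check_arch()` is architecture-specific and checks more than the ELF class (machine type, CPU support, and so on), and the helper names `elf_class_bits` and `compat_loader_would_accept` are hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Inspect the e_ident bytes at the start of an ELF image.
 * Returns 32 or 64 for a recognised ELF class, 0 otherwise.
 */
int elf_class_bits(const unsigned char *buf, size_t len)
{
	if (buf == NULL || len < EI_NIDENT)
		return 0;
	/* Every ELF file starts with the magic "\177ELF". */
	if (memcmp(buf, ELFMAG, SELFMAG) != 0)
		return 0;

	switch (buf[EI_CLASS]) {
	case ELFCLASS32:
		return 32;
	case ELFCLASS64:
		return 64;
	default:
		return 0;
	}
}

/*
 * Model of the patched check: the compat loader only claims a binary
 * when the enable flag is set AND the image looks like a 32-bit ELF.
 * `compat_enabled` stands in for the patch's compat_binfmt_elf_enable.
 */
int compat_loader_would_accept(const unsigned char *buf, size_t len,
			       int compat_enabled)
{
	return compat_enabled && elf_class_bits(buf, len) == 32;
}
```

With the flag cleared, the model rejects every image regardless of class, which mirrors how the short-circuit `&&` in the patch prevents `compat_elf_check_arch()` from being consulted at all.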
Distributions such as Android which support a mixture of 32-bit (compat)
and 64-bit (native) tasks necessarily ship with the compat ELF loader
enabled in their kernels. However, as time goes by, an ever-increasing
proportion of userspace consists of native applications and in some cases
32-bit capabilities are starting to be removed from the CPUs altogether.

Inevitably, this means that the compat code becomes somewhat of a
maintenance burden, receiving less testing coverage and exposing an
additional kernel attack surface to userspace during the lengthy
transitional period where some shipping devices require support for
32-bit binaries.

Introduce a new sysctl 'fs.compat-binfmt-elf-enable' to allow the compat
ELF loader to be disabled dynamically on devices where it is not required.
On arm64, this is sufficient to prevent userspace from executing 32-bit
code at all.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 fs/compat_binfmt_elf.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

I started off hacking this into the arch code, but then I realised it was
just as easy doing it in the core for everybody to enjoy. Unfortunately,
after talking to Peter, it sounds like it doesn't really help on x86
where userspace can switch to 32-bit without involving the kernel at all.

Thoughts?
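For illustration, a privileged early-boot process could flip the proposed sysctl roughly as follows. This is a sketch that assumes a kernel with this RFC applied, in which case a table registered under "fs" with procname "compat-binfmt-elf-enable" would appear at `/proc/sys/fs/compat-binfmt-elf-enable`. The helper name `write_bool_knob` and the `COMPAT_ELF_SYSCTL_PATH` macro are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumed procfs location of the sysctl proposed in this RFC. */
#define COMPAT_ELF_SYSCTL_PATH "/proc/sys/fs/compat-binfmt-elf-enable"

/*
 * Write "1\n" (enable != 0) or "0\n" (enable == 0) to the file at
 * `path`. Returns 0 on success, -1 with errno set on failure.
 */
int write_bool_knob(const char *path, int enable)
{
	const char *val = enable ? "1\n" : "0\n";
	ssize_t n;
	int saved;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, val, 2);
	saved = errno;		/* close() may clobber errno */
	close(fd);

	if (n != 2) {
		/* Report a short write as an I/O error. */
		errno = (n < 0) ? saved : EIO;
		return -1;
	}
	return 0;
}
```

Calling `write_bool_knob(COMPAT_ELF_SYSCTL_PATH, 0)` as root would then disable the compat loader on such a kernel. Because the patch registers the entry with mode 0644, unprivileged tasks can read the value but not change it.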