[RFC] fs/compat_binfmt_elf: Introduce sysctl to disable compat ELF loader

Message ID 20210916131816.8841-1-will@kernel.org (mailing list archive)
State New, archived

Commit Message

Will Deacon Sept. 16, 2021, 1:18 p.m. UTC
Distributions such as Android which support a mixture of 32-bit (compat)
and 64-bit (native) tasks necessarily ship with the compat ELF loader
enabled in their kernels. However, as time goes by, an ever-increasing
proportion of userspace consists of native applications and in some cases
32-bit capabilities are starting to be removed from the CPUs altogether.

Inevitably, this means that the compat code becomes somewhat of a
maintenance burden, receiving less testing coverage and exposing an
additional kernel attack surface to userspace during the lengthy
transitional period where some shipping devices require support for
32-bit binaries.

Introduce a new sysctl 'fs.compat-binfmt-elf-enable' to allow the compat
ELF loader to be disabled dynamically on devices where it is not required.
On arm64, this is sufficient to prevent userspace from executing 32-bit
code at all.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 fs/compat_binfmt_elf.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

I started off hacking this into the arch code, but then I realised it was
just as easy doing it in the core for everybody to enjoy. Unfortunately,
after talking to Peter, it sounds like it doesn't really help on x86
where userspace can switch to 32-bit without involving the kernel at all.

Thoughts?
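
For illustration, a minimal userspace sketch of how a device's init code might flip the proposed knob. It assumes the RFC is applied as posted, so that the sysctl appears at /proc/sys/fs/compat-binfmt-elf-enable; the helper names are hypothetical and not part of the patch.

```c
#include <errno.h>
#include <stdio.h>

/* Assumed location of the sysctl if the RFC were applied as posted. */
#define COMPAT_ELF_SYSCTL "/proc/sys/fs/compat-binfmt-elf-enable"

/*
 * Hypothetical helper: write a decimal integer to a sysctl-style file.
 * Returns 0 on success or a negative errno value on failure.
 */
int write_sysctl_int(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return -errno;
	if (fprintf(f, "%d\n", value) < 0)
		err = -EIO;
	if (fclose(f) != 0 && !err)
		err = -errno;
	return err;
}

/*
 * Hypothetical helper: read the current integer value back.
 * Returns 0 on success or a negative errno value on failure.
 */
int read_sysctl_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int err = 0;

	if (!f)
		return -errno;
	if (fscanf(f, "%d", value) != 1)
		err = -EINVAL;
	fclose(f);
	return err;
}
```

A privileged caller would use write_sysctl_int(COMPAT_ELF_SYSCTL, 0) to disable the compat loader. With the 0644 mode in the patch, unprivileged tasks could read the value but not change it.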

Comments

Arnd Bergmann Sept. 16, 2021, 2:46 p.m. UTC | #1
On Thu, Sep 16, 2021 at 3:18 PM Will Deacon <will@kernel.org> wrote:
> Thoughts?

I'm not sure I understand the logic behind the sysctl. Are you worried
about exposing attack surface on devices that don't support 32-bit
instructions at all but might be tricked into loading a 32-bit binary that
exploits a bug in the elf loader, or do you want to remove compat support
on some but not all devices running the same kernel?

In the first case, having the kernel make the decision based on CPU
feature flags would be easier. In the second case, I would expect this
to be a per-process setting similar to prctl, capability or seccomp.
This would make it possible to do it separately per container
and avoid ambiguity about what happens to already-running 32-bit
tasks.

        Arnd

Will Deacon Sept. 16, 2021, 3:13 p.m. UTC | #2
Hi Arnd,

On Thu, Sep 16, 2021 at 04:46:15PM +0200, Arnd Bergmann wrote:
> On Thu, Sep 16, 2021 at 3:18 PM Will Deacon <will@kernel.org> wrote:
> > Thoughts?
> 
> I'm not sure I understand the logic behind the sysctl. Are you worried
> about exposing attack surface on devices that don't support 32-bit
> instructions at all but might be tricked into loading a 32-bit binary that
> exploits a bug in the elf loader, or do you want to remove compat support
> on some but not all devices running the same kernel?

It's the latter case. With the GKI effort in Android, we want to run the
same kernel binary across multiple devices. However, for some devices
we may be able to determine that there is no need to support 32-bit
applications even though the hardware may support them, and we would
like to ensure that things like the compat syscall wrappers, compat vDSO,
signal handling etc are not accessible to applications.

> In the first case, having the kernel make the decision based on CPU
> feature flags would be easier. In the second case, I would expect this
> to be a per-process setting similar to prctl, capability or seccomp.
> This would make it possible to do it separately per container
> and avoid ambiguity about what happens to already-running 32-bit
> tasks.

I'm not sure I follow the per-process aspect of your suggestion -- we want
to prevent 32-bit tasks from existing at all. If it wasn't for GKI, we'd
just disable CONFIG_COMPAT altogether, but while there is a need for 32-bit
support on some devices then we're not able to do that.

Does that make more sense now?

Cheers,

Will

Kees Cook Sept. 16, 2021, 3:56 p.m. UTC | #3
On Thu, Sep 16, 2021 at 04:13:37PM +0100, Will Deacon wrote:
> Hi Arnd,
> 
> On Thu, Sep 16, 2021 at 04:46:15PM +0200, Arnd Bergmann wrote:
> > On Thu, Sep 16, 2021 at 3:18 PM Will Deacon <will@kernel.org> wrote:
> > > Thoughts?
> > 
> > I'm not sure I understand the logic behind the sysctl. Are you worried
> > about exposing attack surface on devices that don't support 32-bit
> > instructions at all but might be tricked into loading a 32-bit binary that
> > exploits a bug in the elf loader, or do you want to remove compat support
> > on some but not all devices running the same kernel?
> 
> It's the latter case. With the GKI effort in Android, we want to run the
> same kernel binary across multiple devices. However, for some devices
> we may be able to determine that there is no need to support 32-bit
> applications even though the hardware may support them, and we would
> like to ensure that things like the compat syscall wrappers, compat vDSO,
> signal handling etc are not accessible to applications.

I like the idea! I wonder if the binfmts should have an "enabled" flag
instead? This would make it not compat_binfmt_elf-specific, and would
avoid a new "special" sysctl:

static bool enabled = true;
module_param(enabled, bool, 0600);
MODULE_PARM_DESC(enabled, "Whether this binfmt is available for loading");

Then:
echo 0 > /sys/module/compat_binfmt_elf/parameters/enabled

> 
> > In the first case, having the kernel make the decision based on CPU
> > feature flags would be easier. In the second case, I would expect this
> > to be a per-process setting similar to prctl, capability or seccomp.
> > This would make it possible to do it separately per container
> > and avoid ambiguity about what happens to already-running 32-bit
> > tasks.
> 
> I'm not sure I follow the per-process aspect of your suggestion -- we want
> to prevent 32-bit tasks from existing at all. If it wasn't for GKI, we'd
> just disable CONFIG_COMPAT altogether, but while there is a need for 32-bit
> support on some devices then we're not able to do that.

It's possible to do process-hierarchy-controlled compat-restriction on
all architectures with a seccomp arch test. For example:

	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, arch_nr),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, AUDIT_ARCH_X86_64, 1, 0),
	BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL_PROCESS),
	BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)

This filter will have fixed tiny overhead because of the automatic
seccomp bitmaps.
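
As a rough sketch of what installing such a filter could look like from userspace, the fragment above can be expanded as follows. The names native_only_filter, NATIVE_AUDIT_ARCH and install_native_only_filter() are made up for this example, and arch_nr is assumed to mean the offset of the arch field in struct seccomp_data. Only x86-64 and arm64 are handled here.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Illustrative macro: pick the audit arch value of the native ABI. */
#if defined(__x86_64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

static struct sock_filter native_only_filter[] = {
	/* Load seccomp_data.arch into the accumulator. */
	BPF_STMT(BPF_LD + BPF_W + BPF_ABS, offsetof(struct seccomp_data, arch)),
	/* Native arch: skip the next instruction. Otherwise fall through. */
	BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, NATIVE_AUDIT_ARCH, 1, 0),
	/* Any other syscall ABI: kill the whole process. */
	BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_KILL_PROCESS),
	/* Native syscalls are allowed unchanged. */
	BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
};

/*
 * Install the filter on the calling task; children inherit it.
 * Returns 0 on success or a negative errno value on failure.
 */
int install_native_only_filter(void)
{
	struct sock_fprog prog = {
		.len = sizeof(native_only_filter) / sizeof(native_only_filter[0]),
		.filter = native_only_filter,
	};

	/* Needed so that an unprivileged task may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -errno;
	if (syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog))
		return -errno;
	return 0;
}
```

Calling install_native_only_filter() early in a process-tree root, before it forks or execs anything else, would apply the restriction to all descendants.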

FWIW, systemd exposes this feature via "SystemCallArchitectures=native".

This doesn't stop the loader attack surface, though, so I think
something to control that makes sense.

David Laight Sept. 16, 2021, 4:07 p.m. UTC | #4
From: Will Deacon
> Sent: 16 September 2021 16:14
...
> > I'm not sure I understand the logic behind the sysctl. Are you worried
> > about exposing attack surface on devices that don't support 32-bit
> > instructions at all but might be tricked into loading a 32-bit binary that
> > exploits a bug in the elf loader, or do you want to remove compat support
> > on some but not all devices running the same kernel?
> 
> It's the latter case. With the GKI effort in Android, we want to run the
> same kernel binary across multiple devices. However, for some devices
> we may be able to determine that there is no need to support 32-bit
> applications even though the hardware may support them, and we would
> like to ensure that things like the compat syscall wrappers, compat vDSO,
> signal handling etc are not accessible to applications.

Interesting because there is the opposite requirement to run
32bit user code under emulation on a 64bit only cpu.
This largely requires the kernel to contain the 32bit
compatibility code - even though it can't execute the instructions.

I suspect you could even embed the instruction emulator inside the
elf interpreter.

	David

Arnd Bergmann Sept. 16, 2021, 4:17 p.m. UTC | #5
On Thu, Sep 16, 2021 at 5:13 PM Will Deacon <will@kernel.org> wrote:
> On Thu, Sep 16, 2021 at 04:46:15PM +0200, Arnd Bergmann wrote:
> > On Thu, Sep 16, 2021 at 3:18 PM Will Deacon <will@kernel.org> wrote:
> > In the first case, having the kernel make the decision based on CPU
> > feature flags would be easier. In the second case, I would expect this
> > to be a per-process setting similar to prctl, capability or seccomp.
> > This would make it possible to do it separately per container
> > and avoid ambiguity about what happens to already-running 32-bit
> > tasks.
>
> I'm not sure I follow the per-process aspect of your suggestion -- we want
> to prevent 32-bit tasks from existing at all. If it wasn't for GKI, we'd
> just disable CONFIG_COMPAT altogether, but while there is a need for 32-bit
> support on some devices then we're not able to do that.
>
> Does that make more sense now?

That sounds rather specific to your use case, but others may have similar
requirements that are better served with a per-container or per-process
flag. If your init process sets the process specific flag to prevent compat
mode and non-root tasks are unable to set it back, the effect for you
should be the same, but others may also be able to use the feature.

Another option would be to make the binfmt helper a device specific
module, in that case you wouldn't need to use a runtime feature at all,
you just prevent the module from getting loaded. ;-)

On a somewhat related note, a topic that has come up in the past
is to make the syscall ABI user selectable across all architectures, and
allow e.g. an arm64 task to call normal syscalls using the arm32
compat calling conventions, in order to simplify user space ISA emulation.
This could even be done in a way to allow using foreign architecture
syscall semantics for things like fex that emulates x86 on arm.
If this gets added, having the conditional in the binfmt loader is
a bit pointless.

      Arnd

Patch

diff --git a/fs/compat_binfmt_elf.c b/fs/compat_binfmt_elf.c
index 95e72d271b95..e8ce6c8fff42 100644
--- a/fs/compat_binfmt_elf.c
+++ b/fs/compat_binfmt_elf.c
@@ -15,6 +15,8 @@ 
  */
 
 #include <linux/elfcore-compat.h>
+#include <linux/init.h>
+#include <linux/sysctl.h>
 #include <linux/time.h>
 
 #define ELF_COMPAT	1
@@ -63,7 +65,8 @@ 
  */
 
 #undef	elf_check_arch
-#define	elf_check_arch	compat_elf_check_arch
+#define	elf_check_arch(ex)	\
+	(compat_binfmt_elf_enable && compat_elf_check_arch(ex))
 
 #ifdef	COMPAT_ELF_PLATFORM
 #undef	ELF_PLATFORM
@@ -136,6 +139,25 @@ 
 #define init_elf_binfmt		init_compat_elf_binfmt
 #define exit_elf_binfmt		exit_compat_elf_binfmt
 
+static int compat_binfmt_elf_enable = 1;
+
+static struct ctl_table compat_elf_sysctl_table[] = {
+	{
+		.procname	= "compat-binfmt-elf-enable",
+		.data		= &compat_binfmt_elf_enable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{ },
+};
+
+static int __init compat_elf_init(void)
+{
+	return register_sysctl("fs", compat_elf_sysctl_table) ? 0 : -ENOMEM;
+}
+fs_initcall(compat_elf_init);
+
 /*
  * We share all the actual code with the native (64-bit) version.
  */