[RFC,v19,1/5] exec: Add a new AT_CHECK flag to execveat(2)

Message ID	20240704190137.696169-2-mic@digikod.net (mailing list archive)
State	New
Headers	show Received: from smtp-42ab.mail.infomaniak.ch (smtp-42ab.mail.infomaniak.ch [84.16.66.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A6BB213D608; Thu, 4 Jul 2024 19:02:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=84.16.66.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720119739; cv=none; b=Yle/pfJV+2JqAwIpTxMRbH/w1cbrg8E0fHKnPrFpbFSmoOx4FQg3CKleopzwN/CmgKURA/mAvIgoVOhZuOJAnsAqIwYsnfXqo98fm4FpBotJcXvnRokZdF7UZYS6HbIZH+DHL9KHwCM9imyrEGvS5SWqSA6pzdoWMBzjPgzwMsA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720119739; c=relaxed/simple; bh=QkSeR/S7n24t9rwdBxmCu1WbP7aeAIrwOrKa2VVZbjA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=NoxNucjD78ljwxXLfIxVcXSoFFPQVOqClycV+XMpJ3zpw5ke6Mw/dG6EvXN4kQMZIxwtpXWj8xdHvYTOa3Ho32N0REmUJw+D8ZGVXMMifoOzzphBh+CUpM+/OePhfsOp/p04mkxFem1w2qDhOar+Z1SRX/V17z8emeK5kXHWNeo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net; spf=pass smtp.mailfrom=digikod.net; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b=RoJpMmrw; arc=none smtp.client-ip=84.16.66.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=digikod.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="RoJpMmrw" Received: from smtp-3-0001.mail.infomaniak.ch (smtp-3-0001.mail.infomaniak.ch [10.4.36.108]) by smtp-4-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4WFQxW5nlpzC3q; Thu, 4 Jul 2024 21:02:07 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=digikod.net; s=20191114; t=1720119727; bh=Y9d96OsDMPb7+yCSisfZWoz+qtMBDpJ4FX9JzX/byVk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=RoJpMmrwajgMIZzj+FDE8d2fMbX6V4zlKWa7FHgUGq82BsioXPo8PzYQfoHIFVUkz TV8A2QFNZM2+q+h0r43ok920fl0wvuPue9FRnVPait38jkHnUbIUXrLRWLEaLimXlR 1rX+c3eFRYDpywr2XT9ozXTEknnIa3ZHKGwnQiAQ= Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4WFQxR1bl1z3JG; Thu, 4 Jul 2024 21:02:03 +0200 (CEST) From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= <mic@digikod.net> To: Al Viro <viro@zeniv.linux.org.uk>, Christian Brauner <brauner@kernel.org>, Kees Cook <keescook@chromium.org>, Linus Torvalds <torvalds@linux-foundation.org>, Paul Moore <paul@paul-moore.com>, Theodore Ts'o <tytso@mit.edu> Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= <mic@digikod.net>, Alejandro Colomar <alx.manpages@gmail.com>, Aleksa Sarai <cyphar@cyphar.com>, Andrew Morton <akpm@linux-foundation.org>, Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>, Casey Schaufler <casey@schaufler-ca.com>, Christian Heimes <christian@python.org>, Dmitry Vyukov <dvyukov@google.com>, Eric Biggers <ebiggers@kernel.org>, Eric Chiang <ericchiang@google.com>, Fan Wu <wufan@linux.microsoft.com>, Florian Weimer <fweimer@redhat.com>, Geert Uytterhoeven <geert@linux-m68k.org>, James Morris <jamorris@linux.microsoft.com>, Jan Kara <jack@suse.cz>, Jann Horn <jannh@google.com>, Jeff Xu <jeffxu@google.com>, Jonathan Corbet <corbet@lwn.net>, Jordan R Abrahams <ajordanr@google.com>, Lakshmi Ramasubramanian <nramas@linux.microsoft.com>, Luca Boccassi <bluca@debian.org>, Luis Chamberlain <mcgrof@kernel.org>, "Madhavan T . Venkataraman" <madvenka@linux.microsoft.com>, Matt Bobrowski <mattbobrowski@google.com>, Matthew Garrett <mjg59@srcf.ucam.org>, Matthew Wilcox <willy@infradead.org>, Miklos Szeredi <mszeredi@redhat.com>, Mimi Zohar <zohar@linux.ibm.com>, Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>, Scott Shell <scottsh@microsoft.com>, Shuah Khan <shuah@kernel.org>, Stephen Rothwell <sfr@canb.auug.org.au>, Steve Dower <steve.dower@python.org>, Steve Grubb <sgrubb@redhat.com>, Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>, Vincent Strubel <vincent.strubel@ssi.gouv.fr>, Xiaoming Ni <nixiaoming@huawei.com>, Yin Fengwei <fengwei.yin@intel.com>, kernel-hardening@lists.openwall.com, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-integrity@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org Subject: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Date: Thu, 4 Jul 2024 21:01:33 +0200 Message-ID: <20240704190137.696169-2-mic@digikod.net> In-Reply-To: <20240704190137.696169-1-mic@digikod.net> References: <20240704190137.696169-1-mic@digikod.net> Precedence: bulk X-Mailing-List: linux-security-module@vger.kernel.org List-Id: <linux-security-module.vger.kernel.org> List-Subscribe: <mailto:linux-security-module+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-security-module+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Infomaniak-Routing: alpha
Series	Script execution control (was O_MAYEXEC) \| expand [RFC,v19,0/5] Script execution control (was O_MAYEXEC) [RFC,v19,1/5] exec: Add a new AT_CHECK flag to execveat(2) [RFC,v19,2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits [RFC,v19,3/5] selftests/exec: Add tests for AT_CHECK and related securebits [RFC,v19,4/5] selftests/landlock: Add tests for execveat + AT_CHECK [RFC,v19,5/5] samples/should-exec: Add set-should-exec

Mickaël Salaün July 4, 2024, 7:01 p.m. UTC

Add a new AT_CHECK flag to execveat(2) to check if a file would be
allowed for execution.  The main use case is for script interpreters and
dynamic linkers to check execution permission according to the kernel's
security policy. Another use case is to add context to access logs e.g.,
which script (instead of interpreter) accessed a file.  As any
executable code, scripts could also use this check [1].

This is different than faccessat(2) which only checks file access
rights, but not the full context e.g. mount point's noexec, stack limit,
and all potential LSM extra checks (e.g. argv, envp, credentials).
Since the use of AT_CHECK follows the exact kernel semantic as for a
real execution, user space gets the same error codes.

With the information that a script interpreter is about to interpret a
script, an LSM security policy can adjust caller's access rights or log
execution request as for native script execution (e.g. role transition).
This is possible thanks to the call to security_bprm_creds_for_exec().

Because LSMs may only change bprm's credentials, use of AT_CHECK with
current kernel code should not be a security issue (e.g. unexpected role
transition).  LSMs willing to update the caller's credential could now
do so when bprm->is_check is set.  Of course, such policy change should
be in line with the new user space code.

Because AT_CHECK is dedicated to user space interpreters, it doesn't
make sense for the kernel to parse the checked files, look for
interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
if the format is unknown.  Because of that, security_bprm_check() is
never called when AT_CHECK is used.

It should be noted that script interpreters cannot directly use
execveat(2) (without this new AT_CHECK flag) because this could lead to
unexpected behaviors e.g., `python script.sh` could lead to Bash being
executed to interpret the script.  Unlike the kernel, script
interpreters may just interpret the shebang as a simple comment, which
should not change for backward compatibility reasons.

Because scripts or libraries files might not currently have the
executable permission set, or because we might want specific users to be
allowed to run arbitrary scripts, the following patch provides a dynamic
configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
SECBIT_SHOULD_EXEC_RESTRICT securebits.

This is a redesign of the CLIP OS 4's O_MAYEXEC:
https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
This patch has been used for more than a decade with customized script
interpreters.  Some examples can be found here:
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://docs.python.org/3/library/io.html#io.open_code [1]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
---

New design since v18:
https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
---
 fs/exec.c                  |  5 +++--
 include/linux/binfmts.h    |  7 ++++++-
 include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
 kernel/audit.h             |  1 +
 kernel/auditsc.c           |  1 +
 5 files changed, 41 insertions(+), 3 deletions(-)

Kees Cook July 5, 2024, 12:04 a.m. UTC | #1

On Thu, Jul 04, 2024 at 09:01:33PM +0200, Mickaël Salaün wrote:
> Add a new AT_CHECK flag to execveat(2) to check if a file would be
> allowed for execution.  The main use case is for script interpreters and
> dynamic linkers to check execution permission according to the kernel's
> security policy. Another use case is to add context to access logs e.g.,
> which script (instead of interpreter) accessed a file.  As any
> executable code, scripts could also use this check [1].
> 
> This is different than faccessat(2) which only checks file access
> rights, but not the full context e.g. mount point's noexec, stack limit,
> and all potential LSM extra checks (e.g. argv, envp, credentials).
> Since the use of AT_CHECK follows the exact kernel semantic as for a
> real execution, user space gets the same error codes.

Nice! I much prefer this method of going through the exec machinery so
we always have a single code path for these kinds of checks.

> Because AT_CHECK is dedicated to user space interpreters, it doesn't
> make sense for the kernel to parse the checked files, look for
> interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> if the format is unknown.  Because of that, security_bprm_check() is
> never called when AT_CHECK is used.

I'd like some additional comments in the code that reminds us that
access control checks have finished past a certain point.

[...]
> diff --git a/fs/exec.c b/fs/exec.c
> index 40073142288f..ea2a1867afdc 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
>  		.lookup_flags = LOOKUP_FOLLOW,
>  	};
>  
> -	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> +	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
>  		return ERR_PTR(-EINVAL);
>  	if (flags & AT_SYMLINK_NOFOLLOW)
>  		open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
[...]
> + * To avoid race conditions leading to time-of-check to time-of-use issues,
> + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> + * descriptor instead of a path.

I want this enforced by the kernel. Let's not leave trivial ToCToU
foot-guns around. i.e.:

	if ((flags & AT_CHECK) == AT_CHECK && (flags & AT_EMPTY_PATH) == 0)
  		return ERR_PTR(-EBADF);

Mickaël Salaün July 5, 2024, 5:53 p.m. UTC | #2

On Thu, Jul 04, 2024 at 05:04:03PM -0700, Kees Cook wrote:
> On Thu, Jul 04, 2024 at 09:01:33PM +0200, Mickaël Salaün wrote:
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
> > 
> > This is different than faccessat(2) which only checks file access
> > rights, but not the full context e.g. mount point's noexec, stack limit,
> > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > real execution, user space gets the same error codes.
> 
> Nice! I much prefer this method of going through the exec machinery so
> we always have a single code path for these kinds of checks.
> 
> > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > make sense for the kernel to parse the checked files, look for
> > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > if the format is unknown.  Because of that, security_bprm_check() is
> > never called when AT_CHECK is used.
> 
> I'd like some additional comments in the code that reminds us that
> access control checks have finished past a certain point.

Where in the code? Just before the bprm->is_check assignment?

> 
> [...]
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 40073142288f..ea2a1867afdc 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> >  		.lookup_flags = LOOKUP_FOLLOW,
> >  	};
> >  
> > -	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > +	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> >  		return ERR_PTR(-EINVAL);
> >  	if (flags & AT_SYMLINK_NOFOLLOW)
> >  		open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> [...]
> > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > + * descriptor instead of a path.
> 
> I want this enforced by the kernel. Let's not leave trivial ToCToU
> foot-guns around. i.e.:
> 
> 	if ((flags & AT_CHECK) == AT_CHECK && (flags & AT_EMPTY_PATH) == 0)
>   		return ERR_PTR(-EBADF);

There are valid use cases relying on pathnames. See Linus's comment:
https://lore.kernel.org/r/CAHk-=whb=XuU=LGKnJWaa7LOYQz9VwHs8SLfgLbT5sf2VAbX1A@mail.gmail.com

Florian Weimer July 5, 2024, 6:03 p.m. UTC | #3

* Mickaël Salaün:

> Add a new AT_CHECK flag to execveat(2) to check if a file would be
> allowed for execution.  The main use case is for script interpreters and
> dynamic linkers to check execution permission according to the kernel's
> security policy. Another use case is to add context to access logs e.g.,
> which script (instead of interpreter) accessed a file.  As any
> executable code, scripts could also use this check [1].

Some distributions no longer set executable bits on most shared objects,
which I assume would interfere with AT_CHECK probing for shared objects.
Removing the executable bit is attractive because of a combination of
two bugs: a binutils wart which until recently always set the entry
point address in the ELF header to zero, and the kernel not checking for
a zero entry point (maybe in combination with an absent program
interpreter) and failing the execve with ELIBEXEC, instead of doing the
execve and then faulting at virtual address zero.  Removing the
executable bit is currently the only way to avoid these confusing
crashes, so I understand the temptation.

Thanks,
Florian

Andy Lutomirski July 6, 2024, 8:52 a.m. UTC | #4

On Fri, Jul 5, 2024 at 3:03 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> Add a new AT_CHECK flag to execveat(2) to check if a file would be
> allowed for execution.  The main use case is for script interpreters and
> dynamic linkers to check execution permission according to the kernel's
> security policy. Another use case is to add context to access logs e.g.,
> which script (instead of interpreter) accessed a file.  As any
> executable code, scripts could also use this check [1].
>

Can you give a worked-out example of how this is useful?

I assume the idea is that a program could open a file, then pass the
fd to execveat() to get the kernel's idea of whether it's permissible
to execute it.  And then the program would interpret the file, which
is morally like executing it.  And there would be a big warning in the
manpage that passing a *path* is subject to a TOCTOU race.

This type of usage will do the wrong thing if LSM policy intends to
lock down the task if the task were to actually exec the file.  I
personally think this is a mis-design (let the program doing the
exec-ing lock itself down, possibly by querying a policy, but having
magic happen on exec seems likely to do the wrong thing more often
that it does the wright thing), but that ship sailed a long time ago.

So maybe what's actually needed is a rather different API: a way to
check *and perform* the security transition for an exec without
actually execing.  This would need to be done NO_NEW_PRIVS style for
reasons that are hopefully obvious, but it would permit:

fd = open(some script);
if (do_exec_transition_without_exec(fd) != 0)
  return;  // don't actually do it

// OK, we may have just lost privileges.  But that's okay, because we
meant to do that.
// Make sure we've munmapped anything sensitive and erased any secrets
from memory,
// and then interpret the script!

I think this would actually be straightforward to implement in the
kernel -- one would need to make sure that all the relevant
no_new_privs checks are looking in the right place (as the task might
not actually have no_new_privs set, but LSM_UNSAFE_NO_NEW_PRIVS would
still be set), but I don't see any reason this would be
insurmountable, nor do I expect there would be any fundamental
problems.

> This is different than faccessat(2) which only checks file access
> rights, but not the full context e.g. mount point's noexec, stack limit,
> and all potential LSM extra checks (e.g. argv, envp, credentials).
> Since the use of AT_CHECK follows the exact kernel semantic as for a
> real execution, user space gets the same error codes.
>
> With the information that a script interpreter is about to interpret a
> script, an LSM security policy can adjust caller's access rights or log
> execution request as for native script execution (e.g. role transition).
> This is possible thanks to the call to security_bprm_creds_for_exec().
>
> Because LSMs may only change bprm's credentials, use of AT_CHECK with
> current kernel code should not be a security issue (e.g. unexpected role
> transition).  LSMs willing to update the caller's credential could now
> do so when bprm->is_check is set.  Of course, such policy change should
> be in line with the new user space code.
>
> Because AT_CHECK is dedicated to user space interpreters, it doesn't
> make sense for the kernel to parse the checked files, look for
> interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> if the format is unknown.  Because of that, security_bprm_check() is
> never called when AT_CHECK is used.
>
> It should be noted that script interpreters cannot directly use
> execveat(2) (without this new AT_CHECK flag) because this could lead to
> unexpected behaviors e.g., `python script.sh` could lead to Bash being
> executed to interpret the script.  Unlike the kernel, script
> interpreters may just interpret the shebang as a simple comment, which
> should not change for backward compatibility reasons.
>
> Because scripts or libraries files might not currently have the
> executable permission set, or because we might want specific users to be
> allowed to run arbitrary scripts, the following patch provides a dynamic
> configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> SECBIT_SHOULD_EXEC_RESTRICT securebits.

Can you explain what those bits do?  And why they're useful?

>
> This is a redesign of the CLIP OS 4's O_MAYEXEC:
> https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> This patch has been used for more than a decade with customized script
> interpreters.  Some examples can be found here:
> https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC

This one at least returns an fd, so it looks less likely to get
misused in a way that adds a TOCTOU race.

Mickaël Salaün July 6, 2024, 2:55 p.m. UTC | #5

On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote:
> * Mickaël Salaün:
> 
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
> 
> Some distributions no longer set executable bits on most shared objects,
> which I assume would interfere with AT_CHECK probing for shared objects.

A file without the execute permission is not considered as executable by
the kernel.  The AT_CHECK flag doesn't change this semantic.  Please
note that this is just a check, not a restriction.  See the next patch
for the optional policy enforcement.

Anyway, we need to define the policy, and for Linux this is done with
the file permission bits.  So for systems willing to have a consistent
execution policy, we need to rely on the same bits.

> Removing the executable bit is attractive because of a combination of
> two bugs: a binutils wart which until recently always set the entry
> point address in the ELF header to zero, and the kernel not checking for
> a zero entry point (maybe in combination with an absent program
> interpreter) and failing the execve with ELIBEXEC, instead of doing the
> execve and then faulting at virtual address zero.  Removing the
> executable bit is currently the only way to avoid these confusing
> crashes, so I understand the temptation.

Interesting.  Can you please point to the bug report and the fix?  I
don't see any ELIBEXEC in the kernel.

FYI, AT_CHECK doesn't check the content of the file (unlike a full
execve call).

Anyway, I think we should not design a new kernel interface to work
around a current user space bug.

Florian Weimer July 6, 2024, 3:32 p.m. UTC | #6

* Mickaël Salaün:

> On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote:
>> * Mickaël Salaün:
>> 
>> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
>> > allowed for execution.  The main use case is for script interpreters and
>> > dynamic linkers to check execution permission according to the kernel's
>> > security policy. Another use case is to add context to access logs e.g.,
>> > which script (instead of interpreter) accessed a file.  As any
>> > executable code, scripts could also use this check [1].
>> 
>> Some distributions no longer set executable bits on most shared objects,
>> which I assume would interfere with AT_CHECK probing for shared objects.
>
> A file without the execute permission is not considered as executable by
> the kernel.  The AT_CHECK flag doesn't change this semantic.  Please
> note that this is just a check, not a restriction.  See the next patch
> for the optional policy enforcement.
>
> Anyway, we need to define the policy, and for Linux this is done with
> the file permission bits.  So for systems willing to have a consistent
> execution policy, we need to rely on the same bits.

Yes, that makes complete sense.  I just wanted to point out the odd
interaction with the old binutils bug and the (sadly still current)
kernel bug.

>> Removing the executable bit is attractive because of a combination of
>> two bugs: a binutils wart which until recently always set the entry
>> point address in the ELF header to zero, and the kernel not checking for
>> a zero entry point (maybe in combination with an absent program
>> interpreter) and failing the execve with ELIBEXEC, instead of doing the
>> execve and then faulting at virtual address zero.  Removing the
>> executable bit is currently the only way to avoid these confusing
>> crashes, so I understand the temptation.
>
> Interesting.  Can you please point to the bug report and the fix?  I
> don't see any ELIBEXEC in the kernel.

The kernel hasn't been fixed yet.  I do think this should be fixed, so
that distributions can bring back the executable bit.

Thanks,
Florian

Mickaël Salaün July 7, 2024, 9:01 a.m. UTC | #7

On Sat, Jul 06, 2024 at 04:52:42PM +0800, Andy Lutomirski wrote:
> On Fri, Jul 5, 2024 at 3:03 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
> >
> 
> Can you give a worked-out example of how this is useful?

Which part?  Please take a look at CLIP OS, chromeOS, and PEP 578 use
cases and related code (see cover letter).

> 
> I assume the idea is that a program could open a file, then pass the
> fd to execveat() to get the kernel's idea of whether it's permissible
> to execute it.  And then the program would interpret the file, which
> is morally like executing it.  And there would be a big warning in the
> manpage that passing a *path* is subject to a TOCTOU race.

yes

> 
> This type of usage will do the wrong thing if LSM policy intends to
> lock down the task if the task were to actually exec the file.  I

Why? LSMs should currently only change the bprm's credentials not the
current's credentials.  If needed, we can extend the current patch
series with LSM specific patches for them to check bprm->is_check.

> personally think this is a mis-design (let the program doing the
> exec-ing lock itself down, possibly by querying a policy, but having
> magic happen on exec seems likely to do the wrong thing more often
> that it does the wright thing), but that ship sailed a long time ago.

The execveat+AT_CHECK is only a check that doesn't impact the caller.
Maybe you're talking about process transition with future LSM changes?
In this case, we could add another flag, but I'm convinced it would be
confusing for users.  Anyway, let LSMs experiment with that and we'll
come up with a new flag if needed.  The current approach is a good and
useful piece to fill a gap in Linux access control systems.

> 
> So maybe what's actually needed is a rather different API: a way to
> check *and perform* the security transition for an exec without
> actually execing.  This would need to be done NO_NEW_PRIVS style for
> reasons that are hopefully obvious, but it would permit:

NO_NEW_PRIVS is not that obvious in this case because the restrictions
are enforced by user space, not the kernel.  NO_NEW_PRIVS makes sense to
avoid kernel restrictions be requested by a malicious/unprivileged
process to change the behavior of a (child) privileged/trusted process.
We are not in this configuration here.  The only change would be for
ptrace, which is a good thing either way and should not harm SUID
processes but avoid confused deputy attack for them too.

If this is about an LSM changing the caller's credentials, then yes it
might want to set additional flags, but that would be specific to their
implementation, not part of this patch.

> 
> fd = open(some script);
> if (do_exec_transition_without_exec(fd) != 0)
>   return;  // don't actually do it
> 
> // OK, we may have just lost privileges.  But that's okay, because we
> meant to do that.
> // Make sure we've munmapped anything sensitive and erased any secrets
> from memory,
> // and then interpret the script!
> 
> I think this would actually be straightforward to implement in the
> kernel -- one would need to make sure that all the relevant
> no_new_privs checks are looking in the right place (as the task might
> not actually have no_new_privs set, but LSM_UNSAFE_NO_NEW_PRIVS would
> still be set), but I don't see any reason this would be
> insurmountable, nor do I expect there would be any fundamental
> problems.

OK, that's what is described below with security_bprm_creds_for_exec().
Each LSM can implement this change with the current patch series, but
that should be part of a dedicated patch series per LSM, for those
willing to leverage this new feature.

> 
> 
> > This is different than faccessat(2) which only checks file access
> > rights, but not the full context e.g. mount point's noexec, stack limit,
> > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > real execution, user space gets the same error codes.
> >
> > With the information that a script interpreter is about to interpret a
> > script, an LSM security policy can adjust caller's access rights or log
> > execution request as for native script execution (e.g. role transition).
> > This is possible thanks to the call to security_bprm_creds_for_exec().
> >
> > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > current kernel code should not be a security issue (e.g. unexpected role
> > transition).  LSMs willing to update the caller's credential could now
> > do so when bprm->is_check is set.  Of course, such policy change should
> > be in line with the new user space code.
> >
> > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > make sense for the kernel to parse the checked files, look for
> > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > if the format is unknown.  Because of that, security_bprm_check() is
> > never called when AT_CHECK is used.
> >
> > It should be noted that script interpreters cannot directly use
> > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > executed to interpret the script.  Unlike the kernel, script
> > interpreters may just interpret the shebang as a simple comment, which
> > should not change for backward compatibility reasons.
> >
> > Because scripts or libraries files might not currently have the
> > executable permission set, or because we might want specific users to be
> > allowed to run arbitrary scripts, the following patch provides a dynamic
> > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> 
> Can you explain what those bits do?  And why they're useful?

I didn't want to duplicate the comments above their definition
explaining their usage.  Please let me know if it's not enough.

> 
> >
> > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > This patch has been used for more than a decade with customized script
> > interpreters.  Some examples can be found here:
> > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> 
> This one at least returns an fd, so it looks less likely to get
> misused in a way that adds a TOCTOU race.

We can use both an FD or a path name with execveat(2).  See discussion
with Kees and comment from Linus.

[RFC,v19,1/5] exec: Add a new AT_CHECK flag to execveat(2)

Commit Message

Comments

Patch