[v2,0/5] Add support for O_MAYEXEC

Message ID 20190906152455.22757-1-mic@digikod.net

Message

Mickaël Salaün Sept. 6, 2019, 3:24 p.m. UTC
Hi,

The goal of this patch series is to control script interpretation.  A
new O_MAYEXEC flag used by sys_open() is added to enable userspace
script interpreters to delegate to the kernel (and thus the system
security policy) the permission to interpret/execute scripts or other
files containing what can be seen as commands.

This second series mainly differs from the previous one [1] by moving the
basic security policy from Yama to the filesystem subsystem.  This
policy can be enforced by the system administrator through a sysctl
configuration consistent with the mount points.

Furthermore, the security policy can also be delegated to an LSM, either
a MAC system or an integrity system.  For instance, the new kernel
MAY_OPENEXEC flag closes a major IMA measurement/appraisal interpreter
integrity gap by bringing the ability to check the use of scripts [2].
Other uses are expected, such as for openat2(2) [3], SGX integration
[4], and bpffs [5].

Userspace needs to adapt to take advantage of this new feature.  For
example, PEP 578 [6] (Runtime Audit Hooks) enables Python 3.8 to be
extended with policy enforcement points related to code interpretation,
which can be used to align with the PowerShell audit features.
Additional Python security improvements (e.g. a limited interpreter
without -c, stdin piping of code) are on their way.

The initial idea comes from CLIP OS and the original implementation has
been used for more than 10 years:
https://github.com/clipos-archive/clipos4_doc

An introduction to O_MAYEXEC was given at the Linux Security Summit
Europe 2018 - Linux Kernel Security Contributions by ANSSI:
https://www.youtube.com/watch?v=chNjCRtPKQY&t=17m15s
The "write xor execute" principle was explained at Kernel Recipes 2018 -
CLIP OS: a defense-in-depth OS:
https://www.youtube.com/watch?v=PjRE0uBtkHU&t=11m14s

This patch series can be applied on top of v5.3-rc7.  This can be tested
with CONFIG_SYSCTL.  I would really appreciate constructive comments on
this patch series.


# Changes since v1

* move code from Yama to the FS subsystem
* set __FMODE_EXEC when using O_MAYEXEC to make this information
  available through the new fanotify/FAN_OPEN_EXEC event
* only match regular files (not directories nor other types), which
  follows the same semantics as commit 73601ea5b7b1 ("fs/open.c: allow
  opening only regular files during execve()")
* improve tests

[1] https://lore.kernel.org/lkml/20181212081712.32347-1-mic@digikod.net/
[2] https://lore.kernel.org/lkml/1544647356.4028.105.camel@linux.ibm.com/
[3] https://lore.kernel.org/lkml/20190904201933.10736-6-cyphar@cyphar.com/
[4] https://lore.kernel.org/lkml/CALCETrVovr8XNZSroey7pHF46O=kj_c5D9K8h=z2T_cNrpvMig@mail.gmail.com/
[5] https://lore.kernel.org/lkml/CALCETrVeZ0eufFXwfhtaG_j+AdvbzEWE0M3wjXMWVEO7pj+xkw@mail.gmail.com/
[6] https://www.python.org/dev/peps/pep-0578/

Regards,

Mickaël Salaün (5):
  fs: Add support for an O_MAYEXEC flag on sys_open()
  fs: Add a MAY_EXECMOUNT flag to infer the noexec mount property
  fs: Enable to enforce noexec mounts or file exec through O_MAYEXEC
  selftest/exec: Add tests for O_MAYEXEC enforcing
  doc: Add documentation for the fs.open_mayexec_enforce sysctl

 Documentation/admin-guide/sysctl/fs.rst     |  43 +++
 fs/fcntl.c                                  |   2 +-
 fs/namei.c                                  |  70 +++++
 fs/open.c                                   |   6 +
 include/linux/fcntl.h                       |   2 +-
 include/linux/fs.h                          |   7 +
 include/uapi/asm-generic/fcntl.h            |   3 +
 kernel/sysctl.c                             |   7 +
 tools/testing/selftests/exec/.gitignore     |   1 +
 tools/testing/selftests/exec/Makefile       |   4 +-
 tools/testing/selftests/exec/omayexec.c     | 317 ++++++++++++++++++++
 tools/testing/selftests/kselftest_harness.h |   3 +
 12 files changed, 462 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/exec/omayexec.c

Comments

Steve Grubb Sept. 6, 2019, 6:50 p.m. UTC | #1
On Friday, September 6, 2019 11:24:50 AM EDT Mickaël Salaün wrote:
> The goal of this patch series is to control script interpretation.  A
> new O_MAYEXEC flag used by sys_open() is added to enable userspace
> script interpreter to delegate to the kernel (and thus the system
> security policy) the permission to interpret/execute scripts or other
> files containing what can be seen as commands.

The problem is that this is only a gentleman's handshake. If I don't tell the
kernel that what I'm opening is tantamount to executing it, then the security
feature is never invoked. It is simple to strip the flags off of any system
call without needing privileges. For example:

#define _GNU_SOURCE
#include <link.h>
#include <fcntl.h>
#include <string.h>

unsigned int
la_version(unsigned int version)
{
    return version;
}

unsigned int
la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
    return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

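typedef int (*openat_t) (int dirfd, const char *pathname, int flags, mode_t mode);
/* Address of the real openat(), captured on its first symbol binding. */
static openat_t real_openat = 0L;
/* Wrapper that drops a flag before calling the real openat().
 * O_CLOEXEC stands in here for O_MAYEXEC. */
int my_openat(int dirfd, const char *pathname, int flags, mode_t mode)
{
        flags &= ~O_CLOEXEC;
        return real_openat(dirfd, pathname, flags, mode);
}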

uintptr_t
la_symbind64(Elf64_Sym *sym, unsigned int ndx, uintptr_t *refcook,
        uintptr_t *defcook, unsigned int *flags, const char *symname)
{
    if (real_openat == 0L && strcmp(symname, "openat") == 0) {
        real_openat = (openat_t) sym->st_value;
        return (uintptr_t) my_openat;
    }
    return sym->st_value;
}

gcc -c -g -Wno-unused-parameter -W -Wall -Wundef -O2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fPIC  test.c
gcc -o strip-flags.so.0 -shared -Wl,-soname,strip-flags.so.0 -ldl test.o

Now, let's make a test program:

#include <stdio.h>
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        int dir_fd, fd;
        DIR *d = opendir("/etc");

        if (d == NULL) {
                perror("opendir");
                return 1;
        }
        dir_fd = dirfd(d);
        fd = openat(dir_fd, "passwd", O_RDONLY|O_CLOEXEC);
        if (fd >= 0)
                close(fd);
        closedir(d);
        return 0;
}

gcc -g -W -Wall -Wundef test.c -o test

OK, let's see what happens.
$ strace ./test 2>&1 | grep passwd
openat(3, "passwd", O_RDONLY|O_CLOEXEC) = 4

Now with LD_AUDIT
$ LD_AUDIT=/home/sgrubb/test/openflags/strip-flags.so.0 strace ./test 2>&1 | grep passwd
openat(3, "passwd", O_RDONLY)           = 4

No O_CLOEXEC flag.

-Steve
Florian Weimer Sept. 6, 2019, 6:57 p.m. UTC | #2
* Steve Grubb:

> Now with LD_AUDIT
> $ LD_AUDIT=/home/sgrubb/test/openflags/strip-flags.so.0 strace ./test 2>&1 | grep passwd
> openat(3, "passwd", O_RDONLY)           = 4
>
> No O_CLOEXEC flag.

I think you need to explain in detail why you consider this a problem.

With LD_PRELOAD and LD_AUDIT, you can already do anything, including
scanning other loaded objects for a system call instruction and jumping
to that (in case a security module in the kernel performs a PC check to
confer additional privileges).

Thanks,
Florian
Steve Grubb Sept. 6, 2019, 7:07 p.m. UTC | #3
On Friday, September 6, 2019 2:57:00 PM EDT Florian Weimer wrote:
> * Steve Grubb:
> > Now with LD_AUDIT
> > $ LD_AUDIT=/home/sgrubb/test/openflags/strip-flags.so.0 strace ./test
> > 2>&1 | grep passwd openat(3, "passwd", O_RDONLY)           = 4
> > 
> > No O_CLOEXEC flag.
> 
> I think you need to explain in detail why you consider this a problem.

Because you can strip the O_MAYEXEC flag from being passed into the kernel. 
Once you do that, you defeat the security mechanism because it never gets 
invoked. The issue is that the only thing that knows _why_ something is being 
opened is user space. With this mechanism, you can attempt to pass this 
reason to the kernel so that it may see if policy permits this. But you can 
just remove the flag.

-Steve

> With LD_PRELOAD and LD_AUDIT, you can already do anything, including
> scanning other loaded objects for a system call instruction and jumping
> to that (in case a security module in the kernel performs a PC check to
> confer additional privileges).
> 
> Thanks,
> Florian
Andy Lutomirski Sept. 6, 2019, 7:26 p.m. UTC | #4
> On Sep 6, 2019, at 12:07 PM, Steve Grubb <sgrubb@redhat.com> wrote:
> 
>> On Friday, September 6, 2019 2:57:00 PM EDT Florian Weimer wrote:
>> * Steve Grubb:
>>> Now with LD_AUDIT
>>> $ LD_AUDIT=/home/sgrubb/test/openflags/strip-flags.so.0 strace ./test
>>> 2>&1 | grep passwd openat(3, "passwd", O_RDONLY)           = 4
>>> 
>>> No O_CLOEXEC flag.
>> 
>> I think you need to explain in detail why you consider this a problem.
> 
> Because you can strip the O_MAYEXEC flag from being passed into the kernel. 
> Once you do that, you defeat the security mechanism because it never gets 
> invoked. The issue is that the only thing that knows _why_ something is being 
> opened is user space. With this mechanism, you can attempt to pass this 
> reason to the kernel so that it may see if policy permits this. But you can 
> just remove the flag.

I’m with Florian here. Once you are executing code in a process, you could just emulate some other unapproved code. This series is not intended to provide the kind of absolute protection you’re imagining.

What the kernel *could* do is prevent mmapping a non-FMODE_EXEC file with PROT_EXEC, which would indeed have a real effect (in an iOS-like world, for example) but would break many, many things.
Aleksa Sarai Sept. 6, 2019, 10:44 p.m. UTC | #5
On 2019-09-06, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Sep 6, 2019, at 12:07 PM, Steve Grubb <sgrubb@redhat.com> wrote:
> > 
> >> On Friday, September 6, 2019 2:57:00 PM EDT Florian Weimer wrote:
> >> * Steve Grubb:
> >>> Now with LD_AUDIT
> >>> $ LD_AUDIT=/home/sgrubb/test/openflags/strip-flags.so.0 strace ./test
> >>> 2>&1 | grep passwd openat(3, "passwd", O_RDONLY)           = 4
> >>> 
> >>> No O_CLOEXEC flag.
> >> 
> >> I think you need to explain in detail why you consider this a problem.
> > 
> > Because you can strip the O_MAYEXEC flag from being passed into the kernel. 
> > Once you do that, you defeat the security mechanism because it never gets 
> > invoked. The issue is that the only thing that knows _why_ something is being 
> > opened is user space. With this mechanism, you can attempt to pass this 
> > reason to the kernel so that it may see if policy permits this. But you can 
> > just remove the flag.
> 
> I’m with Florian here. Once you are executing code in a process, you
> could just emulate some other unapproved code. This series is not
> intended to provide the kind of absolute protection you’re imagining.

I also agree, though I think that there is a separate argument to be
made that there are two possible problems with O_MAYEXEC (which might
not be really big concerns):

  * It's very footgun-prone if you didn't set O_MAYEXEC yourself and
    you pass the descriptor elsewhere. You need to check f_flags to see
    if it contains O_MAYEXEC. Maybe there is an argument to be made that
    passing O_MAYEXEC descriptors around isn't a valid use-case, but in
    that case there should be some warnings about that.

  * There's effectively a TOCTOU flaw (even if you are sure O_MAYEXEC is
    in f_flags) -- if the filesystem becomes re-mounted noexec (or the
    file has a-x permissions) after you've done the check you won't get
    hit with an error when you go to use the file descriptor later.
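
For the first point, the check a receiver would have to make can be sketched
with fcntl(F_GETFL). This is only an illustration: the helper name
fd_has_flag() is hypothetical, and whether O_MAYEXEC would actually be
reported through F_GETFL is an assumption based on the f_flags remark above.
The same call works today for flags such as O_APPEND.

```c
#include <fcntl.h>

/* Hypothetical helper: check whether all bits of `flag` are set in the
 * file status flags of an already-open descriptor.
 * Returns 1 if set, 0 if not, -1 if fcntl() fails (e.g. bad descriptor).
 * Note: the result only reflects how the file was opened; it says
 * nothing about the policy in force now (the TOCTOU point above). */
int fd_has_flag(int fd, int flag)
{
        int fl = fcntl(fd, F_GETFL);

        if (fl < 0)
                return -1;
        return (fl & flag) == flag;
}
```

A receiver of a passed descriptor would call something like
fd_has_flag(fd, O_MAYEXEC) before interpreting its contents.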

To fix both you'd need to do what you mention later:

> What the kernel *could* do is prevent mmapping a non-FMODE_EXEC file
> with PROT_EXEC, which would indeed have a real effect (in an iOS-like
> world, for example) but would break many, many things.

And I think this would be useful (with the two possible ways of
executing .text split into FMODE_EXEC and FMODE_MAP_EXEC, as mentioned
in a sister subthread), but would have to be opt-in for the obvious
reason you outlined. However, we could make it the default for
openat2(2) -- assuming we can agree on what the semantics of a
theoretical FMODE_EXEC should be.

And of course we'd need to do FMODE_UPGRADE_EXEC (which would need to
also permit fexecve(2) though probably not PROT_EXEC -- I don't think
you can mmap() an O_PATH descriptor).
James Morris Sept. 9, 2019, 12:16 a.m. UTC | #6
On Fri, 6 Sep 2019, Mickaël Salaün wrote:

> Furthermore, the security policy can also be delegated to an LSM, either
> a MAC system or an integrity system.  For instance, the new kernel
> MAY_OPENEXEC flag closes a major IMA measurement/appraisal interpreter
> integrity gap by bringing the ability to check the use of scripts [2].

To clarify, scripts are already covered by IMA if they're executed 
directly, and the gap is when invoking a script as a parameter to the 
interpreter (and for any sourced files). In that case only the interpreter 
is measured/appraised, unless there's a rule also covering the script 
file(s).

See:
https://en.opensuse.org/SDB:Ima_evm#script-behaviour

In theory you could probably also close the gap by modifying the 
interpreters to check for the execute bit on any file opened for 
interpretation (as earlier suggested by Steve Grubb), and then you could 
have IMA measure/appraise all files with +x.  I suspect this could get 
messy in terms of unwanted files being included, and the MAY_OPENEXEC flag 
has cleaner semantics.
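
For comparison, the interpreter-side check mentioned above could look roughly
like the following sketch. The helper name fd_has_exec_bit() is hypothetical
and this is not taken from any existing interpreter. It inspects only the
mode bits of an already-open file, so it ignores noexec mounts, ACLs and LSM
policy, which is part of why a kernel-side flag gives cleaner semantics.

```c
#include <sys/stat.h>

/* Hypothetical helper: does the regular file behind `fd` have any
 * execute bit set?
 * Returns 1 if so, 0 if not (or if it is not a regular file),
 * and -1 if fstat() fails. */
int fd_has_exec_bit(int fd)
{
        struct stat st;

        if (fstat(fd, &st) < 0)
                return -1;
        if (!S_ISREG(st.st_mode))
                return 0;
        return (st.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH)) != 0;
}
```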
Mickaël Salaün Sept. 9, 2019, 9:09 a.m. UTC | #7
On 07/09/2019 00:44, Aleksa Sarai wrote:
> On 2019-09-06, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Sep 6, 2019, at 12:07 PM, Steve Grubb <sgrubb@redhat.com> wrote:
>>>
>>>> On Friday, September 6, 2019 2:57:00 PM EDT Florian Weimer wrote:
>>>> * Steve Grubb:
>>>>> Now with LD_AUDIT
>>>>> $ LD_AUDIT=/home/sgrubb/test/openflags/strip-flags.so.0 strace ./test
>>>>> 2>&1 | grep passwd openat(3, "passwd", O_RDONLY)           = 4
>>>>>
>>>>> No O_CLOEXEC flag.
>>>>
>>>> I think you need to explain in detail why you consider this a problem.

Right, LD_PRELOAD and such things are definitely not part of the threat
model for O_MAYEXEC, on purpose, because this must be addressed with
other security mechanisms (e.g. correct file system access control, IMA
policy, SELinux or other LSM security policies). This is a requirement
for O_MAYEXEC to be useful.

An interpreter is just a flexible, generic program that has no purpose
other than behaving according to external rules (i.e. scripts). If you
don't trust your interpreter, it should not be executable in the first
place. O_MAYEXEC makes it possible to restrict the use of (some)
interpreters according to a *global* system security policy.

>>>
>>> Because you can strip the O_MAYEXEC flag from being passed into the kernel.
>>> Once you do that, you defeat the security mechanism because it never gets
>>> invoked. The issue is that the only thing that knows _why_ something is being
>>> opened is user space. With this mechanism, you can attempt to pass this
>>> reason to the kernel so that it may see if policy permits this. But you can
>>> just remove the flag.
>>
>> I’m with Florian here. Once you are executing code in a process, you
>> could just emulate some other unapproved code. This series is not
>> intended to provide the kind of absolute protection you’re imagining.
>
> I also agree, though I think that there is a separate argument to be
> made that there are two possible problems with O_MAYEXEC (which might
> not be really big concerns):
>
>   * It's very footgun-prone if you didn't call O_MAYEXEC yourself and
>     you pass the descriptor elsewhere. You need to check f_flags to see
>     if it contains O_MAYEXEC. Maybe there is an argument to be made that
>     passing O_MAYEXECs around isn't a valid use-case, but in that case
>     there should be some warnings about that.

That could be an issue if you don't trust your system, especially if the
mount points (and the "noexec" option) can be changed by untrusted
users. As I said above, there is a requirement for basic security
properties such as meaningful file system access control, and obviously
not letting any user change mount points (which can lead to much more
severe security issues anyway).

If a process A passes a FD to an interpreter B, then the interpreter B
must trust the process A. Moreover, being able to tell if the FD was
opened with O_MAYEXEC and relying on it may create a false sense of
security. As I said in a previous email, being able to probe for
O_MAYEXEC does not make sense because it would not be enough to
know the system policy (whether this flag is enforced or not, for mount
points, based on xattr, time…). The main goal of O_MAYEXEC is to ask the
kernel, on a trusted link (hence without LD_PRELOAD-like interference),
for a file which is allowed to be interpreted/executed by this interpreter.

To be able to correctly handle the case you pointed out (FD passing),
either an existing or a new LSM should handle this behavior according to
the origin of the FD and the chain of processes getting it.

Some advanced LSM rules could tie interpreters with scripts dedicated to
them, and have different behavior for the same scripts but with
different interpreters.

>
>   * There's effectively a TOCTOU flaw (even if you are sure O_MAYEXEC is
>     in f_flags) -- if the filesystem becomes re-mounted noexec (or the
>     file has a-x permissions) after you've done the check you won't get
>     hit with an error when you go to use the file descriptor later.

Again, the threat model needs to be appropriate to make O_MAYEXEC
useful. The security policies of the system need to be seen as a whole,
and updated as such.

As with most file system access control on Linux, it may be possible to
have TOCTOU, but the whole system should be designed to protect against
that. For example, changing file access control (e.g. mount point
options) without a reboot may lead to inconsistent security properties,
which is why such things are discouraged by some access control systems
(e.g. SELinux).

>
> To fix both you'd need to do what you mention later:
>
>> What the kernel *could* do is prevent mmapping a non-FMODE_EXEC file
>> with PROT_EXEC, which would indeed have a real effect (in an iOS-like
>> world, for example) but would break many, many things.
>
> And I think this would be useful (with the two possible ways of
> executing .text split into FMODE_EXEC and FMODE_MAP_EXEC, as mentioned
> in a sister subthread), but would have to be opt-in for the obvious
> reason you outlined. However, we could make it the default for
> openat2(2) -- assuming we can agree on what the semantics of a
> theoretical FMODE_EXEC should be.
>
> And of course we'd need to do FMODE_UPGRADE_EXEC (which would need to
> also permit fexecve(2) though probably not PROT_EXEC -- I don't think
> you can mmap() an O_PATH descriptor).

The mmapping restriction may be interesting but it is a different use
case. This series addresses the interpreter/script problem. Whether the
script may be mapped executable is the choice of the interpreter. In
most cases, no scripts are mapped as such, exactly because they are
interpreted by a process but not by the CPU.


--
Mickaël Salaün
