Message ID | 20191105090553.6350-1-cyphar@cyphar.com (mailing list archive) |
---|---|
Series | open: introduce openat2(2) syscall |
On 2019-11-05, Aleksa Sarai <cyphar@cyphar.com> wrote:
> This patchset is being developed here:
>   <https://github.com/cyphar/linux/tree/openat2/master>
>
> Patch changelog:
>   v15:
>    * Fix code style for LOOKUP_IN_ROOT handling in path_init(). [Linus Torvalds]
>    * Split out patches for each individual LOOKUP flag.
>    * Reword commit messages to give more background information about the
>      series, as well as mention the semantics of each flag in more detail.
>   v14: <https://lore.kernel.org/lkml/20191010054140.8483-1-cyphar@cyphar.com/>
>        <https://lore.kernel.org/lkml/20191026185700.10708-1-cyphar@cyphar.com>
>   v13: <https://lore.kernel.org/lkml/20190930183316.10190-1-cyphar@cyphar.com/>
>   v12: <https://lore.kernel.org/lkml/20190904201933.10736-1-cyphar@cyphar.com/>
>   v11: <https://lore.kernel.org/lkml/20190820033406.29796-1-cyphar@cyphar.com/>
>        <https://lore.kernel.org/lkml/20190728010207.9781-1-cyphar@cyphar.com/>
>   v10: <https://lore.kernel.org/lkml/20190719164225.27083-1-cyphar@cyphar.com/>
>   v09: <https://lore.kernel.org/lkml/20190706145737.5299-1-cyphar@cyphar.com/>
>   v08: <https://lore.kernel.org/lkml/20190520133305.11925-1-cyphar@cyphar.com/>
>   v07: <https://lore.kernel.org/lkml/20190507164317.13562-1-cyphar@cyphar.com/>
>   v06: <https://lore.kernel.org/lkml/20190506165439.9155-1-cyphar@cyphar.com/>
>   v05: <https://lore.kernel.org/lkml/20190320143717.2523-1-cyphar@cyphar.com/>
>   v04: <https://lore.kernel.org/lkml/20181112142654.341-1-cyphar@cyphar.com/>
>   v03: <https://lore.kernel.org/lkml/20181009070230.12884-1-cyphar@cyphar.com/>
>   v02: <https://lore.kernel.org/lkml/20181009065300.11053-1-cyphar@cyphar.com/>
>   v01: <https://lore.kernel.org/lkml/20180929103453.12025-1-cyphar@cyphar.com/>
>
> For a very long time, extending openat(2) with new features has been
> incredibly frustrating. This stems from the fact that openat(2) is
> possibly the most famous counter-example to the mantra "don't silently
> accept garbage from userspace" -- it doesn't check whether unknown flags
> are present[1].
>
> This means that (generally) the addition of new flags to openat(2) has
> been fraught with backwards-compatibility issues (O_TMPFILE has to be
> defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
> kernels gave errors, since it's insecure to silently ignore the
> flag[2]). All new security-related flags therefore have a tough road to
> being added to openat(2).
>
> Furthermore, the need for some sort of control over VFS's path resolution (to
> avoid malicious paths resulting in inadvertent breakouts) has been a very
> long-standing desire of many userspace applications. This patchset is a revival
> of Al Viro's old AT_NO_JUMPS[3] patchset (which was a variant of David
> Drysdale's O_BENEATH patchset[4] which was a spin-off of the Capsicum
> project[5]) with a few additions and changes made based on the previous
> discussion within [6] as well as others I felt were useful.
>
> In line with the conclusions of the original discussion of AT_NO_JUMPS, the
> flag has been split up into separate flags. However, instead of being an
> openat(2) flag it is provided through a new syscall openat2(2) which provides
> several other improvements to the openat(2) interface (see the patch
> description for more details). The following new LOOKUP_* flags are added:
>
>   * LOOKUP_NO_XDEV blocks all mountpoint crossings (upwards, downwards,
>     or through absolute links). Absolute pathnames alone in openat(2) do not
>     trigger this. Magic-link traversal which implies a vfsmount jump is also
>     blocked (though magic-link jumps on the same vfsmount are permitted).
>
>   * LOOKUP_NO_MAGICLINKS blocks resolution through /proc/$pid/fd-style
>     links. This is done by blocking the usage of nd_jump_link() during
>     resolution in a filesystem. The term "magic-links" is used to match
>     with the only reference to these links in Documentation/, but I'm
>     happy to change the name.
>
>     It should be noted that this is different to the scope of
>     ~LOOKUP_FOLLOW in that it applies to all path components. However,
>     you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
>     will *not* fail (assuming that no parent component was a
>     magic-link), and you will have an fd for the magic-link.
>
>     In order to correctly detect magic-links, the introduction of a new
>     LOOKUP_MAGICLINK_JUMPED state flag was required.
>
>   * LOOKUP_BENEATH disallows escapes to outside the starting dirfd's
>     tree, using techniques such as ".." or absolute links. Absolute
>     paths in openat(2) are also disallowed. Conceptually this flag is to
>     ensure you "stay below" a certain point in the filesystem tree --
>     but this requires some additional work to protect against various
>     races that would allow escape using "..".
>
>     Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because
>     magic-links can trivially beam you around the filesystem (breaking
>     the protection). In future, there might be similar safety checks done
>     as in LOOKUP_IN_ROOT, but that requires more discussion.
>
> In addition, two new flags are added that expand on the above ideas:
>
>   * LOOKUP_NO_SYMLINKS does what it says on the tin. No symlink
>     resolution is allowed at all, including magic-links. Just as with
>     LOOKUP_NO_MAGICLINKS this can still be used with NOFOLLOW to open an
>     fd for the symlink as long as no parent path had a symlink
>     component.
>
>   * LOOKUP_IN_ROOT is an extension of LOOKUP_BENEATH that, rather than
>     blocking attempts to move past the root, forces all such movements
>     to be scoped to the starting point. This provides chroot(2)-like
>     protection but without the cost of a chroot(2) for each filesystem
>     operation, as well as being safe against race attacks that chroot(2)
>     is not.
>
>     If a race is detected (as with LOOKUP_BENEATH) then an error is
>     generated, and similar to LOOKUP_BENEATH it is not permitted to cross
>     magic-links with LOOKUP_IN_ROOT.
>
>     The primary need for this is from container runtimes, which
>     currently need to do symlink scoping in userspace[7] when opening
>     paths in a potentially malicious container. There is a long list of
>     CVEs that could have been mitigated by having RESOLVE_IN_ROOT
>     (such as CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and
>     CVE-2019-5736, just to name a few).
>
> In order to make all of the above more usable, I'm working on
> libpathrs[8] which is a C-friendly library for safe path resolution. It
> features a userspace-emulated backend if the kernel doesn't support
> openat2(2). Hopefully we can get userspace to switch to using it, and
> thus get openat2(2) support for free once it's ready.
>
> Future work would include implementing things like RESOLVE_NO_AUTOMOUNT and
> possibly a RESOLVE_NO_REMOTE (to allow programs to be sure they don't hit DoSes
> through stale NFS handles).
>
> [1]: https://lwn.net/Articles/588444/
> [2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
> [3]: https://lore.kernel.org/lkml/20170429220414.GT29622@ZenIV.linux.org.uk
> [4]: https://lore.kernel.org/lkml/1415094884-18349-1-git-send-email-drysdale@google.com
> [5]: https://lore.kernel.org/lkml/1404124096-21445-1-git-send-email-drysdale@google.com
> [6]: https://lwn.net/Articles/723057/
> [7]: https://github.com/cyphar/filepath-securejoin
> [8]: https://github.com/openSUSE/libpathrs
>
> The current draft of the openat2(2) man-page is included below.
>
> --8<---------------------------------------------------------------------------
> OPENAT2(2)                  Linux Programmer's Manual                  OPENAT2(2)
>
> NAME
>        openat2 - open and possibly create a file (extended)
>
> SYNOPSIS
>        #include <sys/types.h>
>        #include <sys/stat.h>
>        #include <fcntl.h>
>
>        int openat2(int dirfd, const char *pathname, struct open_how *how, size_t size);
>
>        Note: There is no glibc wrapper for this system call; see NOTES.
>
> DESCRIPTION
>        The openat2() system call opens the file specified by pathname. If the
>        specified file does not exist, it may optionally (if O_CREAT is specified
>        in how.flags) be created by openat2().
>
>        As with openat(2), if pathname is relative, then it is interpreted
>        relative to the directory referred to by the file descriptor dirfd (or
>        the current working directory of the calling process, if dirfd is the
>        special value AT_FDCWD.) If pathname is absolute, then dirfd is ignored
>        (unless how.resolve contains RESOLVE_IN_ROOT, in which case pathname is
>        resolved relative to dirfd.)
>
>        The openat2() system call is an extension of openat(2) and provides a
>        superset of its functionality. Rather than taking a single flag argument,
>        an extensible structure (how) is passed instead to allow for future
>        extensions. size must be set to sizeof(struct open_how), to facilitate
>        future extensions (see the "Extensibility" section of the NOTES for more
>        detail on how extensions are handled.)
>
>    The open_how structure
>        The following structure indicates how pathname should be opened, and acts
>        as a superset of the flag and mode arguments to openat(2).
>
>            struct open_how {
>                __aligned_u64 flags;         /* O_* flags. */
>                __u16         mode;          /* Mode for O_{CREAT,TMPFILE}. */
>                __u16         __padding[3];  /* Must be zeroed. */
>                __aligned_u64 resolve;       /* RESOLVE_* flags. */
>            };
>
>        Any future extensions to openat2() will be implemented as new fields
>        appended to the above structure (or through reuse of pre-existing padding
>        space), with the zero value of the new fields acting as though the
>        extension were not present.
>
>        The meaning of each field is as follows:
>
>        flags
>               The file creation and status flags to use for this operation. All
>               of the O_* flags defined for openat(2) are valid openat2() flag
>               values.
>
>               Unlike openat(2), it is an error to provide openat2() unknown or
>               conflicting flags in flags.
>
>        mode
>               File mode for the new file, with identical semantics to the mode
>               argument to openat(2). However, unlike openat(2), it is an error
>               to provide openat2() with a mode which contains bits other than
>               0777.
>
>               It is an error to provide openat2() a non-zero mode if flags does
>               not contain O_CREAT or O_TMPFILE.
>
>        resolve
>               Change how the components of pathname will be resolved (see
>               path_resolution(7) for background information.) The primary use
>               case for these flags is to allow trusted programs to restrict how
>               untrusted paths (or paths inside untrusted directories) are
>               resolved. The full list of resolve flags is given below.
>
>               RESOLVE_NO_XDEV
>                      Disallow traversal of mount points during path resolution
>                      (including all bind mounts).
>
>                      Users of this flag are encouraged to make its use
>                      configurable (unless it is used for a specific security
>                      purpose), as bind mounts are very widely used by
>                      end-users. Setting this flag indiscriminately for all uses
>                      of openat2() may result in spurious errors on
>                      previously-functional systems.
>
>               RESOLVE_NO_SYMLINKS
>                      Disallow resolution of symbolic links during path
>                      resolution. This option implies RESOLVE_NO_MAGICLINKS.
>
>                      If the trailing component is a symbolic link, and flags
>                      contains both O_PATH and O_NOFOLLOW, then an O_PATH file
>                      descriptor referencing the symbolic link will be returned.
>
>                      Users of this flag are encouraged to make its use
>                      configurable (unless it is used for a specific security
>                      purpose), as symbolic links are very widely used by
>                      end-users. Setting this flag indiscriminately for all uses
>                      of openat2() may result in spurious errors on
>                      previously-functional systems.
>
>               RESOLVE_NO_MAGICLINKS
>                      Disallow all magic link resolution during path resolution.
>
>                      If the trailing component is a magic link, and flags
>                      contains both O_PATH and O_NOFOLLOW, then an O_PATH file
>                      descriptor referencing the magic link will be returned.
>
>                      Magic-links are symbolic link-like objects that are most
>                      notably found in proc(5) (examples include /proc/[pid]/exe
>                      and /proc/[pid]/fd/*.) Due to the potential danger of
>                      unknowingly opening these magic links, it may be preferable
>                      for users to disable their resolution entirely (see
>                      symlink(7) for more details.)
>
>               RESOLVE_BENEATH
>                      Do not permit the path resolution to succeed if any
>                      component of the resolution is not a descendant of the
>                      directory indicated by dirfd. This results in absolute
>                      symbolic links (and absolute values of pathname) being
>                      rejected.
>
>                      Currently, this flag also disables magic link resolution.
>                      However, this may change in the future. The caller should
>                      explicitly specify RESOLVE_NO_MAGICLINKS to ensure that
>                      magic links are not resolved.
>
>               RESOLVE_IN_ROOT
>                      Treat dirfd as the root directory while resolving pathname
>                      (as though the user called chroot(2) with dirfd as the
>                      argument.) Absolute symbolic links and ".." path components
>                      will be scoped to dirfd. If pathname is an absolute path,
>                      it is also treated relative to dirfd.
>
>                      However, unlike chroot(2) (which changes the filesystem
>                      root permanently for a process), RESOLVE_IN_ROOT allows a
>                      program to efficiently restrict path resolution for only
>                      certain operations. It also has several hardening features
>                      (such as detecting escape attempts during .. resolution)
>                      which chroot(2) does not.
>
>                      Currently, this flag also disables magic link resolution.
>                      However, this may change in the future. The caller should
>                      explicitly specify RESOLVE_NO_MAGICLINKS to ensure that
>                      magic links are not resolved.
>
>               It is an error to provide openat2() unknown flags in resolve.
>
> RETURN VALUE
>        On success, a new file descriptor is returned. On error, -1 is returned,
>        and errno is set appropriately.
>
> ERRORS
>        The set of errors returned by openat2() includes all of the errors
>        returned by openat(2), as well as the following additional errors:
>
>        EINVAL An unknown flag or invalid value was specified in how.
>
>        EINVAL mode is non-zero, but flags does not contain O_CREAT or O_TMPFILE.
>
>        EINVAL size was smaller than any known version of struct open_how.
>
>        E2BIG  An extension was specified in how, which the current kernel does
>               not support (see the "Extensibility" section of the NOTES for more
>               detail on how extensions are handled.)
>
>        EAGAIN resolve contains either RESOLVE_IN_ROOT or RESOLVE_BENEATH, and
>               the kernel could not ensure that a ".." component didn't escape
>               (due to a race condition or potential attack.) Callers may choose
>               to retry the openat2() call.
>
>        EXDEV  resolve contains either RESOLVE_IN_ROOT or RESOLVE_BENEATH, and an
>               escape from the root during path resolution was detected.
>
>        EXDEV  resolve contains RESOLVE_NO_XDEV, and a path component attempted
>               to cross a mount point.
>
>        ELOOP  resolve contains RESOLVE_NO_SYMLINKS, and one of the path
>               components was a symbolic link (or magic link).
>
>        ELOOP  resolve contains RESOLVE_NO_MAGICLINKS, and one of the path
>               components was a magic link.
>
> VERSIONS
>        openat2() was added to Linux in kernel 5.FOO.
>
> CONFORMING TO
>        This system call is Linux-specific.
>
>        The semantics of RESOLVE_BENEATH were modelled after FreeBSD's O_BENEATH.
>
> NOTES
>        Glibc does not provide a wrapper for this system call; call it using
>        syscall(2).
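Since the NOTES above say the call has to go through syscall(2), a minimal sketch of such a wrapper might look like the following. The names `open_how_draft`, `openat2_sketch` and `open_cwd_example` are hypothetical and not part of the patchset; the struct mirrors the draft layout quoted above, and the code assumes the installed kernel headers define `SYS_openat2` (it reports ENOSYS otherwise).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Local mirror of the draft struct open_how (hypothetical name).
 * Assumption: this follows the draft man page; prefer the kernel's own
 * uapi definition where available, since the merged layout may differ.
 */
struct open_how_draft {
    uint64_t flags __attribute__((aligned(8)));   /* O_* flags. */
    uint16_t mode;                                /* Mode for O_{CREAT,TMPFILE}. */
    uint16_t padding[3];                          /* Must be zeroed. */
    uint64_t resolve __attribute__((aligned(8))); /* RESOLVE_* flags. */
};

/* 8 (flags) + 2 (mode) + 6 (padding) + 8 (resolve) bytes. */
static_assert(sizeof(struct open_how_draft) == 24,
              "draft struct open_how is expected to be 24 bytes");

/* Hypothetical wrapper: invoke openat2 through syscall(2). */
static int openat2_sketch(int dirfd, const char *pathname,
                          const struct open_how_draft *how, size_t size)
{
#ifdef SYS_openat2
    return (int)syscall(SYS_openat2, dirfd, pathname, how, size);
#else
    /* The installed headers do not know the syscall number. */
    (void)dirfd; (void)pathname; (void)how; (void)size;
    errno = ENOSYS;
    return -1;
#endif
}

/* Open "." read-only; returns an fd, or -1 with errno set. */
static int open_cwd_example(void)
{
    /* Designated initializer zeroes mode, padding and resolve. */
    struct open_how_draft how = { .flags = O_RDONLY | O_DIRECTORY };
    return openat2_sketch(AT_FDCWD, ".", &how, sizeof(how));
}
```

Zero-initialising the whole structure matters here: per the draft, non-zero padding or a non-zero mode without O_CREAT/O_TMPFILE is rejected with EINVAL. Callers should expect -1 with ENOSYS on kernels that predate the syscall.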
>
>    Extensibility
>        In order to allow for struct open_how to be extended in future kernel
>        revisions, openat2() requires userspace to specify the size of the struct
>        open_how structure they are passing. By providing this information, it is
>        possible for openat2() to provide both forwards- and
>        backwards-compatibility -- with size acting as an implicit version number
>        (because new extension fields will always be appended, the size will
>        always increase.) This extensibility design is very similar to other
>        system calls such as sched_setattr(2), perf_event_open(2), and clone3(2).
>
>        If we let usize be the size of the structure according to userspace and
>        ksize be the size of the structure which the kernel supports, then there
>        are only three cases to consider:
>
>           * If ksize equals usize, then there is no version mismatch and how can
>             be used verbatim.
>
>           * If ksize is larger than usize, then there are some extensions the
>             kernel supports which the userspace program is unaware of. Because
>             all extensions must have their zero values be a no-op, the kernel
>             treats all of the extension fields not set by userspace to have zero
>             values. This provides backwards-compatibility.
>
>           * If ksize is smaller than usize, then there are some extensions which
>             the userspace program is aware of but the kernel does not support.
>             Because all extensions must have their zero values be a no-op, the
>             kernel can safely ignore the unsupported extension fields if they
>             are all-zero. If any unsupported extension fields are non-zero, then
>             -1 is returned and errno is set to E2BIG. This provides
>             forwards-compatibility.
>
>        Therefore, most userspace programs will not need to have any special
>        handling of extensions. However, if a userspace program wishes to
>        determine what extensions the running kernel supports, it may conduct a
>        binary search on size (to find the largest value which doesn't produce an
>        error of E2BIG.)
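The three usize/ksize cases above can be modelled in plain userspace C. This is only an illustrative sketch, not the kernel's implementation: `copy_struct_model`, `how_v1` and `how_v2` are hypothetical names, and the minimum-size check that yields EINVAL is deliberately left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two hypothetical versions of an extensible struct; v2 appends a field. */
struct how_v1 { uint64_t flags; };
struct how_v2 { uint64_t flags; uint64_t extra; };

/*
 * Userspace model (assumption: illustration only) of reconciling the size
 * userspace passed (usize) with the size the "kernel" knows (ksize).
 * Returns 0 on success or -E2BIG if an unsupported field is non-zero.
 */
static int copy_struct_model(void *dst, size_t ksize,
                             const void *src, size_t usize)
{
    const unsigned char *bytes = src;
    size_t common = usize < ksize ? usize : ksize;

    /* usize > ksize: trailing fields we do not know must be all-zero. */
    for (size_t i = common; i < usize; i++)
        if (bytes[i] != 0)
            return -E2BIG;

    /* usize < ksize: fields userspace did not provide read as zero. */
    memset(dst, 0, ksize);
    memcpy(dst, src, common);
    return 0;
}
```

With this model, an old caller passing `struct how_v1` to a "kernel" expecting `struct how_v2` sees `extra` read as zero, while a new caller passing a `struct how_v2` with a non-zero `extra` to a v1 "kernel" gets -E2BIG, matching the second and third cases in the list.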
>
> SEE ALSO
>        openat(2), path_resolution(7), symlink(7)
>
> Linux                              2019-11-05                          OPENAT2(2)
> --8<---------------------------------------------------------------------------
>
> Aleksa Sarai (9):
>   namei: LOOKUP_NO_SYMLINKS: block symlink resolution
>   namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
>   namei: LOOKUP_NO_XDEV: block mountpoint crossing
>   namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
>   namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
>   namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
>   open: introduce openat2(2) syscall
>   selftests: add openat2(2) selftests
>   Documentation: path-lookup: mention LOOKUP_MAGICLINK_JUMPED
>
>  CREDITS                                      |   4 +-
>  Documentation/filesystems/path-lookup.rst    |  18 +-
>  arch/alpha/kernel/syscalls/syscall.tbl       |   1 +
>  arch/arm/tools/syscall.tbl                   |   1 +
>  arch/arm64/include/asm/unistd.h              |   2 +-
>  arch/arm64/include/asm/unistd32.h            |   2 +
>  arch/ia64/kernel/syscalls/syscall.tbl        |   1 +
>  arch/m68k/kernel/syscalls/syscall.tbl        |   1 +
>  arch/microblaze/kernel/syscalls/syscall.tbl  |   1 +
>  arch/mips/kernel/syscalls/syscall_n32.tbl    |   1 +
>  arch/mips/kernel/syscalls/syscall_n64.tbl    |   1 +
>  arch/mips/kernel/syscalls/syscall_o32.tbl    |   1 +
>  arch/parisc/kernel/syscalls/syscall.tbl      |   1 +
>  arch/powerpc/kernel/syscalls/syscall.tbl     |   1 +
>  arch/s390/kernel/syscalls/syscall.tbl        |   1 +
>  arch/sh/kernel/syscalls/syscall.tbl          |   1 +
>  arch/sparc/kernel/syscalls/syscall.tbl       |   1 +
>  arch/x86/entry/syscalls/syscall_32.tbl       |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl       |   1 +
>  arch/xtensa/kernel/syscalls/syscall.tbl      |   1 +
>  fs/namei.c                                   | 176 +++++-
>  fs/open.c                                    | 149 +++--
>  include/linux/fcntl.h                        |  12 +-
>  include/linux/namei.h                        |  11 +
>  include/linux/syscalls.h                     |   3 +
>  include/uapi/asm-generic/unistd.h            |   5 +-
>  include/uapi/linux/fcntl.h                   |  41 ++
>  tools/testing/selftests/Makefile             |   1 +
>  tools/testing/selftests/openat2/.gitignore   |   1 +
>  tools/testing/selftests/openat2/Makefile     |   8 +
>  tools/testing/selftests/openat2/helpers.c    | 109 ++++
>  tools/testing/selftests/openat2/helpers.h    | 107 ++++
>  .../testing/selftests/openat2/openat2_test.c | 316 +++++++++++
>  .../selftests/openat2/rename_attack_test.c   | 160 ++++++
>  .../testing/selftests/openat2/resolve_test.c | 523 ++++++++++++++++++
>  35 files changed, 1591 insertions(+), 73 deletions(-)
>  create mode 100644 tools/testing/selftests/openat2/.gitignore
>  create mode 100644 tools/testing/selftests/openat2/Makefile
>  create mode 100644 tools/testing/selftests/openat2/helpers.c
>  create mode 100644 tools/testing/selftests/openat2/helpers.h
>  create mode 100644 tools/testing/selftests/openat2/openat2_test.c
>  create mode 100644 tools/testing/selftests/openat2/rename_attack_test.c
>  create mode 100644 tools/testing/selftests/openat2/resolve_test.c
>
>
> base-commit: a99d8080aaf358d5d23581244e5da23b35e340b9

Ping -- this patch hasn't been touched for a week. Thanks.
On Tue, Nov 12, 2019 at 12:24:04AM +1100, Aleksa Sarai wrote:
> On 2019-11-05, Aleksa Sarai <cyphar@cyphar.com> wrote:
> > This patchset is being developed here:
> >   <https://github.com/cyphar/linux/tree/openat2/master>
> >
> > Patch changelog:
> >   v15:
> >    * Fix code style for LOOKUP_IN_ROOT handling in path_init(). [Linus Torvalds]
> >    * Split out patches for each individual LOOKUP flag.
> >    * Reword commit messages to give more background information about the
> >      series, as well as mention the semantics of each flag in more detail.
> > [...]
>
> Ping -- this patch hasn't been touched for a week. Thanks.

If I've been following correctly, everyone is happy with this series.
(i.e. Linus's comments appear to have been addressed.)

Perhaps the next question is should this go via a pull request by you to
Linus directly during the v5.5 merge window, via akpm, via Christian, or
some other path? Besides Linus, it's not been clear who should "claim"
this series. :)
On Tue, Nov 12, 2019 at 03:01:26PM -0800, Kees Cook wrote:
> On Tue, Nov 12, 2019 at 12:24:04AM +1100, Aleksa Sarai wrote:
> > On 2019-11-05, Aleksa Sarai <cyphar@cyphar.com> wrote:
> > > This patchset is being developed here:
> > >   <https://github.com/cyphar/linux/tree/openat2/master>
> > >
> > > Patch changelog:
> > >   v15:
> > >    * Fix code style for LOOKUP_IN_ROOT handling in path_init(). [Linus Torvalds]
> > >    * Split out patches for each individual LOOKUP flag.
> > >    * Reword commit messages to give more background information about the
> > >      series, as well as mention the semantics of each flag in more detail.
> > > [...]
> >
> > Ping -- this patch hasn't been touched for a week. Thanks.
>
> If I've been following correctly, everyone is happy with this series.
> (i.e. Linus's comments appear to have been addressed.)
>
> Perhaps the next question is should this go via a pull request by you to
> Linus directly during the v5.5 merge window, via akpm, via Christian, or
> some other path? Besides Linus, it's not been clear who should "claim"
> this series. :)

I like this series and, as with the copy_struct_from_user() part of it
that I've already taken, I'm happy to stuff this into a dedicated branch,
merge it into my for-next and send it for v5.5. Though I'd _much_ rather
see Al pick this up or have him give his blessing first.

Christian
On 2019-11-12, Kees Cook <keescook@chromium.org> wrote:
> On Tue, Nov 12, 2019 at 12:24:04AM +1100, Aleksa Sarai wrote:
> > On 2019-11-05, Aleksa Sarai <cyphar@cyphar.com> wrote:
> > > This patchset is being developed here:
> > >   <https://github.com/cyphar/linux/tree/openat2/master>
> > >
> > > Patch changelog:
> > >   v15:
> > >    * Fix code style for LOOKUP_IN_ROOT handling in path_init(). [Linus Torvalds]
> > >    * Split out patches for each individual LOOKUP flag.
> > >    * Reword commit messages to give more background information about the
> > >      series, as well as mention the semantics of each flag in more detail.
> > > [...]
> >
> > Ping -- this patch hasn't been touched for a week. Thanks.
>
> If I've been following correctly, everyone is happy with this series.
> (i.e. Linus's comments appear to have been addressed.)
>
> Perhaps the next question is should this go via a pull request by you to
> Linus directly during the v5.5 merge window, via akpm, via Christian, or
> some other path? Besides Linus, it's not been clear who should "claim"
> this series. :)

Given the namei changes, I wanted to avoid stepping on Al's toes. Though
he did review the series a few versions ago, the discussion didn't focus
on the openat2(2) semantics (which have also changed since then). I'm not
sure whether to interpret the silence to mean he's satisfied with things
as they are, or if he hasn't had more time to look at the series.

As for which tree it should be routed to, I don't mind -- Christian is
the most straight-forward choice (but if Al wants to route it, that's
fine with me too).