From patchwork Tue Nov 5 09:05:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 11227199 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EF59B913 for ; Tue, 5 Nov 2019 09:07:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D820721929 for ; Tue, 5 Nov 2019 09:07:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730640AbfKEJHZ (ORCPT ); Tue, 5 Nov 2019 04:07:25 -0500 Received: from mout-p-202.mailbox.org ([80.241.56.172]:31768 "EHLO mout-p-202.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727093AbfKEJHZ (ORCPT ); Tue, 5 Nov 2019 04:07:25 -0500 Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mout-p-202.mailbox.org (Postfix) with ESMTPS id 476kMD5BTTzQlBr; Tue, 5 Nov 2019 10:07:20 +0100 (CET) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by hefe.heinlein-support.de (hefe.heinlein-support.de [91.198.250.172]) (amavisd-new, port 10030) with ESMTP id hBZfkmaeX9ST; Tue, 5 Nov 2019 10:07:08 +0100 (CET) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan , Ingo Molnar , Peter Zijlstra Cc: Aleksa Sarai , Christian Brauner , Linus Torvalds , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Rasmus Villemoes , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Christian Brauner , Aleksa Sarai , containers@lists.linux-foundation.org, linux-alpha@vger.kernel.org, linux-api@vger.kernel.org, libc-alpha@sourceware.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-xtensa@linux-xtensa.org, sparclinux@vger.kernel.org Subject: [PATCH v15 1/9] namei: LOOKUP_NO_SYMLINKS: block symlink resolution Date: Tue, 5 Nov 2019 20:05:45 +1100 Message-Id: <20191105090553.6350-2-cyphar@cyphar.com> In-Reply-To: <20191105090553.6350-1-cyphar@cyphar.com> References: <20191105090553.6350-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org /* Background. */ Userspace cannot easily resolve a path without resolving symlinks, and would have to manually resolve each path component with O_PATH and O_NOFOLLOW. This is clearly inefficient, and can be fairly easy to screw up (resulting in possible security bugs). Linus has mentioned that Git has a particular need for this kind of flag[1]. It also resolves a fairly long-standing perceived deficiency in O_NOFOLLOw -- that it only blocks the opening of trailing symlinks. This is part of a refresh of Al's AT_NO_JUMPS patchset[2] (which was a variation on David Drysdale's O_BENEATH patchset[3], which in turn was based on the Capsicum project[4]). /* Userspace API. */ LOOKUP_NO_SYMLINKS will be exposed to userspace through openat2(2). /* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_NO_SYMLINKS applies to all components of the path. With LOOKUP_NO_SYMLINKS, any symlink path component encountered during path resolution will yield -ELOOP. If the trailing component is a symlink (and no other components were symlinks), then O_PATH|O_NOFOLLOW will not error out and will instead provide a handle to the trailing symlink -- without resolving it. /* Testing. */ LOOKUP_NO_SYMLINKS is tested as part of the openat2(2) selftests. [1]: https://lore.kernel.org/lkml/CA+55aFyOKM7DW7+0sdDFKdZFXgptb5r1id9=Wvhd8AgSP7qjwQ@mail.gmail.com/ [2]: https://lore.kernel.org/lkml/20170429220414.GT29622@ZenIV.linux.org.uk/ [3]: https://lore.kernel.org/lkml/1415094884-18349-1-git-send-email-drysdale@google.com/ [4]: https://lore.kernel.org/lkml/1404124096-21445-1-git-send-email-drysdale@google.com/ Cc: Christian Brauner Suggested-by: Al Viro Suggested-by: Linus Torvalds Signed-off-by: Aleksa Sarai --- fs/namei.c | 3 +++ include/linux/namei.h | 3 +++ 2 files changed, 6 insertions(+) diff --git a/fs/namei.c b/fs/namei.c index 671c3c1a3425..4e85d6fa4048 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1045,6 +1045,9 @@ const char *get_link(struct nameidata *nd) int error; const char *res; + if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS)) + return ERR_PTR(-ELOOP); + if (!(nd->flags & LOOKUP_RCU)) { touch_atime(&last->link); cond_resched(); diff --git a/include/linux/namei.h b/include/linux/namei.h index 397a08ade6a2..ee2e35af387f 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -39,6 +39,9 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; #define LOOKUP_ROOT 0x2000 #define LOOKUP_ROOT_GRABBED 0x0008 +/* Scoping flags for lookup. */ +#define LOOKUP_NO_SYMLINKS 0x020000 /* No symlink crossing. */ + extern int path_pts(struct path *path); extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty); From patchwork Tue Nov 5 09:05:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 11227219 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3114F4A7D for ; Tue, 5 Nov 2019 09:07:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1960F217F5 for ; Tue, 5 Nov 2019 09:07:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730638AbfKEJHw (ORCPT ); Tue, 5 Nov 2019 04:07:52 -0500 Received: from mout-p-101.mailbox.org ([80.241.56.151]:52700 "EHLO mout-p-101.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730520AbfKEJHw (ORCPT ); Tue, 5 Nov 2019 04:07:52 -0500 Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mout-p-101.mailbox.org (Postfix) with ESMTPS id 476kMm3cMqzKmZ4; Tue, 5 Nov 2019 10:07:48 +0100 (CET) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter02.heinlein-hosting.de (spamfilter02.heinlein-hosting.de [80.241.56.116]) (amavisd-new, port 10030) with ESMTP id zzFBXk6XVzBI; Tue, 5 Nov 2019 10:07:28 +0100 (CET) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan , Ingo Molnar , Peter Zijlstra Cc: Aleksa Sarai , Christian Brauner , David Drysdale , Andy Lutomirski , Linus Torvalds , Eric Biederman , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Tycho Andersen , Chanho Min , Oleg Nesterov , Rasmus Villemoes , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Christian Brauner , Aleksa Sarai , containers@lists.linux-foundation.org, linux-alpha@vger.kernel.org, linux-api@vger.kernel.org, libc-alpha@sourceware.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-xtensa@linux-xtensa.org, sparclinux@vger.kernel.org Subject: [PATCH v15 2/9] namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution Date: Tue, 5 Nov 2019 20:05:46 +1100 Message-Id: <20191105090553.6350-3-cyphar@cyphar.com> In-Reply-To: <20191105090553.6350-1-cyphar@cyphar.com> References: <20191105090553.6350-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org /* Background. */ There has always been a special class of symlink-like objects in procfs (and a few other pseudo-filesystems) which allow for non-lexical resolution of paths using nd_jump_link(). These "magic-links" do not follow traditional mount namespace boundaries, and have been used consistently in container escape attacks because they can be used to trick unsuspecting privileged processes into resolving unexpected paths. It is also non-trivial for userspace to unambiguously avoid resolving magic-links, because they do not have a reliable indication that they are a magic-link (in order to verify them you'd have to manually open the path given by readlink(2) and then verify that the two file descriptors reference the same underlying file, which is plagued with possible race conditions or supplementary attack scenarios). It would therefore be very helpful for userspace to be able to avoid these symlinks easily, thus hopefully removing a tool from attackers' toolboxes. This is part of a refresh of Al's AT_NO_JUMPS patchset[1] (which was a variation on David Drysdale's O_BENEATH patchset[2], which in turn was based on the Capsicum project[3]). /* Userspace API. */ LOOKUP_NO_MAGICLINKS will be exposed to userspace through openat2(2). /* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_NO_MAGICLINKS applies to all components of the path. With LOOKUP_NO_MAGICLINKS, any magic-link path component encountered during path resolution will yield -ELOOP. The handling of ~LOOKUP_FOLLOW for a trailing magic-link is identical to LOOKUP_NO_SYMLINKS. LOOKUP_NO_SYMLINKS implies LOOKUP_NO_MAGICLINKS. /* Testing. */ LOOKUP_NO_MAGICLINKS is tested as part of the openat2(2) selftests. [1]: https://lore.kernel.org/lkml/20170429220414.GT29622@ZenIV.linux.org.uk/ [2]: https://lore.kernel.org/lkml/1415094884-18349-1-git-send-email-drysdale@google.com/ [3]: https://lore.kernel.org/lkml/1404124096-21445-1-git-send-email-drysdale@google.com/ Cc: Christian Brauner Suggested-by: David Drysdale Suggested-by: Al Viro Suggested-by: Andy Lutomirski Suggested-by: Linus Torvalds Signed-off-by: Aleksa Sarai --- fs/namei.c | 7 ++++++- include/linux/namei.h | 2 ++ 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/namei.c b/fs/namei.c index 4e85d6fa4048..1f0d871199e5 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -866,7 +866,7 @@ void nd_jump_link(struct path *path) nd->path = *path; nd->inode = nd->path.dentry->d_inode; - nd->flags |= LOOKUP_JUMPED; + nd->flags |= LOOKUP_JUMPED | LOOKUP_MAGICLINK_JUMPED; } static inline void put_link(struct nameidata *nd) @@ -1063,6 +1063,7 @@ const char *get_link(struct nameidata *nd) return ERR_PTR(error); nd->last_type = LAST_BIND; + nd->flags &= ~LOOKUP_MAGICLINK_JUMPED; res = READ_ONCE(inode->i_link); if (!res) { const char * (*get)(struct dentry *, struct inode *, @@ -1078,6 +1079,10 @@ const char *get_link(struct nameidata *nd) } else { res = get(dentry, inode, &last->done); } + if (nd->flags & LOOKUP_MAGICLINK_JUMPED) { + if (unlikely(nd->flags & LOOKUP_NO_MAGICLINKS)) + return ERR_PTR(-ELOOP); + } if (IS_ERR_OR_NULL(res)) return res; } diff --git a/include/linux/namei.h b/include/linux/namei.h index ee2e35af387f..a8b3f93338da 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -38,9 +38,11 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; #define LOOKUP_JUMPED 0x1000 #define LOOKUP_ROOT 0x2000 #define LOOKUP_ROOT_GRABBED 0x0008 +#define LOOKUP_MAGICLINK_JUMPED 0x10000 /* Scoping flags for lookup. */ #define LOOKUP_NO_SYMLINKS 0x020000 /* No symlink crossing. */ +#define LOOKUP_NO_MAGICLINKS 0x040000 /* No /proc/$pid/fd/ "symlink" crossing. */ extern int path_pts(struct path *path); From patchwork Tue Nov 5 09:05:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 11227223 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D1AC9139A for ; Tue, 5 Nov 2019 09:07:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B060C217F5 for ; Tue, 5 Nov 2019 09:07:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387826AbfKEJH6 (ORCPT ); Tue, 5 Nov 2019 04:07:58 -0500 Received: from mout-p-201.mailbox.org ([80.241.56.171]:9360 "EHLO mout-p-201.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730656AbfKEJH5 (ORCPT ); Tue, 5 Nov 2019 04:07:57 -0500 Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mout-p-201.mailbox.org (Postfix) with ESMTPS id 476kMr0pdBzQlBB; Tue, 5 Nov 2019 10:07:52 +0100 (CET) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter03.heinlein-hosting.de (spamfilter03.heinlein-hosting.de [80.241.56.117]) (amavisd-new, port 10030) with ESMTP id xLJwHfZCUuzQ; Tue, 5 Nov 2019 10:07:46 +0100 (CET) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan , Ingo Molnar , Peter Zijlstra Cc: Aleksa Sarai , Christian Brauner , David Drysdale , Andy Lutomirski , Linus Torvalds , Eric Biederman , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Tycho Andersen , Chanho Min , Oleg Nesterov , Rasmus Villemoes , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Christian Brauner , Aleksa Sarai , containers@lists.linux-foundation.org, linux-alpha@vger.kernel.org, linux-api@vger.kernel.org, libc-alpha@sourceware.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-xtensa@linux-xtensa.org, sparclinux@vger.kernel.org Subject: [PATCH v15 3/9] namei: LOOKUP_NO_XDEV: block mountpoint crossing Date: Tue, 5 Nov 2019 20:05:47 +1100 Message-Id: <20191105090553.6350-4-cyphar@cyphar.com> In-Reply-To: <20191105090553.6350-1-cyphar@cyphar.com> References: <20191105090553.6350-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org /* Background. */ The need to contain path operations within a mountpoint has been a long-standing usecase that userspace has historically implemented manually with liberal usage of stat(). find, rsync, tar and many other programs implement these semantics -- but it'd be much simpler to have a fool-proof way of refusing to open a path if it crosses a mountpoint. This is part of a refresh of Al's AT_NO_JUMPS patchset[1] (which was a variation on David Drysdale's O_BENEATH patchset[2], which in turn was based on the Capsicum project[3]). /* Userspace API. */ LOOKUP_NO_XDEV will be exposed to userspace through openat2(2). /* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_NO_XDEV applies to all components of the path. With LOOKUP_NO_XDEV, any path component which crosses a mount-point during path resolution (including "..") will yield an -EXDEV. Absolute paths, absolute symlinks, and magic-links will only yield an -EXDEV if the jump involved changing mount-points. /* Testing. */ LOOKUP_NO_XDEV is tested as part of the openat2(2) selftests. [1]: https://lore.kernel.org/lkml/20170429220414.GT29622@ZenIV.linux.org.uk/ [2]: https://lore.kernel.org/lkml/1415094884-18349-1-git-send-email-drysdale@google.com/ [3]: https://lore.kernel.org/lkml/1404124096-21445-1-git-send-email-drysdale@google.com/ Cc: Christian Brauner Suggested-by: David Drysdale Suggested-by: Al Viro Suggested-by: Andy Lutomirski Suggested-by: Linus Torvalds Signed-off-by: Aleksa Sarai --- fs/namei.c | 34 ++++++++++++++++++++++++++++++---- include/linux/namei.h | 1 + 2 files changed, 31 insertions(+), 4 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 1f0d871199e5..b73ee1601bd4 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -504,6 +504,9 @@ struct nameidata { struct filename *name; struct nameidata *saved; struct inode *link_inode; + struct { + bool same_mnt; + } last_magiclink; unsigned root_seq; int dfd; } __randomize_layout; @@ -837,6 +840,11 @@ static inline void path_to_nameidata(const struct path *path, static int nd_jump_root(struct nameidata *nd) { + if (unlikely(nd->flags & LOOKUP_NO_XDEV)) { + /* Absolute path arguments to path_init() are allowed. */ + if (nd->path.mnt != NULL && nd->path.mnt != nd->root.mnt) + return -EXDEV; + } if (nd->flags & LOOKUP_RCU) { struct dentry *d; nd->path = nd->root; @@ -862,6 +870,8 @@ static int nd_jump_root(struct nameidata *nd) void nd_jump_link(struct path *path) { struct nameidata *nd = current->nameidata; + + nd->last_magiclink.same_mnt = (nd->path.mnt == path->mnt); path_put(&nd->path); nd->path = *path; @@ -1082,6 +1092,10 @@ const char *get_link(struct nameidata *nd) if (nd->flags & LOOKUP_MAGICLINK_JUMPED) { if (unlikely(nd->flags & LOOKUP_NO_MAGICLINKS)) return ERR_PTR(-ELOOP); + if (unlikely(nd->flags & LOOKUP_NO_XDEV)) { + if (!nd->last_magiclink.same_mnt) + return ERR_PTR(-EXDEV); + } } if (IS_ERR_OR_NULL(res)) return res; @@ -1271,12 +1285,16 @@ static int follow_managed(struct path *path, struct nameidata *nd) break; } - if (need_mntput && path->mnt == mnt) - mntput(path->mnt); + if (need_mntput) { + if (path->mnt == mnt) + mntput(path->mnt); + if (unlikely(nd->flags & LOOKUP_NO_XDEV)) + ret = -EXDEV; + else + nd->flags |= LOOKUP_JUMPED; + } if (ret == -EISDIR || !ret) ret = 1; - if (need_mntput) - nd->flags |= LOOKUP_JUMPED; if (unlikely(ret < 0)) path_put_conditional(path, nd); return ret; @@ -1333,6 +1351,8 @@ static bool __follow_mount_rcu(struct nameidata *nd, struct path *path, mounted = __lookup_mnt(path->mnt, path->dentry); if (!mounted) break; + if (unlikely(nd->flags & LOOKUP_NO_XDEV)) + return false; path->mnt = &mounted->mnt; path->dentry = mounted->mnt.mnt_root; nd->flags |= LOOKUP_JUMPED; @@ -1379,6 +1399,8 @@ static int follow_dotdot_rcu(struct nameidata *nd) return -ECHILD; if (&mparent->mnt == nd->path.mnt) break; + if (unlikely(nd->flags & LOOKUP_NO_XDEV)) + return -EXDEV; /* we know that mountpoint was pinned */ nd->path.dentry = mountpoint; nd->path.mnt = &mparent->mnt; @@ -1393,6 +1415,8 @@ static int follow_dotdot_rcu(struct nameidata *nd) return -ECHILD; if (!mounted) break; + if (unlikely(nd->flags & LOOKUP_NO_XDEV)) + return -EXDEV; nd->path.mnt = &mounted->mnt; nd->path.dentry = mounted->mnt.mnt_root; inode = nd->path.dentry->d_inode; @@ -1491,6 +1515,8 @@ static int follow_dotdot(struct nameidata *nd) } if (!follow_up(&nd->path)) break; + if (unlikely(nd->flags & LOOKUP_NO_XDEV)) + return -EXDEV; } follow_mount(&nd->path); nd->inode = nd->path.dentry->d_inode; diff --git a/include/linux/namei.h b/include/linux/namei.h index a8b3f93338da..6105c8a59fc8 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -43,6 +43,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; /* Scoping flags for lookup. */ #define LOOKUP_NO_SYMLINKS 0x020000 /* No symlink crossing. */ #define LOOKUP_NO_MAGICLINKS 0x040000 /* No /proc/$pid/fd/ "symlink" crossing. */ +#define LOOKUP_NO_XDEV 0x080000 /* No mountpoint crossing. */ extern int path_pts(struct path *path); From patchwork Tue Nov 5 09:05:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 11227235 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 55BCA1599 for ; Tue, 5 Nov 2019 09:08:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2B77821929 for ; Tue, 5 Nov 2019 09:08:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387924AbfKEJIP (ORCPT ); Tue, 5 Nov 2019 04:08:15 -0500 Received: from mout-p-202.mailbox.org ([80.241.56.172]:31828 "EHLO mout-p-202.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730652AbfKEJIP (ORCPT ); Tue, 5 Nov 2019 04:08:15 -0500 Received: from smtp2.mailbox.org (smtp2.mailbox.org [80.241.60.241]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mout-p-202.mailbox.org (Postfix) with ESMTPS id 476kNB6hrJzQjjH; Tue, 5 Nov 2019 10:08:10 +0100 (CET) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter05.heinlein-hosting.de (spamfilter05.heinlein-hosting.de [80.241.56.123]) (amavisd-new, port 10030) with ESMTP id gm_8kwRv8N6d; Tue, 5 Nov 2019 10:08:05 +0100 (CET) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan , Ingo Molnar , Peter Zijlstra Cc: Aleksa Sarai , Christian Brauner , David Drysdale , Andy Lutomirski , Linus Torvalds , Eric Biederman , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Tycho Andersen , Chanho Min , Oleg Nesterov , Rasmus Villemoes , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Christian Brauner , Aleksa Sarai , containers@lists.linux-foundation.org, linux-alpha@vger.kernel.org, linux-api@vger.kernel.org, libc-alpha@sourceware.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-xtensa@linux-xtensa.org, sparclinux@vger.kernel.org Subject: [PATCH v15 4/9] namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution Date: Tue, 5 Nov 2019 20:05:48 +1100 Message-Id: <20191105090553.6350-5-cyphar@cyphar.com> In-Reply-To: <20191105090553.6350-1-cyphar@cyphar.com> References: <20191105090553.6350-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org /* Background. */ There are many circumstances when userspace wants to resolve a path and ensure that it doesn't go outside of a particular root directory during resolution. Obvious examples include archive extraction tools, as well as other security-conscious userspace programs. FreeBSD spun out O_BENEATH from their Capsicum project[1,2], so it also seems reasonable to implement similar functionality for Linux. This is part of a refresh of Al's AT_NO_JUMPS patchset[3] (which was a variation on David Drysdale's O_BENEATH patchset[4], which in turn was based on the Capsicum project[5]). /* Userspace API. */ LOOKUP_BENEATH will be exposed to userspace through openat2(2). /* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_BENEATH applies to all components of the path. With LOOKUP_BENEATH, any path component which attempts to "escape" the starting point of the filesystem lookup (the dirfd passed to openat) will yield -EXDEV. Thus, all absolute paths and symlinks are disallowed. Due to a security concern brought up by Jann[6], any ".." path components are also blocked. This restriction will be lifted in a future patch, but requires more work to ensure that permitting ".." is done safely. Magic-link jumps are also blocked, because they can beam the path lookup across the starting point. It would be possible to detect and block only the "bad" crossings with path_is_under() checks, but it's unclear whether it makes sense to permit magic-links at all. However, userspace is recommended to pass LOOKUP_NO_MAGICLINKS if they want to ensure that magic-link crossing is entirely disabled. /* Testing. */ LOOKUP_BENEATH is tested as part of the openat2(2) selftests. [1]: https://reviews.freebsd.org/D2808 [2]: https://reviews.freebsd.org/D17547 [3]: https://lore.kernel.org/lkml/20170429220414.GT29622@ZenIV.linux.org.uk/ [4]: https://lore.kernel.org/lkml/1415094884-18349-1-git-send-email-drysdale@google.com/ [5]: https://lore.kernel.org/lkml/1404124096-21445-1-git-send-email-drysdale@google.com/ [6]: https://lore.kernel.org/lkml/CAG48ez1jzNvxB+bfOBnERFGp=oMM0vHWuLD6EULmne3R6xa53w@mail.gmail.com/ Cc: Christian Brauner Suggested-by: David Drysdale Suggested-by: Al Viro Suggested-by: Andy Lutomirski Suggested-by: Linus Torvalds Signed-off-by: Aleksa Sarai --- fs/namei.c | 91 +++++++++++++++++++++++++++++++++++-------- include/linux/namei.h | 4 ++ 2 files changed, 79 insertions(+), 16 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index b73ee1601bd4..54fdbdfbeb94 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -644,6 +644,14 @@ static bool legitimize_links(struct nameidata *nd) static bool legitimize_root(struct nameidata *nd) { + /* + * For scoped-lookups (where nd->root has been zeroed), we need to + * restart the whole lookup from scratch -- because set_root() is wrong + * for these lookups (nd->dfd is the root, not the filesystem root). + */ + if (!nd->root.mnt && (nd->flags & LOOKUP_IS_SCOPED)) + return false; + /* Nothing to do if nd->root is zero or is managed by the VFS user. */ if (!nd->root.mnt || (nd->flags & LOOKUP_ROOT)) return true; nd->flags |= LOOKUP_ROOT_GRABBED; @@ -779,7 +787,11 @@ static int complete_walk(struct nameidata *nd) int status; if (nd->flags & LOOKUP_RCU) { - if (!(nd->flags & LOOKUP_ROOT)) + /* + * We don't want to zero nd->root for scoped-lookups or + * externally-managed nd->root. + */ + if (!(nd->flags & (LOOKUP_ROOT | LOOKUP_IS_SCOPED))) nd->root.mnt = NULL; if (unlikely(unlazy_walk(nd))) return -ECHILD; @@ -801,10 +813,18 @@ static int complete_walk(struct nameidata *nd) return status; } -static void set_root(struct nameidata *nd) +static int set_root(struct nameidata *nd) { struct fs_struct *fs = current->fs; + /* + * Jumping to the real root in a scoped-lookup is a BUG in namei, but we + * still have to ensure it doesn't happen because it will cause a breakout + * from the dirfd. + */ + if (WARN_ON(nd->flags & LOOKUP_IS_SCOPED)) + return -ENOTRECOVERABLE; + if (nd->flags & LOOKUP_RCU) { unsigned seq; @@ -817,6 +837,7 @@ static void set_root(struct nameidata *nd) get_fs_root(fs, &nd->root); nd->flags |= LOOKUP_ROOT_GRABBED; } + return 0; } static void path_put_conditional(struct path *path, struct nameidata *nd) @@ -840,11 +861,18 @@ static inline void path_to_nameidata(const struct path *path, static int nd_jump_root(struct nameidata *nd) { + if (unlikely(nd->flags & LOOKUP_BENEATH)) + return -EXDEV; if (unlikely(nd->flags & LOOKUP_NO_XDEV)) { /* Absolute path arguments to path_init() are allowed. */ if (nd->path.mnt != NULL && nd->path.mnt != nd->root.mnt) return -EXDEV; } + if (!nd->root.mnt) { + int error = set_root(nd); + if (error) + return error; + } if (nd->flags & LOOKUP_RCU) { struct dentry *d; nd->path = nd->root; @@ -1096,15 +1124,17 @@ const char *get_link(struct nameidata *nd) if (!nd->last_magiclink.same_mnt) return ERR_PTR(-EXDEV); } + /* Not currently safe for scoped-lookups. */ + if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) + return ERR_PTR(-EXDEV); } if (IS_ERR_OR_NULL(res)) return res; } if (*res == '/') { - if (!nd->root.mnt) - set_root(nd); - if (unlikely(nd_jump_root(nd))) - return ERR_PTR(-ECHILD); + error = nd_jump_root(nd); + if (unlikely(error)) + return ERR_PTR(error); while (unlikely(*++res == '/')) ; } @@ -1373,8 +1403,11 @@ static int follow_dotdot_rcu(struct nameidata *nd) struct inode *inode = nd->inode; while (1) { - if (path_equal(&nd->path, &nd->root)) + if (path_equal(&nd->path, &nd->root)) { + if (unlikely(nd->flags & LOOKUP_BENEATH)) + return -EXDEV; break; + } if (nd->path.dentry != nd->path.mnt->mnt_root) { struct dentry *old = nd->path.dentry; struct dentry *parent = old->d_parent; @@ -1505,8 +1538,11 @@ static int path_parent_directory(struct path *path) static int follow_dotdot(struct nameidata *nd) { while(1) { - if (path_equal(&nd->path, &nd->root)) + if (path_equal(&nd->path, &nd->root)) { + if (unlikely(nd->flags & LOOKUP_BENEATH)) + return -EXDEV; break; + } if (nd->path.dentry != nd->path.mnt->mnt_root) { int ret = path_parent_directory(&nd->path); if (ret) @@ -1731,8 +1767,20 @@ static inline int may_lookup(struct nameidata *nd) static inline int handle_dots(struct nameidata *nd, int type) { if (type == LAST_DOTDOT) { - if (!nd->root.mnt) - set_root(nd); + int error = 0; + + /* + * Scoped-lookup flags resolving ".." is not currently safe -- + * races can cause our parent to have moved outside of the root + * and us to skip over it. + */ + if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) + return -EXDEV; + if (!nd->root.mnt) { + error = set_root(nd); + if (error) + return error; + } if (nd->flags & LOOKUP_RCU) { return follow_dotdot_rcu(nd); } else @@ -2195,6 +2243,7 @@ static int link_path_walk(const char *name, struct nameidata *nd) /* must be paired with terminate_walk() */ static const char *path_init(struct nameidata *nd, unsigned flags) { + int error; const char *s = nd->name->name; if (!*s) @@ -2227,11 +2276,12 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->path.dentry = NULL; nd->m_seq = read_seqbegin(&mount_lock); + + /* Figure out the starting path and root (if needed). */ if (*s == '/') { - set_root(nd); - if (likely(!nd_jump_root(nd))) - return s; - return ERR_PTR(-ECHILD); + error = nd_jump_root(nd); + if (unlikely(error)) + return ERR_PTR(error); } else if (nd->dfd == AT_FDCWD) { if (flags & LOOKUP_RCU) { struct fs_struct *fs = current->fs; @@ -2247,7 +2297,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) get_fs_pwd(current->fs, &nd->path); nd->inode = nd->path.dentry->d_inode; } - return s; } else { /* Caller must check execute permissions on the starting path component */ struct fd f = fdget_raw(nd->dfd); @@ -2272,8 +2321,18 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->inode = nd->path.dentry->d_inode; } fdput(f); - return s; } + /* For scoped-lookups we need to set the root to the dirfd as well. */ + if (flags & LOOKUP_IS_SCOPED) { + nd->root = nd->path; + if (flags & LOOKUP_RCU) { + nd->root_seq = nd->seq; + } else { + path_get(&nd->root); + nd->flags |= LOOKUP_ROOT_GRABBED; + } + } + return s; } static const char *trailing_symlink(struct nameidata *nd) diff --git a/include/linux/namei.h b/include/linux/namei.h index 6105c8a59fc8..12f4f36835c2 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -2,6 +2,7 @@ #ifndef _LINUX_NAMEI_H #define _LINUX_NAMEI_H +#include #include #include #include @@ -44,6 +45,9 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; #define LOOKUP_NO_SYMLINKS 0x020000 /* No symlink crossing. */ #define LOOKUP_NO_MAGICLINKS 0x040000 /* No /proc/$pid/fd/ "symlink" crossing. */ #define LOOKUP_NO_XDEV 0x080000 /* No mountpoint crossing. */ +#define LOOKUP_BENEATH 0x100000 /* No escaping from starting point. */ +/* LOOKUP_* flags which do scope-related checks based on the dirfd. */ +#define LOOKUP_IS_SCOPED LOOKUP_BENEATH extern int path_pts(struct path *path); From patchwork Tue Nov 5 09:05:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 11227253 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C69AF139A for ; Tue, 5 Nov 2019 09:08:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A420C217F5 for ; Tue, 5 Nov 2019 09:08:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387863AbfKEJIf (ORCPT ); Tue, 5 Nov 2019 04:08:35 -0500 Received: from mout-p-201.mailbox.org ([80.241.56.171]:9426 "EHLO mout-p-201.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730533AbfKEJIf (ORCPT ); Tue, 5 Nov 2019 04:08:35 -0500 Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mout-p-201.mailbox.org (Postfix) with ESMTPS id 476kNZ1fKKzQlBB; Tue, 5 Nov 2019 10:08:30 +0100 (CET) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter05.heinlein-hosting.de (spamfilter05.heinlein-hosting.de [80.241.56.123]) (amavisd-new, port 10030) with ESMTP id XFEh1KaOuye9; Tue, 5 Nov 2019 10:08:24 +0100 (CET) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan , Ingo Molnar , Peter Zijlstra Cc: Aleksa Sarai , Christian Brauner , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Rasmus Villemoes , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Christian Brauner , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-alpha@vger.kernel.org, linux-api@vger.kernel.org, libc-alpha@sourceware.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-xtensa@linux-xtensa.org, sparclinux@vger.kernel.org Subject: [PATCH v15 5/9] namei: LOOKUP_IN_ROOT: chroot-like scoped resolution Date: Tue, 5 Nov 2019 20:05:49 +1100 Message-Id: <20191105090553.6350-6-cyphar@cyphar.com> In-Reply-To: <20191105090553.6350-1-cyphar@cyphar.com> References: <20191105090553.6350-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org /* Background. */ Container runtimes or other administrative management processes will often interact with root filesystems while in the host mount namespace, because the cost of doing a chroot(2) on every operation is too prohibitive (especially in Go, which cannot safely use vfork). However, a malicious program can trick the management process into doing operations on files outside of the root filesystem through careful crafting of symlinks. Most programs that need this feature have attempted to make this process safe, by doing all of the path resolution in userspace (with symlinks being scoped to the root of the malicious root filesystem). Unfortunately, this method is prone to foot-guns and usually such implementations have subtle security bugs. Thus, what userspace needs is a way to resolve a path as though it were in a chroot(2) -- with all absolute symlinks being resolved relative to the dirfd root (and ".." components being stuck under the dirfd root). It is much simpler and more straight-forward to provide this functionality in-kernel (because it can be done far more cheaply and correctly). More classical applications that also have this problem (which have their own potentially buggy userspace path sanitisation code) include web servers, archive extraction tools, network file servers, and so on. /* Userspace API. */ LOOKUP_IN_ROOT will be exposed to userspace through openat2(2). /* Semantics. */ Unlike most other LOOKUP flags (most notably LOOKUP_FOLLOW), LOOKUP_IN_ROOT applies to all components of the path. With LOOKUP_IN_ROOT, any path component which attempts to cross the starting point of the pathname lookup (the dirfd passed to openat) will remain at the starting point. Thus, all absolute paths and symlinks will be scoped within the starting point. There is a slight change in behaviour regarding pathnames -- if the pathname is absolute then the dirfd is still used as the root of resolution of LOOKUP_IN_ROOT is specified (this is to avoid obvious foot-guns, at the cost of a minor API inconsistency). As with LOOKUP_BENEATH, Jann's security concern about ".."[1] applies to LOOKUP_IN_ROOT -- therefore ".." resolution is blocked. This restriction will be lifted in a future patch, but requires more work to ensure that permitting ".." is done safely. Magic-link jumps are also blocked, because they can beam the path lookup across the starting point. It would be possible to detect and block only the "bad" crossings with path_is_under() checks, but it's unclear whether it makes sense to permit magic-links at all. However, userspace is recommended to pass LOOKUP_NO_MAGICLINKS if they want to ensure that magic-link crossing is entirely disabled. /* Testing. */ LOOKUP_IN_ROOT is tested as part of the openat2(2) selftests. [1]: https://lore.kernel.org/lkml/CAG48ez1jzNvxB+bfOBnERFGp=oMM0vHWuLD6EULmne3R6xa53w@mail.gmail.com/ Cc: Christian Brauner Signed-off-by: Aleksa Sarai --- fs/namei.c | 15 ++++++++++++--- include/linux/namei.h | 3 ++- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 54fdbdfbeb94..a3d199a60708 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2277,12 +2277,20 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->m_seq = read_seqbegin(&mount_lock); - /* Figure out the starting path and root (if needed). */ - if (*s == '/') { + /* Absolute pathname -- fetch the root. */ + if (flags & LOOKUP_IN_ROOT) { + /* With LOOKUP_IN_ROOT, act as a relative path. */ + while (*s == '/') + s++; + } else if (*s == '/') { error = nd_jump_root(nd); if (unlikely(error)) return ERR_PTR(error); - } else if (nd->dfd == AT_FDCWD) { + return s; + } + + /* Relative pathname -- get the starting-point it is relative to. */ + if (nd->dfd == AT_FDCWD) { if (flags & LOOKUP_RCU) { struct fs_struct *fs = current->fs; unsigned seq; @@ -2322,6 +2330,7 @@ static const char *path_init(struct nameidata *nd, unsigned flags) } fdput(f); } + /* For scoped-lookups we need to set the root to the dirfd as well. */ if (flags & LOOKUP_IS_SCOPED) { nd->root = nd->path; diff --git a/include/linux/namei.h b/include/linux/namei.h index 12f4f36835c2..96b374e08230 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -46,8 +46,9 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; #define LOOKUP_NO_MAGICLINKS 0x040000 /* No /proc/$pid/fd/ "symlink" crossing. */ #define LOOKUP_NO_XDEV 0x080000 /* No mountpoint crossing. */ #define LOOKUP_BENEATH 0x100000 /* No escaping from starting point. */ +#define LOOKUP_IN_ROOT 0x200000 /* Treat dirfd as %current->fs->root. */ /* LOOKUP_* flags which do scope-related checks based on the dirfd. */ -#define LOOKUP_IS_SCOPED LOOKUP_BENEATH +#define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT) extern int path_pts(struct path *path); From patchwork Tue Nov 5 09:05:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 11227265 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4B93D1599 for ; Tue, 5 Nov 2019 09:09:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2A543217F4 for ; Tue, 5 Nov 2019 09:09:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388072AbfKEJI4 (ORCPT ); Tue, 5 Nov 2019 04:08:56 -0500 Received: from mout-p-101.mailbox.org ([80.241.56.151]:52834 "EHLO mout-p-101.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730620AbfKEJIz (ORCPT ); Tue, 5 Nov 2019 04:08:55 -0500 Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mout-p-101.mailbox.org (Postfix) with ESMTPS id 476kNy2PVPzKmfb; Tue, 5 Nov 2019 10:08:50 +0100 (CET) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by gerste.heinlein-support.de (gerste.heinlein-support.de [91.198.250.173]) (amavisd-new, port 10030) with ESMTP id 3l4xZapOJhKq; Tue, 5 Nov 2019 10:08:43 +0100 (CET) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan , Ingo Molnar , Peter Zijlstra Cc: Aleksa Sarai , Christian Brauner , Jann Horn , Linus Torvalds , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Rasmus Villemoes , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Christian Brauner , Aleksa Sarai , containers@lists.linux-foundation.org, linux-alpha@vger.kernel.org, linux-api@vger.kernel.org, libc-alpha@sourceware.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-xtensa@linux-xtensa.org, sparclinux@vger.kernel.org Subject: [PATCH v15 6/9] namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution Date: Tue, 5 Nov 2019 20:05:50 +1100 Message-Id: <20191105090553.6350-7-cyphar@cyphar.com> In-Reply-To: <20191105090553.6350-1-cyphar@cyphar.com> References: <20191105090553.6350-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Allow LOOKUP_BENEATH and LOOKUP_IN_ROOT to safely permit ".." resolution (in the case of LOOKUP_BENEATH the resolution will still fail if ".." resolution would resolve a path outside of the root -- while LOOKUP_IN_ROOT will chroot(2)-style scope it). Magic-link jumps are still disallowed entirely[*]. As Jann explains[1,2], the need for this patch (and the original no-".." restriction) is explained by observing there is a fairly easy-to-exploit race condition with chroot(2) (and thus by extension LOOKUP_IN_ROOT and LOOKUP_BENEATH if ".." is allowed) where a rename(2) of a path can be used to "skip over" nd->root and thus escape to the filesystem above nd->root. thread1 [attacker]: for (;;) renameat2(AT_FDCWD, "/a/b/c", AT_FDCWD, "/a/d", RENAME_EXCHANGE); thread2 [victim]: for (;;) openat2(dirb, "b/c/../../etc/shadow", { .flags = O_PATH, .resolve = RESOLVE_IN_ROOT } ); With fairly significant regularity, thread2 will resolve to "/etc/shadow" rather than "/a/b/etc/shadow". There is also a similar (though somewhat more privileged) attack using MS_MOVE. With this patch, such cases will be detected *during* ".." resolution and will return -EAGAIN for userspace to decide to either retry or abort the lookup. It should be noted that ".." is the weak point of chroot(2) -- walking *into* a subdirectory tautologically cannot result in you walking *outside* nd->root (except through a bind-mount or magic-link). There is also no other way for a directory's parent to change (which is the primary worry with ".." resolution here) other than a rename or MS_MOVE. This is a first-pass implementation, where -EAGAIN will be returned if any rename or mount occurs anywhere on the host (in any namespace). This will result in spurious errors, but there isn't a satisfactory alternative (other than denying ".." altogether). One other possible alternative (which previous versions of this patch used) would be to check with path_is_under() if there was a racing rename or mount (after re-taking the relevant seqlocks). While this does work, it results in possible O(n*m) behaviour if there are many renames or mounts occuring *anywhere on the system*. A variant of the above attack is included in the selftests for openat2(2) later in this patch series. I've run this test on several machines for several days and no instances of a breakout were detected. While this is not concrete proof that this is safe, when combined with the above argument it should lend some trustworthiness to this construction. [*] It may be acceptable in the future to do a path_is_under() check (as with the alternative solution for "..") for magic-links after they are resolved. However this seems unlikely to be a feature that people *really* need -- it can be added later if it turns out a lot of people want it. [1]: https://lore.kernel.org/lkml/CAG48ez1jzNvxB+bfOBnERFGp=oMM0vHWuLD6EULmne3R6xa53w@mail.gmail.com/ [2]: https://lore.kernel.org/lkml/CAG48ez30WJhbsro2HOc_DR7V91M+hNFzBP5ogRMZaxbAORvqzg@mail.gmail.com/ Cc: Christian Brauner Suggested-by: Jann Horn Suggested-by: Linus Torvalds Signed-off-by: Aleksa Sarai --- fs/namei.c | 42 +++++++++++++++++++++++++++++------------- 1 file changed, 29 insertions(+), 13 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index a3d199a60708..174d69cf9084 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -491,7 +491,7 @@ struct nameidata { struct path root; struct inode *inode; /* path.dentry.d_inode */ unsigned int flags; - unsigned seq, m_seq; + unsigned seq, m_seq, r_seq; int last_type; unsigned depth; int total_link_count; @@ -1769,22 +1769,35 @@ static inline int handle_dots(struct nameidata *nd, int type) if (type == LAST_DOTDOT) { int error = 0; - /* - * Scoped-lookup flags resolving ".." is not currently safe -- - * races can cause our parent to have moved outside of the root - * and us to skip over it. - */ - if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) - return -EXDEV; if (!nd->root.mnt) { error = set_root(nd); if (error) return error; } - if (nd->flags & LOOKUP_RCU) { - return follow_dotdot_rcu(nd); - } else - return follow_dotdot(nd); + if (nd->flags & LOOKUP_RCU) + error = follow_dotdot_rcu(nd); + else + error = follow_dotdot(nd); + if (error) + return error; + + if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) { + bool m_retry = read_seqretry(&mount_lock, nd->m_seq); + bool r_retry = read_seqretry(&rename_lock, nd->r_seq); + + /* + * If there was a racing rename or mount along our + * path, then we can't be sure that ".." hasn't jumped + * above nd->root (and so userspace should retry or use + * some fallback). + * + * In future we could do a path_is_under() check here + * instead, but there are O(n*m) performance + * considerations with such a setup. + */ + if (unlikely(m_retry || r_retry)) + return -EAGAIN; + } } return 0; } @@ -2254,6 +2267,10 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->last_type = LAST_ROOT; /* if there are only slashes... */ nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT; nd->depth = 0; + + nd->m_seq = read_seqbegin(&mount_lock); + nd->r_seq = read_seqbegin(&rename_lock); + if (flags & LOOKUP_ROOT) { struct dentry *root = nd->root.dentry; struct inode *inode = root->d_inode; @@ -2275,7 +2292,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->path.mnt = NULL; nd->path.dentry = NULL; - nd->m_seq = read_seqbegin(&mount_lock); /* Absolute pathname -- fetch the root. */ if (flags & LOOKUP_IN_ROOT) { From patchwork Tue Nov 5 09:05:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 11227279 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CE10E1599 for ; Tue, 5 Nov 2019 09:09:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A287320869 for ; Tue, 5 Nov 2019 09:09:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730718AbfKEJJn (ORCPT ); Tue, 5 Nov 2019 04:09:43 -0500 Received: from mout-p-201.mailbox.org ([80.241.56.171]:9520 "EHLO mout-p-201.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730555AbfKEJJm (ORCPT ); Tue, 5 Nov 2019 04:09:42 -0500 Received: from smtp2.mailbox.org (smtp2.mailbox.org [80.241.60.241]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mout-p-201.mailbox.org (Postfix) with ESMTPS id 476kPr5J0zzQlBB; Tue, 5 Nov 2019 10:09:36 +0100 (CET) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter05.heinlein-hosting.de (spamfilter05.heinlein-hosting.de [80.241.56.123]) (amavisd-new, port 10030) with ESMTP id QaW7105Y5pBW; Tue, 5 Nov 2019 10:09:02 +0100 (CET) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan , Ingo Molnar , Peter Zijlstra Cc: Aleksa Sarai , Christian Brauner , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Rasmus Villemoes , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-alpha@vger.kernel.org, linux-api@vger.kernel.org, libc-alpha@sourceware.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-xtensa@linux-xtensa.org, sparclinux@vger.kernel.org Subject: [PATCH v15 7/9] open: introduce openat2(2) syscall Date: Tue, 5 Nov 2019 20:05:51 +1100 Message-Id: <20191105090553.6350-8-cyphar@cyphar.com> In-Reply-To: <20191105090553.6350-1-cyphar@cyphar.com> References: <20191105090553.6350-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org /* Background. */ For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Userspace also has a hard time figuring out whether a particular flag is supported on a particular kernel. While it is now possible with contemporary kernels (thanks to [3]), older kernels will expose unknown flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during openat(2) time matches modern syscall designs and is far more fool-proof. In addition, the newly-added path resolution restriction LOOKUP flags (which we would like to expose to user-space) don't feel related to the pre-existing O_* flag set -- they affect all components of path lookup. We'd therefore like to add a new flag argument. Adding a new syscall allows us to finally fix the flag-ignoring problem, and we can make it extensible enough so that we will hopefully never need an openat3(2). /* Syscall Prototype. */ /* * open_how is an extensible structure (similar in interface to * clone3(2) or sched_setattr(2)). The size parameter must be set to * sizeof(struct open_how), to allow for future extensions. All future * extensions will be appended to open_how, with their zero value * acting as a no-op default. */ struct open_how { /* ... */ }; int openat2(int dfd, const char *pathname, struct open_how *how, size_t size); /* Description. */ The initial version of 'struct open_how' contains the following fields: flags Used to specify openat(2)-style flags. However, any unknown flag bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR) will result in -EINVAL. In addition, this field is 64-bits wide to allow for more O_ flags than currently permitted with openat(2). mode The file mode for O_CREAT or O_TMPFILE. Must be set to zero if flags does not contain O_CREAT or O_TMPFILE. __padding Must be set to all zeroes. resolve Restrict path resolution (in contrast to O_* flags they affect all path components). The current set of flags are as follows (at the moment, all of the RESOLVE_ flags are implemented as just passing the corresponding LOOKUP_ flag). RESOLVE_NO_XDEV => LOOKUP_NO_XDEV RESOLVE_NO_SYMLINKS => LOOKUP_NO_SYMLINKS RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS RESOLVE_BENEATH => LOOKUP_BENEATH RESOLVE_IN_ROOT => LOOKUP_IN_ROOT open_how does not contain an embedded size field, because it is of little benefit (userspace can figure out the kernel open_how size at runtime fairly easily without it). Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE is no longer permitted for openat(2). As far as I can tell, this has always been a bug and appears to not be used by userspace (and I've not seen any problems on my machines by disallowing it). If it turns out this breaks something, we can special-case it and only permit it for openat(2) but not openat2(2). /* Testing. */ In a follow-up patch there are over 200 selftests which ensure that this syscall has the correct semantics and will correctly handle several attack scenarios. In addition, I've written a userspace library[4] which provides convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care must be taken when using RESOLVE_IN_ROOT'd file descriptors with other syscalls). During the development of this patch, I've run numerous verification tests using libpathrs (showing that the API is reasonably usable by userspace). /* Future Work. */ Additional RESOLVE_ flags have been suggested during the review period. These can be easily implemented separately (such as blocking auto-mount during resolution). Furthermore, there are some other proposed changes to the openat(2) interface (the most obvious example is magic-link hardening[5]) which would be a good opportunity to add a way for userspace to restrict how O_PATH file descriptors can be re-opened. [1]: https://lwn.net/Articles/588444/ [2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com [3]: commit 629e014bb834 ("fs: completely ignore unknown open flags") [4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523 [5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/ Suggested-by: Christian Brauner Signed-off-by: Aleksa Sarai --- CREDITS | 4 +- arch/alpha/kernel/syscalls/syscall.tbl | 1 + arch/arm/tools/syscall.tbl | 1 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 2 + arch/ia64/kernel/syscalls/syscall.tbl | 1 + arch/m68k/kernel/syscalls/syscall.tbl | 1 + arch/microblaze/kernel/syscalls/syscall.tbl | 1 + arch/mips/kernel/syscalls/syscall_n32.tbl | 1 + arch/mips/kernel/syscalls/syscall_n64.tbl | 1 + arch/mips/kernel/syscalls/syscall_o32.tbl | 1 + arch/parisc/kernel/syscalls/syscall.tbl | 1 + arch/powerpc/kernel/syscalls/syscall.tbl | 1 + arch/s390/kernel/syscalls/syscall.tbl | 1 + arch/sh/kernel/syscalls/syscall.tbl | 1 + arch/sparc/kernel/syscalls/syscall.tbl | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/kernel/syscalls/syscall.tbl | 1 + fs/open.c | 149 +++++++++++++++----- include/linux/fcntl.h | 12 +- include/linux/syscalls.h | 3 + include/uapi/asm-generic/unistd.h | 5 +- include/uapi/linux/fcntl.h | 41 ++++++ 24 files changed, 196 insertions(+), 38 deletions(-) diff --git a/CREDITS b/CREDITS index 031605d46b4d..a048e001d726 100644 --- a/CREDITS +++ b/CREDITS @@ -3301,7 +3301,9 @@ S: France N: Aleksa Sarai E: cyphar@cyphar.com W: https://www.cyphar.com/ -D: `pids` cgroup subsystem +D: /sys/fs/cgroup/pids +D: openat2(2) +S: Sydney, Australia N: Dipankar Sarma E: dipankar@in.ibm.com diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 728fe028c02c..9f374f7d9514 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -475,3 +475,4 @@ 543 common fspick sys_fspick 544 common pidfd_open sys_pidfd_open # 545 reserved for clone3 +547 common openat2 sys_openat2 diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 6da7dc4d79cc..4ba54bc7e19a 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -449,3 +449,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +437 common openat2 sys_openat2 diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index 2629a68b8724..8aa00ccb0b96 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -38,7 +38,7 @@ #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800) -#define __NR_compat_syscalls 436 +#define __NR_compat_syscalls 438 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 94ab29cf4f00..57f6f592d460 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -879,6 +879,8 @@ __SYSCALL(__NR_fspick, sys_fspick) __SYSCALL(__NR_pidfd_open, sys_pidfd_open) #define __NR_clone3 435 __SYSCALL(__NR_clone3, sys_clone3) +#define __NR_openat2 437 +__SYSCALL(__NR_openat2, sys_openat2) /* * Please add new compat syscalls above this comment and update diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 36d5faf4c86c..8d36f2e2dc89 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -356,3 +356,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +437 common openat2 sys_openat2 diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index a88a285a0e5f..2559925f1924 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -435,3 +435,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +437 common openat2 sys_openat2 diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 09b0cd7dab0a..c04385e60833 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -441,3 +441,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +437 common openat2 sys_openat2 diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index e7c5ab38e403..68c9ec06851f 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -374,3 +374,4 @@ 433 n32 fspick sys_fspick 434 n32 pidfd_open sys_pidfd_open 435 n32 clone3 __sys_clone3 +437 n32 openat2 sys_openat2 diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index 13cd66581f3b..42a72d010050 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -350,3 +350,4 @@ 433 n64 fspick sys_fspick 434 n64 pidfd_open sys_pidfd_open 435 n64 clone3 __sys_clone3 +437 n64 openat2 sys_openat2 diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 353539ea4140..f114c4aed0ed 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -423,3 +423,4 @@ 433 o32 fspick sys_fspick 434 o32 pidfd_open sys_pidfd_open 435 o32 clone3 __sys_clone3 +437 o32 openat2 sys_openat2 diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 285ff516150c..b550ae9a7fea 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -433,3 +433,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3_wrapper +437 common openat2 sys_openat2 diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 43f736ed47f2..a8b5ecb5b602 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -517,3 +517,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 nospu clone3 ppc_clone3 +437 common openat2 sys_openat2 diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 3054e9c035a3..16b571c06161 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick sys_fspick 434 common pidfd_open sys_pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 sys_clone3 +437 common openat2 sys_openat2 sys_openat2 diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index b5ed26c4c005..a7185cc18626 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +437 common openat2 sys_openat2 diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 8c8cc7537fb2..b11c19552022 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -481,3 +481,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +437 common openat2 sys_openat2 diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 3fe02546aed3..e5c022e9a5c4 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -440,3 +440,4 @@ 433 i386 fspick sys_fspick __ia32_sys_fspick 434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open 435 i386 clone3 sys_clone3 __ia32_sys_clone3 +437 i386 openat2 sys_openat2 __ia32_sys_openat2 diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c29976eca4a8..9035647ef236 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -357,6 +357,7 @@ 433 common fspick __x64_sys_fspick 434 common pidfd_open __x64_sys_pidfd_open 435 common clone3 __x64_sys_clone3/ptregs +437 common openat2 __x64_sys_openat2 # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 25f4de729a6d..f0a68013c038 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -406,3 +406,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +437 common openat2 sys_openat2 diff --git a/fs/open.c b/fs/open.c index b62f5c0923a8..50a46501bcc9 100644 --- a/fs/open.c +++ b/fs/open.c @@ -955,48 +955,86 @@ struct file *open_with_fake_path(const struct path *path, int flags, } EXPORT_SYMBOL(open_with_fake_path); -static inline int build_open_flags(int flags, umode_t mode, struct open_flags *op) +#define WILL_CREATE(flags) (flags & (O_CREAT | __O_TMPFILE)) +#define O_PATH_FLAGS (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC) + +static inline struct open_how build_open_how(int flags, umode_t mode) +{ + struct open_how how = { + .flags = flags & VALID_OPEN_FLAGS, + .mode = mode & S_IALLUGO, + }; + + /* O_PATH beats everything else. */ + if (how.flags & O_PATH) + how.flags &= O_PATH_FLAGS; + /* Modes should only be set for create-like flags. */ + if (!WILL_CREATE(how.flags)) + how.mode = 0; + return how; +} + +static inline int build_open_flags(const struct open_how *how, + struct open_flags *op) { + int flags = how->flags; int lookup_flags = 0; int acc_mode = ACC_MODE(flags); + /* Must never be set by userspace */ + flags &= ~(FMODE_NONOTIFY | O_CLOEXEC); + /* - * Clear out all open flags we don't know about so that we don't report - * them in fcntl(F_GETFD) or similar interfaces. + * Older syscalls implicitly clear all of the invalid flags or argument + * values before calling build_open_flags(), but openat2(2) checks all + * of its arguments. */ - flags &= VALID_OPEN_FLAGS; + if (flags & ~VALID_OPEN_FLAGS) + return -EINVAL; + if (how->resolve & ~VALID_RESOLVE_FLAGS) + return -EINVAL; + if (memchr_inv(how->__padding, 0, sizeof(how->__padding))) + return -EINVAL; - if (flags & (O_CREAT | __O_TMPFILE)) - op->mode = (mode & S_IALLUGO) | S_IFREG; - else + /* Deal with the mode. */ + if (WILL_CREATE(flags)) { + if (how->mode & ~S_IALLUGO) + return -EINVAL; + op->mode = how->mode | S_IFREG; + } else { + if (how->mode != 0) + return -EINVAL; op->mode = 0; - - /* Must never be set by userspace */ - flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC; + } /* - * O_SYNC is implemented as __O_SYNC|O_DSYNC. As many places only - * check for O_DSYNC if the need any syncing at all we enforce it's - * always set instead of having to deal with possibly weird behaviour - * for malicious applications setting only __O_SYNC. + * In order to ensure programs get explicit errors when trying to use + * O_TMPFILE on old kernels, O_TMPFILE is implemented such that it + * looks like (O_DIRECTORY|O_RDWR & ~O_CREAT) to old kernels. But we + * have to require userspace to explicitly set it. */ - if (flags & __O_SYNC) - flags |= O_DSYNC; - if (flags & __O_TMPFILE) { if ((flags & O_TMPFILE_MASK) != O_TMPFILE) return -EINVAL; if (!(acc_mode & MAY_WRITE)) return -EINVAL; - } else if (flags & O_PATH) { - /* - * If we have O_PATH in the open flag. Then we - * cannot have anything other than the below set of flags - */ - flags &= O_DIRECTORY | O_NOFOLLOW | O_PATH; + } + if (flags & O_PATH) { + /* O_PATH only permits certain other flags to be set. */ + if (flags & ~O_PATH_FLAGS) + return -EINVAL; acc_mode = 0; } + /* + * O_SYNC is implemented as __O_SYNC|O_DSYNC. As many places only + * check for O_DSYNC if the need any syncing at all we enforce it's + * always set instead of having to deal with possibly weird behaviour + * for malicious applications setting only __O_SYNC. + */ + if (flags & __O_SYNC) + flags |= O_DSYNC; + op->open_flag = flags; /* O_TRUNC implies we need access checks for write permissions */ @@ -1022,6 +1060,18 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o lookup_flags |= LOOKUP_DIRECTORY; if (!(flags & O_NOFOLLOW)) lookup_flags |= LOOKUP_FOLLOW; + + if (how->resolve & RESOLVE_NO_XDEV) + lookup_flags |= LOOKUP_NO_XDEV; + if (how->resolve & RESOLVE_NO_MAGICLINKS) + lookup_flags |= LOOKUP_NO_MAGICLINKS; + if (how->resolve & RESOLVE_NO_SYMLINKS) + lookup_flags |= LOOKUP_NO_SYMLINKS; + if (how->resolve & RESOLVE_BENEATH) + lookup_flags |= LOOKUP_BENEATH; + if (how->resolve & RESOLVE_IN_ROOT) + lookup_flags |= LOOKUP_IN_ROOT; + op->lookup_flags = lookup_flags; return 0; } @@ -1040,8 +1090,11 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o struct file *file_open_name(struct filename *name, int flags, umode_t mode) { struct open_flags op; - int err = build_open_flags(flags, mode, &op); - return err ? ERR_PTR(err) : do_filp_open(AT_FDCWD, name, &op); + struct open_how how = build_open_how(flags, mode); + int err = build_open_flags(&how, &op); + if (err) + return ERR_PTR(err); + return do_filp_open(AT_FDCWD, name, &op); } /** @@ -1072,17 +1125,19 @@ struct file *file_open_root(struct dentry *dentry, struct vfsmount *mnt, const char *filename, int flags, umode_t mode) { struct open_flags op; - int err = build_open_flags(flags, mode, &op); + struct open_how how = build_open_how(flags, mode); + int err = build_open_flags(&how, &op); if (err) return ERR_PTR(err); return do_file_open_root(dentry, mnt, filename, &op); } EXPORT_SYMBOL(file_open_root); -long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode) +static long do_sys_openat2(int dfd, const char __user *filename, + struct open_how *how) { struct open_flags op; - int fd = build_open_flags(flags, mode, &op); + int fd = build_open_flags(how, &op); struct filename *tmp; if (fd) @@ -1092,7 +1147,7 @@ long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode) if (IS_ERR(tmp)) return PTR_ERR(tmp); - fd = get_unused_fd_flags(flags); + fd = get_unused_fd_flags(how->flags); if (fd >= 0) { struct file *f = do_filp_open(dfd, tmp, &op); if (IS_ERR(f)) { @@ -1107,12 +1162,16 @@ long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode) return fd; } -SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode) +long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode) { - if (force_o_largefile()) - flags |= O_LARGEFILE; + struct open_how how = build_open_how(flags, mode); + return do_sys_openat2(dfd, filename, &how); +} - return do_sys_open(AT_FDCWD, filename, flags, mode); + +SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode) +{ + return ksys_open(filename, flags, mode); } SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags, @@ -1120,10 +1179,32 @@ SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags, { if (force_o_largefile()) flags |= O_LARGEFILE; - return do_sys_open(dfd, filename, flags, mode); } +SYSCALL_DEFINE4(openat2, int, dfd, const char __user *, filename, + struct open_how __user *, how, size_t, usize) +{ + int err; + struct open_how tmp; + + BUILD_BUG_ON(sizeof(struct open_how) < OPEN_HOW_SIZE_VER0); + BUILD_BUG_ON(sizeof(struct open_how) != OPEN_HOW_SIZE_LATEST); + + if (unlikely(usize < OPEN_HOW_SIZE_VER0)) + return -EINVAL; + + err = copy_struct_from_user(&tmp, sizeof(tmp), how, usize); + if (err) + return err; + + /* O_LARGEFILE is only allowed for non-O_PATH. */ + if (!(tmp.flags & O_PATH) && force_o_largefile()) + tmp.flags |= O_LARGEFILE; + + return do_sys_openat2(dfd, filename, &tmp); +} + #ifdef CONFIG_COMPAT /* * Exactly like sys_open(), except that it doesn't set the diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h index d019df946cb2..f2eb05bd3af3 100644 --- a/include/linux/fcntl.h +++ b/include/linux/fcntl.h @@ -2,15 +2,25 @@ #ifndef _LINUX_FCNTL_H #define _LINUX_FCNTL_H +#include #include -/* list of all valid flags for the open/openat flags argument: */ +/* List of all valid flags for the open/openat flags argument: */ #define VALID_OPEN_FLAGS \ (O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC | \ O_APPEND | O_NDELAY | O_NONBLOCK | O_NDELAY | __O_SYNC | O_DSYNC | \ FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \ O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE) +/* List of all valid flags for the how->upgrade_mask argument: */ +#define VALID_UPGRADE_FLAGS \ + (UPGRADE_NOWRITE | UPGRADE_NOREAD) + +/* List of all valid flags for the how->resolve argument: */ +#define VALID_RESOLVE_FLAGS \ + (RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \ + RESOLVE_BENEATH | RESOLVE_IN_ROOT) + #ifndef force_o_largefile #define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T)) #endif diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index f7c561c4dcdd..808f103b7a62 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -69,6 +69,7 @@ struct rseq; union bpf_attr; struct io_uring_params; struct clone_args; +struct open_how; #include #include @@ -439,6 +440,8 @@ asmlinkage long sys_fchownat(int dfd, const char __user *filename, uid_t user, asmlinkage long sys_fchown(unsigned int fd, uid_t user, gid_t group); asmlinkage long sys_openat(int dfd, const char __user *filename, int flags, umode_t mode); +asmlinkage long sys_openat2(int dfd, const char __user *filename, + struct open_how *how, size_t size); asmlinkage long sys_close(unsigned int fd); asmlinkage long sys_vhangup(void); diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 1fc8faa6e973..d4122c091472 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -851,8 +851,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open) __SYSCALL(__NR_clone3, sys_clone3) #endif +#define __NR_openat2 437 +__SYSCALL(__NR_openat2, sys_openat2) + #undef __NR_syscalls -#define __NR_syscalls 436 +#define __NR_syscalls 438 /* * 32 bit systems traditionally used different diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 1d338357df8a..5de8b0006a95 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -93,5 +93,46 @@ #define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ +/* + * Arguments for how openat2(2) should open the target path. If @resolve is + * zero, then openat2(2) operates very similarly to openat(2). + * + * However, unlike openat(2), unknown bits in @flags result in -EINVAL rather + * than being silently ignored. @mode must be zero unless one of {O_CREAT, + * O_TMPFILE} are set, and @upgrade_mask must be zero unless O_PATH is set. + * + * @flags: O_* flags. + * @mode: O_CREAT/O_TMPFILE file mode. + * @upgrade_mask: UPGRADE_* flags (to restrict O_PATH re-opening). + * @resolve: RESOLVE_* flags. + */ +struct open_how { + __aligned_u64 flags; + __u16 mode; + __u16 __padding[3]; /* must be zeroed */ + __aligned_u64 resolve; +}; + +#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */ +#define OPEN_HOW_SIZE_LATEST OPEN_HOW_SIZE_VER0 + +/* how->resolve flags for openat2(2). */ +#define RESOLVE_NO_XDEV 0x01 /* Block mount-point crossings + (includes bind-mounts). */ +#define RESOLVE_NO_MAGICLINKS 0x02 /* Block traversal through procfs-style + "magic-links". */ +#define RESOLVE_NO_SYMLINKS 0x04 /* Block traversal through all symlinks + (implies OEXT_NO_MAGICLINKS) */ +#define RESOLVE_BENEATH 0x08 /* Block "lexical" trickery like + "..", symlinks, and absolute + paths which escape the dirfd. */ +#define RESOLVE_IN_ROOT 0x10 /* Make all jumps to "/" and ".." + be scoped inside the dirfd + (similar to chroot(2)). */ + +/* how->upgrade flags for openat2(2). */ +/* First bit is reserved for a future UPGRADE_NOEXEC flag. */ +#define UPGRADE_NOREAD 0x02 /* Block re-opening with MAY_READ. */ +#define UPGRADE_NOWRITE 0x04 /* Block re-opening with MAY_WRITE. */ #endif /* _UAPI_LINUX_FCNTL_H */ From patchwork Tue Nov 5 09:05:52 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 11227289 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DEEC5139A for ; Tue, 5 Nov 2019 09:09:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BA471217F5 for ; Tue, 5 Nov 2019 09:09:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730686AbfKEJJl (ORCPT ); Tue, 5 Nov 2019 04:09:41 -0500 Received: from mout-p-102.mailbox.org ([80.241.56.152]:25868 "EHLO mout-p-102.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730610AbfKEJJl (ORCPT ); Tue, 5 Nov 2019 04:09:41 -0500 Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mout-p-102.mailbox.org (Postfix) with ESMTPS id 476kPl6jnnzKmTx; Tue, 5 Nov 2019 10:09:31 +0100 (CET) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter06.heinlein-hosting.de (spamfilter06.heinlein-hosting.de [80.241.56.125]) (amavisd-new, port 10030) with ESMTP id aB-YO7xwMxsW; Tue, 5 Nov 2019 10:09:26 +0100 (CET) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan , Ingo Molnar , Peter Zijlstra Cc: Aleksa Sarai , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Rasmus Villemoes , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Christian Brauner , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-alpha@vger.kernel.org, linux-api@vger.kernel.org, libc-alpha@sourceware.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-xtensa@linux-xtensa.org, sparclinux@vger.kernel.org Subject: [PATCH v15 8/9] selftests: add openat2(2) selftests Date: Tue, 5 Nov 2019 20:05:52 +1100 Message-Id: <20191105090553.6350-9-cyphar@cyphar.com> In-Reply-To: <20191105090553.6350-1-cyphar@cyphar.com> References: <20191105090553.6350-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Test all of the various openat2(2) flags. A small stress-test of a symlink-rename attack is included to show that the protections against ".."-based attacks are sufficient. The main things these self-tests are enforcing are: * The struct+usize ABI for openat2(2) and copy_struct_from_user() to ensure that upgrades will be handled gracefully (in addition, ensuring that misaligned structures are also handled correctly). * The -EINVAL checks for openat2(2) are all correctly handled to avoid userspace passing unknown or conflicting flag sets (most importantly, ensuring that invalid flag combinations are checked). * All of the RESOLVE_* semantics (including errno values) are correctly handled with various combinations of paths and flags. * RESOLVE_IN_ROOT correctly protects against the symlink rename(2) attack that has been responsible for several CVEs (and likely will be responsible for several more). Cc: Shuah Khan Signed-off-by: Aleksa Sarai --- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/openat2/.gitignore | 1 + tools/testing/selftests/openat2/Makefile | 8 + tools/testing/selftests/openat2/helpers.c | 109 ++++ tools/testing/selftests/openat2/helpers.h | 107 ++++ .../testing/selftests/openat2/openat2_test.c | 316 +++++++++++ .../selftests/openat2/rename_attack_test.c | 160 ++++++ .../testing/selftests/openat2/resolve_test.c | 523 ++++++++++++++++++ 8 files changed, 1225 insertions(+) create mode 100644 tools/testing/selftests/openat2/.gitignore create mode 100644 tools/testing/selftests/openat2/Makefile create mode 100644 tools/testing/selftests/openat2/helpers.c create mode 100644 tools/testing/selftests/openat2/helpers.h create mode 100644 tools/testing/selftests/openat2/openat2_test.c create mode 100644 tools/testing/selftests/openat2/rename_attack_test.c create mode 100644 tools/testing/selftests/openat2/resolve_test.c diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 4cdbae6f4e61..28996856ed5e 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -37,6 +37,7 @@ TARGETS += powerpc TARGETS += proc TARGETS += pstore TARGETS += ptrace +TARGETS += openat2 TARGETS += rseq TARGETS += rtc TARGETS += seccomp diff --git a/tools/testing/selftests/openat2/.gitignore b/tools/testing/selftests/openat2/.gitignore new file mode 100644 index 000000000000..bd68f6c3fd07 --- /dev/null +++ b/tools/testing/selftests/openat2/.gitignore @@ -0,0 +1 @@ +/*_test diff --git a/tools/testing/selftests/openat2/Makefile b/tools/testing/selftests/openat2/Makefile new file mode 100644 index 000000000000..4b93b1417b86 --- /dev/null +++ b/tools/testing/selftests/openat2/Makefile @@ -0,0 +1,8 @@ +# SPDX-License-Identifier: GPL-2.0-or-later + +CFLAGS += -Wall -O2 -g -fsanitize=address -fsanitize=undefined +TEST_GEN_PROGS := openat2_test resolve_test rename_attack_test + +include ../lib.mk + +$(TEST_GEN_PROGS): helpers.c diff --git a/tools/testing/selftests/openat2/helpers.c b/tools/testing/selftests/openat2/helpers.c new file mode 100644 index 000000000000..e9a6557ab16f --- /dev/null +++ b/tools/testing/selftests/openat2/helpers.c @@ -0,0 +1,109 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Author: Aleksa Sarai + * Copyright (C) 2018-2019 SUSE LLC. + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include + +#include "helpers.h" + +bool needs_openat2(const struct open_how *how) +{ + return how->resolve != 0; +} + +int raw_openat2(int dfd, const char *path, void *how, size_t size) +{ + int ret = syscall(__NR_openat2, dfd, path, how, size); + return ret >= 0 ? ret : -errno; +} + +int sys_openat2(int dfd, const char *path, struct open_how *how) +{ + return raw_openat2(dfd, path, how, sizeof(*how)); +} + +int sys_openat(int dfd, const char *path, struct open_how *how) +{ + int ret = openat(dfd, path, how->flags, how->mode); + return ret >= 0 ? ret : -errno; +} + +int sys_renameat2(int olddirfd, const char *oldpath, + int newdirfd, const char *newpath, unsigned int flags) +{ + int ret = syscall(__NR_renameat2, olddirfd, oldpath, + newdirfd, newpath, flags); + return ret >= 0 ? ret : -errno; +} + +int touchat(int dfd, const char *path) +{ + int fd = openat(dfd, path, O_CREAT); + if (fd >= 0) + close(fd); + return fd; +} + +char *fdreadlink(int fd) +{ + char *target, *tmp; + + E_asprintf(&tmp, "/proc/self/fd/%d", fd); + + target = malloc(PATH_MAX); + if (!target) + ksft_exit_fail_msg("fdreadlink: malloc failed\n"); + memset(target, 0, PATH_MAX); + + E_readlink(tmp, target, PATH_MAX); + free(tmp); + return target; +} + +bool fdequal(int fd, int dfd, const char *path) +{ + char *fdpath, *dfdpath, *other; + bool cmp; + + fdpath = fdreadlink(fd); + dfdpath = fdreadlink(dfd); + + if (!path) + E_asprintf(&other, "%s", dfdpath); + else if (*path == '/') + E_asprintf(&other, "%s", path); + else + E_asprintf(&other, "%s/%s", dfdpath, path); + + cmp = !strcmp(fdpath, other); + + free(fdpath); + free(dfdpath); + free(other); + return cmp; +} + +bool openat2_supported = false; + +void __attribute__((constructor)) init(void) +{ + struct open_how how = {}; + int fd; + + BUILD_BUG_ON(sizeof(struct open_how) != OPEN_HOW_SIZE_VER0); + + /* Check openat2(2) support. */ + fd = sys_openat2(AT_FDCWD, ".", &how); + openat2_supported = (fd >= 0); + + if (fd >= 0) + close(fd); +} diff --git a/tools/testing/selftests/openat2/helpers.h b/tools/testing/selftests/openat2/helpers.h new file mode 100644 index 000000000000..43ca5ceab6e3 --- /dev/null +++ b/tools/testing/selftests/openat2/helpers.h @@ -0,0 +1,107 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Author: Aleksa Sarai + * Copyright (C) 2018-2019 SUSE LLC. + */ + +#ifndef __RESOLVEAT_H__ +#define __RESOLVEAT_H__ + +#define _GNU_SOURCE +#include +#include +#include +#include "../kselftest.h" + +#define ARRAY_LEN(X) (sizeof (X) / sizeof (*(X))) +#define BUILD_BUG_ON(e) ((void)(sizeof(struct { int:(-!!(e)); }))) + +#ifndef SYS_openat2 +#ifndef __NR_openat2 +#define __NR_openat2 437 +#endif /* __NR_openat2 */ +#define SYS_openat2 __NR_openat2 +#endif /* SYS_openat2 */ + +/* + * Arguments for how openat2(2) should open the target path. If @resolve is + * zero, then openat2(2) operates very similarly to openat(2). + * + * However, unlike openat(2), unknown bits in @flags result in -EINVAL rather + * than being silently ignored. @mode must be zero unless one of {O_CREAT, + * O_TMPFILE} are set. + * + * @flags: O_* flags. + * @mode: O_CREAT/O_TMPFILE file mode. + * @resolve: RESOLVE_* flags. + */ +struct open_how { + __aligned_u64 flags; + __u16 mode; + __u16 __padding[3]; /* must be zeroed */ + __aligned_u64 resolve; +}; + +#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */ +#define OPEN_HOW_SIZE_LATEST OPEN_HOW_SIZE_VER0 + +bool needs_openat2(const struct open_how *how); + +#ifndef RESOLVE_IN_ROOT +/* how->resolve flags for openat2(2). */ +#define RESOLVE_NO_XDEV 0x01 /* Block mount-point crossings + (includes bind-mounts). */ +#define RESOLVE_NO_MAGICLINKS 0x02 /* Block traversal through procfs-style + "magic-links". */ +#define RESOLVE_NO_SYMLINKS 0x04 /* Block traversal through all symlinks + (implies OEXT_NO_MAGICLINKS) */ +#define RESOLVE_BENEATH 0x08 /* Block "lexical" trickery like + "..", symlinks, and absolute + paths which escape the dirfd. */ +#define RESOLVE_IN_ROOT 0x10 /* Make all jumps to "/" and ".." + be scoped inside the dirfd + (similar to chroot(2)). */ +#endif /* RESOLVE_IN_ROOT */ + +#define E_func(func, ...) \ + do { \ + if (func(__VA_ARGS__) < 0) \ + ksft_exit_fail_msg("%s:%d %s failed\n", \ + __FILE__, __LINE__, #func);\ + } while (0) + +#define E_asprintf(...) E_func(asprintf, __VA_ARGS__) +#define E_chmod(...) E_func(chmod, __VA_ARGS__) +#define E_dup2(...) E_func(dup2, __VA_ARGS__) +#define E_fchdir(...) E_func(fchdir, __VA_ARGS__) +#define E_fstatat(...) E_func(fstatat, __VA_ARGS__) +#define E_kill(...) E_func(kill, __VA_ARGS__) +#define E_mkdirat(...) E_func(mkdirat, __VA_ARGS__) +#define E_mount(...) E_func(mount, __VA_ARGS__) +#define E_prctl(...) E_func(prctl, __VA_ARGS__) +#define E_readlink(...) E_func(readlink, __VA_ARGS__) +#define E_setresuid(...) E_func(setresuid, __VA_ARGS__) +#define E_symlinkat(...) E_func(symlinkat, __VA_ARGS__) +#define E_touchat(...) E_func(touchat, __VA_ARGS__) +#define E_unshare(...) E_func(unshare, __VA_ARGS__) + +#define E_assert(expr, msg, ...) \ + do { \ + if (!(expr)) \ + ksft_exit_fail_msg("ASSERT(%s:%d) failed (%s): " msg "\n", \ + __FILE__, __LINE__, #expr, ##__VA_ARGS__); \ + } while (0) + +int raw_openat2(int dfd, const char *path, void *how, size_t size); +int sys_openat2(int dfd, const char *path, struct open_how *how); +int sys_openat(int dfd, const char *path, struct open_how *how); +int sys_renameat2(int olddirfd, const char *oldpath, + int newdirfd, const char *newpath, unsigned int flags); + +int touchat(int dfd, const char *path); +char *fdreadlink(int fd); +bool fdequal(int fd, int dfd, const char *path); + +extern bool openat2_supported; + +#endif /* __RESOLVEAT_H__ */ diff --git a/tools/testing/selftests/openat2/openat2_test.c b/tools/testing/selftests/openat2/openat2_test.c new file mode 100644 index 000000000000..8a641acb0d6c --- /dev/null +++ b/tools/testing/selftests/openat2/openat2_test.c @@ -0,0 +1,316 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Author: Aleksa Sarai + * Copyright (C) 2018-2019 SUSE LLC. + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../kselftest.h" +#include "helpers.h" + +/* + * O_LARGEFILE is set to 0 by glibc. + * XXX: This is wrong on {mips, parisc, powerpc, sparc}. + */ +#undef O_LARGEFILE +#define O_LARGEFILE 0x8000 + +struct open_how_ext { + struct open_how inner; + uint32_t extra1; + char pad1[128]; + uint32_t extra2; + char pad2[128]; + uint32_t extra3; +}; + +struct struct_test { + const char *name; + struct open_how_ext arg; + size_t size; + int err; +}; + +#define NUM_OPENAT2_STRUCT_TESTS 9 +#define NUM_OPENAT2_STRUCT_VARIATIONS 13 + +void test_openat2_struct(void) +{ + int misalignments[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 17, 87 }; + + struct struct_test tests[] = { + /* Normal struct. */ + { .name = "normal struct", + .arg.inner.flags = O_RDONLY, + .size = sizeof(struct open_how) }, + /* Bigger struct, with zeroed out end. */ + { .name = "bigger struct (zeroed out)", + .arg.inner.flags = O_RDONLY, + .size = sizeof(struct open_how_ext) }, + + /* Normal struct with broken padding. */ + { .name = "normal struct (non-zero padding[0])", + .arg.inner.flags = O_RDONLY, + .arg.inner.__padding = {0xa0, 0x00}, + .size = sizeof(struct open_how_ext), .err = -EINVAL }, + { .name = "normal struct (non-zero padding[1])", + .arg.inner.flags = O_RDONLY, + .arg.inner.__padding = {0x00, 0x1a}, + .size = sizeof(struct open_how_ext), .err = -EINVAL }, + + /* TODO: Once expanded, check zero-padding. */ + + /* Smaller than version-0 struct. */ + { .name = "zero-sized 'struct'", + .arg.inner.flags = O_RDONLY, .size = 0, .err = -EINVAL }, + { .name = "smaller-than-v0 struct", + .arg.inner.flags = O_RDONLY, + .size = OPEN_HOW_SIZE_VER0 - 1, .err = -EINVAL }, + + /* Bigger struct, with non-zero trailing bytes. */ + { .name = "bigger struct (non-zero data in first 'future field')", + .arg.inner.flags = O_RDONLY, .arg.extra1 = 0xdeadbeef, + .size = sizeof(struct open_how_ext), .err = -E2BIG }, + { .name = "bigger struct (non-zero data in middle of 'future fields')", + .arg.inner.flags = O_RDONLY, .arg.extra2 = 0xfeedcafe, + .size = sizeof(struct open_how_ext), .err = -E2BIG }, + { .name = "bigger struct (non-zero data at end of 'future fields')", + .arg.inner.flags = O_RDONLY, .arg.extra3 = 0xabad1dea, + .size = sizeof(struct open_how_ext), .err = -E2BIG }, + }; + + BUILD_BUG_ON(ARRAY_LEN(misalignments) != NUM_OPENAT2_STRUCT_VARIATIONS); + BUILD_BUG_ON(ARRAY_LEN(tests) != NUM_OPENAT2_STRUCT_TESTS); + + for (int i = 0; i < ARRAY_LEN(tests); i++) { + struct struct_test *test = &tests[i]; + struct open_how_ext how_ext = test->arg; + + for (int j = 0; j < ARRAY_LEN(misalignments); j++) { + int fd, misalign = misalignments[j]; + char *fdpath = NULL; + bool failed; + void (*resultfn)(const char *msg, ...) = ksft_test_result_pass; + + void *copy = NULL, *how_copy = &how_ext; + + if (!openat2_supported) { + ksft_print_msg("openat2(2) unsupported\n"); + resultfn = ksft_test_result_skip; + goto skip; + } + + if (misalign) { + /* + * Explicitly misalign the structure copying it with the given + * (mis)alignment offset. The other data is set to be non-zero to + * make sure that non-zero bytes outside the struct aren't checked + * + * This is effectively to check that is_zeroed_user() works. + */ + copy = malloc(misalign + sizeof(how_ext)); + how_copy = copy + misalign; + memset(copy, 0xff, misalign); + memcpy(how_copy, &how_ext, sizeof(how_ext)); + } + + fd = raw_openat2(AT_FDCWD, ".", how_copy, test->size); + if (test->err >= 0) + failed = (fd < 0); + else + failed = (fd != test->err); + if (fd >= 0) { + fdpath = fdreadlink(fd); + close(fd); + } + + if (failed) { + resultfn = ksft_test_result_fail; + + ksft_print_msg("openat2 unexpectedly returned "); + if (fdpath) + ksft_print_msg("%d['%s']\n", fd, fdpath); + else + ksft_print_msg("%d (%s)\n", fd, strerror(-fd)); + } + +skip: + if (test->err >= 0) + resultfn("openat2 with %s argument [misalign=%d] succeeds\n", + test->name, misalign); + else + resultfn("openat2 with %s argument [misalign=%d] fails with %d (%s)\n", + test->name, misalign, test->err, + strerror(-test->err)); + + free(copy); + free(fdpath); + fflush(stdout); + } + } +} + +struct flag_test { + const char *name; + struct open_how how; + int err; +}; + +#define NUM_OPENAT2_FLAG_TESTS 21 + +void test_openat2_flags(void) +{ + struct flag_test tests[] = { + /* O_TMPFILE is incompatible with O_PATH and O_CREAT. */ + { .name = "incompatible flags (O_TMPFILE | O_PATH)", + .how.flags = O_TMPFILE | O_PATH | O_RDWR, .err = -EINVAL }, + { .name = "incompatible flags (O_TMPFILE | O_CREAT)", + .how.flags = O_TMPFILE | O_CREAT | O_RDWR, .err = -EINVAL }, + + /* O_PATH only permits certain other flags to be set ... */ + { .name = "compatible flags (O_PATH | O_CLOEXEC)", + .how.flags = O_PATH | O_CLOEXEC }, + { .name = "compatible flags (O_PATH | O_DIRECTORY)", + .how.flags = O_PATH | O_DIRECTORY }, + { .name = "compatible flags (O_PATH | O_NOFOLLOW)", + .how.flags = O_PATH | O_NOFOLLOW }, + /* ... and others are absolutely not permitted. */ + { .name = "incompatible flags (O_PATH | O_RDWR)", + .how.flags = O_PATH | O_RDWR, .err = -EINVAL }, + { .name = "incompatible flags (O_PATH | O_CREAT)", + .how.flags = O_PATH | O_CREAT, .err = -EINVAL }, + { .name = "incompatible flags (O_PATH | O_EXCL)", + .how.flags = O_PATH | O_EXCL, .err = -EINVAL }, + { .name = "incompatible flags (O_PATH | O_NOCTTY)", + .how.flags = O_PATH | O_NOCTTY, .err = -EINVAL }, + { .name = "incompatible flags (O_PATH | O_DIRECT)", + .how.flags = O_PATH | O_DIRECT, .err = -EINVAL }, + { .name = "incompatible flags (O_PATH | O_LARGEFILE)", + .how.flags = O_PATH | O_LARGEFILE, .err = -EINVAL }, + + /* ->mode must only be set with O_{CREAT,TMPFILE}. */ + { .name = "non-zero how.mode and O_RDONLY", + .how.flags = O_RDONLY, .how.mode = 0600, .err = -EINVAL }, + { .name = "non-zero how.mode and O_PATH", + .how.flags = O_PATH, .how.mode = 0600, .err = -EINVAL }, + { .name = "valid how.mode and O_CREAT", + .how.flags = O_CREAT, .how.mode = 0600 }, + { .name = "valid how.mode and O_TMPFILE", + .how.flags = O_TMPFILE | O_RDWR, .how.mode = 0600 }, + /* ->mode must only contain 0777 bits. */ + { .name = "invalid how.mode and O_CREAT", + .how.flags = O_CREAT, + .how.mode = 0xFFFF, .err = -EINVAL }, + { .name = "invalid how.mode and O_TMPFILE", + .how.flags = O_TMPFILE | O_RDWR, + .how.mode = 0x1337, .err = -EINVAL }, + + /* ->resolve must only contain RESOLVE_* flags. */ + { .name = "invalid how.resolve and O_RDONLY", + .how.flags = O_RDONLY, + .how.resolve = 0x1337, .err = -EINVAL }, + { .name = "invalid how.resolve and O_CREAT", + .how.flags = O_CREAT, + .how.resolve = 0x1337, .err = -EINVAL }, + { .name = "invalid how.resolve and O_TMPFILE", + .how.flags = O_TMPFILE | O_RDWR, + .how.resolve = 0x1337, .err = -EINVAL }, + { .name = "invalid how.resolve and O_PATH", + .how.flags = O_PATH, + .how.resolve = 0x1337, .err = -EINVAL }, + }; + + BUILD_BUG_ON(ARRAY_LEN(tests) != NUM_OPENAT2_FLAG_TESTS); + + for (int i = 0; i < ARRAY_LEN(tests); i++) { + int fd, fdflags = -1; + char *path, *fdpath = NULL; + bool failed = false; + struct flag_test *test = &tests[i]; + void (*resultfn)(const char *msg, ...) = ksft_test_result_pass; + + if (!openat2_supported) { + ksft_print_msg("openat2(2) unsupported\n"); + resultfn = ksft_test_result_skip; + goto skip; + } + + path = (test->how.flags & O_CREAT) ? "/tmp/ksft.openat2_tmpfile" : "."; + unlink(path); + + fd = sys_openat2(AT_FDCWD, path, &test->how); + if (test->err >= 0) + failed = (fd < 0); + else + failed = (fd != test->err); + if (fd >= 0) { + int otherflags; + + fdpath = fdreadlink(fd); + fdflags = fcntl(fd, F_GETFL); + otherflags = fcntl(fd, F_GETFD); + close(fd); + + E_assert(fdflags >= 0, "fcntl F_GETFL of new fd"); + E_assert(otherflags >= 0, "fcntl F_GETFD of new fd"); + + /* O_CLOEXEC isn't shown in F_GETFL. */ + if (otherflags & FD_CLOEXEC) + fdflags |= O_CLOEXEC; + /* O_CREAT is hidden from F_GETFL. */ + if (test->how.flags & O_CREAT) + fdflags |= O_CREAT; + if (!(test->how.flags & O_LARGEFILE)) + fdflags &= ~O_LARGEFILE; + failed |= (fdflags != test->how.flags); + } + + if (failed) { + resultfn = ksft_test_result_fail; + + ksft_print_msg("openat2 unexpectedly returned "); + if (fdpath) + ksft_print_msg("%d['%s'] with %X (!= %X)\n", + fd, fdpath, fdflags, + test->how.flags); + else + ksft_print_msg("%d (%s)\n", fd, strerror(-fd)); + } + +skip: + if (test->err >= 0) + resultfn("openat2 with %s succeeds\n", test->name); + else + resultfn("openat2 with %s fails with %d (%s)\n", + test->name, test->err, strerror(-test->err)); + + free(fdpath); + fflush(stdout); + } +} + +#define NUM_TESTS (NUM_OPENAT2_STRUCT_VARIATIONS * NUM_OPENAT2_STRUCT_TESTS + \ + NUM_OPENAT2_FLAG_TESTS) + +int main(int argc, char **argv) +{ + ksft_print_header(); + ksft_set_plan(NUM_TESTS); + + test_openat2_struct(); + test_openat2_flags(); + + if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0) + ksft_exit_fail(); + else + ksft_exit_pass(); +} diff --git a/tools/testing/selftests/openat2/rename_attack_test.c b/tools/testing/selftests/openat2/rename_attack_test.c new file mode 100644 index 000000000000..0a770728b436 --- /dev/null +++ b/tools/testing/selftests/openat2/rename_attack_test.c @@ -0,0 +1,160 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Author: Aleksa Sarai + * Copyright (C) 2018-2019 SUSE LLC. + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../kselftest.h" +#include "helpers.h" + +/* Construct a test directory with the following structure: + * + * root/ + * |-- a/ + * | `-- c/ + * `-- b/ + */ +int setup_testdir(void) +{ + int dfd; + char dirname[] = "/tmp/ksft-openat2-rename-attack.XXXXXX"; + + /* Make the top-level directory. */ + if (!mkdtemp(dirname)) + ksft_exit_fail_msg("setup_testdir: failed to create tmpdir\n"); + dfd = open(dirname, O_PATH | O_DIRECTORY); + if (dfd < 0) + ksft_exit_fail_msg("setup_testdir: failed to open tmpdir\n"); + + E_mkdirat(dfd, "a", 0755); + E_mkdirat(dfd, "b", 0755); + E_mkdirat(dfd, "a/c", 0755); + + return dfd; +} + +/* Swap @dirfd/@a and @dirfd/@b constantly. Parent must kill this process. */ +pid_t spawn_attack(int dirfd, char *a, char *b) +{ + pid_t child = fork(); + if (child != 0) + return child; + + /* If the parent (the test process) dies, kill ourselves too. */ + E_prctl(PR_SET_PDEATHSIG, SIGKILL); + + /* Swap @a and @b. */ + for (;;) + renameat2(dirfd, a, dirfd, b, RENAME_EXCHANGE); + exit(1); +} + +#define NUM_RENAME_TESTS 2 +#define ROUNDS 400000 + +const char *flagname(int resolve) +{ + switch (resolve) { + case RESOLVE_IN_ROOT: + return "RESOLVE_IN_ROOT"; + case RESOLVE_BENEATH: + return "RESOLVE_BENEATH"; + } + return "(unknown)"; +} + +void test_rename_attack(int resolve) +{ + int dfd, afd; + pid_t child; + void (*resultfn)(const char *msg, ...) = ksft_test_result_pass; + int escapes = 0, other_errs = 0, exdevs = 0, eagains = 0, successes = 0; + + struct open_how how = { + .flags = O_PATH, + .resolve = resolve, + }; + + if (!openat2_supported) { + how.resolve = 0; + ksft_print_msg("openat2(2) unsupported -- using openat(2) instead\n"); + } + + dfd = setup_testdir(); + afd = openat(dfd, "a", O_PATH); + if (afd < 0) + ksft_exit_fail_msg("test_rename_attack: failed to open 'a'\n"); + + child = spawn_attack(dfd, "a/c", "b"); + + for (int i = 0; i < ROUNDS; i++) { + int fd; + char *victim_path = "c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../../c/../.."; + + if (openat2_supported) + fd = sys_openat2(afd, victim_path, &how); + else + fd = sys_openat(afd, victim_path, &how); + + if (fd < 0) { + if (fd == -EAGAIN) + eagains++; + else if (fd == -EXDEV) + exdevs++; + else if (fd == -ENOENT) + escapes++; /* escaped outside and got ENOENT... */ + else + other_errs++; /* unexpected error */ + } else { + if (fdequal(fd, afd, NULL)) + successes++; + else + escapes++; /* we got an unexpected fd */ + } + close(fd); + } + + if (escapes > 0) + resultfn = ksft_test_result_fail; + ksft_print_msg("non-escapes: EAGAIN=%d EXDEV=%d E=%d success=%d\n", + eagains, exdevs, other_errs, successes); + resultfn("rename attack with %s (%d runs, got %d escapes)\n", + flagname(resolve), ROUNDS, escapes); + + /* Should be killed anyway, but might as well make sure. */ + E_kill(child, SIGKILL); +} + +#define NUM_TESTS NUM_RENAME_TESTS + +int main(int argc, char **argv) +{ + ksft_print_header(); + ksft_set_plan(NUM_TESTS); + + test_rename_attack(RESOLVE_BENEATH); + test_rename_attack(RESOLVE_IN_ROOT); + + if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0) + ksft_exit_fail(); + else + ksft_exit_pass(); +} diff --git a/tools/testing/selftests/openat2/resolve_test.c b/tools/testing/selftests/openat2/resolve_test.c new file mode 100644 index 000000000000..7a94b1da8e7b --- /dev/null +++ b/tools/testing/selftests/openat2/resolve_test.c @@ -0,0 +1,523 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Author: Aleksa Sarai + * Copyright (C) 2018-2019 SUSE LLC. + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../kselftest.h" +#include "helpers.h" + +/* + * Construct a test directory with the following structure: + * + * root/ + * |-- procexe -> /proc/self/exe + * |-- procroot -> /proc/self/root + * |-- root/ + * |-- mnt/ [mountpoint] + * | |-- self -> ../mnt/ + * | `-- absself -> /mnt/ + * |-- etc/ + * | `-- passwd + * |-- creatlink -> /newfile3 + * |-- reletc -> etc/ + * |-- relsym -> etc/passwd + * |-- absetc -> /etc/ + * |-- abssym -> /etc/passwd + * |-- abscheeky -> /cheeky + * `-- cheeky/ + * |-- absself -> / + * |-- self -> ../../root/ + * |-- garbageself -> /../../root/ + * |-- passwd -> ../cheeky/../cheeky/../etc/../etc/passwd + * |-- abspasswd -> /../cheeky/../cheeky/../etc/../etc/passwd + * |-- dotdotlink -> ../../../../../../../../../../../../../../etc/passwd + * `-- garbagelink -> /../../../../../../../../../../../../../../etc/passwd + */ +int setup_testdir(void) +{ + int dfd, tmpfd; + char dirname[] = "/tmp/ksft-openat2-testdir.XXXXXX"; + + /* Unshare and make /tmp a new directory. */ + E_unshare(CLONE_NEWNS); + E_mount("", "/tmp", "", MS_PRIVATE, ""); + + /* Make the top-level directory. */ + if (!mkdtemp(dirname)) + ksft_exit_fail_msg("setup_testdir: failed to create tmpdir\n"); + dfd = open(dirname, O_PATH | O_DIRECTORY); + if (dfd < 0) + ksft_exit_fail_msg("setup_testdir: failed to open tmpdir\n"); + + /* A sub-directory which is actually used for tests. */ + E_mkdirat(dfd, "root", 0755); + tmpfd = openat(dfd, "root", O_PATH | O_DIRECTORY); + if (tmpfd < 0) + ksft_exit_fail_msg("setup_testdir: failed to open tmpdir\n"); + close(dfd); + dfd = tmpfd; + + E_symlinkat("/proc/self/exe", dfd, "procexe"); + E_symlinkat("/proc/self/root", dfd, "procroot"); + E_mkdirat(dfd, "root", 0755); + + /* There is no mountat(2), so use chdir. */ + E_mkdirat(dfd, "mnt", 0755); + E_fchdir(dfd); + E_mount("tmpfs", "./mnt", "tmpfs", MS_NOSUID | MS_NODEV, ""); + E_symlinkat("../mnt/", dfd, "mnt/self"); + E_symlinkat("/mnt/", dfd, "mnt/absself"); + + E_mkdirat(dfd, "etc", 0755); + E_touchat(dfd, "etc/passwd"); + + E_symlinkat("/newfile3", dfd, "creatlink"); + E_symlinkat("etc/", dfd, "reletc"); + E_symlinkat("etc/passwd", dfd, "relsym"); + E_symlinkat("/etc/", dfd, "absetc"); + E_symlinkat("/etc/passwd", dfd, "abssym"); + E_symlinkat("/cheeky", dfd, "abscheeky"); + + E_mkdirat(dfd, "cheeky", 0755); + + E_symlinkat("/", dfd, "cheeky/absself"); + E_symlinkat("../../root/", dfd, "cheeky/self"); + E_symlinkat("/../../root/", dfd, "cheeky/garbageself"); + + E_symlinkat("../cheeky/../etc/../etc/passwd", dfd, "cheeky/passwd"); + E_symlinkat("/../cheeky/../etc/../etc/passwd", dfd, "cheeky/abspasswd"); + + E_symlinkat("../../../../../../../../../../../../../../etc/passwd", + dfd, "cheeky/dotdotlink"); + E_symlinkat("/../../../../../../../../../../../../../../etc/passwd", + dfd, "cheeky/garbagelink"); + + return dfd; +} + +struct basic_test { + const char *name; + const char *dir; + const char *path; + struct open_how how; + bool pass; + union { + int err; + const char *path; + } out; +}; + +#define NUM_OPENAT2_OPATH_TESTS 88 + +void test_openat2_opath_tests(void) +{ + int rootfd, hardcoded_fd; + char *procselfexe, *hardcoded_fdpath; + + E_asprintf(&procselfexe, "/proc/%d/exe", getpid()); + rootfd = setup_testdir(); + + hardcoded_fd = open("/dev/null", O_RDONLY); + E_assert(hardcoded_fd >= 0, "open fd to hardcode"); + E_asprintf(&hardcoded_fdpath, "self/fd/%d", hardcoded_fd); + + struct basic_test tests[] = { + /** RESOLVE_BENEATH **/ + /* Attempts to cross dirfd should be blocked. */ + { .name = "[beneath] jump to /", + .path = "/", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] absolute link to $root", + .path = "cheeky/absself", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] chained absolute links to $root", + .path = "abscheeky/absself", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] jump outside $root", + .path = "..", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] temporary jump outside $root", + .path = "../root/", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] symlink temporary jump outside $root", + .path = "cheeky/self", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] chained symlink temporary jump outside $root", + .path = "abscheeky/self", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] garbage links to $root", + .path = "cheeky/garbageself", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] chained garbage links to $root", + .path = "abscheeky/garbageself", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + /* Only relative paths that stay inside dirfd should work. */ + { .name = "[beneath] ordinary path to 'root'", + .path = "root", .how.resolve = RESOLVE_BENEATH, + .out.path = "root", .pass = true }, + { .name = "[beneath] ordinary path to 'etc'", + .path = "etc", .how.resolve = RESOLVE_BENEATH, + .out.path = "etc", .pass = true }, + { .name = "[beneath] ordinary path to 'etc/passwd'", + .path = "etc/passwd", .how.resolve = RESOLVE_BENEATH, + .out.path = "etc/passwd", .pass = true }, + { .name = "[beneath] relative symlink inside $root", + .path = "relsym", .how.resolve = RESOLVE_BENEATH, + .out.path = "etc/passwd", .pass = true }, + { .name = "[beneath] chained-'..' relative symlink inside $root", + .path = "cheeky/passwd", .how.resolve = RESOLVE_BENEATH, + .out.path = "etc/passwd", .pass = true }, + { .name = "[beneath] absolute symlink component outside $root", + .path = "abscheeky/passwd", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] absolute symlink target outside $root", + .path = "abssym", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] absolute path outside $root", + .path = "/etc/passwd", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] cheeky absolute path outside $root", + .path = "cheeky/abspasswd", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] chained cheeky absolute path outside $root", + .path = "abscheeky/abspasswd", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + /* Tricky paths should fail. */ + { .name = "[beneath] tricky '..'-chained symlink outside $root", + .path = "cheeky/dotdotlink", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] tricky absolute + '..'-chained symlink outside $root", + .path = "abscheeky/dotdotlink", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] tricky garbage link outside $root", + .path = "cheeky/garbagelink", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + { .name = "[beneath] tricky absolute + garbage link outside $root", + .path = "abscheeky/garbagelink", .how.resolve = RESOLVE_BENEATH, + .out.err = -EXDEV, .pass = false }, + + /** RESOLVE_IN_ROOT **/ + /* All attempts to cross the dirfd will be scoped-to-root. */ + { .name = "[in_root] jump to /", + .path = "/", .how.resolve = RESOLVE_IN_ROOT, + .out.path = NULL, .pass = true }, + { .name = "[in_root] absolute symlink to /root", + .path = "cheeky/absself", .how.resolve = RESOLVE_IN_ROOT, + .out.path = NULL, .pass = true }, + { .name = "[in_root] chained absolute symlinks to /root", + .path = "abscheeky/absself", .how.resolve = RESOLVE_IN_ROOT, + .out.path = NULL, .pass = true }, + { .name = "[in_root] '..' at root", + .path = "..", .how.resolve = RESOLVE_IN_ROOT, + .out.path = NULL, .pass = true }, + { .name = "[in_root] '../root' at root", + .path = "../root/", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "root", .pass = true }, + { .name = "[in_root] relative symlink containing '..' above root", + .path = "cheeky/self", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "root", .pass = true }, + { .name = "[in_root] garbage link to /root", + .path = "cheeky/garbageself", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "root", .pass = true }, + { .name = "[in_root] chainged garbage links to /root", + .path = "abscheeky/garbageself", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "root", .pass = true }, + { .name = "[in_root] relative path to 'root'", + .path = "root", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "root", .pass = true }, + { .name = "[in_root] relative path to 'etc'", + .path = "etc", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc", .pass = true }, + { .name = "[in_root] relative path to 'etc/passwd'", + .path = "etc/passwd", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .name = "[in_root] relative symlink to 'etc/passwd'", + .path = "relsym", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .name = "[in_root] chained-'..' relative symlink to 'etc/passwd'", + .path = "cheeky/passwd", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .name = "[in_root] chained-'..' absolute + relative symlink to 'etc/passwd'", + .path = "abscheeky/passwd", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .name = "[in_root] absolute symlink to 'etc/passwd'", + .path = "abssym", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .name = "[in_root] absolute path 'etc/passwd'", + .path = "/etc/passwd", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .name = "[in_root] cheeky absolute path 'etc/passwd'", + .path = "cheeky/abspasswd", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .name = "[in_root] chained cheeky absolute path 'etc/passwd'", + .path = "abscheeky/abspasswd", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .name = "[in_root] tricky '..'-chained symlink outside $root", + .path = "cheeky/dotdotlink", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .name = "[in_root] tricky absolute + '..'-chained symlink outside $root", + .path = "abscheeky/dotdotlink", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .name = "[in_root] tricky absolute path + absolute + '..'-chained symlink outside $root", + .path = "/../../../../abscheeky/dotdotlink", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .name = "[in_root] tricky garbage link outside $root", + .path = "cheeky/garbagelink", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .name = "[in_root] tricky absolute + garbage link outside $root", + .path = "abscheeky/garbagelink", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + { .name = "[in_root] tricky absolute path + absolute + garbage link outside $root", + .path = "/../../../../abscheeky/garbagelink", .how.resolve = RESOLVE_IN_ROOT, + .out.path = "etc/passwd", .pass = true }, + /* O_CREAT should handle trailing symlinks correctly. */ + { .name = "[in_root] O_CREAT of relative path inside $root", + .path = "newfile1", .how.flags = O_CREAT, + .how.mode = 0700, + .how.resolve = RESOLVE_IN_ROOT, + .out.path = "newfile1", .pass = true }, + { .name = "[in_root] O_CREAT of absolute path", + .path = "/newfile2", .how.flags = O_CREAT, + .how.mode = 0700, + .how.resolve = RESOLVE_IN_ROOT, + .out.path = "newfile2", .pass = true }, + { .name = "[in_root] O_CREAT of tricky symlink outside root", + .path = "/creatlink", .how.flags = O_CREAT, + .how.mode = 0700, + .how.resolve = RESOLVE_IN_ROOT, + .out.path = "newfile3", .pass = true }, + + /** RESOLVE_NO_XDEV **/ + /* Crossing *down* into a mountpoint is disallowed. */ + { .name = "[no_xdev] cross into $mnt", + .path = "mnt", .how.resolve = RESOLVE_NO_XDEV, + .out.err = -EXDEV, .pass = false }, + { .name = "[no_xdev] cross into $mnt/", + .path = "mnt/", .how.resolve = RESOLVE_NO_XDEV, + .out.err = -EXDEV, .pass = false }, + { .name = "[no_xdev] cross into $mnt/.", + .path = "mnt/.", .how.resolve = RESOLVE_NO_XDEV, + .out.err = -EXDEV, .pass = false }, + /* Crossing *up* out of a mountpoint is disallowed. */ + { .name = "[no_xdev] goto mountpoint root", + .dir = "mnt", .path = ".", .how.resolve = RESOLVE_NO_XDEV, + .out.path = "mnt", .pass = true }, + { .name = "[no_xdev] cross up through '..'", + .dir = "mnt", .path = "..", .how.resolve = RESOLVE_NO_XDEV, + .out.err = -EXDEV, .pass = false }, + { .name = "[no_xdev] temporary cross up through '..'", + .dir = "mnt", .path = "../mnt", .how.resolve = RESOLVE_NO_XDEV, + .out.err = -EXDEV, .pass = false }, + { .name = "[no_xdev] temporary relative symlink cross up", + .dir = "mnt", .path = "self", .how.resolve = RESOLVE_NO_XDEV, + .out.err = -EXDEV, .pass = false }, + { .name = "[no_xdev] temporary absolute symlink cross up", + .dir = "mnt", .path = "absself", .how.resolve = RESOLVE_NO_XDEV, + .out.err = -EXDEV, .pass = false }, + /* Jumping to "/" is ok, but later components cannot cross. */ + { .name = "[no_xdev] jump to / directly", + .dir = "mnt", .path = "/", .how.resolve = RESOLVE_NO_XDEV, + .out.path = "/", .pass = true }, + { .name = "[no_xdev] jump to / (from /) directly", + .dir = "/", .path = "/", .how.resolve = RESOLVE_NO_XDEV, + .out.path = "/", .pass = true }, + { .name = "[no_xdev] jump to / then proc", + .path = "/proc/1", .how.resolve = RESOLVE_NO_XDEV, + .out.err = -EXDEV, .pass = false }, + { .name = "[no_xdev] jump to / then tmp", + .path = "/tmp", .how.resolve = RESOLVE_NO_XDEV, + .out.err = -EXDEV, .pass = false }, + /* Magic-links are blocked since they can switch vfsmounts. */ + { .name = "[no_xdev] cross through magic-link to self/root", + .dir = "/proc", .path = "self/root", .how.resolve = RESOLVE_NO_XDEV, + .out.err = -EXDEV, .pass = false }, + { .name = "[no_xdev] cross through magic-link to self/cwd", + .dir = "/proc", .path = "self/cwd", .how.resolve = RESOLVE_NO_XDEV, + .out.err = -EXDEV, .pass = false }, + /* Except magic-link jumps inside the same vfsmount. */ + { .name = "[no_xdev] jump through magic-link to same procfs", + .dir = "/proc", .path = hardcoded_fdpath, .how.resolve = RESOLVE_NO_XDEV, + .out.path = "/proc", .pass = true, }, + + /** RESOLVE_NO_MAGICLINKS **/ + /* Regular symlinks should work. */ + { .name = "[no_magiclinks] ordinary relative symlink", + .path = "relsym", .how.resolve = RESOLVE_NO_MAGICLINKS, + .out.path = "etc/passwd", .pass = true }, + /* Magic-links should not work. */ + { .name = "[no_magiclinks] symlink to magic-link", + .path = "procexe", .how.resolve = RESOLVE_NO_MAGICLINKS, + .out.err = -ELOOP, .pass = false }, + { .name = "[no_magiclinks] normal path to magic-link", + .path = "/proc/self/exe", .how.resolve = RESOLVE_NO_MAGICLINKS, + .out.err = -ELOOP, .pass = false }, + { .name = "[no_magiclinks] normal path to magic-link with O_NOFOLLOW", + .path = "/proc/self/exe", .how.flags = O_NOFOLLOW, + .how.resolve = RESOLVE_NO_MAGICLINKS, + .out.path = procselfexe, .pass = true }, + { .name = "[no_magiclinks] symlink to magic-link path component", + .path = "procroot/etc", .how.resolve = RESOLVE_NO_MAGICLINKS, + .out.err = -ELOOP, .pass = false }, + { .name = "[no_magiclinks] magic-link path component", + .path = "/proc/self/root/etc", .how.resolve = RESOLVE_NO_MAGICLINKS, + .out.err = -ELOOP, .pass = false }, + { .name = "[no_magiclinks] magic-link path component with O_NOFOLLOW", + .path = "/proc/self/root/etc", .how.flags = O_NOFOLLOW, + .how.resolve = RESOLVE_NO_MAGICLINKS, + .out.err = -ELOOP, .pass = false }, + + /** RESOLVE_NO_SYMLINKS **/ + /* Normal paths should work. */ + { .name = "[no_symlinks] ordinary path to '.'", + .path = ".", .how.resolve = RESOLVE_NO_SYMLINKS, + .out.path = NULL, .pass = true }, + { .name = "[no_symlinks] ordinary path to 'root'", + .path = "root", .how.resolve = RESOLVE_NO_SYMLINKS, + .out.path = "root", .pass = true }, + { .name = "[no_symlinks] ordinary path to 'etc'", + .path = "etc", .how.resolve = RESOLVE_NO_SYMLINKS, + .out.path = "etc", .pass = true }, + { .name = "[no_symlinks] ordinary path to 'etc/passwd'", + .path = "etc/passwd", .how.resolve = RESOLVE_NO_SYMLINKS, + .out.path = "etc/passwd", .pass = true }, + /* Regular symlinks are blocked. */ + { .name = "[no_symlinks] relative symlink target", + .path = "relsym", .how.resolve = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + { .name = "[no_symlinks] relative symlink component", + .path = "reletc/passwd", .how.resolve = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + { .name = "[no_symlinks] absolute symlink target", + .path = "abssym", .how.resolve = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + { .name = "[no_symlinks] absolute symlink component", + .path = "absetc/passwd", .how.resolve = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + { .name = "[no_symlinks] cheeky garbage link", + .path = "cheeky/garbagelink", .how.resolve = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + { .name = "[no_symlinks] cheeky absolute + garbage link", + .path = "abscheeky/garbagelink", .how.resolve = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + { .name = "[no_symlinks] cheeky absolute + absolute symlink", + .path = "abscheeky/absself", .how.resolve = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + /* Trailing symlinks with NO_FOLLOW. */ + { .name = "[no_symlinks] relative symlink with O_NOFOLLOW", + .path = "relsym", .how.flags = O_NOFOLLOW, + .how.resolve = RESOLVE_NO_SYMLINKS, + .out.path = "relsym", .pass = true }, + { .name = "[no_symlinks] absolute symlink with O_NOFOLLOW", + .path = "abssym", .how.flags = O_NOFOLLOW, + .how.resolve = RESOLVE_NO_SYMLINKS, + .out.path = "abssym", .pass = true }, + { .name = "[no_symlinks] trailing symlink with O_NOFOLLOW", + .path = "cheeky/garbagelink", .how.flags = O_NOFOLLOW, + .how.resolve = RESOLVE_NO_SYMLINKS, + .out.path = "cheeky/garbagelink", .pass = true }, + { .name = "[no_symlinks] multiple symlink components with O_NOFOLLOW", + .path = "abscheeky/absself", .how.flags = O_NOFOLLOW, + .how.resolve = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + { .name = "[no_symlinks] multiple symlink (and garbage link) components with O_NOFOLLOW", + .path = "abscheeky/garbagelink", .how.flags = O_NOFOLLOW, + .how.resolve = RESOLVE_NO_SYMLINKS, + .out.err = -ELOOP, .pass = false }, + }; + + BUILD_BUG_ON(ARRAY_LEN(tests) != NUM_OPENAT2_OPATH_TESTS); + + for (int i = 0; i < ARRAY_LEN(tests); i++) { + int dfd, fd; + char *fdpath = NULL; + bool failed; + void (*resultfn)(const char *msg, ...) = ksft_test_result_pass; + struct basic_test *test = &tests[i]; + + if (!openat2_supported) { + ksft_print_msg("openat2(2) unsupported\n"); + resultfn = ksft_test_result_skip; + goto skip; + } + + /* Auto-set O_PATH. */ + if (!(test->how.flags & O_CREAT)) + test->how.flags |= O_PATH; + + if (test->dir) + dfd = openat(rootfd, test->dir, O_PATH | O_DIRECTORY); + else + dfd = dup(rootfd); + E_assert(dfd, "failed to openat root '%s': %m", test->dir); + + E_dup2(dfd, hardcoded_fd); + + fd = sys_openat2(dfd, test->path, &test->how); + if (test->pass) + failed = (fd < 0 || !fdequal(fd, rootfd, test->out.path)); + else + failed = (fd != test->out.err); + if (fd >= 0) { + fdpath = fdreadlink(fd); + close(fd); + } + close(dfd); + + if (failed) { + resultfn = ksft_test_result_fail; + + ksft_print_msg("openat2 unexpectedly returned "); + if (fdpath) + ksft_print_msg("%d['%s']\n", fd, fdpath); + else + ksft_print_msg("%d (%s)\n", fd, strerror(-fd)); + } + +skip: + if (test->pass) + resultfn("%s gives path '%s'\n", test->name, + test->out.path ?: "."); + else + resultfn("%s fails with %d (%s)\n", test->name, + test->out.err, strerror(-test->out.err)); + + fflush(stdout); + free(fdpath); + } + + free(procselfexe); + close(rootfd); + + free(hardcoded_fdpath); + close(hardcoded_fd); +} + +#define NUM_TESTS NUM_OPENAT2_OPATH_TESTS + +int main(int argc, char **argv) +{ + ksft_print_header(); + ksft_set_plan(NUM_TESTS); + + /* NOTE: We should be checking for CAP_SYS_ADMIN here... */ + if (geteuid() != 0) + ksft_exit_skip("all tests require euid == 0\n"); + + test_openat2_opath_tests(); + + if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0) + ksft_exit_fail(); + else + ksft_exit_pass(); +} From patchwork Tue Nov 5 09:05:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 11227297 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 411581599 for ; Tue, 5 Nov 2019 09:10:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 285AE217F5 for ; Tue, 5 Nov 2019 09:10:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730742AbfKEJJ6 (ORCPT ); Tue, 5 Nov 2019 04:09:58 -0500 Received: from mout-p-201.mailbox.org ([80.241.56.171]:9604 "EHLO mout-p-201.mailbox.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730555AbfKEJJ5 (ORCPT ); Tue, 5 Nov 2019 04:09:57 -0500 Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:105:465:1:2:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mout-p-201.mailbox.org (Postfix) with ESMTPS id 476kQ90KYtzQlBP; Tue, 5 Nov 2019 10:09:53 +0100 (CET) X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp2.mailbox.org ([80.241.60.241]) by spamfilter02.heinlein-hosting.de (spamfilter02.heinlein-hosting.de [80.241.56.116]) (amavisd-new, port 10030) with ESMTP id fOLl1hihmqMD; Tue, 5 Nov 2019 10:09:46 +0100 (CET) From: Aleksa Sarai To: Al Viro , Jeff Layton , "J. Bruce Fields" , Arnd Bergmann , David Howells , Shuah Khan , Shuah Khan , Ingo Molnar , Peter Zijlstra Cc: Aleksa Sarai , Eric Biederman , Andy Lutomirski , Andrew Morton , Alexei Starovoitov , Kees Cook , Jann Horn , Tycho Andersen , David Drysdale , Chanho Min , Oleg Nesterov , Rasmus Villemoes , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Christian Brauner , Aleksa Sarai , Linus Torvalds , containers@lists.linux-foundation.org, linux-alpha@vger.kernel.org, linux-api@vger.kernel.org, libc-alpha@sourceware.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-xtensa@linux-xtensa.org, sparclinux@vger.kernel.org Subject: [PATCH v15 9/9] Documentation: path-lookup: mention LOOKUP_MAGICLINK_JUMPED Date: Tue, 5 Nov 2019 20:05:53 +1100 Message-Id: <20191105090553.6350-10-cyphar@cyphar.com> In-Reply-To: <20191105090553.6350-1-cyphar@cyphar.com> References: <20191105090553.6350-1-cyphar@cyphar.com> MIME-Version: 1.0 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Now that we have a special flag to signify magic-link jumps, mention it within the path-lookup docs. And now that "magic link" is the correct term for nd_jump_link()-style symlinks, clean up references to this type of "symlink". Signed-off-by: Aleksa Sarai --- Documentation/filesystems/path-lookup.rst | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst index 434a07b0002b..2c32795389bd 100644 --- a/Documentation/filesystems/path-lookup.rst +++ b/Documentation/filesystems/path-lookup.rst @@ -405,6 +405,10 @@ is requested. Keeping a reference in the ``nameidata`` ensures that only one root is in effect for the entire path walk, even if it races with a ``chroot()`` system call. +It should be noted that in the case of ``LOOKUP_IN_ROOT`` or +``LOOKUP_BENEATH``, the effective root becomes the directory file descriptor +passed to ``openat2()`` (which exposes these ``LOOKUP_`` flags). + The root is needed when either of two conditions holds: (1) either the pathname or a symbolic link starts with a "'/'", or (2) a "``..``" component is being handled, since "``..``" from the root must always stay @@ -1149,7 +1153,7 @@ so ``NULL`` is returned to indicate that the symlink can be released and the stack frame discarded. The other case involves things in ``/proc`` that look like symlinks but -aren't really:: +aren't really (and are therefore commonly referred to as "magic-links"):: $ ls -l /proc/self/fd/1 lrwx------ 1 neilb neilb 64 Jun 13 10:19 /proc/self/fd/1 -> /dev/pts/4 @@ -1310,12 +1314,14 @@ longer needed. ``LOOKUP_JUMPED`` means that the current dentry was chosen not because it had the right name but for some other reason. This happens when following "``..``", following a symlink to ``/``, crossing a mount point -or accessing a "``/proc/$PID/fd/$FD``" symlink. In this case the -filesystem has not been asked to revalidate the name (with -``d_revalidate()``). In such cases the inode may still need to be -revalidated, so ``d_op->d_weak_revalidate()`` is called if +or accessing a "``/proc/$PID/fd/$FD``" symlink (also known as a "magic +link"). In this case the filesystem has not been asked to revalidate the +name (with ``d_revalidate()``). In such cases the inode may still need +to be revalidated, so ``d_op->d_weak_revalidate()`` is called if ``LOOKUP_JUMPED`` is set when the look completes - which may be at the -final component or, when creating, unlinking, or renaming, at the penultimate component. +final component or, when creating, unlinking, or renaming, at the +penultimate component. ``LOOKUP_MAGICLINK_JUMPED`` is set alongside +``LOOKUP_JUMPED`` if a magic-link was traversed. Final-component flags ~~~~~~~~~~~~~~~~~~~~~