[v13,1/9] namei: obey trailing magic-link DAC permissions

The ability for userspace to "re-open" file descriptors through
/proc/self/fd has been a very useful tool for all sorts of use cases
(container runtimes are one common example). However, the current
interface for doing this has resulted in some pretty subtle security
holes. Userspace can re-open a file descriptor with more permissions
than the original, which can result in cases such as /proc/$pid/exe
being re-opened O_RDWR at a later date even though (by definition)
/proc/$pid/exe cannot be opened for writing. When combined with O_PATH
the results can get even more confusing.
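For readers unfamiliar with the mechanism, the following is a minimal userspace sketch of the kind of re-opening described above. An O_PATH descriptor, which cannot be read(2) from directly, is upgraded to a readable one by opening its /proc/self/fd entry. The helper name reopen_fd() is hypothetical, and the sketch assumes procfs is mounted at /proc.

```c
#define _GNU_SOURCE /* for O_PATH */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: re-open an existing descriptor with new open(2)
 * flags by resolving its /proc/self/fd/N magic-link.
 * Returns the new fd, or -1 with errno set.
 */
static int reopen_fd(int fd, int flags)
{
	char path[64];

	assert(fd >= 0);
	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
	return open(path, flags);
}
```

For example, after `int pfd = open(file, O_PATH);` a read(2) on pfd fails with EBADF, yet `reopen_fd(pfd, O_RDONLY)` returns an ordinary readable descriptor for the same file. Nothing in this path consults the access mode of the original descriptor, which is the gap this patch addresses.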

We cannot block this outright. Aside from userspace already depending on
it, it's a useful feature which can actually increase the security of
userspace. For instance, LXC keeps an O_PATH of the container's
/dev/pts/ptmx that gets re-opened to create new ptys and then uses
TIOCGPTPEER to get the slave end. This allows for pty allocation without
resolving paths inside an (untrusted) container's rootfs. There isn't a
trivial way of doing this that is as straightforward and safe as O_PATH
re-opening.
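The LXC pattern above can be sketched in userspace roughly as follows. This is an illustration, not LXC's code: the helper name alloc_pty_from_opath() is hypothetical, the /proc/self/fd re-open assumes procfs is mounted, and the fallback TIOCGPTPEER definition is only there for older headers (the ioctl itself needs Linux 4.13 or later). Under this patch the re-open still succeeds, because the O_PATH target is the ptmx device node itself, not a magic-link.

```c
#define _GNU_SOURCE /* O_PATH, unlockpt() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef TIOCGPTPEER
#define TIOCGPTPEER _IO('T', 0x41) /* Linux >= 4.13 */
#endif

/*
 * Hypothetical helper: allocate a pty pair from an O_PATH handle to a
 * ptmx device. Returns the master fd and stores the peer (slave) fd in
 * *peer; on failure returns -1 and leaves *peer as -1.
 */
static int alloc_pty_from_opath(int ptmx_path_fd, int *peer)
{
	char path[64];
	int master;

	assert(peer != NULL);
	*peer = -1;

	/* Re-open the O_PATH handle through its magic-link. */
	snprintf(path, sizeof(path), "/proc/self/fd/%d", ptmx_path_fd);
	master = open(path, O_RDWR | O_NOCTTY);
	if (master < 0)
		return -1;

	/* New ptys start locked; unlock before opening the peer. */
	if (unlockpt(master) < 0)
		goto fail;

	/* Get the slave end without resolving any /dev/pts/N path. */
	*peer = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY);
	if (*peer < 0)
		goto fail;

	return master;

fail:
	close(master);
	return -1;
}
```

A container runtime would open the container's /dev/pts/ptmx with O_PATH once and call such a helper for every new pty, so no path inside the untrusted rootfs is resolved after setup.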

Instead we have to restrict it in such a way that it doesn't break
(good) users but does block potential attackers. The solution applied in
this patch is to restrict *re-opening* (not resolution through)
magic-links by requiring that the mode of the link be obeyed. Normal
symlinks have modes of a+rwx but magic-links have other modes. These
magic-link modes were historically ignored during path resolution, but
they've now been re-purposed for more useful ends.
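To make the rule concrete, here is a simplified userspace model. It is not the kernel code from this patch. It treats the magic-link's permission bits as an upper bound on the access a re-open may request. The function name magiclink_permits() and the choice to accept a read or write bit from any of the user/group/other classes are assumptions made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Simplified model (hypothetical): may a re-open with open(2) `flags`
 * go through a magic-link whose mode is `link_mode`? Each requested
 * access must be granted by a matching bit in some permission class.
 */
static bool magiclink_permits(mode_t link_mode, int flags)
{
	int acc = flags & O_ACCMODE;
	bool want_read = (acc == O_RDONLY || acc == O_RDWR);
	bool want_write = (acc == O_WRONLY || acc == O_RDWR);

	if (want_read && !(link_mode & (S_IRUSR | S_IRGRP | S_IROTH)))
		return false;
	if (want_write && !(link_mode & (S_IWUSR | S_IWGRP | S_IWOTH)))
		return false;
	return true;
}
```

An ordinary symlink (0777) permits everything, while a link whose mode lacks all write bits (0500 in this model) refuses O_RDWR. That is the kind of restriction needed for the /proc/$pid/exe case described earlier.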

It is also necessary to define semantics for the mode of an O_PATH
descriptor, since re-opening a magic-link through an O_PATH needs to be
just as restricted as the corresponding magic-link -- otherwise the
above protection can be bypassed. There are two distinct cases:

 1. The target is a regular file (not a magic-link). Userspace depends
    on being able to re-open the O_PATH of a regular file, so we must
    define the mode to be a+rwx.

 2. The target is a magic-link. In this case, we simply copy the mode of
    the magic-link. This results in an O_PATH of a magic-link
    effectively acting as a no-op in terms of how much re-opening
    privileges a process has.

CAP_DAC_OVERRIDE can be used to override all of these restrictions, but
we only permit &init_userns's capabilities to affect these semantics.
The reason for this is that there isn't a clear way to track what
user_ns is the original owner of a given O_PATH chain -- thus an
unprivileged user could create a new userns and O_PATH the file
descriptor, owning it. All signs would indicate that the user really
does have CAP_DAC_OVERRIDE over the new descriptor and the protection
would be bypassed. We thus opt for the more conservative approach.

I have run this patch on several machines for several days. So far, the
only processes which have hit this case ("loadkeys" and "kbd_mode" from
the kbd package[1]) gracefully handle the permission error and do not
cause any user-visible problems. In order to give users a heads-up, a
warning is output to dmesg whenever may_open_magiclink() refuses access.

Additionally, in order to avoid an attack that Jann Horn found
(involving swapping a single fd between a re-openable file and a
non-reopenable one), we must recompute and save the relevant DAC mode
when doing the jump in nd_jump_link() -- rather than just using
nd->link_inode->i_mode. A PoC of this attack is included as a selftest
later in the patch series.

[1]: http://git.altlinux.org/people/legion/packages/kbd.git

Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
 fs/internal.h                  |   1 +
 fs/namei.c                     | 111 +++++++++++++++++++++++++++++----
 fs/open.c                      |   3 +-
 fs/proc/base.c                 |  49 ++++++++++-----
 fs/proc/fd.c                   |  45 ++++++++++---
 fs/proc/internal.h             |   2 +-
 fs/proc/namespaces.c           |   2 +-
 include/linux/fs.h             |   4 ++
 include/linux/namei.h          |   5 +-
 security/apparmor/apparmorfs.c |   2 +-
 10 files changed, 183 insertions(+), 41 deletions(-)

Message-ID: <20190930183316.10190-2-cyphar@cyphar.com>
From: Aleksa Sarai <cyphar@cyphar.com>
Date: Tue, 1 Oct 2019 04:33:08 +1000
Subject: [PATCH v13 1/9] namei: obey trailing magic-link DAC permissions

Series: namei: openat2(2) path resolution restrictions
  [v13,0/9] namei: openat2(2) path resolution restrictions
  [v13,1/9] namei: obey trailing magic-link DAC permissions
  [v13,2/9] procfs: switch magic-link modes to be more sane
  [v13,3/9] open: O_EMPTYPATH: procfs-less file descriptor re-opening
  [v13,4/9] namei: O_BENEATH-style path resolution flags
  [v13,5/9] namei: LOOKUP_IN_ROOT: chroot-like path resolution
  [v13,6/9] namei: permit ".." resolution with LOOKUP_{IN_ROOT,BENEATH}
  [v13,7/9] open: openat2(2) syscall
  [v13,8/9] selftests: add openat2(2) selftests
  [v13,9/9] Documentation: update path-lookup to mention trailing magic-links