From patchwork Sun Oct 11 06:24:55 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lokesh Gidra
X-Patchwork-Id: 11830677
Date: Sat, 10 Oct 2020 23:24:55 -0700
In-Reply-To: <20201011062456.4065576-1-lokeshgidra@google.com>
Message-Id: <20201011062456.4065576-2-lokeshgidra@google.com>
References: <20201011062456.4065576-1-lokeshgidra@google.com>
X-Mailer: git-send-email 2.28.0.1011.ga647a8990f-goog
Subject: [PATCH v5 1/2] Add UFFD_USER_MODE_ONLY
From: Lokesh Gidra
To: Kees Cook, Jonathan Corbet, Peter Xu, Andrea Arcangeli,
 Sebastian Andrzej Siewior, Andrew Morton
Cc: Alexander Viro, Stephen Smalley, Eric Biggers, Lokesh Gidra,
 Daniel Colascione, "Joel Fernandes (Google)",
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, kaleshsingh@google.com, calin@google.com,
 surenb@google.com, nnk@google.com, jeffv@google.com,
 kernel-team@android.com, Mike Rapoport, Shaohua Li, Jerome Glisse,
 Mauro Carvalho Chehab, Johannes Weiner, Mel Gorman, Nitin Gupta,
 Vlastimil Babka, Iurii Zaikin, Luis Chamberlain, Daniel Colascione
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

userfaultfd handles page faults from both user and kernel code.
Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
the resulting userfaultfd object refuse to handle faults from kernel
mode, treating these faults as if SIGBUS were always raised, causing
the kernel code to fail with EFAULT.

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.

Signed-off-by: Daniel Colascione
Signed-off-by: Lokesh Gidra
Reviewed-by: Andrea Arcangeli
---
 fs/userfaultfd.c                 | 10 +++++++++-
 include/uapi/linux/userfaultfd.h |  9 +++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..bd229f06d4e9 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -405,6 +405,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
+	if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+	    ctx->flags & UFFD_USER_MODE_ONLY) {
+		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
+			"sysctl knob to 1 if kernel faults must be handled "
+			"without obtaining CAP_SYS_PTRACE capability\n");
+		goto out;
+	}
 
 	/*
 	 * If it's already released don't get it. This avoids to loop
@@ -1975,10 +1982,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	BUG_ON(!current->mm);
 
 	/* Check the UFFD_* constants for consistency. */
+	BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
 	BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
 	BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-	if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+	if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
 		return -EINVAL;
 
 	ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@ struct uffdio_writeprotect {
 	__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */

From patchwork Sun Oct 11 06:24:56 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lokesh Gidra
X-Patchwork-Id: 11830679
Date: Sat, 10 Oct 2020 23:24:56 -0700
In-Reply-To: <20201011062456.4065576-1-lokeshgidra@google.com>
Message-Id: <20201011062456.4065576-3-lokeshgidra@google.com>
References: <20201011062456.4065576-1-lokeshgidra@google.com>
X-Mailer: git-send-email 2.28.0.1011.ga647a8990f-goog
Subject: [PATCH v5 2/2] Add user-mode only option to
 unprivileged_userfaultfd sysctl knob
From: Lokesh Gidra
To: Kees Cook, Jonathan Corbet, Peter Xu, Andrea Arcangeli,
 Sebastian Andrzej Siewior, Andrew Morton
Cc: Alexander Viro, Stephen Smalley, Eric Biggers, Lokesh Gidra,
 Daniel Colascione, "Joel Fernandes (Google)",
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, kaleshsingh@google.com, calin@google.com,
 surenb@google.com, nnk@google.com, jeffv@google.com,
 kernel-team@android.com, Mike Rapoport, Shaohua Li, Jerome Glisse,
 Mauro Carvalho Chehab, Johannes Weiner, Mel Gorman, Nitin Gupta,
 Vlastimil Babka, Iurii Zaikin, Luis Chamberlain
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

With this change, setting the knob to 0 allows unprivileged users to
call userfaultfd, as when it is set to 1, but with the restriction
that only page faults from user mode can be handled. In this mode, an
unprivileged user (without the CAP_SYS_PTRACE capability) must pass
UFFD_USER_MODE_ONLY to userfaultfd or the call will fail with EPERM.

This enables administrators to reduce the likelihood that an attacker
with access to userfaultfd can delay faulting kernel code to widen
timing windows for other exploits.

The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this causes postcopy live
migration to fail, although the failure is unnoticeable to the VM
guests. To avoid this, set 'vm.unprivileged_userfaultfd = 1' in
/etc/sysctl.conf. For more details, refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/

Signed-off-by: Lokesh Gidra
Reviewed-by: Andrea Arcangeli
---
 Documentation/admin-guide/sysctl/vm.rst | 15 ++++++++++-----
 fs/userfaultfd.c                        |  6 ++++--
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4b9d2e8e9142..4263d38c3c21 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -871,12 +871,17 @@ file-backed pages is less than the high watermark in a zone.
 unprivileged_userfaultfd
 ========================
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls. Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index bd229f06d4e9..0f8a975db3be 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include
 #include
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1976,7 +1976,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
-	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+	if (!sysctl_unprivileged_userfaultfd &&
+	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
+	    !capable(CAP_SYS_PTRACE))
 		return -EPERM;
 
 	BUG_ON(!current->mm);
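
Illustrative note, not part of the patches: a rough user-space sketch of
how a caller might use the series. The helper names
uffd_permission_model() and open_user_mode_uffd() are hypothetical and
exist only for this example. The first mirrors the EPERM check that
patch 2/2 adds to SYSCALL_DEFINE1(userfaultfd) as a pure function, and
the second requests a user-mode-only userfaultfd through syscall(2).
The fallback #define copies the value 1 from patch 1/2, on the
assumption that the installed uapi headers may predate it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value copied from patch 1/2; older uapi headers may not define it. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Hypothetical user-space model of the permission check in
 * SYSCALL_DEFINE1(userfaultfd) after patch 2/2. Returns 0 if creating
 * the userfaultfd would be permitted, -EPERM otherwise.
 */
int uffd_permission_model(int sysctl_unprivileged, int flags,
			  bool has_cap_sys_ptrace)
{
	if (!sysctl_unprivileged &&
	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
	    !has_cap_sys_ptrace)
		return -EPERM;
	return 0;
}

/*
 * Hypothetical helper: ask for a userfaultfd that only handles faults
 * raised in user mode. Returns the new fd, or -1 with errno set.
 */
int open_user_mode_uffd(void)
{
#ifdef SYS_userfaultfd
	return (int)syscall(SYS_userfaultfd,
			    O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
#else
	/* The libc headers do not know the syscall number. */
	errno = ENOSYS;
	return -1;
#endif
}
```

With the knob at 0, the model shows that an unprivileged caller is
refused only when it omits UFFD_USER_MODE_ONLY. On a kernel without
this series, the old flags check shown in patch 1/2 rejects the new
bit with EINVAL, so a caller of open_user_mode_uffd() may want to treat
EINVAL as "flag not supported" rather than as a hard failure.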