From patchwork Fri Nov 20 03:04:10 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lokesh Gidra
X-Patchwork-Id: 11919417
Date: Thu, 19 Nov 2020 19:04:10 -0800
In-Reply-To: <20201120030411.2690816-1-lokeshgidra@google.com>
Message-Id: <20201120030411.2690816-2-lokeshgidra@google.com>
References: <20201120030411.2690816-1-lokeshgidra@google.com>
X-Mailer: git-send-email 2.29.2.454.gaff20da3a2-goog
Subject: [PATCH v6 1/2] Add UFFD_USER_MODE_ONLY
From: Lokesh Gidra
To: Kees Cook, Jonathan Corbet, Peter Xu, Andrea Arcangeli,
 Sebastian Andrzej Siewior, Andrew Morton
Cc: Alexander Viro, Stephen Smalley, Eric Biggers, Lokesh Gidra,
 Daniel Colascione, "Joel Fernandes (Google)",
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, kaleshsingh@google.com, calin@google.com,
 surenb@google.com, jeffv@google.com, kernel-team@android.com,
 Mike Rapoport, Shaohua Li, Jerome Glisse, Mauro Carvalho Chehab,
 Johannes Weiner, Mel Gorman, Nitin Gupta, Vlastimil Babka,
 Iurii Zaikin, Luis Chamberlain, linux-mm@kvack.kernel.org,
 Daniel Colascione
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

userfaultfd handles page faults from both user and kernel code.
Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
the resulting userfaultfd object refuse to handle faults from kernel
mode, treating these faults as if SIGBUS were always raised, causing
the kernel code to fail with EFAULT.

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.

Signed-off-by: Daniel Colascione
Signed-off-by: Lokesh Gidra
Reviewed-by: Andrea Arcangeli
---
 fs/userfaultfd.c                 | 10 +++++++++-
 include/uapi/linux/userfaultfd.h |  9 +++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 000b457ad087..605599fde015 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -405,6 +405,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
+	if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+	    ctx->flags & UFFD_USER_MODE_ONLY) {
+		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
+			"sysctl knob to 1 if kernel faults must be handled "
+			"without obtaining CAP_SYS_PTRACE capability\n");
+		goto out;
+	}
 
 	/*
 	 * If it's already released don't get it. This avoids to loop
@@ -1965,10 +1972,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	BUG_ON(!current->mm);
 
 	/* Check the UFFD_* constants for consistency. */
+	BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
 	BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
 	BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-	if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+	if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
 		return -EINVAL;
 
 	ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@ struct uffdio_writeprotect {
 	__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */

From patchwork Fri Nov 20 03:04:11 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lokesh Gidra
X-Patchwork-Id: 11919415
Date: Thu, 19 Nov 2020 19:04:11 -0800
In-Reply-To: <20201120030411.2690816-1-lokeshgidra@google.com>
Message-Id: <20201120030411.2690816-3-lokeshgidra@google.com>
References: <20201120030411.2690816-1-lokeshgidra@google.com>
X-Mailer: git-send-email 2.29.2.454.gaff20da3a2-goog
Subject: [PATCH v6 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob
From: Lokesh Gidra
To: Kees Cook, Jonathan Corbet, Peter Xu, Andrea Arcangeli,
 Sebastian Andrzej Siewior, Andrew Morton
Cc: Alexander Viro, Stephen Smalley, Eric Biggers, Lokesh Gidra,
 Daniel Colascione, "Joel Fernandes (Google)",
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, kaleshsingh@google.com, calin@google.com,
 surenb@google.com, jeffv@google.com, kernel-team@android.com,
 Mike Rapoport, Shaohua Li, Jerome Glisse, Mauro Carvalho Chehab,
 Johannes Weiner, Mel Gorman, Nitin Gupta, Vlastimil Babka,
 Iurii Zaikin, Luis Chamberlain, linux-mm@kvack.kernel.org
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

With this change, when the knob is set to 0, it allows unprivileged
users to call userfaultfd, like when it is set to 1, but with the
restriction that only page faults from user mode can be handled.
In this mode, an unprivileged user (without the CAP_SYS_PTRACE
capability) must pass UFFD_USER_MODE_ONLY to userfaultfd or the call
will fail with EPERM.

This enables administrators to reduce the likelihood that an attacker
with access to userfaultfd can delay faulting kernel code to widen
timing windows for other exploits.

The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this will cause postcopy
live migration to fail, which will be unnoticeable to the VM guests.
To avoid this, set 'vm.unprivileged_userfaultfd = 1' in
/etc/sysctl.conf.

The main reason this change is desirable in the short term is that
the Android userland will behave as with the sysctl set to zero. So
without this commit, any Linux binary using userfaultfd to manage its
memory would behave differently if run within the Android userland.
For more details, refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/

Signed-off-by: Lokesh Gidra
Reviewed-by: Andrea Arcangeli
---
 Documentation/admin-guide/sysctl/vm.rst | 15 ++++++++++-----
 fs/userfaultfd.c                        | 10 ++++++++--
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f455fa00c00f..d06a98b2a4e7 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -873,12 +873,17 @@ file-backed pages is less than the high watermark in a zone.
 unprivileged_userfaultfd
 ========================
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls. Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 605599fde015..894cc28142e7 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include
 #include
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1966,8 +1966,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
-	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+	if (!sysctl_unprivileged_userfaultfd &&
+	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
+	    !capable(CAP_SYS_PTRACE)) {
+		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
+			"sysctl knob to 1 if kernel faults must be handled "
+			"without obtaining CAP_SYS_PTRACE capability\n");
 		return -EPERM;
+	}
 
 	BUG_ON(!current->mm);
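
Note for reviewers: the two checks added by this series can be read as a
small truth table. The sketch below is a userspace model only, not kernel
code. The function names uffd_model_create_check(),
uffd_model_fault_refused() and uffd_model_selfcheck() are hypothetical and
do not exist in the kernel; the capability and the fault origin are passed
in as plain booleans, standing in for capable(CAP_SYS_PTRACE) and
FAULT_FLAG_USER. The value of UFFD_USER_MODE_ONLY is taken from the uapi
hunk in patch 1/2.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Value taken from the uapi header hunk in patch 1/2. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Hypothetical model of the permission check that patch 2/2 adds to the
 * userfaultfd(2) entry point. Returns 0 if creation would get past the
 * check, -EPERM otherwise.
 */
static int uffd_model_create_check(int sysctl_knob, int flags,
				   bool has_cap_sys_ptrace)
{
	if (!sysctl_knob &&
	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
	    !has_cap_sys_ptrace)
		return -EPERM;
	return 0;
}

/*
 * Hypothetical model of the early exit that patch 1/2 adds to
 * handle_userfault(): a fault raised from kernel mode is refused when
 * the context was created with UFFD_USER_MODE_ONLY.
 */
static bool uffd_model_fault_refused(int ctx_flags, bool fault_from_user)
{
	return !fault_from_user && (ctx_flags & UFFD_USER_MODE_ONLY) != 0;
}

/* Walks the cases described in the two commit messages. */
static void uffd_model_selfcheck(void)
{
	/* Knob 0, no capability, no flag: refused. */
	assert(uffd_model_create_check(0, 0, false) == -EPERM);
	/* Knob 0, no capability, flag passed: allowed. */
	assert(uffd_model_create_check(0, UFFD_USER_MODE_ONLY, false) == 0);
	/* Knob 0 with the capability, or knob 1: allowed without the flag. */
	assert(uffd_model_create_check(0, 0, true) == 0);
	assert(uffd_model_create_check(1, 0, false) == 0);
	/* Only kernel-mode faults on a user-mode-only context are refused. */
	assert(uffd_model_fault_refused(UFFD_USER_MODE_ONLY, false));
	assert(!uffd_model_fault_refused(UFFD_USER_MODE_ONLY, true));
	assert(!uffd_model_fault_refused(0, false));
}
```

Calling uffd_model_selfcheck() from a test harness walks every case. In
real userspace code the flag would be OR-ed into the flags argument of
userfaultfd(2) next to O_CLOEXEC or O_NONBLOCK; kernels without this
series reject the unknown flag with EINVAL, as the pre-patch flags check
in the diff shows.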