From patchwork Thu Apr 23 00:26:31 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Colascione
X-Patchwork-Id: 11504765
Date: Wed, 22 Apr 2020 17:26:31 -0700
In-Reply-To: <20200423002632.224776-1-dancol@google.com>
Message-Id: <20200423002632.224776-2-dancol@google.com>
References: <20200423002632.224776-1-dancol@google.com>
X-Mailer: git-send-email 2.26.2.303.gf8c07b1a785-goog
Subject: [PATCH 1/2] Add UFFD_USER_MODE_ONLY
From: Daniel Colascione
To: Jonathan Corbet, Alexander Viro, Luis Chamberlain, Kees Cook,
 Iurii Zaikin, Mauro Carvalho Chehab, Andrew Morton, Andy Shevchenko,
 Vlastimil Babka, Mel Gorman, Sebastian Andrzej Siewior, Peter Xu,
 Daniel Colascione, Andrea Arcangeli, Mike Rapoport, Jerome Glisse,
 Shaohua Li, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, timmurray@google.com, minchan@google.com,
 sspatil@google.com, lokeshgidra@google.com
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

userfaultfd handles page faults from both user and kernel code.
Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the
resulting userfaultfd object refuse to handle faults from kernel mode,
treating these faults as if SIGBUS were always raised, causing the
kernel code to fail with EFAULT.

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.

Signed-off-by: Daniel Colascione
---
 fs/userfaultfd.c                 | 7 ++++++-
 include/uapi/linux/userfaultfd.h | 9 +++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e39fdec8a0b0..21378abe8f7b 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -418,6 +418,9 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
+	if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+	    ctx->flags & UFFD_USER_MODE_ONLY)
+		goto out;
 
 	/*
 	 * If it's already released don't get it. This avoids to loop
@@ -2003,6 +2006,7 @@ static void init_once_userfaultfd_ctx(void *mem)
 
 SYSCALL_DEFINE1(userfaultfd, int, flags)
 {
+	static const int uffd_flags = UFFD_USER_MODE_ONLY;
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
@@ -2012,10 +2016,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	BUG_ON(!current->mm);
 
 	/* Check the UFFD_* constants for consistency. */
+	BUILD_BUG_ON(uffd_flags & UFFD_SHARED_FCNTL_FLAGS);
 	BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
 	BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-	if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+	if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | uffd_flags))
 		return -EINVAL;
 
 	ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@ struct uffdio_writeprotect {
 	__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */

From patchwork Thu Apr 23 00:26:32 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Colascione
X-Patchwork-Id: 11504767
Date: Wed, 22 Apr 2020 17:26:32 -0700
In-Reply-To: <20200423002632.224776-1-dancol@google.com>
Message-Id: <20200423002632.224776-3-dancol@google.com>
References: <20200423002632.224776-1-dancol@google.com>
Subject: [PATCH 2/2] Add a new sysctl knob:
 unprivileged_userfaultfd_user_mode_only
From: Daniel Colascione

This sysctl can be set to either zero or one. When zero (the default)
the system lets all users call userfaultfd with or without
UFFD_USER_MODE_ONLY, modulo other access controls. When
unprivileged_userfaultfd_user_mode_only is set to one, users without
CAP_SYS_PTRACE must pass UFFD_USER_MODE_ONLY to userfaultfd or the API
will fail with EPERM. This facility allows administrators to reduce
the likelihood that an attacker with access to userfaultfd can delay
faulting kernel code to widen timing windows for other exploits.

Signed-off-by: Daniel Colascione
---
 Documentation/admin-guide/sysctl/vm.rst | 13 +++++++++++++
 fs/userfaultfd.c                        | 11 ++++++++++-
 include/linux/userfaultfd_k.h           |  1 +
 kernel/sysctl.c                         |  9 +++++++++
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 0329a4d3fa9e..4296b508ab74 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -850,6 +850,19 @@ privileged users (with SYS_CAP_PTRACE capability).
 
 The default value is 1.
 
+unprivileged_userfaultfd_user_mode_only
+========================================
+
+This flag controls whether unprivileged users can use the userfaultfd
+system calls to handle page faults in kernel mode. If set to zero,
+userfaultfd works with or without UFFD_USER_MODE_ONLY, modulo
+unprivileged_userfaultfd above. If set to one, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd
+to succeed. Prohibiting use of userfaultfd for handling faults from
+kernel mode may make certain vulnerabilities more difficult
+to exploit.
+
+The default value is 0.
 
 user_reserve_kbytes
 ===================
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 21378abe8f7b..85cc1ab74361 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -29,6 +29,7 @@
 #include
 
 int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd_user_mode_only __read_mostly = 0;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -2009,8 +2010,16 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	static const int uffd_flags = UFFD_USER_MODE_ONLY;
 	struct userfaultfd_ctx *ctx;
 	int fd;
+	bool need_cap_check = false;
 
-	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+	if (!sysctl_unprivileged_userfaultfd)
+		need_cap_check = true;
+
+	if (sysctl_unprivileged_userfaultfd_user_mode_only &&
+	    (flags & UFFD_USER_MODE_ONLY) == 0)
+		need_cap_check = true;
+
+	if (need_cap_check && !capable(CAP_SYS_PTRACE))
 		return -EPERM;
 
 	BUG_ON(!current->mm);
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index a8e5f3ea9bb2..d81e30074bf5 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -31,6 +31,7 @@
 #define UFFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS)
 
 extern int sysctl_unprivileged_userfaultfd;
+extern int sysctl_unprivileged_userfaultfd_user_mode_only;
 
 extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason);
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8a176d8727a3..9cbdf4483961 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1719,6 +1719,15 @@ static struct ctl_table vm_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
+	{
+		.procname	= "unprivileged_userfaultfd_user_mode_only",
+		.data		= &sysctl_unprivileged_userfaultfd_user_mode_only,
+		.maxlen		= sizeof(sysctl_unprivileged_userfaultfd_user_mode_only),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 #endif
 	{ }
 };
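
Not part of the series: the combined effect of the two patches on
userfaultfd(2) can be sketched as a small userspace model. This is only
an illustration under stated assumptions, not kernel code. The function
names uffd_create_check() and uffd_refuses_fault() and their parameters
are hypothetical; the value of UFFD_USER_MODE_ONLY and the order of the
checks follow the hunks in this series, and UFFD_SHARED_FCNTL_FLAGS is
written out as O_CLOEXEC | O_NONBLOCK, which is how the kernel header
defines it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Value taken from the uapi hunk in patch 1/2. */
#define UFFD_USER_MODE_ONLY 1
/* Written out here; the kernel defines it as these two fcntl flags. */
#define UFFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)

/* Userspace stand-in for BUILD_BUG_ON(uffd_flags & UFFD_SHARED_FCNTL_FLAGS). */
static_assert((UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS) == 0,
	      "UFFD_USER_MODE_ONLY must not overlap the fcntl-style flags");

/*
 * Hypothetical model of the checks at the top of the patched syscall.
 * sysctl_unpriv and sysctl_user_mode_only stand for the two vm.* knobs;
 * has_cap_sys_ptrace stands for capable(CAP_SYS_PTRACE).
 * Returns 0 if creation may proceed, otherwise a negative errno value.
 */
int uffd_create_check(int flags, int sysctl_unpriv,
		      int sysctl_user_mode_only, bool has_cap_sys_ptrace)
{
	bool need_cap_check = false;

	/* Existing knob: unprivileged use disabled altogether. */
	if (!sysctl_unpriv)
		need_cap_check = true;

	/* New knob: callers that omit the flag need the capability. */
	if (sysctl_user_mode_only && (flags & UFFD_USER_MODE_ONLY) == 0)
		need_cap_check = true;

	if (need_cap_check && !has_cap_sys_ptrace)
		return -EPERM;

	/* Flag validation happens after the permission check. */
	if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
		return -EINVAL;

	return 0;
}

/*
 * Hypothetical model of the early exit added to handle_userfault():
 * returns true when the fault is refused rather than queued.
 */
bool uffd_refuses_fault(int ctx_flags, bool fault_from_user)
{
	return !fault_from_user && (ctx_flags & UFFD_USER_MODE_ONLY) != 0;
}
```

With both knobs at their defaults (1 and 0), uffd_create_check(0, 1, 0,
false) returns 0. Setting the new knob to one makes the same call return
-EPERM unless the caller passes UFFD_USER_MODE_ONLY or holds
CAP_SYS_PTRACE. Because the permission check runs first, an unprivileged
caller that passes an unknown flag under the restricted setting sees
-EPERM rather than -EINVAL.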