From patchwork Thu Sep 24 06:56:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lokesh Gidra X-Patchwork-Id: 11796381 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 768C259D for ; Thu, 24 Sep 2020 06:56:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 50897235F7 for ; Thu, 24 Sep 2020 06:56:19 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="IAfJ9hUk" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727093AbgIXG4P (ORCPT ); Thu, 24 Sep 2020 02:56:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727090AbgIXG4O (ORCPT ); Thu, 24 Sep 2020 02:56:14 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1597DC0613CE for ; Wed, 23 Sep 2020 23:56:14 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 139so2008760ybe.15 for ; Wed, 23 Sep 2020 23:56:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=0lc69aA0lZ8Rvz8am0rRBV+yoECXRArvVbNXpFjuF40=; b=IAfJ9hUkQEx5zcT/VoPJFQBNov4n/VGUJ2rUk3yYjsbHcmOpHg2psmZnWDQT1HibG8 EoL1YHN1lh6kttKZYhxFxT/bhLxpS+FE5pJfN4nnr5ghPEoQFQX19vUBUc/UYBrd3aiD Y34qiKb+zTLH1vMmlVMKBg7iplLE/XMAYY4RDNo0HjImc21wcDBULe2M9Wu1rNNeJpNw OcyEND8JQgLZiLpqDHihBdOkCGy62BeywRwX0UGdoeAj+mEYZw+9IZ36lYOXp30qEtOe /3XOP5Wpdp9/In1hwPw11yVmfusFF4LGPB41iX8Pk1nN9EJL6Uygo8ox5pY8p2rh5ylP b1mQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=0lc69aA0lZ8Rvz8am0rRBV+yoECXRArvVbNXpFjuF40=; b=ph0Qfg16buEKgG33VeBKJo0lGncI86YRg2g0SGIls0OgwRAm0sErSqdoqBqjaivPiw OgNDO90KJyPSkeoReCvydjZQn/uGOrhIhhhULPgpbrsjJXCJo9JZtA9NridyfazFs+ZO 4LjlZByATq+2yDSw+r8ynr475iHiIS4ZiUyZeOH09DMBlQ4YvpmasuD9UfOuGipEIjOQ FZdh6pBpqMn7XeMVt9BiN7eBCb8O6QACmvDv5qKenJHnO7ip5p6lmgxHKsb91rJbpPZ5 VyJrkaLqDbJYnsAt+mSpnxSEJyS4H56MZcJ9eNoFom60xY0qgNQMf/IhSOcsGH2VgQ1r X6BA== X-Gm-Message-State: AOAM530knKDoaYNKskekRIVVDKVQCYLA8NPZ7oWFQqRh0XygakiYe44l icwaNfJFEr2hybjMxL77O5EHXLni8kuwO2PkzQ== X-Google-Smtp-Source: ABdhPJzMmPQPeoJFyWqE98eXL323QIE0VTj57+sU2UBDWAcdC/qsoSA0OeELDrbSu6EGODdQw9UHozulqnwRgeyFRA== Sender: "lokeshgidra via sendgmr" X-Received: from lg.mtv.corp.google.com ([2620:15c:211:202:f693:9fff:fef4:29dd]) (user=lokeshgidra job=sendgmr) by 2002:a5b:e83:: with SMTP id z3mr4661708ybr.289.1600930573277; Wed, 23 Sep 2020 23:56:13 -0700 (PDT) Date: Wed, 23 Sep 2020 23:56:05 -0700 In-Reply-To: <20200924065606.3351177-1-lokeshgidra@google.com> Message-Id: <20200924065606.3351177-2-lokeshgidra@google.com> Mime-Version: 1.0 References: <20200924065606.3351177-1-lokeshgidra@google.com> X-Mailer: git-send-email 2.28.0.681.g6f77f65b4e-goog Subject: [PATCH v4 1/2] Add UFFD_USER_MODE_ONLY From: Lokesh Gidra To: Kees Cook , Jonathan Corbet , Peter Xu , Andrea Arcangeli , Sebastian Andrzej Siewior , Andrew Morton Cc: Alexander Viro , Stephen Smalley , Eric Biggers , Lokesh Gidra , Daniel Colascione , "Joel Fernandes (Google)" , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kaleshsingh@google.com, calin@google.com, surenb@google.com, nnk@google.com, jeffv@google.com, kernel-team@android.com, Mike Rapoport , Shaohua Li , Jerome Glisse , Mauro Carvalho Chehab , Johannes Weiner , Mel Gorman , Nitin Gupta , Vlastimil Babka , Iurii Zaikin , Luis Chamberlain , Daniel Colascione Precedence: 
bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

userfaultfd handles page faults from both user and kernel code.
Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
the resulting userfaultfd object refuse to handle faults from kernel
mode, treating these faults as if SIGBUS were always raised, causing
the kernel code to fail with EFAULT.

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.

Signed-off-by: Daniel Colascione
Signed-off-by: Lokesh Gidra
---
 fs/userfaultfd.c                 | 6 +++++-
 include/uapi/linux/userfaultfd.h | 9 +++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..3191434057f3 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -405,6 +405,9 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
+	if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+	    ctx->flags & UFFD_USER_MODE_ONLY)
+		goto out;
 
 	/*
 	 * If it's already released don't get it. This avoids to loop
@@ -1975,10 +1978,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	BUG_ON(!current->mm);
 
 	/* Check the UFFD_* constants for consistency. */
+	BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
 	BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
 	BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-	if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+	if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
 		return -EINVAL;
 
 	ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@ struct uffdio_writeprotect {
 	__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */

From patchwork Thu Sep 24 06:56:06 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lokesh Gidra
X-Patchwork-Id: 11796383
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 20EB2139F for ; Thu, 24 Sep 2020 06:56:20 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F23392395A for ; Thu, 24 Sep 2020 06:56:19 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="wFNt8Gqo"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727128AbgIXG4T (ORCPT ); Thu, 24 Sep 2020 02:56:19 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44172 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727123AbgIXG4R (ORCPT ); Thu, 24 Sep 2020 02:56:17 -0400
Received: from mail-qt1-x849.google.com (mail-qt1-x849.google.com [IPv6:2607:f8b0:4864:20::849]) by lindbergh.monkeyblade.net (Postfix) with
ESMTPS id 5B573C0613CE for ; Wed, 23 Sep 2020 23:56:17 -0700 (PDT) Received: by mail-qt1-x849.google.com with SMTP id l5so1748743qtu.20 for ; Wed, 23 Sep 2020 23:56:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=ke1FEsl77Kc9oiXNsFHcCDNOwBSCt2v+UAY//5HtPgA=; b=wFNt8GqobtkvzKQbpeNuwwKD7cJM6653QEQTP0/fQ32HCibNFnNSCAdUGk7XxvxLCQ wZMVMziyIc+3KTIXWzlQ44tdEtv4Jms/lVWMSApeK+7xyiNDVGXduAgCQPWQmGScBe0y b3DZXlUy+6BpBugiPDDa6yscGSmDf4orJiy+k4J33nRIG0YifrMCLg9DsHILEenRyogr xgpuzhDOsJoD/y9ak4A+GfsFCQdUT9rNiT2C+iRojP6S0NSxY5Yy9rih4puQaZqexUQi WfclaOJ5k4/HwBP5gzQpCfxbdYlMU6bi6k0Pb6Pk2/aFrsXrwg4ro0JeWieN9YfqtCIW iqEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ke1FEsl77Kc9oiXNsFHcCDNOwBSCt2v+UAY//5HtPgA=; b=G7xaFfUqr8gZlFxbmu7a/Q2LvYxI8RWOpV5OLhVELGwZ+Poca0TqM/PanxEQ3GpqdS RwiobWWDpPBOWnS9Wi8hQg/KRflKQFcK/t1YhXDI0Q4X9P2YsJzjPXhOtkkcrjVmhwOv 6XF+ChNOKjV8WqjP7i8b40l3KG+3OjSnNTcO2MY3EoMQZmnfsxsWl+0hNnAEniuuu8yv xhOZ6YR/qRxkM/uXWQTpJtrIbtSM+4R1NXC1rOlA/ZPbHXqPLvy2v88Ge7GNq+7Fl3GZ yM3wqoJ41E2O16Ok2gQohuqMzD8RL0sxw8MJ/fcHfxAvXL6fTj9qUGam3+i+3tlrO5/q XVAQ== X-Gm-Message-State: AOAM532BxTQKIZ4c8tcVShx6ndLQqAZ5raiu0OYdHIsJDhaPlRR8vKYo RTcI9bkYhz2N/KkUSpOCEUZ+e/xnxNelf1G9WQ== X-Google-Smtp-Source: ABdhPJxza0fuBaFmbZ3gbUVLmFUZdTvAFAr+hqa4TX5v95zkmpSNEi+rzcne2bTvV+co7oFl2J4bRL0sKnZX61IOKQ== Sender: "lokeshgidra via sendgmr" X-Received: from lg.mtv.corp.google.com ([2620:15c:211:202:f693:9fff:fef4:29dd]) (user=lokeshgidra job=sendgmr) by 2002:ad4:58e3:: with SMTP id di3mr3949934qvb.54.1600930576477; Wed, 23 Sep 2020 23:56:16 -0700 (PDT) Date: Wed, 23 Sep 2020 23:56:06 -0700 In-Reply-To: <20200924065606.3351177-1-lokeshgidra@google.com> Message-Id: <20200924065606.3351177-3-lokeshgidra@google.com> Mime-Version: 1.0 
References: <20200924065606.3351177-1-lokeshgidra@google.com>
X-Mailer: git-send-email 2.28.0.681.g6f77f65b4e-goog
Subject: [PATCH v4 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob
From: Lokesh Gidra
To: Kees Cook, Jonathan Corbet, Peter Xu, Andrea Arcangeli, Sebastian Andrzej Siewior, Andrew Morton
Cc: Alexander Viro, Stephen Smalley, Eric Biggers, Lokesh Gidra, Daniel Colascione, "Joel Fernandes (Google)", linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kaleshsingh@google.com, calin@google.com, surenb@google.com, nnk@google.com, jeffv@google.com, kernel-team@android.com, Mike Rapoport, Shaohua Li, Jerome Glisse, Mauro Carvalho Chehab, Johannes Weiner, Mel Gorman, Nitin Gupta, Vlastimil Babka, Iurii Zaikin, Luis Chamberlain
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

With this change, setting the knob to 0 still allows unprivileged users
to call userfaultfd, as when it is set to 1, but with the restriction
that only page faults from user mode can be handled. In this mode, an
unprivileged user (without the CAP_SYS_PTRACE capability) must pass
UFFD_USER_MODE_ONLY to userfaultfd or the call will fail with EPERM.

This enables administrators to reduce the likelihood that an attacker
with access to userfaultfd can delay faulting kernel code to widen
timing windows for other exploits.

The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this will cause postcopy
live migration to fail, which will be unnoticeable to the VM guests.
To avoid this, set 'vm.unprivileged_userfaultfd = 1' in
/etc/sysctl.conf. For more details, refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/

Signed-off-by: Lokesh Gidra
---
 Documentation/admin-guide/sysctl/vm.rst | 15 ++++++++++-----
 fs/userfaultfd.c                        |  6 ++++--
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4b9d2e8e9142..4263d38c3c21 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -871,12 +871,17 @@ file-backed pages is less than the high watermark in a zone.
 unprivileged_userfaultfd
 ========================
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls. Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+CAP_SYS_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 3191434057f3..3816c11a986a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include
 #include
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1972,7 +1972,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
-	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+	if (!sysctl_unprivileged_userfaultfd &&
+	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
+	    !capable(CAP_SYS_PTRACE))
 		return -EPERM;
 
 	BUG_ON(!current->mm);
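
As a reading aid for the two checks this series touches, here is a
minimal user-space sketch of their logic. It is not kernel code and is
not part of either patch. The helper names model_uffd_permission() and
model_fault_refused() are hypothetical, the capability is passed in as
a plain boolean rather than queried, and the value of
UFFD_USER_MODE_ONLY is copied from patch 1/2. The second helper models
only the new kernel-mode check in handle_userfault(), not the
neighbouring UFFD_FEATURE_SIGBUS check.

```c
#include <errno.h>
#include <stdbool.h>

/* Value copied from patch 1/2; defined here only so the sketch stands alone. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Hypothetical model of the permission check at the top of
 * userfaultfd(2) after patch 2/2. Returns 0 if creation would be
 * allowed to proceed, or -EPERM if it would be rejected.
 */
int model_uffd_permission(int sysctl_value, int flags, bool has_cap_sys_ptrace)
{
	if (!sysctl_value &&
	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
	    !has_cap_sys_ptrace)
		return -EPERM;
	return 0;
}

/*
 * Hypothetical model of the check added to handle_userfault() by
 * patch 1/2: a fault that did not come from user mode is refused when
 * the context was created with UFFD_USER_MODE_ONLY.
 */
bool model_fault_refused(int ctx_flags, bool fault_from_user_mode)
{
	return !fault_from_user_mode &&
	       (ctx_flags & UFFD_USER_MODE_ONLY) != 0;
}
```

With the knob at 0 and no CAP_SYS_PTRACE, model_uffd_permission(0, 0,
false) yields -EPERM, while passing UFFD_USER_MODE_ONLY yields 0. With
the knob at 1, or with the capability, the flag is optional.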