Message ID | 20201120030411.2690816-3-lokeshgidra@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | Control over userfaultfd kernel-fault handling |
On Thu, Nov 19, 2020 at 7:04 PM Lokesh Gidra <lokeshgidra@google.com> wrote:
>
> With this change, when the knob is set to 0, it allows unprivileged
> users to call userfaultfd, like when it is set to 1, but with the
> restriction that only page faults from user mode can be handled.
> In this mode, an unprivileged user (without CAP_SYS_PTRACE capability)
> must pass UFFD_USER_MODE_ONLY to userfaultfd or the API will fail with
> EPERM.
>
> This enables administrators to reduce the likelihood that an attacker
> with access to userfaultfd can delay faulting kernel code to widen
> timing windows for other exploits.
>
> The default value of this knob is changed to 0. This is required for
> correct functioning of pipe mutex. However, this will fail postcopy
> live migration, which will be unnoticeable to the VM guests. To avoid
> this, set 'vm.unprivileged_userfaultfd = 1' in /etc/sysctl.conf.
>
> The main reason this change is desirable in the short term is that
> the Android userland will behave as with the sysctl set to zero. So
> without this commit, any Linux binary using userfaultfd to manage its
> memory would behave differently if run within the Android userland.
> For more details, refer to Andrea's reply [1].
>
> [1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/
>
> Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  Documentation/admin-guide/sysctl/vm.rst | 15 ++++++++++-----
>  fs/userfaultfd.c                        | 10 ++++++++--
>  2 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index f455fa00c00f..d06a98b2a4e7 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -873,12 +873,17 @@ file-backed pages is less than the high watermark in a zone.
>  unprivileged_userfaultfd
>  ========================
>
> -This flag controls whether unprivileged users can use the userfaultfd
> -system calls. Set this to 1 to allow unprivileged users to use the
> -userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
> -privileged users (with SYS_CAP_PTRACE capability).
> +This flag controls the mode in which unprivileged users can use the
> +userfaultfd system calls. Set this to 0 to restrict unprivileged users
> +to handle page faults in user mode only. In this case, users without
> +SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
> +succeed. Prohibiting use of userfaultfd for handling faults from kernel
> +mode may make certain vulnerabilities more difficult to exploit.
>
> -The default value is 1.
> +Set this to 1 to allow unprivileged users to use the userfaultfd system
> +calls without any restrictions.
> +
> +The default value is 0.
>
>
>  user_reserve_kbytes
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 605599fde015..894cc28142e7 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -28,7 +28,7 @@
>  #include <linux/security.h>
>  #include <linux/hugetlb.h>
>
> -int sysctl_unprivileged_userfaultfd __read_mostly = 1;
> +int sysctl_unprivileged_userfaultfd __read_mostly;
>
>  static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
>
> @@ -1966,8 +1966,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
>  	struct userfaultfd_ctx *ctx;
>  	int fd;
>
> -	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
> +	if (!sysctl_unprivileged_userfaultfd &&
> +	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
> +	    !capable(CAP_SYS_PTRACE)) {
> +		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
> +			"sysctl knob to 1 if kernel faults must be handled "
> +			"without obtaining CAP_SYS_PTRACE capability\n");
>  		return -EPERM;
> +	}
>
>  	BUG_ON(!current->mm);
>
> --
> 2.29.0.rc1.297.gfa9743e501-goog
>

Adding linux-mm@kvack.org list
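
For illustration, a userspace caller that wants to keep working with the knob set to 0 might request a user-mode-only descriptor roughly as follows. This is a minimal sketch and not part of the patch: the helper name open_uffd_user_mode_only is hypothetical, and the local definition of UFFD_USER_MODE_ONLY (value 1) is an assumption for building without the updated UAPI header; real code would take the flag from <linux/userfaultfd.h>.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed value of the flag; normally provided by <linux/userfaultfd.h>. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Hypothetical helper: ask for a userfaultfd that only handles faults
 * taken in user mode. Returns the descriptor, or -1 with errno set.
 */
static int open_uffd_user_mode_only(void)
{
#ifdef SYS_userfaultfd
	return (int)syscall(SYS_userfaultfd,
			    O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
#else
	errno = ENOSYS;	/* these headers do not know the syscall number */
	return -1;
#endif
}
```

With the knob at 0, an unprivileged caller that omits the flag would be expected to see EPERM, as the commit message describes. A kernel that predates the flag should reject it instead, most likely with EINVAL, so callers may want to treat that case separately.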
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f455fa00c00f..d06a98b2a4e7 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -873,12 +873,17 @@ file-backed pages is less than the high watermark in a zone.
 unprivileged_userfaultfd
 ========================
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls. Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 605599fde015..894cc28142e7 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include <linux/security.h>
 #include <linux/hugetlb.h>
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1966,8 +1966,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
-	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+	if (!sysctl_unprivileged_userfaultfd &&
+	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
+	    !capable(CAP_SYS_PTRACE)) {
+		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
+			"sysctl knob to 1 if kernel faults must be handled "
+			"without obtaining CAP_SYS_PTRACE capability\n");
 		return -EPERM;
+	}
 
 	BUG_ON(!current->mm);
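
The new condition in the SYSCALL_DEFINE1 hunk can be restated as a small userspace model, which makes the combinations easy to enumerate. This is only a sketch under stated assumptions: uffd_permission_check and its has_cap_sys_ptrace parameter are hypothetical stand-ins for the sysctl variable and capable(CAP_SYS_PTRACE), and the flag value 1 is assumed rather than taken from the kernel headers.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed value of the flag; normally provided by <linux/userfaultfd.h>. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Hypothetical model of the check added to the userfaultfd syscall.
 * Returns -EPERM only when all three conditions hold: the knob is 0,
 * the caller did not ask for user-mode-only handling, and the caller
 * lacks CAP_SYS_PTRACE. Returns 0 in every other case.
 */
static int uffd_permission_check(int sysctl_unprivileged_userfaultfd,
				 int flags, bool has_cap_sys_ptrace)
{
	if (!sysctl_unprivileged_userfaultfd &&
	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
	    !has_cap_sys_ptrace)
		return -EPERM;

	return 0;
}
```

In this model, uffd_permission_check(0, 0, false) is the only rejected combination. Passing UFFD_USER_MODE_ONLY, holding the capability, or setting the knob to 1 each lets the call proceed past the check.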