diff mbox series

[1/2] Add UFFD_USER_MODE_ONLY

Message ID 20200423002632.224776-2-dancol@google.com (mailing list archive)
State New, archived
Series Control over userfaultfd kernel-fault handling

Commit Message

Daniel Colascione April 23, 2020, 12:26 a.m. UTC
userfaultfd handles page faults from both user and kernel code.  Add a
new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the
resulting userfaultfd object refuse to handle faults from kernel mode,
treating these faults as if SIGBUS were always raised, causing the
kernel code to fail with EFAULT.

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.

Signed-off-by: Daniel Colascione <dancol@google.com>
---
 fs/userfaultfd.c                 | 7 ++++++-
 include/uapi/linux/userfaultfd.h | 9 +++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)
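
The behaviour described in the commit message can be modelled in plain userspace C. The sketch below is not kernel code: the `MODEL_*` constants are stand-in values chosen for illustration, and `struct model_ctx`, `model_refuses_fault()` and `model_self_check()` are hypothetical names. Only `UFFD_USER_MODE_ONLY` (value 1) is taken from this patch. It shows the intended decision: a fault that does not come from user mode is refused when the userfaultfd was created with `UFFD_USER_MODE_ONLY`.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in values for illustration only; the real kernel constants are
 * defined in kernel headers and are not reproduced here. */
#define MODEL_FAULT_FLAG_USER     0x40u
#define MODEL_UFFD_FEATURE_SIGBUS 0x80u
/* Value taken from this patch. */
#define UFFD_USER_MODE_ONLY       1u

/* Hypothetical, reduced stand-in for the kernel's userfaultfd context. */
struct model_ctx {
	unsigned int features; /* negotiated feature bits */
	unsigned int flags;    /* flags passed at creation time */
};

/* Returns true when the fault is refused (the "goto out" path in the
 * patch), false when it would be handed to the userspace monitor. */
static bool model_refuses_fault(const struct model_ctx *ctx,
				unsigned int fault_flags)
{
	if (ctx->features & MODEL_UFFD_FEATURE_SIGBUS)
		return true;
	if ((fault_flags & MODEL_FAULT_FLAG_USER) == 0 &&
	    (ctx->flags & UFFD_USER_MODE_ONLY))
		return true;
	return false;
}

/* Small usage example covering the four interesting combinations. */
static void model_self_check(void)
{
	struct model_ctx plain = { 0, 0 };
	struct model_ctx user_only = { 0, UFFD_USER_MODE_ONLY };

	/* Without the flag, both kinds of fault are handled. */
	assert(!model_refuses_fault(&plain, 0));
	assert(!model_refuses_fault(&plain, MODEL_FAULT_FLAG_USER));
	/* With the flag, only the kernel-mode fault is refused. */
	assert(model_refuses_fault(&user_only, 0));
	assert(!model_refuses_fault(&user_only, MODEL_FAULT_FLAG_USER));
}
```

The explicit parentheses around the second bitwise test are a readability choice in this sketch; the patch relies on `&` binding tighter than `&&`, which gives the same result.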

Comments

Michael S. Tsirkin July 24, 2020, 2:28 p.m. UTC | #1
On Wed, Apr 22, 2020 at 05:26:31PM -0700, Daniel Colascione wrote:
> userfaultfd handles page faults from both user and kernel code.  Add a
> new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the
> resulting userfaultfd object refuse to handle faults from kernel mode,
> treating these faults as if SIGBUS were always raised, causing the
> kernel code to fail with EFAULT.
> 
> A future patch adds a knob allowing administrators to give some
> processes the ability to create userfaultfd file objects only if they
> pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
> will exploit userfaultfd's ability to delay kernel page faults to open
> timing windows for future exploits.
> 
> Signed-off-by: Daniel Colascione <dancol@google.com>

Something to add here is that there is separate work on selinux to
support limiting specific userspace programs to only this type of
userfaultfd.

I also think Kees' comment about documenting the threat being addressed,
including some links to external sources, still applies.

Finally, a question:

Is there any way at all to increase security without breaking
the assumption that copy_from_user is the same as userspace read?


As an example of a drastic approach that might solve some issues, how
about allocating some special memory and setting some VMA flag, then
limiting copy from/to user to just this subset of virtual addresses?
We can then do things like pin these pages in RAM, forbid
madvise/userfaultfd for these addresses, etc.

Affected userspace then needs to use a kind of bounce buffer for any
calls into the kernel.  This needs much more support from userspace and
adds much more overhead, but on the flip side, it addresses more of the
ways userspace can slow down the kernel.

Was this discussed in the past? Links would be appreciated.


Lokesh Gidra July 24, 2020, 2:46 p.m. UTC | #2
On Fri, Jul 24, 2020 at 7:28 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Apr 22, 2020 at 05:26:31PM -0700, Daniel Colascione wrote:
> > userfaultfd handles page faults from both user and kernel code.  Add a
> > new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the
> > resulting userfaultfd object refuse to handle faults from kernel mode,
> > treating these faults as if SIGBUS were always raised, causing the
> > kernel code to fail with EFAULT.
> >
> > A future patch adds a knob allowing administrators to give some
> > processes the ability to create userfaultfd file objects only if they
> > pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
> > will exploit userfaultfd's ability to delay kernel page faults to open
> > timing windows for future exploits.
> >
> > Signed-off-by: Daniel Colascione <dancol@google.com>
>
> Something to add here is that there is separate work on selinux to
> support limiting specific userspace programs to only this type of
> userfaultfd.
>
> I also think Kees' comment about documenting what is the threat being solved
> including some links to external sources still applies.
>
> Finally, a question:
>
> Is there any way at all to increase security without breaking
> the assumption that copy_from_user is the same as userspace read?
>
>
> As an example of a drastic approach that might solve some issues, how
> about allocating some special memory and setting some VMA flag, then
> limiting copy from/to user to just this subset of virtual addresses?
> We can then do things like pin these pages in RAM, forbid
> madvise/userfaultfd for these addresses, etc.
>
> Affected userspace then needs to use a kind of a bounce buffer for any
> calls into kernel.  This needs much more support from userspace and adds
> much more overhead, but on the flip side, affects more ways userspace
> can slow down the kernel.
>
> Was this discussed in the past? Links would be appreciated.
>
Adding Nick and Jeff to the discussion.
Michael S. Tsirkin July 26, 2020, 10:09 a.m. UTC | #3
On Fri, Jul 24, 2020 at 07:46:02AM -0700, Lokesh Gidra wrote:
> On Fri, Jul 24, 2020 at 7:28 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Apr 22, 2020 at 05:26:31PM -0700, Daniel Colascione wrote:
> > > userfaultfd handles page faults from both user and kernel code.  Add a
> > > new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the
> > > resulting userfaultfd object refuse to handle faults from kernel mode,
> > > treating these faults as if SIGBUS were always raised, causing the
> > > kernel code to fail with EFAULT.
> > >
> > > A future patch adds a knob allowing administrators to give some
> > > processes the ability to create userfaultfd file objects only if they
> > > pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
> > > will exploit userfaultfd's ability to delay kernel page faults to open
> > > timing windows for future exploits.
> > >
> > > Signed-off-by: Daniel Colascione <dancol@google.com>
> >
> > Something to add here is that there is separate work on selinux to
> > support limiting specific userspace programs to only this type of
> > userfaultfd.
> >
> > I also think Kees' comment about documenting what is the threat being solved
> > including some links to external sources still applies.
> >
> > Finally, a question:
> >
> > Is there any way at all to increase security without breaking
> > the assumption that copy_from_user is the same as userspace read?
> >
> >
> > As an example of a drastic approach that might solve some issues, how
> > about allocating some special memory and setting some VMA flag, then
> > limiting copy from/to user to just this subset of virtual addresses?
> > We can then do things like pin these pages in RAM, forbid
> > madvise/userfaultfd for these addresses, etc.
> >
> > Affected userspace then needs to use a kind of a bounce buffer for any
> > calls into kernel.  This needs much more support from userspace and adds
> > much more overhead, but on the flip side, affects more ways userspace
> > can slow down the kernel.
> >
> > Was this discussed in the past? Links would be appreciated.
> >
> Adding Nick and Jeff to the discussion.

I guess a valid alternative is to block major faults in copy
to/from user for a given process/group of syscalls. Userspace can mlock
an area it uses for these system calls.

For example, allow a BPF/Linux security policy to block all major faults
until the next syscall.  Yes, that would then include userfaultfd.
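
The userspace half of this idea (an mlock'ed area used as a bounce buffer for system calls) might look roughly like the sketch below. It is only an illustration: `struct bounce_buf`, `bounce_init()`, `bounce_write()` and `bounce_destroy()` are hypothetical names, and nothing here implements the kernel-side policy being discussed. The sketch tolerates `mlock()` failing (for example because of `RLIMIT_MEMLOCK`) and records whether the lock succeeded.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bounce buffer: a private anonymous mapping that the
 * process tries to keep resident with mlock(). */
struct bounce_buf {
	void *addr;
	size_t size;
	int locked; /* 1 if mlock() succeeded, 0 otherwise */
};

static int bounce_init(struct bounce_buf *b, size_t size)
{
	b->addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (b->addr == MAP_FAILED)
		return -errno;
	b->size = size;
	/* Touch the pages up front so they are populated before use. */
	memset(b->addr, 0, size);
	b->locked = (mlock(b->addr, size) == 0);
	return 0;
}

/* Copy the caller's data into the bounce buffer, then hand only the
 * bounce buffer to the kernel. */
static ssize_t bounce_write(struct bounce_buf *b, int fd,
			    const void *data, size_t len)
{
	if (len > b->size) {
		errno = EINVAL;
		return -1;
	}
	memcpy(b->addr, data, len);
	return write(fd, b->addr, len);
}

static void bounce_destroy(struct bounce_buf *b)
{
	/* munmap() also drops any lock held on the range. */
	munmap(b->addr, b->size);
	b->addr = NULL;
	b->size = 0;
	b->locked = 0;
}
```

The extra `memcpy()` per call is the overhead mentioned earlier in the thread; in exchange, the address passed to the kernel always lies in memory the process has tried to pin.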


Patch

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e39fdec8a0b0..21378abe8f7b 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -418,6 +418,9 @@  vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
+	if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+	    ctx->flags & UFFD_USER_MODE_ONLY)
+		goto out;
 
 	/*
 	 * If it's already released don't get it. This avoids to loop
@@ -2003,6 +2006,7 @@  static void init_once_userfaultfd_ctx(void *mem)
 
 SYSCALL_DEFINE1(userfaultfd, int, flags)
 {
+	static const int uffd_flags = UFFD_USER_MODE_ONLY;
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
@@ -2012,10 +2016,11 @@  SYSCALL_DEFINE1(userfaultfd, int, flags)
 	BUG_ON(!current->mm);
 
 	/* Check the UFFD_* constants for consistency.  */
+	BUILD_BUG_ON(uffd_flags & UFFD_SHARED_FCNTL_FLAGS);
 	BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
 	BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-	if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+	if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | uffd_flags))
 		return -EINVAL;
 
 	ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@  struct uffdio_writeprotect {
 	__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */
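
As a rough illustration of the flag check this patch adds to the system call, the following userspace sketch mirrors the mask logic. `MODEL_UFFD_SHARED_FCNTL_FLAGS` and `model_check_flags()` are hypothetical names, and the assumption that the shared flags are exactly `O_CLOEXEC | O_NONBLOCK` is inferred from the `BUILD_BUG_ON` lines in the diff rather than copied from the kernel header. `UFFD_USER_MODE_ONLY` uses the value defined by the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Assumption for this sketch: the flags shared with fcntl are the
 * close-on-exec and non-blocking bits. */
#define MODEL_UFFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
/* Value taken from this patch. */
#define UFFD_USER_MODE_ONLY 1

/* Compile-time counterpart of the BUILD_BUG_ON() added by the patch:
 * the new flag must not collide with the shared fcntl flags. */
_Static_assert((UFFD_USER_MODE_ONLY & MODEL_UFFD_SHARED_FCNTL_FLAGS) == 0,
	       "UFFD_USER_MODE_ONLY overlaps the shared fcntl flags");

/* Returns 0 if every bit in flags is known, -EINVAL otherwise. */
static int model_check_flags(int flags)
{
	if (flags & ~(MODEL_UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
		return -EINVAL;
	return 0;
}

/* Small usage example. */
static void model_flags_self_check(void)
{
	assert(model_check_flags(0) == 0);
	assert(model_check_flags(UFFD_USER_MODE_ONLY) == 0);
	assert(model_check_flags(UFFD_USER_MODE_ONLY | O_CLOEXEC) == 0);
	/* Bit 1 is in neither mask, so it is rejected. */
	assert(model_check_flags(2) == -EINVAL);
}
```

Before the patch, the equivalent check accepted only the shared fcntl flags, so passing the value 1 would have been rejected with `-EINVAL`.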