Message ID | 20220817214728.489904-3-axelrasmussen@google.com |
---|---|
State | New |
Series | userfaultfd: add /dev/userfaultfd for fine grained access control |
On Wed, Aug 17, 2022 at 02:47:25PM -0700, Axel Rasmussen wrote:
>  static int __init userfaultfd_init(void)
>  {
> +	WARN_ON(misc_register(&userfaultfd_misc));

Please no. Spell this out and properly error out if there is an issue:

	int ret;

	ret = misc_register(&userfaultfd_misc);
	if (ret)
		return ret;

Handle issues properly, don't paper over them with WARN_ON().

thanks,

greg k-h
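The error handling Greg asks for generalizes to "return the first error and undo whatever already succeeded". A minimal userspace model of that pattern follows; `model_register()`, `model_unregister()` and `model_create_cache()` are hypothetical stand-ins for `misc_register()`, `misc_deregister()` and the cache setup in `userfaultfd_init()`, and the `inject_*` switches exist only to simulate failures. Whether the real init needs the rollback branch depends on whether its later steps can fail at all, which the diff in this thread does not show in full.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Failure-injection switches (hypothetical, for the model only). */
static bool inject_register_failure;
static bool inject_cache_failure;

/* Tracks whether the modeled device is currently registered. */
static bool device_registered;

/* Stand-in for misc_register(): 0 on success, -errno on failure. */
static int model_register(void)
{
	if (inject_register_failure)
		return -EBUSY;
	device_registered = true;
	return 0;
}

/* Stand-in for misc_deregister(). */
static void model_unregister(void)
{
	device_registered = false;
}

/* Stand-in for a later init step that may fail. */
static int model_create_cache(void)
{
	return inject_cache_failure ? -ENOMEM : 0;
}

/* Propagate errors instead of WARN_ON(), rolling back completed steps. */
static int model_init(void)
{
	int ret;

	ret = model_register();
	if (ret)
		return ret;

	ret = model_create_cache();
	if (ret) {
		model_unregister();	/* undo the registration before failing */
		return ret;
	}

	return 0;
}
```

Because the error reaches the init return value, the failure is reported through the normal initcall path rather than only as a backtrace in the log.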
On Wed, Aug 17, 2022 at 02:47:25PM -0700, Axel Rasmussen wrote:
> +static int userfaultfd_dev_open(struct inode *inode, struct file *file)
> +{
> +	return 0;

If your open does nothing, no need to list it here at all, right?

> +}
> +
> +static long userfaultfd_dev_ioctl(struct file *file, unsigned int cmd, unsigned long flags)
> +{
> +	if (cmd != USERFAULTFD_IOC_NEW)
> +		return -EINVAL;
> +
> +	return new_userfaultfd(flags);
> +}
> +
> +static const struct file_operations userfaultfd_dev_fops = {
> +	.open = userfaultfd_dev_open,
> +	.unlocked_ioctl = userfaultfd_dev_ioctl,
> +	.compat_ioctl = userfaultfd_dev_ioctl,

Why do you need to set compat_ioctl? Shouldn't it just default to the
existing one?

And why is this a device node at all? Shouldn't the syscall handle all
of this (to be honest, I didn't read anything but the misc code, sorry.)

thanks,

greg k-h
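Greg's first point rests on how a misc device is opened: `misc_open()` in drivers/char/misc.c installs the driver's `file_operations` and invokes `->open` only when one is set, so leaving it NULL behaves like an open that always succeeds. The following is a simplified userspace model of that dispatch; `struct model_file`, `struct model_fops` and `counting_open()` are hypothetical simplifications, not the kernel's structures.

```c
#include <assert.h>
#include <stddef.h>

struct model_file;

/* Hypothetical, trimmed-down stand-in for struct file_operations. */
struct model_fops {
	int (*open)(struct model_file *file);
};

/* Hypothetical, trimmed-down stand-in for struct file. */
struct model_file {
	const struct model_fops *f_op;
	int opened_via_hook;	/* set by counting_open() so tests can see it ran */
};

/*
 * Models the shape of misc_open(): swap in the driver's fops, then call
 * ->open only if the driver provided one. A NULL ->open means success.
 */
static int model_misc_open(struct model_file *file, const struct model_fops *fops)
{
	file->f_op = fops;
	if (file->f_op->open)
		return file->f_op->open(file);
	return 0;
}

/* A driver ->open that records it was called and succeeds. */
static int counting_open(struct model_file *file)
{
	file->opened_via_hook = 1;
	return 0;
}

static const struct model_fops fops_without_open = { .open = NULL };
static const struct model_fops fops_with_open = { .open = counting_open };
```

Both variants open successfully, which is why an `open` handler that only returns 0 adds nothing.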
On Thu, Aug 18, 2022 at 08:26:38AM +0200, Greg KH wrote:
> On Wed, Aug 17, 2022 at 02:47:25PM -0700, Axel Rasmussen wrote:
> [...]
> And why is this a device node at all? Shouldn't the syscall handle all
> of this (to be honest, I didn't read anything but the misc code, sorry.)

Ah, read the documentation now. Seems you want to make it easier for
people to get permissions on a system. Doesn't seem wise, but hey, it's
not my feature...

thanks,

greg k-h
On Wed, Aug 17, 2022 at 11:32 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Aug 18, 2022 at 08:26:38AM +0200, Greg KH wrote:
> > On Wed, Aug 17, 2022 at 02:47:25PM -0700, Axel Rasmussen wrote:
> > [...]
> > And why is this a device node at all? Shouldn't the syscall handle all
> > of this (to be honest, I didn't read anything but the misc code, sorry.)
>
> Ah, read the documentation now. Seems you want to make it easier for
> people to get permissions on a system. Doesn't seem wise, but hey, it's
> not my feature...

Thanks for taking a look, Greg!

With the syscall, the only way to get access to this feature (short of
enabling the unprivileged_userfaultfd sysctl for every user) is to have
CAP_SYS_PTRACE, which gives you access to this *plus* a bunch more
stuff. My basic goal is to grant access to just this feature by itself,
not just to make it easier to access. I think a device node is the
simplest way to achieve that (see the cover letter for the alternatives
considered).

The other feedback looks like a good simplification to me; I'll send
another version with those changes. I have to admit this is the first
time I've worked with misc device nodes, so apologies for being overly
explicit. :)
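From userspace, the device-node approach Axel describes might look like the following sketch, written against the ioctl defined in this version of the patch (`USERFAULTFD_IOC_NEW`, redefined locally in case the installed headers predate it). Access to `/dev/userfaultfd` is then governed by ordinary file permissions, so an administrator can grant it to a group without handing out CAP_SYS_PTRACE. The helper name `open_userfaultfd()` and the fallback to the syscall are illustrative choices, not part of the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/ioctl.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* ioctl numbers as defined in this patch's uapi header (redefined here). */
#ifndef USERFAULTFD_IOC_NEW
#define USERFAULTFD_IOC 0xAA
#define USERFAULTFD_IOC_NEW _IO(USERFAULTFD_IOC, 0x00)
#endif

/*
 * Hypothetical helper: try /dev/userfaultfd first (access controlled by the
 * node's file permissions), then fall back to the userfaultfd syscall
 * (CAP_SYS_PTRACE or sysctl). Returns a new fd, or -errno on failure.
 */
static int open_userfaultfd(int flags)
{
	int dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);

	if (dev >= 0) {
		/* The patch reads the ioctl argument as a scalar flags word. */
		int uffd = ioctl(dev, USERFAULTFD_IOC_NEW, (unsigned long)flags);

		close(dev);	/* the returned userfaultfd does not need the node */
		if (uffd >= 0)
			return uffd;
	}

#ifdef SYS_userfaultfd
	long uffd = syscall(SYS_userfaultfd, flags);
	if (uffd >= 0)
		return (int)uffd;
	return -errno;
#else
	return -ENOSYS;
#endif
}
```

In this version the ioctl path calls `new_userfaultfd()` directly, without the syscall's permission check, so anyone who can open the node can also request kernel-fault handling (i.e. omit UFFD_USER_MODE_ONLY).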
On Wed, Aug 17, 2022 at 11:26 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Wed, Aug 17, 2022 at 02:47:25PM -0700, Axel Rasmussen wrote:
> [...]
> > +static const struct file_operations userfaultfd_dev_fops = {
> > +	.open = userfaultfd_dev_open,
> > +	.unlocked_ioctl = userfaultfd_dev_ioctl,
> > +	.compat_ioctl = userfaultfd_dev_ioctl,
>
> Why do you need to set compat_ioctl? Shouldn't it just default to the
> existing one?

I took some more time looking at this today, and I think it actually
has to be the way it is. I didn't find anywhere that notices
compat_ioctl is unset and falls back to the "normal" one (e.g. see the
compat ioctl syscall definition in fs/ioctl.c). It looks to me like it
really does need some value.

It's common to use compat_ptr_ioctl for this, but since we're
interpreting the arg as a scalar, not as a pointer, doing that here
would be incorrect.

There are other existing examples that do it the same way, e.g.
seccomp_notify_ops in kernel/seccomp.c.
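Axel's reading of fs/ioctl.c can be illustrated with a simplified model of the compat ioctl path: there is no fallback from `->compat_ioctl` to `->unlocked_ioctl`, and a missing handler surfaces to a 32-bit caller as -ENOTTY. The model omits the generic commands that `do_vfs_ioctl()` handles first; `MODEL_ENOIOCTLCMD` stands in for the kernel-internal ENOIOCTLCMD, and `scalar_ioctl()` is a hypothetical handler that, like `userfaultfd_dev_ioctl()`, treats its argument as a plain integer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel-internal ENOIOCTLCMD ("not handled here"). */
#define MODEL_ENOIOCTLCMD 515

typedef long (*model_ioctl_fn)(unsigned int cmd, unsigned long arg);

/* Hypothetical subset of struct file_operations. */
struct model_fops {
	model_ioctl_fn unlocked_ioctl;	/* native-width callers */
	model_ioctl_fn compat_ioctl;	/* 32-bit callers on a 64-bit kernel */
};

/*
 * Simplified compat dispatch: only ->compat_ioctl is consulted, and
 * "no handler" is reported as -ENOTTY rather than falling back.
 */
static long model_compat_ioctl(const struct model_fops *fops,
			       unsigned int cmd, unsigned long arg)
{
	long error = -MODEL_ENOIOCTLCMD;

	if (fops->compat_ioctl)
		error = fops->compat_ioctl(cmd, arg);
	if (error == -MODEL_ENOIOCTLCMD)
		error = -ENOTTY;
	return error;
}

/* Hypothetical handler that echoes its scalar argument back. */
static long scalar_ioctl(unsigned int cmd, unsigned long arg)
{
	(void)cmd;
	return (long)arg;
}

static const struct model_fops only_native = { .unlocked_ioctl = scalar_ioctl };
static const struct model_fops both = {
	.unlocked_ioctl = scalar_ioctl,
	.compat_ioctl = scalar_ioctl,
};
```

Reusing the native handler unchanged also explains why compat_ptr_ioctl() is a poor fit here: it passes the argument through compat_ptr(), which is meant for user pointers rather than integer flags.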
```diff
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 1c44bf75f916..698e768d5c3d 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -30,6 +30,7 @@
 #include <linux/security.h>
 #include <linux/hugetlb.h>
 #include <linux/swapops.h>
+#include <linux/miscdevice.h>
 
 int sysctl_unprivileged_userfaultfd __read_mostly;
 
@@ -415,13 +416,8 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
 
-	if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
-	    ctx->flags & UFFD_USER_MODE_ONLY) {
-		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
-			"sysctl knob to 1 if kernel faults must be handled "
-			"without obtaining CAP_SYS_PTRACE capability\n");
+	if (!(vmf->flags & FAULT_FLAG_USER) && (ctx->flags & UFFD_USER_MODE_ONLY))
 		goto out;
-	}
 
 	/*
 	 * If it's already released don't get it. This avoids to loop
@@ -2052,20 +2048,11 @@ static void init_once_userfaultfd_ctx(void *mem)
 	seqcount_spinlock_init(&ctx->refile_seq, &ctx->fault_pending_wqh.lock);
 }
 
-SYSCALL_DEFINE1(userfaultfd, int, flags)
+static int new_userfaultfd(int flags)
 {
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
-	if (!sysctl_unprivileged_userfaultfd &&
-	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
-	    !capable(CAP_SYS_PTRACE)) {
-		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
-			"sysctl knob to 1 if kernel faults must be handled "
-			"without obtaining CAP_SYS_PTRACE capability\n");
-		return -EPERM;
-	}
-
 	BUG_ON(!current->mm);
 
 	/* Check the UFFD_* constants for consistency. */
@@ -2098,8 +2085,62 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	return fd;
 }
 
+static inline bool userfaultfd_syscall_allowed(int flags)
+{
+	/* Userspace-only page faults are always allowed */
+	if (flags & UFFD_USER_MODE_ONLY)
+		return true;
+
+	/*
+	 * The user is requesting a userfaultfd which can handle kernel faults.
+	 * Privileged users are always allowed to do this.
+	 */
+	if (capable(CAP_SYS_PTRACE))
+		return true;
+
+	/* Otherwise, access to kernel fault handling is sysctl controlled. */
+	return sysctl_unprivileged_userfaultfd;
+}
+
+SYSCALL_DEFINE1(userfaultfd, int, flags)
+{
+	if (!userfaultfd_syscall_allowed(flags))
+		return -EPERM;
+
+	return new_userfaultfd(flags);
+}
+
+static int userfaultfd_dev_open(struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+static long userfaultfd_dev_ioctl(struct file *file, unsigned int cmd, unsigned long flags)
+{
+	if (cmd != USERFAULTFD_IOC_NEW)
+		return -EINVAL;
+
+	return new_userfaultfd(flags);
+}
+
+static const struct file_operations userfaultfd_dev_fops = {
+	.open = userfaultfd_dev_open,
+	.unlocked_ioctl = userfaultfd_dev_ioctl,
+	.compat_ioctl = userfaultfd_dev_ioctl,
+	.owner = THIS_MODULE,
+	.llseek = noop_llseek,
+};
+
+static struct miscdevice userfaultfd_misc = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "userfaultfd",
+	.fops = &userfaultfd_dev_fops
+};
+
 static int __init userfaultfd_init(void)
 {
+	WARN_ON(misc_register(&userfaultfd_misc));
+
 	userfaultfd_ctx_cachep = kmem_cache_create("userfaultfd_ctx_cache",
 						sizeof(struct userfaultfd_ctx),
 						0,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 7d32b1e797fb..005e5e306266 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -12,6 +12,10 @@
 
 #include <linux/types.h>
 
+/* ioctls for /dev/userfaultfd */
+#define USERFAULTFD_IOC 0xAA
+#define USERFAULTFD_IOC_NEW _IO(USERFAULTFD_IOC, 0x00)
+
 /*
  * If the UFFDIO_API is upgraded someday, the UFFDIO_UNREGISTER and
  * UFFDIO_WAKE ioctls should be defined as _IOW and not as _IOR. In
```
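The refactored permission check in `userfaultfd_syscall_allowed()` reduces to a small truth table. The userspace model below replaces `capable(CAP_SYS_PTRACE)` and `sysctl_unprivileged_userfaultfd` with plain booleans (a simplifying assumption) and mirrors `UFFD_USER_MODE_ONLY` from the uapi header as a local constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of UFFD_USER_MODE_ONLY from include/uapi/linux/userfaultfd.h. */
#define MODEL_UFFD_USER_MODE_ONLY 1

/*
 * Model of userfaultfd_syscall_allowed(): has_ptrace_cap stands in for
 * capable(CAP_SYS_PTRACE), unprivileged_sysctl for the sysctl knob.
 */
static bool model_syscall_allowed(int flags, bool has_ptrace_cap,
				  bool unprivileged_sysctl)
{
	/* User-mode-only fds never handle kernel faults: always allowed. */
	if (flags & MODEL_UFFD_USER_MODE_ONLY)
		return true;

	/* Kernel-fault handling: privileged callers are always allowed. */
	if (has_ptrace_cap)
		return true;

	/* Otherwise the sysctl decides. */
	return unprivileged_sysctl;
}
```

This is the removed `!sysctl_unprivileged_userfaultfd && (flags & UFFD_USER_MODE_ONLY) == 0 && !capable(CAP_SYS_PTRACE)` condition negated by De Morgan's laws, so the syscall's behavior is unchanged apart from dropping the printk_once() hint. The `/dev/userfaultfd` ioctl path does not go through this check at all.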