Message ID | 20200211225547.235083-1-dancol@google.com (mailing list archive) |
---|---|
Headers | show |
Series | Harden userfaultfd | expand |
On 2/11/2020 2:55 PM, Daniel Colascione wrote: > Userfaultfd in unprivileged contexts could be potentially very > useful. We'd like to harden userfaultfd to make such unprivileged use > less risky. This patch series allows SELinux to manage userfaultfd > file descriptors and allows administrators to limit userfaultfd to > servicing user-mode faults, increasing the difficulty of using > userfaultfd in exploit chains invoking delaying kernel faults. > > A new anon_inodes interface allows callers to opt into SELinux > management of anonymous file objects. In this mode, anon_inodes > creates new ephemeral inodes for anonymous file objects instead of > reusing a singleton dummy inode. A new LSM hook gives security modules > an opportunity to configure and veto these ephemeral inodes. > > Existing anon_inodes users must opt into the new functionality. > > Daniel Colascione (6): > Add a new flags-accepting interface for anonymous inodes > Add a concept of a "secure" anonymous file > Teach SELinux about a new userfaultfd class > Wire UFFD up to SELinux > Let userfaultfd opt out of handling kernel-mode faults > Add a new sysctl for limiting userfaultfd to user mode faults This must be posted to the linux Security Module list <linux-security-module@vger.kernel.org> > > Documentation/admin-guide/sysctl/vm.rst | 13 ++++ > fs/anon_inodes.c | 89 +++++++++++++++++-------- > fs/userfaultfd.c | 29 ++++++-- > include/linux/anon_inodes.h | 27 ++++++-- > include/linux/lsm_hooks.h | 8 +++ > include/linux/security.h | 2 + > include/linux/userfaultfd_k.h | 3 + > include/uapi/linux/userfaultfd.h | 9 +++ > kernel/sysctl.c | 9 +++ > security/security.c | 8 +++ > security/selinux/hooks.c | 68 +++++++++++++++++++ > security/selinux/include/classmap.h | 2 + > 12 files changed, 229 insertions(+), 38 deletions(-) >
On Tue, Feb 11, 2020 at 3:13 PM Casey Schaufler <casey@schaufler-ca.com> wrote: > > On 2/11/2020 2:55 PM, Daniel Colascione wrote: > > Userfaultfd in unprivileged contexts could be potentially very > > useful. We'd like to harden userfaultfd to make such unprivileged use > > less risky. This patch series allows SELinux to manage userfaultfd > > file descriptors and allows administrators to limit userfaultfd to > > servicing user-mode faults, increasing the difficulty of using > > userfaultfd in exploit chains invoking delaying kernel faults. > > > > A new anon_inodes interface allows callers to opt into SELinux > > management of anonymous file objects. In this mode, anon_inodes > > creates new ephemeral inodes for anonymous file objects instead of > > reusing a singleton dummy inode. A new LSM hook gives security modules > > an opportunity to configure and veto these ephemeral inodes. > > > > Existing anon_inodes users must opt into the new functionality. > > > > Daniel Colascione (6): > > Add a new flags-accepting interface for anonymous inodes > > Add a concept of a "secure" anonymous file > > Teach SELinux about a new userfaultfd class > > Wire UFFD up to SELinux > > Let userfaultfd opt out of handling kernel-mode faults > > Add a new sysctl for limiting userfaultfd to user mode faults > > This must be posted to the linux Security Module list > <linux-security-module@vger.kernel.org> Added. I thought selinux@ was sufficient.
Hi! Firstly, thanks for working on this! It's been on my TODO list for a while. :) Casey already recommended including the LSM list to CC (since this is a new LSM -- there are many LSMs). Additionally, the series should probably be sent _to_ the userfaultfd maintainers: Andrea Arcangeli <aarcange@redhat.com> Mike Rapoport <rppt@linux.ibm.com> and I'd also CC a couple other people that have done recent work: Peter Xu <peterx@redhat.com> Jann Horn <jannh@google.com> More notes below... On Tue, Feb 11, 2020 at 02:55:41PM -0800, Daniel Colascione wrote: > Userfaultfd in unprivileged contexts could be potentially very > useful. We'd like to harden userfaultfd to make such unprivileged use > less risky. This patch series allows SELinux to manage userfaultfd > file descriptors and allows administrators to limit userfaultfd to > servicing user-mode faults, increasing the difficulty of using > userfaultfd in exploit chains invoking delaying kernel faults. I actually think these are two very different goals and likely the series could be split into two for them. One is LSM hooking of userfaultfd and the SELinux attachment, and the other is the user-mode fault restrictions. And they would likely go via separate trees (LSM through James's LSM tree, and probably akpm's -mm tree for the sysctl). > A new anon_inodes interface allows callers to opt into SELinux > management of anonymous file objects. In this mode, anon_inodes > creates new ephemeral inodes for anonymous file objects instead of > reusing a singleton dummy inode. A new LSM hook gives security modules > an opportunity to configure and veto these ephemeral inodes. > > Existing anon_inodes users must opt into the new functionality. > > Daniel Colascione (6): > Add a new flags-accepting interface for anonymous inodes > Add a concept of a "secure" anonymous file > Teach SELinux about a new userfaultfd class > Wire UFFD up to SELinux The above is the first "series"... I don't have much opinion about it, though I do like the idea of making userfaultfd visible to the LSM. > Let userfaultfd opt out of handling kernel-mode faults > Add a new sysctl for limiting userfaultfd to user mode faults Now this I'm very interested in. Can you go into more detail about two things: - What is the threat being solved? (I understand the threat, but detailing it in the commit log is important for people who don't know it. Existing commit cefdca0a86be517bc390fc4541e3674b8e7803b0 gets into some of the details already, but I'd like to see reference to external sources like https://duasynt.com/blog/linux-kernel-heap-spray) - Why is this needed in addition to the existing vm.unprivileged_userfaultfd sysctl? (And should this maybe just be another setting for that sysctl, like "2"?) As to the mechanics of the change, I'm not sure I like the idea of adding a UAPI flag for this. Why not just retain the permission check done at open() and if kernelmode faults aren't allowed, ignore them? This would require no changes to existing programs and gains the desired defense. (And, I think, the sysctl value could be bumped to "2" as that's a better default state -- does qemu actually need kernelmode traps?) Thanks again for the patches! -Kees > > Documentation/admin-guide/sysctl/vm.rst | 13 ++++ > fs/anon_inodes.c | 89 +++++++++++++++++-------- > fs/userfaultfd.c | 29 ++++++-- > include/linux/anon_inodes.h | 27 ++++++-- > include/linux/lsm_hooks.h | 8 +++ > include/linux/security.h | 2 + > include/linux/userfaultfd_k.h | 3 + > include/uapi/linux/userfaultfd.h | 9 +++ > kernel/sysctl.c | 9 +++ > security/security.c | 8 +++ > security/selinux/hooks.c | 68 +++++++++++++++++++ > security/selinux/include/classmap.h | 2 + > 12 files changed, 229 insertions(+), 38 deletions(-) > > -- > 2.25.0.225.g125e21ebc7-goog >
On 2/11/20 6:27 PM, Daniel Colascione wrote: > On Tue, Feb 11, 2020 at 3:13 PM Casey Schaufler <casey@schaufler-ca.com> wrote: >> >> On 2/11/2020 2:55 PM, Daniel Colascione wrote: >>> Userfaultfd in unprivileged contexts could be potentially very >>> useful. We'd like to harden userfaultfd to make such unprivileged use >>> less risky. This patch series allows SELinux to manage userfaultfd >>> file descriptors and allows administrators to limit userfaultfd to >>> servicing user-mode faults, increasing the difficulty of using >>> userfaultfd in exploit chains invoking delaying kernel faults. >>> >>> A new anon_inodes interface allows callers to opt into SELinux >>> management of anonymous file objects. In this mode, anon_inodes >>> creates new ephemeral inodes for anonymous file objects instead of >>> reusing a singleton dummy inode. A new LSM hook gives security modules >>> an opportunity to configure and veto these ephemeral inodes. >>> >>> Existing anon_inodes users must opt into the new functionality. >>> >>> Daniel Colascione (6): >>> Add a new flags-accepting interface for anonymous inodes >>> Add a concept of a "secure" anonymous file >>> Teach SELinux about a new userfaultfd class >>> Wire UFFD up to SELinux >>> Let userfaultfd opt out of handling kernel-mode faults >>> Add a new sysctl for limiting userfaultfd to user mode faults >> >> This must be posted to the linux Security Module list >> <linux-security-module@vger.kernel.org> > > Added. I thought selinux@ was sufficient. scripts/get_maintainer.pl can be helpful in identifying relevant lists and maintainers for each patch. I don't use its output blindly as it tends to over-approximate but since your patches span the VFS, LSM framework, and selinux, you do need to include relevant maintainers/lists for each.
On Wed, Feb 12, 2020 at 8:51 AM Kees Cook <keescook@chromium.org> wrote: > On Tue, Feb 11, 2020 at 02:55:41PM -0800, Daniel Colascione wrote: > > Let userfaultfd opt out of handling kernel-mode faults > > Add a new sysctl for limiting userfaultfd to user mode faults > > Now this I'm very interested in. Can you go into more detail about two > things: [...] > - Why is this needed in addition to the existing vm.unprivileged_userfaultfd > sysctl? (And should this maybe just be another setting for that > sysctl, like "2"?) > > As to the mechanics of the change, I'm not sure I like the idea of adding > a UAPI flag for this. Why not just retain the permission check done at > open() and if kernelmode faults aren't allowed, ignore them? This would > require no changes to existing programs and gains the desired defense. > (And, I think, the sysctl value could be bumped to "2" as that's a > better default state -- does qemu actually need kernelmode traps?) I think this might be necessary for I/O emulation? As in, if before getting migrated, the guest writes some data into a buffer, then the guest gets migrated, and then while the postcopy migration stuff is still running, the guest tells QEMU to write that data from guest-physical memory to disk or whatever; I think in that case, QEMU will do something like a pwrite() syscall where the userspace pointer points into the memory area containing guest-physical memory, which would return -EFAULT if userfaultfd was restricted to userspace accesses. This was described in this old presentation about why userfaultfd is better than a SIGSEGV handler: https://drive.google.com/file/d/0BzyAwvVlQckeSzlCSDFmRHVybzQ/view (slide 6) (recording at https://youtu.be/pC8cWWRVSPw?t=463)
Thanks for taking a look and for the fast reply! On Tue, Feb 11, 2020 at 11:51 PM Kees Cook <keescook@chromium.org> wrote: > > Hi! > > Firstly, thanks for working on this! It's been on my TODO list for a > while. :) > > Casey already recommended including the LSM list to CC (since this is a > new LSM -- there are many LSMs). Additionally, the series should > probably be sent _to_ the userfaultfd maintainers: > Andrea Arcangeli <aarcange@redhat.com> > Mike Rapoport <rppt@linux.ibm.com> > and I'd also CC a couple other people that have done recent work: > Peter Xu <peterx@redhat.com> > Jann Horn <jannh@google.com> > > More notes below... In general, in the event that a patch series doesn't include all needed parties on the to-line, what's the right way to fix the situation without spamming everyone and forking the thread? In this case, since I'm splitting the patch series anyway, I can just expand the to-line in the reroll. > On Tue, Feb 11, 2020 at 02:55:41PM -0800, Daniel Colascione wrote: > > Userfaultfd in unprivileged contexts could be potentially very > > useful. We'd like to harden userfaultfd to make such unprivileged use > > less risky. This patch series allows SELinux to manage userfaultfd > > file descriptors and allows administrators to limit userfaultfd to > > servicing user-mode faults, increasing the difficulty of using > > userfaultfd in exploit chains invoking delaying kernel faults. > > I actually think these are two very different goals and likely the > series could be split into two for them. One is LSM hooking of > userfaultfd and the SELinux attachment, and the other is the user-mode > fault restrictions. And they would likely go via separate trees (LSM > through James's LSM tree, and probably akpm's -mm tree for the sysctl). > > > A new anon_inodes interface allows callers to opt into SELinux > > management of anonymous file objects. In this mode, anon_inodes > > creates new ephemeral inodes for anonymous file objects instead of > > reusing a singleton dummy inode. A new LSM hook gives security modules > > an opportunity to configure and veto these ephemeral inodes. > > > > Existing anon_inodes users must opt into the new functionality. > > > > Daniel Colascione (6): > > Add a new flags-accepting interface for anonymous inodes > > Add a concept of a "secure" anonymous file > > Teach SELinux about a new userfaultfd class > > Wire UFFD up to SELinux > > The above is the first "series"... I don't have much opinion about it, > though I do like the idea of making userfaultfd visible to the LSM. Yeah. The interesting part there is the anon_inodes API change. I'll split that half of the series out. > > Let userfaultfd opt out of handling kernel-mode faults > > Add a new sysctl for limiting userfaultfd to user mode faults > > Now this I'm very interested in. Can you go into more detail about two > things: > > - What is the threat being solved? (I understand the threat, but detailing > it in the commit log is important for people who don't know it. Existing > commit cefdca0a86be517bc390fc4541e3674b8e7803b0 gets into some of the > details already, but I'd like to see reference to external sources like > https://duasynt.com/blog/linux-kernel-heap-spray) Sure. I can add a reference to that and a more general discussion of how delaying kernel fault handling can broaden race windows for other exploits. > - Why is this needed in addition to the existing vm.unprivileged_userfaultfd > sysctl? (And should this maybe just be another setting for that > sysctl, like "2"?) We want to use UFFD for a new garbage collector in Android, and we want unprivileged processes to be able to use this new garbage collector. Giving them CAP_SYS_PTRACE is dangerous. > As to the mechanics of the change, I'm not sure I like the idea of adding > a UAPI flag for this. Why not just retain the permission check done at > open() and if kernelmode faults aren't allowed, ignore them? This would > require no changes to existing programs and gains the desired defense. As Jann mentions below, it's a matter of the kernel's contractual obligation to userspace. Right now, userfaultfd(2) either succeeds or it fails with one of the enumerated error codes. You're proposing having the userfaultfd(2) succeed but return a file descriptor that doesn't do the job the kernel promised it would do. If we were to adopt your proposal, an application would see UFFD succeed, then see unexpeced EFAULTs from system calls, which would probably cause the application to malfunctioning in exciting ways. An explicit "sorry, I can't do that" error code is better: an application that gets an error code from userfaultfd(2) can fall back to something else, but an application that silently gets an underfeatured UFFD doesn't know anything is wrong until it's too late. The additional flag preserves the UFFD contract and gives applications a way to probe for feature support. > (And, I think, the sysctl value could be bumped to "2" as that's a > better default state -- does qemu actually need kernelmode traps?) I prefer a default of one for regular systems because I don't want to make experimentation with novel ways to use UFFD more difficult, and a default of two would require users go out of their way to handle user faults, and few will. For a more constrained system like Android, two is fine.
On Wed, Feb 12, 2020 at 05:54:35PM +0100, Jann Horn wrote: > On Wed, Feb 12, 2020 at 8:51 AM Kees Cook <keescook@chromium.org> wrote: > > On Tue, Feb 11, 2020 at 02:55:41PM -0800, Daniel Colascione wrote: > > > Let userfaultfd opt out of handling kernel-mode faults > > > Add a new sysctl for limiting userfaultfd to user mode faults > > > > Now this I'm very interested in. Can you go into more detail about two > > things: > [...] > > - Why is this needed in addition to the existing vm.unprivileged_userfaultfd > > sysctl? (And should this maybe just be another setting for that > > sysctl, like "2"?) > > > > As to the mechanics of the change, I'm not sure I like the idea of adding > > a UAPI flag for this. Why not just retain the permission check done at > > open() and if kernelmode faults aren't allowed, ignore them? This would > > require no changes to existing programs and gains the desired defense. > > (And, I think, the sysctl value could be bumped to "2" as that's a > > better default state -- does qemu actually need kernelmode traps?) > > I think this might be necessary for I/O emulation? As in, if before > getting migrated, the guest writes some data into a buffer, then the > guest gets migrated, and then while the postcopy migration stuff is > still running, the guest tells QEMU to write that data from > guest-physical memory to disk or whatever; I think in that case, QEMU > will do something like a pwrite() syscall where the userspace pointer > points into the memory area containing guest-physical memory, which > would return -EFAULT if userfaultfd was restricted to userspace > accesses. > > This was described in this old presentation about why userfaultfd is > better than a SIGSEGV handler: > https://drive.google.com/file/d/0BzyAwvVlQckeSzlCSDFmRHVybzQ/view > (slide 6) (recording at https://youtu.be/pC8cWWRVSPw?t=463) Right. AFAICT QEMU uses it far more than disk IOs. A guest page can be accessed by any kernel component on the destination host during a postcopy procedure. It can be as simple as when a vcpu writes to a missing guest page which still resides on the source host, then KVM will get a page fault and trap into userfaultfd asking for that page. The same thing happens to other modules like vhost, etc., as long as a missing guest page is touched by a kernel module. Thanks,
Hello everyone, On Wed, Feb 12, 2020 at 12:14:16PM -0500, Peter Xu wrote: > Right. AFAICT QEMU uses it far more than disk IOs. A guest page can > be accessed by any kernel component on the destination host during a > postcopy procedure. It can be as simple as when a vcpu writes to a > missing guest page which still resides on the source host, then KVM > will get a page fault and trap into userfaultfd asking for that page. > The same thing happens to other modules like vhost, etc., as long as a > missing guest page is touched by a kernel module. Correct. How does the android garbage collection work to make sure there cannot be kernel faults on the missing memory? If I understood correctly (I didn't have much time to review sorry) what's proposed with regard to limiting uffd events from kernel faults, the only use case I know that could deal with it is the UFFD_FEATURE_SIGBUS but that's not normal userfaultfd: that's also the only feature required from uffd to implement a pure malloc library in userland that never takes the mmap sem for writing to implement userland mremap/mmap/munmap lib calls (as those will convert to UFFDIO_ZEROPAGE and MADV_DONTNEED internally to the lib and there will be always a single vma). We just need to extend UFFDIO_ZEROPAGE to map the THP zeropage to make this future pure-uffd malloc lib perform better. On the other end I'm also planning a mremap_vma_merge userland syscall that will merge fragmented vmas. Currently once you have a nice heap all contiguous but with small objects and you free the fragments you can't build THP anymore even if you make the memory virtually contiguous again by calling mremap. That just build up a ton of vmas slowing down the app forever and also preventing THP collapsing ever again. mremap_vma_merge will require no new kernel feature, but it fundamentally must be able to handle kernel faults. If databases starts to use that, how can you enable this feature without breaking random apps then? So it'd be a feature usable only by one user (Android) perhaps? And only until you start defragging the vmas of small objects? Thanks, Andrea
On Wed, Feb 12, 2020 at 11:41 AM Andrea Arcangeli <aarcange@redhat.com> wrote: > > Hello everyone, > > On Wed, Feb 12, 2020 at 12:14:16PM -0500, Peter Xu wrote: > > Right. AFAICT QEMU uses it far more than disk IOs. A guest page can > > be accessed by any kernel component on the destination host during a > > postcopy procedure. It can be as simple as when a vcpu writes to a > > missing guest page which still resides on the source host, then KVM > > will get a page fault and trap into userfaultfd asking for that page. > > The same thing happens to other modules like vhost, etc., as long as a > > missing guest page is touched by a kernel module. > > Correct. > > How does the android garbage collection work to make sure there cannot > be kernel faults on the missing memory? We don't pass pointers to the heap into system calls. (Big primitive arrays, ByteBuffer, etc. are allocated off the regular heap.) > If I understood correctly (I didn't have much time to review sorry) > what's proposed with regard to limiting uffd events from kernel > faults, I don't understand what you mean. The purpose of preventing UFFD from handling kernel faults is exploit mitigation. > the only use case I know that could deal with it is the > UFFD_FEATURE_SIGBUS but that's not normal userfaultfd: that's also the > only feature required from uffd to implement a pure malloc library in > userland that never takes the mmap sem for writing to implement > userland mremap/mmap/munmap lib calls (as those will convert to > UFFDIO_ZEROPAGE and MADV_DONTNEED internally to the lib and there will > be always a single vma). We just need to extend UFFDIO_ZEROPAGE to map > the THP zeropage to make this future pure-uffd malloc lib perform > better. The key requirement here is the ability to prevent unprivileged processes from using UFFD to widen kernel exploit windows by preventing UFFD from taking kernel faults. Forcing unprivileged processes to use UFFD only with UFFD_FEATURE_SIGBUS would satisfy this requirement, but it's much less flexible and unnecessarily couples two features. > On the other end I'm also planning a mremap_vma_merge userland syscall > that will merge fragmented vmas. This is probably a separate discussion, but does that operation really need to be a system call? Historically, userspace has operated mostly on page ranges and not VMAs per se, and the kernel has been free to merge and split VMAs as needed for its internal purposes. This approach has serious negative side effects (like making munmap fallible: see [1]), but it is what it is. [1] https://lore.kernel.org/linux-mm/CAKOZuetOD6MkGPVvYFLj5RXh200FaDyu3sQqZviVRhTFFS3fjA@mail.gmail.com/ > Currently once you have a nice heap all contiguous but with small > objects and you free the fragments you can't build THP anymore even if > you make the memory virtually contiguous again by calling mremap. That > just build up a ton of vmas slowing down the app forever and also > preventing THP collapsing ever again. Shouldn't the THP background kthread take care of merging VMAs? > mremap_vma_merge will require no new kernel feature, but it > fundamentally must be able to handle kernel faults. If databases > starts to use that, how can you enable this feature without breaking > random apps then? Presumably, those apps wouldn't issue the system call on address ranges managed with a non-kernel-fault UFFD. > So it'd be a feature usable only by one user (Android) perhaps? And > only until you start defragging the vmas of small objects? We shouldn't be fragmenting at all, either at the memory level or the VMA level. The GC is a moving collector, and we don't punch holes in the heap.
Hi Daniel, On Wed, Feb 12, 2020 at 12:04:39PM -0800, Daniel Colascione wrote: > We don't pass pointers to the heap into system calls. (Big primitive > arrays, ByteBuffer, etc. are allocated off the regular heap.) That sounds pretty restrictive, I wonder what you gain by enforcing that invariant or if it just happened incidentally for some other reason? Do you need to copy that memory every time if you need to do I/O on it? Are you sure this won't need to change any time soon to increase performance? > I don't understand what you mean. The purpose of preventing UFFD from > handling kernel faults is exploit mitigation. That part was clear. What wasn't clear is what the new feature does exactly and what it blocks, because it's all about blocking or how does it make things more secure? > The key requirement here is the ability to prevent unprivileged > processes from using UFFD to widen kernel exploit windows by > preventing UFFD from taking kernel faults. Forcing unprivileged > processes to use UFFD only with UFFD_FEATURE_SIGBUS would satisfy this > requirement, but it's much less flexible and unnecessarily couples two > features. I mentioned it in case you could use something like that model. > > On the other end I'm also planning a mremap_vma_merge userland syscall > > that will merge fragmented vmas. > > This is probably a separate discussion, but does that operation really > need to be a system call? Historically, userspace has operated mostly mremap_vma_merge was not intended as a system call, if implemented as a system call it wouldn't use uffd. > on page ranges and not VMAs per se, and the kernel has been free to Userland doesn't need to know anything.. unless it wants to optimize. The userland can know full well if it does certain mremap operations and puts its ranges virtually contiguous in a non linear way, so that the kernel cannot merge the vmas. > merge and split VMAs as needed for its internal purposes. This > approach has serious negative side effects (like making munmap > fallible: see [1]), but it is what it is. > > [1] https://lore.kernel.org/linux-mm/CAKOZuetOD6MkGPVvYFLj5RXh200FaDyu3sQqZviVRhTFFS3fjA@mail.gmail.com/ The fact it's fallible is a secondary concern here. Even if you make it unlimited, if it grows it slowdown everything and also prevents THP to be collapsed. Even the scalability of the mmap_sem worsens. > > Currently once you have a nice heap all contiguous but with small > > objects and you free the fragments you can't build THP anymore even if > > you make the memory virtually contiguous again by calling mremap. That > > just build up a ton of vmas slowing down the app forever and also > > preventing THP collapsing ever again. > > Shouldn't the THP background kthread take care of merging VMAs? The solution can't depend on any THP feature, because the buildup of vmas is a scalability issue and a performance regression over time even if THP is not configured in the kernel. However once that's solved THP also gets naturally optimized. What should happen (in my view) is just the simplest solution that can defrag and forcefully merge the vmas without having to stop or restart the app. > Presumably, those apps wouldn't issue the system call on address > ranges managed with a non-kernel-fault UFFD. So the new security feature won't have to block kernel faults on those apps and they can run side by side with the blocked app? > We shouldn't be fragmenting at all, either at the memory level or the > VMA level. The GC is a moving collector, and we don't punch holes in > the heap. That sounds good. Thanks, Andrea
Userfaultfd in unprivileged contexts could be potentially very useful. We'd like to harden userfaultfd to make such unprivileged use less risky. This patch series allows SELinux to manage userfaultfd file descriptors and in the future, other kinds of anonymous-inode-based file descriptor. SELinux policy authors can apply policy types to anonymous inodes by providing name-based transition rules keyed off the anonymous inode internal name ( "[userfaultfd]" in the case of userfaultfd(2) file descriptors) and applying policy to the new SIDs thus produced. Inside the kernel, a pair of new anon_inodes interface, anon_inode_getfile_secure and anon_inode_getfd_secure, allow callers to opt into this SELinux management. In this new "secure" mode, anon_inodes creates new ephemeral inodes for anonymous file objects instead of reusing the normal anon_inodes singleton dummy inode. A new LSM hook gives security modules an opportunity to configure and veto these ephemeral inodes. This patch series is one of two fork of [1] and is an alternative to [2]. The primary difference between the two patch series is that this partch series creates a unique inode for each "secure" anonymous inode, while the other patch series ([2]) continues using the singleton dummy anonymous inode and adds a way to attach SELinux security information directly to file objects. I prefer the approach in this patch series because 1) it's a smaller patch than [2], and 2) it produces a more regular security architecture: in this patch series, secure anonymous inodes aren't S_PRIVATE and they maintain the SELinux property that the label for a file is in its inode. We do need an additional inode per anonymous file, but per-struct-file inode creation doesn't seem to be a problem for pipes and sockets. The previous version of this feature ([1]) created a new SELinux security class for userfaultfd file descriptors. This version adopts the generic transition-based approach of [2]. This patch series also differs from [2] in that it doesn't affect all anonymous inodes right away --- instead requiring anon_inodes callers to opt in --- but this difference isn't one of basic approach. The important question to resolve is whether we should be creating new inodes or enhancing per-file data. [1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/ [2] https://lore.kernel.org/linux-fsdevel/20200213194157.5877-1-sds@tycho.nsa.gov/ Daniel Colascione (3): Add a new LSM-supporting anonymous inode interface Teach SELinux about anonymous inodes Wire UFFD up to SELinux fs/anon_inodes.c | 196 ++++++++++++++++++++++++++++-------- fs/userfaultfd.c | 34 +++++-- include/linux/anon_inodes.h | 13 +++ include/linux/lsm_hooks.h | 9 ++ include/linux/security.h | 4 + security/security.c | 10 ++ security/selinux/hooks.c | 57 +++++++++++ 7 files changed, 274 insertions(+), 49 deletions(-)
On Tue, 11 Feb 2020, Daniel Colascione wrote: > > This must be posted to the linux Security Module list > > <linux-security-module@vger.kernel.org> > > Added. I thought selinux@ was sufficient. > Please cc: me on these patches.
Userfaultfd in unprivileged contexts could be potentially very useful. We'd like to harden userfaultfd to make such unprivileged use less risky. This patch series allows SELinux to manage userfaultfd file descriptors and in the future, other kinds of anonymous-inode-based file descriptor. SELinux policy authors can apply policy types to anonymous inodes by providing name-based transition rules keyed off the anonymous inode internal name ( "[userfaultfd]" in the case of userfaultfd(2) file descriptors) and applying policy to the new SIDs thus produced. Inside the kernel, a pair of new anon_inodes interface, anon_inode_getfile_secure and anon_inode_getfd_secure, allow callers to opt into this SELinux management. In this new "secure" mode, anon_inodes creates new ephemeral inodes for anonymous file objects instead of reusing the normal anon_inodes singleton dummy inode. A new LSM hook gives security modules an opportunity to configure and veto these ephemeral inodes. This patch series is one of two fork of [1] and is an alternative to [2]. The primary difference between the two patch series is that this partch series creates a unique inode for each "secure" anonymous inode, while the other patch series ([2]) continues using the singleton dummy anonymous inode and adds a way to attach SELinux security information directly to file objects. I prefer the approach in this patch series because 1) it's a smaller patch than [2], and 2) it produces a more regular security architecture: in this patch series, secure anonymous inodes aren't S_PRIVATE and they maintain the SELinux property that the label for a file is in its inode. We do need an additional inode per anonymous file, but per-struct-file inode creation doesn't seem to be a problem for pipes and sockets. The previous version of this feature ([1]) created a new SELinux security class for userfaultfd file descriptors. This version adopts the generic transition-based approach of [2]. This patch series also differs from [2] in that it doesn't affect all anonymous inodes right away --- instead requiring anon_inodes callers to opt in --- but this difference isn't one of basic approach. The important question to resolve is whether we should be creating new inodes or enhancing per-file data. Changes from the first version of the patch: - Removed some error checks - Defined a new anon_inode SELinux class to resolve the ambiguity in [3] - Inherit sclass as well as descriptor from context inode Changes from the second version of the patch: - Fixed example policy in the commit message to reflect the use of the new anon_inode class. [1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/ [2] https://lore.kernel.org/linux-fsdevel/20200213194157.5877-1-sds@tycho.nsa.gov/ [3] https://lore.kernel.org/lkml/23f725ca-5b5a-5938-fcc8-5bbbfc9ba9bc@tycho.nsa.gov/ Daniel Colascione (3): Add a new LSM-supporting anonymous inode interface Teach SELinux about anonymous inodes Wire UFFD up to SELinux fs/anon_inodes.c | 196 ++++++++++++++++++++++------ fs/userfaultfd.c | 30 ++++- include/linux/anon_inodes.h | 13 ++ include/linux/lsm_hooks.h | 9 ++ include/linux/security.h | 4 + security/security.c | 10 ++ security/selinux/hooks.c | 54 ++++++++ security/selinux/include/classmap.h | 2 + 8 files changed, 272 insertions(+), 46 deletions(-)