Message ID | 20220719195628.3415852-1-axelrasmussen@google.com (mailing list archive) |
---|---|
Headers | show |
Series | userfaultfd: add /dev/userfaultfd for fine grained access control | expand |
> -----Original Message----- > From: Axel Rasmussen <axelrasmussen@google.com> > Sent: Tuesday, July 19, 2022 12:56 PM > To: Alexander Viro <viro@zeniv.linux.org.uk>; Andrew Morton > <akpm@linux-foundation.org>; Dave Hansen > <dave.hansen@linux.intel.com>; Dmitry V . Levin <ldv@altlinux.org>; Gleb > Fotengauer-Malinovskiy <glebfm@altlinux.org>; Hugh Dickins > <hughd@google.com>; Jan Kara <jack@suse.cz>; Jonathan Corbet > <corbet@lwn.net>; Mel Gorman <mgorman@techsingularity.net>; Mike > Kravetz <mike.kravetz@oracle.com>; Mike Rapoport <rppt@kernel.org>; > Amit, Nadav <namit@vmware.com>; Peter Xu <peterx@redhat.com>; > Shuah Khan <shuah@kernel.org>; Suren Baghdasaryan > <surenb@google.com>; Vlastimil Babka <vbabka@suse.cz>; zhangyi > <yi.zhang@huawei.com> > Cc: Axel Rasmussen <axelrasmussen@google.com>; linux- > doc@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux- > kernel@vger.kernel.org; linux-mm@kvack.org; linux- > kselftest@vger.kernel.org > Subject: [PATCH v4 0/5] userfaultfd: add /dev/userfaultfd for fine grained > access control I assume that leaving the LSM mailing list off of the CC is purely accidental. Please, please include us in the next round. > > This series is based on torvalds/master. > > The series is split up like so: > - Patch 1 is a simple fixup which we should take in any case (even by itself). > - Patches 2-6 add the feature, configurable selftest support, and docs. > > Why not ...? > ============ > > - Why not /proc/[pid]/userfaultfd? The proposed use case for this is for one > process to open a userfaultfd which can intercept another process' page > faults. This seems to me like exactly what CAP_SYS_PTRACE is for, though, > so I > think this use case can simply use a syscall without the powers > CAP_SYS_PTRACE > grants being "too much". > > - Why not use a syscall? Access to syscalls is generally controlled by > capabilities. We don't have a capability which is used for userfaultfd access > without also granting more / other permissions as well, and adding a new > capability was rejected [1]. > > - It's possible a LSM could be used to control access instead. I suspect > adding a brand new one just for this would be rejected, You won't know if you don't ask. > but I think some > existing ones like SELinux can be used to filter syscall access. Enabling > SELinux for large production deployments which don't already use it is > likely to be a huge undertaking though, and I don't think this use case by > itself is enough to motivate that kind of architectural change. > > Changelog > ========= > > v3->v4: > - Picked up an Acked-by on 5/5. > - Updated cover letter to cover "why not ...". > - Refactored userfaultfd_allowed() into userfaultfd_syscall_allowed(). > [Peter] > - Removed obsolete comment from a previous version. [Peter] > - Refactored userfaultfd_open() in selftest. [Peter] > - Reworded admin-guide documentation. [Mike, Peter] > - Squashed 2 commits adding /dev/userfaultfd to selftest and making > selftest > configurable. [Peter] > - Added "syscall" test modifier (the default behavior) to selftest. [Peter] > > v2->v3: > - Rebased onto linux-next/akpm-base, in order to be based on top of the > run_vmtests.sh refactor which was merged previously. > - Picked up some Reviewed-by's. > - Fixed ioctl definition (_IO instead of _IOWR), and stopped using > compat_ptr_ioctl since it is unneeded for ioctls which don't take a pointer. > - Removed the "handle_kernel_faults" bool, simplifying the code. The result > is > logically equivalent, but simpler. > - Fixed userfaultfd selftest so it returns KSFT_SKIP appropriately. > - Reworded documentation per Shuah's feedback on v2. > - Improved example usage for userfaultfd selftest. > > v1->v2: > - Add documentation update. > - Test *both* userfaultfd(2) and /dev/userfaultfd via the selftest. > > [1]: https://lore.kernel.org/lkml/686276b9-4530-2045-6bd8- > 170e5943abe4@schaufler-ca.com/T/ > > Axel Rasmussen (5): > selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.sh > userfaultfd: add /dev/userfaultfd for fine grained access control > userfaultfd: selftests: modify selftest to use /dev/userfaultfd > userfaultfd: update documentation to describe /dev/userfaultfd > selftests: vm: add /dev/userfaultfd test cases to run_vmtests.sh > > Documentation/admin-guide/mm/userfaultfd.rst | 41 +++++++++++- > Documentation/admin-guide/sysctl/vm.rst | 3 + > fs/userfaultfd.c | 69 ++++++++++++++++---- > include/uapi/linux/userfaultfd.h | 4 ++ > tools/testing/selftests/vm/run_vmtests.sh | 11 +++- > tools/testing/selftests/vm/userfaultfd.c | 69 +++++++++++++++++--- > 6 files changed, 169 insertions(+), 28 deletions(-) > > -- > 2.37.0.170.g444d1eabd0-goog
On Wed, Jul 20, 2022 at 3:16 PM Schaufler, Casey <casey.schaufler@intel.com> wrote: > > > -----Original Message----- > > From: Axel Rasmussen <axelrasmussen@google.com> > > Sent: Tuesday, July 19, 2022 12:56 PM > > To: Alexander Viro <viro@zeniv.linux.org.uk>; Andrew Morton > > <akpm@linux-foundation.org>; Dave Hansen > > <dave.hansen@linux.intel.com>; Dmitry V . Levin <ldv@altlinux.org>; Gleb > > Fotengauer-Malinovskiy <glebfm@altlinux.org>; Hugh Dickins > > <hughd@google.com>; Jan Kara <jack@suse.cz>; Jonathan Corbet > > <corbet@lwn.net>; Mel Gorman <mgorman@techsingularity.net>; Mike > > Kravetz <mike.kravetz@oracle.com>; Mike Rapoport <rppt@kernel.org>; > > Amit, Nadav <namit@vmware.com>; Peter Xu <peterx@redhat.com>; > > Shuah Khan <shuah@kernel.org>; Suren Baghdasaryan > > <surenb@google.com>; Vlastimil Babka <vbabka@suse.cz>; zhangyi > > <yi.zhang@huawei.com> > > Cc: Axel Rasmussen <axelrasmussen@google.com>; linux- > > doc@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux- > > kernel@vger.kernel.org; linux-mm@kvack.org; linux- > > kselftest@vger.kernel.org > > Subject: [PATCH v4 0/5] userfaultfd: add /dev/userfaultfd for fine grained > > access control > > I assume that leaving the LSM mailing list off of the CC is purely > accidental. Please, please include us in the next round. Honestly it just hadn't occurred to me, but I'm more than happy to CC it on future revisions. > > > > > This series is based on torvalds/master. > > > > The series is split up like so: > > - Patch 1 is a simple fixup which we should take in any case (even by itself). > > - Patches 2-6 add the feature, configurable selftest support, and docs. > > > > Why not ...? > > ============ > > > > - Why not /proc/[pid]/userfaultfd? The proposed use case for this is for one > > process to open a userfaultfd which can intercept another process' page > > faults. This seems to me like exactly what CAP_SYS_PTRACE is for, though, > > so I > > think this use case can simply use a syscall without the powers > > CAP_SYS_PTRACE > > grants being "too much". > > > > - Why not use a syscall? Access to syscalls is generally controlled by > > capabilities. We don't have a capability which is used for userfaultfd access > > without also granting more / other permissions as well, and adding a new > > capability was rejected [1]. > > > > - It's possible a LSM could be used to control access instead. I suspect > > adding a brand new one just for this would be rejected, > > You won't know if you don't ask. Fair enough - I wonder if MM folks (Andrew, Peter, Nadav especially) would find that approach more palatable than /proc/[pid]/userfaultfd? Would it make sense from our perspective to propose a userfaultfd- or MM-specific LSM for controlling access to certain features? I remember +Andrea saying Red Hat was also interested in some kind of access control mechanism like this. Would one or the other approach be more convenient for you? > > > but I think some > > existing ones like SELinux can be used to filter syscall access. Enabling > > SELinux for large production deployments which don't already use it is > > likely to be a huge undertaking though, and I don't think this use case by > > itself is enough to motivate that kind of architectural change. > > > > Changelog > > ========= > > > > v3->v4: > > - Picked up an Acked-by on 5/5. > > - Updated cover letter to cover "why not ...". > > - Refactored userfaultfd_allowed() into userfaultfd_syscall_allowed(). > > [Peter] > > - Removed obsolete comment from a previous version. [Peter] > > - Refactored userfaultfd_open() in selftest. [Peter] > > - Reworded admin-guide documentation. [Mike, Peter] > > - Squashed 2 commits adding /dev/userfaultfd to selftest and making > > selftest > > configurable. [Peter] > > - Added "syscall" test modifier (the default behavior) to selftest. [Peter] > > > > v2->v3: > > - Rebased onto linux-next/akpm-base, in order to be based on top of the > > run_vmtests.sh refactor which was merged previously. > > - Picked up some Reviewed-by's. > > - Fixed ioctl definition (_IO instead of _IOWR), and stopped using > > compat_ptr_ioctl since it is unneeded for ioctls which don't take a pointer. > > - Removed the "handle_kernel_faults" bool, simplifying the code. The result > > is > > logically equivalent, but simpler. > > - Fixed userfaultfd selftest so it returns KSFT_SKIP appropriately. > > - Reworded documentation per Shuah's feedback on v2. > > - Improved example usage for userfaultfd selftest. > > > > v1->v2: > > - Add documentation update. > > - Test *both* userfaultfd(2) and /dev/userfaultfd via the selftest. > > > > [1]: https://lore.kernel.org/lkml/686276b9-4530-2045-6bd8- > > 170e5943abe4@schaufler-ca.com/T/ > > > > Axel Rasmussen (5): > > selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.sh > > userfaultfd: add /dev/userfaultfd for fine grained access control > > userfaultfd: selftests: modify selftest to use /dev/userfaultfd > > userfaultfd: update documentation to describe /dev/userfaultfd > > selftests: vm: add /dev/userfaultfd test cases to run_vmtests.sh > > > > Documentation/admin-guide/mm/userfaultfd.rst | 41 +++++++++++- > > Documentation/admin-guide/sysctl/vm.rst | 3 + > > fs/userfaultfd.c | 69 ++++++++++++++++---- > > include/uapi/linux/userfaultfd.h | 4 ++ > > tools/testing/selftests/vm/run_vmtests.sh | 11 +++- > > tools/testing/selftests/vm/userfaultfd.c | 69 +++++++++++++++++--- > > 6 files changed, 169 insertions(+), 28 deletions(-) > > > > -- > > 2.37.0.170.g444d1eabd0-goog >
On Jul 20, 2022, at 4:04 PM, Axel Rasmussen <axelrasmussen@google.com> wrote: > ⚠ External Email > > On Wed, Jul 20, 2022 at 3:16 PM Schaufler, Casey > <casey.schaufler@intel.com> wrote: >>> -----Original Message----- >>> From: Axel Rasmussen <axelrasmussen@google.com> >>> Sent: Tuesday, July 19, 2022 12:56 PM >>> To: Alexander Viro <viro@zeniv.linux.org.uk>; Andrew Morton >>> <akpm@linux-foundation.org>; Dave Hansen >>> <dave.hansen@linux.intel.com>; Dmitry V . Levin <ldv@altlinux.org>; Gleb >>> Fotengauer-Malinovskiy <glebfm@altlinux.org>; Hugh Dickins >>> <hughd@google.com>; Jan Kara <jack@suse.cz>; Jonathan Corbet >>> <corbet@lwn.net>; Mel Gorman <mgorman@techsingularity.net>; Mike >>> Kravetz <mike.kravetz@oracle.com>; Mike Rapoport <rppt@kernel.org>; >>> Amit, Nadav <namit@vmware.com>; Peter Xu <peterx@redhat.com>; >>> Shuah Khan <shuah@kernel.org>; Suren Baghdasaryan >>> <surenb@google.com>; Vlastimil Babka <vbabka@suse.cz>; zhangyi >>> <yi.zhang@huawei.com> >>> Cc: Axel Rasmussen <axelrasmussen@google.com>; linux- >>> doc@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux- >>> kernel@vger.kernel.org; linux-mm@kvack.org; linux- >>> kselftest@vger.kernel.org >>> Subject: [PATCH v4 0/5] userfaultfd: add /dev/userfaultfd for fine grained >>> access control >> >> I assume that leaving the LSM mailing list off of the CC is purely >> accidental. Please, please include us in the next round. > > Honestly it just hadn't occurred to me, but I'm more than happy to CC > it on future revisions. > >>> This series is based on torvalds/master. >>> >>> The series is split up like so: >>> - Patch 1 is a simple fixup which we should take in any case (even by itself). >>> - Patches 2-6 add the feature, configurable selftest support, and docs. >>> >>> Why not ...? >>> ============ >>> >>> - Why not /proc/[pid]/userfaultfd? The proposed use case for this is for one >>> process to open a userfaultfd which can intercept another process' page >>> faults. This seems to me like exactly what CAP_SYS_PTRACE is for, though, >>> so I >>> think this use case can simply use a syscall without the powers >>> CAP_SYS_PTRACE >>> grants being "too much". >>> >>> - Why not use a syscall? Access to syscalls is generally controlled by >>> capabilities. We don't have a capability which is used for userfaultfd access >>> without also granting more / other permissions as well, and adding a new >>> capability was rejected [1]. >>> >>> - It's possible a LSM could be used to control access instead. I suspect >>> adding a brand new one just for this would be rejected, >> >> You won't know if you don't ask. > > Fair enough - I wonder if MM folks (Andrew, Peter, Nadav especially) > would find that approach more palatable than /proc/[pid]/userfaultfd? > Would it make sense from our perspective to propose a userfaultfd- or > MM-specific LSM for controlling access to certain features? > > I remember +Andrea saying Red Hat was also interested in some kind of > access control mechanism like this. Would one or the other approach be > more convenient for you? To reiterate my position - I think that /proc/[pid]/userfaultfd is very natural and can be easily extended to support cross-process access of userfaultfd. The necessary access controls are simple in any case. For cross-process access, they are similar to those that are used for other /proc/[pid]/X such as pagemap. I have little experience with LSM and I do not know how real deployments use them. If they are easier to deploy and people prefer them over some pseudo-file, I cannot argue against them.
I finished up some other work and got around to writing a v5 today, but I ran into a problem with /proc/[pid]/userfaultfd. Files in /proc/[pid]/* are owned by the user/group which started the process, and they don't support being chmod'ed. For the userfaultfd device, I think we want the following semantics: - For UFFDs created via the device, we want to always allow handling kernel mode faults - For security, the device should be owned by root:root by default, so unprivileged users don't have default access to handle kernel faults - But, the system administrator should be able to chown/chmod it, to grant access to handling kernel faults for this process more widely. It could be made to work like that but I think it would involve at least: - Special casing userfaultfd in proc_pid_make_inode - Updating setattr/getattr for /proc/[pid] to meaningfully store and then retrieve uid/gid different from the task's, again probably special cased for userfautlfd since we don't want this behavior for other files It seems to me such a change might raise eyebrows among procfs folks. Before I spend the time to write this up, does this seem like something that would obviously be nack'ed? On Wed, Jul 20, 2022 at 4:21 PM Nadav Amit <namit@vmware.com> wrote: > > On Jul 20, 2022, at 4:04 PM, Axel Rasmussen <axelrasmussen@google.com> wrote: > > > ⚠ External Email > > > > On Wed, Jul 20, 2022 at 3:16 PM Schaufler, Casey > > <casey.schaufler@intel.com> wrote: > >>> -----Original Message----- > >>> From: Axel Rasmussen <axelrasmussen@google.com> > >>> Sent: Tuesday, July 19, 2022 12:56 PM > >>> To: Alexander Viro <viro@zeniv.linux.org.uk>; Andrew Morton > >>> <akpm@linux-foundation.org>; Dave Hansen > >>> <dave.hansen@linux.intel.com>; Dmitry V . Levin <ldv@altlinux.org>; Gleb > >>> Fotengauer-Malinovskiy <glebfm@altlinux.org>; Hugh Dickins > >>> <hughd@google.com>; Jan Kara <jack@suse.cz>; Jonathan Corbet > >>> <corbet@lwn.net>; Mel Gorman <mgorman@techsingularity.net>; Mike > >>> Kravetz <mike.kravetz@oracle.com>; Mike Rapoport <rppt@kernel.org>; > >>> Amit, Nadav <namit@vmware.com>; Peter Xu <peterx@redhat.com>; > >>> Shuah Khan <shuah@kernel.org>; Suren Baghdasaryan > >>> <surenb@google.com>; Vlastimil Babka <vbabka@suse.cz>; zhangyi > >>> <yi.zhang@huawei.com> > >>> Cc: Axel Rasmussen <axelrasmussen@google.com>; linux- > >>> doc@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux- > >>> kernel@vger.kernel.org; linux-mm@kvack.org; linux- > >>> kselftest@vger.kernel.org > >>> Subject: [PATCH v4 0/5] userfaultfd: add /dev/userfaultfd for fine grained > >>> access control > >> > >> I assume that leaving the LSM mailing list off of the CC is purely > >> accidental. Please, please include us in the next round. > > > > Honestly it just hadn't occurred to me, but I'm more than happy to CC > > it on future revisions. > > > >>> This series is based on torvalds/master. > >>> > >>> The series is split up like so: > >>> - Patch 1 is a simple fixup which we should take in any case (even by itself). > >>> - Patches 2-6 add the feature, configurable selftest support, and docs. > >>> > >>> Why not ...? > >>> ============ > >>> > >>> - Why not /proc/[pid]/userfaultfd? The proposed use case for this is for one > >>> process to open a userfaultfd which can intercept another process' page > >>> faults. This seems to me like exactly what CAP_SYS_PTRACE is for, though, > >>> so I > >>> think this use case can simply use a syscall without the powers > >>> CAP_SYS_PTRACE > >>> grants being "too much". > >>> > >>> - Why not use a syscall? Access to syscalls is generally controlled by > >>> capabilities. We don't have a capability which is used for userfaultfd access > >>> without also granting more / other permissions as well, and adding a new > >>> capability was rejected [1]. > >>> > >>> - It's possible a LSM could be used to control access instead. I suspect > >>> adding a brand new one just for this would be rejected, > >> > >> You won't know if you don't ask. > > > > Fair enough - I wonder if MM folks (Andrew, Peter, Nadav especially) > > would find that approach more palatable than /proc/[pid]/userfaultfd? > > Would it make sense from our perspective to propose a userfaultfd- or > > MM-specific LSM for controlling access to certain features? > > > > I remember +Andrea saying Red Hat was also interested in some kind of > > access control mechanism like this. Would one or the other approach be > > more convenient for you? > > To reiterate my position - I think that /proc/[pid]/userfaultfd is very > natural and can be easily extended to support cross-process access of > userfaultfd. The necessary access controls are simple in any case. For > cross-process access, they are similar to those that are used for other > /proc/[pid]/X such as pagemap. > > I have little experience with LSM and I do not know how real deployments use > them. If they are easier to deploy and people prefer them over some > pseudo-file, I cannot argue against them. > >
On Aug 1, 2022, at 10:13 AM, Axel Rasmussen <axelrasmussen@google.com> wrote: > ⚠ External Email > > I finished up some other work and got around to writing a v5 today, > but I ran into a problem with /proc/[pid]/userfaultfd. > > Files in /proc/[pid]/* are owned by the user/group which started the > process, and they don't support being chmod'ed. > > For the userfaultfd device, I think we want the following semantics: > - For UFFDs created via the device, we want to always allow handling > kernel mode faults > - For security, the device should be owned by root:root by default, so > unprivileged users don't have default access to handle kernel faults > - But, the system administrator should be able to chown/chmod it, to > grant access to handling kernel faults for this process more widely. > > It could be made to work like that but I think it would involve at least: > > - Special casing userfaultfd in proc_pid_make_inode > - Updating setattr/getattr for /proc/[pid] to meaningfully store and > then retrieve uid/gid different from the task's, again probably > special cased for userfautlfd since we don't want this behavior for > other files > > It seems to me such a change might raise eyebrows among procfs folks. > Before I spend the time to write this up, does this seem like > something that would obviously be nack'ed? [ Please avoid top-posting in the future ] I have no interest in making your life harder than it should be. If you cannot find a suitable alternative, I will not fight against it. How about this alternative: how about following KVM usage-model? IOW: You open /dev/userfaultfd, but this is not the file-descriptor that you use for most operations. Instead you first issue an ioctl - similarly to KVM_CREATE_VM - to get a file-descriptor for your specific process. You then use this new file-descriptor to perform your operations (read/ioctl/etc). This would make the fact that ioctls/reads from different processes refer to different contexts (i.e., file-descriptors) much more natural. Does it sound better?
On Mon, Aug 1, 2022 at 12:53 PM Nadav Amit <namit@vmware.com> wrote: > > On Aug 1, 2022, at 10:13 AM, Axel Rasmussen <axelrasmussen@google.com> wrote: > > > ⚠ External Email > > > > I finished up some other work and got around to writing a v5 today, > > but I ran into a problem with /proc/[pid]/userfaultfd. > > > > Files in /proc/[pid]/* are owned by the user/group which started the > > process, and they don't support being chmod'ed. > > > > For the userfaultfd device, I think we want the following semantics: > > - For UFFDs created via the device, we want to always allow handling > > kernel mode faults > > - For security, the device should be owned by root:root by default, so > > unprivileged users don't have default access to handle kernel faults > > - But, the system administrator should be able to chown/chmod it, to > > grant access to handling kernel faults for this process more widely. > > > > It could be made to work like that but I think it would involve at least: > > > > - Special casing userfaultfd in proc_pid_make_inode > > - Updating setattr/getattr for /proc/[pid] to meaningfully store and > > then retrieve uid/gid different from the task's, again probably > > special cased for userfautlfd since we don't want this behavior for > > other files > > > > It seems to me such a change might raise eyebrows among procfs folks. > > Before I spend the time to write this up, does this seem like > > something that would obviously be nack'ed? > > [ Please avoid top-posting in the future ] I will remember this. Gmail's default behavior is annoying. :/ > > I have no interest in making your life harder than it should be. If you > cannot find a suitable alternative, I will not fight against it. > > How about this alternative: how about following KVM usage-model? > > IOW: You open /dev/userfaultfd, but this is not the file-descriptor that you > use for most operations. Instead you first issue an ioctl - similarly to > KVM_CREATE_VM - to get a file-descriptor for your specific process. You then > use this new file-descriptor to perform your operations (read/ioctl/etc). > > This would make the fact that ioctls/reads from different processes refer to > different contexts (i.e., file-descriptors) much more natural. > > Does it sound better? Ah, that I think is more or less what my series already proposes, if I understand you correctly. The usage is: fd = open(/dev/userfaultfd) /* This FD is only useful for creating new userfaultfds */ uffd = ioctl(fd, USERFAULTFD_IOC_NEW) /* Now you get a real uffd */ close(fd); /* No longer needed now that we have a real uffd */ /* Use uffd to register, COPY, CONTINUE, whatever */ One thing we could do now or in the future is extend USERFAULTFD_IOC_NEW to take a pid as an argument, to support creating uffds for remote processes. And then we get the benefit of permissions for /dev nodes working very naturally - they default to root, but can be configured by the sysadmin via chown/chmod, or udev rules, or whatever.
On Aug 1, 2022, at 3:50 PM, Axel Rasmussen <axelrasmussen@google.com> wrote: > ⚠ External Email > > On Mon, Aug 1, 2022 at 12:53 PM Nadav Amit <namit@vmware.com> wrote: >> On Aug 1, 2022, at 10:13 AM, Axel Rasmussen <axelrasmussen@google.com> wrote: > > Ah, that I think is more or less what my series already proposes, if I > understand you correctly. > > The usage is: > > fd = open(/dev/userfaultfd) /* This FD is only useful for creating new > userfaultfds */ > uffd = ioctl(fd, USERFAULTFD_IOC_NEW) /* Now you get a real uffd */ > close(fd); /* No longer needed now that we have a real uffd */ > > /* Use uffd to register, COPY, CONTINUE, whatever */ > > One thing we could do now or in the future is extend > USERFAULTFD_IOC_NEW to take a pid as an argument, to support creating > uffds for remote processes. > > > > And then we get the benefit of permissions for /dev nodes working very > naturally - they default to root, but can be configured by the > sysadmin via chown/chmod, or udev rules, or whatever. Oh. Stupid me. Then yes, using the /dev/userfaultfd is in line with other usage models, such as KVM. And reading from each file descriptor is indeed providing different output.