Message ID | 20220706082016.2603916-6-chao.p.peng@linux.intel.com (mailing list archive) |
---|---|
State | New |
Series | KVM: mm: fd-based approach for supporting KVM guest private memory |
On 06.07.22 10:20, Chao Peng wrote:
> Introduce a new memfd_create() flag indicating the content of the
> created memfd is inaccessible from userspace through ordinary MMU
> access (e.g., read/write/mmap). However, the file content can be
> accessed via a different mechanism (e.g. KVM MMU) indirectly.
>
> It provides semantics required for KVM guest private memory support
> that a file descriptor with this flag set is going to be used as the
> source of guest memory in confidential computing environments such
> as Intel TDX/AMD SEV but may not be accessible from host userspace.
>
> The flag can not coexist with MFD_ALLOW_SEALING, future sealing is
> also impossible for a memfd created with this flag.

It's kind of weird to have it that way. Why should the user have to
care? It's the notifier requirement to have that, no?

Why can't we handle that when register a notifier? If anything is
already mapped, fail registering the notifier if the notifier has these
demands. If registering succeeds, block it internally.

Or what am I missing? We might not need the memfile set flag semantics
eventually and would not have to expose such a flag to user space.

On Fri, Aug 05, 2022 at 03:28:50PM +0200, David Hildenbrand wrote:
> On 06.07.22 10:20, Chao Peng wrote:
> > Introduce a new memfd_create() flag indicating the content of the
> > created memfd is inaccessible from userspace through ordinary MMU
> > access (e.g., read/write/mmap). However, the file content can be
> > accessed via a different mechanism (e.g. KVM MMU) indirectly.
> >
> > It provides semantics required for KVM guest private memory support
> > that a file descriptor with this flag set is going to be used as the
> > source of guest memory in confidential computing environments such
> > as Intel TDX/AMD SEV but may not be accessible from host userspace.
> >
> > The flag can not coexist with MFD_ALLOW_SEALING, future sealing is
> > also impossible for a memfd created with this flag.
>
> It's kind of weird to have it that way. Why should the user have to
> care? It's the notifier requirement to have that, no?
>
> Why can't we handle that when register a notifier? If anything is
> already mapped, fail registering the notifier if the notifier has these
> demands. If registering succeeds, block it internally.
>
> Or what am I missing? We might not need the memfile set flag semantics
> eventually and would not have to expose such a flag to user space.

This makes sense if doable. The major concern was: is there a reliable
way to detect this (already mapped) at the time of memslot registering.

Chao
>
> --
> Thanks,
>
> David / dhildenb
>

On 10.08.22 11:37, Chao Peng wrote:
> On Fri, Aug 05, 2022 at 03:28:50PM +0200, David Hildenbrand wrote:
>> On 06.07.22 10:20, Chao Peng wrote:
>>> Introduce a new memfd_create() flag indicating the content of the
>>> created memfd is inaccessible from userspace through ordinary MMU
>>> access (e.g., read/write/mmap). However, the file content can be
>>> accessed via a different mechanism (e.g. KVM MMU) indirectly.
>>>
>>> It provides semantics required for KVM guest private memory support
>>> that a file descriptor with this flag set is going to be used as the
>>> source of guest memory in confidential computing environments such
>>> as Intel TDX/AMD SEV but may not be accessible from host userspace.
>>>
>>> The flag can not coexist with MFD_ALLOW_SEALING, future sealing is
>>> also impossible for a memfd created with this flag.
>>
>> It's kind of weird to have it that way. Why should the user have to
>> care? It's the notifier requirement to have that, no?
>>
>> Why can't we handle that when register a notifier? If anything is
>> already mapped, fail registering the notifier if the notifier has these
>> demands. If registering succeeds, block it internally.
>>
>> Or what am I missing? We might not need the memfile set flag semantics
>> eventually and would not have to expose such a flag to user space.
>
> This makes sense if doable. The major concern was: is there a reliable
> way to detect this (already mapped) at the time of memslot registering.

If too complicated, we could simplify to "was this ever mapped" and fail
for now. Hooking into shmem_mmap() might be sufficient for that to get
notified about the first mmap.

As an alternative, mapping_mapped() or similar *might* do what we want.

On Wed, Aug 10, 2022 at 11:55:19AM +0200, David Hildenbrand wrote:
> On 10.08.22 11:37, Chao Peng wrote:
> > On Fri, Aug 05, 2022 at 03:28:50PM +0200, David Hildenbrand wrote:
> >> On 06.07.22 10:20, Chao Peng wrote:
> >>> Introduce a new memfd_create() flag indicating the content of the
> >>> created memfd is inaccessible from userspace through ordinary MMU
> >>> access (e.g., read/write/mmap). However, the file content can be
> >>> accessed via a different mechanism (e.g. KVM MMU) indirectly.
> >>>
> >>> It provides semantics required for KVM guest private memory support
> >>> that a file descriptor with this flag set is going to be used as the
> >>> source of guest memory in confidential computing environments such
> >>> as Intel TDX/AMD SEV but may not be accessible from host userspace.
> >>>
> >>> The flag can not coexist with MFD_ALLOW_SEALING, future sealing is
> >>> also impossible for a memfd created with this flag.
> >>
> >> It's kind of weird to have it that way. Why should the user have to
> >> care? It's the notifier requirement to have that, no?
> >>
> >> Why can't we handle that when register a notifier? If anything is
> >> already mapped, fail registering the notifier if the notifier has these
> >> demands. If registering succeeds, block it internally.
> >>
> >> Or what am I missing? We might not need the memfile set flag semantics
> >> eventually and would not have to expose such a flag to user space.
> >
> > This makes sense if doable. The major concern was: is there a reliable
> > way to detect this (already mapped) at the time of memslot registering.
>
> If too complicated, we could simplify to "was this ever mapped" and fail
> for now. Hooking into shmem_mmap() might be sufficient for that to get
> notified about the first mmap.
>
> As an alternative, mapping_mapped() or similar *might* do what we want.

mapping_mapped() sounds the right one, I remember SEV people want first
map then unmap. "was this ever mapped" may not work for them.

Thanks,
Chao
>
>
>
> --
> Thanks,
>
> David / dhildenb

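The difference between the two checks discussed above ("is anything mapped right now" versus "was this ever mapped") can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct toy_memfile`, `toy_mmap()`, `toy_munmap()`, `toy_register_notifier()` and the two policy names are hypothetical, and the `-EBUSY`/`-EPERM` return values are assumptions chosen for illustration. The "mapped now" policy stands in for what a `mapping_mapped()`-style check might provide; the "ever mapped" policy stands in for a hook on the first mmap.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of a memory file; not a kernel structure. */
struct toy_memfile {
	unsigned int nr_mappings;	/* live userspace mappings */
	bool ever_mapped;		/* set on the first successful mmap */
	bool inaccessible;		/* set once a notifier demanding it registers */
};

/* The two registration policies raised in the thread (names are made up). */
enum toy_policy {
	TOY_FAIL_IF_MAPPED_NOW,		/* models a mapping_mapped()-style check */
	TOY_FAIL_IF_EVER_MAPPED,	/* models "was this ever mapped" */
};

/* Map the file; blocked internally once a notifier has registered. */
static int toy_mmap(struct toy_memfile *f)
{
	if (f->inaccessible)
		return -EPERM;		/* assumed error code */
	f->nr_mappings++;
	f->ever_mapped = true;
	return 0;
}

/* Drop one mapping, if any exists. */
static void toy_munmap(struct toy_memfile *f)
{
	if (f->nr_mappings > 0)
		f->nr_mappings--;
}

/* Fail registration if the chosen policy says the file is (or was) mapped. */
static int toy_register_notifier(struct toy_memfile *f, enum toy_policy policy)
{
	bool busy = (policy == TOY_FAIL_IF_EVER_MAPPED) ?
		    f->ever_mapped : (f->nr_mappings > 0);

	if (busy)
		return -EBUSY;		/* assumed error code */
	f->inaccessible = true;		/* from now on, toy_mmap() fails */
	return 0;
}
```

In this model, a map followed by an unmap still allows registration under the "mapped now" policy but not under the "ever mapped" policy, which is the case Chao raises for the SEV flow.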
On Fri, Aug 05, 2022 at 03:28:50PM +0200, David Hildenbrand wrote:
> On 06.07.22 10:20, Chao Peng wrote:
> > Introduce a new memfd_create() flag indicating the content of the
> > created memfd is inaccessible from userspace through ordinary MMU
> > access (e.g., read/write/mmap). However, the file content can be
> > accessed via a different mechanism (e.g. KVM MMU) indirectly.
> >
> > It provides semantics required for KVM guest private memory support
> > that a file descriptor with this flag set is going to be used as the
> > source of guest memory in confidential computing environments such
> > as Intel TDX/AMD SEV but may not be accessible from host userspace.
> >
> > The flag can not coexist with MFD_ALLOW_SEALING, future sealing is
> > also impossible for a memfd created with this flag.
>
> It's kind of weird to have it that way. Why should the user have to
> care? It's the notifier requirement to have that, no?
>
> Why can't we handle that when register a notifier? If anything is
> already mapped, fail registering the notifier if the notifier has these
> demands. If registering succeeds, block it internally.
>
> Or what am I missing? We might not need the memfile set flag semantics
> eventually and would not have to expose such a flag to user space.

Well, with the new shim-based[1] implementation the approach without
uAPI does not work.

We now have two struct files: one is a normal, accessible memfd and the
other is a wrapper around it that hides the memfd from userspace and
filters the allowed operations. If we first create an accessible memfd
that userspace sees, it would be hard to hide it, as by that time
userspace may have multiple fds in different processes that point to
the same struct file.

[1] https://lore.kernel.org/all/20220831142439.65q2gi4g2d2z4ofh@box.shutemov.name

diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h
index 7a8a26751c23..48750474b904 100644
--- a/include/uapi/linux/memfd.h
+++ b/include/uapi/linux/memfd.h
@@ -8,6 +8,7 @@
 #define MFD_CLOEXEC		0x0001U
 #define MFD_ALLOW_SEALING	0x0002U
 #define MFD_HUGETLB		0x0004U
+#define MFD_INACCESSIBLE	0x0008U
 
 /*
  * Huge page size encoding when MFD_HUGETLB is specified, and a huge page
diff --git a/mm/memfd.c b/mm/memfd.c
index 2afd898798e4..72d7139ccced 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -18,6 +18,7 @@
 #include <linux/hugetlb.h>
 #include <linux/shmem_fs.h>
 #include <linux/memfd.h>
+#include <linux/memfile_notifier.h>
 #include <uapi/linux/memfd.h>
 
 /*
@@ -262,7 +263,8 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
 #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
 #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
 
-#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB)
+#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | \
+		       MFD_INACCESSIBLE)
 
 SYSCALL_DEFINE2(memfd_create,
 		const char __user *, uname,
@@ -284,6 +286,10 @@ SYSCALL_DEFINE2(memfd_create,
 			return -EINVAL;
 	}
 
+	/* Disallow sealing when MFD_INACCESSIBLE is set. */
+	if (flags & MFD_INACCESSIBLE && flags & MFD_ALLOW_SEALING)
+		return -EINVAL;
+
 	/* length includes terminating zero */
 	len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1);
 	if (len <= 0)
@@ -330,12 +336,19 @@ SYSCALL_DEFINE2(memfd_create,
 	if (flags & MFD_ALLOW_SEALING) {
 		file_seals = memfd_file_seals_ptr(file);
 		*file_seals &= ~F_SEAL_SEAL;
+	} else if (flags & MFD_INACCESSIBLE) {
+		error = memfile_node_set_flags(file,
+					       MEMFILE_F_USER_INACCESSIBLE);
+		if (error)
+			goto err_file;
 	}
 
 	fd_install(fd, file);
 	kfree(name);
 	return fd;
 
+err_file:
+	fput(file);
 err_fd:
 	put_unused_fd(fd);
 err_name:

Introduce a new memfd_create() flag indicating the content of the
created memfd is inaccessible from userspace through ordinary MMU
access (e.g., read/write/mmap). However, the file content can be
accessed via a different mechanism (e.g. KVM MMU) indirectly.

It provides semantics required for KVM guest private memory support
that a file descriptor with this flag set is going to be used as the
source of guest memory in confidential computing environments such
as Intel TDX/AMD SEV but may not be accessible from host userspace.

The flag can not coexist with MFD_ALLOW_SEALING, future sealing is
also impossible for a memfd created with this flag.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 include/uapi/linux/memfd.h |  1 +
 mm/memfd.c                 | 15 ++++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)
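
The flag rules described in the commit message can be modeled in plain userspace C to make the accepted and rejected combinations concrete. This is a hedged sketch rather than the kernel code: the flag values are copied from the patch's memfd.h hunk, but `memfd_check_flags()` and `SKETCH_ALL_FLAGS` are hypothetical names, and the sketch deliberately ignores the huge page size bits that the real memfd_create() accepts alongside MFD_HUGETLB.

```c
#include <errno.h>

/* Flag values as listed in the patch's include/uapi/linux/memfd.h hunk. */
#ifndef MFD_CLOEXEC
#define MFD_CLOEXEC		0x0001U
#endif
#ifndef MFD_ALLOW_SEALING
#define MFD_ALLOW_SEALING	0x0002U
#endif
#ifndef MFD_HUGETLB
#define MFD_HUGETLB		0x0004U
#endif
#ifndef MFD_INACCESSIBLE
#define MFD_INACCESSIBLE	0x0008U	/* proposed by this patch */
#endif

/* Hypothetical stand-in for the patch's extended MFD_ALL_FLAGS mask. */
#define SKETCH_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | \
			  MFD_INACCESSIBLE)

/*
 * Hypothetical helper that mirrors the two checks relevant here:
 * reject unknown flag bits, and reject MFD_INACCESSIBLE combined with
 * MFD_ALLOW_SEALING. Returns 0 if the combination is acceptable,
 * -EINVAL otherwise. Huge page size encoding is not modeled.
 */
static int memfd_check_flags(unsigned int flags)
{
	if (flags & ~SKETCH_ALL_FLAGS)
		return -EINVAL;

	/* Disallow sealing when MFD_INACCESSIBLE is set. */
	if ((flags & MFD_INACCESSIBLE) && (flags & MFD_ALLOW_SEALING))
		return -EINVAL;

	return 0;
}
```

Under these assumptions, `memfd_check_flags(MFD_CLOEXEC | MFD_INACCESSIBLE)` is accepted, while `memfd_check_flags(MFD_INACCESSIBLE | MFD_ALLOW_SEALING)` is rejected with `-EINVAL`, matching the "can not coexist" rule stated above.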