diff mbox series

[v7,05/14] mm/memfd: Introduce MFD_INACCESSIBLE flag

Message ID 20220706082016.2603916-6-chao.p.peng@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series KVM: mm: fd-based approach for supporting KVM guest private memory | expand

Commit Message

Chao Peng July 6, 2022, 8:20 a.m. UTC
Introduce a new memfd_create() flag indicating the content of the
created memfd is inaccessible from userspace through ordinary MMU
access (e.g., read/write/mmap). However, the file content can be
accessed via a different mechanism (e.g. KVM MMU) indirectly.

It provides semantics required for KVM guest private memory support
that a file descriptor with this flag set is going to be used as the
source of guest memory in confidential computing environments such
as Intel TDX/AMD SEV but may not be accessible from host userspace.

The flag can not coexist with MFD_ALLOW_SEALING, future sealing is
also impossible for a memfd created with this flag.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 include/uapi/linux/memfd.h |  1 +
 mm/memfd.c                 | 15 ++++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

Comments

David Hildenbrand Aug. 5, 2022, 1:28 p.m. UTC | #1
On 06.07.22 10:20, Chao Peng wrote:
> Introduce a new memfd_create() flag indicating the content of the
> created memfd is inaccessible from userspace through ordinary MMU
> access (e.g., read/write/mmap). However, the file content can be
> accessed via a different mechanism (e.g. KVM MMU) indirectly.
> 
> It provides semantics required for KVM guest private memory support
> that a file descriptor with this flag set is going to be used as the
> source of guest memory in confidential computing environments such
> as Intel TDX/AMD SEV but may not be accessible from host userspace.
> 
> The flag can not coexist with MFD_ALLOW_SEALING, future sealing is
> also impossible for a memfd created with this flag.

It's kind of weird to have it that way. Why should the user have to
care? It's the notifier requirement to have that, no?

Why can't we handle that when register a notifier? If anything is
already mapped, fail registering the notifier if the notifier has these
demands. If registering succeeds, block it internally.

Or what am I missing? We might not need the memfile set flag semantics
eventually and would not have to expose such a flag to user space.
Chao Peng Aug. 10, 2022, 9:37 a.m. UTC | #2
On Fri, Aug 05, 2022 at 03:28:50PM +0200, David Hildenbrand wrote:
> On 06.07.22 10:20, Chao Peng wrote:
> > Introduce a new memfd_create() flag indicating the content of the
> > created memfd is inaccessible from userspace through ordinary MMU
> > access (e.g., read/write/mmap). However, the file content can be
> > accessed via a different mechanism (e.g. KVM MMU) indirectly.
> > 
> > It provides semantics required for KVM guest private memory support
> > that a file descriptor with this flag set is going to be used as the
> > source of guest memory in confidential computing environments such
> > as Intel TDX/AMD SEV but may not be accessible from host userspace.
> > 
> > The flag can not coexist with MFD_ALLOW_SEALING, future sealing is
> > also impossible for a memfd created with this flag.
> 
> It's kind of weird to have it that way. Why should the user have to
> care? It's the notifier requirement to have that, no?
> 
> Why can't we handle that when register a notifier? If anything is
> already mapped, fail registering the notifier if the notifier has these
> demands. If registering succeeds, block it internally.
> 
> Or what am I missing? We might not need the memfile set flag semantics
> eventually and would not have to expose such a flag to user space.

This makes sense if doable. The major concern was: is there a reliable
way to detect this (already mapped) at the time of memslot registering.

Chao
> 
> -- 
> Thanks,
> 
> David / dhildenb
>
David Hildenbrand Aug. 10, 2022, 9:55 a.m. UTC | #3
On 10.08.22 11:37, Chao Peng wrote:
> On Fri, Aug 05, 2022 at 03:28:50PM +0200, David Hildenbrand wrote:
>> On 06.07.22 10:20, Chao Peng wrote:
>>> Introduce a new memfd_create() flag indicating the content of the
>>> created memfd is inaccessible from userspace through ordinary MMU
>>> access (e.g., read/write/mmap). However, the file content can be
>>> accessed via a different mechanism (e.g. KVM MMU) indirectly.
>>>
>>> It provides semantics required for KVM guest private memory support
>>> that a file descriptor with this flag set is going to be used as the
>>> source of guest memory in confidential computing environments such
>>> as Intel TDX/AMD SEV but may not be accessible from host userspace.
>>>
>>> The flag can not coexist with MFD_ALLOW_SEALING, future sealing is
>>> also impossible for a memfd created with this flag.
>>
>> It's kind of weird to have it that way. Why should the user have to
>> care? It's the notifier requirement to have that, no?
>>
>> Why can't we handle that when register a notifier? If anything is
>> already mapped, fail registering the notifier if the notifier has these
>> demands. If registering succeeds, block it internally.
>>
>> Or what am I missing? We might not need the memfile set flag semantics
>> eventually and would not have to expose such a flag to user space.
> 
> This makes sense if doable. The major concern was: is there a reliable
> way to detect this (already mapped) at the time of memslot registering.

If too complicated, we could simplify to "was this ever mapped" and fail
for now. Hooking into shmem_mmap() might be sufficient for that to get
notified about the first mmap.

As an alternative, mapping_mapped() or similar *might* do what we want.
Chao Peng Aug. 11, 2022, 1:17 p.m. UTC | #4
On Wed, Aug 10, 2022 at 11:55:19AM +0200, David Hildenbrand wrote:
> On 10.08.22 11:37, Chao Peng wrote:
> > On Fri, Aug 05, 2022 at 03:28:50PM +0200, David Hildenbrand wrote:
> >> On 06.07.22 10:20, Chao Peng wrote:
> >>> Introduce a new memfd_create() flag indicating the content of the
> >>> created memfd is inaccessible from userspace through ordinary MMU
> >>> access (e.g., read/write/mmap). However, the file content can be
> >>> accessed via a different mechanism (e.g. KVM MMU) indirectly.
> >>>
> >>> It provides semantics required for KVM guest private memory support
> >>> that a file descriptor with this flag set is going to be used as the
> >>> source of guest memory in confidential computing environments such
> >>> as Intel TDX/AMD SEV but may not be accessible from host userspace.
> >>>
> >>> The flag can not coexist with MFD_ALLOW_SEALING, future sealing is
> >>> also impossible for a memfd created with this flag.
> >>
> >> It's kind of weird to have it that way. Why should the user have to
> >> care? It's the notifier requirement to have that, no?
> >>
> >> Why can't we handle that when register a notifier? If anything is
> >> already mapped, fail registering the notifier if the notifier has these
> >> demands. If registering succeeds, block it internally.
> >>
> >> Or what am I missing? We might not need the memfile set flag semantics
> >> eventually and would not have to expose such a flag to user space.
> > 
> > This makes sense if doable. The major concern was: is there a reliable
> > way to detect this (already mapped) at the time of memslot registering.
> 
> If too complicated, we could simplify to "was this ever mapped" and fail
> for now. Hooking into shmem_mmap() might be sufficient for that to get
> notified about the first mmap.
> 
> As an alternative, mapping_mapped() or similar *might* do what we want.

mapping_mapped() sounds the right one, I remember SEV people want first
map then unmap. "was this ever mapped" may not work for them.

Thanks,
Chao
> 
> 
> 
> -- 
> Thanks,
> 
> David / dhildenb
Kirill A . Shutemov Sept. 7, 2022, 4:18 p.m. UTC | #5
On Fri, Aug 05, 2022 at 03:28:50PM +0200, David Hildenbrand wrote:
> On 06.07.22 10:20, Chao Peng wrote:
> > Introduce a new memfd_create() flag indicating the content of the
> > created memfd is inaccessible from userspace through ordinary MMU
> > access (e.g., read/write/mmap). However, the file content can be
> > accessed via a different mechanism (e.g. KVM MMU) indirectly.
> > 
> > It provides semantics required for KVM guest private memory support
> > that a file descriptor with this flag set is going to be used as the
> > source of guest memory in confidential computing environments such
> > as Intel TDX/AMD SEV but may not be accessible from host userspace.
> > 
> > The flag can not coexist with MFD_ALLOW_SEALING, future sealing is
> > also impossible for a memfd created with this flag.
> 
> It's kind of weird to have it that way. Why should the user have to
> care? It's the notifier requirement to have that, no?
> 
> Why can't we handle that when register a notifier? If anything is
> already mapped, fail registering the notifier if the notifier has these
> demands. If registering succeeds, block it internally.
> 
> Or what am I missing? We might not need the memfile set flag semantics
> eventually and would not have to expose such a flag to user space.

Well, with the new shim-based[1] implementation the approach without uAPI
does not work.

We now have two struct file, one is a normal accessible memfd and the
other one is wrapper around that hides the memfd from userspace and
filters allowed operations. If we first create an accessible memfd that
userspace see it would be hard to hide it as by the time userspace may
have multiple fds in different processes that point to the same struct
file.

[1] https://lore.kernel.org/all/20220831142439.65q2gi4g2d2z4ofh@box.shutemov.name
diff mbox series

Patch

diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h
index 7a8a26751c23..48750474b904 100644
--- a/include/uapi/linux/memfd.h
+++ b/include/uapi/linux/memfd.h
@@ -8,6 +8,7 @@ 
 #define MFD_CLOEXEC		0x0001U
 #define MFD_ALLOW_SEALING	0x0002U
 #define MFD_HUGETLB		0x0004U
+#define MFD_INACCESSIBLE	0x0008U
 
 /*
  * Huge page size encoding when MFD_HUGETLB is specified, and a huge page
diff --git a/mm/memfd.c b/mm/memfd.c
index 2afd898798e4..72d7139ccced 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -18,6 +18,7 @@ 
 #include <linux/hugetlb.h>
 #include <linux/shmem_fs.h>
 #include <linux/memfd.h>
+#include <linux/memfile_notifier.h>
 #include <uapi/linux/memfd.h>
 
 /*
@@ -262,7 +263,8 @@  long memfd_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
 #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
 #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
 
-#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB)
+#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | \
+		       MFD_INACCESSIBLE)
 
 SYSCALL_DEFINE2(memfd_create,
 		const char __user *, uname,
@@ -284,6 +286,10 @@  SYSCALL_DEFINE2(memfd_create,
 			return -EINVAL;
 	}
 
+	/* Disallow sealing when MFD_INACCESSIBLE is set. */
+	if (flags & MFD_INACCESSIBLE && flags & MFD_ALLOW_SEALING)
+		return -EINVAL;
+
 	/* length includes terminating zero */
 	len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1);
 	if (len <= 0)
@@ -330,12 +336,19 @@  SYSCALL_DEFINE2(memfd_create,
 	if (flags & MFD_ALLOW_SEALING) {
 		file_seals = memfd_file_seals_ptr(file);
 		*file_seals &= ~F_SEAL_SEAL;
+	} else if (flags & MFD_INACCESSIBLE) {
+		error = memfile_node_set_flags(file,
+					       MEMFILE_F_USER_INACCESSIBLE);
+		if (error)
+			goto err_file;
 	}
 
 	fd_install(fd, file);
 	kfree(name);
 	return fd;
 
+err_file:
+	fput(file);
 err_fd:
 	put_unused_fd(fd);
 err_name: