Message ID | 20240404185034.3184582-5-pbonzini@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: guest_memfd: New hooks and functionality for SEV-SNP and TDX |
On 4/4/24 20:50, Paolo Bonzini wrote:
> KVM would like to add an ioctl to encrypt and install a page into private
> memory (i.e. into a guest_memfd), in preparation for launching an
> encrypted guest.
>
> This API should be used only once per page (unless there are failures),
> so we want to rule out the possibility of operating on a page that is
> already in the guest_memfd's filemap. Overwriting the page is almost
> certainly a sign of a bug, so we might as well forbid it.
>
> Therefore, introduce a new flag for __filemap_get_folio (to be passed
> together with FGP_CREAT) that allows *adding* a new page to the filemap
> but not returning an existing one.
>
> An alternative possibility would be to force KVM users to initialize
> the whole filemap in one go, but that is complicated by the fact that
> the filemap includes pages of different kinds, including some that are
> per-vCPU rather than per-VM. Basically the result would be closer to
> a system call that multiplexes multiple ioctls, than to something
> cleaner like readv/writev.
>
> Races between callers that pass FGP_CREAT_ONLY are uninteresting to
> the filemap code: one of the racers wins and one fails with EEXIST,
> similar to calling open(2) with O_CREAT|O_EXCL. It doesn't matter to
> filemap.c if the missing synchronization is in the kernel or in userspace,
> and in fact it could even be intentional. (In the case of KVM it turns
> out that a mutex is taken around these calls for unrelated reasons,
> so there can be no races.)
>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Yosry Ahmed <yosryahmed@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Matthew, are your objections still valid or could I have your ack?

Thanks,

Paolo

> ---
>  include/linux/pagemap.h | 2 ++
>  mm/filemap.c            | 4 ++++
>  2 files changed, 6 insertions(+)
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index f879c1d54da7..a8c0685e8c08 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -587,6 +587,7 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
>   * * %FGP_CREAT - If no folio is present then a new folio is allocated,
>   *   added to the page cache and the VM's LRU list. The folio is
>   *   returned locked.
> + * * %FGP_CREAT_ONLY - Fail if a folio is present
>   * * %FGP_FOR_MMAP - The caller wants to do its own locking dance if the
>   *   folio is already in cache. If the folio was allocated, unlock it
>   *   before returning so the caller can do the same dance.
> @@ -607,6 +608,7 @@ typedef unsigned int __bitwise fgf_t;
>  #define FGP_NOWAIT ((__force fgf_t)0x00000020)
>  #define FGP_FOR_MMAP ((__force fgf_t)0x00000040)
>  #define FGP_STABLE ((__force fgf_t)0x00000080)
> +#define FGP_CREAT_ONLY ((__force fgf_t)0x00000100)
>  #define FGF_GET_ORDER(fgf) (((__force unsigned)fgf) >> 26) /* top 6 bits */
>
>  #define FGP_WRITEBEGIN (FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE)
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 7437b2bd75c1..e7440e189ebd 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1863,6 +1863,10 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
>  		folio = NULL;
>  	if (!folio)
>  		goto no_page;
> +	if (fgp_flags & FGP_CREAT_ONLY) {
> +		folio_put(folio);
> +		return ERR_PTR(-EEXIST);
> +	}
>
>  	if (fgp_flags & FGP_LOCK) {
>  		if (fgp_flags & FGP_NOWAIT) {
On 4/25/24 7:52 AM, Paolo Bonzini wrote:
> On 4/4/24 20:50, Paolo Bonzini wrote:
>> KVM would like to add an ioctl to encrypt and install a page into private
>> memory (i.e. into a guest_memfd), in preparation for launching an
>> encrypted guest.
>>
>> This API should be used only once per page (unless there are failures),
>> so we want to rule out the possibility of operating on a page that is
>> already in the guest_memfd's filemap. Overwriting the page is almost
>> certainly a sign of a bug, so we might as well forbid it.
>>
>> Therefore, introduce a new flag for __filemap_get_folio (to be passed
>> together with FGP_CREAT) that allows *adding* a new page to the filemap
>> but not returning an existing one.
>>
>> An alternative possibility would be to force KVM users to initialize
>> the whole filemap in one go, but that is complicated by the fact that
>> the filemap includes pages of different kinds, including some that are
>> per-vCPU rather than per-VM. Basically the result would be closer to
>> a system call that multiplexes multiple ioctls, than to something
>> cleaner like readv/writev.
>>
>> Races between callers that pass FGP_CREAT_ONLY are uninteresting to
>> the filemap code: one of the racers wins and one fails with EEXIST,
>> similar to calling open(2) with O_CREAT|O_EXCL. It doesn't matter to
>> filemap.c if the missing synchronization is in the kernel or in userspace,
>> and in fact it could even be intentional. (In the case of KVM it turns
>> out that a mutex is taken around these calls for unrelated reasons,
>> so there can be no races.)
>>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Yosry Ahmed <yosryahmed@google.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>
> Matthew, are your objections still valid or could I have your ack?

So per the sub-thread on PATCH 09/11, IIUC this is now moot, right?

Vlastimil

> Thanks,
>
> Paolo
>
>> ---
>>  include/linux/pagemap.h | 2 ++
>>  mm/filemap.c            | 4 ++++
>>  2 files changed, 6 insertions(+)
>>
>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>> index f879c1d54da7..a8c0685e8c08 100644
>> --- a/include/linux/pagemap.h
>> +++ b/include/linux/pagemap.h
>> @@ -587,6 +587,7 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
>>   * * %FGP_CREAT - If no folio is present then a new folio is allocated,
>>   *   added to the page cache and the VM's LRU list. The folio is
>>   *   returned locked.
>> + * * %FGP_CREAT_ONLY - Fail if a folio is present
>>   * * %FGP_FOR_MMAP - The caller wants to do its own locking dance if the
>>   *   folio is already in cache. If the folio was allocated, unlock it
>>   *   before returning so the caller can do the same dance.
>> @@ -607,6 +608,7 @@ typedef unsigned int __bitwise fgf_t;
>>  #define FGP_NOWAIT ((__force fgf_t)0x00000020)
>>  #define FGP_FOR_MMAP ((__force fgf_t)0x00000040)
>>  #define FGP_STABLE ((__force fgf_t)0x00000080)
>> +#define FGP_CREAT_ONLY ((__force fgf_t)0x00000100)
>>  #define FGF_GET_ORDER(fgf) (((__force unsigned)fgf) >> 26) /* top 6 bits */
>>
>>  #define FGP_WRITEBEGIN (FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE)
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 7437b2bd75c1..e7440e189ebd 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -1863,6 +1863,10 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
>>  		folio = NULL;
>>  	if (!folio)
>>  		goto no_page;
>> +	if (fgp_flags & FGP_CREAT_ONLY) {
>> +		folio_put(folio);
>> +		return ERR_PTR(-EEXIST);
>> +	}
>>
>>  	if (fgp_flags & FGP_LOCK) {
>>  		if (fgp_flags & FGP_NOWAIT) {
>
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f879c1d54da7..a8c0685e8c08 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -587,6 +587,7 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
  * * %FGP_CREAT - If no folio is present then a new folio is allocated,
  *   added to the page cache and the VM's LRU list. The folio is
  *   returned locked.
+ * * %FGP_CREAT_ONLY - Fail if a folio is present
  * * %FGP_FOR_MMAP - The caller wants to do its own locking dance if the
  *   folio is already in cache. If the folio was allocated, unlock it
  *   before returning so the caller can do the same dance.
@@ -607,6 +608,7 @@ typedef unsigned int __bitwise fgf_t;
 #define FGP_NOWAIT ((__force fgf_t)0x00000020)
 #define FGP_FOR_MMAP ((__force fgf_t)0x00000040)
 #define FGP_STABLE ((__force fgf_t)0x00000080)
+#define FGP_CREAT_ONLY ((__force fgf_t)0x00000100)
 #define FGF_GET_ORDER(fgf) (((__force unsigned)fgf) >> 26) /* top 6 bits */
 
 #define FGP_WRITEBEGIN (FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE)
diff --git a/mm/filemap.c b/mm/filemap.c
index 7437b2bd75c1..e7440e189ebd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1863,6 +1863,10 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		folio = NULL;
 	if (!folio)
 		goto no_page;
+	if (fgp_flags & FGP_CREAT_ONLY) {
+		folio_put(folio);
+		return ERR_PTR(-EEXIST);
+	}
 
 	if (fgp_flags & FGP_LOCK) {
 		if (fgp_flags & FGP_NOWAIT) {
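For readers who want to poke at the semantics outside the kernel, the check added to __filemap_get_folio can be modeled with a small userspace toy. This is only a sketch under stated assumptions: `struct toy_cache`, `toy_get()` and the `TOY_*` flags are hypothetical names invented here, the "cache" is a fixed array rather than an xarray, and there is no locking or reference counting. It only mirrors the ordering of the lookup, the create-only check, and the allocation path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for FGP_CREAT and FGP_CREAT_ONLY. */
#define TOY_CREAT      0x1u	/* allocate the entry if the slot is empty */
#define TOY_CREAT_ONLY 0x2u	/* fail if the slot is already populated */
#define TOY_NR_SLOTS   16

/* Toy "filemap": one pointer per index, NULL meaning "not present". */
struct toy_cache {
	void *slot[TOY_NR_SLOTS];
};

/*
 * Look up (and optionally create) the entry at @index.
 * Returns 0 and stores the entry in *out, or a negative errno value.
 */
static int toy_get(struct toy_cache *c, size_t index, unsigned int flags,
		   void **out)
{
	assert(index < TOY_NR_SLOTS);

	if (c->slot[index]) {
		/* Entry exists: create-only callers must not get it back. */
		if (flags & TOY_CREAT_ONLY)
			return -EEXIST;
		*out = c->slot[index];
		return 0;
	}

	/* Entry missing: only allocate when creation was requested. */
	if (!(flags & TOY_CREAT))
		return -ENOENT;

	c->slot[index] = malloc(1);
	if (!c->slot[index])
		return -ENOMEM;
	*out = c->slot[index];
	return 0;
}
```

In this model, the first `toy_get(&c, i, TOY_CREAT | TOY_CREAT_ONLY, &p)` on an empty slot succeeds and a second identical call returns -EEXIST, while a plain `TOY_CREAT` call keeps returning the existing entry.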
KVM would like to add an ioctl to encrypt and install a page into private
memory (i.e. into a guest_memfd), in preparation for launching an
encrypted guest.

This API should be used only once per page (unless there are failures),
so we want to rule out the possibility of operating on a page that is
already in the guest_memfd's filemap. Overwriting the page is almost
certainly a sign of a bug, so we might as well forbid it.

Therefore, introduce a new flag for __filemap_get_folio (to be passed
together with FGP_CREAT) that allows *adding* a new page to the filemap
but not returning an existing one.

An alternative possibility would be to force KVM users to initialize
the whole filemap in one go, but that is complicated by the fact that
the filemap includes pages of different kinds, including some that are
per-vCPU rather than per-VM. Basically the result would be closer to
a system call that multiplexes multiple ioctls, than to something
cleaner like readv/writev.

Races between callers that pass FGP_CREAT_ONLY are uninteresting to
the filemap code: one of the racers wins and one fails with EEXIST,
similar to calling open(2) with O_CREAT|O_EXCL. It doesn't matter to
filemap.c if the missing synchronization is in the kernel or in userspace,
and in fact it could even be intentional. (In the case of KVM it turns
out that a mutex is taken around these calls for unrelated reasons,
so there can be no races.)

Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/pagemap.h | 2 ++
 mm/filemap.c            | 4 ++++
 2 files changed, 6 insertions(+)
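The open(2) comparison in the commit message can be tried directly from userspace. The sketch below is not part of the patch; `create_exclusive()` is a hypothetical helper name, and it only illustrates the "first creator wins, everyone else sees EEXIST" behavior that FGP_CREAT_ONLY is meant to resemble.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Try to create @path exclusively.
 * Returns 0 if this caller created the file, or a negative errno value
 * (-EEXIST when the file was already there) if it did not.
 */
static int create_exclusive(const char *path)
{
	int fd;

	assert(path != NULL);

	/* O_EXCL together with O_CREAT makes open(2) fail on an existing file. */
	fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
	if (fd < 0)
		return -errno;

	close(fd);
	return 0;
}
```

Calling `create_exclusive()` twice on the same fresh path returns 0 the first time and -EEXIST the second, no matter which caller issued which call. That is the sense in which the race is "uninteresting" to the lower layer: the loser gets a well-defined error and decides for itself what to do next.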