
[04/11] filemap: add FGP_CREAT_ONLY

Message ID 20240404185034.3184582-5-pbonzini@redhat.com
State New
Series KVM: guest_memfd: New hooks and functionality for SEV-SNP and TDX

Commit Message

Paolo Bonzini April 4, 2024, 6:50 p.m. UTC
KVM would like to add an ioctl to encrypt and install a page into private
memory (i.e. into a guest_memfd), in preparation for launching an
encrypted guest.

This API should be used only once per page (unless there are failures),
so we want to rule out the possibility of operating on a page that is
already in the guest_memfd's filemap.  Overwriting the page is almost
certainly a sign of a bug, so we might as well forbid it.

Therefore, introduce a new flag for __filemap_get_folio (to be passed
together with FGP_CREAT) that allows *adding* a new page to the filemap
but not returning an existing one.
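To make the intended semantics concrete, here is a small user-space model of the lookup-or-create decision. It is a sketch under stated assumptions, not kernel code: `struct model_mapping`, `model_get_folio()` and the `MODEL_*` flags are hypothetical stand-ins. Only the value 0x100 for the create-only flag comes from this patch; the -ENOENT result for a miss without a create flag mirrors how __filemap_get_folio() reports misses in current kernels.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model flags; only the 0x100 value is taken from the patch. */
#define MODEL_FGP_CREAT      0x004u
#define MODEL_FGP_CREAT_ONLY 0x100u

#define MODEL_NR_SLOTS 8

/* Hypothetical stand-in for a filemap: one "present" bit per index. */
struct model_mapping {
	bool present[MODEL_NR_SLOTS];
};

/*
 * Returns 0 if a folio is present at @index after the call (found or
 * newly created), or a negative errno.  An existing entry is checked
 * first, so with MODEL_FGP_CREAT_ONLY it is never handed back.
 */
static int model_get_folio(struct model_mapping *m, size_t index,
			   unsigned int flags)
{
	if (index >= MODEL_NR_SLOTS)
		return -EINVAL;

	if (m->present[index]) {
		if (flags & MODEL_FGP_CREAT_ONLY)
			return -EEXIST;	/* refuse to return an existing folio */
		return 0;		/* ordinary lookup hit */
	}

	if (!(flags & MODEL_FGP_CREAT))
		return -ENOENT;		/* miss, and the caller did not ask to create */

	m->present[index] = true;	/* "allocate and add to the cache" */
	return 0;
}
```

With `MODEL_FGP_CREAT | MODEL_FGP_CREAT_ONLY`, the first call for an index succeeds and a repeated call for the same index returns -EEXIST, while a plain lookup (flags 0) of that index still succeeds.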

An alternative possibility would be to force KVM users to initialize
the whole filemap in one go, but that is complicated by the fact that
the filemap includes pages of different kinds, including some that are
per-vCPU rather than per-VM.  Basically, the result would look more like
a system call that multiplexes several ioctls than like something
cleaner such as readv/writev.

Races between callers that pass FGP_CREAT_ONLY are uninteresting to
the filemap code: one racer wins and the others fail with EEXIST,
similar to calling open(2) with O_CREAT|O_EXCL.  It doesn't matter to
filemap.c whether the missing synchronization is in the kernel or in userspace,
and in fact it could even be intentional.  (In the case of KVM it turns
out that a mutex is taken around these calls for unrelated reasons,
so there can be no races.)
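The O_CREAT|O_EXCL analogy can be observed directly from user space. The sketch below uses only standard POSIX calls; the helper name `create_exclusive()` is illustrative. Whichever caller's open(2) runs first creates the file, and every later caller gets EEXIST, which is the same "one winner, the rest fail" outcome described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to create @path exclusively.  Returns 0 if this call created the
 * file, or the errno value on failure (EEXIST if it already existed).
 */
static int create_exclusive(const char *path)
{
	int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);

	if (fd < 0)
		return errno;	/* read errno before any other call can clobber it */
	close(fd);
	return 0;
}
```

Calling it twice on the same fresh path returns 0 and then EEXIST; the kernel guarantees that the existence check and the creation happen atomically, so two concurrent callers can never both see success.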

Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/pagemap.h | 2 ++
 mm/filemap.c            | 4 ++++
 2 files changed, 6 insertions(+)

Comments

Paolo Bonzini April 25, 2024, 5:52 a.m. UTC | #1
On 4/4/24 20:50, Paolo Bonzini wrote:
> [...]

Matthew, are your objections still valid or could I have your ack?

Thanks,

Paolo

Vlastimil Babka April 29, 2024, 1:26 p.m. UTC | #2
On 4/25/24 7:52 AM, Paolo Bonzini wrote:
> On 4/4/24 20:50, Paolo Bonzini wrote:
>> [...]
> 
> Matthew, are your objections still valid or could I have your ack?

So per the sub-thread on PATCH 09/11, IIUC this is now moot, right?

Vlastimil


Patch

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f879c1d54da7..a8c0685e8c08 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -587,6 +587,7 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
  * * %FGP_CREAT - If no folio is present then a new folio is allocated,
  *   added to the page cache and the VM's LRU list.  The folio is
  *   returned locked.
+ * * %FGP_CREAT_ONLY - Fail if a folio is present
  * * %FGP_FOR_MMAP - The caller wants to do its own locking dance if the
  *   folio is already in cache.  If the folio was allocated, unlock it
  *   before returning so the caller can do the same dance.
@@ -607,6 +608,7 @@ typedef unsigned int __bitwise fgf_t;
 #define FGP_NOWAIT		((__force fgf_t)0x00000020)
 #define FGP_FOR_MMAP		((__force fgf_t)0x00000040)
 #define FGP_STABLE		((__force fgf_t)0x00000080)
+#define FGP_CREAT_ONLY		((__force fgf_t)0x00000100)
 #define FGF_GET_ORDER(fgf)	(((__force unsigned)fgf) >> 26)	/* top 6 bits */
 
 #define FGP_WRITEBEGIN		(FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE)
diff --git a/mm/filemap.c b/mm/filemap.c
index 7437b2bd75c1..e7440e189ebd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1863,6 +1863,10 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		folio = NULL;
 	if (!folio)
 		goto no_page;
+	if (fgp_flags & FGP_CREAT_ONLY) {
+		folio_put(folio);
+		return ERR_PTR(-EEXIST);
+	}
 
 	if (fgp_flags & FGP_LOCK) {
 		if (fgp_flags & FGP_NOWAIT) {
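The new branch both calls folio_put() and returns an error pointer because the lookup that found the folio has already taken a reference on it. The sketch below replicates the include/linux/err.h encoding in user space (an errno is stored in one of the top MAX_ERRNO values of the address space) and uses a hypothetical `struct model_folio` with a bare counter in place of the real folio refcount. It only illustrates why the reference must be dropped on the -EEXIST path; it is not the kernel implementation.

```c
#include <errno.h>
#include <stdbool.h>

/* User-space replicas of the include/linux/err.h helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the last MAX_ERRNO addresses. */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct folio: just a reference count. */
struct model_folio {
	int refcount;
};

static void model_folio_put(struct model_folio *folio)
{
	folio->refcount--;
}

#define MODEL_FGP_CREAT_ONLY 0x100u	/* value taken from the patch */

/*
 * Models the hunk's control flow after a lookup hit.  The lookup has
 * already taken a reference, so the create-only path must drop it
 * before returning the error pointer; otherwise the reference leaks.
 */
static struct model_folio *model_after_lookup_hit(struct model_folio *folio,
						  unsigned int fgp_flags)
{
	folio->refcount++;		/* reference taken by the lookup */
	if (fgp_flags & MODEL_FGP_CREAT_ONLY) {
		model_folio_put(folio);
		return ERR_PTR(-EEXIST);
	}
	return folio;			/* caller now owns one reference */
}
```

A caller of the real function would similarly test the result with IS_ERR() and could treat PTR_ERR() == -EEXIST as "already populated" rather than as a generic failure.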