[RFC,3/8] kvm: pfncache: enlighten about gmem

Message ID 20240709132041.3625501-4-roypat@amazon.co.uk (mailing list archive)
State New, archived
Series Unmapping guest_memfd from Direct Map

Commit Message

Patrick Roy July 9, 2024, 1:20 p.m. UTC
KVM uses gfn_to_pfn_caches to cache translations from gfn all the way to
the pfn (for example, kvm-clock caches the page used for guest/host
communication this way). Unlike the gfn_to_hva_cache, where no
equivalent caching semantics were possible for gmem-backed gfns (see
also 858e8068a750 ("kvm: pfncache: enlighten about gmem")), here it is
possible to simply cache the pfn returned by `kvm_gmem_get_pfn`.

Additionally, gfn_to_pfn_caches now invalidate whenever a cached gfn's
attributes are flipped from shared to private (or vice-versa).

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 include/linux/kvm_types.h |  1 +
 virt/kvm/pfncache.c       | 41 +++++++++++++++++++++++++++++++++------
 2 files changed, 36 insertions(+), 6 deletions(-)

Comments

David Woodhouse July 9, 2024, 2:36 p.m. UTC | #1
On Tue, 2024-07-09 at 14:20 +0100, Patrick Roy wrote:
> KVM uses gfn_to_pfn_caches to cache translations from gfn all the way to
> the pfn (for example, kvm-clock caches the page used for guest/host
> communication this way). Unlike the gfn_to_hva_cache, where no
> equivalent caching semantics were possible for gmem-backed gfns (see
> also 858e8068a750 ("kvm: pfncache: enlighten about gmem")), here it is
> possible to simply cache the pfn returned by `kvm_gmem_get_pfn`.
> 
> Additionally, gfn_to_pfn_caches now invalidate whenever a cached gfn's
> attributes are flipped from shared to private (or vice-versa).
> 
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>

I can't see how this is safe from race conditions.

When the GPC is invalidated from gfn_to_pfn_cache_invalidate_start()
its *write* lock is taken and gpc->valid is set to false.

In parallel, any code using the GPC to access guest memory will take
the *read* lock, call kvm_gpc_check(), and then go ahead and use the
pointer to its heart's content until eventually dropping the read lock.

Since invalidation takes the write lock, it has to wait until the GPC
is no longer in active use and the pointer is no longer being
dereferenced.
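
For illustration, the usual read-side pattern looks roughly like this
(a sketch modelled on existing GPC users such as the Xen shinfo code;
"data" and "len" here are placeholders for whatever the caller writes):

	unsigned long flags;

	read_lock_irqsave(&gpc->lock, flags);
	while (!kvm_gpc_check(gpc, PAGE_SIZE)) {
		read_unlock_irqrestore(&gpc->lock, flags);

		/* Re-resolve gfn -> pfn -> khva without the read lock held. */
		if (kvm_gpc_refresh(gpc, PAGE_SIZE))
			return;

		read_lock_irqsave(&gpc->lock, flags);
	}

	/* gpc->khva is safe to dereference until the read lock is dropped. */
	memcpy(gpc->khva, data, len);

	read_unlock_irqrestore(&gpc->lock, flags);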

How does this work for the kvm_mem_is_private() check? You've added a
check in kvm_gpc_check(), but what if the pfn is made private
immediately *after* that check? Unless the code path which makes the
pfn private also takes the write lock, how is it safe?
Patrick Roy July 10, 2024, 9:49 a.m. UTC | #2
On 7/9/24 15:36, David Woodhouse wrote:
> On Tue, 2024-07-09 at 14:20 +0100, Patrick Roy wrote:
>> KVM uses gfn_to_pfn_caches to cache translations from gfn all the way to
>> the pfn (for example, kvm-clock caches the page used for guest/host
>> communication this way). Unlike the gfn_to_hva_cache, where no
>> equivalent caching semantics were possible for gmem-backed gfns (see
>> also 858e8068a750 ("kvm: pfncache: enlighten about gmem")), here it is
>> possible to simply cache the pfn returned by `kvm_gmem_get_pfn`.
>>
>> Additionally, gfn_to_pfn_caches now invalidate whenever a cached gfn's
>> attributes are flipped from shared to private (or vice-versa).
>>
>> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> 
> I can't see how this is safe from race conditions.
> 
> When the GPC is invalidated from gfn_to_pfn_cache_invalidate_start()
> its *write* lock is taken and gpc->valid is set to false.
> 
> In parallel, any code using the GPC to access guest memory will take
> the *read* lock, call kvm_gpc_check(), and then go ahead and use the
> pointer to its heart's content until eventually dropping the read lock.
> 
> Since invalidation takes the write lock, it has to wait until the GPC
> is no longer in active use and the pointer is no longer being
> dereferenced.
> 
> How does this work for the kvm_mem_is_private() check? You've added a
> check in kvm_gpc_check(), but what if the pfn is made private
> immediately *after* that check? Unless the code path which makes the
> pfn private also takes the write lock, how is it safe?

Ah, you're right - I did in fact overlook this. I do think that it works
out though: kvm_vm_set_mem_attributes, which is used for flipping
between shared/private, registers the range which had its attributes
changed for invalidation, and thus gfn_to_pfn_cache_invalidate_start
should get called for it (although I have to admit I do not immediately
see what the exact callstack for this looks like, so maybe I am
misunderstanding something about invalidation here?).
David Woodhouse July 10, 2024, 10:20 a.m. UTC | #3
On Wed, 2024-07-10 at 10:49 +0100, Patrick Roy wrote:
> On 7/9/24 15:36, David Woodhouse wrote:

I did? It isn't September yet, surely?

> > On Tue, 2024-07-09 at 14:20 +0100, Patrick Roy wrote:
> > > KVM uses gfn_to_pfn_caches to cache translations from gfn all the way to
> > > the pfn (for example, kvm-clock caches the page used for guest/host
> > > communication this way). Unlike the gfn_to_hva_cache, where no
> > > equivalent caching semantics were possible for gmem-backed gfns (see
> > > also 858e8068a750 ("kvm: pfncache: enlighten about gmem")), here it is
> > > possible to simply cache the pfn returned by `kvm_gmem_get_pfn`.
> > > 
> > > Additionally, gfn_to_pfn_caches now invalidate whenever a cached gfn's
> > > attributes are flipped from shared to private (or vice-versa).
> > > 
> > > Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> > 
> > I can't see how this is safe from race conditions.
> > 
> > When the GPC is invalidated from gfn_to_pfn_cache_invalidate_start()
> > its *write* lock is taken and gpc->valid is set to false.
> > 
> > In parallel, any code using the GPC to access guest memory will take
> > the *read* lock, call kvm_gpc_check(), and then go ahead and use the
> > pointer to its heart's content until eventually dropping the read lock.
> > 
> > Since invalidation takes the write lock, it has to wait until the GPC
> > is no longer in active use and the pointer is no longer being
> > dereferenced.
> > 
> > How does this work for the kvm_mem_is_private() check? You've added a
> > check in kvm_gpc_check(), but what if the pfn is made private
> > immediately *after* that check? Unless the code path which makes the
> > pfn private also takes the write lock, how is it safe?
> 
> Ah, you're right - I did in fact overlook this. I do think that it works
> out though: kvm_vm_set_mem_attributes, which is used for flipping
> between shared/private, registers the range which had its attributes
> changed for invalidation, and thus gfn_to_pfn_cache_invalidate_start
> should get called for it (although I have to admit I do not immediately
> see what the exact callstack for this looks like, so maybe I am
> misunderstanding something about invalidation here?).

In that case, wouldn't that mean the explicit checks on gpc->is_private
matching kvm_mem_is_private() would be redundant and you can remove
them because you can trust that gpc->valid would be cleared?
Patrick Roy July 10, 2024, 10:46 a.m. UTC | #4
On Wed, 2024-07-10 at 11:20 +0100, David Woodhouse wrote:
> On Wed, 2024-07-10 at 10:49 +0100, Patrick Roy wrote:
>> On 7/9/24 15:36, David Woodhouse wrote:
> 
> I did? It isn't September yet, surely?

Argh, thanks for letting me know, I think I've whacked some sense into
my mail client now :)

>>> On Tue, 2024-07-09 at 14:20 +0100, Patrick Roy wrote:
>>>> KVM uses gfn_to_pfn_caches to cache translations from gfn all the way to
>>>> the pfn (for example, kvm-clock caches the page used for guest/host
>>>> communication this way). Unlike the gfn_to_hva_cache, where no
>>>> equivalent caching semantics were possible for gmem-backed gfns (see
>>>> also 858e8068a750 ("kvm: pfncache: enlighten about gmem")), here it is
>>>> possible to simply cache the pfn returned by `kvm_gmem_get_pfn`.
>>>>
>>>> Additionally, gfn_to_pfn_caches now invalidate whenever a cached gfn's
>>>> attributes are flipped from shared to private (or vice-versa).
>>>>
>>>> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
>>>
>>> I can't see how this is safe from race conditions.
>>>
>>> When the GPC is invalidated from gfn_to_pfn_cache_invalidate_start()
>>> its *write* lock is taken and gpc->valid is set to false.
>>>
>>> In parallel, any code using the GPC to access guest memory will take
>>> the *read* lock, call kvm_gpc_check(), and then go ahead and use the
>>> pointer to its heart's content until eventually dropping the read lock.
>>>
>>> Since invalidation takes the write lock, it has to wait until the GPC
>>> is no longer in active use and the pointer is no longer being
>>> dereferenced.
>>>
>>> How does this work for the kvm_mem_is_private() check? You've added a
>>> check in kvm_gpc_check(), but what if the pfn is made private
>>> immediately *after* that check? Unless the code path which makes the
>>> pfn private also takes the write lock, how is it safe?
>>
>> Ah, you're right - I did in fact overlook this. I do think that it works
>> out though: kvm_vm_set_mem_attributes, which is used for flipping
>> between shared/private, registers the range which had its attributes
>> changed for invalidation, and thus gfn_to_pfn_cache_invalidate_start
>> should get called for it (although I have to admit I do not immediately
>> see what the exact callstack for this looks like, so maybe I am
>> misunderstanding something about invalidation here?).
> 
> In that case, wouldn't that mean the explicit checks on gpc->is_private
> matching kvm_mem_is_private() would be redundant and you can remove
> them because you can trust that gpc->valid would be cleared?
> 

Right, yes, it would indeed mean that. I'll double-check my assumption
about the whole invalidation thing and adjust the code for the next
iteration!
David Woodhouse July 10, 2024, 10:50 a.m. UTC | #5
On Wed, 2024-07-10 at 11:46 +0100, Patrick Roy wrote:
> On Wed, 2024-07-10 at 11:20 +0100, David Woodhouse wrote:
> > In that case, wouldn't that mean the explicit checks on gpc->is_private
> > matching kvm_mem_is_private() would be redundant and you can remove
> > them because you can trust that gpc->valid would be cleared?
> > 
> 
> Right, yes, it would indeed mean that. I'll double-check my assumption
> about the whole invalidation thing and adjust the code for the next
> iteration!

I was going to suggest that you take the check you'd added to
kvm_gpc_check() and move it down below the ->valid check, and turn it
into a BUG_ON() to check that assertion.

You *might* get false positives with that though, if the result of
kvm_mem_is_private() becomes true before the flush actually *happens*?
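
Roughly this, as a sketch of what I mean (only the moved check is
shown; the rest of kvm_gpc_check() is elided):

	bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
	{
		/* ... existing gpa/uhva/len validation ... */

		if (!gpc->valid)
			return false;

		/*
		 * If attribute flips always invalidate the cache first, a
		 * mismatch here can only mean a missed invalidation (modulo
		 * the false-positive caveat above).
		 */
		BUG_ON(gpc->is_private !=
		       kvm_mem_is_private(gpc->kvm, gpa_to_gfn(gpc->gpa)));

		/* ... remaining checks ... */
	}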

Patch

diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 827ecc0b7e10..8f85f01f6bb0 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -70,6 +70,7 @@  struct gfn_to_pfn_cache {
 	kvm_pfn_t pfn;
 	bool active;
 	bool valid;
+	bool is_private;
 };
 
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index f0039efb9e1e..6430e0a49558 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -90,6 +90,9 @@  bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
 	if (!kvm_gpc_is_valid_len(gpc->gpa, gpc->uhva, len))
 		return false;
 
+	if (gpc->is_private != kvm_mem_is_private(gpc->kvm, gpa_to_gfn(gpc->gpa)))
+		return false;
+
 	if (!gpc->valid)
 		return false;
 
@@ -159,6 +162,7 @@  static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 	kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
 	void *new_khva = NULL;
 	unsigned long mmu_seq;
+	gfn_t gfn;
 
 	lockdep_assert_held(&gpc->refresh_lock);
 
@@ -173,6 +177,7 @@  static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 
 	do {
 		mmu_seq = gpc->kvm->mmu_invalidate_seq;
+		gfn = gpa_to_gfn(gpc->gpa);
 		smp_rmb();
 
 		write_unlock_irq(&gpc->lock);
@@ -197,10 +202,19 @@  static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 			cond_resched();
 		}
 
-		/* We always request a writeable mapping */
-		new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL, true, NULL);
-		if (is_error_noslot_pfn(new_pfn))
-			goto out_error;
+		if (gpc->is_private) {
+			int r = kvm_gmem_get_pfn(gpc->kvm, gfn_to_memslot(gpc->kvm, gfn), gfn,
+						 &new_pfn, NULL);
+
+			if (r)
+				goto out_error;
+		} else {
+			/* We always request a writeable mapping */
+			new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL,
+					     true, NULL);
+			if (is_error_noslot_pfn(new_pfn))
+				goto out_error;
+		}
 
 		/*
 		 * Obtain a new kernel mapping if KVM itself will access the
@@ -252,6 +266,7 @@  static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 	unsigned long old_uhva;
 	kvm_pfn_t old_pfn;
 	bool hva_change = false;
+	bool old_private;
 	void *old_khva;
 	int ret;
 
@@ -271,8 +286,21 @@  static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 	old_pfn = gpc->pfn;
 	old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
 	old_uhva = PAGE_ALIGN_DOWN(gpc->uhva);
+	old_private = gpc->is_private;
+
+	gpc->is_private = kvm_mem_is_private(gpc->kvm, gpa_to_gfn(gpa));
+
+	if (gpc->is_private && !kvm_can_access_gmem(gpc->kvm)) {
+		ret = -EFAULT;
+		goto out_unlock;
+	}
 
 	if (kvm_is_error_gpa(gpa)) {
+		if (WARN_ON_ONCE(gpc->is_private)) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
 		page_offset = offset_in_page(uhva);
 
 		gpc->gpa = INVALID_GPA;
@@ -316,9 +344,10 @@  static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 
 	/*
 	 * If the userspace HVA changed or the PFN was already invalid,
-	 * drop the lock and do the HVA to PFN lookup again.
+	 * drop the lock and do the HVA to PFN lookup again. Also
+	 * recompute the pfn if the gfn changed from shared to private (or vice-versa).
 	 */
-	if (!gpc->valid || hva_change) {
+	if (!gpc->valid || hva_change || gpc->is_private != old_private) {
 		ret = hva_to_pfn_retry(gpc);
 	} else {
 		/*