Message ID | 20240709132041.3625501-4-roypat@amazon.co.uk (mailing list archive)
---|---
State | New
Series | Unmapping guest_memfd from Direct Map
On Tue, 2024-07-09 at 14:20 +0100, Patrick Roy wrote:
> KVM uses gfn_to_pfn_caches to cache translations from gfn all the way to
> the pfn (for example, kvm-clock caches the page storing the page used
> for guest/host communication this way). Unlike the gfn_to_hva_cache,
> where no equivalent caching semantics were possible to gmem-backed gfns
> (see also 858e8068a750 ("kvm: pfncache: enlighten about gmem")), here it
> is possible to simply cache the pfn returned by `kvm_gmem_get_pfn`.
>
> Additionally, gfn_to_pfn_caches now invalidate whenever a cached gfn's
> attributes are flipped from shared to private (or vice-versa).
>
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>

I can't see how this is safe from race conditions.

When the GPC is invalidated from gfn_to_pfn_cache_invalidate_start()
its *write* lock is taken and gpc->valid is set to false.

In parallel, any code using the GPC to access guest memory will take
the *read* lock, call kvm_gpc_check(), and then go ahead and use the
pointer to its heart's content until eventually dropping the read lock.

Since invalidation takes the write lock, it has to wait until the GPC
is no longer in active use, and the pointer cannot be being
dereferenced.

How does this work for the kvm_mem_is_private() check? You've added a
check in kvm_gpc_check(), but what if the pfn is made private
immediately *after* that check? Unless the code path which makes the
pfn private also takes the write lock, how is it safe?
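[Editor's note: a minimal sketch of the read-side pattern being described above, loosely modelled on existing gfn_to_pfn_cache users; `struct my_data` and `write_guest_data()` are invented names for illustration and are not part of the patch.]

```c
#include <linux/kvm_host.h>

/* Hypothetical guest-shared structure, for illustration only. */
struct my_data {
	u64 val;
};

/*
 * Read-side pattern sketched from existing gfn_to_pfn_cache users:
 * the cache is checked and dereferenced entirely under the read lock,
 * so gfn_to_pfn_cache_invalidate_start() (which needs the write lock)
 * cannot complete while the pointer is in use.
 */
static void write_guest_data(struct gfn_to_pfn_cache *gpc,
			     const struct my_data *data)
{
	unsigned long flags;

	read_lock_irqsave(&gpc->lock, flags);
	while (!kvm_gpc_check(gpc, sizeof(*data))) {
		read_unlock_irqrestore(&gpc->lock, flags);

		/* Refresh (re-map) the cache outside the read lock. */
		if (kvm_gpc_refresh(gpc, sizeof(*data)))
			return;

		read_lock_irqsave(&gpc->lock, flags);
	}

	/* Safe: an invalidation is blocked until the read lock is dropped. */
	memcpy(gpc->khva, data, sizeof(*data));

	read_unlock_irqrestore(&gpc->lock, flags);
}
```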
On 7/9/24 15:36, David Woodhouse wrote:
> On Tue, 2024-07-09 at 14:20 +0100, Patrick Roy wrote:
>> KVM uses gfn_to_pfn_caches to cache translations from gfn all the way to
>> the pfn (for example, kvm-clock caches the page storing the page used
>> for guest/host communication this way). Unlike the gfn_to_hva_cache,
>> where no equivalent caching semantics were possible to gmem-backed gfns
>> (see also 858e8068a750 ("kvm: pfncache: enlighten about gmem")), here it
>> is possible to simply cache the pfn returned by `kvm_gmem_get_pfn`.
>>
>> Additionally, gfn_to_pfn_caches now invalidate whenever a cached gfn's
>> attributes are flipped from shared to private (or vice-versa).
>>
>> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
>
> I can't see how this is safe from race conditions.
>
> When the GPC is invalidated from gfn_to_pfn_cache_invalidate_start()
> its *write* lock is taken and gpc->valid is set to false.
>
> In parallel, any code using the GPC to access guest memory will take
> the *read* lock, call kvm_gpc_check(), and then go ahead and use the
> pointer to its heart's content until eventually dropping the read lock.
>
> Since invalidation takes the write lock, it has to wait until the GPC
> is no longer in active use, and the pointer cannot be being
> dereferenced.
>
> How does this work for the kvm_mem_is_private() check. You've added a
> check in kvm_gpc_check(), but what if the pfn is made private
> immediately *after* that check? Unless the code path which makes the
> pfn private also takes the write lock, how is it safe?

Ah, you're right - I did in fact overlook this. I do think that it works
out though: kvm_vm_set_mem_attributes, which is used for flipping
between shared/private, registers the range which had its attributes
changed for invalidation, and thus gfn_to_pfn_cache_invalidate_start
should get called for it (although I have to admit I do not immediately
see what the exact callstack for this looks like, so maybe I am
misunderstanding something about invalidation here?).
On Wed, 2024-07-10 at 10:49 +0100, Patrick Roy wrote:
> On 7/9/24 15:36, David Woodhouse wrote:

I did? It isn't September yet, surely?

> > On Tue, 2024-07-09 at 14:20 +0100, Patrick Roy wrote:
> > > KVM uses gfn_to_pfn_caches to cache translations from gfn all the way to
> > > the pfn (for example, kvm-clock caches the page storing the page used
> > > for guest/host communication this way). Unlike the gfn_to_hva_cache,
> > > where no equivalent caching semantics were possible to gmem-backed gfns
> > > (see also 858e8068a750 ("kvm: pfncache: enlighten about gmem")), here it
> > > is possible to simply cache the pfn returned by `kvm_gmem_get_pfn`.
> > >
> > > Additionally, gfn_to_pfn_caches now invalidate whenever a cached gfn's
> > > attributes are flipped from shared to private (or vice-versa).
> > >
> > > Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> >
> > I can't see how this is safe from race conditions.
> >
> > When the GPC is invalidated from gfn_to_pfn_cache_invalidate_start()
> > its *write* lock is taken and gpc->valid is set to false.
> >
> > In parallel, any code using the GPC to access guest memory will take
> > the *read* lock, call kvm_gpc_check(), and then go ahead and use the
> > pointer to its heart's content until eventually dropping the read lock.
> >
> > Since invalidation takes the write lock, it has to wait until the GPC
> > is no longer in active use, and the pointer cannot be being
> > dereferenced.
> >
> > How does this work for the kvm_mem_is_private() check. You've added a
> > check in kvm_gpc_check(), but what if the pfn is made private
> > immediately *after* that check? Unless the code path which makes the
> > pfn private also takes the write lock, how is it safe?
>
> Ah, you're right - I did in fact overlook this. I do think that it works
> out though: kvm_vm_set_mem_attributes, which is used for flipping
> between shared/private, registers the range which had its attributes
> changed for invalidation, and thus gfn_to_pfn_cache_invalidate_start
> should get called for it (although I have to admit I do not immediately
> see what the exact callstack for this looks like, so maybe I am
> misunderstanding something about invalidation here?).

In that case, wouldn't that mean the explicit checks on gpc->is_private
matching kvm_mem_is_private() would be redundant and you can remove
them because you can trust that gpc->valid would be cleared?
On Wed, 2024-07-10 at 11:20 +0100, David Woodhouse wrote:
> On Wed, 2024-07-10 at 10:49 +0100, Patrick Roy wrote:
>> On 7/9/24 15:36, David Woodhouse wrote:
>
> I did? It isn't September yet, surely?

Argh, thanks for letting me know, I think I've whacked some sense into
my mail client now :)

>>> On Tue, 2024-07-09 at 14:20 +0100, Patrick Roy wrote:
>>>> KVM uses gfn_to_pfn_caches to cache translations from gfn all the way to
>>>> the pfn (for example, kvm-clock caches the page storing the page used
>>>> for guest/host communication this way). Unlike the gfn_to_hva_cache,
>>>> where no equivalent caching semantics were possible to gmem-backed gfns
>>>> (see also 858e8068a750 ("kvm: pfncache: enlighten about gmem")), here it
>>>> is possible to simply cache the pfn returned by `kvm_gmem_get_pfn`.
>>>>
>>>> Additionally, gfn_to_pfn_caches now invalidate whenever a cached gfn's
>>>> attributes are flipped from shared to private (or vice-versa).
>>>>
>>>> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
>>>
>>> I can't see how this is safe from race conditions.
>>>
>>> When the GPC is invalidated from gfn_to_pfn_cache_invalidate_start()
>>> its *write* lock is taken and gpc->valid is set to false.
>>>
>>> In parallel, any code using the GPC to access guest memory will take
>>> the *read* lock, call kvm_gpc_check(), and then go ahead and use the
>>> pointer to its heart's content until eventually dropping the read lock.
>>>
>>> Since invalidation takes the write lock, it has to wait until the GPC
>>> is no longer in active use, and the pointer cannot be being
>>> dereferenced.
>>>
>>> How does this work for the kvm_mem_is_private() check. You've added a
>>> check in kvm_gpc_check(), but what if the pfn is made private
>>> immediately *after* that check? Unless the code path which makes the
>>> pfn private also takes the write lock, how is it safe?
>>
>> Ah, you're right - I did in fact overlook this. I do think that it works
>> out though: kvm_vm_set_mem_attributes, which is used for flipping
>> between shared/private, registers the range which had its attributes
>> changed for invalidation, and thus gfn_to_pfn_cache_invalidate_start
>> should get called for it (although I have to admit I do not immediately
>> see what the exact callstack for this looks like, so maybe I am
>> misunderstanding something about invalidation here?).
>
> In that case, wouldn't that mean the explicit checks on gpc->is_private
> matching kvm_mem_is_private() would be redundant and you can remove
> them because you can trust that gpc->valid would be cleared?
>

Right, yes, it would indeed mean that. I'll double-check my assumption
about the whole invalidation thing and adjust the code for the next
iteration!
On Wed, 2024-07-10 at 11:46 +0100, Patrick Roy wrote:
> On Wed, 2024-07-10 at 11:20 +0100, David Woodhouse wrote:
> > In that case, wouldn't that mean the explicit checks on gpc->is_private
> > matching kvm_mem_is_private() would be redundant and you can remove
> > them because you can trust that gpc->valid would be cleared?
> >
>
> Right, yes, it would indeed mean that. I'll double-check my assumption
> about the whole invalidation thing and adjust the code for the next
> iteration!

I was going to suggest that you take the check you'd added to
kvm_gpc_check() and move it down below the ->valid check, and turn it
into a BUG_ON() to check that assertion. You *might* get false
positives with that though, if the result of kvm_mem_is_private()
becomes true before the flush actually *happens*?
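[Editor's note: a rough, untested fragment of what that suggestion might look like inside kvm_gpc_check(); whether an assertion is actually appropriate is precisely the false-positive concern raised above.]

```c
	if (!gpc->valid)
		return false;

	/*
	 * Sketch of the suggestion only: if shared<->private flips always
	 * invalidate the GPC (clearing gpc->valid) before readers get here,
	 * a mismatch would indicate a bug rather than an expected stale
	 * entry. It may still trip if kvm_mem_is_private() already reports
	 * the new value before the invalidation has actually run.
	 */
	BUG_ON(gpc->is_private !=
	       kvm_mem_is_private(gpc->kvm, gpa_to_gfn(gpc->gpa)));
```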
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 827ecc0b7e10..8f85f01f6bb0 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -70,6 +70,7 @@ struct gfn_to_pfn_cache {
 	kvm_pfn_t pfn;
 	bool active;
 	bool valid;
+	bool is_private;
 };
 
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index f0039efb9e1e..6430e0a49558 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -90,6 +90,9 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
 	if (!kvm_gpc_is_valid_len(gpc->gpa, gpc->uhva, len))
 		return false;
 
+	if (gpc->is_private != kvm_mem_is_private(gpc->kvm, gpa_to_gfn(gpc->gpa)))
+		return false;
+
 	if (!gpc->valid)
 		return false;
 
@@ -159,6 +162,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 	kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
 	void *new_khva = NULL;
 	unsigned long mmu_seq;
+	gfn_t gfn;
 
 	lockdep_assert_held(&gpc->refresh_lock);
 
@@ -173,6 +177,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 
 	do {
 		mmu_seq = gpc->kvm->mmu_invalidate_seq;
+		gfn = gpa_to_gfn(gpc->gpa);
 		smp_rmb();
 
 		write_unlock_irq(&gpc->lock);
@@ -197,10 +202,19 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 			cond_resched();
 		}
 
-		/* We always request a writeable mapping */
-		new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL, true, NULL);
-		if (is_error_noslot_pfn(new_pfn))
-			goto out_error;
+		if (gpc->is_private) {
+			int r = kvm_gmem_get_pfn(gpc->kvm, gfn_to_memslot(gpc->kvm, gfn), gfn,
+						 &new_pfn, NULL);
+
+			if (r)
+				goto out_error;
+		} else {
+			/* We always request a writeable mapping */
+			new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL,
+					     true, NULL);
+			if (is_error_noslot_pfn(new_pfn))
+				goto out_error;
+		}
 
 		/*
 		 * Obtain a new kernel mapping if KVM itself will access the
@@ -252,6 +266,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 	unsigned long old_uhva;
 	kvm_pfn_t old_pfn;
 	bool hva_change = false;
+	bool old_private;
 	void *old_khva;
 	int ret;
 
@@ -271,8 +286,21 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 	old_pfn = gpc->pfn;
 	old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
 	old_uhva = PAGE_ALIGN_DOWN(gpc->uhva);
+	old_private = gpc->is_private;
+
+	gpc->is_private = kvm_mem_is_private(gpc->kvm, gpa_to_gfn(gpa));
+
+	if (gpc->is_private && !kvm_can_access_gmem(gpc->kvm)) {
+		ret = -EFAULT;
+		goto out_unlock;
+	}
 
 	if (kvm_is_error_gpa(gpa)) {
+		if (WARN_ON_ONCE(gpc->is_private)) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
 		page_offset = offset_in_page(uhva);
 
 		gpc->gpa = INVALID_GPA;
@@ -316,9 +344,10 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 
 	/*
 	 * If the userspace HVA changed or the PFN was already invalid,
-	 * drop the lock and do the HVA to PFN lookup again.
+	 * drop the lock and do the HVA to PFN lookup again. Also
+	 * recompute the pfn if the gfn changed from shared to private (or vice-versa).
 	 */
-	if (!gpc->valid || hva_change) {
+	if (!gpc->valid || hva_change || gpc->is_private != old_private) {
 		ret = hva_to_pfn_retry(gpc);
 	} else {
 		/*
KVM uses gfn_to_pfn_caches to cache translations from gfn all the way to
the pfn (for example, kvm-clock caches the page used for guest/host
communication this way). Unlike the gfn_to_hva_cache, where no
equivalent caching semantics were possible for gmem-backed gfns (see
also 858e8068a750 ("kvm: pfncache: enlighten about gmem")), here it is
possible to simply cache the pfn returned by `kvm_gmem_get_pfn`.

Additionally, gfn_to_pfn_caches now invalidate whenever a cached gfn's
attributes are flipped from shared to private (or vice-versa).

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
---
 include/linux/kvm_types.h |  1 +
 virt/kvm/pfncache.c       | 41 +++++++++++++++++++++++++++++++++------
 2 files changed, 36 insertions(+), 6 deletions(-)