[v3,6/8] KVM: Fully serialize gfn=>pfn cache refresh via mutex

Message ID 20220429210025.3293691-7-seanjc@google.com (mailing list archive)
State New, archived
Series KVM: Fix mmu_notifier vs. pfncache vs. pfncache races

Commit Message

Sean Christopherson April 29, 2022, 9 p.m. UTC
Protect gfn=>pfn cache refresh with a mutex to fully serialize refreshes.
The refresh logic doesn't protect against concurrent refreshes with
different GPAs (which may or may not be a desired use case, but it's
allowed in the code), nor does it protect against a false negative on the
memslot generation.

If the first refresh sees a stale memslot generation, it will refresh the
hva and generation before moving on to the hva=>pfn translation.  If it
then drops gpc->lock, a different user of the cache can come along,
acquire gpc->lock, see that the memslot generation is fresh, and skip
the hva=>pfn update due to the userspace address also matching (because
it too was updated).

The refresh path can already sleep during hva=>pfn resolution, so wrap
the refresh with a mutex to ensure that any given refresh runs to
completion before other callers can start their refresh.
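
An illustrative interleaving (a sketch of the race, not an observed
trace; two tasks refresh the same gpa):

  Task A (refresh)                   Task B (refresh)
  ----------------                   ----------------
  acquire gpc->lock
  memslot generation is stale =>
    update gpc->uhva and the
    cached generation
  drop gpc->lock for the
    (sleepable) hva=>pfn
    translation
                                     acquire gpc->lock
                                     generation is now fresh and
                                     gpc->uhva already matches =>
                                     skip the hva=>pfn update and
                                     proceed with a pfn that Task A
                                     has yet to (re)establish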

Cc: stable@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/kvm_types.h |  2 ++
 virt/kvm/pfncache.c       | 10 ++++++++++
 2 files changed, 12 insertions(+)

Comments

Paolo Bonzini May 20, 2022, 3:24 p.m. UTC | #1
On 4/29/22 23:00, Sean Christopherson wrote:
> diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
> index 05cb0bcbf662..eaef31462bbe 100644
> --- a/virt/kvm/pfncache.c
> +++ b/virt/kvm/pfncache.c
> @@ -157,6 +157,13 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
>   	if (page_offset + len > PAGE_SIZE)
>   		return -EINVAL;
>   
> +	/*
> +	 * If another task is refreshing the cache, wait for it to complete.
> +	 * There is no guarantee that concurrent refreshes will see the same
> +	 * gpa, memslots generation, etc..., so they must be fully serialized.
> +	 */
> +	mutex_lock(&gpc->refresh_lock);
> +
>   	write_lock_irq(&gpc->lock);
>   
>   	old_pfn = gpc->pfn;
> @@ -248,6 +255,8 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
>    out:
>   	write_unlock_irq(&gpc->lock);
>   
> +	mutex_unlock(&gpc->refresh_lock);
> +
>   	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
>   
>   	return ret;

Does kvm_gfn_to_pfn_cache_unmap also need to take the mutex, to avoid 
the WARN_ON(gpc->valid)?

Paolo
Sean Christopherson May 20, 2022, 3:53 p.m. UTC | #2
On Fri, May 20, 2022, Paolo Bonzini wrote:
> On 4/29/22 23:00, Sean Christopherson wrote:
> > diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
> > index 05cb0bcbf662..eaef31462bbe 100644
> > --- a/virt/kvm/pfncache.c
> > +++ b/virt/kvm/pfncache.c
> > @@ -157,6 +157,13 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
> >   	if (page_offset + len > PAGE_SIZE)
> >   		return -EINVAL;
> > +	/*
> > +	 * If another task is refreshing the cache, wait for it to complete.
> > +	 * There is no guarantee that concurrent refreshes will see the same
> > +	 * gpa, memslots generation, etc..., so they must be fully serialized.
> > +	 */
> > +	mutex_lock(&gpc->refresh_lock);
> > +
> >   	write_lock_irq(&gpc->lock);
> >   	old_pfn = gpc->pfn;
> > @@ -248,6 +255,8 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
> >    out:
> >   	write_unlock_irq(&gpc->lock);
> > +	mutex_unlock(&gpc->refresh_lock);
> > +
> >   	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
> >   	return ret;
> 
> Does kvm_gfn_to_pfn_cache_unmap also need to take the mutex, to avoid the
> WARN_ON(gpc->valid)?

I don't know what WARN_ON() you're referring to, but there is a double-free bug
if unmap() runs during an invalidation.  That can be solved without having to
take the mutex though: just reset valid/pfn/khva before the retry.

When searching to see how unmap() was used in the original series (there's no
other user besides destroy...), I stumbled across this likely-related syzbot bug
that unfortunately didn't Cc KVM :-(

https://lore.kernel.org/all/00000000000073f09205db439577@google.com


diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 72eee096a7cd..1719b0249dbc 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -228,6 +228,11 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
        if (!old_valid || old_uhva != gpc->uhva) {
                void *new_khva = NULL;

+               /* Invalidate so a concurrent unmap() won't also free the old pfn. */
+               gpc->valid = false;
+               gpc->pfn = KVM_PFN_ERR_FAULT;
+               gpc->khva = NULL;
+
                new_pfn = hva_to_pfn_retry(kvm, gpc);
                if (is_error_noslot_pfn(new_pfn)) {
                        ret = -EFAULT;
@@ -251,11 +256,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
                        /* Nothing more to do, the pfn is consumed only by the guest. */
                }

-               if (ret) {
-                       gpc->valid = false;
-                       gpc->pfn = KVM_PFN_ERR_FAULT;
-                       gpc->khva = NULL;
-               } else {
+               if (!ret) {
                        gpc->valid = true;
                        gpc->pfn = new_pfn;
                        gpc->khva = new_khva;
@@ -283,6 +284,11 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)

        write_lock_irq(&gpc->lock);

+       if (!gpc->valid) {
+               write_unlock_irq(&gpc->lock);
+               return;
+       }
+
        gpc->valid = false;

        old_khva = gpc->khva - offset_in_page(gpc->khva);
Paolo Bonzini May 20, 2022, 4:01 p.m. UTC | #3
On 5/20/22 17:53, Sean Christopherson wrote:
>> Does kvm_gfn_to_pfn_cache_unmap also need to take the mutex, to avoid the
>> WARN_ON(gpc->valid)?
> I don't know what WARN_ON() you're referring to, but there is a double-free bug
> if unmap() runs during an invalidation.  That can be solved without having to
> take the mutex though: just reset valid/pfn/khva before the retry.

I was thinking of this one:

	/*
	 * Other tasks must wait for _this_ refresh to complete before
	 * attempting to refresh.
	 */
	WARN_ON_ONCE(gpc->valid);

but unmap sets gpc->valid to false, not true. ಠ_ಠ

Still, as you point out, unmap() and refresh() can easily clash.  In
practice they probably exclude each other by other means (e.g. by
running on a single vCPU thread), but neither is a fast path, so it
seems easier to just make them thread-safe the easy way now that there
is a mutex.
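
E.g. a minimal sketch of unmap() under the mutex, assuming the
refresh_lock from this patch (untested, not the final code):

	void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
	{
		void *old_khva;
		kvm_pfn_t old_pfn;

		/* Exclude a concurrent refresh() for the entire teardown. */
		mutex_lock(&gpc->refresh_lock);
		write_lock_irq(&gpc->lock);

		gpc->valid = false;

		/* khva includes the page offset; release the page-aligned address. */
		old_khva = gpc->khva - offset_in_page(gpc->khva);
		gpc->khva = NULL;

		old_pfn = gpc->pfn;
		gpc->pfn = KVM_PFN_ERR_FAULT;

		write_unlock_irq(&gpc->lock);
		mutex_unlock(&gpc->refresh_lock);

		/* Release the old mapping after dropping the locks. */
		gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
	}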

> When searching to see how unmap() was used in the original series (there's no
> other user besides destroy...), I stumbled across this likely-related syzbot bug
> that unfortunately didn't Cc KVM :-(

To give an example, VMCLEAR would do an unmap() if the VMCS12 was mapped 
with a gfn-to-pfn cache.

Paolo

Patch

diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index ac1ebb37a0ff..f328a01db4fe 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -19,6 +19,7 @@  struct kvm_memslots;
 enum kvm_mr_change;
 
 #include <linux/bits.h>
+#include <linux/mutex.h>
 #include <linux/types.h>
 #include <linux/spinlock_types.h>
 
@@ -69,6 +70,7 @@  struct gfn_to_pfn_cache {
 	struct kvm_vcpu *vcpu;
 	struct list_head list;
 	rwlock_t lock;
+	struct mutex refresh_lock;
 	void *khva;
 	kvm_pfn_t pfn;
 	enum pfn_cache_usage usage;
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 05cb0bcbf662..eaef31462bbe 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -157,6 +157,13 @@  int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	if (page_offset + len > PAGE_SIZE)
 		return -EINVAL;
 
+	/*
+	 * If another task is refreshing the cache, wait for it to complete.
+	 * There is no guarantee that concurrent refreshes will see the same
+	 * gpa, memslots generation, etc..., so they must be fully serialized.
+	 */
+	mutex_lock(&gpc->refresh_lock);
+
 	write_lock_irq(&gpc->lock);
 
 	old_pfn = gpc->pfn;
@@ -248,6 +255,8 @@  int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
  out:
 	write_unlock_irq(&gpc->lock);
 
+	mutex_unlock(&gpc->refresh_lock);
+
 	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
 
 	return ret;
@@ -288,6 +297,7 @@  int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 
 	if (!gpc->active) {
 		rwlock_init(&gpc->lock);
+		mutex_init(&gpc->refresh_lock);
 
 		gpc->khva = NULL;
 		gpc->pfn = KVM_PFN_ERR_FAULT;
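
For context, consumers pair the rwlock with a check/refresh loop; a
hypothetical reader (read_guest_field() is illustrative, loosely
modeled on the loops in arch/x86/kvm/xen.c, and is not part of this
patch) would look roughly like:

	static int read_guest_field(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
				    gpa_t gpa, u32 *val)
	{
		unsigned long flags;

		read_lock_irqsave(&gpc->lock, flags);
		while (!kvm_gfn_to_pfn_cache_check(kvm, gpc, gpa, sizeof(*val))) {
			read_unlock_irqrestore(&gpc->lock, flags);

			/*
			 * May sleep; with this patch, concurrent refreshes
			 * are fully serialized by gpc->refresh_lock.
			 */
			if (kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpa, sizeof(*val)))
				return -EFAULT;

			read_lock_irqsave(&gpc->lock, flags);
		}

		/* khva already includes the page offset of the cached gpa. */
		*val = *(u32 *)gpc->khva;

		read_unlock_irqrestore(&gpc->lock, flags);
		return 0;
	}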