[v4,11/20] KVM: x86/mmu: Allow for NULL vcpu pointer in __kvm_mmu_get_shadow_page()

Message ID 20220422210546.458943-12-dmatlack@google.com (mailing list archive)
State Superseded
Series KVM: Extend Eager Page Splitting to the shadow MMU

Commit Message

David Matlack April 22, 2022, 9:05 p.m. UTC
Allow the vcpu pointer in __kvm_mmu_get_shadow_page() to be NULL. Rename
it to vcpu_or_null to prevent future commits from accidentally taking a
dependency on it without first considering the NULL case.

The vcpu pointer is only used for syncing indirect shadow pages in
kvm_mmu_find_shadow_page(). A vcpu pointer is not required for
correctness since unsync pages can simply be zapped. But this should
never occur in practice, since the only use-case for passing a NULL vCPU
pointer is eager page splitting, which will only request direct shadow
pages (which can never be unsync).

Even though __kvm_mmu_get_shadow_page() can gracefully handle a NULL
vcpu, add a WARN() that will fire if __kvm_mmu_get_shadow_page() is ever
called to get an indirect shadow page with a NULL vCPU pointer, since
zapping unsync SPs incurs a performance overhead that should be considered.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 40 ++++++++++++++++++++++++++++++++--------
 1 file changed, 32 insertions(+), 8 deletions(-)

Comments

Sean Christopherson May 5, 2022, 11:33 p.m. UTC | #1
On Fri, Apr 22, 2022, David Matlack wrote:
> Allow the vcpu pointer in __kvm_mmu_get_shadow_page() to be NULL. Rename
> it to vcpu_or_null to prevent future commits from accidentally taking a
> dependency on it without first considering the NULL case.
> 
> The vcpu pointer is only used for syncing indirect shadow pages in
> kvm_mmu_find_shadow_page(). A vcpu pointer is not required for
> correctness since unsync pages can simply be zapped. But this should
> never occur in practice, since the only use-case for passing a NULL vCPU
> pointer is eager page splitting, which will only request direct shadow
> pages (which can never be unsync).
> 
> Even though __kvm_mmu_get_shadow_page() can gracefully handle a NULL
> vcpu, add a WARN() that will fire if __kvm_mmu_get_shadow_page() is ever
> called to get an indirect shadow page with a NULL vCPU pointer, since
> zapping unsync SPs incurs a performance overhead that should be considered.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 40 ++++++++++++++++++++++++++++++++--------
>  1 file changed, 32 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 04029c01aebd..21407bd4435a 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1845,16 +1845,27 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
>  	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)])	\
>  		if ((_sp)->gfn != (_gfn) || (_sp)->role.direct) {} else
>  
> -static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> -			 struct list_head *invalid_list)
> +static int __kvm_sync_page(struct kvm *kvm, struct kvm_vcpu *vcpu_or_null,
> +			   struct kvm_mmu_page *sp,
> +			   struct list_head *invalid_list)
>  {
> -	int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
> +	int ret = -1;
> +
> +	if (vcpu_or_null)

This should never happen.  I like the idea of warning early, but I really don't
like that the WARN is far removed from the code that actually depends on @vcpu
being non-NULL. Case in point, KVM should have bailed on the WARN and never
reached this point.  And the inner __kvm_sync_page() is completely unnecessary.

I also don't love the vcpu_or_null terminology; I get the intent, but it doesn't
really help the reader understand why/when it's NULL.

I played around with casting, e.g. to/from an unsigned long or void *, to prevent
usage, but that doesn't work very well because 'unsigned long' ends up being
awkward/confusing, and 'void *' is easily lost on a function call.  And both
lose type safety :-(
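
For illustration, a standalone sketch of that 'void *' variant (all names
below are hypothetical stand-ins, not actual KVM code), showing why the
compiler stops catching wrong pointers:

	/* Toy stand-ins for the real structs, just for the demo. */
	struct kvm { int nr_vcpus; };
	struct kvm_vcpu { int vcpu_id; };

	/* Taking 'void *' hides the vCPU type from casual users... */
	static void find_shadow_page(struct kvm *kvm, void *vcpu_opaque)
	{
		/* ...but every legitimate user needs an unchecked cast. */
		struct kvm_vcpu *vcpu = vcpu_opaque;

		(void)kvm;
		(void)vcpu;
	}

	int main(void)
	{
		struct kvm kvm = { .nr_vcpus = 1 };
		struct kvm_vcpu vcpu = { .vcpu_id = 3 };

		find_shadow_page(&kvm, &vcpu);	/* intended usage */
		find_shadow_page(&kvm, &kvm);	/* wrong pointer, compiles cleanly */
		return 0;
	}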

All in all, I think I'd prefer this patch to simply be a KVM_BUG_ON() if
kvm_mmu_find_shadow_page() encounters an unsync page.  Less churn, and IMO there's
no real loss in robustness, e.g. we'd really have to screw up code review and
testing to introduce a null vCPU pointer dereference in this code.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3d102522804a..5aed9265f592 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2041,6 +2041,13 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
                        goto out;

                if (sp->unsync) {
+                       /*
+                        * Getting indirect shadow pages without a vCPU pointer
+                        * is not supported, i.e. this should never happen.
+                        */
+                       if (KVM_BUG_ON(!vcpu, kvm))
+                               break;
+
                        /*
                         * The page is good, but is stale.  kvm_sync_page does
                         * get the latest guest state, but (unlike mmu_unsync_children)
David Matlack May 9, 2022, 9:26 p.m. UTC | #2
On Thu, May 5, 2022 at 4:33 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Apr 22, 2022, David Matlack wrote:
> > Allow the vcpu pointer in __kvm_mmu_get_shadow_page() to be NULL. Rename
> > it to vcpu_or_null to prevent future commits from accidentally taking a
> > dependency on it without first considering the NULL case.
> >
> > The vcpu pointer is only used for syncing indirect shadow pages in
> > kvm_mmu_find_shadow_page(). A vcpu pointer is not required for
> > correctness since unsync pages can simply be zapped. But this should
> > never occur in practice, since the only use-case for passing a NULL vCPU
> > pointer is eager page splitting, which will only request direct shadow
> > pages (which can never be unsync).
> >
> > Even though __kvm_mmu_get_shadow_page() can gracefully handle a NULL
> > vcpu, add a WARN() that will fire if __kvm_mmu_get_shadow_page() is ever
> > called to get an indirect shadow page with a NULL vCPU pointer, since
> > zapping unsync SPs incurs a performance overhead that should be considered.
> >
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 40 ++++++++++++++++++++++++++++++++--------
> >  1 file changed, 32 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 04029c01aebd..21407bd4435a 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -1845,16 +1845,27 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
> >         &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)])     \
> >               if ((_sp)->gfn != (_gfn) || (_sp)->role.direct) {} else
> >
> > -static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> > -                      struct list_head *invalid_list)
> > +static int __kvm_sync_page(struct kvm *kvm, struct kvm_vcpu *vcpu_or_null,
> > +                        struct kvm_mmu_page *sp,
> > +                        struct list_head *invalid_list)
> >  {
> > -     int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
> > +     int ret = -1;
> > +
> > +     if (vcpu_or_null)
>
> This should never happen.  I like the idea of warning early, but I really don't
> like that the WARN is far removed from the code that actually depends on @vcpu
> being non-NULL. Case in point, KVM should have bailed on the WARN and never
> reached this point.  And the inner __kvm_sync_page() is completely unnecessary.

Yeah that's fair.

>
> I also don't love the vcpu_or_null terminology; I get the intent, but it doesn't
> really help the reader understand why/when it's NULL.

Eh, I don't think it needs to encode why or when. It just needs to
flag to the reader (and future code authors) that this vcpu pointer
(unlike all other vcpu pointers in KVM) is NULL in certain cases.

>
> I played around with casting, e.g. to/from an unsigned long or void *, to prevent
> usage, but that doesn't work very well because 'unsigned long' ends up being
> awkward/confusing, and 'void *' is easily lost on a function call.  And both
> lose type safety :-(

Yet another shortcoming of C :(

(The other being our other discussion about the RET_PF* return codes
getting easily misinterpreted as KVM's magic return-to-user /
continue-running-guest return codes.)

Makes me miss Rust!

>
> All in all, I think I'd prefer this patch to simply be a KVM_BUG_ON() if
> kvm_mmu_find_shadow_page() encounters an unsync page.  Less churn, and IMO there's
> no real loss in robustness, e.g. we'd really have to screw up code review and
> testing to introduce a null vCPU pointer dereference in this code.

Agreed about moving the check here and dropping __kvm_sync_page(). But
I would prefer to retain the vcpu_or_null name (or at least something
other than "vcpu" to indicate there's something non-standard about
this pointer).

>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 3d102522804a..5aed9265f592 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2041,6 +2041,13 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                         goto out;
>
>                 if (sp->unsync) {
> +                       /*
> +                        * Getting indirect shadow pages without a vCPU pointer
> +                        * is not supported, i.e. this should never happen.
> +                        */
> +                       if (KVM_BUG_ON(!vcpu, kvm))
> +                               break;
> +
>                         /*
>                          * The page is good, but is stale.  kvm_sync_page does
>                          * get the latest guest state, but (unlike mmu_unsync_children)
>
Sean Christopherson May 9, 2022, 10:56 p.m. UTC | #3
On Mon, May 09, 2022, David Matlack wrote:
> On Thu, May 5, 2022 at 4:33 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Apr 22, 2022, David Matlack wrote:
> > > Allow the vcpu pointer in __kvm_mmu_get_shadow_page() to be NULL. Rename
> > > it to vcpu_or_null to prevent future commits from accidentally taking a
> > > dependency on it without first considering the NULL case.
> > >
> > > The vcpu pointer is only used for syncing indirect shadow pages in
> > > kvm_mmu_find_shadow_page(). A vcpu pointer is not required for
> > > correctness since unsync pages can simply be zapped. But this should
> > > never occur in practice, since the only use-case for passing a NULL vCPU
> > > pointer is eager page splitting, which will only request direct shadow
> > > pages (which can never be unsync).
> > >
> > > Even though __kvm_mmu_get_shadow_page() can gracefully handle a NULL
> > > vcpu, add a WARN() that will fire if __kvm_mmu_get_shadow_page() is ever
> > > called to get an indirect shadow page with a NULL vCPU pointer, since
> > > zapping unsync SPs incurs a performance overhead that should be considered.
> > >
> > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c | 40 ++++++++++++++++++++++++++++++++--------
> > >  1 file changed, 32 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 04029c01aebd..21407bd4435a 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -1845,16 +1845,27 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
> > >         &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)])     \
> > >               if ((_sp)->gfn != (_gfn) || (_sp)->role.direct) {} else
> > >
> > > -static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> > > -                      struct list_head *invalid_list)
> > > +static int __kvm_sync_page(struct kvm *kvm, struct kvm_vcpu *vcpu_or_null,
> > > +                        struct kvm_mmu_page *sp,
> > > +                        struct list_head *invalid_list)
> > >  {
> > > -     int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
> > > +     int ret = -1;
> > > +
> > > +     if (vcpu_or_null)
> >
> > This should never happen.  I like the idea of warning early, but I really don't
> > like that the WARN is far removed from the code that actually depends on @vcpu
> > being non-NULL. Case in point, KVM should have bailed on the WARN and never
> > reached this point.  And the inner __kvm_sync_page() is completely unnecessary.
> 
> Yeah that's fair.
> 
> >
> > I also don't love the vcpu_or_null terminology; I get the intent, but it doesn't
> > really help the reader understand why/when it's NULL.
> 
> Eh, I don't think it needs to encode why or when. It just needs to
> flag to the reader (and future code authors) that this vcpu pointer
> (unlike all other vcpu pointers in KVM) is NULL in certain cases.

My objection is that without the why/when, developers that aren't familiar with
this code won't know the rules for using vcpu_or_null.  E.g. I don't want to end
up with

	if (vcpu_or_null)
		do x;
	else
		do y;

because inevitably it'll become unclear whether or not that code is actually _correct_.
It might not #GP on a NULL pointer, but it doesn't mean it's correct.

> > I played around with casting, e.g. to/from an unsigned long or void *, to prevent
> > usage, but that doesn't work very well because 'unsigned long' ends up being
> > awkward/confusing, and 'void *' is easily lost on a function call.  And both
> > lose type safety :-(
> 
> Yet another shortcoming of C :(

And lack of closures, which would work very well here.
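
As a rough sketch of the closest C approximation (a callback plus captured
context, i.e. a hand-rolled closure; hypothetical names, not proposed code):

	/* Hypothetical "sync closure": callback + captured context. */
	struct sp_sync_ops {
		int (*sync)(void *ctx, struct kvm_mmu_page *sp);
		void *ctx;	/* captures the vCPU without exposing its type */
	};

	static int sync_with_vcpu(void *ctx, struct kvm_mmu_page *sp)
	{
		struct kvm_vcpu *vcpu = ctx;

		return vcpu->arch.mmu->sync_page(vcpu, sp);
	}

kvm_mmu_find_shadow_page() would then invoke ops->sync(ops->ctx, sp) without
ever seeing a vCPU, and eager page splitting could simply leave ->sync NULL.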

> (The other being our other discussion about the RET_PF* return codes
> getting easily misinterpreted as KVM's magic return-to-user /
> continue-running-guest return codes.)
> 
> Makes me miss Rust!
> 
> >
> > All in all, I think I'd prefer this patch to simply be a KVM_BUG_ON() if
> > kvm_mmu_find_shadow_page() encounters an unsync page.  Less churn, and IMO there's
> > no real loss in robustness, e.g. we'd really have to screw up code review and
> > testing to introduce a null vCPU pointer dereference in this code.
> 
> Agreed about moving the check here and dropping __kvm_sync_page(). But
> I would prefer to retain the vcpu_or_null name (or at least something
> other than "vcpu" to indicate there's something non-standard about
> this pointer).

The least awful idea I've come up with is wrapping the vCPU in a struct, e.g.

	struct sync_page_info {
		void *vcpu;
	}

That provides the contextual information I want, and also provides the hint that
something is odd about the vcpu, which you want.  It's like a very poor man's closure :-)
	
The struct could even be passed by value to avoid the minuscule overhead, and to
make readers look extra hard because it's that much more weird.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3d102522804a..068be77a4fff 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2003,8 +2003,13 @@ static void clear_sp_write_flooding_count(u64 *spte)
        __clear_sp_write_flooding_count(sptep_to_sp(spte));
 }

+/* Wrapper to make it difficult to dereference a potentially NULL @vcpu. */
+struct sync_page_info {
+       void *vcpu;
+};
+
 static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
-                                                    struct kvm_vcpu *vcpu,
+                                                    struct sync_page_info spi,
                                                     gfn_t gfn,
                                                     struct hlist_head *sp_list,
                                                     union kvm_mmu_page_role role)
@@ -2041,6 +2046,13 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
                        goto out;

                if (sp->unsync) {
+                       /*
+                        * Getting indirect shadow pages without a valid @spi
+                        * is not supported, i.e. this should never happen.
+                        */
+                       if (KVM_BUG_ON(!spi.vcpu, kvm))
+                               break;
+
                        /*
                         * The page is good, but is stale.  kvm_sync_page does
                         * get the latest guest state, but (unlike mmu_unsync_children)
@@ -2053,7 +2065,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
                         * If the sync fails, the page is zapped.  If so, break
                         * in order to rebuild it.
                         */
-                       ret = kvm_sync_page(vcpu, sp, &invalid_list);
+                       ret = kvm_sync_page(spi.vcpu, sp, &invalid_list);
                        if (ret < 0)
                                break;

@@ -2120,7 +2132,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
 }

 static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
-                                                     struct kvm_vcpu *vcpu,
+                                                     struct sync_page_info spi,
                                                      struct shadow_page_caches *caches,
                                                      gfn_t gfn,
                                                      union kvm_mmu_page_role role)
@@ -2131,7 +2143,7 @@ static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,

        sp_list = &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];

-       sp = kvm_mmu_find_shadow_page(kvm, vcpu, gfn, sp_list, role);
+       sp = kvm_mmu_find_shadow_page(kvm, spi, gfn, sp_list, role);
        if (!sp) {
                created = true;
                sp = kvm_mmu_alloc_shadow_page(kvm, caches, gfn, sp_list, role);
@@ -2151,7 +2163,11 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
                .gfn_array_cache = &vcpu->arch.mmu_gfn_array_cache,
        };

-       return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role);
+       struct sync_page_info spi = {
+               .vcpu = vcpu,
+       };
+
+       return __kvm_mmu_get_shadow_page(vcpu->kvm, spi, &caches, gfn, role);
 }

 static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct, u32 access)
David Matlack May 9, 2022, 11:59 p.m. UTC | #4
On Mon, May 9, 2022 at 3:57 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, May 09, 2022, David Matlack wrote:
> > On Thu, May 5, 2022 at 4:33 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Fri, Apr 22, 2022, David Matlack wrote:
> > > > Allow the vcpu pointer in __kvm_mmu_get_shadow_page() to be NULL. Rename
> > > > it to vcpu_or_null to prevent future commits from accidentally taking a
> > > > dependency on it without first considering the NULL case.
> > > >
> > > > The vcpu pointer is only used for syncing indirect shadow pages in
> > > > kvm_mmu_find_shadow_page(). A vcpu pointer is not required for
> > > > correctness since unsync pages can simply be zapped. But this should
> > > > never occur in practice, since the only use-case for passing a NULL vCPU
> > > > pointer is eager page splitting, which will only request direct shadow
> > > > pages (which can never be unsync).
> > > >
> > > > Even though __kvm_mmu_get_shadow_page() can gracefully handle a NULL
> > > > vcpu, add a WARN() that will fire if __kvm_mmu_get_shadow_page() is ever
> > > > called to get an indirect shadow page with a NULL vCPU pointer, since
> > > > zapping unsync SPs incurs a performance overhead that should be considered.
> > > >
> > > > Signed-off-by: David Matlack <dmatlack@google.com>
> > > > ---
> > > >  arch/x86/kvm/mmu/mmu.c | 40 ++++++++++++++++++++++++++++++++--------
> > > >  1 file changed, 32 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index 04029c01aebd..21407bd4435a 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -1845,16 +1845,27 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
> > > >         &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)])     \
> > > >               if ((_sp)->gfn != (_gfn) || (_sp)->role.direct) {} else
> > > >
> > > > -static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> > > > -                      struct list_head *invalid_list)
> > > > +static int __kvm_sync_page(struct kvm *kvm, struct kvm_vcpu *vcpu_or_null,
> > > > +                        struct kvm_mmu_page *sp,
> > > > +                        struct list_head *invalid_list)
> > > >  {
> > > > -     int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
> > > > +     int ret = -1;
> > > > +
> > > > +     if (vcpu_or_null)
> > >
> > > This should never happen.  I like the idea of warning early, but I really don't
> > > like that the WARN is far removed from the code that actually depends on @vcpu
> > > being non-NULL. Case in point, KVM should have bailed on the WARN and never
> > > reached this point.  And the inner __kvm_sync_page() is completely unnecessary.
> >
> > Yeah that's fair.
> >
> > >
> > > I also don't love the vcpu_or_null terminology; I get the intent, but it doesn't
> > > really help the reader understand why/when it's NULL.
> >
> > Eh, I don't think it needs to encode why or when. It just needs to
> > flag to the reader (and future code authors) that this vcpu pointer
> > (unlike all other vcpu pointers in KVM) is NULL in certain cases.
>
> My objection is that without the why/when, developers that aren't familiar with
> this code won't know the rules for using vcpu_or_null.  E.g. I don't want to end
> up with
>
>         if (vcpu_or_null)
>                 do x;
>         else
>                 do y;
>
> because inevitably it'll become unclear whether or not that code is actually _correct_.
> It might not #GP on a NULL pointer, but it doesn't mean it's correct.

Ah, right. And that's actually why I put the big comment and WARN in
__kvm_mmu_get_shadow_page(). Readers could easily jump to where
vcpu_or_null is passed in and see the rules around it. But if we move
the WARN to the kvm_sync_page() call, I agree it will be harder for
readers to know the rules and "vcpu_or_null" starts to become a risky
variable.

>
> > > I played around with casting, e.g. to/from an unsigned long or void *, to prevent
> > > usage, but that doesn't work very well because 'unsigned long' ends up being
> > > awkward/confusing, and 'void *' is easily lost on a function call.  And both
> > > lose type safety :-(
> >
> > Yet another shortcoming of C :(
>
> And lack of closures, which would work very well here.
>
> > (The other being our other discussion about the RET_PF* return codes
> > getting easily misinterpreted as KVM's magic return-to-user /
> > continue-running-guest return codes.)
> >
> > Makes me miss Rust!
> >
> > >
> > > All in all, I think I'd prefer this patch to simply be a KVM_BUG_ON() if
> > > kvm_mmu_find_shadow_page() encounters an unsync page.  Less churn, and IMO there's
> > > no real loss in robustness, e.g. we'd really have to screw up code review and
> > > testing to introduce a null vCPU pointer dereference in this code.
> >
> > Agreed about moving the check here and dropping __kvm_sync_page(). But
> > I would prefer to retain the vcpu_or_null name (or at least something
> > other than "vcpu" to indicate there's something non-standard about
> > this pointer).
>
> The least awful idea I've come up with is wrapping the vCPU in a struct, e.g.
>
>         struct sync_page_info {
>                 void *vcpu;
>         }
>
> That provides the contextual information I want, and also provides the hint that
> something is odd about the vcpu, which you want.  It's like a very poor man's closure :-)
>
> The struct could even be passed by value to avoid the minuscule overhead, and to
> make readers look extra hard because it's that much more weird.

Interesting idea. I'll give it a shot.

>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 3d102522804a..068be77a4fff 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2003,8 +2003,13 @@ static void clear_sp_write_flooding_count(u64 *spte)
>         __clear_sp_write_flooding_count(sptep_to_sp(spte));
>  }
>
> +/* Wrapper to make it difficult to dereference a potentially NULL @vcpu. */
> +struct sync_page_info {
> +       void *vcpu;
> +};
> +
>  static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
> -                                                    struct kvm_vcpu *vcpu,
> +                                                    struct sync_page_info spi,
>                                                      gfn_t gfn,
>                                                      struct hlist_head *sp_list,
>                                                      union kvm_mmu_page_role role)
> @@ -2041,6 +2046,13 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                         goto out;
>
>                 if (sp->unsync) {
> +                       /*
> +                        * Getting indirect shadow pages without a valid @spi
> +                        * is not supported, i.e. this should never happen.
> +                        */
> +                       if (KVM_BUG_ON(!spi.vcpu, kvm))
> +                               break;
> +
>                         /*
>                          * The page is good, but is stale.  kvm_sync_page does
>                          * get the latest guest state, but (unlike mmu_unsync_children)
> @@ -2053,7 +2065,7 @@ static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
>                          * If the sync fails, the page is zapped.  If so, break
>                          * in order to rebuild it.
>                          */
> -                       ret = kvm_sync_page(vcpu, sp, &invalid_list);
> +                       ret = kvm_sync_page(spi.vcpu, sp, &invalid_list);
>                         if (ret < 0)
>                                 break;
>
> @@ -2120,7 +2132,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
>  }
>
>  static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
> -                                                     struct kvm_vcpu *vcpu,
> +                                                     struct sync_page_info spi,
>                                                       struct shadow_page_caches *caches,
>                                                       gfn_t gfn,
>                                                       union kvm_mmu_page_role role)
> @@ -2131,7 +2143,7 @@ static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
>
>         sp_list = &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
>
> -       sp = kvm_mmu_find_shadow_page(kvm, vcpu, gfn, sp_list, role);
> +       sp = kvm_mmu_find_shadow_page(kvm, spi, gfn, sp_list, role);
>         if (!sp) {
>                 created = true;
>                 sp = kvm_mmu_alloc_shadow_page(kvm, caches, gfn, sp_list, role);
> @@ -2151,7 +2163,11 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu,
>                 .gfn_array_cache = &vcpu->arch.mmu_gfn_array_cache,
>         };
>
> -       return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role);
> +       struct sync_page_info spi = {
> +               .vcpu = vcpu,
> +       };
> +
> +       return __kvm_mmu_get_shadow_page(vcpu->kvm, spi, &caches, gfn, role);
>  }
>
>  static union kvm_mmu_page_role kvm_mmu_child_role(u64 *sptep, bool direct, u32 access)
>

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 04029c01aebd..21407bd4435a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1845,16 +1845,27 @@  static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)])	\
 		if ((_sp)->gfn != (_gfn) || (_sp)->role.direct) {} else
 
-static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-			 struct list_head *invalid_list)
+static int __kvm_sync_page(struct kvm *kvm, struct kvm_vcpu *vcpu_or_null,
+			   struct kvm_mmu_page *sp,
+			   struct list_head *invalid_list)
 {
-	int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
+	int ret = -1;
+
+	if (vcpu_or_null)
+		ret = vcpu_or_null->arch.mmu->sync_page(vcpu_or_null, sp);
 
 	if (ret < 0)
-		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
+		kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+
 	return ret;
 }
 
+static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+			 struct list_head *invalid_list)
+{
+	return __kvm_sync_page(vcpu->kvm, vcpu, sp, invalid_list);
+}
+
 static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
 					struct list_head *invalid_list,
 					bool remote_flush)
@@ -2004,7 +2015,7 @@  static void clear_sp_write_flooding_count(u64 *spte)
 }
 
 static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
-						     struct kvm_vcpu *vcpu,
+						     struct kvm_vcpu *vcpu_or_null,
 						     gfn_t gfn,
 						     struct hlist_head *sp_list,
 						     union kvm_mmu_page_role role)
@@ -2053,7 +2064,7 @@  static struct kvm_mmu_page *kvm_mmu_find_shadow_page(struct kvm *kvm,
 			 * If the sync fails, the page is zapped.  If so, break
 			 * in order to rebuild it.
 			 */
-			ret = kvm_sync_page(vcpu, sp, &invalid_list);
+			ret = __kvm_sync_page(kvm, vcpu_or_null, sp, &invalid_list);
 			if (ret < 0)
 				break;
 
@@ -2120,7 +2131,7 @@  static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
 }
 
 static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
-						      struct kvm_vcpu *vcpu,
+						      struct kvm_vcpu *vcpu_or_null,
 						      struct shadow_page_caches *caches,
 						      gfn_t gfn,
 						      union kvm_mmu_page_role role)
@@ -2129,9 +2140,22 @@  static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
 	struct kvm_mmu_page *sp;
 	bool created = false;
 
+	/*
+	 * A vCPU pointer should always be provided when getting indirect
+	 * shadow pages, as that shadow page may already exist and need to be
+	 * synced using the vCPU pointer (see __kvm_sync_page()). Direct shadow
+	 * pages are never unsync and thus do not require a vCPU pointer.
+	 *
+	 * No need to panic here as __kvm_sync_page() falls back to zapping an
+	 * unsync page if the vCPU pointer is NULL. But still WARN() since
+	 * such zapping will impact performance and this situation is never
+	 * expected to occur in practice.
+	 */
+	WARN_ON(!vcpu_or_null && !role.direct);
+
 	sp_list = &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
 
-	sp = kvm_mmu_find_shadow_page(kvm, vcpu, gfn, sp_list, role);
+	sp = kvm_mmu_find_shadow_page(kvm, vcpu_or_null, gfn, sp_list, role);
 	if (!sp) {
 		created = true;
 		sp = kvm_mmu_alloc_shadow_page(kvm, caches, gfn, sp_list, role);