
[V3,02/12] KVM: X86/MMU: Add using_local_root_page()

Message ID 20220521131700.3661-3-jiangshanlai@gmail.com (mailing list archive)
State New, archived
Series KVM: X86/MMU: Use one-off local shadow page for special roots

Commit Message

Lai Jiangshan May 21, 2022, 1:16 p.m. UTC
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

In some cases, the MMU uses local root pages.  The check
to_shadow_page(mmu->root.hpa) is often used to tell whether local root
pages are in use.

Add using_local_root_page() to directly check whether local root pages
are used, or need to be used, even when mmu->root.hpa is not yet set.

Prepare for making to_shadow_page(mmu->root.hpa) return non-NULL when
local shadow [root] pages are in use.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 arch/x86/kvm/mmu/mmu.c | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

Comments

David Matlack May 26, 2022, 9:28 p.m. UTC | #1
On Sat, May 21, 2022 at 09:16:50PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> In some cases, the MMU uses local root pages.  The check
> to_shadow_page(mmu->root.hpa) is often used to tell whether local root
> pages are in use.
> 
> Add using_local_root_page() to directly check whether local root pages
> are used, or need to be used, even when mmu->root.hpa is not yet set.
> 
> Prepare for making to_shadow_page(mmu->root.hpa) return non-NULL when
> local shadow [root] pages are in use.
> 
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 40 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 37 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index efe5a3dca1e0..624b6d2473f7 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1690,6 +1690,39 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
>  	mmu_spte_clear_no_track(parent_pte);
>  }
>  
> +/*
> + * KVM uses the VCPU's local root page (vcpu->mmu->pae_root) when either the
> + * shadow pagetable is using PAE paging or the host is shadowing nested NPT for
> + * 32bit L1 hypervisor.

How about using the terms "private" and "shared" instead of "local" and
"non-local"? I think that more accurately conveys what is special about
these pages: they are private to the vCPU using them. And then "shared"
is more intuitive to understand than "non-local" (which is used
elsewhere in this series).

> + *
> + * It includes cases:
> + *	nonpaging when !tdp_enabled				(direct paging)
> + *	shadow paging for 32 bit guest when !tdp_enabled	(shadow paging)
> + *	NPT in 32bit host (not shadowing nested NPT)		(direct paging)
> + *	shadow nested NPT for 32bit L1 hypervisor in 32bit host (shadow paging)
> + *	shadow nested NPT for 32bit L1 hypervisor in 64bit host (shadow paging)
> + *
> + * For the first four cases, mmu->root_role.level is PT32E_ROOT_LEVEL and the
> + * shadow pagetable is using PAE paging.
> + *
> + * For the last case, it is
> + * 	mmu->root_role.level > PT32E_ROOT_LEVEL &&
> + * 	!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL
> + * And if this condition is true, it must be the last case.
> + *
> + * With the two conditions combined, the checking condition is:
> + * 	mmu->root_role.level == PT32E_ROOT_LEVEL ||
> + * 	(!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
> + *
> + * (There is no "mmu->root_role.level > PT32E_ROOT_LEVEL" here, because it is
> + *  already ensured that mmu->root_role.level >= PT32E_ROOT_LEVEL)
> + */
> +static bool using_local_root_page(struct kvm_mmu *mmu)
> +{
> +	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
> +	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
> +}
> +
>  static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
>  {
>  	struct kvm_mmu_page *sp;
> @@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
>  {
>  	/*
>  	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
> -	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
> -	 * later if necessary.
> +	 * having to deal with PDPTEs.  Local roots can not be put into
> +	 * mmu->prev_roots[] because mmu->pae_root can not be shared for
> +	 * different roots at the same time.

This comment ends up being a little confusing by the end of this series
because using_local_root_page() does not necessarily imply pae_root is
in use. i.e. case 5 (shadow nested NPT for 32bit L1 hypervisor in 64bit
host) does not use pae_root.

How about rewording this comment to say something like:

  If the vCPU is using a private root, it might be using pae_root, which
  cannot be shared for different roots at the same time.

>  	 */
> -	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
> +	if (unlikely(using_local_root_page(mmu)))
>  		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
>  
>  	if (VALID_PAGE(mmu->root.hpa))
> -- 
> 2.19.1.6.gb485710b
>
Sean Christopherson May 26, 2022, 9:38 p.m. UTC | #2
On Thu, May 26, 2022, David Matlack wrote:
> On Sat, May 21, 2022 at 09:16:50PM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> > 
> > In some cases, the MMU uses local root pages.  The check
> > to_shadow_page(mmu->root.hpa) is often used to tell whether local root
> > pages are in use.
> > 
> > Add using_local_root_page() to directly check whether local root pages
> > are used, or need to be used, even when mmu->root.hpa is not yet set.
> > 
> > Prepare for making to_shadow_page(mmu->root.hpa) return non-NULL when
> > local shadow [root] pages are in use.
> > 
> > Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 40 +++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 37 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index efe5a3dca1e0..624b6d2473f7 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -1690,6 +1690,39 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
> >  	mmu_spte_clear_no_track(parent_pte);
> >  }
> >  
> > +/*
> > + * KVM uses the VCPU's local root page (vcpu->mmu->pae_root) when either the
> > + * shadow pagetable is using PAE paging or the host is shadowing nested NPT for
> > + * 32bit L1 hypervisor.
> 
> How about using the terms "private" and "shared" instead of "local" and
> "non-local"? I think that more accurately conveys what is special about
> these pages: they are private to the vCPU using them. And then "shared"
> is more intuitive to understand than "non-local" (which is used
> elsewhere in this series).

Please avoid "private" and "shared".  I haven't read the full context of the
discussion, but those terms have already been claimed by confidential VMs.

FWIW, I believe similar discussions happened around mm/ and kmap(), and they ended
up with thread_local and kmap_local().  Maybe "vCPU local" and "common"?
Sean Christopherson July 19, 2022, 10:03 p.m. UTC | #3
On Sat, May 21, 2022, Lai Jiangshan wrote:
> +static bool using_local_root_page(struct kvm_mmu *mmu)

Hmm, I agree with David that "local" isn't the most intuitive terminology.  But
I also do want to avoid private vs. shared to avoid confusion with confidential VMs.

Luckily, I don't think we need to come up with new terminology, just be literal
and call 'em "per-vCPU root pages".  E.g.

  static bool kvm_mmu_has_per_vcpu_root_page()

That way readers don't have to understand what "local" means, and that also captures
per-vCPU roots are an exception, i.e. that most roots are NOT per-vCPU.

> +{
> +	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
> +	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
> +}
> +
>  static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
>  {
>  	struct kvm_mmu_page *sp;
> @@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
>  {
>  	/*
>  	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
> -	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
> -	 * later if necessary.
> +	 * having to deal with PDPTEs.  Local roots can not be put into
> +	 * mmu->prev_roots[] because mmu->pae_root can not be shared for
> +	 * different roots at the same time.
>  	 */
> -	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
> +	if (unlikely(using_local_root_page(mmu)))

I don't know that I like using the local/per-vCPU helper.  The problem isn't _just_
that KVM is using a per-vCPU root, KVM is also deliberately punting on dealing with
PDTPRs.  E.g. the per-vCPU aspect doesn't explain why KVM doesn't allow reusing the
current root.  I don't like that the using_local_root_page() obfuscates that check.

My preference for this would be to revert back to a streamlined variation of the
code prior to commit 5499ea73e7db ("KVM: x86/mmu: look for a cached PGD when going
from 32-bit to 64-bit").

KVM switched to the !to_shadow_page() check to _avoid_ consuming (what is now)
mmu->root_role because, at the time of the patch, mmu held the _old_ data, which
was wrong/stale for nested virtualization transitions.

In other words, I would prefer that explicitly do (in a separate patch):

	/*
	 * For now, limit the fast switch to 64-bit VMs in order to avoid having
	 * to deal with PDPTEs.  32-bit VMs can be supported later if necessary.
	 */
	if (new_role.level < PT64_ROOT_4LEVEL)
		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);

The "hosts+VMs" can be shortened to just "VMs", because running a 64-bit VM with
a 32-bit host just doesn't work for a variety of reasons, i.e. doesn't need to be
called out here.

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index efe5a3dca1e0..624b6d2473f7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1690,6 +1690,39 @@  static void drop_parent_pte(struct kvm_mmu_page *sp,
 	mmu_spte_clear_no_track(parent_pte);
 }
 
+/*
+ * KVM uses the VCPU's local root page (vcpu->mmu->pae_root) when either the
+ * shadow pagetable is using PAE paging or the host is shadowing nested NPT for
+ * 32bit L1 hypervisor.
+ *
+ * It includes cases:
+ *	nonpaging when !tdp_enabled				(direct paging)
+ *	shadow paging for 32 bit guest when !tdp_enabled	(shadow paging)
+ *	NPT in 32bit host (not shadowing nested NPT)		(direct paging)
+ *	shadow nested NPT for 32bit L1 hypervisor in 32bit host (shadow paging)
+ *	shadow nested NPT for 32bit L1 hypervisor in 64bit host (shadow paging)
+ *
+ * For the first four cases, mmu->root_role.level is PT32E_ROOT_LEVEL and the
+ * shadow pagetable is using PAE paging.
+ *
+ * For the last case, it is
+ * 	mmu->root_role.level > PT32E_ROOT_LEVEL &&
+ * 	!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL
+ * And if this condition is true, it must be the last case.
+ *
+ * With the two conditions combined, the checking condition is:
+ * 	mmu->root_role.level == PT32E_ROOT_LEVEL ||
+ * 	(!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ *
+ * (There is no "mmu->root_role.level > PT32E_ROOT_LEVEL" here, because it is
+ *  already ensured that mmu->root_role.level >= PT32E_ROOT_LEVEL)
+ */
+static bool using_local_root_page(struct kvm_mmu *mmu)
+{
+	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
+	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
+}
+
 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
 {
 	struct kvm_mmu_page *sp;
@@ -4252,10 +4285,11 @@  static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
 {
 	/*
 	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
-	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
-	 * later if necessary.
+	 * having to deal with PDPTEs.  Local roots can not be put into
+	 * mmu->prev_roots[] because mmu->pae_root can not be shared for
+	 * different roots at the same time.
 	 */
-	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
+	if (unlikely(using_local_root_page(mmu)))
 		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
 
 	if (VALID_PAGE(mmu->root.hpa))