| Message ID | 20220521131700.3661-3-jiangshanlai@gmail.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | KVM: X86/MMU: Use one-off local shadow page for special roots |
On Sat, May 21, 2022 at 09:16:50PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>
> In some cases, local root pages are used for MMU. It is often using
> to_shadow_page(mmu->root.hpa) to check if local root pages are used.
>
> Add using_local_root_page() to directly check if local root pages are
> used or needed to be used even mmu->root.hpa is not set.
>
> Prepare for making to_shadow_page(mmu->root.hpa) returns non-NULL via
> using local shadow [root] pages.
>
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 40 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 37 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index efe5a3dca1e0..624b6d2473f7 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1690,6 +1690,39 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
>  	mmu_spte_clear_no_track(parent_pte);
>  }
>
> +/*
> + * KVM uses the VCPU's local root page (vcpu->mmu->pae_root) when either the
> + * shadow pagetable is using PAE paging or the host is shadowing nested NPT for
> + * 32bit L1 hypervisor.

How about using the terms "private" and "shared" instead of "local" and
"non-local"? I think that more accurately conveys what is special about
these pages: they are private to the vCPU using them. And then "shared"
is more intuitive to understand than "non-local" (which is used
elsewhere in this series).

> + *
> + * It includes cases:
> + *	nonpaging when !tdp_enabled (direct paging)
> + *	shadow paging for 32 bit guest when !tdp_enabled (shadow paging)
> + *	NPT in 32bit host (not shadowing nested NPT) (direct paging)
> + *	shadow nested NPT for 32bit L1 hypervisor in 32bit host (shadow paging)
> + *	shadow nested NPT for 32bit L1 hypervisor in 64bit host (shadow paging)
> + *
> + * For the first four cases, mmu->root_role.level is PT32E_ROOT_LEVEL and the
> + * shadow pagetable is using PAE paging.
> + *
> + * For the last case, it is
> + *	mmu->root_role.level > PT32E_ROOT_LEVEL &&
> + *	!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL
> + * And if this condition is true, it must be the last case.
> + *
> + * With the two conditions combined, the checking condition is:
> + *	mmu->root_role.level == PT32E_ROOT_LEVEL ||
> + *	(!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
> + *
> + * (There is no "mmu->root_role.level > PT32E_ROOT_LEVEL" here, because it is
> + * already ensured that mmu->root_role.level >= PT32E_ROOT_LEVEL)
> + */
> +static bool using_local_root_page(struct kvm_mmu *mmu)
> +{
> +	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
> +	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
> +}
> +
>  static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
>  {
>  	struct kvm_mmu_page *sp;
> @@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
>  {
>  	/*
>  	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
> -	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
> -	 * later if necessary.
> +	 * having to deal with PDPTEs. Local roots can not be put into
> +	 * mmu->prev_roots[] because mmu->pae_root can not be shared for
> +	 * different roots at the same time.

This comment ends up being a little confusing by the end of this series
because using_local_root_page() does not necessarily imply pae_root is
in use, i.e. case 5 (shadow nested NPT for 32bit L1 hypervisor in 64bit
host) does not use pae_root. How about rewording this comment to say
something like:

  If the vCPU is using a private root, it might be using pae_root,
  which cannot be shared for different roots at the same time.

>  	 */
> -	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
> +	if (unlikely(using_local_root_page(mmu)))
>  		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
>
>  	if (VALID_PAGE(mmu->root.hpa))
> --
> 2.19.1.6.gb485710b
On Thu, May 26, 2022, David Matlack wrote:
> On Sat, May 21, 2022 at 09:16:50PM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> >
> > [...]
> >
> > +/*
> > + * KVM uses the VCPU's local root page (vcpu->mmu->pae_root) when either the
> > + * shadow pagetable is using PAE paging or the host is shadowing nested NPT for
> > + * 32bit L1 hypervisor.
>
> How about using the terms "private" and "shared" instead of "local" and
> "non-local"? I think that more accurately conveys what is special about
> these pages: they are private to the vCPU using them. And then "shared"
> is more intuitive to understand than "non-local" (which is used
> elsewhere in this series).

Please avoid "private" and "shared". I haven't read the full context of
the discussion, but those terms have already been claimed by
confidential VMs.

FWIW, I believe similar discussions happened around mm/ and kmap(), and
they ended up with thread_local and kmap_local(). Maybe "vCPU local" and
"common"?
On Sat, May 21, 2022, Lai Jiangshan wrote:
> +static bool using_local_root_page(struct kvm_mmu *mmu)

Hmm, I agree with David that "local" isn't the most intuitive
terminology. But I also do want to avoid private vs. shared to avoid
confusion with confidential VMs.

Luckily, I don't think we need to come up with new terminology, just be
literal and call 'em "per-vCPU root pages". E.g.

  static bool kvm_mmu_has_per_vcpu_root_page()

That way readers don't have to understand what "local" means, and that
also captures that per-vCPU roots are an exception, i.e. that most roots
are NOT per-vCPU.

> +{
> +	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
> +	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
> +}
> +
>  static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
>  {
>  	struct kvm_mmu_page *sp;
> @@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
>  {
>  	/*
>  	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
> -	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
> -	 * later if necessary.
> +	 * having to deal with PDPTEs. Local roots can not be put into
> +	 * mmu->prev_roots[] because mmu->pae_root can not be shared for
> +	 * different roots at the same time.
>  	 */
> -	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
> +	if (unlikely(using_local_root_page(mmu)))

I don't know that I like using the local/per-vCPU helper. The problem
isn't _just_ that KVM is using a per-vCPU root, KVM is also deliberately
punting on dealing with PDPTRs. E.g. the per-vCPU aspect doesn't explain
why KVM doesn't allow reusing the current root. I don't like that
using_local_root_page() obfuscates that check.

My preference for this would be to revert back to a streamlined
variation of the code prior to commit 5499ea73e7db ("KVM: x86/mmu: look
for a cached PGD when going from 32-bit to 64-bit"). KVM switched to the
!to_shadow_page() check to _avoid_ consuming (what is now)
mmu->root_role because, at the time of the patch, mmu held the _old_
data, which was wrong/stale for nested virtualization transitions.

In other words, I would prefer that we explicitly do (in a separate
patch):

	/*
	 * For now, limit the fast switch to 64-bit VMs in order to avoid having
	 * to deal with PDPTEs.  32-bit VMs can be supported later if necessary.
	 */
	if (new_role.level < PT64_ROOT_4LEVEL)
		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);

The "hosts+VMs" can be shortened to just "VMs", because running a 64-bit
VM with a 32-bit host just doesn't work for a variety of reasons, i.e.
doesn't need to be called out here.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index efe5a3dca1e0..624b6d2473f7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1690,6 +1690,39 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
 	mmu_spte_clear_no_track(parent_pte);
 }
 
+/*
+ * KVM uses the VCPU's local root page (vcpu->mmu->pae_root) when either the
+ * shadow pagetable is using PAE paging or the host is shadowing nested NPT for
+ * 32bit L1 hypervisor.
+ *
+ * It includes cases:
+ *	nonpaging when !tdp_enabled (direct paging)
+ *	shadow paging for 32 bit guest when !tdp_enabled (shadow paging)
+ *	NPT in 32bit host (not shadowing nested NPT) (direct paging)
+ *	shadow nested NPT for 32bit L1 hypervisor in 32bit host (shadow paging)
+ *	shadow nested NPT for 32bit L1 hypervisor in 64bit host (shadow paging)
+ *
+ * For the first four cases, mmu->root_role.level is PT32E_ROOT_LEVEL and the
+ * shadow pagetable is using PAE paging.
+ *
+ * For the last case, it is
+ *	mmu->root_role.level > PT32E_ROOT_LEVEL &&
+ *	!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL
+ * And if this condition is true, it must be the last case.
+ *
+ * With the two conditions combined, the checking condition is:
+ *	mmu->root_role.level == PT32E_ROOT_LEVEL ||
+ *	(!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ *
+ * (There is no "mmu->root_role.level > PT32E_ROOT_LEVEL" here, because it is
+ * already ensured that mmu->root_role.level >= PT32E_ROOT_LEVEL)
+ */
+static bool using_local_root_page(struct kvm_mmu *mmu)
+{
+	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
+	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
+}
+
 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
 {
 	struct kvm_mmu_page *sp;
@@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
 {
 	/*
 	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
-	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
-	 * later if necessary.
+	 * having to deal with PDPTEs. Local roots can not be put into
+	 * mmu->prev_roots[] because mmu->pae_root can not be shared for
+	 * different roots at the same time.
 	 */
-	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
+	if (unlikely(using_local_root_page(mmu)))
 		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
 
 	if (VALID_PAGE(mmu->root.hpa))
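For readers following the thread, the predicate under discussion is small enough to model outside the kernel. The sketch below is a hypothetical, freestanding C stand-in, not kernel code: the `struct kvm_mmu` here is a two-field mock of the fields the real check reads, and the `PT32E_ROOT_LEVEL`/`PT64_ROOT_4LEVEL` values mirror KVM's constants. The boolean logic itself is taken from the patch, so the five cases enumerated in the patch's comment can be checked against it directly.

```c
#include <stdbool.h>

/* Stand-ins for the KVM level constants (values match the kernel's, but
 * this model is independent of kernel headers). */
#define PT32E_ROOT_LEVEL 3
#define PT64_ROOT_4LEVEL 4

/* Hypothetical mock of only the struct kvm_mmu fields the predicate reads. */
struct kvm_mmu {
	struct { int level; bool direct; } root_role;
	struct { int level; } cpu_role;	/* models cpu_role.base.level */
};

/*
 * The check from the patch: a per-vCPU (pae_root-style) root is needed when
 * the shadow page table itself uses PAE paging (root_role.level ==
 * PT32E_ROOT_LEVEL, covering the first four cases in the patch's comment),
 * or when a 64-bit host shadows a 32-bit L1 hypervisor's NPT (the fifth
 * case: !direct and a CPU role at or below PAE level).
 */
static bool using_local_root_page(struct kvm_mmu *mmu)
{
	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
	       (!mmu->root_role.direct &&
		mmu->cpu_role.level <= PT32E_ROOT_LEVEL);
}
```

Tracing it confirms the asymmetry David points out: a 64-bit TDP MMU (level 4, direct) falls through both arms and keeps using shared roots, while shadowing a 32-bit L1's NPT on a 64-bit host (level 4, !direct, CPU level 3) hits the second arm even though root_role.level is above PT32E_ROOT_LEVEL, i.e. without pae_root being involved.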