| Message ID | 20220521131700.3661-3-jiangshanlai@gmail.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | KVM: X86/MMU: Use one-off local shadow page for special roots |
On Sat, May 21, 2022 at 09:16:50PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>
> In some cases, local root pages are used for MMU. It is often using
> to_shadow_page(mmu->root.hpa) to check if local root pages are used.
>
> Add using_local_root_page() to directly check if local root pages are
> used or needed to be used even mmu->root.hpa is not set.
>
> Prepare for making to_shadow_page(mmu->root.hpa) returns non-NULL via
> using local shadow [root] pages.
>
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 40 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 37 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index efe5a3dca1e0..624b6d2473f7 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1690,6 +1690,39 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
>  	mmu_spte_clear_no_track(parent_pte);
>  }
>
> +/*
> + * KVM uses the VCPU's local root page (vcpu->mmu->pae_root) when either the
> + * shadow pagetable is using PAE paging or the host is shadowing nested NPT for
> + * 32bit L1 hypervisor.

How about using the terms "private" and "shared" instead of "local" and
"non-local"? I think that more accurately conveys what is special about
these pages: they are private to the vCPU using them. And then "shared"
is more intuitive to understand than "non-local" (which is used
elsewhere in this series).

> + *
> + * It includes cases:
> + *	nonpaging when !tdp_enabled (direct paging)
> + *	shadow paging for 32 bit guest when !tdp_enabled (shadow paging)
> + *	NPT in 32bit host (not shadowing nested NPT) (direct paging)
> + *	shadow nested NPT for 32bit L1 hypervisor in 32bit host (shadow paging)
> + *	shadow nested NPT for 32bit L1 hypervisor in 64bit host (shadow paging)
> + *
> + * For the first four cases, mmu->root_role.level is PT32E_ROOT_LEVEL and the
> + * shadow pagetable is using PAE paging.
> + *
> + * For the last case, it is
> + *	mmu->root_role.level > PT32E_ROOT_LEVEL &&
> + *	!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL
> + * And if this condition is true, it must be the last case.
> + *
> + * With the two conditions combined, the checking condition is:
> + *	mmu->root_role.level == PT32E_ROOT_LEVEL ||
> + *	(!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
> + *
> + * (There is no "mmu->root_role.level > PT32E_ROOT_LEVEL" here, because it is
> + * already ensured that mmu->root_role.level >= PT32E_ROOT_LEVEL)
> + */
> +static bool using_local_root_page(struct kvm_mmu *mmu)
> +{
> +	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
> +	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
> +}
> +
>  static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
>  {
>  	struct kvm_mmu_page *sp;
> @@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
>  {
>  	/*
>  	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
> -	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
> -	 * later if necessary.
> +	 * having to deal with PDPTEs. Local roots can not be put into
> +	 * mmu->prev_roots[] because mmu->pae_root can not be shared for
> +	 * different roots at the same time.

This comment ends up being a little confusing by the end of this series
because using_local_root_page() does not necessarily imply pae_root is
in use, i.e. case 5 (shadow nested NPT for 32bit L1 hypervisor in 64bit
host) does not use pae_root. How about rewording this comment to say
something like:

  If the vCPU is using a private root, it might be using pae_root,
  which cannot be shared for different roots at the same time.

>  	 */
> -	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
> +	if (unlikely(using_local_root_page(mmu)))
>  		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
>
>  	if (VALID_PAGE(mmu->root.hpa))
> --
> 2.19.1.6.gb485710b
On Thu, May 26, 2022, David Matlack wrote:
> On Sat, May 21, 2022 at 09:16:50PM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> >
> > [...]
> >
> > +/*
> > + * KVM uses the VCPU's local root page (vcpu->mmu->pae_root) when either the
> > + * shadow pagetable is using PAE paging or the host is shadowing nested NPT for
> > + * 32bit L1 hypervisor.
>
> How about using the terms "private" and "shared" instead of "local" and
> "non-local"? I think that more accurately conveys what is special about
> these pages: they are private to the vCPU using them. And then "shared"
> is more intuitive to understand than "non-local" (which is used
> elsewhere in this series).

Please avoid "private" and "shared". I haven't read the full context of
the discussion, but those terms have already been claimed by
confidential VMs.

FWIW, I believe similar discussions happened around mm/ and kmap(), and
they ended up with thread_local and kmap_local(). Maybe "vCPU local" and
"common"?
On Sat, May 21, 2022, Lai Jiangshan wrote:
> +static bool using_local_root_page(struct kvm_mmu *mmu)

Hmm, I agree with David that "local" isn't the most intuitive
terminology. But I also do want to avoid private vs. shared to avoid
confusion with confidential VMs.

Luckily, I don't think we need to come up with new terminology, just be
literal and call 'em "per-vCPU root pages". E.g.

  static bool kvm_mmu_has_per_vcpu_root_page()

That way readers don't have to understand what "local" means, and that
also captures that per-vCPU roots are an exception, i.e. that most roots
are NOT per-vCPU.

> +{
> +	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
> +	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
> +}
> +
>  static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
>  {
>  	struct kvm_mmu_page *sp;
> @@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
>  {
>  	/*
>  	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
> -	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
> -	 * later if necessary.
> +	 * having to deal with PDPTEs. Local roots can not be put into
> +	 * mmu->prev_roots[] because mmu->pae_root can not be shared for
> +	 * different roots at the same time.
>  	 */
> -	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
> +	if (unlikely(using_local_root_page(mmu)))

I don't know that I like using the local/per-vCPU helper. The problem
isn't _just_ that KVM is using a per-vCPU root, KVM is also deliberately
punting on dealing with PDPTRs. E.g. the per-vCPU aspect doesn't explain
why KVM doesn't allow reusing the current root. I don't like that
using_local_root_page() obfuscates that check.

My preference for this would be to revert back to a streamlined
variation of the code prior to commit 5499ea73e7db ("KVM: x86/mmu: look
for a cached PGD when going from 32-bit to 64-bit"). KVM switched to the
!to_shadow_page() check to _avoid_ consuming (what is now)
mmu->root_role because, at the time of the patch, mmu held the _old_
data, which was wrong/stale for nested virtualization transitions.

In other words, I would prefer that we explicitly do (in a separate
patch):

	/*
	 * For now, limit the fast switch to 64-bit VMs in order to avoid having
	 * to deal with PDPTEs.  32-bit VMs can be supported later if necessary.
	 */
	if (new_role.level < PT64_ROOT_4LEVEL)
		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);

The "hosts+VMs" can be shortened to just "VMs", because running a 64-bit
VM with a 32-bit host just doesn't work for a variety of reasons, i.e.
doesn't need to be called out here.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index efe5a3dca1e0..624b6d2473f7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1690,6 +1690,39 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
 	mmu_spte_clear_no_track(parent_pte);
 }
 
+/*
+ * KVM uses the VCPU's local root page (vcpu->mmu->pae_root) when either the
+ * shadow pagetable is using PAE paging or the host is shadowing nested NPT for
+ * 32bit L1 hypervisor.
+ *
+ * It includes cases:
+ *	nonpaging when !tdp_enabled (direct paging)
+ *	shadow paging for 32 bit guest when !tdp_enabled (shadow paging)
+ *	NPT in 32bit host (not shadowing nested NPT) (direct paging)
+ *	shadow nested NPT for 32bit L1 hypervisor in 32bit host (shadow paging)
+ *	shadow nested NPT for 32bit L1 hypervisor in 64bit host (shadow paging)
+ *
+ * For the first four cases, mmu->root_role.level is PT32E_ROOT_LEVEL and the
+ * shadow pagetable is using PAE paging.
+ *
+ * For the last case, it is
+ *	mmu->root_role.level > PT32E_ROOT_LEVEL &&
+ *	!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL
+ * And if this condition is true, it must be the last case.
+ *
+ * With the two conditions combined, the checking condition is:
+ *	mmu->root_role.level == PT32E_ROOT_LEVEL ||
+ *	(!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ *
+ * (There is no "mmu->root_role.level > PT32E_ROOT_LEVEL" here, because it is
+ * already ensured that mmu->root_role.level >= PT32E_ROOT_LEVEL)
+ */
+static bool using_local_root_page(struct kvm_mmu *mmu)
+{
+	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
+	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
+}
+
 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
 {
 	struct kvm_mmu_page *sp;
@@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
 {
 	/*
 	 * For now, limit the caching to 64-bit hosts+VMs in order to avoid
-	 * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
-	 * later if necessary.
+	 * having to deal with PDPTEs. Local roots can not be put into
+	 * mmu->prev_roots[] because mmu->pae_root can not be shared for
+	 * different roots at the same time.
 	 */
-	if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
+	if (unlikely(using_local_root_page(mmu)))
 		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
 
 	if (VALID_PAGE(mmu->root.hpa))
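For readers following the thread, the predicate under discussion is small enough to model outside the kernel. The sketch below is a hypothetical, freestanding C stand-in, not kernel code: the `struct kvm_mmu` here is a two-field mock of the fields the real check reads, and the `PT32E_ROOT_LEVEL`/`PT64_ROOT_4LEVEL` values mirror KVM's constants. The boolean logic itself is taken from the patch, so the five cases enumerated in the patch's comment can be checked against it directly.

```c
#include <stdbool.h>

/* Stand-ins for the KVM level constants (values match the kernel's, but
 * this model is independent of kernel headers). */
#define PT32E_ROOT_LEVEL 3
#define PT64_ROOT_4LEVEL 4

/* Hypothetical mock of only the struct kvm_mmu fields the predicate reads. */
struct kvm_mmu {
	struct { int level; bool direct; } root_role;
	struct { int level; } cpu_role;	/* models cpu_role.base.level */
};

/*
 * The check from the patch: a per-vCPU (pae_root-style) root is needed when
 * the shadow page table itself uses PAE paging (root_role.level ==
 * PT32E_ROOT_LEVEL, covering the first four cases in the patch's comment),
 * or when a 64-bit host shadows a 32-bit L1 hypervisor's NPT (the fifth
 * case: !direct and a CPU role at or below PAE level).
 */
static bool using_local_root_page(struct kvm_mmu *mmu)
{
	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
	       (!mmu->root_role.direct &&
		mmu->cpu_role.level <= PT32E_ROOT_LEVEL);
}
```

Tracing it confirms the asymmetry David points out: a 64-bit TDP MMU (level 4, direct) falls through both arms and keeps using shared roots, while shadowing a 32-bit L1's NPT on a 64-bit host (level 4, !direct, CPU level 3) hits the second arm even though root_role.level is above PT32E_ROOT_LEVEL, i.e. without pae_root being involved.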