Message ID | 20211111144634.88972-1-jiangshanlai@gmail.com (mailing list archive)
--- | ---
State | New, archived
Series | KVM: X86: Fix and clean up for register caches
Hello, Paolo

Any thought/concern about this one?

Thanks,
Lai

On 2021/11/11 22:46, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> For shadow paging, the pae_root needs to be reconstructed before the
> coming VMENTER if the guest PDPTEs are changed.
> 
> But not all paths that call load_pdptrs() will cause the pae_root to be
> reconstructed.  Normally, kvm_mmu_reset_context() and kvm_mmu_free_roots()
> are used to launch the later reconstruction.
> 
> The commit d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and
> CR0.NW are changed") skips kvm_mmu_reset_context() after load_pdptrs()
> when changing CR0.CD and CR0.NW.
> 
> The commit 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current
> PCID on MOV CR3 w/ flush") skips kvm_mmu_free_roots() after
> load_pdptrs() when rewriting CR3 with the same value.
> 
> The commit a91a7c709600 ("KVM: X86: Don't reset mmu context when
> toggling X86_CR4_PGE") skips kvm_mmu_reset_context() after
> load_pdptrs() when changing CR4.PGE.
> 
> Normally, the guest doesn't change the PDPTEs when performing only the
> above operations, without touching other bits that can force pae_root
> to be reconstructed.  Guests like Linux keep the PDPTEs unchanged for
> the lifetime of each page table.
> 
> Fixes: d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")
> Fixes: 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush")
> Fixes: a91a7c709600 ("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE")
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---
>   arch/x86/kvm/x86.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0176eaa86a35..cfba337e46ab 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -832,8 +832,14 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
>   	if (memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs))) {
>   		memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
>   		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> -		/* Ensure the dirty PDPTEs to be loaded. */
> -		kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> +		/*
> +		 * Ensure the dirty PDPTEs are loaded for VMX with EPT
> +		 * enabled, or pae_root is reconstructed for shadow paging.
> +		 */
> +		if (tdp_enabled)
> +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> +		else
> +			kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
>   	}
>   	vcpu->arch.pdptrs_from_userspace = false;
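To make the failure mode concrete: under shadow paging, a guest that updates
its PDPTEs and then merely toggles CR4.PGE would keep running on a pae_root
built from the stale PDPTEs.  The sketch below is a much-simplified view of
the CR4 path, NOT the real kvm_set_cr4(); MMU_ROLE_BITS is a stand-in for
the actual role-bit mask, which after a91a7c709600 no longer includes
X86_CR4_PGE.

	/*
	 * Simplified sketch of the CR4 path described above; not the
	 * real kvm_set_cr4().  MMU_ROLE_BITS stands in for the real
	 * mask of CR4 bits that force an MMU context reset.
	 */
	static void set_cr4_sketch(struct kvm_vcpu *vcpu, unsigned long cr4)
	{
		unsigned long old_cr4 = kvm_read_cr4(vcpu);

		/* Re-reads the guest PDPTEs into the register cache. */
		if (is_pae_paging(vcpu))
			load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu));

		/*
		 * A PGE-only toggle skips the reset below, and with it the
		 * pae_root reconstruction that changed PDPTEs require under
		 * shadow paging.
		 */
		if ((cr4 ^ old_cr4) & MMU_ROLE_BITS)
			kvm_mmu_reset_context(vcpu);
	}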
On Thu, Nov 11, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> For shadow paging, the pae_root needs to be reconstructed before the
> coming VMENTER if the guest PDPTEs are changed.
> 
> But not all paths that call load_pdptrs() will cause the pae_root to be
> reconstructed.  Normally, kvm_mmu_reset_context() and kvm_mmu_free_roots()
> are used to launch the later reconstruction.
> 
> The commit d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and
> CR0.NW are changed") skips kvm_mmu_reset_context() after load_pdptrs()
> when changing CR0.CD and CR0.NW.
> 
> The commit 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current
> PCID on MOV CR3 w/ flush") skips kvm_mmu_free_roots() after
> load_pdptrs() when rewriting CR3 with the same value.

This isn't accurate; prior to that commit, KVM wasn't guaranteed to do
kvm_mmu_free_roots() if it got a hit on the current CR3 or if a previous
CR3 in the cache matched the new CR3 (the "cache" has done some odd things
in the past).

So I think this particular flavor would be:

  Fixes: 7c390d350f8b ("kvm: x86: Add fast CR3 switch code path")

> The commit a91a7c709600 ("KVM: X86: Don't reset mmu context when
> toggling X86_CR4_PGE") skips kvm_mmu_reset_context() after
> load_pdptrs() when changing CR4.PGE.
> 
> Normally, the guest doesn't change the PDPTEs when performing only the
> above operations, without touching other bits that can force pae_root
> to be reconstructed.  Guests like Linux keep the PDPTEs unchanged for
> the lifetime of each page table.
> 
> Fixes: d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")
> Fixes: 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush")
> Fixes: a91a7c709600 ("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE")
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---
>   arch/x86/kvm/x86.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0176eaa86a35..cfba337e46ab 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -832,8 +832,14 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
>   	if (memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs))) {
>   		memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
>   		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> -		/* Ensure the dirty PDPTEs to be loaded. */
> -		kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> +		/*
> +		 * Ensure the dirty PDPTEs are loaded for VMX with EPT
> +		 * enabled, or pae_root is reconstructed for shadow paging.
> +		 */
> +		if (tdp_enabled)
> +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> +		else
> +			kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);

Shouldn't matter since it's legacy shadow paging, but @mmu should be used
instead of vcpu->arch.mmu.

To avoid a dependency on the previous patch, I think it makes sense to
have this be:

	if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
		kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);

before the memcpy().

Then we can decide independently whether skipping KVM_REQ_LOAD_MMU_PGD
when the PDPTRs are unchanged with respect to the MMU is safe.
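Assembled from the hunks quoted above, Sean's suggested ordering would make
the tail of load_pdptrs() look roughly like this (a sketch, not a tested
patch):

	if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
		kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);

	if (memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs))) {
		memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
		/*
		 * Whether this request can be skipped for unchanged
		 * PDPTRs is the separate question raised above.
		 */
		kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
	}
	vcpu->arch.pdptrs_from_userspace = false;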
On 2021/12/8 08:15, Sean Christopherson wrote:
>> 
>> The commit 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current
>> PCID on MOV CR3 w/ flush") skips kvm_mmu_free_roots() after
>> load_pdptrs() when rewriting CR3 with the same value.
> 
> This isn't accurate; prior to that commit, KVM wasn't guaranteed to do
> kvm_mmu_free_roots() if it got a hit on the current CR3 or if a previous
> CR3 in the cache matched the new CR3 (the "cache" has done some odd things
> in the past).
> 
> So I think this particular flavor would be:
> 
>   Fixes: 7c390d350f8b ("kvm: x86: Add fast CR3 switch code path")

If the guest is 32-bit, fast_cr3_switch() always returns false, and
kvm_mmu_free_roots() is always called, and no CR3 goes into prev_root.

And since commit 21823fbda552, fast_cr3_switch() and kvm_mmu_free_roots()
are both skipped when CR3 is unchanged.

> 
>> The commit a91a7c709600 ("KVM: X86: Don't reset mmu context when
>> toggling X86_CR4_PGE") skips kvm_mmu_reset_context() after
>> load_pdptrs() when changing CR4.PGE.
>> 
>> Normally, the guest doesn't change the PDPTEs when performing only the
>> above operations, without touching other bits that can force pae_root
>> to be reconstructed.  Guests like Linux keep the PDPTEs unchanged for
>> the lifetime of each page table.
>> 
>> Fixes: d81135a57aa6 ("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")
>> Fixes: 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush")
>> Fixes: a91a7c709600 ("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE")
>> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
>> ---
>>   arch/x86/kvm/x86.c | 10 ++++++++--
>>   1 file changed, 8 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 0176eaa86a35..cfba337e46ab 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -832,8 +832,14 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
>>   	if (memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs))) {
>>   		memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
>>   		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
>> -		/* Ensure the dirty PDPTEs to be loaded. */
>> -		kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
>> +		/*
>> +		 * Ensure the dirty PDPTEs are loaded for VMX with EPT
>> +		 * enabled, or pae_root is reconstructed for shadow paging.
>> +		 */
>> +		if (tdp_enabled)
>> +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
>> +		else
>> +			kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
> 
> Shouldn't matter since it's legacy shadow paging, but @mmu should be used
> instead of vcpu->arch.mmu.

@mmu is the "guest mmu" (vcpu->arch.walk_mmu), which is used to walk the
guest page tables, including loading the PDPTRs.

vcpu->arch.mmu is the MMU the host constructs for shadow or TDP paging,
and is used for host-side management, including kvm_mmu_free_roots().

Even though they are the same pointer for !tdp now, their meanings are
different.  I prefer to distinguish them even before struct kvm_mmu is
split into separate types for the guest mmu (vcpu->arch.walk_mmu) and the
host-constructed mmu (vcpu->arch.mmu).

(I once audited all the undistinguished usages of kvm_mmu *mmu and found
a bug; see "Use vcpu->arch.walk_mmu for kvm_mmu_invlpg()".)

I think Paolo is doing the splitting; otherwise I would take on the job,
because I have some pending patches that depend on it.
> 
> To avoid a dependency on the previous patch, I think it makes sense to
> have this be:
> 
> 	if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
> 		kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);
> 

Yes, it is a good idea to add this first.

Thanks for the review and suggestion.
Lai
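The distinction Lai draws above can be summarized in a small sketch
(illustrative only; today, for !tdp without nesting, both pointers alias
the same struct kvm_mmu):

	static void mmu_roles_sketch(struct kvm_vcpu *vcpu)
	{
		/* Guest view: used to walk guest page tables, e.g. the PDPTR cache. */
		struct kvm_mmu *guest_mmu = vcpu->arch.walk_mmu;

		/* Host view: the shadow/TDP MMU whose roots KVM manages. */
		struct kvm_mmu *host_mmu = vcpu->arch.mmu;

		/* Host-side management targets the host view... */
		kvm_mmu_free_roots(vcpu, host_mmu, KVM_MMU_ROOT_CURRENT);

		/* ...while guest-state loads target the guest view. */
		load_pdptrs(vcpu, guest_mmu, kvm_read_cr3(vcpu));
	}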
On Wed, Dec 08, 2021, Lai Jiangshan wrote:
> 
> On 2021/12/8 08:15, Sean Christopherson wrote:
> > > 
> > > The commit 21823fbda552 ("KVM: x86: Invalidate all PGDs for the current
> > > PCID on MOV CR3 w/ flush") skips kvm_mmu_free_roots() after
> > > load_pdptrs() when rewriting CR3 with the same value.
> > 
> > This isn't accurate; prior to that commit, KVM wasn't guaranteed to do
> > kvm_mmu_free_roots() if it got a hit on the current CR3 or if a previous
> > CR3 in the cache matched the new CR3 (the "cache" has done some odd things
> > in the past).
> > 
> > So I think this particular flavor would be:
> > 
> >   Fixes: 7c390d350f8b ("kvm: x86: Add fast CR3 switch code path")
> 
> If the guest is 32-bit, fast_cr3_switch() always returns false, and
> kvm_mmu_free_roots() is always called, and no CR3 goes into prev_root.

Oh, duh, PAE paging.

> > > +		/*
> > > +		 * Ensure the dirty PDPTEs are loaded for VMX with EPT
> > > +		 * enabled, or pae_root is reconstructed for shadow paging.
> > > +		 */
> > > +		if (tdp_enabled)
> > > +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> > > +		else
> > > +			kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
> > 
> > Shouldn't matter since it's legacy shadow paging, but @mmu should be used
> > instead of vcpu->arch.mmu.
> 
> @mmu is the "guest mmu" (vcpu->arch.walk_mmu), which is used to walk the
> guest page tables, including loading the PDPTRs.
> 
> vcpu->arch.mmu is the MMU the host constructs for shadow or TDP paging,
> and is used for host-side management, including kvm_mmu_free_roots().
> 
> Even though they are the same pointer for !tdp now, their meanings are
> different.  I prefer to distinguish them even before struct kvm_mmu is
> split into separate types for the guest mmu (vcpu->arch.walk_mmu) and the
> host-constructed mmu (vcpu->arch.mmu).

I see where you're coming from.  I was viewing it from the perspective of,
"they're the same thing for shadow paging, why reread mmu?".  I'm ok with
explicitly referencing arch.mmu, but maybe add a comment?
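One possible shape for such a comment (illustrative wording, not something
posted in the thread):

		else
			/*
			 * Use vcpu->arch.mmu rather than @mmu: the roots
			 * being freed belong to the host-constructed shadow
			 * MMU, even though the two pointers currently alias
			 * for !tdp_enabled.
			 */
			kvm_mmu_free_roots(vcpu, vcpu->arch.mmu,
					   KVM_MMU_ROOT_CURRENT);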
On 12/8/21 01:15, Sean Christopherson wrote:
>> @@ -832,8 +832,14 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
>>   	if (memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs))) {
>>   		memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
>>   		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
>> -		/* Ensure the dirty PDPTEs to be loaded. */
>> -		kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
>> +		/*
>> +		 * Ensure the dirty PDPTEs are loaded for VMX with EPT
>> +		 * enabled, or pae_root is reconstructed for shadow paging.
>> +		 */
>> +		if (tdp_enabled)
>> +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
>> +		else
>> +			kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
> 
> Shouldn't matter since it's legacy shadow paging, but @mmu should be used
> instead of vcpu->arch.mmu.

In kvm/next there's actually no mmu parameter to load_pdptrs, so it's okay
to keep vcpu->arch.mmu.

> To avoid a dependency on the previous patch, I think it makes sense to
> have this be:
> 
> 	if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
> 		kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);
> 
> before the memcpy().
> 
> Then we can decide independently whether skipping KVM_REQ_LOAD_MMU_PGD
> when the PDPTRs are unchanged with respect to the MMU is safe.

Do you disagree that there's already an invariant that the PDPTRs can only
be dirty if KVM_REQ_LOAD_MMU_PGD is pending, and that therefore a previous
change to the PDPTRs would have triggered KVM_REQ_LOAD_MMU_PGD?

This is opposed to the guest TLB flush due to MOV CR3; that one has to be
done unconditionally for PAE paging, and it is handled separately within
kvm_set_cr3.

Paolo
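Paolo's invariant could be spelled out as an assertion, e.g. wherever the
cached PDPTRs are consumed; a hypothetical sanity check, not code from the
thread:

	/*
	 * Hypothetical check of the invariant: dirty cached PDPTRs imply
	 * a pending KVM_REQ_LOAD_MMU_PGD, so no extra root-freeing would
	 * be needed when only the PDPTRs changed.
	 */
	if (kvm_register_is_dirty(vcpu, VCPU_EXREG_PDPTR))
		WARN_ON_ONCE(!kvm_test_request(KVM_REQ_LOAD_MMU_PGD, vcpu));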
On Thu, Dec 09, 2021, Paolo Bonzini wrote:
> On 12/8/21 01:15, Sean Christopherson wrote:
> > > @@ -832,8 +832,14 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
> > >   	if (memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs))) {
> > >   		memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
> > >   		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> > > -		/* Ensure the dirty PDPTEs to be loaded. */
> > > -		kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> > > +		/*
> > > +		 * Ensure the dirty PDPTEs are loaded for VMX with EPT
> > > +		 * enabled, or pae_root is reconstructed for shadow paging.
> > > +		 */
> > > +		if (tdp_enabled)
> > > +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> > > +		else
> > > +			kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
> > Shouldn't matter since it's legacy shadow paging, but @mmu should be used
> > instead of vcpu->arch.mmu.
> 
> In kvm/next there's actually no mmu parameter to load_pdptrs, so it's okay
> to keep vcpu->arch.mmu.
> 
> > To avoid a dependency on the previous patch, I think it makes sense to
> > have this be:
> > 
> > 	if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
> > 		kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);
> > 
> > before the memcpy().
> > 
> > Then we can decide independently whether skipping KVM_REQ_LOAD_MMU_PGD
> > when the PDPTRs are unchanged with respect to the MMU is safe.
> 
> Do you disagree that there's already an invariant that the PDPTRs can only
> be dirty if KVM_REQ_LOAD_MMU_PGD is pending, and that therefore a previous
> change to the PDPTRs would have triggered KVM_REQ_LOAD_MMU_PGD?

What I think is moot, because commit 24cd19a28cb7 ("KVM: X86: Update mmu->pdptrs
only when it is changed") breaks nested VMs with EPT in L0 and PAE shadow paging
in L2.  Reproducing is trivial, just disable EPT in L1 and run a VM.  I haven't
investigated how it breaks things, because why it's broken is secondary for me.

My primary concern is that we would even consider optimizing the PDPTR logic
without a mountain of evidence that any patch is correct for all scenarios.
We had to add an entire ioctl() just to get PDPTRs functional.  This apparently
wasn't validated against a simple use case, let alone against things like
migration with nested VMs, multiple L2s, etc...
On Fri, Dec 10, 2021, Sean Christopherson wrote:
> On Thu, Dec 09, 2021, Paolo Bonzini wrote:
> > On 12/8/21 01:15, Sean Christopherson wrote:
> > > > @@ -832,8 +832,14 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
> > > >   	if (memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs))) {
> > > >   		memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
> > > >   		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> > > > -		/* Ensure the dirty PDPTEs to be loaded. */
> > > > -		kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> > > > +		/*
> > > > +		 * Ensure the dirty PDPTEs are loaded for VMX with EPT
> > > > +		 * enabled, or pae_root is reconstructed for shadow paging.
> > > > +		 */
> > > > +		if (tdp_enabled)
> > > > +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> > > > +		else
> > > > +			kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
> > > Shouldn't matter since it's legacy shadow paging, but @mmu should be used
> > > instead of vcpu->arch.mmu.
> > 
> > In kvm/next there's actually no mmu parameter to load_pdptrs, so it's okay
> > to keep vcpu->arch.mmu.
> > 
> > > To avoid a dependency on the previous patch, I think it makes sense to
> > > have this be:
> > > 
> > > 	if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
> > > 		kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);
> > > 
> > > before the memcpy().
> > > 
> > > Then we can decide independently whether skipping KVM_REQ_LOAD_MMU_PGD
> > > when the PDPTRs are unchanged with respect to the MMU is safe.
> > 
> > Do you disagree that there's already an invariant that the PDPTRs can only
> > be dirty if KVM_REQ_LOAD_MMU_PGD is pending, and that therefore a previous
> > change to the PDPTRs would have triggered KVM_REQ_LOAD_MMU_PGD?
> 
> What I think is moot, because commit 24cd19a28cb7 ("KVM: X86: Update mmu->pdptrs
> only when it is changed") breaks nested VMs with EPT in L0 and PAE shadow paging
> in L2.  Reproducing is trivial, just disable EPT in L1 and run a VM.  I haven't

Doh, s/L2/L1

> investigated how it breaks things, because why it's broken is secondary for me.
> 
> My primary concern is that we would even consider optimizing the PDPTR logic
> without a mountain of evidence that any patch is correct for all scenarios.
> We had to add an entire ioctl() just to get PDPTRs functional.  This apparently
> wasn't validated against a simple use case, let alone against things like
> migration with nested VMs, multiple L2s, etc...
On Fri, 2021-12-10 at 21:07 +0000, Sean Christopherson wrote:
> On Thu, Dec 09, 2021, Paolo Bonzini wrote:
> > On 12/8/21 01:15, Sean Christopherson wrote:
> > > > @@ -832,8 +832,14 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
> > > >   	if (memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs))) {
> > > >   		memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
> > > >   		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
> > > > -		/* Ensure the dirty PDPTEs to be loaded. */
> > > > -		kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> > > > +		/*
> > > > +		 * Ensure the dirty PDPTEs are loaded for VMX with EPT
> > > > +		 * enabled, or pae_root is reconstructed for shadow paging.
> > > > +		 */
> > > > +		if (tdp_enabled)
> > > > +			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
> > > > +		else
> > > > +			kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
> > > Shouldn't matter since it's legacy shadow paging, but @mmu should be used
> > > instead of vcpu->arch.mmu.
> > 
> > In kvm/next there's actually no mmu parameter to load_pdptrs, so it's okay
> > to keep vcpu->arch.mmu.
> > 
> > > To avoid a dependency on the previous patch, I think it makes sense to
> > > have this be:
> > > 
> > > 	if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
> > > 		kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);
> > > 
> > > before the memcpy().
> > > 
> > > Then we can decide independently whether skipping KVM_REQ_LOAD_MMU_PGD
> > > when the PDPTRs are unchanged with respect to the MMU is safe.
> > 
> > Do you disagree that there's already an invariant that the PDPTRs can only
> > be dirty if KVM_REQ_LOAD_MMU_PGD is pending, and that therefore a previous
> > change to the PDPTRs would have triggered KVM_REQ_LOAD_MMU_PGD?
> 
> What I think is moot, because commit 24cd19a28cb7 ("KVM: X86: Update mmu->pdptrs
> only when it is changed") breaks nested VMs with EPT in L0 and PAE shadow paging
> in L2.  Reproducing is trivial, just disable EPT in L1 and run a VM.  I haven't
> investigated how it breaks things, because why it's broken is secondary for me.
> 
> My primary concern is that we would even consider optimizing the PDPTR logic
> without a mountain of evidence that any patch is correct for all scenarios.
> We had to add an entire ioctl() just to get PDPTRs functional.  This apparently
> wasn't validated against a simple use case, let alone against things like
> migration with nested VMs, multiple L2s, etc...

I did validate *SREGS2* against all the cases I could (like migration,
EPT/NPT disabled, etc.).  I even started testing SMM to see how it affects
the PDPTRs, and patched seabios to use PAE paging.  I still could have
missed something.

But note that qemu still doesn't use that ioctl (the patch is stuck in
review).

Best regards,
	Maxim Levitsky
On 12/11/21 07:56, Maxim Levitsky wrote:
>> This apparently wasn't validated against a simple use case, let
>> alone against things like migration with nested VMs, multiple L2s,
>> etc...
> 
> I did validate *SREGS2* against all the cases I could (like migration,
> EPT/NPT disabled, etc.).  I even started testing SMM to see how it
> affects the PDPTRs, and patched seabios to use PAE paging.  I still
> could have missed something.

Don't worry, I think Sean was talking about patch 16 and specifically
digging at me (and I deserved it completely).

Paolo
On Sat, Dec 11, 2021, Paolo Bonzini wrote:
> On 12/11/21 07:56, Maxim Levitsky wrote:
> > > This apparently wasn't validated against a simple use case, let
> > > alone against things like migration with nested VMs, multiple L2s,
> > > etc...
> > 
> > I did validate *SREGS2* against all the cases I could (like migration,
> > EPT/NPT disabled, etc.).  I even started testing SMM to see how it
> > affects the PDPTRs, and patched seabios to use PAE paging.  I still
> > could have missed something.
> 
> Don't worry, I think Sean was talking about patch 16 and specifically
> digging at me (and I deserved it completely).

Yes, patch 16.  My goal wasn't to dig at anyone, I just wanted to
dramatically emphasize how ridiculously fragile and complex the PDPTR crud
is due to the number of edge cases.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0176eaa86a35..cfba337e46ab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -832,8 +832,14 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
 	if (memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs))) {
 		memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
 		kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
-		/* Ensure the dirty PDPTEs to be loaded. */
-		kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
+		/*
+		 * Ensure the dirty PDPTEs are loaded for VMX with EPT
+		 * enabled, or pae_root is reconstructed for shadow paging.
+		 */
+		if (tdp_enabled)
+			kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
+		else
+			kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT);
 	}
 	vcpu->arch.pdptrs_from_userspace = false;