Message ID | 20200820230545.2411347-1-pshier@google.com (mailing list archive)
---|---
State | New, archived
Series | [v2] KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected
On Thu, Aug 20, 2020 at 04:05:45PM -0700, Peter Shier wrote:
> When L2 uses PAE, L0 intercepts of L2 writes to CR0/CR3/CR4 call
> load_pdptrs to read the possibly updated PDPTEs from the guest
> physical address referenced by CR3. It loads them into
> vcpu->arch.walk_mmu->pdptrs and sets VCPU_EXREG_PDPTR in
> vcpu->arch.regs_dirty.
>
> At the subsequent assumed reentry into L2, the mmu will call
> vmx_load_mmu_pgd which calls ept_load_pdptrs. ept_load_pdptrs sees
> VCPU_EXREG_PDPTR set in vcpu->arch.regs_dirty and loads
> VMCS02.GUEST_PDPTRn from vcpu->arch.walk_mmu->pdptrs[]. This all works
> if the L2 CRn write intercept always resumes L2.
>
> The resume path calls vmx_check_nested_events which checks for
> exceptions, MTF, and expired VMX preemption timers. If
> vmx_check_nested_events finds any of these conditions pending it will
> reflect the corresponding exit into L1. Live migration at this point
> would also cause a missed immediate reentry into L2.
>
> After L1 exits, vmx_vcpu_run calls vmx_register_cache_reset which
> clears VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty. When L2 next
> resumes, ept_load_pdptrs finds VCPU_EXREG_PDPTR clear in
> vcpu->arch.regs_dirty and does not load VMCS02.GUEST_PDPTRn from
> vcpu->arch.walk_mmu->pdptrs[]. prepare_vmcs02 will then load
> VMCS02.GUEST_PDPTRn from vmcs12->pdptr0/1/2/3 which contain the stale
> values stored at last L2 exit. A repro of this bug showed L2 entering
> triple fault immediately due to the bad VMCS02.GUEST_PDPTRn values.
>
> When L2 is in PAE paging mode add a call to ept_load_pdptrs before
> leaving L2. This will update VMCS02.GUEST_PDPTRn if they are dirty in
> vcpu->arch.walk_mmu->pdptrs[].
>
> Tested:
> kvm-unit-tests with new directed test: vmx_mtf_pdpte_test.
> Verified that test fails without the fix.
>
> Also ran Google internal VMM with an Ubuntu 16.04 4.4.0-83 guest running a
> custom hypervisor with a 32-bit Windows XP L2 guest using PAE. Prior to fix
> would repro readily. Ran 14 simultaneous L2s for 140 iterations with no
> failures.
>
> Signed-off-by: Peter Shier <pshier@google.com>
> Reviewed-by: Jim Mattson <jmattson@google.com>
> ---

Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
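For readers tracing the paths named in the commit message: the dirty-bit gate that gets skipped lives in ept_load_pdptrs. A simplified sketch of its logic, reconstructed from the kernel tree of that era (an illustration, not the authoritative source):

/*
 * Sketch: propagate the cached PDPTEs into the current VMCS only when
 * load_pdptrs() has marked them dirty. If VCPU_EXREG_PDPTR was cleared
 * by vmx_register_cache_reset() in the meantime, this silently does
 * nothing, which is exactly the failure mode described above.
 */
static void ept_load_pdptrs(struct kvm_vcpu *vcpu)
{
	struct kvm_mmu *mmu = vcpu->arch.walk_mmu;

	if (!kvm_register_is_dirty(vcpu, VCPU_EXREG_PDPTR))
		return;

	if (is_pae_paging(vcpu)) {
		vmcs_write64(GUEST_PDPTR0, mmu->pdptrs[0]);
		vmcs_write64(GUEST_PDPTR1, mmu->pdptrs[1]);
		vmcs_write64(GUEST_PDPTR2, mmu->pdptrs[2]);
		vmcs_write64(GUEST_PDPTR3, mmu->pdptrs[3]);
	}
}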
On Fri, Aug 21, 2020 at 8:25 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Thu, Aug 20, 2020 at 04:05:45PM -0700, Peter Shier wrote:
> > When L2 uses PAE, L0 intercepts of L2 writes to CR0/CR3/CR4 call
> > load_pdptrs to read the possibly updated PDPTEs from the guest
> > physical address referenced by CR3. [...]
>
> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>

Ping. Thx
On 21/08/20 01:05, Peter Shier wrote:
> After L1 exits, vmx_vcpu_run calls vmx_register_cache_reset which
> clears VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty. When L2 next
> resumes, ept_load_pdptrs finds VCPU_EXREG_PDPTR clear in
> vcpu->arch.regs_dirty and does not load VMCS02.GUEST_PDPTRn from
> vcpu->arch.walk_mmu->pdptrs[]. prepare_vmcs02 will then load
> VMCS02.GUEST_PDPTRn from vmcs12->pdptr0/1/2/3 which contain the stale
> values stored at last L2 exit. A repro of this bug showed L2 entering
> triple fault immediately due to the bad VMCS02.GUEST_PDPTRn values.
>
> When L2 is in PAE paging mode add a call to ept_load_pdptrs before
> leaving L2. This will update VMCS02.GUEST_PDPTRn if they are dirty in
> vcpu->arch.walk_mmu->pdptrs[].

Queued with an improved comment:

 	/*
-	 * Ensure that the VMCS02 PDPTR fields are up-to-date before switching
-	 * to L1.
+	 * VCPU_EXREG_PDPTR will be clobbered in arch/x86/kvm/vmx/vmx.h between
+	 * now and the new vmentry.  Ensure that the VMCS02 PDPTR fields are
+	 * up-to-date before switching to L1.
 	 */

I am currently on leave so I am going through the patches and queuing
them, but I will only push kvm/next and kvm/queue next week.
kvm/master patches will be sent to Linus for the next -rc though.

Paolo
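The clobbering Paolo's improved comment points at is the register-cache reset that vmx_vcpu_run performs after every exit. A trimmed sketch of that helper in arch/x86/kvm/vmx/vmx.h (simplified; the real avail mask carries additional bits):

static inline void vmx_register_cache_reset(struct kvm_vcpu *vcpu)
{
	/* Mark VMCS-backed registers stale so they are re-read on demand. */
	vcpu->arch.regs_avail = ~((1 << VCPU_EXREG_RFLAGS)
				  | (1 << VCPU_EXREG_PDPTR)
				  | (1 << VCPU_EXREG_SEGMENTS)
				  | (1 << VCPU_EXREG_CR3));
	/*
	 * Drop all dirty bits, including VCPU_EXREG_PDPTR. PDPTEs cached in
	 * walk_mmu->pdptrs[] but not yet written to the vmcs02 are forgotten
	 * here, hence the new call site in nested_vmx_vmexit().
	 */
	vcpu->arch.regs_dirty = 0;
}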
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 23b58c28a1c9..4d46025213e9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4404,6 +4404,13 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
 		kvm_vcpu_flush_tlb_current(vcpu);
 
+	/*
+	 * Ensure that the VMCS02 PDPTR fields are up-to-date before switching
+	 * to L1.
+	 */
+	if (enable_ept && is_pae_paging(vcpu))
+		vmx_ept_load_pdptrs(vcpu);
+
 	leave_guest_mode(vcpu);
 
 	if (nested_cpu_has_preemption_timer(vmcs12))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 46ba2e03a892..19a599bebd5c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2971,7 +2971,7 @@ static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
 		vpid_sync_context(to_vmx(vcpu)->vpid);
 }
 
-static void ept_load_pdptrs(struct kvm_vcpu *vcpu)
+void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
 
@@ -3114,7 +3114,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
 			guest_cr3 = vcpu->arch.cr3;
 		else /* vmcs01.GUEST_CR3 is already up-to-date. */
 			update_guest_cr3 = false;
-		ept_load_pdptrs(vcpu);
+		vmx_ept_load_pdptrs(vcpu);
 	} else {
 		guest_cr3 = pgd;
 	}
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 26175a4759fa..a2f82127c170 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -356,7 +356,8 @@ void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);
 int vmx_find_msr_index(struct vmx_msrs *m, u32 msr);
 int vmx_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
 			      struct x86_exception *e);
+void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu);
 
 #define POSTED_INTR_ON  0
 #define POSTED_INTR_SN  1
--
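A note on the guard in the nested.c hunk: vmx_ept_load_pdptrs is called only when enable_ept && is_pae_paging(vcpu), i.e. only when the GUEST_PDPTRn fields are architecturally meaningful. For reference, is_pae_paging is a small helper in arch/x86/kvm/x86.h; it reads roughly as follows (sketch from the tree of that era):

static inline bool is_pae_paging(struct kvm_vcpu *vcpu)
{
	/* PAE paging: paging enabled with CR4.PAE set, but not 64-bit mode. */
	return !is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu);
}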