diff mbox series

[1/2] KVM: nVMX: Stash L1's CR3 in vmcs01.GUEST_CR3 on nested entry w/o EPT

Message ID 20190520201029.7126-2-sean.j.christopherson@intel.com (mailing list archive)
State New, archived
Headers show
Series KVM: nVMX: Alternative no-EPT GUEST_CR3 fix | expand

Commit Message

Sean Christopherson May 20, 2019, 8:10 p.m. UTC
KVM does not have 100% coverage of VMX consistency checks, i.e. some
checks that cause VM-Fail may only be detected by hardware during a
nested VM-Entry.  In such a case, KVM must restore L1's state to the
pre-VM-Enter state as L2's state has already been loaded into KVM's
software model.

L1's CR3 and PDPTRs in particular are loaded from vmcs01.GUEST_*.  But
when EPT is disabled, the associated fields hold KVM's shadow values,
not L1's "real" values.  Fortunately, when EPT is disabled the PDPTRs
come from memory, i.e. are not cached in the VMCS.  Which leaves CR3
as the sole anomaly.  Handle CR3 by overwriting vmcs01.GUEST_CR3 with
L1's CR3 during the nested VM-Entry when EPT is disabled *and* nested
early checks are disabled, so that nested_vmx_restore_host_state() will
naturally restore the correct vcpu->arch.cr3 from vmcs01.GUEST_CR3.

Note, these shenanigans work because nested_vmx_restore_host_state()
does a full kvm_mmu_reset_context(), i.e. unloads the current MMU,
which guarantees vmcs01.GUEST_CR3 will be rewritten with a new shadow
CR3 prior to re-entering L1.  Writing vmcs01.GUEST_CR3 is done if and
only if nested early checks are disabled as "late" VM-Fail should never
happen in that case (KVM WARNs), and the conditional write avoids the
need to restore the correct GUEST_CR3 when nested_vmx_check_vmentry_hw()
fails.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: bd18bffca353 ("KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Paolo Bonzini June 6, 2019, 12:24 p.m. UTC | #1
On 20/05/19 22:10, Sean Christopherson wrote:
> KVM does not have 100% coverage of VMX consistency checks, i.e. some
> checks that cause VM-Fail may only be detected by hardware during a
> nested VM-Entry.  In such a case, KVM must restore L1's state to the
> pre-VM-Enter state as L2's state has already been loaded into KVM's
> software model.
> 
> L1's CR3 and PDPTRs in particular are loaded from vmcs01.GUEST_*.  But
> when EPT is disabled, the associated fields hold KVM's shadow values,
> not L1's "real" values.  Fortunately, when EPT is disabled the PDPTRs
> come from memory, i.e. are not cached in the VMCS.  Which leaves CR3
> as the sole anomaly.  Handle CR3 by overwriting vmcs01.GUEST_CR3 with
> L1's CR3 during the nested VM-Entry when EPT is disabled *and* nested
> early checks are disabled, so that nested_vmx_restore_host_state() will
> naturally restore the correct vcpu->arch.cr3 from vmcs01.GUEST_CR3.
> 
> Note, these shenanigans work because nested_vmx_restore_host_state()
> does a full kvm_mmu_reset_context(), i.e. unloads the current MMU,
> which guarantees vmcs01.GUEST_CR3 will be rewritten with a new shadow
> CR3 prior to re-entering L1.  Writing vmcs01.GUEST_CR3 is done if and
> only if nested early checks are disabled as "late" VM-Fail should never
> happen in that case (KVM WARNs), and the conditional write avoids the
> need to restore the correct GUEST_CR3 when nested_vmx_check_vmentry_hw()
> fails.
> 
> Reported-by: Paolo Bonzini <pbonzini@redhat.com>
> Fixes: bd18bffca353 ("KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 1032f068f0b9..92117092f6e9 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2963,6 +2963,8 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>  	if (kvm_mpx_supported() &&
>  		!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
>  		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);

This hunk needs a comment that says basically the same things that are
in the commit message.

	/* Temporarily overwrite vmcs01.GUEST_CR3 with L1's CR3 during
	 * the nested VM-Entry when EPT is disabled *and* nested early
	 * checks are disabled, because nested_vmx_restore_host_state()
	 * wants to restore the correct vcpu->arch.cr3 from
	 * vmcs01.GUEST_CR3.  On nested vmexit, kvm_mmu_reset_context()
	 * will overwrite GUEST_CR3 with the shadow CR3 prior to
	 * re-entering L1.  This is not needed when nested early checks
	 * are enabled, because it doesn't reload the MMU until
	 * after the checks have succeeded.
	 */

And also, the restore of the correct vcpu->arch.cr3 from
vmcs01.GUEST_CR3 can be moved to this patch otherwise the comment would
not be accurate.

But, as I said in the review of patch 2, I'm a bit lost as to how
kvm_mmu_reset_context() is enough to reload the shadow CR3 into the vmcs01.

Paolo

> +	if (!enable_ept && !nested_early_check)
> +		vmcs_writel(GUEST_CR3, vcpu->arch.cr3);
>  
>  	vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02);
>  
> @@ -3794,7 +3796,8 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
>  	 * VMFail, like everything else we just need to ensure our
>  	 * software model is up-to-date.
>  	 */
> -	ept_save_pdptrs(vcpu);
> +	if (enable_ept)
> +		ept_save_pdptrs(vcpu);
>  
>  	kvm_mmu_reset_context(vcpu);
>  
>
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1032f068f0b9..92117092f6e9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2963,6 +2963,8 @@  int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
 	if (kvm_mpx_supported() &&
 		!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
 		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
+	if (!enable_ept && !nested_early_check)
+		vmcs_writel(GUEST_CR3, vcpu->arch.cr3);
 
 	vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02);
 
@@ -3794,7 +3796,8 @@  static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
 	 * VMFail, like everything else we just need to ensure our
 	 * software model is up-to-date.
 	 */
-	ept_save_pdptrs(vcpu);
+	if (enable_ept)
+		ept_save_pdptrs(vcpu);
 
 	kvm_mmu_reset_context(vcpu);