KVM: nVMX: Don't leak L1 MMIO regions to L2

Message ID	20191004175203.145954-1-jmattson@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=SJcU=X5=vger.kernel.org=kvm-owner@kernel.org> Date: Fri, 4 Oct 2019 10:52:03 -0700 Message-Id: <20191004175203.145954-1-jmattson@google.com> Mime-Version: 1.0 Subject: [PATCH] KVM: nVMX: Don't leak L1 MMIO regions to L2 From: Jim Mattson <jmattson@google.com> To: kvm@vger.kernel.org Cc: Jim Mattson <jmattson@google.com>, Dan Cross <dcross@google.com>, Marc Orr <marcorr@google.com>, Peter Shier <pshier@google.com>, Liran Alon <liran.alon@oracle.com> Content-Type: text/plain; charset="UTF-8" Sender: kvm-owner@vger.kernel.org Precedence: bulk
Series	KVM: nVMX: Don't leak L1 MMIO regions to L2 \| expand KVM: nVMX: Don't leak L1 MMIO regions to L2

Message ID

20191004175203.145954-1-jmattson@google.com (mailing list archive)

State

New, archived

Headers

Date: Fri,  4 Oct 2019 10:52:03 -0700
Message-Id: <20191004175203.145954-1-jmattson@google.com>
Mime-Version: 1.0
Subject: [PATCH] KVM: nVMX: Don't leak L1 MMIO regions to L2
From: Jim Mattson <jmattson@google.com>
To: kvm@vger.kernel.org
Cc: Jim Mattson <jmattson@google.com>, Dan Cross <dcross@google.com>,
        Marc Orr <marcorr@google.com>, Peter Shier <pshier@google.com>,
        Liran Alon <liran.alon@oracle.com>
Content-Type: text/plain; charset="UTF-8"
Sender: kvm-owner@vger.kernel.org
Precedence: bulk

Series

KVM: nVMX: Don't leak L1 MMIO regions to L2 | expand

Commit Message

Jim Mattson Oct. 4, 2019, 5:52 p.m. UTC

If the "virtualize APIC accesses" VM-execution control is set in the
VMCS, the APIC virtualization hardware is triggered when a page walk
in VMX non-root mode terminates at a PTE wherein the address of the 4k
page frame matches the APIC-access address specified in the VMCS. On
hardware, the APIC-access address may be any valid 4k-aligned physical
address.

KVM's nVMX implementation enforces the additional constraint that the
APIC-access address specified in the vmcs12 must be backed by
cacheable memory in L1. If not, L0 will simply clear the "virtualize
APIC accesses" VM-execution control in the vmcs02.

The problem with this approach is that the L1 guest has arranged the
vmcs12 EPT tables--or shadow page tables, if the "enable EPT"
VM-execution control is clear in the vmcs12--so that the L2 guest
physical address(es)--or L2 guest linear address(es)--that reference
the L2 APIC map to the APIC-access address specified in the
vmcs12. Without the "virtualize APIC accesses" VM-execution control in
the vmcs02, the APIC accesses in the L2 guest will directly access the
APIC-access page in L1.

When there is no mapping whatsoever for the APIC-access address in L1,
the L2 VM just loses the intended APIC virtualization. However, when
the APIC-access address is mapped to an MMIO region in L1, the L2
guest gets direct access to the L1 MMIO device. For example, if the
APIC-access address specified in the vmcs12 is 0xfee00000, then L2
gets direct access to L1's APIC.

Since this vmcs12 configuration is something that KVM cannot
faithfully emulate, the appropriate response is to exit to userspace
with KVM_INTERNAL_ERROR_EMULATION.

Fixes: fe3ef05c7572 ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12")
Reported-by: Dan Cross <dcross@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Dan Cross <dcross@google.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
---
v1: Same code as the RFC.

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/vmx/nested.c       | 46 ++++++++++++++++++---------------
 arch/x86/kvm/x86.c              |  9 +++++--
 3 files changed, 33 insertions(+), 24 deletions(-)

Comments

Paolo Bonzini Oct. 7, 2019, 10:07 a.m. UTC | #1

As usual, nothing to say about the behavior, just about the code...

On 04/10/19 19:52, Jim Mattson wrote:
> + * Returns:
> + *   0 - success, i.e. proceed with actual VMEnter
> + *  -EFAULT - consistency check VMExit
> + *  -EINVAL - consistency check VMFail
> + *  -ENOTSUPP - kvm internal error
>   */

... the error codes do not mean much here.  Can you define an enum instead?

(I also thought about passing the exit reason, where bit 31 could be
used to distinguish VMX instruction failure from an entry failure
VMexit, which sounds cleaner if you just look at the prototype but
becomes messy fairly quickly because you have to pass back the exit
qualification too.  The from_vmentry argument could become u32
*p_exit_qual and be NULL if not called from VMentry, but it doesn't seem
worthwhile at all).

Thanks,

Paolo

>  int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>  {
> @@ -3045,6 +3044,7 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>  	bool evaluate_pending_interrupts;
>  	u32 exit_reason = EXIT_REASON_INVALID_STATE;
>  	u32 exit_qual;
> +	int r;
>  
>  	evaluate_pending_interrupts = exec_controls_get(vmx) &
>  		(CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING);
> @@ -3081,11 +3081,13 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>  	prepare_vmcs02_early(vmx, vmcs12);
>  
>  	if (from_vmentry) {
> -		nested_get_vmcs12_pages(vcpu);
> +		r = nested_get_vmcs12_pages(vcpu);
> +		if (unlikely(r))
> +			return r;
>  
>  		if (nested_vmx_check_vmentry_hw(vcpu)) {
>  			vmx_switch_vmcs(vcpu, &vmx->vmcs01);
> -			return -1;
> +			return -EINVAL;
>  		}
>  
>  		if (nested_vmx_check_guest_state(vcpu, vmcs12, &exit_qual))
> @@ -3165,14 +3167,14 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>  	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>  
>  	if (!from_vmentry)
> -		return 1;
> +		return -EFAULT;
>  
>  	load_vmcs12_host_state(vcpu, vmcs12);
>  	vmcs12->vm_exit_reason = exit_reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
>  	vmcs12->exit_qualification = exit_qual;
>  	if (enable_shadow_vmcs || vmx->nested.hv_evmcs)
>  		vmx->nested.need_vmcs12_to_shadow_sync = true;
> -	return 1;
> +	return -EFAULT;
>  }
>  
>  /*
> @@ -3246,11 +3248,13 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>  	vmx->nested.nested_run_pending = 1;
>  	ret = nested_vmx_enter_non_root_mode(vcpu, true);
>  	vmx->nested.nested_run_pending = !ret;
> -	if (ret > 0)
> -		return 1;
> -	else if (ret)
> +	if (ret == -EINVAL)
>  		return nested_vmx_failValid(vcpu,
>  			VMXERR_ENTRY_INVALID_CONTROL_FIELD);
> +	else if (ret == -ENOTSUPP)
> +		return 0;
> +	else if (ret)
> +		return 1;
>  
>  	/* Hide L1D cache contents from the nested guest.  */
>  	vmx->vcpu.arch.l1tf_flush_l1d = true;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e6b5cfe3c345..e8b04560f064 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7931,8 +7931,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	bool req_immediate_exit = false;
>  
>  	if (kvm_request_pending(vcpu)) {
> -		if (kvm_check_request(KVM_REQ_GET_VMCS12_PAGES, vcpu))
> -			kvm_x86_ops->get_vmcs12_pages(vcpu);
> +		if (kvm_check_request(KVM_REQ_GET_VMCS12_PAGES, vcpu)) {
> +			r = kvm_x86_ops->get_vmcs12_pages(vcpu);
> +			if (unlikely(r)) {
> +				r = 0;
> +				goto out;
> +			}
> +		}
>  		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
>  			kvm_mmu_unload(vcpu);
>  		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))

Sean Christopherson Oct. 7, 2019, 3:18 p.m. UTC | #2

On Mon, Oct 07, 2019 at 12:07:03PM +0200, Paolo Bonzini wrote:
> As usual, nothing to say about the behavior, just about the code...
> 
> On 04/10/19 19:52, Jim Mattson wrote:
> > + * Returns:
> > + *   0 - success, i.e. proceed with actual VMEnter
> > + *  -EFAULT - consistency check VMExit
> > + *  -EINVAL - consistency check VMFail
> > + *  -ENOTSUPP - kvm internal error
> >   */
> 
> ... the error codes do not mean much here.  Can you define an enum instead?

Agreed.  Liran also suggested an enum.

> (I also thought about passing the exit reason, where bit 31 could be
> used to distinguish VMX instruction failure from an entry failure
> VMexit, which sounds cleaner if you just look at the prototype but
> becomes messy fairly quickly because you have to pass back the exit
> qualification too.  The from_vmentry argument could become u32
> *p_exit_qual and be NULL if not called from VMentry, but it doesn't seem
> worthwhile at all).

Ya.  I also tried (and failed) to find a clever solution that didn't
require a multi-state return value.  As much as I dislike returning an
enum, it's the lesser of all evils.

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 50eb430b0ad8..cc4a9e90d5f8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1189,7 +1189,7 @@  struct kvm_x86_ops {
 	int (*set_nested_state)(struct kvm_vcpu *vcpu,
 				struct kvm_nested_state __user *user_kvm_nested_state,
 				struct kvm_nested_state *kvm_state);
-	void (*get_vmcs12_pages)(struct kvm_vcpu *vcpu);
+	int (*get_vmcs12_pages)(struct kvm_vcpu *vcpu);
 
 	int (*smi_allowed)(struct kvm_vcpu *vcpu);
 	int (*pre_enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 41abc62c9a8a..7d55292a3b4b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2917,7 +2917,7 @@  static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu)
 static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 						 struct vmcs12 *vmcs12);
 
-static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
+static int nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -2937,19 +2937,16 @@  static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
 			vmx->nested.apic_access_page = NULL;
 		}
 		page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->apic_access_addr);
-		/*
-		 * If translation failed, no matter: This feature asks
-		 * to exit when accessing the given address, and if it
-		 * can never be accessed, this feature won't do
-		 * anything anyway.
-		 */
 		if (!is_error_page(page)) {
 			vmx->nested.apic_access_page = page;
 			hpa = page_to_phys(vmx->nested.apic_access_page);
 			vmcs_write64(APIC_ACCESS_ADDR, hpa);
 		} else {
-			secondary_exec_controls_clearbit(vmx,
-				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
+			vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+			vcpu->run->internal.suberror =
+				KVM_INTERNAL_ERROR_EMULATION;
+			vcpu->run->internal.ndata = 0;
+			return -ENOTSUPP;
 		}
 	}
 
@@ -2994,6 +2991,7 @@  static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
 		exec_controls_setbit(vmx, CPU_BASED_USE_MSR_BITMAPS);
 	else
 		exec_controls_clearbit(vmx, CPU_BASED_USE_MSR_BITMAPS);
+	return 0;
 }
 
 /*
@@ -3032,11 +3030,12 @@  static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 /*
  * If from_vmentry is false, this is being called from state restore (either RSM
  * or KVM_SET_NESTED_STATE).  Otherwise it's called from vmlaunch/vmresume.
-+ *
-+ * Returns:
-+ *   0 - success, i.e. proceed with actual VMEnter
-+ *   1 - consistency check VMExit
-+ *  -1 - consistency check VMFail
+ *
+ * Returns:
+ *   0 - success, i.e. proceed with actual VMEnter
+ *  -EFAULT - consistency check VMExit
+ *  -EINVAL - consistency check VMFail
+ *  -ENOTSUPP - kvm internal error
  */
 int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
 {
@@ -3045,6 +3044,7 @@  int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
 	bool evaluate_pending_interrupts;
 	u32 exit_reason = EXIT_REASON_INVALID_STATE;
 	u32 exit_qual;
+	int r;
 
 	evaluate_pending_interrupts = exec_controls_get(vmx) &
 		(CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_VIRTUAL_NMI_PENDING);
@@ -3081,11 +3081,13 @@  int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
 	prepare_vmcs02_early(vmx, vmcs12);
 
 	if (from_vmentry) {
-		nested_get_vmcs12_pages(vcpu);
+		r = nested_get_vmcs12_pages(vcpu);
+		if (unlikely(r))
+			return r;
 
 		if (nested_vmx_check_vmentry_hw(vcpu)) {
 			vmx_switch_vmcs(vcpu, &vmx->vmcs01);
-			return -1;
+			return -EINVAL;
 		}
 
 		if (nested_vmx_check_guest_state(vcpu, vmcs12, &exit_qual))
@@ -3165,14 +3167,14 @@  int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
 	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
 
 	if (!from_vmentry)
-		return 1;
+		return -EFAULT;
 
 	load_vmcs12_host_state(vcpu, vmcs12);
 	vmcs12->vm_exit_reason = exit_reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
 	vmcs12->exit_qualification = exit_qual;
 	if (enable_shadow_vmcs || vmx->nested.hv_evmcs)
 		vmx->nested.need_vmcs12_to_shadow_sync = true;
-	return 1;
+	return -EFAULT;
 }
 
 /*
@@ -3246,11 +3248,13 @@  static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	vmx->nested.nested_run_pending = 1;
 	ret = nested_vmx_enter_non_root_mode(vcpu, true);
 	vmx->nested.nested_run_pending = !ret;
-	if (ret > 0)
-		return 1;
-	else if (ret)
+	if (ret == -EINVAL)
 		return nested_vmx_failValid(vcpu,
 			VMXERR_ENTRY_INVALID_CONTROL_FIELD);
+	else if (ret == -ENOTSUPP)
+		return 0;
+	else if (ret)
+		return 1;
 
 	/* Hide L1D cache contents from the nested guest.  */
 	vmx->vcpu.arch.l1tf_flush_l1d = true;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e6b5cfe3c345..e8b04560f064 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7931,8 +7931,13 @@  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	bool req_immediate_exit = false;
 
 	if (kvm_request_pending(vcpu)) {
-		if (kvm_check_request(KVM_REQ_GET_VMCS12_PAGES, vcpu))
-			kvm_x86_ops->get_vmcs12_pages(vcpu);
+		if (kvm_check_request(KVM_REQ_GET_VMCS12_PAGES, vcpu)) {
+			r = kvm_x86_ops->get_vmcs12_pages(vcpu);
+			if (unlikely(r)) {
+				r = 0;
+				goto out;
+			}
+		}
 		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
 			kvm_mmu_unload(vcpu);
 		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))

KVM: nVMX: Don't leak L1 MMIO regions to L2

Commit Message

Comments

Patch