
[11/11] nEPT: Provide the correct exit qualification upon EPT

Message ID B3957F502244574B99B84CAB927E0E61EC344E@SHSMSX101.ccr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Xu, Xinhao April 27, 2013, 6:42 a.m. UTC
From 2df72c1e8e3b167a6008ab11e2a68d734c68e425 Mon Sep 17 00:00:00 2001
From: Xinhao Xu <xinhao.xu@intel.com>
Date: Sat, 27 Apr 2013 05:45:49 +0800
Subject: [PATCH] nEPT: Check EPT misconfiguration while walking addr & move pte
 check code to vmx.c

I add code to detect EPT misconfiguration and inject it into L1. Now
L1 can correctly go to the ept_misconfig handler (instead of wrongly
going to fast_page_fault); it will try to handle an MMIO page fault
and, if that fails, it is a real EPT misconfiguration.

For scalability, Xiantao suggested moving vendor-specific code out of
the common code. To do this, I add a new op, check_tdp_pte, to the
kvm_mmu struct to provide an interface for checking faults while
walking an address. Ideally all fault detection would be done there,
but so far only EPT misconfiguration is checked. More patches will be
added in the future.
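
The misconfiguration conditions the new check_tdp_pte op tests (invalid RWX permission combinations, reserved EPT memory types, reserved bits) can be illustrated with a minimal user-space C sketch; the constants mirror the patch below, but the helper names here are illustrative, not kernel API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits [s:e] set, as in the kernel's rsvd_bits() helper. */
uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/* Write-only (010b) and write/execute-only (110b) are always
 * misconfigurations; execute-only (100b) is one too unless the
 * CPU advertises execute-only EPT support. */
bool ept_rwx_misconfig(uint64_t pte, bool exec_only_supported)
{
	uint8_t rwx = pte & 7;

	if (rwx == 0x2 || rwx == 0x6)
		return true;
	if (rwx == 0x4 && !exec_only_supported)
		return true;
	return false;
}

/* Memory type (bits 5:3) of a leaf entry: types 2, 3 and 7
 * are reserved. Level 1 entries are always leaves; a level 2
 * entry is a leaf when bit 7 (large page) is set. */
bool ept_memtype_misconfig(uint64_t pte, int level)
{
	bool leaf = level == 1 || (level == 2 && (pte & (1ULL << 7)));
	uint64_t mt = (pte >> 3) & 7;

	return leaf && (mt == 2 || mt == 3 || mt == 7);
}
```

An entry is misconfigured if any predicate holds; the kernel version additionally folds in reserved physical-address bits above cpuid_maxphyaddr().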

Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
---
 arch/x86/include/asm/kvm_emulate.h |    2 +
 arch/x86/include/asm/kvm_host.h    |    2 +
 arch/x86/kvm/cpuid.c               |    1 +
 arch/x86/kvm/mmu.c                 |    5 --
 arch/x86/kvm/mmu.h                 |    5 ++
 arch/x86/kvm/paging_tmpl.h         |   25 ++++++++---
 arch/x86/kvm/vmx.c                 |   79 +++++++++++++++++++++++++++++++++++-
 7 files changed, 106 insertions(+), 13 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jan Kiszka April 28, 2013, 9:35 a.m. UTC | #1
On 2013-04-27 08:42, Xu, Xinhao wrote:
> From 2df72c1e8e3b167a6008ab11e2a68d734c68e425 Mon Sep 17 00:00:00 2001
> From: Xinhao Xu <xinhao.xu@intel.com>
> Date: Sat, 27 Apr 2013 05:45:49 +0800
> Subject: [PATCH] nEPT: Check EPT misconfiguration while walking addr & move pte
>  check code to vmx.c

Please fix up the subject when posting a patch in reply to another one.

> 
> I add code to detect EPT misconfiguration and inject it into L1. Now
> L1 can correctly go to the ept_misconfig handler (instead of wrongly
> going to fast_page_fault); it will try to handle an MMIO page fault
> and, if that fails, it is a real EPT misconfiguration.
> 
> For scalability, Xiantao suggested moving vendor-specific code out of
> the common code. To do this, I add a new op, check_tdp_pte, to the
> kvm_mmu struct to provide an interface for checking faults while
> walking an address. Ideally all fault detection would be done there,
> but so far only EPT misconfiguration is checked. More patches will be
> added in the future.

Seems there are some issues remaining. I can boot Linux as L2 when I
remove this patch. When it's applied, L2 becomes pretty slow and
eventually resets during kernel boot of L2. L1 remains stable.

Jan
Xu, Xinhao May 2, 2013, 6:59 a.m. UTC | #2
Hi, Jan
Can you provide details of your test environment?

-----Original Message-----
From: Jan Kiszka [mailto:jan.kiszka@web.de] 
Sent: Sunday, April 28, 2013 5:36 PM
To: Xu, Xinhao
Cc: Nakajima, Jun; kvm@vger.kernel.org
Subject: Re: [PATCH 11/11] nEPT: Provide the correct exit qualification upon EPT

On 2013-04-27 08:42, Xu, Xinhao wrote:
> From 2df72c1e8e3b167a6008ab11e2a68d734c68e425 Mon Sep 17 00:00:00 2001
> From: Xinhao Xu <xinhao.xu@intel.com>
> Date: Sat, 27 Apr 2013 05:45:49 +0800
> Subject: [PATCH] nEPT: Check EPT misconfiguration while walking addr & 
> move pte  check code to vmx.c

Please fix up the subject when posting a patch in reply to another one.

> 
> I add code to detect EPT misconfiguration and inject it into L1. Now
> L1 can correctly go to the ept_misconfig handler (instead of wrongly
> going to fast_page_fault); it will try to handle an MMIO page fault
> and, if that fails, it is a real EPT misconfiguration.
> 
> For scalability, Xiantao suggested moving vendor-specific code out of
> the common code. To do this, I add a new op, check_tdp_pte, to the
> kvm_mmu struct to provide an interface for checking faults while
> walking an address. Ideally all fault detection would be done there,
> but so far only EPT misconfiguration is checked. More patches will be
> added in the future.

Seems there are some issues remaining. I can boot Linux as L2 when I remove this patch. When it's applied, L2 becomes pretty slow and eventually resets during kernel boot of L2. L1 remains stable.

Jan


Jan Kiszka May 2, 2013, 8:50 a.m. UTC | #3
On 2013-05-02 08:59, Xu, Xinhao wrote:
> Hi, Jan
> Can you provide details of your test environment?

Pretty simple: Run a Linux 3.9(-rc5) kernel as L2 on top of the same
kernel with QEMU (git head at that time). I've attached the config.

Jan

> 
> -----Original Message-----
> From: Jan Kiszka [mailto:jan.kiszka@web.de] 
> Sent: Sunday, April 28, 2013 5:36 PM
> To: Xu, Xinhao
> Cc: Nakajima, Jun; kvm@vger.kernel.org
> Subject: Re: [PATCH 11/11] nEPT: Provide the correct exit qualification upon EPT
> 
> On 2013-04-27 08:42, Xu, Xinhao wrote:
>> From 2df72c1e8e3b167a6008ab11e2a68d734c68e425 Mon Sep 17 00:00:00 2001
>> From: Xinhao Xu <xinhao.xu@intel.com>
>> Date: Sat, 27 Apr 2013 05:45:49 +0800
>> Subject: [PATCH] nEPT: Check EPT misconfiguration while walking addr & 
>> move pte  check code to vmx.c
> 
> Please fix up the subject when posting a patch in reply to another one.
> 
>>
>> I add code to detect EPT misconfiguration and inject it into L1. Now
>> L1 can correctly go to the ept_misconfig handler (instead of wrongly
>> going to fast_page_fault); it will try to handle an MMIO page fault
>> and, if that fails, it is a real EPT misconfiguration.
>>
>> For scalability, Xiantao suggested moving vendor-specific code out of
>> the common code. To do this, I add a new op, check_tdp_pte, to the
>> kvm_mmu struct to provide an interface for checking faults while
>> walking an address. Ideally all fault detection would be done there,
>> but so far only EPT misconfiguration is checked. More patches will be
>> added in the future.
> 
> Seems there are some issues remaining. I can boot Linux as L2 when I remove this patch. When it's applied, L2 becomes pretty slow and eventually resets during kernel boot of L2. L1 remains stable.
> 
> Jan
> 
>

Patch

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 15f960c..c74f7c6 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -21,6 +21,8 @@  struct x86_exception {
 	u8 vector;
 	bool error_code_valid;
 	u16 error_code;
+	u64 exit_qualification;
+	bool fault_checked_by_vendor; /* is this fault checked by vendor */
 	bool nested_page_fault;
 	u64 address; /* cr2 or nested page fault gpa */
 };
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e029bba..0fedfcd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -262,6 +262,8 @@  struct kvm_mmu {
 	void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva);
 	void (*update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 			   u64 *spte, const void *pte);
+	bool (*check_tdp_pte)(struct kvm_vcpu *vcpu, u64 pte, int level,
+			struct x86_exception *fault);
 	hpa_t root_hpa;
 	int root_level;
 	int shadow_root_level;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a20ecb5..51dd1f9 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -621,6 +621,7 @@  int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
 not_found:
 	return 36;
 }
+EXPORT_SYMBOL_GPL(cpuid_maxphyaddr);
 
 /*
  * If no match is found, check whether we exceed the vCPU's limit
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 99bfc5e..e1f5ea1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -230,11 +230,6 @@  static bool set_mmio_spte(u64 *sptep, gfn_t gfn, pfn_t pfn, unsigned access)
 	return false;
 }
 
-static inline u64 rsvd_bits(int s, int e)
-{
-	return ((1ULL << (e - s + 1)) - 1) << s;
-}
-
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 		u64 dirty_mask, u64 nx_mask, u64 x_mask)
 {
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 19dd5ab..8aebd5a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -91,6 +91,11 @@  static inline bool is_write_protection(struct kvm_vcpu *vcpu)
 	return kvm_read_cr0_bits(vcpu, X86_CR0_WP);
 }
 
+static inline u64 rsvd_bits(int s, int e)
+{
+	return ((1ULL << (e - s + 1)) - 1) << s;
+}
+
 /*
  * Will a fault with a given page-fault error code (pfec) cause a permission
  * fault with the given access (in ACC_* format)?
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index bd370e7..09e7154 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -240,6 +240,7 @@  retry_walk:
 	}
 #endif
 	walker->max_level = walker->level;
+	walker->fault.fault_checked_by_vendor = false;
 	ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
 	       (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
 
@@ -282,14 +283,22 @@  retry_walk:
 		if (unlikely(!FNAME(is_present_gpte)(pte)))
 			goto error;
 
+#if PTTYPE != PTTYPE_EPT
 		if (unlikely(is_rsvd_bits_set(&vcpu->arch.mmu, pte,
 					      walker->level))) {
 			errcode |= PFERR_PRESENT_MASK;
-#if PTTYPE != PTTYPE_EPT
 			errcode |= PFERR_RSVD_MASK;
-#endif
 			goto error;
 		}
+#else
+                if (vcpu->arch.mmu.check_tdp_pte)
+                {
+                        if (vcpu->arch.mmu.check_tdp_pte(vcpu, pte,
+                                walker->level, &walker->fault))
+                        goto error;
+                }
+
+#endif
 
 		accessed_dirty &= pte;
 		pte_access = pt_access & FNAME(gpte_access)(vcpu, pte);
@@ -349,12 +358,14 @@  error:
 
 	walker->fault.vector = PF_VECTOR;
 	walker->fault.error_code_valid = true;
-#if PTTYPE != PTTYPE_EPT
 	walker->fault.error_code = errcode;
-#else
-	/* Reuse bits [2:0] of EPT violation */
-	walker->fault.error_code = vcpu->arch.exit_qualification & 0x7;
-#endif
+	/*
+	 * If it is not fault checked in  check_tdp_pte, for now, it is a
+	 * EPT violation, set correct exit qualification according to Intel
+	 * SDM. This part will be changed in future.*/
+        if (!walker->fault.fault_checked_by_vendor)
+                walker->fault.exit_qualification = ((walker->pte_access &
+					0x7) << 3) | access;
 	walker->fault.address = addr;
 	walker->fault.nested_page_fault = mmu != vcpu->arch.walk_mmu;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 61e2853..a450c84 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7006,10 +7006,86 @@  static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	 * Note no need to set vmcs12->vm_exit_reason as it is already copied
 	 * from vmcs02 in nested_vmx_vmexit() above, i.e., EPT_VIOLATION.
 	 */
-	vmcs12->exit_qualification = fault->error_code;
+	if (fault->fault_checked_by_vendor && fault->exit_qualification == 0)
+                vmcs12->vm_exit_reason = EXIT_REASON_EPT_MISCONFIG;
+        else
+                vmcs12->vm_exit_reason = EXIT_REASON_EPT_VIOLATION;
+        vmcs12->exit_qualification = fault->exit_qualification;
 	vmcs12->guest_physical_address = fault->address;
 }
 
+static bool nested_ept_rsvd_bits_check(struct kvm_vcpu *vcpu, u64 pte, int level)
+{
+	int maxphyaddr = cpuid_maxphyaddr(vcpu);
+	u64 rsvd_mask = rsvd_bits(maxphyaddr, 51);
+
+	switch (level)
+	{
+		case 4:
+			rsvd_mask |= rsvd_bits(3, 7);
+			break;
+		case 3:
+		case 2:
+			if (pte & (1 << 7))
+				rsvd_mask |= rsvd_bits(PAGE_SHIFT, PAGE_SHIFT + 9 * (level - 1) - 1);
+			else
+				rsvd_mask |= rsvd_bits(3, 6);
+			break;
+		case 1:
+			break;
+		default:
+			/* impossible to go to here */
+			BUG();
+	}
+	return pte & rsvd_mask;
+}
+
+static bool nested_ept_rwx_bits_check(u64 pte)
+{
+	/* write only or write/execute only */
+	uint8_t rwx_bits = pte & 7;
+
+	switch (rwx_bits)
+	{
+		case 0x2:
+		case 0x6:
+			return 1;
+		case 0x4:
+			if (!(nested_vmx_ept_caps & 0x1))
+				return 1;
+		default:
+			return 0;
+	}
+}
+
+static bool nested_ept_memtype_check(u64 pte, int level)
+{
+	if (level == 1 || (level == 2 && (pte & (1ULL << 7)))) {
+		/* 0x38, namely bits 5:3, stands for EPT memory type */
+		u64 ept_mem_type = (pte & 0x38) >> 3;
+
+		if (ept_mem_type == 0x2 || ept_mem_type == 0x3 ||
+			ept_mem_type == 0x7)
+			return 1;
+	}
+	return 0;
+}
+
+bool nested_check_ept_pte(struct kvm_vcpu *vcpu, u64 pte, int level,
+				struct x86_exception *fault)
+{
+	bool r;
+	r = nested_ept_rsvd_bits_check(vcpu, pte, level) ||
+		nested_ept_rwx_bits_check(pte) ||
+		nested_ept_memtype_check(pte, level);
+	if (r)
+	{
+		fault->exit_qualification = 0;
+		fault->fault_checked_by_vendor = true;
+	}
+	return r;
+}
+
 static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
 {
 	int r = kvm_init_shadow_EPT_mmu(vcpu, &vcpu->arch.mmu);
@@ -7017,6 +7093,7 @@  static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu.set_cr3           = vmx_set_cr3;
 	vcpu->arch.mmu.get_cr3           = nested_ept_get_cr3;
 	vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
+	vcpu->arch.mmu.check_tdp_pte     = nested_check_ept_pte;
 
 	vcpu->arch.walk_mmu              = &vcpu->arch.nested_mmu;
 
-- 
1.7.1
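
For reference, the exit qualification composed in the paging_tmpl.h hunk above, ((walker->pte_access & 0x7) << 3) | access, follows the Intel SDM layout for EPT violations: bits [2:0] describe the attempted access and bits [5:3] the permissions the guest's EPT entry actually grants. A hypothetical stand-alone sketch of that composition (parameter names are illustrative, not the kernel's):

```c
#include <stdint.h>

/* Build an EPT-violation exit qualification:
 * bits [2:0]: attempted access (read/write/fetch),
 * bits [5:3]: RWX permissions of the faulting EPT entry. */
uint64_t make_ept_violation_qual(uint8_t pte_access, uint8_t access)
{
	return ((uint64_t)(pte_access & 0x7) << 3) | (access & 0x7);
}
```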


-----Original Message-----
From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf Of Jun Nakajima
Sent: Friday, April 26, 2013 2:44 PM
To: kvm@vger.kernel.org
Subject: [PATCH 11/11] nEPT: Provide the correct exit qualification upon EPT

Save bits [2:0] of the exit qualification at EPT violation, and use that information when injecting an EPT violation.
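
Per the Intel SDM, bits [2:0] of the EPT-violation exit qualification encode the attempted access: bit 0 for a data read, bit 1 for a data write, bit 2 for an instruction fetch. A minimal decode sketch, with illustrative (non-kernel) names:

```c
#include <stdbool.h>
#include <stdint.h>

struct ept_violation_qual {
	bool read;	/* bit 0: data read */
	bool write;	/* bit 1: data write */
	bool fetch;	/* bit 2: instruction fetch */
};

/* Decode bits [2:0] of the exit qualification, the part this
 * patch saves in vcpu->arch.exit_qualification and reuses when
 * injecting the violation into L1. */
struct ept_violation_qual decode_ept_qual(uint64_t qual)
{
	struct ept_violation_qual q = {
		.read  = (qual & 0x1) != 0,
		.write = (qual & 0x2) != 0,
		.fetch = (qual & 0x4) != 0,
	};
	return q;
}
```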

Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/paging_tmpl.h      | 5 +++++
 arch/x86/kvm/vmx.c              | 3 +++
 3 files changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4979778..e029bba 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -504,6 +504,8 @@  struct kvm_vcpu_arch {
 	 * instruction.
 	 */
 	bool write_fault_to_shadow_pgtable;
+
+	unsigned long exit_qualification; /* set at EPT violation at this point */
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index e13b6c5..bd370e7 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -349,7 +349,12 @@  error:
 
 	walker->fault.vector = PF_VECTOR;
 	walker->fault.error_code_valid = true;
+#if PTTYPE != PTTYPE_EPT
 	walker->fault.error_code = errcode;
+#else
+	/* Reuse bits [2:0] of EPT violation */
+	walker->fault.error_code = vcpu->arch.exit_qualification & 0x7;
+#endif
 	walker->fault.address = addr;
 	walker->fault.nested_page_fault = mmu != vcpu->arch.walk_mmu;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 95304cc..61e2853 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -425,6 +425,7 @@  struct vcpu_vmx {
 	ktime_t entry_time;
 	s64 vnmi_blocked_time;
 	u32 exit_reason;
+	unsigned long exit_qualification;
 
 	bool rdtscp_enabled;
 
@@ -5074,6 +5075,8 @@  static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	/* ept page table is present? */
 	error_code |= (exit_qualification >> 3) & 0x1;
 
+	vcpu->arch.exit_qualification = exit_qualification;
+
 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
 
--
1.8.2.1.610.g562af5b