[v5,11/14] nEPT: MMU context for nested EPT

Message ID 1375282131-9713-12-git-send-email-gleb@redhat.com (mailing list archive)
State New, archived

Commit Message

Gleb Natapov July 31, 2013, 2:48 p.m. UTC
From: Nadav Har'El <nyh@il.ibm.com>

KVM's existing shadow MMU code already supports nested TDP. To use it, we
need to set up a new "MMU context" for nested EPT, and create a few callbacks
for it (nested_ept_*()). This context should also use the EPT versions of
the page table access functions (defined in the previous patch).
Then, we need to switch back and forth between this nested context and the
regular MMU context when switching between L1 and L2 (when L1 runs this L2
with EPT).
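
In outline, the switching this patch adds looks like this (simplified
from the diff below):

	/* Emulated VM-entry to L2 (prepare_vmcs02): if L1 enabled EPT for
	 * this L2, drop the current roots and install the nested EPT MMU
	 * context. */
	if (nested_cpu_has_ept(vmcs12)) {
		kvm_mmu_unload(vcpu);
		nested_ept_init_mmu_context(vcpu);
	}

	/* Emulated VM-exit to L1 (load_vmcs12_host_state): point the walk
	 * MMU back at the regular context before reloading L1's CR3. */
	if (nested_cpu_has_ept(vmcs12))
		nested_ept_uninit_mmu_context(vcpu);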

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/mmu.c |   26 ++++++++++++++++++++++++++
 arch/x86/kvm/mmu.h |    2 ++
 arch/x86/kvm/vmx.c |   41 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 68 insertions(+), 1 deletion(-)

Comments

Xiao Guangrong Aug. 1, 2013, 9:16 a.m. UTC | #1
On 07/31/2013 10:48 PM, Gleb Natapov wrote:
> From: Nadav Har'El <nyh@il.ibm.com>
> 
> KVM's existing shadow MMU code already supports nested TDP. To use it, we
> need to set up a new "MMU context" for nested EPT, and create a few callbacks
> for it (nested_ept_*()). This context should also use the EPT versions of
> the page table access functions (defined in the previous patch).
> Then, we need to switch back and forth between this nested context and the
> regular MMU context when switching between L1 and L2 (when L1 runs this L2
> with EPT).

This patch looks good to me.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

But I am confused that update_permission_bitmask() is not adjusted in this
series. That function depends on kvm_read_cr4_bits(X86_CR4_SMEP) and
is_write_protection(), and those two helpers read the registers of the L2
guest; using L2 state to check L1's page table seems strange. The same
issue exists in nested NPT. Did I miss anything?

Gleb Natapov Aug. 1, 2013, 9:37 a.m. UTC | #2
On Thu, Aug 01, 2013 at 05:16:07PM +0800, Xiao Guangrong wrote:
> On 07/31/2013 10:48 PM, Gleb Natapov wrote:
> > From: Nadav Har'El <nyh@il.ibm.com>
> > 
> > KVM's existing shadow MMU code already supports nested TDP. To use it, we
> > need to set up a new "MMU context" for nested EPT, and create a few callbacks
> > for it (nested_ept_*()). This context should also use the EPT versions of
> > the page table access functions (defined in the previous patch).
> > Then, we need to switch back and forth between this nested context and the
> > regular MMU context when switching between L1 and L2 (when L1 runs this L2
> > with EPT).
> 
> This patch looks good to me.
> 
> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> 
> But I am confused that update_permission_bitmask() is not adjusted in this
> series. That function depends on kvm_read_cr4_bits(X86_CR4_SMEP) and
> is_write_protection(), and those two helpers read the registers of the L2
> guest; using L2 state to check L1's page table seems strange. The same
> issue exists in nested NPT. Did I miss anything?
Good catch again. Looks like we need an update_permission_bitmask_ept()
that uses different logic to calculate the permissions.
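
A rough sketch of what that could look like, purely illustrative since
this series does not implement it: it reuses the loop structure of
today's update_permission_bitmask() (the permissions[] table indexed by
page-fault error code), but on EPT there is no user/supervisor bit and
CR0.WP/CR4.SMEP do not apply, so only the W and X bits of the EPT12
entry matter. The function name is Gleb's suggestion, not existing code:

	/* Hypothetical EPT variant of update_permission_bitmask(). */
	static void update_permission_bitmask_ept(struct kvm_vcpu *vcpu,
						  struct kvm_mmu *mmu)
	{
		unsigned bit, byte, pfec;
		u8 map;
		bool fault, w, x, wf, ff;

		for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
			pfec = byte << 1;
			map = 0;
			wf = pfec & PFERR_WRITE_MASK;
			ff = pfec & PFERR_FETCH_MASK;
			for (bit = 0; bit < 8; ++bit) {
				w = bit & ACC_WRITE_MASK;
				x = bit & ACC_EXEC_MASK;
				/* No U/S distinction on EPT: only write and
				 * fetch permissions can fault. */
				fault = (wf && !w) || (ff && !x);
				map |= fault << bit;
			}
			mmu->permissions[byte] = map;
		}
	}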

--
			Gleb.
Xiao Guangrong Aug. 1, 2013, 9:51 a.m. UTC | #3
On 08/01/2013 05:16 PM, Xiao Guangrong wrote:
> On 07/31/2013 10:48 PM, Gleb Natapov wrote:
>> From: Nadav Har'El <nyh@il.ibm.com>
>>
>> KVM's existing shadow MMU code already supports nested TDP. To use it, we
>> need to set up a new "MMU context" for nested EPT, and create a few callbacks
>> for it (nested_ept_*()). This context should also use the EPT versions of
>> the page table access functions (defined in the previous patch).
>> Then, we need to switch back and forth between this nested context and the
>> regular MMU context when switching between L1 and L2 (when L1 runs this L2
>> with EPT).
> 
> This patch looks good to me.
> 
> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> 
> But I am confused that update_permission_bitmask() is not adjusted in this
> series. That function depends on kvm_read_cr4_bits(X86_CR4_SMEP) and
> is_write_protection(), and those two helpers read the registers of the L2
> guest; using L2 state to check L1's page table seems strange. The same
> issue exists in nested NPT. Did I miss anything?

After checking the code, I found that vcpu->arch.mmu is not updated when
switching to the nested MMU; that means my claim that "using L2 state to
check L1's page table seems strange" was wrong. That is fine for nested
NPT, but nested EPT should still adjust the logic.
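
For reference, the resulting wiring while L2 runs with nested EPT,
restated from nested_ept_init_mmu_context() in the patch below:

	/* vcpu->arch.mmu      - the shadow-EPT context set up by
	 *                       kvm_init_shadow_ept_mmu(); it shadows the
	 *                       EPT12 table L1 supplied, fetched via
	 *                       nested_ept_get_cr3().
	 * vcpu->arch.walk_mmu - redirected to the nested MMU and used to
	 *                       walk L2's own page tables. */
	vcpu->arch.walk_mmu = &vcpu->arch.nested_mmu;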



Patch

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 58ae9db..37fff14 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3792,6 +3792,32 @@  int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
+int kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
+		bool execonly)
+{
+	ASSERT(vcpu);
+	ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+
+	context->nx = true;
+	context->new_cr3 = paging_new_cr3;
+	context->page_fault = ept_page_fault;
+	context->gva_to_gpa = ept_gva_to_gpa;
+	context->sync_page = ept_sync_page;
+	context->invlpg = ept_invlpg;
+	context->update_pte = ept_update_pte;
+	context->free = paging_free;
+	context->root_level = context->shadow_root_level;
+	context->root_hpa = INVALID_PAGE;
+	context->direct_map = false;
+
+	reset_rsvds_bits_mask_ept(vcpu, context, execonly);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
+
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
 	int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 5b59c57..77e044a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -71,6 +71,8 @@  enum {
 
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+int kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
+		bool execonly);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f3514d7..f41751a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1046,6 +1046,11 @@  static inline bool nested_cpu_has_virtual_nmis(struct vmcs12 *vmcs12,
 	return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
+{
+	return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
+}
+
 static inline bool is_exception(u32 intr_info)
 {
 	return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -7432,6 +7437,33 @@  static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }
 
+/* Callbacks for nested_ept_init_mmu_context: */
+
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+	/* return the page table to be shadowed - in our case, EPT12 */
+	return get_vmcs12(vcpu)->ept_pointer;
+}
+
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+	int r = kvm_init_shadow_ept_mmu(vcpu, &vcpu->arch.mmu,
+			nested_vmx_ept_caps & VMX_EPT_EXECUTE_ONLY_BIT);
+
+	vcpu->arch.mmu.set_cr3           = vmx_set_cr3;
+	vcpu->arch.mmu.get_cr3           = nested_ept_get_cr3;
+	vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
+
+	vcpu->arch.walk_mmu              = &vcpu->arch.nested_mmu;
+
+	return r;
+}
+
+static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.walk_mmu = &vcpu->arch.mmu;
+}
+
 /*
  * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
  * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
@@ -7652,6 +7684,11 @@  static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		vmx_flush_tlb(vcpu);
 	}
 
+	if (nested_cpu_has_ept(vmcs12)) {
+		kvm_mmu_unload(vcpu);
+		nested_ept_init_mmu_context(vcpu);
+	}
+
 	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
 		vcpu->arch.efer = vmcs12->guest_ia32_efer;
 	else if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
@@ -8124,7 +8161,9 @@  static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK);
 	kvm_set_cr4(vcpu, vmcs12->host_cr4);
 
-	/* shadow page tables on either EPT or shadow page tables */
+	if (nested_cpu_has_ept(vmcs12))
+		nested_ept_uninit_mmu_context(vcpu);
+
 	kvm_set_cr3(vcpu, vmcs12->host_cr3);
 	kvm_mmu_reset_context(vcpu);