
KVM: nVMX: fix shadow on EPT

Message ID 20131009161319.GO3574@redhat.com (mailing list archive)
State New, archived

Commit Message

Gleb Natapov Oct. 9, 2013, 4:13 p.m. UTC
Commit 72f857950f6f19 broke shadow on EPT. This patch reverts it and fixes PAE
on nEPT (which the reverted commit fixed) in a different way.

Shadow on EPT is now broken because while L1 builds a shadow page table
for L2 (which is PAE while L2 is in real mode) it never loads L2's
GUEST_PDPTR[0-3].  They do not need to be loaded because, without nested
virtualization, the hardware does this during guest entry if EPT is disabled,
but in our case L0 emulates L2's vmentry while EPT is enabled, so we
cannot rely on vmcs12->guest_pdptr[0-3] to contain up-to-date values
and need to re-read the PDPTEs from L2 memory. This is what kvm_set_cr3()
does, but by clearing the PDPTR cache bits during L2 vmentry we drop the
values that kvm_set_cr3() read from memory.
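
To make the cache interaction concrete, here is a minimal stand-alone sketch
(the types and helper names are simplified stand-ins, not the actual KVM
code): the CR3 load caches the PDPTEs and marks them dirty, and clearing that
dirty bit before entry means they are never written into the VMCS.

/*
 * Minimal stand-alone model of the PDPTR register cache (the names
 * below are simplified stand-ins, not the real KVM structures):
 * the CR3 load fills the software cache from guest memory and marks
 * it dirty; clearing the dirty bit before entry, as the reverted
 * commit did in prepare_vmcs02(), means the cached values never
 * reach GUEST_PDPTR0..3.
 */
#include <stdint.h>
#include <stdio.h>

#define EXREG_PDPTR 0	/* stand-in for VCPU_EXREG_PDPTR */

struct vcpu_model {
	uint64_t pdptrs[4];        /* software cache (mmu->pdptrs[]) */
	unsigned long regs_dirty;  /* which cached regs must be written back */
};

/* stand-in for the CR3 load path: read PDPTEs, mark the cache dirty */
static void model_set_cr3(struct vcpu_model *v, const uint64_t *guest_pdpt)
{
	for (int i = 0; i < 4; i++)
		v->pdptrs[i] = guest_pdpt[i];
	v->regs_dirty |= 1UL << EXREG_PDPTR;
}

/* stand-in for ept_load_pdptrs(): flush the cache to the VMCS if dirty */
static void model_load_pdptrs(struct vcpu_model *v, uint64_t *vmcs_pdptr)
{
	if (!(v->regs_dirty & (1UL << EXREG_PDPTR)))
		return;		/* dirty bit cleared: VMCS keeps stale PDPTRs */
	for (int i = 0; i < 4; i++)
		vmcs_pdptr[i] = v->pdptrs[i];
}

int main(void)
{
	uint64_t guest_pdpt[4] = { 0x1001, 0x2001, 0x3001, 0x4001 };
	uint64_t vmcs_pdptr[4] = { 0 };
	struct vcpu_model v = { { 0 }, 0 };

	model_set_cr3(&v, guest_pdpt);
	v.regs_dirty &= ~(1UL << EXREG_PDPTR);	/* what the reverted commit did */
	model_load_pdptrs(&v, vmcs_pdptr);
	printf("GUEST_PDPTR0 = %#llx (stale, should be 0x1001)\n",
	       (unsigned long long)vmcs_pdptr[0]);
	return 0;
}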

So why does the same code not work for PAE on nEPT? kvm_set_cr3()
reads the PDPTEs into vcpu->arch.walk_mmu->pdptrs[]. walk_mmu points to
vcpu->arch.nested_mmu while a nested guest is running, but ept_load_pdptrs()
uses vcpu->arch.mmu, which contains stale values. Fix that by using
walk_mmu in ept_(load|save)_pdptrs().
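
The aliasing is easier to see in a stripped-down model (again simplified
stand-in types, not the real kvm_vcpu layout); it only shows why reading
pdptrs through the embedded mmu misses what was cached through walk_mmu.

/*
 * Minimal stand-alone model of the two-MMU layout (simplified stand-in
 * types, not the real struct kvm_vcpu): walk_mmu points at nested_mmu
 * while L2 runs, so values cached through walk_mmu never show up in the
 * embedded mmu that the old ept_load_pdptrs() was reading.
 */
#include <stdint.h>
#include <stdio.h>

struct mmu_model {
	uint64_t pdptrs[4];
};

struct vcpu_arch_model {
	struct mmu_model mmu;        /* page tables KVM itself builds       */
	struct mmu_model nested_mmu; /* used to walk L2's guest page tables */
	struct mmu_model *walk_mmu;  /* &mmu normally, &nested_mmu for L2   */
};

int main(void)
{
	struct vcpu_arch_model arch = { { { 0 } }, { { 0 } }, NULL };

	arch.walk_mmu = &arch.nested_mmu;      /* a nested guest is running */
	arch.walk_mmu->pdptrs[0] = 0xabcd007;  /* what kvm_set_cr3() cached */

	/* the old code read arch.mmu.pdptrs[]; the fix reads via walk_mmu */
	printf("arch.mmu.pdptrs[0]       = %#llx (stale)\n",
	       (unsigned long long)arch.mmu.pdptrs[0]);
	printf("arch.walk_mmu->pdptrs[0] = %#llx (up to date)\n",
	       (unsigned long long)arch.walk_mmu->pdptrs[0]);
	return 0;
}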

Signed-off-by: Gleb Natapov <gleb@redhat.com>
--
			Gleb.

Comments

Paolo Bonzini Oct. 9, 2013, 4:19 p.m. UTC | #1
On 09/10/2013 18:13, Gleb Natapov wrote:
> Commit 72f857950f6f19 broke shadow on EPT. This patch reverts it and fixes PAE
> on nEPT (which the reverted commit fixed) in a different way.
> 
> Shadow on EPT is now broken because while L1 builds a shadow page table
> for L2 (which is PAE while L2 is in real mode) it never loads L2's
> GUEST_PDPTR[0-3].  They do not need to be loaded because, without nested
> virtualization, the hardware does this during guest entry if EPT is disabled,
> but in our case L0 emulates L2's vmentry while EPT is enabled, so we
> cannot rely on vmcs12->guest_pdptr[0-3] to contain up-to-date values
> and need to re-read the PDPTEs from L2 memory. This is what kvm_set_cr3()
> does, but by clearing the PDPTR cache bits during L2 vmentry we drop the
> values that kvm_set_cr3() read from memory.
> 
> So why does the same code not work for PAE on nEPT? kvm_set_cr3()
> reads the PDPTEs into vcpu->arch.walk_mmu->pdptrs[]. walk_mmu points to
> vcpu->arch.nested_mmu while a nested guest is running, but ept_load_pdptrs()
> uses vcpu->arch.mmu, which contains stale values. Fix that by using
> walk_mmu in ept_(load|save)_pdptrs().
> 
> Signed-off-by: Gleb Natapov <gleb@redhat.com>

The new fix also looks much more obvious.

Applied to queue, thanks.

Paolo
Gleb Natapov Oct. 9, 2013, 4:35 p.m. UTC | #2
On Wed, Oct 09, 2013 at 06:19:52PM +0200, Paolo Bonzini wrote:
> On 09/10/2013 18:13, Gleb Natapov wrote:
> > Commit 72f857950f6f19 broke shadow on EPT. This patch reverts it and fixes PAE
> > on nEPT (which the reverted commit fixed) in a different way.
> > 
> > Shadow on EPT is now broken because while L1 builds a shadow page table
> > for L2 (which is PAE while L2 is in real mode) it never loads L2's
> > GUEST_PDPTR[0-3].  They do not need to be loaded because, without nested
> > virtualization, the hardware does this during guest entry if EPT is disabled,
> > but in our case L0 emulates L2's vmentry while EPT is enabled, so we
> > cannot rely on vmcs12->guest_pdptr[0-3] to contain up-to-date values
> > and need to re-read the PDPTEs from L2 memory. This is what kvm_set_cr3()
> > does, but by clearing the PDPTR cache bits during L2 vmentry we drop the
> > values that kvm_set_cr3() read from memory.
> > 
> > So why does the same code not work for PAE on nEPT? kvm_set_cr3()
> > reads the PDPTEs into vcpu->arch.walk_mmu->pdptrs[]. walk_mmu points to
> > vcpu->arch.nested_mmu while a nested guest is running, but ept_load_pdptrs()
> > uses vcpu->arch.mmu, which contains stale values. Fix that by using
> > walk_mmu in ept_(load|save)_pdptrs().
> > 
> > Signed-off-by: Gleb Natapov <gleb@redhat.com>
> 
> The new fix also looks much more obvious.
> 
> Applied to queue, thanks.
> 
The patch that breaks shadow on EPT is in master IIRC; if that is the case,
consider this one for -rc5.

--
			Gleb.
Paolo Bonzini Oct. 9, 2013, 4:38 p.m. UTC | #3
On 09/10/2013 18:35, Gleb Natapov wrote:
> On Wed, Oct 09, 2013 at 06:19:52PM +0200, Paolo Bonzini wrote:
>> On 09/10/2013 18:13, Gleb Natapov wrote:
>>> Commit 72f857950f6f19 broke shadow on EPT. This patch reverts it and fixes PAE
>>> on nEPT (which the reverted commit fixed) in a different way.
>>>
>>> Shadow on EPT is now broken because while L1 builds a shadow page table
>>> for L2 (which is PAE while L2 is in real mode) it never loads L2's
>>> GUEST_PDPTR[0-3].  They do not need to be loaded because, without nested
>>> virtualization, the hardware does this during guest entry if EPT is disabled,
>>> but in our case L0 emulates L2's vmentry while EPT is enabled, so we
>>> cannot rely on vmcs12->guest_pdptr[0-3] to contain up-to-date values
>>> and need to re-read the PDPTEs from L2 memory. This is what kvm_set_cr3()
>>> does, but by clearing the PDPTR cache bits during L2 vmentry we drop the
>>> values that kvm_set_cr3() read from memory.
>>>
>>> So why does the same code not work for PAE on nEPT? kvm_set_cr3()
>>> reads the PDPTEs into vcpu->arch.walk_mmu->pdptrs[]. walk_mmu points to
>>> vcpu->arch.nested_mmu while a nested guest is running, but ept_load_pdptrs()
>>> uses vcpu->arch.mmu, which contains stale values. Fix that by using
>>> walk_mmu in ept_(load|save)_pdptrs().
>>>
>>> Signed-off-by: Gleb Natapov <gleb@redhat.com>
>>
>> The new fix also looks much more obvious.
>>
>> Applied to queue, thanks.
>>
> The patch that breaks shadow on EPT is in master IIRC; if that is the case,
> consider this one for -rc5.

Yes.  I would also like to include the PPC patches that Alex pointed us to,
but I need a Tested-by because the backport isn't trivial.

If it doesn't come soon (i.e., by Friday), I'll drop them and include
only your patch.

Paolo

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2db9164..1f00a94 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3252,25 +3252,29 @@  static void vmx_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
 
 static void ept_load_pdptrs(struct kvm_vcpu *vcpu)
 {
+	struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
+
 	if (!test_bit(VCPU_EXREG_PDPTR,
 		      (unsigned long *)&vcpu->arch.regs_dirty))
 		return;
 
 	if (is_paging(vcpu) && is_pae(vcpu) && !is_long_mode(vcpu)) {
-		vmcs_write64(GUEST_PDPTR0, vcpu->arch.mmu.pdptrs[0]);
-		vmcs_write64(GUEST_PDPTR1, vcpu->arch.mmu.pdptrs[1]);
-		vmcs_write64(GUEST_PDPTR2, vcpu->arch.mmu.pdptrs[2]);
-		vmcs_write64(GUEST_PDPTR3, vcpu->arch.mmu.pdptrs[3]);
+		vmcs_write64(GUEST_PDPTR0, mmu->pdptrs[0]);
+		vmcs_write64(GUEST_PDPTR1, mmu->pdptrs[1]);
+		vmcs_write64(GUEST_PDPTR2, mmu->pdptrs[2]);
+		vmcs_write64(GUEST_PDPTR3, mmu->pdptrs[3]);
 	}
 }
 
 static void ept_save_pdptrs(struct kvm_vcpu *vcpu)
 {
+	struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
+
 	if (is_paging(vcpu) && is_pae(vcpu) && !is_long_mode(vcpu)) {
-		vcpu->arch.mmu.pdptrs[0] = vmcs_read64(GUEST_PDPTR0);
-		vcpu->arch.mmu.pdptrs[1] = vmcs_read64(GUEST_PDPTR1);
-		vcpu->arch.mmu.pdptrs[2] = vmcs_read64(GUEST_PDPTR2);
-		vcpu->arch.mmu.pdptrs[3] = vmcs_read64(GUEST_PDPTR3);
+		mmu->pdptrs[0] = vmcs_read64(GUEST_PDPTR0);
+		mmu->pdptrs[1] = vmcs_read64(GUEST_PDPTR1);
+		mmu->pdptrs[2] = vmcs_read64(GUEST_PDPTR2);
+		mmu->pdptrs[3] = vmcs_read64(GUEST_PDPTR3);
 	}
 
 	__set_bit(VCPU_EXREG_PDPTR,
@@ -7794,10 +7798,6 @@  static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		vmcs_write64(GUEST_PDPTR1, vmcs12->guest_pdptr1);
 		vmcs_write64(GUEST_PDPTR2, vmcs12->guest_pdptr2);
 		vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
-		__clear_bit(VCPU_EXREG_PDPTR,
-				(unsigned long *)&vcpu->arch.regs_avail);
-		__clear_bit(VCPU_EXREG_PDPTR,
-				(unsigned long *)&vcpu->arch.regs_dirty);
 	}
 
 	kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->guest_rsp);