KVM: nVMX: sync vcms02 segment regs prior to vmx_set_cr0
diff mbox

Message ID 20180305173327.1252-1-sean.j.christopherson@intel.com
State New
Headers show

Commit Message

Sean Christopherson March 5, 2018, 5:33 p.m. UTC
Segment registers must be synchronized prior to any code that may
trigger a call to emulation_required()/guest_state_valid(), e.g.
vmx_set_cr0().  Because preparing vmcs02 writes segmentation fields
directly, i.e. doesn't use vmx_set_segment(), emulation_required
will not be re-evaluated when synchronizing the segment registers,
which can result in L0 incorrectly starting emulation of L2.

Fixes: 8665c3f97320 ("KVM: nVMX: initialize descriptor cache fields in prepare_vmcs02_full")

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Comments

Radim Krčmář March 9, 2018, 6:37 p.m. UTC | #1
2018-03-05 09:33-0800, Sean Christopherson:
> Segment registers must be synchronized prior to any code that may
> trigger a call to emulation_required()/guest_state_valid(), e.g.
> vmx_set_cr0().  Because preparing vmcs02 writes segmentation fields
> directly, i.e. doesn't use vmx_set_segment(), emulation_required
> will not be re-evaluated when synchronizing the segment registers,
> which can result in L0 incorrectly starting emulation of L2.
> 
> Fixes: 8665c3f97320 ("KVM: nVMX: initialize descriptor cache fields in prepare_vmcs02_full")
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -10563,11 +10563,8 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>  	return 0;
>  }
>  
> -static void prepare_vmcs02_full(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> -			       bool from_vmentry)

Nice catch!

Paolo, was there a reason to that prevented prepare_vmcs02_full at the
beginning of prepare_vmcs02?

Thanks.
Paolo Bonzini March 14, 2018, 1:14 p.m. UTC | #2
On 09/03/2018 19:37, Radim Krčmář wrote:
> 2018-03-05 09:33-0800, Sean Christopherson:
>> Segment registers must be synchronized prior to any code that may
>> trigger a call to emulation_required()/guest_state_valid(), e.g.
>> vmx_set_cr0().  Because preparing vmcs02 writes segmentation fields
>> directly, i.e. doesn't use vmx_set_segment(), emulation_required
>> will not be re-evaluated when synchronizing the segment registers,
>> which can result in L0 incorrectly starting emulation of L2.
>>
>> Fixes: 8665c3f97320 ("KVM: nVMX: initialize descriptor cache fields in prepare_vmcs02_full")
>>
>> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> ---
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -10563,11 +10563,8 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>>  	return 0;
>>  }
>>  
>> -static void prepare_vmcs02_full(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>> -			       bool from_vmentry)
> 
> Nice catch!
> 
> Paolo, was there a reason to that prevented prepare_vmcs02_full at the
> beginning of prepare_vmcs02?

No, I placed it there because initially I wanted to move more work to
prepare_vmcs02_full (such as entry/exit controls, CR0/CR4/EFER, etc.),
but that didn't quite work out.

Sean, do you have a test case for this bug?  (And if it can be fixed by
moving prepare_vmcs02_full at the beginning, as suggested by Radim, I'm
all for it).

Paolo
Sean Christopherson March 14, 2018, 3:58 p.m. UTC | #3
On Wed, 2018-03-14 at 14:14 +0100, Paolo Bonzini wrote:
> On 09/03/2018 19:37, Radim Krčmář wrote:

> > 

> > 2018-03-05 09:33-0800, Sean Christopherson:

> > > 

> > > Segment registers must be synchronized prior to any code that may

> > > trigger a call to emulation_required()/guest_state_valid(), e.g.

> > > vmx_set_cr0().  Because preparing vmcs02 writes segmentation fields

> > > directly, i.e. doesn't use vmx_set_segment(), emulation_required

> > > will not be re-evaluated when synchronizing the segment registers,

> > > which can result in L0 incorrectly starting emulation of L2.

> > > 

> > > Fixes: 8665c3f97320 ("KVM: nVMX: initialize descriptor cache fields in prepare_vmcs02_full")

> > > 

> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

> > > ---

> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c

> > > @@ -10563,11 +10563,8 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne

> > >  	return 0;

> > >  }

> > >  

> > > -static void prepare_vmcs02_full(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,

> > > -			       bool from_vmentry)

> > Nice catch!

> > 

> > Paolo, was there a reason to that prevented prepare_vmcs02_full at the

> > beginning of prepare_vmcs02?

> No, I placed it there because initially I wanted to move more work to

> prepare_vmcs02_full (such as entry/exit controls, CR0/CR4/EFER, etc.),

> but that didn't quite work out.

> 

> Sean, do you have a test case for this bug?  (And if it can be fixed by

> moving prepare_vmcs02_full at the beginning, as suggested by Radim, I'm

> all for it).


I don't have a formal unit test, but I hit the bug 100% of the time
simply by booting L2 with stock OVMF and unrestricted_guest=0 in L0,
i.e. it's trivially easy for me to reproduce.

I think I could create a unit test relatively easily.  I hit the bug
early in BIOS (OVMF) in L2, e.g. less than 10 instructions after reset.
In theory it should be straightforward to extract/port OVMF's reset
code to a unit test.  

Moving prepare_vmcs02_full to the beginning fixes the bug and doesn't
introduce any new issues as far as I can tell, but moving that much
code is a bit scary, especially this late in a release cycle.  I think
it would be ok since prepare_vmcs02_full doesn't appear to consume
dynamic state or have side effects, but I'll definitely defer to your
judgment.

> Paolo
Sean Christopherson March 14, 2018, 5:30 p.m. UTC | #4
On Wed, 2018-03-14, Sean Christopherson wrote:
> On Wed, 2018-03-14 at 14:14 +0100, Paolo Bonzini wrote:

> > On 09/03/2018 19:37, Radim Krčmář wrote:

> > > 

> > > 2018-03-05 09:33-0800, Sean Christopherson:

> > > > 

> > > > Segment registers must be synchronized prior to any code that may

> > > > trigger a call to emulation_required()/guest_state_valid(), e.g.

> > > > vmx_set_cr0().  Because preparing vmcs02 writes segmentation fields

> > > > directly, i.e. doesn't use vmx_set_segment(), emulation_required

> > > > will not be re-evaluated when synchronizing the segment registers,

> > > > which can result in L0 incorrectly starting emulation of L2.

> > > > 

> > > > Fixes: 8665c3f97320 ("KVM: nVMX: initialize descriptor cache fields in prepare_vmcs02_full")

> > > > 

> > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

> > > > ---

> > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c

> > > > @@ -10563,11 +10563,8 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne

> > > >  	return 0;

> > > >  }

> > > >  

> > > > -static void prepare_vmcs02_full(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,

> > > > -			       bool from_vmentry)

> > > Nice catch!

> > > 

> > > Paolo, was there a reason to that prevented prepare_vmcs02_full at the

> > > beginning of prepare_vmcs02?

> > No, I placed it there because initially I wanted to move more work to

> > prepare_vmcs02_full (such as entry/exit controls, CR0/CR4/EFER, etc.),

> > but that didn't quite work out.

> > 

> > Sean, do you have a test case for this bug?  (And if it can be fixed by

> > moving prepare_vmcs02_full at the beginning, as suggested by Radim, I'm

> > all for it).

> 

> I don't have a formal unit test, but I hit the bug 100% of the time

> simply by booting L2 with stock OVMF and unrestricted_guest=0 in L0,

> i.e. it's trivially easy for me to reproduce.

> 

> I think I could create a unit test relatively easily.  I hit the bug

> early in BIOS (OVMF) in L2, e.g. less than 10 instructions after reset.

> In theory it should be straightforward to extract/port OVMF's reset

> code to a unit test.  


The existing kvm-unit-tests tests/realmode hits this when setting
CR0.PM in exec_in_big_real_mode(), just need to load kvm_intel with
unrestricted_guest=0 in L0.  I thought I had tried tests/realmode before
sending the previous email, but I think I forgot to load kvm_intel in L1
and so it ran with accel=tcg, doh.

> Moving prepare_vmcs02_full to the beginning fixes the bug and doesn't

> introduce any new issues as far as I can tell, but moving that much

> code is a bit scary, especially this late in a release cycle.  I think

> it would be ok since prepare_vmcs02_full doesn't appear to consume

> dynamic state or have side effects, but I'll definitely defer to your

> judgment.

> 

> > Paolo
Paolo Bonzini March 15, 2018, 10:01 p.m. UTC | #5
On 14/03/2018 18:30, Christopherson, Sean J wrote:
> On Wed, 2018-03-14, Sean Christopherson wrote:
>> On Wed, 2018-03-14 at 14:14 +0100, Paolo Bonzini wrote:
>>> On 09/03/2018 19:37, Radim Krčmář wrote:
>>>>
>>>> 2018-03-05 09:33-0800, Sean Christopherson:
>>>>>
>>>>> Segment registers must be synchronized prior to any code that may
>>>>> trigger a call to emulation_required()/guest_state_valid(), e.g.
>>>>> vmx_set_cr0().  Because preparing vmcs02 writes segmentation fields
>>>>> directly, i.e. doesn't use vmx_set_segment(), emulation_required
>>>>> will not be re-evaluated when synchronizing the segment registers,
>>>>> which can result in L0 incorrectly starting emulation of L2.
>>>>>
>>>>> Fixes: 8665c3f97320 ("KVM: nVMX: initialize descriptor cache fields in prepare_vmcs02_full")
>>>>>
>>>>> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>>>>> ---
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> @@ -10563,11 +10563,8 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>>>>>  	return 0;
>>>>>  }
>>>>>  
>>>>> -static void prepare_vmcs02_full(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>>>>> -			       bool from_vmentry)
>>>> Nice catch!
>>>>
>>>> Paolo, was there a reason to that prevented prepare_vmcs02_full at the
>>>> beginning of prepare_vmcs02?
>>> No, I placed it there because initially I wanted to move more work to
>>> prepare_vmcs02_full (such as entry/exit controls, CR0/CR4/EFER, etc.),
>>> but that didn't quite work out.
>>>
>>> Sean, do you have a test case for this bug?  (And if it can be fixed by
>>> moving prepare_vmcs02_full at the beginning, as suggested by Radim, I'm
>>> all for it).
>>
>> I don't have a formal unit test, but I hit the bug 100% of the time
>> simply by booting L2 with stock OVMF and unrestricted_guest=0 in L0,
>> i.e. it's trivially easy for me to reproduce.
>>
>> I think I could create a unit test relatively easily.  I hit the bug
>> early in BIOS (OVMF) in L2, e.g. less than 10 instructions after reset.
>> In theory it should be straightforward to extract/port OVMF's reset
>> code to a unit test.  
> 
> The existing kvm-unit-tests tests/realmode hits this when setting
> CR0.PM in exec_in_big_real_mode(), just need to load kvm_intel with
> unrestricted_guest=0 in L0.  I thought I had tried tests/realmode before
> sending the previous email, but I think I forgot to load kvm_intel in L1
> and so it ran with accel=tcg, doh.

Cool, I'll try to reproduce both ways and use it to test the move of
prepare_vmcs02_full at the beginning.

Paolo

Patch
diff mbox

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 591214843046..3d73d9ed15c8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10563,11 +10563,8 @@  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
 	return 0;
 }
 
-static void prepare_vmcs02_full(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
-			       bool from_vmentry)
+static void prepare_vmcs02_segmentation(struct vmcs12 *vmcs12)
 {
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
 	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
 	vmcs_write16(GUEST_SS_SELECTOR, vmcs12->guest_ss_selector);
 	vmcs_write16(GUEST_DS_SELECTOR, vmcs12->guest_ds_selector);
@@ -10599,6 +10596,12 @@  static void prepare_vmcs02_full(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	vmcs_writel(GUEST_TR_BASE, vmcs12->guest_tr_base);
 	vmcs_writel(GUEST_GDTR_BASE, vmcs12->guest_gdtr_base);
 	vmcs_writel(GUEST_IDTR_BASE, vmcs12->guest_idtr_base);
+}
+
+static void prepare_vmcs02_full(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
+			       bool from_vmentry)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
 	vmcs_write32(GUEST_SYSENTER_CS, vmcs12->guest_sysenter_cs);
 	vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
@@ -10714,6 +10717,14 @@  static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	vmcs_writel(GUEST_ES_BASE, vmcs12->guest_es_base);
 	vmcs_writel(GUEST_CS_BASE, vmcs12->guest_cs_base);
 
+	/*
+	 * Segment registers must be synchronized prior to any code that may
+	 * trigger a call to emulation_required()/guest_state_valid(), e.g.
+	 * vmx_set_cr0().
+	 */
+	if (vmx->nested.dirty_vmcs12)
+		prepare_vmcs02_segmentation(vmcs12);
+
 	/*
 	 * Not in vmcs02: GUEST_PML_INDEX, HOST_FS_SELECTOR, HOST_GS_SELECTOR,
 	 * HOST_FS_BASE, HOST_GS_BASE.