[10/10] kvm: vmx: handle VMEXIT from SGX Enclave

Message ID 20170508052434.3627-11-kai.huang@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Kai Huang May 8, 2017, 5:24 a.m. UTC
VMX adds a new bit to both exit_reason and the guest interruptibility state
(GUEST_INTERRUPTIBILITY_INFO) to indicate whether a VMEXIT happened inside an
enclave. Several instructions are also invalid, or behave differently, inside
an enclave according to the SDM. This patch handles those cases.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 arch/x86/include/asm/vmx.h      |   1 +
 arch/x86/include/uapi/asm/vmx.h |   1 +
 arch/x86/kvm/vmx.c              | 120 +++++++++++++++++++++++++++++++++-------
 3 files changed, 103 insertions(+), 19 deletions(-)

Comments

Paolo Bonzini May 8, 2017, 8:22 a.m. UTC | #1
On 08/05/2017 07:24, Kai Huang wrote:
> @@ -6977,6 +7042,31 @@ static __exit void hardware_unsetup(void)
>   */
>  static int handle_pause(struct kvm_vcpu *vcpu)
>  {
> +	/*
> +	 * SDM 39.6.3 PAUSE Instruction.
> +	 *
> +	 * SDM suggests, if VMEXIT caused by 'PAUSE-loop exiting', VMM should
> +	 * disable 'PAUSE-loop exiting' so PAUSE can be executed in Enclave
> +	 * again without further PAUSE-looping VMEXIT.
> +	 *
> +	 * SDM suggests, if VMEXIT caused by 'PAUSE exiting', VMM should disable
> +	 * 'PAUSE exiting' so PAUSE can be executed in Enclave again without
> +	 * further PAUSE VMEXIT.
> +	 */

How is PLE re-enabled?

I don't understand the interaction of the internal control registers
(paragraph 41.1.4) with VMX.  How can you migrate the VM between EENTER
and EEXIT?

In addition, paragraph 41.1.4 does not include the parts of CR_SAVE_FS*
and CR_SAVE_GS* (base, limit, access rights) and does not include
CR_ENCLAVE_ENTRY_IP.

Paolo

> +	if (vmx_exit_from_enclave(vcpu)) {
> +		u32 exec_ctl, secondary_exec_ctl;
> +
> +		exec_ctl = vmx_exec_control(to_vmx(vcpu));
> +		exec_ctl &= ~CPU_BASED_PAUSE_EXITING;
> +		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_ctl);
> +
> +		secondary_exec_ctl = vmx_secondary_exec_control(to_vmx(vcpu));
> +		secondary_exec_ctl &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
> +		vmcs_set_secondary_exec_control(secondary_exec_ctl);
> +
> +		return 1;
> +	}
> +
Kai Huang May 11, 2017, 9:34 a.m. UTC | #2
On 5/8/2017 8:22 PM, Paolo Bonzini wrote:
>
>
> On 08/05/2017 07:24, Kai Huang wrote:
>> @@ -6977,6 +7042,31 @@ static __exit void hardware_unsetup(void)
>>   */
>>  static int handle_pause(struct kvm_vcpu *vcpu)
>>  {
>> +	/*
>> +	 * SDM 39.6.3 PAUSE Instruction.
>> +	 *
>> +	 * SDM suggests, if VMEXIT caused by 'PAUSE-loop exiting', VMM should
>> +	 * disable 'PAUSE-loop exiting' so PAUSE can be executed in Enclave
>> +	 * again without further PAUSE-looping VMEXIT.
>> +	 *
>> +	 * SDM suggests, if VMEXIT caused by 'PAUSE exiting', VMM should disable
>> +	 * 'PAUSE exiting' so PAUSE can be executed in Enclave again without
>> +	 * further PAUSE VMEXIT.
>> +	 */
>
> How is PLE re-enabled?

Currently it will not be re-enabled. Perhaps we could re-enable it on a 
later VMEXIT, as long as that VMEXIT is not a PLE VMEXIT?
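
Something along these lines (untested; "ple_disabled" is a hypothetical
flag that would have to be added to struct vcpu_vmx and set where PLE is
disabled above) could re-enable PLE early in vmx_handle_exit() on the
first VMEXIT that does not come from an enclave:

	if (to_vmx(vcpu)->ple_disabled && !vmx_exit_from_enclave(vcpu)) {
		u32 secondary_exec_ctl;

		/*
		 * vmx_secondary_exec_control() recomputes the desired
		 * controls, including SECONDARY_EXEC_PAUSE_LOOP_EXITING
		 * when ple_gap is set, so writing them back restores PLE.
		 */
		secondary_exec_ctl = vmx_secondary_exec_control(to_vmx(vcpu));
		vmcs_set_secondary_exec_control(secondary_exec_ctl);

		/* ple_disabled is hypothetical, see above. */
		to_vmx(vcpu)->ple_disabled = false;
	}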

>
> I don't understand the interaction of the internal control registers
> (paragraph 41.1.4) with VMX.  How can you migrate the VM between EENTER
> and EEXIT?

The current SGX hardware architecture doesn't support live migration, as 
SGX's key architecture is not migratable. For example, some keys are 
persistent and bound to the hardware (sealing and attestation keys). 
Therefore, for now, if SGX is exposed to a guest, live migration is not 
supported.

>
> In addition, paragraph 41.1.4 does not include the parts of CR_SAVE_FS*
> and CR_SAVE_GS* (base, limit, access rights) and does not include
> CR_ENCLAVE_ENTRY_IP.

The CPU can exit an enclave via EEXIT, or via an Asynchronous Enclave Exit 
(AEX). All non-EEXIT enclave exits are referred to as AEX. When an AEX 
happens, a so-called "synthetic state" is loaded into the CPU registers to 
prevent software from observing any *secret* left in the CPU state at the 
time of the AEX. Exactly what goes into the "synthetic state" is described 
in SDM 40.3.

So in my understanding, the CPU won't put something like 
"CR_ENCLAVE_ENTRY_IP" into RIP. Rather, during an AEX the Asynchronous Exit 
Pointer (AEP), which points into normal memory, is pushed onto the stack, 
and IRET returns to the AEP to continue execution. The AEP typically points 
to a small piece of code that basically calls ERESUME so that we can go 
back into the enclave.
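
For illustration, a minimal AEP stub in the untrusted runtime looks
roughly like this (a simplified sketch, not taken from any particular
SDK; in my understanding the AEX synthetic state already loads RAX = 3,
the ENCLU[ERESUME] leaf, RBX = the TCS address and RCX = the AEP, so the
stub only has to execute ENCLU):

	/*
	 * Sketch of an AEP trampoline.  The registers are already set up
	 * by the AEX synthetic state, so a bare ENCLU resumes the enclave;
	 * a real runtime would add fault/signal handling around this.
	 */
	__asm__(
		".global sgx_aep\n"
		"sgx_aep:\n"
		"\tenclu\n");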

Hope my reply answered your questions?

Thanks,
-Kai

>
> Paolo
>
>> +	if (vmx_exit_from_enclave(vcpu)) {
>> +		u32 exec_ctl, secondary_exec_ctl;
>> +
>> +		exec_ctl = vmx_exec_control(to_vmx(vcpu));
>> +		exec_ctl &= ~CPU_BASED_PAUSE_EXITING;
>> +		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_ctl);
>> +
>> +		secondary_exec_ctl = vmx_secondary_exec_control(to_vmx(vcpu));
>> +		secondary_exec_ctl &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
>> +		vmcs_set_secondary_exec_control(secondary_exec_ctl);
>> +
>> +		return 1;
>> +	}
>> +
>
Kai Huang June 19, 2017, 5:02 a.m. UTC | #3
On 5/11/2017 9:34 PM, Huang, Kai wrote:
> 
> 
> On 5/8/2017 8:22 PM, Paolo Bonzini wrote:
>>
>>
>> On 08/05/2017 07:24, Kai Huang wrote:
>>> @@ -6977,6 +7042,31 @@ static __exit void hardware_unsetup(void)
>>>   */
>>>  static int handle_pause(struct kvm_vcpu *vcpu)
>>>  {
>>> +    /*
>>> +     * SDM 39.6.3 PAUSE Instruction.
>>> +     *
>>> +     * SDM suggests, if VMEXIT caused by 'PAUSE-loop exiting', VMM should
>>> +     * disable 'PAUSE-loop exiting' so PAUSE can be executed in Enclave
>>> +     * again without further PAUSE-looping VMEXIT.
>>> +     *
>>> +     * SDM suggests, if VMEXIT caused by 'PAUSE exiting', VMM should disable
>>> +     * 'PAUSE exiting' so PAUSE can be executed in Enclave again without
>>> +     * further PAUSE VMEXIT.
>>> +     */
>>
>> How is PLE re-enabled?
> 
> Currently it will not be enabled again. Probably we can re-enable it at 
> another VMEXIT, if that VMEXIT is not PLE VMEXIT?

Hi Paolo, all,

Sorry for the late reply.

Do you think it is feasible to turn PLE back on at a later VMEXIT, or at a 
VMEXIT that is not from an enclave?

Any suggestions on how I can improve this in the next version of the RFC?

> 
>>
>> I don't understand the interaction of the internal control registers
>> (paragraph 41.1.4) with VMX.  How can you migrate the VM between EENTER
>> and EEXIT?
> 
> Current SGX hardware architecture doesn't support live migration, as the 
> key architecture of SGX is not migratable. For example, some keys are 
> persistent and bound to hardware (sealing and attestation). Therefore 
> right now if SGX is exposed to guest, live migration is not supposed.

We recently had a discussion on this. We figured out that we can support 
SGX live migration with a workaround -- basically, the idea is to ignore 
the source VM's EPC and depend on the destination VM's SGX driver and 
userspace SW stack to handle the *sudden loss of EPC*. This will cause 
some inconsistency with hardware behavior and will depend on the driver's 
ability to cope. I'll elaborate on this in the next version of the design 
and RFC, and we can then discuss whether to support it or not (along with 
snapshot support). But we can also have a detailed discussion now if you 
want to start one.

Thanks,
-Kai
> 
>>
>> In addition, paragraph 41.1.4 does not include the parts of CR_SAVE_FS*
>> and CR_SAVE_GS* (base, limit, access rights) and does not include
>> CR_ENCLAVE_ENTRY_IP.
> 
> CPU can exit enclave via EEXIT, or by Asynchronous Enclave Exit (AEX). 
> All non-EEXIT enclave exit are referred as AEX. When AEX happens, a so 
> called "synthetic state" is created on CPU to prevent any software from 
> trying to observe *secret* from CPU status in AEX. What exactly will be 
> pushed in "synthetic state" is in SDM 40.3.
> 
> So in my understanding, CPU won't put something like 
> "CR_ENCLAVE_ENTRY_IP" to RIP. Actually during AEX, Asynchronous Exit 
> Pointer (AEP), which is in normal memory, will be pushed to stack and 
> IRET will return to AEP to continue to run. AEP typically points to some 
> small piece of code which basically calls ERESUME so that we can go back 
> to enclave to run.
> 
> Hope my reply answered your questions?
> 
> Thanks,
> -Kai
> 
>>
>> Paolo
>>
>>> +    if (vmx_exit_from_enclave(vcpu)) {
>>> +        u32 exec_ctl, secondary_exec_ctl;
>>> +
>>> +        exec_ctl = vmx_exec_control(to_vmx(vcpu));
>>> +        exec_ctl &= ~CPU_BASED_PAUSE_EXITING;
>>> +        vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_ctl);
>>> +
>>> +        secondary_exec_ctl = vmx_secondary_exec_control(to_vmx(vcpu));
>>> +        secondary_exec_ctl &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
>>> +        vmcs_set_secondary_exec_control(secondary_exec_ctl);
>>> +
>>> +        return 1;
>>> +    }
>>> +
>>
>
Radim Krčmář June 27, 2017, 3:29 p.m. UTC | #4
2017-06-19 17:02+1200, Huang, Kai:
> Hi Paolo, all,
> 
> Sorry for reply late.

I'm sorry as well,

> Do you think it is feasible to turn on PLE again on another further VMEXIT?
> Or another VMEXIT that is not from enclave?
> 
> Any suggestions so that I can improve in next version RFC?

KVM doesn't enable "PAUSE exiting"; it enables "PAUSE loop exiting".
The SDM recommends disabling "PAUSE exiting" because the VM exits
(fault-like) on every PAUSE and needs intervention in order to progress,
but the SDM doesn't say to disable "PAUSE loop exiting".

Being inside an enclave doesn't change the usefulness of PLE (yielding
the CPU to a task that isn't blocked), so I think it would be best to do
nothing with it, thanks.
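
For completeness: if a hypervisor did have "PAUSE exiting" enabled, the
enclave-specific part of handle_pause() would shrink to something like
the sketch below (untested, reusing the patch's vmx_exit_from_enclave()
helper), and PLE would be left alone:

	if (vmx_exit_from_enclave(vcpu)) {
		u32 exec_ctl = vmx_exec_control(to_vmx(vcpu));

		/*
		 * The exit is fault-like, so clearing 'PAUSE exiting' is
		 * what lets the enclave make progress again; PLE stays
		 * untouched.
		 */
		exec_ctl &= ~CPU_BASED_PAUSE_EXITING;
		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_ctl);

		return 1;
	}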
Kai Huang June 28, 2017, 10:22 p.m. UTC | #5
On 6/28/2017 3:29 AM, Radim Krčmář wrote:
> 2017-06-19 17:02+1200, Huang, Kai:
>> Hi Paolo, all,
>>
>> Sorry for reply late.
> 
> I'm sorry as well,
> 
>> Do you think it is feasible to turn on PLE again on another further VMEXIT?
>> Or another VMEXIT that is not from enclave?
>>
>> Any suggestions so that I can improve in next version RFC?
> 
> KVM doesn't enable "PAUSE exiting", it enables "PAUSE loop exiting".
> SDM recommends disabling "PAUSE exiting" because the VM exits
> (fault-like) on every PAUSE and needs intervention in order to progress,
> but SDM doesn't say to disable "PAUSE loop exiting".
> 
> Being inside an enclave doesn't change the usefulness of PLE (yielding
> the CPU to a task that isn't blocked), so I think it would be best to do
> nothing with it, thanks.

Hi Radim,

Thanks for the feedback. You are right; obviously I didn't read the SDM 
carefully enough. Will fix in the next version. :)

Thanks,
-Kai
>

Patch

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 2f24290b7f9d..ec91f68f4511 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -351,6 +351,7 @@  enum vmcs_field {
 #define GUEST_INTR_STATE_MOV_SS		0x00000002
 #define GUEST_INTR_STATE_SMI		0x00000004
 #define GUEST_INTR_STATE_NMI		0x00000008
+#define GUEST_INTR_STATE_ENCLAVE_INTR	0x00000010
 
 /* GUEST_ACTIVITY_STATE flags */
 #define GUEST_ACTIVITY_ACTIVE		0
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 2bcd967d5c83..6f18898c003d 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -26,6 +26,7 @@ 
 
 
 #define VMX_EXIT_REASONS_FAILED_VMENTRY         0x80000000
+#define VMX_EXIT_REASON_FROM_ENCLAVE		0x08000000
 
 #define EXIT_REASON_EXCEPTION_NMI       0
 #define EXIT_REASON_EXTERNAL_INTERRUPT  1
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b5f37982e975..1022295ba925 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2628,6 +2628,24 @@  static void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
 		vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, interruptibility);
 }
 
+static bool vmx_exit_from_enclave(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * There are two bits that indicate whether a VMEXIT happened from an
+	 * enclave -- bit 27 in VM_EXIT_REASON, and bit 4 in
+	 * GUEST_INTERRUPTIBILITY_INFO.  Currently the latter is used, but
+	 * note that we never clear this bit ourselves; we assume hardware
+	 * clears it on any VMEXIT that does not come from an enclave, which
+	 * should be the case.
+	 *
+	 * Alternatively, we could use bit 27 in VM_EXIT_REASON by adding a
+	 * bool to struct vcpu_vmx, setting it in vmx_handle_exit() when that
+	 * bit is set, and clearing it right before vmentry to the guest.
+	 */
+	return vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
+		GUEST_INTR_STATE_ENCLAVE_INTR ? true : false;
+}
+
 static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
 	unsigned long rip;
@@ -5457,6 +5475,25 @@  static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 	return exec_control;
 }
 
+static void vmcs_set_secondary_exec_control(u32 new_ctl)
+{
+	/*
+	 * These bits in the secondary execution controls field
+	 * are dynamic, the others are mostly based on the hypervisor
+	 * architecture and the guest's CPUID.  Do not touch the
+	 * dynamic bits.
+	 */
+	u32 mask =
+		SECONDARY_EXEC_SHADOW_VMCS |
+		SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
+		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+
+	u32 cur_ctl = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+
+	vmcs_write32(SECONDARY_VM_EXEC_CONTROL,
+		     (new_ctl & ~mask) | (cur_ctl & mask));
+}
+
 static void ept_set_mmio_spte_mask(void)
 {
 	/*
@@ -6305,6 +6342,12 @@  static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
 
 static int handle_cpuid(struct kvm_vcpu *vcpu)
 {
+	/* CPUID is invalid in enclave */
+	if (vmx_exit_from_enclave(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
 	return kvm_emulate_cpuid(vcpu);
 }
 
@@ -6378,6 +6421,16 @@  static int handle_vmcall(struct kvm_vcpu *vcpu)
 
 static int handle_invd(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * SDM 39.6.5 INVD Handling when Enclaves Are Enabled.
+	 *
+	 * Spec says INVD causes #GP if EPC is enabled.
+	 */
+	if (vmx_exit_from_enclave(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
 	return emulate_instruction(vcpu, 0) == EMULATE_DONE;
 }
 
@@ -6399,6 +6452,18 @@  static int handle_rdpmc(struct kvm_vcpu *vcpu)
 
 static int handle_wbinvd(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * SDM 39.6.5 INVD Handling when Enclaves Are Enabled.
+	 *
+	 * Spec says INVD causes #GP if EPC is enabled.
+	 *
+	 * FIXME: Does this also apply to WBINVD?
+	 */
+	if (vmx_exit_from_enclave(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
 	return kvm_emulate_wbinvd(vcpu);
 }
 
@@ -6977,6 +7042,31 @@  static __exit void hardware_unsetup(void)
  */
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * SDM 39.6.3 PAUSE Instruction.
+	 *
+	 * The SDM suggests that, if the VMEXIT was caused by 'PAUSE-loop
+	 * exiting', the VMM should disable 'PAUSE-loop exiting' so PAUSE can
+	 * be executed in the enclave again without further PAUSE-loop VMEXITs.
+	 *
+	 * The SDM suggests that, if the VMEXIT was caused by 'PAUSE exiting',
+	 * the VMM should disable 'PAUSE exiting' so PAUSE can be executed in
+	 * the enclave again without further PAUSE VMEXITs.
+	 */
+	if (vmx_exit_from_enclave(vcpu)) {
+		u32 exec_ctl, secondary_exec_ctl;
+
+		exec_ctl = vmx_exec_control(to_vmx(vcpu));
+		exec_ctl &= ~CPU_BASED_PAUSE_EXITING;
+		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_ctl);
+
+		secondary_exec_ctl = vmx_secondary_exec_control(to_vmx(vcpu));
+		secondary_exec_ctl &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+		vmcs_set_secondary_exec_control(secondary_exec_ctl);
+
+		return 1;
+	}
+
 	if (ple_gap)
 		grow_ple_window(vcpu);
 
@@ -8876,6 +8966,17 @@  static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 		return 0;
 	}
 
+	/* Bit 27 of exit_reason will be set if VMEXIT is from SGX enclave. */
+	if (exit_reason & VMX_EXIT_REASON_FROM_ENCLAVE) {
+		/*
+		 * Clear bit 27, otherwise the exit_reason-based lookup in
+		 * kvm_vmx_exit_handlers below would fail.  From here on we
+		 * rely on bit 4 of GUEST_INTERRUPTIBILITY_INFO to determine
+		 * whether the VMEXIT came from an enclave.
+		 */
+		exit_reason &= ~VMX_EXIT_REASON_FROM_ENCLAVE;
+	}
+
 	/*
 	 * Note:
 	 * Do not try to fix EXIT_REASON_EPT_MISCONFIG if it caused by
@@ -9768,25 +9869,6 @@  static int vmx_get_lpage_level(void)
 		return PT_PDPE_LEVEL;
 }
 
-static void vmcs_set_secondary_exec_control(u32 new_ctl)
-{
-	/*
-	 * These bits in the secondary execution controls field
-	 * are dynamic, the others are mostly based on the hypervisor
-	 * architecture and the guest's CPUID.  Do not touch the
-	 * dynamic bits.
-	 */
-	u32 mask =
-		SECONDARY_EXEC_SHADOW_VMCS |
-		SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
-		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-
-	u32 cur_ctl = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
-
-	vmcs_write32(SECONDARY_VM_EXEC_CONTROL,
-		     (new_ctl & ~mask) | (cur_ctl & mask));
-}
-
 /*
  * Generate MSR_IA32_VMX_CR{0,4}_FIXED1 according to CPUID. Only set bits
  * (indicating "allowed-1") if they are supported in the guest's CPUID.