
[08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support

Message ID 20170508052434.3627-9-kai.huang@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Kai Huang May 8, 2017, 5:24 a.m. UTC
If SGX runtime launch control is enabled on the host (IA32_FEATURE_CONTROL[17]
is set), KVM can support running multiple guests, each running an LE signed
with a different RSA pubkey. KVM traps IA32_SGXLEPUBKEYHASHn MSR writes from
the guest and caches the values in the vcpu internally, and when the vcpu is
scheduled in, KVM writes those values to the real IA32_SGXLEPUBKEYHASHn MSRs.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 arch/x86/include/asm/msr-index.h |   5 ++
 arch/x86/kvm/vmx.c               | 123 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 128 insertions(+)

Comments

Kai Huang May 12, 2017, 12:32 a.m. UTC | #1
Hi Paolo/Radim,

I'd like to start a discussion regarding IA32_SGXLEPUBKEYHASHn handling 
here. I have also copied the SGX driver mailing list (which it looks like 
I should have done when sending out this series, sorry) and Sean, Haim 
and Haitao from Intel so we can have a better discussion.

Basically IA32_SGXLEPUBKEYHASHn (or, more generally speaking, SGX Launch 
Control) allows us to run different Launch Enclaves (LEs) signed with 
different RSA keys. An LE can only be initialized successfully 
(specifically, by EINIT) when the value of IA32_SGXLEPUBKEYHASHn matches 
the hash of the key used to sign the LE. So before calling EINIT for an 
LE, we have to make sure IA32_SGXLEPUBKEYHASHn contains the matching 
value. One important fact is that only EINIT uses IA32_SGXLEPUBKEYHASHn; 
after EINIT, other ENCLS/ENCLU leaves (e.g., EGETKEY) run correctly even 
if the MSRs are changed to other values.

To support KVM guests running their own LEs, KVM traps 
IA32_SGXLEPUBKEYHASHn MSR writes and caches the values in the vcpu 
internally, and KVM needs to write the cached values to the real MSRs 
before the guest runs EINIT. The problem is that on the host side we 
also run LEs, probably multiple LEs (it seems the SGX driver currently 
plans to run a single in-kernel LE, but I am not familiar with the 
details, and IMO we should not assume the host will only run one LE). 
Therefore, if KVM changes the physical MSRs for a guest, the host may 
not be able to run its LE, as it may not write the right values back to 
the MSRs. There are two approaches to make the host and KVM guests work 
together:

1. Anyone who wants to run an LE is responsible for writing the correct 
values to IA32_SGXLEPUBKEYHASHn.

My current patch is based on this assumption. For a KVM guest, naturally, 
we write the cached values to the real MSRs when the vcpu is scheduled 
in. For the host, the SGX driver should write its own values to the MSRs 
when it performs EINIT for an LE.

One argument against this approach is that a KVM guest should never have 
an impact on the host side, meaning the host should not need to be aware 
of such MSR changes. In that case, if the host does some performance 
optimization that avoids actively updating the MSRs, the physical MSRs 
may contain incorrect values when the host runs EINIT. Instead, KVM 
should be responsible for restoring the original MSRs, which brings us 
to approach 2 below.

2. KVM should restore the MSRs after changing them for the guest.

To do this, the simplest way for KVM is: 1) save the original physical 
MSR values and write the guest's values before VMENTRY; 2) on VMEXIT, 
write the original values back to the physical MSRs, as sketched below.
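
For illustration, a minimal sketch of what approach 2 could look like 
(names are hypothetical, and host_hash would really need to be per-cpu 
state saved around each VMENTRY):

/*
 * Hypothetical sketch of approach 2: save the host's hash MSRs and
 * load the guest's values before VMENTRY, then restore the host's
 * values on VMEXIT.  The four MSRs are architecturally contiguous
 * (0x8c..0x8f), so they can be walked by index.
 */
static u64 host_hash[4]; /* would need to be per-cpu in practice */

static void sgx_lepubkeyhash_switch_to_guest(struct vcpu_vmx *vmx)
{
	int i;

	for (i = 0; i < 4; i++) {
		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, host_hash[i]);
		wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i,
		       vmx->msr_ia32_sgxlepubkeyhash[i]);
	}
}

static void sgx_lepubkeyhash_restore_host(void)
{
	int i;

	for (i = 0; i < 4; i++)
		wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, host_hash[i]);
}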

To me this approach is also arguable, as a KVM guest is actually just a 
normal process (OK, maybe not that normal), and a KVM guest should be 
treated the same as other processes that run LEs, which means approach 1 
is also reasonable.

And approach 2 will have a bigger performance impact than approach 1 for 
KVM, as it reads/writes IA32_SGXLEPUBKEYHASHn during each 
VMEXIT/VMENTRY, while approach 1 only writes the MSRs when a vcpu is 
scheduled in, which is less frequent.

I'd like to hear all your comments and hopefully we can have some 
agreement on this.

Another thing, not directly related to selecting an approach above: 
whether we choose approach 1 or approach 2, KVM still suffers the 
performance loss of writing (and/or reading) the IA32_SGXLEPUBKEYHASHn 
MSRs, either when a vcpu is scheduled in or during each VMEXIT/VMENTRY. 
Given that IA32_SGXLEPUBKEYHASHn is only used by EINIT, we can actually 
optimize this by trapping EINIT from the guest and only updating the 
MSRs on an EINIT VMEXIT, as sketched below. This works for approach 1, 
but for approach 2 we would have to do something tricky during 
VMEXIT/VMENTRY to check whether the MSRs have been changed by an EINIT 
VMEXIT, and only restore the original values if an EINIT VMEXIT has 
happened. The guest's LE continues to run even if the physical MSRs are 
changed back to the original values.
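
As a rough sketch (the handler name and the ENCLS-exiting setup are 
hypothetical), such an EINIT exit handler might look like:

/*
 * Hypothetical sketch: with ENCLS[EINIT] exiting enabled, sync the
 * hash MSRs only when the guest actually runs EINIT, instead of on
 * every vcpu sched-in or every VMEXIT/VMENTRY.
 */
static int handle_encls_einit(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	int i;

	/* Make the physical MSRs match the guest's cached values. */
	for (i = 0; i < 4; i++)
		wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i,
		       vmx->msr_ia32_sgxlepubkeyhash[i]);

	/*
	 * Then let the guest's EINIT proceed, either by re-executing
	 * it in the guest (e.g. via MTF) or by running ENCLS[EINIT]
	 * on the guest's behalf; see the discussion below.
	 */
	return 1;
}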

But trapping ENCLS requires either 1) KVM running ENCLS on behalf of the 
guest, in which case we have to reconstruct and remap the guest's ENCLS 
parameters and skip the ENCLS for the guest; or 2) using MTF to let the 
guest run ENCLS again, while still trapping ENCLS. Either case would 
introduce more complicated and potentially more buggy code, and I don't 
think we should do this just to save the time of writing the MSRs. If we 
need to turn on ENCLS VMEXIT anyway, we can optimize this then.

Thank you in advance.

Thanks,
-Kai

On 5/8/2017 5:24 PM, Kai Huang wrote:
> If SGX runtime launch control is enabled on the host (IA32_FEATURE_CONTROL[17]
> is set), KVM can support running multiple guests, each running an LE signed
> with a different RSA pubkey. KVM traps IA32_SGXLEPUBKEYHASHn MSR writes from
> the guest and caches the values in the vcpu internally, and when the vcpu is
> scheduled in, KVM writes those values to the real IA32_SGXLEPUBKEYHASHn MSRs.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>  arch/x86/include/asm/msr-index.h |   5 ++
>  arch/x86/kvm/vmx.c               | 123 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 128 insertions(+)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e3770f570bb9..70482b951b0f 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -417,6 +417,11 @@
>  #define MSR_IA32_TSC_ADJUST             0x0000003b
>  #define MSR_IA32_BNDCFGS		0x00000d90
>
> +#define MSR_IA32_SGXLEPUBKEYHASH0	0x0000008c
> +#define MSR_IA32_SGXLEPUBKEYHASH1	0x0000008d
> +#define MSR_IA32_SGXLEPUBKEYHASH2	0x0000008e
> +#define MSR_IA32_SGXLEPUBKEYHASH3	0x0000008f
> +
>  #define MSR_IA32_XSS			0x00000da0
>
>  #define FEATURE_CONTROL_LOCKED				(1<<0)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a16539594a99..c96332b9dd44 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -656,6 +656,9 @@ struct vcpu_vmx {
>  	 */
>  	u64 msr_ia32_feature_control;
>  	u64 msr_ia32_feature_control_valid_bits;
> +
> +	/* SGX Launch Control public key hash */
> +	u64 msr_ia32_sgxlepubkeyhash[4];
>  };
>
>  enum segment_cache_field {
> @@ -2244,6 +2247,70 @@ static void decache_tsc_multiplier(struct vcpu_vmx *vmx)
>  	vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
>  }
>
> +static bool cpu_sgx_lepubkeyhash_writable(void)
> +{
> +	u64 val, sgx_lc_enabled_mask = (FEATURE_CONTROL_LOCKED |
> +			FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE);
> +
> +	rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
> +
> +	return ((val & sgx_lc_enabled_mask) == sgx_lc_enabled_mask);
> +}
> +
> +static bool vmx_sgx_lc_disabled_in_bios(struct kvm_vcpu *vcpu)
> +{
> +	return (to_vmx(vcpu)->msr_ia32_feature_control & FEATURE_CONTROL_LOCKED)
> +		&& (!(to_vmx(vcpu)->msr_ia32_feature_control &
> +				FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE));
> +}
> +
> +#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH0		0xa6053e051270b7ac
> +#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH1	        0x6cfbe8ba8b3b413d
> +#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH2		0xc4916d99f2b3735d
> +#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH3		0xd4f8c05909f9bb3b
> +
> +static void vmx_sgx_init_lepubkeyhash(struct kvm_vcpu *vcpu)
> +{
> +	u64 h0, h1, h2, h3;
> +
> +	/*
> +	 * If runtime launch control is enabled (IA32_SGXLEPUBKEYHASHn is
> +	 * writable), we set guest's default value to be Intel's default
> +	 * hash (which is fixed value and can be hard-coded). Otherwise,
> +	 * guest can only use machine's IA32_SGXLEPUBKEYHASHn so set guest's
> +	 * default to that.
> +	 */
> +	if (cpu_sgx_lepubkeyhash_writable()) {
> +		h0 = SGX_INTEL_DEFAULT_LEPUBKEYHASH0;
> +		h1 = SGX_INTEL_DEFAULT_LEPUBKEYHASH1;
> +		h2 = SGX_INTEL_DEFAULT_LEPUBKEYHASH2;
> +		h3 = SGX_INTEL_DEFAULT_LEPUBKEYHASH3;
> +	}
> +	else {
> +		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH0, h0);
> +		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH1, h1);
> +		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH2, h2);
> +		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH3, h3);
> +	}
> +
> +	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0] = h0;
> +	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1] = h1;
> +	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2] = h2;
> +	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3] = h3;
> +}
> +
> +static void vmx_sgx_lepubkeyhash_load(struct kvm_vcpu *vcpu)
> +{
> +	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0,
> +			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0]);
> +	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH1,
> +			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1]);
> +	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH2,
> +			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2]);
> +	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH3,
> +			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3]);
> +}
> +
>  /*
>   * Switches to specified vcpu, until a matching vcpu_put(), but assumes
>   * vcpu mutex is already taken.
> @@ -2316,6 +2383,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>
>  	vmx_vcpu_pi_load(vcpu, cpu);
>  	vmx->host_pkru = read_pkru();
> +
> +	/*
> +	 * Load guest's SGX LE pubkey hash if runtime launch control is
> +	 * enabled.
> +	 */
> +	if (guest_cpuid_has_sgx_launch_control(vcpu) &&
> +			cpu_sgx_lepubkeyhash_writable())
> +		vmx_sgx_lepubkeyhash_load(vcpu);
>  }
>
>  static void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
> @@ -3225,6 +3300,19 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_FEATURE_CONTROL:
>  		msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control;
>  		break;
> +	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
> +		/*
> +		 * SDM 35.1 Model-Specific Registers, table 35-2.
> +		 * Read permitted if CPUID.0x12.0:EAX[0] = 1. (We have
> +		 * guaranteed this will be true if guest_cpuid_has_sgx
> +		 * is true.)
> +		 */
> +		if (!guest_cpuid_has_sgx(vcpu))
> +			return 1;
> +		msr_info->data =
> +			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_info->index -
> +			MSR_IA32_SGXLEPUBKEYHASH0];
> +		break;
>  	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
>  		if (!nested_vmx_allowed(vcpu))
>  			return 1;
> @@ -3344,6 +3432,37 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		 * SGX has been enabled in BIOS before using SGX.
>  		 */
>  		break;
> +	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
> +		/*
> +		 * SDM 35.1 Model-Specific Registers, table 35-2.
> +		 * - If CPUID.0x7.0:ECX[30] = 1, FEATURE_CONTROL[17] is
> +		 * available.
> +		 * - Write permitted if CPUID.0x12.0:EAX[0] = 1 &&
> +		 * FEATURE_CONTROL[17] = 1 && FEATURE_CONTROL[0] = 1.
> +		 */
> +		if (!guest_cpuid_has_sgx(vcpu) ||
> +				!guest_cpuid_has_sgx_launch_control(vcpu))
> +			return 1;
> +		/*
> +		 * Don't let userspace set guest's IA32_SGXLEPUBKEYHASHn,
> +		 * if machine's IA32_SGXLEPUBKEYHASHn cannot be changed at
> +		 * runtime. Note to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash are
> +		 * set to default in vmx_create_vcpu therefore guest is able
> +		 * to get the machine's IA32_SGXLEPUBKEYHASHn by rdmsr in
> +		 * guest.
> +		 */
> +		if (!cpu_sgx_lepubkeyhash_writable())
> +			return 1;
> +		/*
> +		 * If guest's FEATURE_CONTROL[17] is not set, guest's
> +		 * IA32_SGXLEPUBKEYHASHn are not writeable from guest.
> +		 */
> +		if (vmx_sgx_lc_disabled_in_bios(vcpu) &&
> +				!msr_info->host_initiated)
> +			return 1;
> +		to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_index -
> +			MSR_IA32_SGXLEPUBKEYHASH0] = data;
> +		break;
>  	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
>  		if (!msr_info->host_initiated)
>  			return 1; /* they are read-only */
> @@ -9305,6 +9424,10 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>  		vmx->nested.vpid02 = allocate_vpid();
>  	}
>
> +	/* Set vcpu's default IA32_SGXLEPUBKEYHASHn */
> +	if (enable_sgx && boot_cpu_has(X86_FEATURE_SGX_LAUNCH_CONTROL))
> +		vmx_sgx_init_lepubkeyhash(&vmx->vcpu);
> +
>  	vmx->nested.posted_intr_nv = -1;
>  	vmx->nested.current_vmptr = -1ull;
>  	vmx->nested.current_vmcs12 = NULL;
>
Andy Lutomirski May 12, 2017, 3:28 a.m. UTC | #2
[resending due to some kind of kernel.org glitch -- sorry if anyone
gets duplicates]

On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> My current patch is based on this assumption. For KVM guest, naturally, we
> will write the cached value to real MSRs when vcpu is scheduled in. For
> host, SGX driver should write its own value to MSRs when it performs EINIT
> for LE.

This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
would propose a totally different solution:

Have a percpu variable that stores the current SGXLEPUBKEYHASH along
with whatever lock is needed (probably just a mutex).  Users of EINIT
will take the mutex, compare the percpu variable to the desired value,
and, if it's different, do WRMSR and update the percpu variable.

KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
support the same handling as the host.  There is no action required at
all on KVM guest entry and exit.
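
A minimal sketch of this proposal, with hypothetical names (the helper 
would be exported from arch/x86 and shared by the SGX driver and KVM):

static DEFINE_MUTEX(sgx_lepubkeyhash_mutex);
static DEFINE_PER_CPU(u64 [4], sgx_lepubkeyhash_cache);

/*
 * Callers take the mutex, call this to sync the MSRs, run
 * ENCLS[EINIT], and only then drop the mutex.  The caller must also
 * keep the task on this CPU (e.g. disable preemption) between the
 * sync and the EINIT.
 */
static void sgx_sync_lepubkeyhash(const u64 desired[4])
{
	u64 *cache = per_cpu(sgx_lepubkeyhash_cache, smp_processor_id());
	int i;

	lockdep_assert_held(&sgx_lepubkeyhash_mutex);

	for (i = 0; i < 4; i++) {
		/* WRMSR only when the cached value is stale. */
		if (cache[i] != desired[i]) {
			wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, desired[i]);
			cache[i] = desired[i];
		}
	}
}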

FWIW, I think that KVM will, in the long run, want to trap EINIT for
other reasons: someone is going to want to implement policy for what
enclaves are allowed that applies to guests as well as the host.
Also, some day Intel may fix its architectural design flaw [1] by
allowing EINIT to personalize the enclave's keying, and, if it's done
by a new argument to EINIT instead of an MSR, KVM will have to trap
EINIT to handle it.

>
> One argument against this approach is KVM guest should never have impact on
> host side, meaning host should not be aware of such MSR change

As a somewhat generic comment, I don't like this approach to KVM
development.  KVM mucks with lots of important architectural control
registers, and, in all too many cases, it tries to do so independently
of the other arch/x86 code.  This ends up causing all kinds of grief.

Can't KVM and the real x86 arch code cooperate for real?  The host and
the KVM code are in arch/x86 in the same source tree.

>
> 2. KVM should restore MSRs after changing for guest.

No, this is IMO silly.  Don't restore it on each exit, in the user
return hook, or anywhere else.  Just make sure the host knows it was
changed.

> Another thing is, not quite related to selecting which approach above, and
> either we choose approach 1 or approach 2, KVM still suffers the performance
> loss of writing (and/or reading) to IA32_SGXLEPUBKEYHASHn MSRs, either when
> vcpu scheduled in or during each VMEXIT/VMENTRY. Given the fact that the
> IA32_SGXLEPUBKEYHASHn will only be used by EINIT, We can actually do some
> optimization by trapping EINIT from guest and only update MSRs in EINIT
> VMEXIT.

Yep.

> But trapping ENCLS requires either 1) KVM to run ENCLS on behalf of guest,
> in which case we have to reconstruct and remap guest's ENCLS parameters and
> skip the ENCLS for guest; 2) using MTF to let guest to run ENCLS again,
> while still trapping ENCLS.

I would advocate for the former approach.  (But you can't remap the
parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
don't see why this is any more complicated than emulating any other
instruction that accesses memory.)

If necessary for some reason, trap EINIT when the SGXLEPUBKEYHASH is
wrong and then clear the exit flag once the MSRs are in sync.  You'll
need to be careful to avoid races in which the host's value leaks into
the guest.  I think you'll find that this is more complicated, less
flexible, and less performant than just handling ENCLS[EINIT] directly
in the host.

[1] Guests that steal sealed data from each other or from the host can
manipulate that data without compromising the hypervisor by simply
loading the same enclave that its rightful owner would use.  If you're
trying to use SGX to protect your crypto credentials so that, if
stolen, they can't be used outside the guest, I would consider this to
be a major flaw.  It breaks the security model in a multi-tenant cloud
situation.  I've complained about it before.
Kai Huang May 12, 2017, 4:56 a.m. UTC | #3
Hi Andy,

Thanks for your helpful comments! See my reply below, and some questions 
as well.

On 5/12/2017 3:28 PM, Andy Lutomirski wrote:
> [resending due to some kind of kernel.org glitch -- sorry if anyone
> gets duplicates]
>
> On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>> My current patch is based on this assumption. For KVM guest, naturally, we
>> will write the cached value to real MSRs when vcpu is scheduled in. For
>> host, SGX driver should write its own value to MSRs when it performs EINIT
>> for LE.
>
> This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
> would propose a totally different solution:

I am not sure whether the cost of writing to 4 MSRs would be *extremely* 
slow, as when a vcpu is scheduled in, KVM is already doing vmcs_load, 
writing to several MSRs, etc.

>
> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> with whatever lock is needed (probably just a mutex).  Users of EINIT
> will take the mutex, compare the percpu variable to the desired value,
> and, if it's different, do WRMSR and update the percpu variable.
>
> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
> support the same handling as the host.  There is no action required at
> all on KVM guest entry and exit.

This is doable, but the SGX driver needs to do those things and expose 
interfaces for KVM to use. In terms of the percpu data, it is nice to 
have, but I am not sure whether it is mandatory, as IMO EINIT is not even 
on a performance-critical path. We can simply read the old values out of 
the MSRs and compare whether the old equals the new, as in the sketch 
below.

>
> FWIW, I think that KVM will, in the long run, want to trap EINIT for
> other reasons: someone is going to want to implement policy for what
> enclaves are allowed that applies to guests as well as the host.

I am not convinced that "what enclaves are allowed" on the host would 
apply to the guest. Can you elaborate? I mean, in general virtualization 
just focuses on emulating hardware behavior. If a native machine is able 
to run any LE, the virtual machine should be able to as well (of course, 
with the guest's IA32_FEATURE_CONTROL[bit 17] set).

> Also, some day Intel may fix its architectural design flaw [1] by
> allowing EINIT to personalize the enclave's keying, and, if it's done
> by a new argument to EINIT instead of an MSR, KVM will have to trap
> EINIT to handle it.

It looks like this flaw is not the same issue as the one above (host 
enclave policy applying to the guest)?

>
>>
>> One argument against this approach is KVM guest should never have impact on
>> host side, meaning host should not be aware of such MSR change
>
> As a somewhat generic comment, I don't like this approach to KVM
> development.  KVM mucks with lots of important architectural control
> registers, and, in all too many cases, it tries to do so independently
> of the other arch/x86 code.  This ends up causing all kinds of grief.
>
> Can't KVM and the real x86 arch code cooperate for real?  The host and
> the KVM code are in arch/x86 in the same source tree.

Currently the SGX driver on the host, which is pretty much 
self-contained, implements all SGX-related stuff.

>
>>
>> 2. KVM should restore MSRs after changing for guest.
>
> No, this is IMO silly.  Don't restore it on each exit, in the user
> return hook, or anywhere else.  Just make sure the host knows it was
> changed.
>
>> Another thing is, not quite related to selecting which approach above, and
>> either we choose approach 1 or approach 2, KVM still suffers the performance
>> loss of writing (and/or reading) to IA32_SGXLEPUBKEYHASHn MSRs, either when
>> vcpu scheduled in or during each VMEXIT/VMENTRY. Given the fact that the
>> IA32_SGXLEPUBKEYHASHn will only be used by EINIT, We can actually do some
>> optimization by trapping EINIT from guest and only update MSRs in EINIT
>> VMEXIT.
>
> Yep.
>
>> But trapping ENCLS requires either 1) KVM to run ENCLS on behalf of guest,
>> in which case we have to reconstruct and remap guest's ENCLS parameters and
>> skip the ENCLS for guest; 2) using MTF to let guest to run ENCLS again,
>> while still trapping ENCLS.
>
> I would advocate for the former approach.  (But you can't remap the
> parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
> don't see why this is any more complicated than emulating any other
> instruction that accesses memory.)

No, you cannot just copy. Because all addresses in the guest's ENCLS 
parameters are guest virtual addresses, we cannot use them to execute 
ENCLS in KVM. If any guest virtual address is used in the ENCLS 
parameters, for example PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc., we 
have to remap it to a KVM virtual address.

Btw, what is the TOCTOU issue? Would you also elaborate on the locking 
issue?

>
> If necessary for some reason, trap EINIT when the SGXLEPUBKEYHASH is
> wrong and then clear the exit flag once the MSRs are in sync.  You'll
> need to be careful to avoid races in which the host's value leaks into
> the guest.  I think you'll find that this is more complicated, less
> flexible, and less performant than just handling ENCLS[EINIT] directly
> in the host.

Sorry, I don't quite follow this part. Why would the host's value leak 
into the guest? I suppose the *value* means the host's 
IA32_SGXLEPUBKEYHASHn? The guest's MSR reads/writes are always trapped 
and emulated by KVM.

>
> [1] Guests that steal sealed data from each other or from the host can
> manipulate that data without compromising the hypervisor by simply
> loading the same enclave that its rightful owner would use.  If you're
> trying to use SGX to protect your crypto credentials so that, if
> stolen, they can't be used outside the guest, I would consider this to
> be a major flaw.  It breaks the security model in a multi-tenant cloud
> situation.  I've complained about it before.
>

It looks like potentially only the guest's IA32_SGXLEPUBKEYHASHn may be 
leaked? In that case, even if it is leaked, it looks like we cannot dig 
anything out beyond the hash value?

Thanks,
-Kai
Andy Lutomirski May 12, 2017, 6:11 a.m. UTC | #4
On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> I am not sure whether the cost of writing to 4 MSRs would be *extremely*
> slow, as when vcpu is schedule in, KVM is already doing vmcs_load, writing
> to several MSRs, etc.

I'm speculating that these MSRs may be rather unoptimized and hence
unusually slow.

>
>>
>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>> will take the mutex, compare the percpu variable to the desired value,
>> and, if it's different, do WRMSR and update the percpu variable.
>>
>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
>> support the same handling as the host.  There is no action required at
>> all on KVM guest entry and exit.
>
>
> This is doable, but SGX driver needs to do those things and expose
> interfaces for KVM to use. In terms of the percpu data, it is nice to have,
> but I am not sure whether it is mandatory, as IMO EINIT is not even in
> performance critical path. We can simply read old value from MSRs out and
> compare whether the old equals to the new.

I think the SGX driver should probably live in arch/x86, and the
interface could be a simple percpu variable that is exported (from the
main kernel image, not from a module).

>
>>
>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>> other reasons: someone is going to want to implement policy for what
>> enclaves are allowed that applies to guests as well as the host.
>
>
> I am not very convinced why "what enclaves are allowed" in host would apply
> to guest. Can you elaborate? I mean in general virtualization just focus
> emulating hardware behavior. If a native machine is able to run any LE, the
> virtual machine should be able to as well (of course, with guest's
> IA32_FEATURE_CONTROL[bit 17] set).

I strongly disagree.  I can imagine two classes of sensible policies
for launch control:

1. Allow everything.  This seems quite sensible to me.

2. Allow some things, and make sure that VMs have at least as
restrictive a policy as host root has.  After all, what's the point of
restricting enclaves in the host if host code can simply spawn a
little VM to run otherwise-disallowed enclaves?

>
>> Also, some day Intel may fix its architectural design flaw [1] by
>> allowing EINIT to personalize the enclave's keying, and, if it's done
>> by a new argument to EINIT instead of an MSR, KVM will have to trap
>> EINIT to handle it.
>
>
> Looks this flaw is not the same issue as above (host enclave policy applies
> to guest)?

It's related.  Without this flaw, it might make sense to apply a looser
policy in the guest than in the host.  With this flaw, I think your
policy fails to have any real effect if you don't enforce it on
guests.

>
>>
>>>
>>> One argument against this approach is KVM guest should never have impact
>>> on
>>> host side, meaning host should not be aware of such MSR change
>>
>>
>> As a somewhat generic comment, I don't like this approach to KVM
>> development.  KVM mucks with lots of important architectural control
>> registers, and, in all too many cases, it tries to do so independently
>> of the other arch/x86 code.  This ends up causing all kinds of grief.
>>
>> Can't KVM and the real x86 arch code cooperate for real?  The host and
>> the KVM code are in arch/x86 in the same source tree.
>
>
> Currently on host SGX driver, which is pretty much self-contained,
> implements all SGX related stuff.

I will probably NAK this if it comes my way for inclusion upstream.
Just because it can be self-contained doesn't mean it should be
self-contained.

>>
>> I would advocate for the former approach.  (But you can't remap the
>> parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
>> don't see why this is any more complicated than emulating any other
>> instruction that accesses memory.)
>
>
> No you cannot just copy. Because all address in guest's ENCLS parameters are
> guest's virtual address, we cannot use them to execute ENCLS in KVM. If any
> guest virtual addresses is used in ENCLS parameters, for example,
> PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's
> virtual address.
>
> Btw, what is TOCTOU issue? would you also elaborate locking issue?

I was partially mis-remembering how this worked.  It looks like
SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be
mapped.  If KVM applied some policy to the launchable enclaves, it
would want to make sure that it only looks at fields that are copied
to make sure that the enclave that gets launched is the one it
verified.  The locking issue I'm imagining is that the SECS (or
whatever else might be mapped) doesn't disappear and get reused for
something else while it's mapped in the host.  Presumably KVM has an
existing mechanism for this, but maybe SECS is special because it's
not quite normal memory IIRC.

>
>>
>> If necessary for some reason, trap EINIT when the SGXLEPUBKEYHASH is
>> wrong and then clear the exit flag once the MSRs are in sync.  You'll
>> need to be careful to avoid races in which the host's value leaks into
>> the guest.  I think you'll find that this is more complicated, less
>> flexible, and less performant than just handling ENCLS[EINIT] directly
>> in the host.
>
>
> Sorry I don't quite follow this part. Why would host's value leaks into
> guest? I suppose the *value* means host's IA32_SGXLEPUBKEYHASHn? guest's MSR
> read/write is always trapped and emulated by KVM.

You'd need to make sure that this sequence of events doesn't happen:

 - Guest does EINIT and it exits.
 - Host updates the MSRs and the ENCLS-exiting bitmap.
 - Guest is preempted before it retries EINIT.
 - A different host thread launches an enclave, thus changing the MSRs.
 - Guest resumes and runs EINIT without exiting with the wrong MSR values.

>
>>
>> [1] Guests that steal sealed data from each other or from the host can
>> manipulate that data without compromising the hypervisor by simply
>> loading the same enclave that its rightful owner would use.  If you're
>> trying to use SGX to protect your crypto credentials so that, if
>> stolen, they can't be used outside the guest, I would consider this to
>> be a major flaw.  It breaks the security model in a multi-tenant cloud
>> situation.  I've complained about it before.
>>
>
> Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In this
> case even it is leaked looks we cannot dig anything out just the hash value?

Not sure what you mean.  Are you asking about the lack of guest personalization?

Concretely, imagine I write an enclave that seals my TLS client
certificate's private key and offers an API to sign TLS certificate
requests with it.  This way, if my system is compromised, an attacker
can use the certificate only so long as they have access to my
machine.  If I kick them out or if they merely get the ability to read
the sealed data but not to execute code, the private key should still
be safe.  But, if this system is a VM guest, the attacker could run
the exact same enclave on another guest on the same physical CPU and
sign using my key.  Whoops!
Sean Christopherson May 12, 2017, 6:48 p.m. UTC | #5
Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> > I am not sure whether the cost of writing to 4 MSRs would be *extremely*
> > slow, as when vcpu is schedule in, KVM is already doing vmcs_load, writing
> > to several MSRs, etc.
> 
> I'm speculating that these MSRs may be rather unoptimized and hence
> unusually slow.
> 

Good speculation :)  We've been told to expect that writing the hash MSRs
will be at least 2.5x slower than normal MSRs.

> >
> >>
> >> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> >> with whatever lock is needed (probably just a mutex).  Users of EINIT
> >> will take the mutex, compare the percpu variable to the desired value,
> >> and, if it's different, do WRMSR and update the percpu variable.
> >>
> >> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
> >> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
> >> support the same handling as the host.  There is no action required at
> >> all on KVM guest entry and exit.
> >
> >
> > This is doable, but SGX driver needs to do those things and expose
> > interfaces for KVM to use. In terms of the percpu data, it is nice to have,
> > but I am not sure whether it is mandatory, as IMO EINIT is not even in
> > performance critical path. We can simply read old value from MSRs out and
> > compare whether the old equals to the new.
> 
> I think the SGX driver should probably live in arch/x86, and the
> interface could be a simple percpu variable that is exported (from the
> main kernel image, not from a module).
> 

Agreed, this would make life easier for future SGX code that can't be
self-contained in the driver, e.g. EPC cgroup.  Future architectural
enhancements might also require tighter integration with the kernel.


> >>
> >> I would advocate for the former approach.  (But you can't remap the
> >> parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
> >> don't see why this is any more complicated than emulating any other
> >> instruction that accesses memory.)
> >
> >
> > No you cannot just copy. Because all address in guest's ENCLS parameters are
> > guest's virtual address, we cannot use them to execute ENCLS in KVM. If any
> > guest virtual addresses is used in ENCLS parameters, for example,
> > PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's
> > virtual address.
> >
> > Btw, what is TOCTOU issue? would you also elaborate locking issue?
> 
> I was partially mis-remembering how this worked.  It looks like
> SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be
> mapped.  If KVM applied some policy to the launchable enclaves, it
> would want to make sure that it only looks at fields that are copied
> to make sure that the enclave that gets launched is the one it
> verified.  The locking issue I'm imagining is that the SECS (or
> whatever else might be mapped) doesn't disappear and get reused for
> something else while it's mapped in the host.  Presumably KVM has an
> existing mechanism for this, but maybe SECS is special because it's
> not quite normal memory IIRC.
> 

Mapping the SECS in the host should not be an issue, AFAIK there aren't
any restrictions on the VA passed to EINIT as long as it resolves to a
SECS page in the EPCM, e.g. the SGX driver maps the SECS for EINIT with
an arbitrary VA.

I don't think emulating EINIT introduces any TOCTOU race conditions that
wouldn't already exist.  Evicting the SECS or modifying the page tables
on a different thread while executing EINIT is either a guest kernel bug
or bizarre behavior that the guest can already handle.  Similarly, KVM
would need special handling for evicting a guest's SECS, regardless of
EINIT emulation.

> >> [1] Guests that steal sealed data from each other or from the host can
> >> manipulate that data without compromising the hypervisor by simply
> >> loading the same enclave that its rightful owner would use.  If you're
> >> trying to use SGX to protect your crypto credentials so that, if
> >> stolen, they can't be used outside the guest, I would consider this to
> >> be a major flaw.  It breaks the security model in a multi-tenant cloud
> >> situation.  I've complained about it before.
> >>
> >
> > Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In this
> > case even it is leaked looks we cannot dig anything out just the hash value?
> 
> Not sure what you mean.  Are you asking about the lack of guest
> personalization?
> 
> Concretely, imagine I write an enclave that seals my TLS client
> certificate's private key and offers an API to sign TLS certificate
> requests with it.  This way, if my system is compromised, an attacker
> can use the certificate only so long as they have access to my
> machine.  If I kick them out or if they merely get the ability to read
> the sealed data but not to execute code, the private key should still
> be safe.  But, if this system is a VM guest, the attacker could run
> the exact same enclave on another guest on the same physical CPU and
> sign using my key.  Whoops!

I know this issue has been raised internally as well, but I don't know
the status of the situation.  I'll follow up and provide any information
I can.
Sean Christopherson May 12, 2017, 8:50 p.m. UTC | #6
Christopherson, Sean J <sean.j.christopherson@intel.com> wrote:
> Andy Lutomirski <luto@kernel.org> wrote:
> > On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
> > >> [1] Guests that steal sealed data from each other or from the host can
> > >> manipulate that data without compromising the hypervisor by simply
> > >> loading the same enclave that its rightful owner would use.  If you're
> > >> trying to use SGX to protect your crypto credentials so that, if
> > >> stolen, they can't be used outside the guest, I would consider this to
> > >> be a major flaw.  It breaks the security model in a multi-tenant cloud
> > >> situation.  I've complained about it before.
> > >>
> > >
> > > Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In
> > > this case even it is leaked looks we cannot dig anything out just the
> > > hash value?
> > 
> > Not sure what you mean.  Are you asking about the lack of guest
> > personalization?
> > 
> > Concretely, imagine I write an enclave that seals my TLS client
> > certificate's private key and offers an API to sign TLS certificate
> > requests with it.  This way, if my system is compromised, an attacker
> > can use the certificate only so long as they have access to my
> > machine.  If I kick them out or if they merely get the ability to read
> > the sealed data but not to execute code, the private key should still
> > be safe.  But, if this system is a VM guest, the attacker could run
> > the exact same enclave on another guest on the same physical CPU and
> > sign using my key.  Whoops!
> 
> I know this issue has been raised internally as well, but I don't know
> the status of the situation.  I'll follow up and provide any information
> I can.

So, the key players are well aware of the value added by per-VM keys,
but, ultimately, shipping this feature is dependent on having strong
requests from customers.
Jarkko Sakkinen May 15, 2017, 12:46 p.m. UTC | #7
On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote:
> [resending due to some kind of kernel.org glitch -- sorry if anyone
> gets duplicates]
> 
> On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> > My current patch is based on this assumption. For KVM guest, naturally, we
> > will write the cached value to real MSRs when vcpu is scheduled in. For
> > host, SGX driver should write its own value to MSRs when it performs EINIT
> > for LE.
> 
> This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
> would propose a totally different solution:
> 
> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> with whatever lock is needed (probably just a mutex).  Users of EINIT
> will take the mutex, compare the percpu variable to the desired value,
> and, if it's different, do WRMSR and update the percpu variable.

This is exactly what I've been suggesting internally: trap EINIT and
check the value and write conditionally.

I think this would be the best starting point.

/Jarkko
Kai Huang May 15, 2017, 11:56 p.m. UTC | #8
On 5/16/2017 12:46 AM, Jarkko Sakkinen wrote:
> On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote:
>> [resending due to some kind of kernel.org glitch -- sorry if anyone
>> gets duplicates]
>>
>> On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>> My current patch is based on this assumption. For KVM guest, naturally, we
>>> will write the cached value to real MSRs when vcpu is scheduled in. For
>>> host, SGX driver should write its own value to MSRs when it performs EINIT
>>> for LE.
>>
>> This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
>> would propose a totally different solution:
>>
>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>> will take the mutex, compare the percpu variable to the desired value,
>> and, if it's different, do WRMSR and update the percpu variable.
>
> This is exactly what I've been suggesting internally: trap EINIT and
> check the value and write conditionally.
>
> I think this would be the best starting point.

OK. Assuming we are going to have this percpu variable for 
IA32_SGXLEPUBKEYHASHn, I suppose KVM will also update the percpu variable 
with the guest's value after KVM writes the guest's value to the hardware 
MSRs? And the host (SGX driver) needs to do the same thing (check the 
value and write conditionally), correct?

Thanks,
-Kai

>
> /Jarkko
>
Kai Huang May 16, 2017, 12:48 a.m. UTC | #9
On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>> I am not sure whether the cost of writing to 4 MSRs would be *extremely*
>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load, writing
>> to several MSRs, etc.
>
> I'm speculating that these MSRs may be rather unoptimized and hence
> unusually slow.
>
>>
>>>
>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>> will take the mutex, compare the percpu variable to the desired value,
>>> and, if it's different, do WRMSR and update the percpu variable.
>>>
>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
>>> support the same handling as the host.  There is no action required at
>>> all on KVM guest entry and exit.
>>
>>
>> This is doable, but SGX driver needs to do those things and expose
>> interfaces for KVM to use. In terms of the percpu data, it is nice to have,
>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>> performance critical path. We can simply read old value from MSRs out and
>> compare whether the old equals to the new.
>
> I think the SGX driver should probably live in arch/x86, and the
> interface could be a simple percpu variable that is exported (from the
> main kernel image, not from a module).
>
>>
>>>
>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>> other reasons: someone is going to want to implement policy for what
>>> enclaves are allowed that applies to guests as well as the host.
>>
>>
>> I am not very convinced why "what enclaves are allowed" in host would apply
>> to guest. Can you elaborate? I mean in general virtualization just focus
>> emulating hardware behavior. If a native machine is able to run any LE, the
>> virtual machine should be able to as well (of course, with guest's
>> IA32_FEATURE_CONTROL[bit 17] set).
>
> I strongly disagree.  I can imagine two classes of sensible policies
> for launch control:
>
> 1. Allow everything.  This seems quite sensible to me.
>
> 2. Allow some things, and make sure that VMs have at least as
> restrictive a policy as host root has.  After all, what's the point of
> restricting enclaves in the host if host code can simply spawn a
> little VM to run otherwise-disallowed enclaves?

What's the current SGX driver launch control policy? Yes, 
allow-everything works for KVM, so let's skip this. Are we going to 
support allowing several LEs, or just a single LE? I know Jarkko is doing 
the in-kernel LE stuff, but I don't know the details.

I am trying to find a way that both avoids breaking the host's launch 
control policy and stays consistent with HW behavior (from the guest's 
view). Currently we can create a KVM guest with runtime changes to 
IA32_SGXLEPUBKEYHASHn either enabled or disabled. I introduced a Qemu 
parameter 'lewr' for this purpose. Actually, I introduced the Qemu SGX 
parameters below for creating a guest:

	-sgx epc=<size>,lehash='SHA-256 hash',lewr

where 'epc' specifies the guest's EPC size, 'lehash' specifies the 
(initial) value of the guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' 
specifies whether the guest is allowed to change its 
IA32_SGXLEPUBKEYHASHn at runtime.

If the host only allows a single LE to run, KVM can add a restriction 
that only allows creating KVM guests with runtime changes to 
IA32_SGXLEPUBKEYHASHn disabled, so that only the host-allowed (single) 
hash can be used by the guest. From the guest's view, it simply has 
IA32_FEATURE_CONTROL[bit 17] cleared and has IA32_SGXLEPUBKEYHASHn 
defaulting to the host-allowed (single) hash.

If the host allows several LEs (but not everything), and we create a 
guest with 'lewr', then the behavior is not consistent with HW behavior: 
from the guest hardware's point of view, it can actually run any LE, but 
we have to tell the guest that it is only allowed to change 
IA32_SGXLEPUBKEYHASHn to some specific values. One compromise solution 
is to disallow creating a guest with 'lewr' specified and, at the same 
time, only allow creating guests with host-approved hashes specified in 
'lehash'. This makes the guest's behavior consistent with HW behavior, 
but only allows the guest to run one LE (the one specified by 'lehash' 
when the guest is created).

I'd like to hear comments from you guys.

Paolo, do you also have comments here from KVM's side?

Thanks,
-Kai

>
>>
>>> Also, some day Intel may fix its architectural design flaw [1] by
>>> allowing EINIT to personalize the enclave's keying, and, if it's done
>>> by a new argument to EINIT instead of an MSR, KVM will have to trap
>>> EINIT to handle it.
>>
>>
>> Looks this flaw is not the same issue as above (host enclave policy applies
>> to guest)?
>
> It's related.  Without this flaw, it might make sense to apply looser
> policy in the guest as in the host.  With this flaw, I think your
> policy fails to have any real effect if you don't enforce it on
> guests.
>
>>
>>>
>>>>
>>>> One argument against this approach is KVM guest should never have impact
>>>> on
>>>> host side, meaning host should not be aware of such MSR change
>>>
>>>
>>> As a somewhat generic comment, I don't like this approach to KVM
>>> development.  KVM mucks with lots of important architectural control
>>> registers, and, in all too many cases, it tries to do so independently
>>> of the other arch/x86 code.  This ends up causing all kinds of grief.
>>>
>>> Can't KVM and the real x86 arch code cooperate for real?  The host and
>>> the KVM code are in arch/x86 in the same source tree.
>>
>>
>> Currently on host SGX driver, which is pretty much self-contained,
>> implements all SGX related stuff.
>
> I will probably NAK this if it comes my way for inclusion upstream.
> Just because it can be self-contained doesn't mean it should be
> self-contained.
>
>>>
>>> I would advocate for the former approach.  (But you can't remap the
>>> parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
>>> don't see why this is any more complicated than emulating any other
>>> instruction that accesses memory.)
>>
>>
>> No you cannot just copy. Because all address in guest's ENCLS parameters are
>> guest's virtual address, we cannot use them to execute ENCLS in KVM. If any
>> guest virtual addresses is used in ENCLS parameters, for example,
>> PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's
>> virtual address.
>>
>> Btw, what is TOCTOU issue? would you also elaborate locking issue?
>
> I was partially mis-remembering how this worked.  It looks like
> SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be
> mapped.  If KVM applied some policy to the launchable enclaves, it
> would want to make sure that it only looks at fields that are copied
> to make sure that the enclave that gets launched is the one it
> verified.  The locking issue I'm imagining is that the SECS (or
> whatever else might be mapped) doesn't disappear and get reused for
> something else while it's mapped in the host.  Presumably KVM has an
> existing mechanism for this, but maybe SECS is special because it's
> not quite normal memory IIRC.
>
>>
>>>
>>> If necessary for some reason, trap EINIT when the SGXLEPUBKEYHASH is
>>> wrong and then clear the exit flag once the MSRs are in sync.  You'll
>>> need to be careful to avoid races in which the host's value leaks into
>>> the guest.  I think you'll find that this is more complicated, less
>>> flexible, and less performant than just handling ENCLS[EINIT] directly
>>> in the host.
>>
>>
>> Sorry I don't quite follow this part. Why would host's value leaks into
>> guest? I suppose the *value* means host's IA32_SGXLEPUBKEYHASHn? guest's MSR
>> read/write is always trapped and emulated by KVM.
>
> You'd need to make sure that this sequence of events doesn't happen:
>
>  - Guest does EINIT and it exits.
>  - Host updates the MSRs and the ENCLS-exiting bitmap.
>  - Guest is preempted before it retries EINIT.
>  - A different host thread launches an enclave, thus changing the MSRs.
>  - Guest resumes and runs EINIT without exiting with the wrong MSR values.
>
>>
>>>
>>> [1] Guests that steal sealed data from each other or from the host can
>>> manipulate that data without compromising the hypervisor by simply
>>> loading the same enclave that its rightful owner would use.  If you're
>>> trying to use SGX to protect your crypto credentials so that, if
>>> stolen, they can't be used outside the guest, I would consider this to
>>> be a major flaw.  It breaks the security model in a multi-tenant cloud
>>> situation.  I've complained about it before.
>>>
>>
>> Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In this
>> case even it is leaked looks we cannot dig anything out just the hash value?
>
> Not sure what you mean.  Are you asking about the lack of guest personalization?
>
> Concretely, imagine I write an enclave that seals my TLS client
> certificate's private key and offers an API to sign TLS certificate
> requests with it.  This way, if my system is compromised, an attacker
> can use the certificate only so long as they have access to my
> machine.  If I kick them out or if they merely get the ability to read
> the sealed data but not to execute code, the private key should still
> be safe.  But, if this system is a VM guest, the attacker could run
> the exact same enclave on another guest on the same physical CPU and
> sign using my key.  Whoops!
>
Kai Huang May 16, 2017, 12:59 a.m. UTC | #10
On 5/13/2017 6:48 AM, Christopherson, Sean J wrote:
> Andy Lutomirski <luto@kernel.org> wrote:
>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>> I am not sure whether the cost of writing to 4 MSRs would be *extremely*
>>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load, writing
>>> to several MSRs, etc.
>>
>> I'm speculating that these MSRs may be rather unoptimized and hence
>> unusually slow.
>>
>
> Good speculation :)  We've been told to expect that writing the hash MSRs
> will be at least 2.5x slower than normal MSRs.
>
>>>
>>>>
>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>> will take the mutex, compare the percpu variable to the desired value,
>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>
>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
>>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
>>>> support the same handling as the host.  There is no action required at
>>>> all on KVM guest entry and exit.
>>>
>>>
>>> This is doable, but SGX driver needs to do those things and expose
>>> interfaces for KVM to use. In terms of the percpu data, it is nice to have,
>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>>> performance critical path. We can simply read old value from MSRs out and
>>> compare whether the old equals to the new.
>>
>> I think the SGX driver should probably live in arch/x86, and the
>> interface could be a simple percpu variable that is exported (from the
>> main kernel image, not from a module).
>>
>
> Agreed, this would make life easier for future SGX code that can't be
> self-contained in the driver, e.g. EPC cgroup.  Future architectural
> enhancements might also require tighter integration with the kernel.


>
>
>>>>
>>>> I would advocate for the former approach.  (But you can't remap the
>>>> parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
>>>> don't see why this is any more complicated than emulating any other
>>>> instruction that accesses memory.)
>>>
>>>
>>> No you cannot just copy. Because all address in guest's ENCLS parameters are
>>> guest's virtual address, we cannot use them to execute ENCLS in KVM. If any
>>> guest virtual addresses is used in ENCLS parameters, for example,
>>> PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's
>>> virtual address.
>>>
>>> Btw, what is TOCTOU issue? would you also elaborate locking issue?
>>
>> I was partially mis-remembering how this worked.  It looks like
>> SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be
>> mapped.  If KVM applied some policy to the launchable enclaves, it
>> would want to make sure that it only looks at fields that are copied
>> to make sure that the enclave that gets launched is the one it
>> verified.  The locking issue I'm imagining is that the SECS (or
>> whatever else might be mapped) doesn't disappear and get reused for
>> something else while it's mapped in the host.  Presumably KVM has an
>> existing mechanism for this, but maybe SECS is special because it's
>> not quite normal memory IIRC.

I am thinking we might not need to check the values in SIGSTRUCT or 
EINITTOKEN, as KVM needs to emulate guest IA32_SGXLEPUBKEYHASHn writes 
anyway. If we decide we should apply the host's policy to KVM guests, it 
seems we can do the check when trapping guest writes to 
IA32_SGXLEPUBKEYHASHn, so that only host-approved values (maybe from a 
whitelist or something?) are allowed to be written by the guest.
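
For example, something like the hypothetical check below could be 
applied when trapping the hash MSR writes (the whitelist structure is 
made up for illustration; since the four MSRs are written one at a time, 
the check would have to run on the combined cached value, e.g. on the 
last write or at EINIT):

struct sgx_le_hash {
	struct list_head list;
	u64 hash[4];	/* SHA-256 of an approved LE signing key */
};

static LIST_HEAD(sgx_le_whitelist);	/* populated by host policy */

/* Return true if the guest may set its hash MSRs to @hash. */
static bool sgx_lepubkeyhash_allowed(const u64 hash[4])
{
	struct sgx_le_hash *entry;

	list_for_each_entry(entry, &sgx_le_whitelist, list)
		if (!memcmp(entry->hash, hash, sizeof(entry->hash)))
			return true;

	return false;
}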

Thanks,
-Kai

>>
>
> Mapping the SECS in the host should not be an issue, AFAIK there aren't
> any restrictions on the VA passed to EINIT as long as it resolves to a
> SECS page in the EPCM, e.g. the SGX driver maps the SECS for EINIT with
> an arbitrary VA.
>
> I don't think emulating EINIT introduces any TOCTOU race conditions that
> wouldn't already exist.  Evicting the SECS or modifying the page tables
> on a different thread while executing EINIT is either a guest kernel bug
> or bizarre behavior that the guest can already handle.  Similarly, KVM
> would need special handling for evicting a guest's SECS, regardless of
> EINIT emulation.

Agreed.

>
>>>> [1] Guests that steal sealed data from each other or from the host can
>>>> manipulate that data without compromising the hypervisor by simply
>>>> loading the same enclave that its rightful owner would use.  If you're
>>>> trying to use SGX to protect your crypto credentials so that, if
>>>> stolen, they can't be used outside the guest, I would consider this to
>>>> be a major flaw.  It breaks the security model in a multi-tenant cloud
>>>> situation.  I've complained about it before.
>>>>
>>>
>>> Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In this
>>> case even it is leaked looks we cannot dig anything out just the hash value?
>>
>> Not sure what you mean.  Are you asking about the lack of guest
>> personalization?
>>
>> Concretely, imagine I write an enclave that seals my TLS client
>> certificate's private key and offers an API to sign TLS certificate
>> requests with it.  This way, if my system is compromised, an attacker
>> can use the certificate only so long as they have access to my
>> machine.  If I kick them out or if they merely get the ability to read
>> the sealed data but not to execute code, the private key should still
>> be safe.  But, if this system is a VM guest, the attacker could run
>> the exact same enclave on another guest on the same physical CPU and
>> sign using my key.  Whoops!
>
> I know this issue has been raised internally as well, but I don't know
> the status of the situation.  I'll follow up and provide any information
> I can.
>
Kai Huang May 16, 2017, 1:22 a.m. UTC | #11
On 5/13/2017 6:48 AM, Christopherson, Sean J wrote:
> Andy Lutomirski <luto@kernel.org> wrote:
>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>> I am not sure whether the cost of writing to 4 MSRs would be *extremely*
>>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load, writing
>>> to several MSRs, etc.
>>
>> I'm speculating that these MSRs may be rather unoptimized and hence
>> unusually slow.
>>
>
> Good speculation :)  We've been told to expect that writing the hash MSRs
> will be at least 2.5x slower than normal MSRs.
>
>>>
>>>>
>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>> will take the mutex, compare the percpu variable to the desired value,
>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>
>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
>>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
>>>> support the same handling as the host.  There is no action required at
>>>> all on KVM guest entry and exit.
>>>
>>>
>>> This is doable, but the SGX driver needs to do those things and expose
>>> interfaces for KVM to use. In terms of the percpu data, it is nice to
>>> have, but I am not sure whether it is mandatory, as IMO EINIT is not
>>> even in the performance-critical path. We can simply read the old values
>>> from the MSRs and compare whether they equal the new ones.
>>
>> I think the SGX driver should probably live in arch/x86, and the
>> interface could be a simple percpu variable that is exported (from the
>> main kernel image, not from a module).
>>
>
> Agreed, this would make life easier for future SGX code that can't be
> self-contained in the driver, e.g. EPC cgroup.  Future architectural
> enhancements might also require tighter integration with the kernel.

I think this is better as well. In this way we can leverage the SGX code 
more easily. Some SGX detection code can be done in arch/x86/ as well, 
so that other code can quickly access, e.g., the SGX capabilities.

Another thing: the SDK actually says SGX CPUID is per-thread, and we 
should not assume SGX CPUID will report the same info on all logical 
processors. I think it's better to check this as well. Moving SGX 
detection into identify_cpu would make this easier.
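
For illustration, a minimal sketch of such a check (purely an assumption
about how it could look -- the function name and the cached boot values
are made up, and where exactly the hook would live is an open question):

#include <linux/string.h>
#include <asm/processor.h>

#define SGX_CPUID_LEAF	0x12

/* Boot CPU's SGX CPUID leaf 0x12, sub-leaf 0, assumed cached at boot. */
static u32 boot_sgx_cpuid[4];

/* Hypothetical per-CPU check, e.g. called from identify_cpu(). */
static void check_sgx_cpuid_consistency(struct cpuinfo_x86 *c)
{
	u32 regs[4];

	cpuid_count(SGX_CPUID_LEAF, 0, &regs[0], &regs[1], &regs[2], &regs[3]);
	if (memcmp(regs, boot_sgx_cpuid, sizeof(regs)))
		pr_warn("CPU%d: SGX CPUID info differs from boot CPU\n",
			c->cpu_index);
}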

Thanks,
-Kai
>
>
>>>>
>>>> I would advocate for the former approach.  (But you can't remap the
>>>> parameters due to TOCTOU issues, locking, etc.  Just copy them.  I
>>>> don't see why this is any more complicated than emulating any other
>>>> instruction that accesses memory.)
>>>
>>>
>>> No you cannot just copy. Because all address in guest's ENCLS parameters are
>>> guest's virtual address, we cannot use them to execute ENCLS in KVM. If any
>>> guest virtual addresses is used in ENCLS parameters, for example,
>>> PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's
>>> virtual address.
>>>
>>> Btw, what is TOCTOU issue? would you also elaborate locking issue?
>>
>> I was partially mis-remembering how this worked.  It looks like
>> SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be
>> mapped.  If KVM applied some policy to the launchable enclaves, it
>> would want to make sure that it only looks at fields that are copied
>> to make sure that the enclave that gets launched is the one it
>> verified.  The locking issue I'm imagining is that the SECS (or
>> whatever else might be mapped) doesn't disappear and get reused for
>> something else while it's mapped in the host.  Presumably KVM has an
>> existing mechanism for this, but maybe SECS is special because it's
>> not quite normal memory IIRC.
>>
>
> Mapping the SECS in the host should not be an issue, AFAIK there aren't
> any restrictions on the VA passed to EINIT as long as it resolves to a
> SECS page in the EPCM, e.g. the SGX driver maps the SECS for EINIT with
> an arbitrary VA.
>
> I don't think emulating EINIT introduces any TOCTOU race conditions that
> wouldn't already exist.  Evicting the SECS or modifying the page tables
> on a different thread while executing EINIT is either a guest kernel bug
> or bizarre behavior that the guest can already handle.  Similarly, KVM
> would need special handling for evicting a guest's SECS, regardless of
> EINIT emulation.
>
>>>> [1] Guests that steal sealed data from each other or from the host can
>>>> manipulate that data without compromising the hypervisor by simply
>>>> loading the same enclave that its rightful owner would use.  If you're
>>>> trying to use SGX to protect your crypto credentials so that, if
>>>> stolen, they can't be used outside the guest, I would consider this to
>>>> be a major flaw.  It breaks the security model in a multi-tenant cloud
>>>> situation.  I've complained about it before.
>>>>
>>>
>>> It looks like potentially only the guest's IA32_SGXLEPUBKEYHASHn may be
>>> leaked? In that case, even if it is leaked, it seems we cannot dig
>>> anything out beyond the hash value?
>>
>> Not sure what you mean.  Are you asking about the lack of guest
>> personalization?
>>
>> Concretely, imagine I write an enclave that seals my TLS client
>> certificate's private key and offers an API to sign TLS certificate
>> requests with it.  This way, if my system is compromised, an attacker
>> can use the certificate only so long as they have access to my
>> machine.  If I kick them out or if they merely get the ability to read
>> the sealed data but not to execute code, the private key should still
>> be safe.  But, if this system is a VM guest, the attacker could run
>> the exact same enclave on another guest on the same physical CPU and
>> sign using my key.  Whoops!
>
> I know this issue has been raised internally as well, but I don't know
> the status of the situation.  I'll follow up and provide any information
> I can.
>
Paolo Bonzini May 16, 2017, 2:21 p.m. UTC | #12
On 16/05/2017 02:48, Huang, Kai wrote:
> 
> 
> If the host only allows one single LE to run, KVM can add a restriction
> that only allows creating a KVM guest with runtime change to
> IA32_SGXLEPUBKEYHASHn disabled, so that only the host-allowed (single)
> hash can be used by the guest. From the guest's view, it simply has
> IA32_FEATURE_CONTROL[bit17] cleared and has IA32_SGXLEPUBKEYHASHn with
> the default value being the host-allowed (single) hash.
> 
> If the host allows several LEs (but not everything), and if we create a
> guest with 'lewr', then the behavior is not consistent with HW behavior:
> from the guest's view of the hardware, it can actually run any LE, but
> we have to tell the guest that it is only allowed to change
> IA32_SGXLEPUBKEYHASHn to some specific values. One compromise solution
> is to disallow creating a guest with 'lewr' specified and, in the
> meantime, only allow creating a guest with host-approved hashes
> specified in 'lehash'. This will make the guest's behavior consistent
> with HW behavior but only allows the guest to run one LE (the one
> specified by 'lehash' when the guest is created).
> 
> I'd like to hear comments from you guys.
> 
> Paolo, do you also have comments here from KVM's side?

I would start with read-only LE hash (same as the host), which is a
valid configuration anyway.  Then later we can trap EINIT to emulate
IA32_SGXLEPUBKEYHASHn.

Paolo
Paolo Bonzini May 16, 2017, 2:23 p.m. UTC | #13
On 16/05/2017 01:56, Huang, Kai wrote:
>>>
>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>> will take the mutex, compare the percpu variable to the desired value,
>>> and, if it's different, do WRMSR and update the percpu variable.
>>
>> This is exactly what I've been suggesting internally: trap EINIT and
>> check the value and write conditionally.
>>
>> I think this would be the best starting point.
> 
> OK. Assuming we are going to have this percpu variable for
> IA32_SGXLEPUBKEYHASHn, I suppose KVM also will update guest's value to
> this percpu variable after KVM writes guest's value to hardware MSR? And
> host (SGX driver) needs to do the same thing (check the value and write
> conditionally), correct?

The percpu variable is just an optimization.  If EINIT is not
performance critical, you could even do the WRMSR unconditionally; what
matters is having a mutex that covers both WRMSR and EINIT.

Thanks,

Paolo
Andy Lutomirski May 17, 2017, 12:09 a.m. UTC | #14
On Mon, May 15, 2017 at 5:48 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>
>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
>> wrote:
>>>
>>> I am not sure whether the cost of writing to 4 MSRs would be *extremely*
>>> slow, as when the vcpu is scheduled in, KVM is already doing vmcs_load,
>>> writing
>>> to several MSRs, etc.
>>
>>
>> I'm speculating that these MSRs may be rather unoptimized and hence
>> unusually slow.
>>
>>>
>>>>
>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>> will take the mutex, compare the percpu variable to the desired value,
>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>
>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
>>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
>>>> support the same handling as the host.  There is no action required at
>>>> all on KVM guest entry and exit.
>>>
>>>
>>>
>>> This is doable, but the SGX driver needs to do those things and expose
>>> interfaces for KVM to use. In terms of the percpu data, it is nice to
>>> have, but I am not sure whether it is mandatory, as IMO EINIT is not
>>> even in the performance-critical path. We can simply read the old values
>>> from the MSRs and compare whether they equal the new ones.
>>
>>
>> I think the SGX driver should probably live in arch/x86, and the
>> interface could be a simple percpu variable that is exported (from the
>> main kernel image, not from a module).
>>
>>>
>>>>
>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>>> other reasons: someone is going to want to implement policy for what
>>>> enclaves are allowed that applies to guests as well as the host.
>>>
>>>
>>>
>>> I am not very convinced why "what enclaves are allowed" in host would
>>> apply
>>> to guest. Can you elaborate? I mean in general virtualization just focuses on
>>> emulating hardware behavior. If a native machine is able to run any LE,
>>> the
>>> virtual machine should be able to as well (of course, with guest's
>>> IA32_FEATURE_CONTROL[bit 17] set).
>>
>>
>> I strongly disagree.  I can imagine two classes of sensible policies
>> for launch control:
>>
>> 1. Allow everything.  This seems quite sensible to me.
>>
>> 2. Allow some things, and make sure that VMs have at least as
>> restrictive a policy as host root has.  After all, what's the point of
>> restricting enclaves in the host if host code can simply spawn a
>> little VM to run otherwise-disallowed enclaves?
>
>
> What's the current SGX driver launch control policy? Yes, allow everything
> works for KVM, so let's skip this. Are we going to support allowing several
> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE
> stuff but I don't know the details.
>
> I am trying to find a way that we can both not break host launch control
> policy, and be consistent to HW behavior (from guest's view). Currently we
> can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn either
> enabled or disabled. I introduced a Qemu parameter 'lewr' for this purpose.
> Actually I introduced the below Qemu SGX parameters for creating a guest:
>
>         -sgx epc=<size>,lehash='SHA-256 hash',lewr
>
> where 'epc' specifies guest's EPC size, lehash specifies (initial) value of
> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is allowed
> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>
>> If host only allows one single LE to run, KVM can add a restriction that only
> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
> disabled, so that only host allowed (single) hash can be used by guest. From
> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
> IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single) hash.
>
>> If host allows several LEs (but not everything), and if we create guest with
> 'lewr', then the behavior is not consistent with HW behavior, as from
> guest's hardware's point of view, we can actually run any LE but we have to
> tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn to some
> specific values. One compromise solution is we don't allow to create guest
> with 'lewr' specified, and at the meantime, only allow to create guest with
> host approved hashes specified in 'lehash'. This will make guest's behavior
> consistent to HW behavior but only allows guest to run one LE (which is
> specified by 'lehash' when guest is created).

I'm not sure I entirely agree for a couple reasons.

1. I wouldn't be surprised if the kernel ends up implementing a policy
in which it checks all enclaves (not just LEs) for acceptability.  In
fact, if the kernel sticks with the "no LE at all or just
kernel-internal LE", then checking enclaves directly against some
admin- or distro-provided signer list seems reasonable.  This type of
policy can't be forwarded to a guest by restricting allowed LE
signers.  But this is mostly speculation since AFAIK no one has
seriously proposed any particular policy support and the plan was to
not have this for the initial implementation.

2. While matching hardware behavior is nice in principle, there
doesn't seem to be useful hardware behavior to match here.  If the
host had a list of five allowed LE signers, how exactly would it
restrict the MSRs?  They're not written atomically, so you can't
directly tell what's being written.  Also, the only way to fail an MSR
write is to send #GP, and Windows (and even Linux) may not expect
that.  Linux doesn't panic due to #GP on MSR writes these days, but
you still get a big fat warning.  I wouldn't be at all surprised if
Windows BSODs.  ENCLS[EINIT], on the other hand, returns an actual
error code.  I'm not sure that a sensible error code exists
("SGX_HYPERVISOR_SAID_NO?", perhaps), but SGX_INVALID_EINITTOKEN seems
to mean, more or less, "the CPU thinks you're not authorized to do
this", so forcing that error code could be entirely reasonable.

If the host policy is to allow a list of LE signers, you could return
SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
the list.
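
As a sketch (sgx_le_signer_allowed() and sgx_einit_for_guest() are
hypothetical policy/execution hooks, not existing code), KVM's
trapped-EINIT path could then look something like:

static int handle_guest_einit(struct kvm_vcpu *vcpu, void *sigstruct,
			      void *einittoken, void *secs)
{
	/* Check the enclave's signer against the host policy list. */
	if (!sgx_le_signer_allowed(sigstruct)) {
		/* Pretend the CPU itself rejected the launch. */
		kvm_register_write(vcpu, VCPU_REGS_RAX,
				   SGX_INVALID_EINITTOKEN);
		return kvm_skip_emulated_instruction(vcpu);
	}

	/* Otherwise run the real EINIT on the guest's behalf. */
	return sgx_einit_for_guest(vcpu, sigstruct, einittoken, secs);
}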

--Andy
Sean Christopherson May 17, 2017, 2:21 p.m. UTC | #15
On Tue, 2017-05-16 at 11:56 +1200, Huang, Kai wrote:
> 
> On 5/16/2017 12:46 AM, Jarkko Sakkinen wrote:
> > 
> > On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote:
> > > 
> > > [resending due to some kind of kernel.org glitch -- sorry if anyone
> > > gets duplicates]
> > > 
> > > On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com>
> > > wrote:
> > > > 
> > > > My current patch is based on this assumption. For KVM guest, naturally,
> > > > we
> > > > will write the cached value to real MSRs when vcpu is scheduled in. For
> > > > host, SGX driver should write its own value to MSRs when it performs
> > > > EINIT
> > > > for LE.
> > > This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
> > > would propose a totally different solution:
> > > 
> > > Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> > > with whatever lock is needed (probably just a mutex).  Users of EINIT
> > > will take the mutex, compare the percpu variable to the desired value,
> > > and, if it's different, do WRMSR and update the percpu variable.
> > This is exactly what I've been suggesting internally: trap EINIT and
> > check the value and write conditionally.
> > 
> > I think this would be the best starting point.
> OK. Assuming we are going to have this percpu variable for 
> IA32_SGXLEPUBKEYHASHn, I suppose KVM also will update guest's value to 
> this percpu variable after KVM writes guest's value to hardware MSR? And 
> host (SGX driver) needs to do the same thing (check the value and write 
> conditionally), correct?
> 
> Thanks,
> -Kai

Yes, the percpu variable is simply a cache so that the kernel doesn't have to do
four RDMSRs every time it wants to do EINIT.  KVM would still maintain shadow
copies of the MSRs for each vcpu for emulating RDMSR, WRMSR and EINIT.  I don't
think KVM would even need to be aware of the percpu variable, i.e. the entire
lock->(rd/wr)msr->EINIT->unlock sequence can probably be encapsulated in a
single function that is called from both the primary SGX driver and from KVM.
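
For illustration, the shared function could be shaped roughly like the
sketch below. All names are made up, __einit() is an assumed thin
ENCLS[EINIT] wrapper, and the MSR index is the one added by this patch
series; the real interface may well differ:

#include <linux/mutex.h>
#include <linux/percpu.h>
#include <linux/preempt.h>
#include <asm/msr.h>

/* Per-CPU cache of the last value written to IA32_SGXLEPUBKEYHASH0-3. */
struct sgx_le_pubkeyhash {
	u64 msrs[4];
};
static DEFINE_PER_CPU(struct sgx_le_pubkeyhash, sgx_le_pubkeyhash_cache);
static DEFINE_MUTEX(sgx_einit_mutex);

/* Assumed wrapper: ENCLS leaf 2 (EINIT), register use per the SDM. */
static inline int __einit(void *sigstruct, void *einittoken, void *secs)
{
	int ret;

	asm volatile("encls"
		     : "=a" (ret)
		     : "a" (2), "b" (sigstruct), "c" (einittoken), "d" (secs)
		     : "memory");
	return ret;
}

/* Callable from both the SGX driver and KVM; 'hash' is the desired
 * IA32_SGXLEPUBKEYHASHn value for this particular EINIT. */
int sgx_einit(void *sigstruct, void *einittoken, void *secs,
	      const u64 hash[4])
{
	u64 *cache;
	int i, ret;

	mutex_lock(&sgx_einit_mutex);
	preempt_disable();	/* the MSRs are per-CPU; don't migrate */
	cache = this_cpu_ptr(&sgx_le_pubkeyhash_cache)->msrs;
	for (i = 0; i < 4; i++) {
		if (cache[i] != hash[i]) {	/* skip the (slow) WRMSR */
			wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, hash[i]);
			cache[i] = hash[i];
		}
	}
	ret = __einit(sigstruct, einittoken, secs);
	preempt_enable();
	mutex_unlock(&sgx_einit_mutex);
	return ret;
}

(The per-CPU cache would have to be seeded from the hardware values at
boot, and error handling is elided.)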
Kai Huang May 18, 2017, 7:45 a.m. UTC | #16
On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>
>>
>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>
>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
>>> wrote:
>>>>
>>>> I am not sure whether the cost of writing to 4 MSRs would be *extremely*
>>>> slow, as when the vcpu is scheduled in, KVM is already doing vmcs_load,
>>>> writing
>>>> to several MSRs, etc.
>>>
>>>
>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>> unusually slow.
>>>
>>>>
>>>>>
>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>>> will take the mutex, compare the percpu variable to the desired value,
>>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>>
>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
>>>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
>>>>> support the same handling as the host.  There is no action required at
>>>>> all on KVM guest entry and exit.
>>>>
>>>>
>>>>
>>>> This is doable, but the SGX driver needs to do those things and expose
>>>> interfaces for KVM to use. In terms of the percpu data, it is nice to
>>>> have, but I am not sure whether it is mandatory, as IMO EINIT is not
>>>> even in the performance-critical path. We can simply read the old values
>>>> from the MSRs and compare whether they equal the new ones.
>>>
>>>
>>> I think the SGX driver should probably live in arch/x86, and the
>>> interface could be a simple percpu variable that is exported (from the
>>> main kernel image, not from a module).
>>>
>>>>
>>>>>
>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>>>> other reasons: someone is going to want to implement policy for what
>>>>> enclaves are allowed that applies to guests as well as the host.
>>>>
>>>>
>>>>
>>>> I am not very convinced why "what enclaves are allowed" in host would
>>>> apply
>>>> to guest. Can you elaborate? I mean in general virtualization just focuses on
>>>> emulating hardware behavior. If a native machine is able to run any LE,
>>>> the
>>>> virtual machine should be able to as well (of course, with guest's
>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>
>>>
>>> I strongly disagree.  I can imagine two classes of sensible policies
>>> for launch control:
>>>
>>> 1. Allow everything.  This seems quite sensible to me.
>>>
>>> 2. Allow some things, and make sure that VMs have at least as
>>> restrictive a policy as host root has.  After all, what's the point of
>>> restricting enclaves in the host if host code can simply spawn a
>>> little VM to run otherwise-disallowed enclaves?
>>
>>
>> What's the current SGX driver launch control policy? Yes, allow everything
>> works for KVM, so let's skip this. Are we going to support allowing several
>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE
>> stuff but I don't know the details.
>>
>> I am trying to find a way that we can both not break host launch control
>> policy, and be consistent to HW behavior (from guest's view). Currently we
>> can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn either
>> enabled or disabled. I introduced a Qemu parameter 'lewr' for this purpose.
>> Actually I introduced the below Qemu SGX parameters for creating a guest:
>>
>>         -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>
>> where 'epc' specifies guest's EPC size, lehash specifies (initial) value of
>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is allowed
>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>>
>> If host only allows one single LE to run, KVM can add a restriction that only
>> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>> disabled, so that only host allowed (single) hash can be used by guest. From
>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single) hash.
>>
>> If host allows several LEs (but not everything), and if we create guest with
>> 'lewr', then the behavior is not consistent with HW behavior, as from
>> guest's hardware's point of view, we can actually run any LE but we have to
>> tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn to some
>> specific values. One compromise solution is we don't allow to create guest
>> with 'lewr' specified, and at the meantime, only allow to create guest with
>> host approved hashes specified in 'lehash'. This will make guest's behavior
>> consistent to HW behavior but only allows guest to run one LE (which is
>> specified by 'lehash' when guest is created).
>
> I'm not sure I entirely agree for a couple reasons.
>
> 1. I wouldn't be surprised if the kernel ends up implementing a policy
> in which it checks all enclaves (not just LEs) for acceptability.  In
> fact, if the kernel sticks with the "no LE at all or just
> kernel-internal LE", then checking enclaves directly against some
> admin- or distro-provided signer list seems reasonable.  This type of
> policy can't be forwarded to a guest by restricting allowed LE
> signers.  But this is mostly speculation since AFAIK no one has
> seriously proposed any particular policy support and the plan was to
> not have this for the initial implementation.
>
> 2. While matching hardware behavior is nice in principle, there
> doesn't seem to be useful hardware behavior to match here.  If the
> host had a list of five allowed LE signers, how exactly would it
> restrict the MSRs?  They're not written atomically, so you can't
> directly tell what's being written.

In this case I actually plan to just allow creating a guest with runtime 
change to guest's IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' 
specified). If 'lewr' is specified, creating the guest will fail. And we 
only allow creating a guest with host-allowed hash values (with 
'lehash=hash-value'); if the 'hash-value' specified by 'lehash' is not 
allowed by the host, we also fail to create the guest.

We can only allow creating a guest with 'lewr' specified when the host 
allows anything.

But in this way we are restricting the guest OS's ability to run LEs, as 
only the one LE specified by the 'lehash' parameter can be run. But I 
think this won't hurt much, as multiple guests are still able to run 
different LEs?

> Also, the only way to fail an MSR
> write is to send #GP, and Windows (and even Linux) may not expect
> that.  Linux doesn't panic due to #GP on MSR writes these days, but
> you still get a big fat warning.  I wouldn't be at all surprised if
> Windows BSODs.

We cannot allow writing some particular values to the MSRs successfully 
while injecting #GP when writing other values to the same MSRs. So #GP 
is not an option.

> ENCLS[EINIT], on the other hand, returns an actual
> error code.  I'm not sure that a sensible error code exists
> ("SGX_HYPERVISOR_SAID_NO?", perhaps),

It looks like no such error code exists. And we cannot return such an 
error code to the guest, as it is only supposed to be valid when ENCLS 
runs in the hypervisor.

> but SGX_INVALID_EINITTOKEN seems
> to mean, more or less, "the CPU thinks you're not authorized to do
> this", so forcing that error code could be entirely reasonable.
>
> If the host policy is to allow a list of LE signers, you could return
> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
> the list.

But this would be inconsistent with HW behavior. If the hash value in the 
guest's IA32_SGXLEPUBKEYHASHn matches the one passed to EINIT, EINIT is 
not supposed to return SGX_INVALID_EINITTOKEN.

I think from the VMM's perspective, emulating HW behavior consistently 
with real HW behavior is very important.

Paolo, would you provide your comments?

>
> --Andy
>
Kai Huang May 18, 2017, 7:54 a.m. UTC | #17
On 5/17/2017 2:21 AM, Paolo Bonzini wrote:
>
>
> On 16/05/2017 02:48, Huang, Kai wrote:
>>
>>
>> If host only allows one single LE to run, KVM can add a restriction that
>> only allows to create KVM guest with runtime change to
>> IA32_SGXLEPUBKEYHASHn disabled, so that only host allowed (single) hash
>> can be used by guest. From guest's view, it simply has
>> IA32_FEATURE_CONTROL[bit17] cleared and has IA32_SGXLEPUBKEYHASHn with
>> default value to be host allowed (single) hash.
>>
>> If host allows several LEs (but not everything), and if we create guest
>> with 'lewr', then the behavior is not consistent with HW behavior, as
>> from guest's hardware's point of view, we can actually run any LE but we
>> have to tell guest that you are only allowed to change
>> IA32_SGXLEPUBKEYHASHn to some specific values. One compromise solution
>> is we don't allow to create guest with 'lewr' specified, and at the
>> meantime, only allow to create guest with host approved hashes specified
>> in 'lehash'. This will make guest's behavior consistent to HW behavior
>> but only allows guest to run one LE (which is specified by 'lehash' when
>> guest is created).
>>
>> I'd like to hear comments from you guys.
>>
>> Paolo, do you also have comments here from KVM's side?
>
> I would start with read-only LE hash (same as the host), which is a
> valid configuration anyway.  Then later we can trap EINIT to emulate
> IA32_SGXLEPUBKEYHASHn.

You mean we can start with creating the guest without Qemu 'lewr' 
parameter support, always disallowing the guest to change 
IA32_SGXLEPUBKEYHASHn? Even in this way, KVM still needs to emulate 
IA32_SGXLEPUBKEYHASHn (just allow MSR reads but not writes), and write 
the guest's value to the physical MSRs when running the guest (trapping 
EINIT and writing the MSRs during EINIT is really just a performance 
optimization), because the host can run multiple LEs and change the MSRs. 
Your suggestion only works when runtime change to IA32_SGXLEPUBKEYHASHn 
is disabled on the host (meaning the physical machine).
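
A rough sketch of that read-only emulation on the KVM side (the vcpu_vmx
field and the helper names are made up for this example):

/* Shadow copies of the guest's hash MSRs, e.g. in struct vcpu_vmx. */
	u64 msr_ia32_sgxlepubkeyhash[4];

/* Called from vmx_get_msr() for MSR_IA32_SGXLEPUBKEYHASH0..3: reads
 * always succeed, served from the shadow copy. */
static int sgx_lepubkeyhash_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
{
	*data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
			[index - MSR_IA32_SGXLEPUBKEYHASH0];
	return 0;
}

/* Called from vmx_set_msr(): the guest sees IA32_FEATURE_CONTROL[17]=0,
 * so writes must fail. */
static int sgx_lepubkeyhash_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
{
	return 1;	/* non-zero -> #GP injected by the WRMSR path */
}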

Thanks,
-Kai
>
> Paolo
>
Kai Huang May 18, 2017, 8:14 a.m. UTC | #18
On 5/18/2017 2:21 AM, Sean Christopherson wrote:
> On Tue, 2017-05-16 at 11:56 +1200, Huang, Kai wrote:
>>
>> On 5/16/2017 12:46 AM, Jarkko Sakkinen wrote:
>>>
>>> On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote:
>>>>
>>>> [resending due to some kind of kernel.org glitch -- sorry if anyone
>>>> gets duplicates]
>>>>
>>>> On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>> wrote:
>>>>>
>>>>> My current patch is based on this assumption. For KVM guest, naturally,
>>>>> we
>>>>> will write the cached value to real MSRs when vcpu is scheduled in. For
>>>>> host, SGX driver should write its own value to MSRs when it performs
>>>>> EINIT
>>>>> for LE.
>>>> This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
>>>> would propose a totally different solution:
>>>>
>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>> will take the mutex, compare the percpu variable to the desired value,
>>>> and, if it's different, do WRMSR and update the percpu variable.
>>> This is exactly what I've been suggesting internally: trap EINIT and
>>> check the value and write conditionally.
>>>
>>> I think this would be the best starting point.
>> OK. Assuming we are going to have this percpu variable for
>> IA32_SGXLEPUBKEYHASHn, I suppose KVM also will update guest's value to
>> this percpu variable after KVM writes guest's value to hardware MSR? And
>> host (SGX driver) needs to do the same thing (check the value and write
>> conditionally), correct?
>>
>> Thanks,
>> -Kai
>
> Yes, the percpu variable is simply a cache so that the kernel doesn't have to do
> four RDMSRs every time it wants to do EINIT.  KVM would still maintain shadow
> copies of the MSRs for each vcpu for emulating RDMSR, WRMSR and EINIT.  I don't
> think KVM would even need to be aware of the percpu variable, i.e. the entire
> lock->(rd/wr)msr->EINIT->unlock sequence can probably be encapsulated in a
> single function that is called from both the primary SGX driver and from KVM.
>

You are making the assumption that KVM will run ENCLS on behalf of the 
guest. :)

If we don't need to look into the guest's SIGSTRUCT, EINITTOKEN, etc., 
then I actually prefer using MTF, as with MTF we don't have to do all 
the remapping of guest virtual addresses to KVM virtual addresses. But 
if we need to look into the guest's ENCLS parameters, for example to 
locate the physical SECS page, or to update physical EPC page info 
(which KVM needs to maintain), maybe we can choose to run ENCLS on 
behalf of the guest.

But if we are going to run ENCLS on behalf of the guest, I think 
providing a single function which does the MSR writes and EINIT for KVM 
is a good idea.

Thanks,
-Kai
Paolo Bonzini May 18, 2017, 8:58 a.m. UTC | #19
On 18/05/2017 09:54, Huang, Kai wrote:
>>
>> I would start with read-only LE hash (same as the host), which is a
>> valid configuration anyway.  Then later we can trap EINIT to emulate
>> IA32_SGXLEPUBKEYHASHn.
> 
> You mean we can start with creating guest without Qemu 'lewr' parameter
> support, and always disallowing guest to change IA32_SGXLEPUBKEYHASHn?
> Even in this way, KVM still needs to emulate IA32_SGXLEPUBKEYHASHn (just
> allow MSR reading but not writing), and write guest's value to physical
> MSRs when running guest (trapping EINIT and write MSRs during EINIT is
> really just performance optimization). Because host can run multiple LEs
> and change MSRs.

Oh, I didn't know this.  So I guess there isn't much benefit in skipping
the trapping of EINIT.

Paolo

> Your suggestion only works when runtime change to
> IA32_SGXLEPUBKEYHASHn is disabled on host (meaning physical machine).
Jarkko Sakkinen May 20, 2017, 1:23 p.m. UTC | #20
On Tue, May 16, 2017 at 11:56:38AM +1200, Huang, Kai wrote:
> 
> 
> On 5/16/2017 12:46 AM, Jarkko Sakkinen wrote:
> > On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote:
> > > [resending due to some kind of kernel.org glitch -- sorry if anyone
> > > gets duplicates]
> > > 
> > > On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> > > > My current patch is based on this assumption. For KVM guest, naturally, we
> > > > will write the cached value to real MSRs when vcpu is scheduled in. For
> > > > host, SGX driver should write its own value to MSRs when it performs EINIT
> > > > for LE.
> > > 
> > > This seems unnecessarily slow (perhaps *extremely* slow) to me.  I
> > > would propose a totally different solution:
> > > 
> > > Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> > > with whatever lock is needed (probably just a mutex).  Users of EINIT
> > > will take the mutex, compare the percpu variable to the desired value,
> > > and, if it's different, do WRMSR and update the percpu variable.
> > 
> > This is exactly what I've been suggesting internally: trap EINIT and
> > check the value and write conditionally.
> > 
> > I think this would be the best starting point.
> 
> OK. Assuming we are going to have this percpu variable for
> IA32_SGXLEPUBKEYHASHn, I suppose KVM also will update guest's value to this
> percpu variable after KVM writes guest's value to hardware MSR? And host
> (SGX driver) needs to do the same thing (check the value and write
> conditionally), correct?
> 
> Thanks,
> -Kai

This is how I would understand it, yes.

/Jarkko
Andy Lutomirski May 20, 2017, 9:55 p.m. UTC | #21
On Thu, May 18, 2017 at 1:14 AM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> You are making the assumption that KVM will run ENCLS on behalf of the guest. :)
>
> If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I
> actually prefer to using MTF, as with MTF we don't have to do all the
> remapping guest's virtual address to KVM's virtual address thing, if we
> don't need to look into guest's ENCLS parameter. But if we need to look into
> guest's ENCLS parameters, for example, to locate physical SECS page, or to
> update physical EPC page's info (that KVM needs to maintain), maybe we can
> choose running ENCLS on behalf of guest.

After thinking about this a bit, I don't see how MTF helps.
Currently, KVM works kind of like this:

local_irq_disable();
set up stuff;
VMRESUME;
restore some host state;
local_irq_enable();

If the guest is going to run with the EINIT-exiting bit clear, the
only way I see this working is to modify KVM along the lines of:

local_irq_disable();
set up stuff;
if (condition here) {
  WRMSR to SGXLEPUBKEYHASH;
  update percpu shadow copy;
  clear EINIT-exiting bit;
} else {
  set EINIT-exiting bit;
}
VMRESUME;
restore some host state;
local_irq_enable();

where "condition here" might be something like "the last VMRESUME
exited due to EINIT".

I don't see how MTF helps much.  And if I were the KVM maintainer, I
would probably prefer to trap EINIT instead of adding a special case
to the main vm entry code.
Kai Huang May 23, 2017, 5:43 a.m. UTC | #22
On 5/21/2017 9:55 AM, Andy Lutomirski wrote:
> On Thu, May 18, 2017 at 1:14 AM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>> You are making the assumption that KVM will run ENCLS on behalf of the guest. :)
>>
>> If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I
>> actually prefer to using MTF, as with MTF we don't have to do all the
>> remapping guest's virtual address to KVM's virtual address thing, if we
>> don't need to look into guest's ENCLS parameter. But if we need to look into
>> guest's ENCLS parameters, for example, to locate physical SECS page, or to
>> update physical EPC page's info (that KVM needs to maintain), maybe we can
>> choose running ENCLS on behalf of guest.
>
> After thinking about this a bit, I don't see how MTF helps.
> Currently, KVM works kind of like this:
>
> local_irq_disable();
> set up stuff;
> VMRESUME;
> restore some host state;
> local_irq_enable();
>
> If the guest is going to run with the EINIT-exiting bit clear, the
> only way I see this working is to modify KVM along the lines of:
>
> local_irq_disable();
> set up stuff;
> if (condition here) {
>   WRMSR to SGXLEPUBKEYHASH;
>   update percpu shadow copy;
>   clear EINIT-exiting bit;
> } else {
>   set EINIT-exiting bit;
> }
> VMRESUME;
> restore some host state;
> local_irq_enable();
>
> where "condition here" might be something like "the last VMRESUME
> exited due to EINIT".
>
> I don't see how MTF helps much.  And if I were the KVM maintainer, I
> would probably prefer to trap EINIT instead of adding a special case
> to the main vm entry code.
>

Hi Andy,

Thanks for your comments. However, I didn't intend to use MTF in your 
way. The idea of using MTF (along with ENCLS VMEXIT) is that, by turning 
on MTF VMEXIT upon ENCLS VMEXIT, we are able to force a single-step 
VMEXIT after ENCLS, so that ENCLS runs in the guest as a single step.

Let me explain how the two approaches work below in general, so that we 
can decide which is better. Only trapping EINIT in order to update 
IA32_SGXLEPUBKEYHASHn is relatively simple, but I'd compare the two in a 
more general way, assuming we may want to trap more ENCLS leaves in 
order to, e.g., track EPC/enclave status/info, to support, e.g., EPC 
oversubscription between KVM guests in the future.

The diagram below shows the basic idea of the two approaches.


	--------------------------------------------------------------
			|	ENCLS		|
	--------------------------------------------------------------
			|	   	       /|\
	ENCLS VMEXIT	|			| VMENTRY
			|			|
		       \|/			|
		
     		1) identify which ENCLS leaf (RAX)
     		2) reconstruct/remap guest's ENCLS parameters, ex:
			- remap any guest VA (virtual address) to KVM VA
                 	- reconstruct PAGEINFO
         	3) do whatever needed before ENCLS, ex:
			- updating MSRs before EINIT
     		4) run ENCLS on behalf of guest, and skip ENCLS
		5) emulate ENCLS result (succeeded or not)
			- update guest's RAX-RDX.
			- and/or inject #GP (or #UD).
		6) do whatever needed after ENCLS, ex:
			- updating EPC/Enclave status/info

		   	1) Run ENCLS on behalf of guest


	--------------------------------------------------------------
			 |	ENCLS		   |
	--------------------------------------------------------------
			|/|\		          |/|\
	ENCLS VMEXIT	| | VMENTRY    MTF VMEXIT | | VMENTRY
		        | |	                  | |
		       \|/|		         \|/|
	1) Turn off ENCLS VMEXIT      1) Turn off MTF VMEXIT
	2) turn on MTF VMEXIT	      2) Turn on ENCLS VMEXIT
	3) cache ENCLS parameters     3) check whether ENCLS has run
	   (ENCLS changes RAX)        4) check whether ENCLS succeeded
	4) do whatever needed before     or not.
            ENCLS                      5) do whatever needed after ENCLS

			2) Using MTF

The concern with running ENCLS on behalf of the guest is emulating ENCLS 
errors. KVM needs to *correctly* emulate ENCLS errors to the guest so 
that the error we inject reflects the right behavior, as if ENCLS had 
run in the guest. Running ENCLS in root mode may potentially behave 
differently from running ENCLS in non-root mode, therefore we have to go 
through all possible error codes to make sure we can emulate them. And 
for some error codes, e.g., SGX_LOCKFAIL, we can handle them in KVM and 
don't have to inject an error to the guest at all. So the point is we 
have to go through all error codes to make sure KVM can emulate them 
correctly for the guest. Another argument is that Intel may add new 
error codes in the future when more SGX functionality is introduced, so 
emulating error codes may be a burden.

Using MTF is also a little bit tricky: when we turn on MTF VMEXIT upon 
ENCLS VMEXIT, the MTF won't necessarily be pending at the end of that 
ENCLS. For example, MTF may be pending at the end of an interrupt 
(cannot recall exactly) if an event is pending during VMENTRY from the 
ENCLS VMEXIT. Therefore we have to do additional work to check whether 
this MTF VMEXIT really happened after ENCLS ran (step 3 above). And 
depending on what we need to do, we may need to check whether ENCLS 
succeeded in the guest, which is also tricky, as ENCLS can fail by 
either setting an error code in RAX or generating #GP or #UD (step 4 
above). We may still need to do the gva->gpa->hpa translation, e.g., in 
order to locate the EPC/SECS page and update its status, depending on 
the purpose of trapping ENCLS.

But by using MTF we don't have to worry about ENCLS error emulation, as 
ENCLS runs in the guest, so we don't need to worry about the root-mode 
vs. non-root-mode difference. I think this is the major reason to want 
MTF.
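
For reference, a rough sketch of the "run ENCLS[EINIT] on behalf of
guest" flow (steps 2)-5) above). copy_from_guest(), map_guest_secs() and
sgx_einit() are stand-ins for whatever helpers the real code would use,
and the RBX/RCX/RDX convention is my reading of the SDM's EINIT
description:

static int handle_encls_einit(struct kvm_vcpu *vcpu)
{
	/* SIGSTRUCT (1808 bytes) and EINITTOKEN (304 bytes) are copied,
	 * not remapped; a real implementation would not put these on
	 * the stack. */
	u8 sig[1808], token[304];
	gva_t sig_gva = kvm_register_read(vcpu, VCPU_REGS_RBX);
	gva_t token_gva = kvm_register_read(vcpu, VCPU_REGS_RCX);
	gva_t secs_gva = kvm_register_read(vcpu, VCPU_REGS_RDX);
	void *secs;
	int ret;

	if (copy_from_guest(vcpu, sig, sig_gva, sizeof(sig)) ||
	    copy_from_guest(vcpu, token, token_gva, sizeof(token))) {
		kvm_inject_gp(vcpu, 0);
		return 1;
	}

	/* gva -> gpa -> hpa, then map the SECS page into KVM's VA. */
	secs = map_guest_secs(vcpu, secs_gva);

	/* Shared lock->WRMSR->EINIT->unlock helper, fed the vcpu's
	 * shadowed IA32_SGXLEPUBKEYHASHn values. */
	ret = sgx_einit(sig, token, secs,
			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash);

	/* Emulate the result: forward the error code (or 0) to RAX. */
	kvm_register_write(vcpu, VCPU_REGS_RAX, ret);
	return kvm_skip_emulated_instruction(vcpu);
}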
Kai Huang May 23, 2017, 5:55 a.m. UTC | #23
On 5/23/2017 5:43 PM, Huang, Kai wrote:
>
>
> On 5/21/2017 9:55 AM, Andy Lutomirski wrote:
>> On Thu, May 18, 2017 at 1:14 AM, Huang, Kai
>> <kai.huang@linux.intel.com> wrote:
>>> You are making the assumption that KVM will run ENCLS on behalf of the guest. :)
>>>
>>> If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I
>>> actually prefer to using MTF, as with MTF we don't have to do all the
>>> remapping guest's virtual address to KVM's virtual address thing, if we
>>> don't need to look into guest's ENCLS parameter. But if we need to
>>> look into
>>> guest's ENCLS parameters, for example, to locate physical SECS page,
>>> or to
>>> update physical EPC page's info (that KVM needs to maintain), maybe
>>> we can
>>> choose running ENCLS on behalf of guest.
>>
>> After thinking about this a bit, I don't see how MTF helps.
>> Currently, KVM works kind of like this:
>>
>> local_irq_disable();
>> set up stuff;
>> VMRESUME;
>> restore some host state;
>> local_irq_enable();
>>
>> If the guest is going to run with the EINIT-exiting bit clear, the
>> only way I see this working is to modify KVM along the lines of:
>>
>> local_irq_disable();
>> set up stuff;
>> if (condition here) {
>>   WRMSR to SGXLEPUBKEYHASH;
>>   update percpu shadow copy;
>>   clear EINIT-exiting bit;
>> } else {
>>   set EINIT-exiting bit;
>> }
>> VMRESUME;
>> restore some host state;
>> local_irq_enable();
>>
>> where "condition here" might be something like "the last VMRESUME
>> exited due to EINIT".
>>
>> I don't see how MTF helps much.  And if I were the KVM maintainer, I
>> would probably prefer to trap EINIT instead of adding a special case
>> to the main vm entry code.
>>
>
> Hi Andy,
>
> Thanks for your comments. However I didn't intend to use MTF in your
> way. The idea of using MTF (along with ENCLS VMEXIT) is, by turning on
> MTF VMEXIT upon ENCLS VMEXIT, we are able to mark a single step VMEXIT
> after ENCLS so that ENCLS can run in guest as single step.
>
> Let me explain how the two approaches work below in general, so that we
> can decide which is better. Only trapping EINIT in order to update
> IA32_SGXLEPUBKEYHASHn is relatively simpler but I'd compare the two in
> more general way, assuming we may want to trap more ENCLS in order to,
> ex, track EPC/Enclave status/info, in the future to support, ex, EPC
> oversubscription between KVM guests.
>
> Below diagram shows the basic idea of the two approaches.
>
>
>     --------------------------------------------------------------
>             |    ENCLS        |
>     --------------------------------------------------------------
>             |                  /|\
>     ENCLS VMEXIT    |            | VMENTRY
>             |            |
>                \|/            |

Looks like the diagrams are broken. Sorry, I need to verify before 
sending next time. But it looks like they are still understandable?

Thanks,
Kai
>
>             1) identify which ENCLS leaf (RAX)
>             2) reconstruct/remap guest's ENCLS parameters, ex:
>             - remap any guest VA (virtual address) to KVM VA
>                     - reconstruct PAGEINFO
>             3) do whatever needed before ENCLS, ex:
>             - updating MSRs before EINIT
>             4) run ENCLS on behalf of guest, and skip ENCLS
>         5) emulate ENCLS result (succeeded or not)
>             - update guest's RAX-RDX.
>             - and/or inject #GP (or #UD).
>         6) do whatever needed after ENCLS, ex:
>             - updating EPC/Enclave status/info
>
>                1) Run ENCLS on behalf of guest
>
>
>     --------------------------------------------------------------
>              |    ENCLS           |
>     --------------------------------------------------------------
>             |/|\                  |/|\
>     ENCLS VMEXIT    | | VMENTRY    MTF VMEXIT | | VMENTRY
>                 | |                      | |
>                \|/|                 \|/|
>     1) Turn off EMCLS VMEXIT      1) Turn off MTF VMEXIT
>     2) turn on MTF VMEXIT          2) Turn on ENCLS VMEXIT
>     3) cache ENCLS parameters     3) check whether ENCLS has run
>        (ENCLS changes RAX)        4) check whether ENCLS succeeded
>     4) do whatever needed before     or not.
>            ENCLS                      5) do whatever needed after ENCLS
>
>             2) Using MTF
>
> The concern of running ENCLS on behalf of guest is emulating ENCLS
> error. KVM needs to *correctly* emulate ENCLS error to guest so that the
> error we inject to guest can reflect the right behavior as if ENCLS run
> in guest. Running ENCLS in root-mode may be potentially different
> running ENCLS in non-root mode, therefore we have to go through all
> possible error codes to make sure we can emulate. And for some error
> code, ex, SGX_LOCKFAIL, we can handle it in KVM and don't have to inject
> error to guest. So the point is we have to go through all error code to
> make sure KVM can emulate ENCLS error code correctly for guest.
> Another argument is Intel may add new error codes in the future when
> more SGX functionalities are introduced, so emulating error code may be
> a burden.
>
> Using MTF is also a little bit tricky, as when we turn on MTF VMEXIT
> upon ENCLS VMEXIT, the MTF won't be absolutely pending at end of that
> ENCLS. For example, MTF may be pending at end of interrupt (cannot
> recall exactly) if event is pending during VMENTRY from ENCLS VMEXIT.
> Therefore we have to do additional thing to check whether this MTF
> VMEXIT really happens after ENCLS run (step 3 above). And depending on
> what we need to do, we may need to check whether ENCLS succeeded or not
> in guest, which is also tricky, as ENCLS can fail in either setting
> error code in RAX, or generating #GP or #UD (step 4 above). We may still
> need to do gva->gpa->hpa, ex, in order to locate EPC/SECS page and
> update status, depending on the purpose of trapping ENCLS.
>
> But by using MTF, we don't have to worry about ENCLS error emulation, as
> ENCLS runs in guest, thus we don't need to worry about this root-mode
> and non-root mode difference. I think this is the major reason that we
> want to use MTF.
>
Andy Lutomirski May 23, 2017, 4:34 p.m. UTC | #24
On Mon, May 22, 2017 at 10:43 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 5/21/2017 9:55 AM, Andy Lutomirski wrote:
>>
>> On Thu, May 18, 2017 at 1:14 AM, Huang, Kai <kai.huang@linux.intel.com>
>> wrote:
>>>
>>> You are making the assumption that KVM will run ENCLS on behalf of the guest. :)
>>>
>>> If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I
>>> actually prefer to using MTF, as with MTF we don't have to do all the
>>> remapping guest's virtual address to KVM's virtual address thing, if we
>>> don't need to look into guest's ENCLS parameter. But if we need to look
>>> into
>>> guest's ENCLS parameters, for example, to locate physical SECS page, or
>>> to
>>> update physical EPC page's info (that KVM needs to maintain), maybe we
>>> can
>>> choose running ENCLS on behalf of guest.
>>
>>
>> After thinking about this a bit, I don't see how MTF helps.
>> Currently, KVM works kind of like this:
>>
>> local_irq_disable();
>> set up stuff;
>> VMRESUME;
>> restore some host state;
>> local_irq_enable();
>>
>> If the guest is going to run with the EINIT-exiting bit clear, the
>> only way I see this working is to modify KVM along the lines of:
>>
>> local_irq_disable();
>> set up stuff;
>> if (condition here) {
>>   WRMSR to SGXLEPUBKEYHASH;
>>   update percpu shadow copy;
>>   clear EINIT-exiting bit;
>> } else {
>>   set EINIT-exiting bit;
>> }
>> VMRESUME;
>> restore some host state;
>> local_irq_enable();
>>
>> where "condition here" might be something like "the last VMRESUME
>> exited due to EINIT".
>>
>> I don't see how MTF helps much.  And if I were the KVM maintainer, I
>> would probably prefer to trap EINIT instead of adding a special case
>> to the main vm entry code.
>>
>
> Hi Andy,
>
> Thanks for your comments. However I didn't intend to use MTF in your way.
> The idea of using MTF (along with ENCLS VMEXIT) is, by turning on MTF VMEXIT
> upon ENCLS VMEXIT, we are able to mark a single step VMEXIT after ENCLS so
> that ENCLS can run in guest as single step.
>
> Let me explain how the two approaches work below in general, so that we can
> decide which is better. Only trapping EINIT in order to update
> IA32_SGXLEPUBKEYHASHn is relatively simpler but I'd compare the two in more
> general way, assuming we may want to trap more ENCLS in order to, ex, track
> EPC/Enclave status/info, in the future to support, ex, EPC oversubscription
> between KVM guests.
>
> Below diagram shows the basic idea of the two approaches.
>

...

>
>         --------------------------------------------------------------
>                          |      ENCLS              |
>         --------------------------------------------------------------
>                         |/|\                      |/|\
>         ENCLS VMEXIT    | | VMENTRY    MTF VMEXIT | | VMENTRY
>                         | |                       | |
>                        \|/|                      \|/|
>         1) Turn off EMCLS VMEXIT      1) Turn off MTF VMEXIT
>         2) turn on MTF VMEXIT         2) Turn on ENCLS VMEXIT
>         3) cache ENCLS parameters     3) check whether ENCLS has run
>            (ENCLS changes RAX)        4) check whether ENCLS succeeded
>         4) do whatever needed before     or not.
>            ENCLS                      5) do whatever needed after ENCLS
>
>                         2) Using MTF
>

...

> Using MTF is also a little bit tricky, as when we turn on MTF VMEXIT upon
> ENCLS VMEXIT, the MTF won't be absolutely pending at end of that ENCLS. For
> example, MTF may be pending at end of interrupt (cannot recall exactly) if
> event is pending during VMENTRY from ENCLS VMEXIT. Therefore we have to do
> additional thing to check whether this MTF VMEXIT really happens after ENCLS
> run (step 3 above). And depending on what we need to do, we may need to
> check whether ENCLS succeeded or not in guest, which is also tricky, as
> ENCLS can fail in either setting error code in RAX, or generating #GP or #UD
> (step 4 above). We may still need to do gva->gpa->hpa, ex, in order to
> locate EPC/SECS page and update status, depending on the purpose of trapping
> ENCLS.

I think there are some issues here.

First, you're making a big assumption that, when you resume the guest
with MTF set, the instruction that gets executed is still
ENCLS[EINIT].  That's not guaranteed as is -- you could race against
another vCPU that changes the instruction, the instruction could be in
IO space, host userspace could be messing with you, etc.  Second, I
don't think there's any precedent at all in KVM for doing this.
Third, you still need to make sure that the MSRs retain the value you
want them to have by the time ENCLS happens.  I think that, by the
time you resolve all of these issues, it'll look a lot like the
pseudocode I emailed out, and MTF won't be necessary any more.

>
> But by using MTF, we don't have to worry about ENCLS error emulation, as
> ENCLS runs in guest, thus we don't need to worry about this root-mode and
> non-root mode difference. I think this is the major reason that we want to
> use MTF.

I don't see why error emulation is hard.  If the host does ENCLS on
behalf of the guest and it returns an error, can't you return exactly
the same error to the guest with no further processing?  The only
tricky case is where the host rejects due to its own policy and you
have to choose an error code.

--Andy
Paolo Bonzini May 23, 2017, 4:43 p.m. UTC | #25
On 23/05/2017 18:34, Andy Lutomirski wrote:
> 
>> Using MTF is also a little bit tricky, as when we turn on MTF VMEXIT upon
>> ENCLS VMEXIT, the MTF won't be absolutely pending at end of that ENCLS. For
>> example, MTF may be pending at end of interrupt (cannot recall exactly) if
>> event is pending during VMENTRY from ENCLS VMEXIT. Therefore we have to do
>> additional thing to check whether this MTF VMEXIT really happens after ENCLS
>> run (step 3 above). And depending on what we need to do, we may need to
>> check whether ENCLS succeeded or not in guest, which is also tricky, as
>> ENCLS can fail in either setting error code in RAX, or generating #GP or #UD
>> (step 4 above). We may still need to do gva->gpa->hpa, ex, in order to
>> locate EPC/SECS page and update status, depending on the purpose of trapping
>> ENCLS.
> I think there are some issues here.
> 
> First, you're making a big assumption that, when you resume the guest
> with MTF set, the instruction that gets executed is still
> ENCLS[EINIT].  That's not guaranteed as is -- you could race against
> another vCPU that changes the instruction, the instruction could be in
> IO space, host userspace could be messing with you, etc.  Second, I
> don't think there's any precedent at all in KVM for doing this.
> Third, you still need to make sure that the MSRs retain the value you
> want them to have by the time ENCLS happens.  I think that, by the
> time you resolve all of these issues, it'll look a lot like the
> pseudocode I emailed out, and MTF won't be necessary any more.

Agreed.  Emulation in the host is better.

Paolo
Kai Huang May 24, 2017, 8:20 a.m. UTC | #26
On 5/24/2017 4:43 AM, Paolo Bonzini wrote:
>
>
> On 23/05/2017 18:34, Andy Lutomirski wrote:
>>
>>> Using MTF is also a little bit tricky, as when we turn on MTF VMEXIT upon
>>> ENCLS VMEXIT, the MTF won't be absolutely pending at end of that ENCLS. For
>>> example, MTF may be pending at end of interrupt (cannot recall exactly) if
>>> event is pending during VMENTRY from ENCLS VMEXIT. Therefore we have to do
>>> additional thing to check whether this MTF VMEXIT really happens after ENCLS
>>> run (step 3 above). And depending on what we need to do, we may need to
>>> check whether ENCLS succeeded or not in guest, which is also tricky, as
>>> ENCLS can fail in either setting error code in RAX, or generating #GP or #UD
>>> (step 4 above). We may still need to do gva->gpa->hpa, ex, in order to
>>> locate EPC/SECS page and update status, depending on the purpose of trapping
>>> ENCLS.
>> I think there are some issues here.
>>
>> First, you're making a big assumption that, when you resume the guest
>> with MTF set, the instruction that gets executed is still
>> ENCLS[EINIT].  That's not guaranteed as is -- you could race against
>> another vCPU that changes the instruction, the instruction could be in
>> IO space, host userspace could be messing with you, etc.  Second, I
>> don't think there's any precedent at all in KVM for doing this.
>> Third, you still need to make sure that the MSRs retain the value you
>> want them to have by the time ENCLS happens.  I think that, by the
>> time you resolve all of these issues, it'll look a lot like the
>> pseudocode I emailed out, and MTF won't be necessary any more.
>
> Agreed.  Emulation in the host is better.

Hi Andy/Paolo,

Thanks for the comments. I'll follow your suggestion in v2.

Thanks,
-Kai

>
> Paolo
>
Kai Huang June 6, 2017, 8:52 p.m. UTC | #27
On 5/18/2017 7:45 PM, Huang, Kai wrote:
> 
> 
> On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai 
>> <kai.huang@linux.intel.com> wrote:
>>>
>>>
>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>>
>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>> wrote:
>>>>>
>>>>> I am not sure whether the cost of writing to 4 MSRs would be 
>>>>> *extremely*
>>>>> slow, as when the vcpu is scheduled in, KVM is already doing vmcs_load,
>>>>> writing
>>>>> to several MSRs, etc.
>>>>
>>>>
>>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>>> unusually slow.
>>>>
>>>>>
>>>>>>
>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>>>> will take the mutex, compare the percpu variable to the desired 
>>>>>> value,
>>>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>>>
>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its 
>>>>>> in-memory
>>>>>> state but *not* changing the MSRs.  KVM will trap and emulate 
>>>>>> EINIT to
>>>>>> support the same handling as the host.  There is no action 
>>>>>> required at
>>>>>> all on KVM guest entry and exit.
>>>>>
>>>>>
>>>>>
>>>>> This is doable, but the SGX driver needs to do those things and expose
>>>>> interfaces for KVM to use. In terms of the percpu data, it is nice to
>>>>> have, but I am not sure whether it is mandatory, as IMO EINIT is not
>>>>> even in the performance-critical path. We can simply read the old
>>>>> values from the MSRs and compare whether they equal the new ones.
>>>>
>>>>
>>>> I think the SGX driver should probably live in arch/x86, and the
>>>> interface could be a simple percpu variable that is exported (from the
>>>> main kernel image, not from a module).
>>>>
>>>>>
>>>>>>
>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>>>>> other reasons: someone is going to want to implement policy for what
>>>>>> enclaves are allowed that applies to guests as well as the host.
>>>>>
>>>>>
>>>>>
>>>>> I am not very convinced why "what enclaves are allowed" in host would
>>>>> apply
>>>>> to guest. Can you elaborate? I mean in general virtualization just 
>>>>> focuses on
>>>>> emulating hardware behavior. If a native machine is able to run any 
>>>>> LE,
>>>>> the
>>>>> virtual machine should be able to as well (of course, with guest's
>>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>>
>>>>
>>>> I strongly disagree.  I can imagine two classes of sensible policies
>>>> for launch control:
>>>>
>>>> 1. Allow everything.  This seems quite sensible to me.
>>>>
>>>> 2. Allow some things, and make sure that VMs have at least as
>>>> restrictive a policy as host root has.  After all, what's the point of
>>>> restricting enclaves in the host if host code can simply spawn a
>>>> little VM to run otherwise-disallowed enclaves?
>>>
>>>
>>> What's the current SGX driver launch control policy? Yes, allow
>>> everything works for KVM, so let's skip this. Are we going to support
>>> allowing several LEs, or just allowing one single LE? I know Jarkko is
>>> doing in-kernel LE stuff but I don't know the details.
>>>
>>> I am trying to find a way that we can both not break host launch control
>>> policy, and be consistent to HW behavior (from guest's view). 
>>> Currently we
>>> can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn 
>>> either
>>> enabled or disabled. I introduced a Qemu parameter 'lewr' for this
>>> purpose.
>>> Actually I introduced the below Qemu SGX parameters for creating a guest:
>>>
>>>         -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>>
>>> where 'epc' specifies guest's EPC size, lehash specifies (initial) 
>>> value of
>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is 
>>> allowed
>>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>>>
>>> If host only allows one single LE to run, KVM can add a restrict that 
>>> only
>>> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>>> disabled, so that only host allowed (single) hash can be used by 
>>> guest. From
>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single) 
>>> hash.
>>>
>>> If host allows several LEs (not but everything), and if we create 
>>> guest with
>>> 'lewr', then the behavior is not consistent with HW behavior, as from
>>> guest's hardware's point of view, we can actually run any LE but we 
>>> have to
>>> tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn 
>>> to some
>>> specific values. One compromise solution is we don't allow to create 
>>> guest
>>> with 'lewr' specified, and at the meantime, only allow to create 
>>> guest with
>>> host approved hashes specified in 'lehash'. This will make guest's 
>>> behavior
>>> consistent to HW behavior but only allows guest to run one LE (which is
>>> specified by 'lehash' when guest is created).
>>
>> I'm not sure I entirely agree for a couple reasons.
>>
>> 1. I wouldn't be surprised if the kernel ends up implementing a policy
>> in which it checks all enclaves (not just LEs) for acceptability.  In
>> fact, if the kernel sticks with the "no LE at all or just
>> kernel-internal LE", then checking enclaves directly against some
>> admin- or distro-provided signer list seems reasonable.  This type of
>> policy can't be forwarded to a guest by restricting allowed LE
>> signers.  But this is mostly speculation since AFAIK no one has
>> seriously proposed any particular policy support and the plan was to
>> not have this for the initial implementation.
>>
>> 2. While matching hardware behavior is nice in principle, there
>> doesn't seem to be useful hardware behavior to match here.  If the
>> host had a list of five allowed LE signers, how exactly would it
>> restrict the MSRs?  They're not written atomically, so you can't
>> directly tell what's being written.
> 
> In this case I actually plan to just allow creating guest with guest's 
> IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is 
> specified, creating guest will fail. And we only allow creating guest 
> with host allowed hash values (with 'lehash=hash-value'), and if 
> 'hash-value' specified by 'lehash' is not allowed by host, we also fail 
> to create guest.
> 
> We can only allow creating guest with 'lewr' specified when host allows 
> anything.
> 
> But in this way, we are restricting guest OS's ability to run LE, as 
> only one LE, that is specified by 'lehash' parameter, can be run. But I 
> think this won't hurt much, as multiple guests still are able to run 
> different LEs?
> 
> Also, the only way to fail an MSR
>> write is to send #GP, and Windows (and even Linux) may not expect
>> that.  Linux doesn't panic due to #GP on MSR writes these days, but
>> you still get a big fat warning.  I wouldn't be at all surprised if
>> Windows BSODs.
> 
> We cannot allow writing some particular value to MSRs successfully, 
> while injecting #GP when writing other values to the same MSRs. So #GP 
> is not option.
> 
> ENCLS[EINIT], on the other hand, returns an actual
>> error code.  I'm not sure that a sensible error code exists
>> ("SGX_HYPERVISOR_SAID_NO?", perhaps),
> 
> Looks no such error code exists. And we cannot return such error code to 
> guest as such error code is only supposed to be valid when ENCLS is run 
> in hypervisor.
> 
> but SGX_INVALID_EINITTOKEN seems
>> to mean, more or less, "the CPU thinks you're not authorized to do
>> this", so forcing that error code could be entirely reasonable.
>>
>> If the host policy is to allow a list of LE signers, you could return
>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
>> the list.
> 
> But this would be inconsistent with HW behavior. If the hash value in 
> guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT, 
> EINIT is not supposed to return SGX_INVALID_EINITTOKEN.
> 
> I think from VMM's perspective, emulating HW behavior to be consistent 
> with real HW behavior is very important.
> 
> Paolo, would you provide your comments?

Hi all,

This has been quiet for a while and I'd like to start the discussion 
again. Jarkko told me that currently he only supports one LE in the SGX 
driver, but I am not sure whether he is going to extend that in the 
future or not. I think this might also depend on requirements from 
customers.

Andy,

If we only support one LE in the driver, then we can only support the 
same LE for all KVM guests, according to your comment that the host 
kernel launch control policy should also apply to KVM guests? Would you 
comment more?

Jarkko,

Could you help to clarify the whole launch control policy on the host 
side so that we can have a better understanding together?

Thanks,
-Kai

> 
>>
>> --Andy
>>
Andy Lutomirski June 6, 2017, 9:22 p.m. UTC | #28
On Tue, Jun 6, 2017 at 1:52 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 5/18/2017 7:45 PM, Huang, Kai wrote:
>>
>>
>>
>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
>>>
>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai <kai.huang@linux.intel.com>
>>> wrote:
>>>>
>>>>
>>>>
>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>>>
>>>>>
>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> I am not sure whether the cost of writing to 4 MSRs would be
>>>>>> *extremely*
>>>>>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load,
>>>>>> writing
>>>>>> to several MSRs, etc.
>>>>>
>>>>>
>>>>>
>>>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>>>> unusualy slow.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>>>>> will take the mutex, compare the percpu variable to the desired
>>>>>>> value,
>>>>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>>>>
>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its
>>>>>>> in-memory
>>>>>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT
>>>>>>> to
>>>>>>> support the same handling as the host.  There is no action required
>>>>>>> at
>>>>>>> all on KVM guest entry and exit.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> This is doable, but SGX driver needs to do those things and expose
>>>>>> interfaces for KVM to use. In terms of the percpu data, it is nice to
>>>>>> have,
>>>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>>>>>> performance critical path. We can simply read old value from MSRs out
>>>>>> and
>>>>>> compare whether the old equals to the new.
>>>>>
>>>>>
>>>>>
>>>>> I think the SGX driver should probably live in arch/x86, and the
>>>>> interface could be a simple percpu variable that is exported (from the
>>>>> main kernel image, not from a module).
>>>>>
>>>>>>
>>>>>>>
>>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>>>>>> other reasons: someone is going to want to implement policy for what
>>>>>>> enclaves are allowed that applies to guests as well as the host.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I am not very convinced why "what enclaves are allowed" in host would
>>>>>> apply
>>>>>> to guest. Can you elaborate? I mean in general virtualization just
>>>>>> focus
>>>>>> emulating hardware behavior. If a native machine is able to run any
>>>>>> LE,
>>>>>> the
>>>>>> virtual machine should be able to as well (of course, with guest's
>>>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>>>
>>>>>
>>>>>
>>>>> I strongly disagree.  I can imagine two classes of sensible policies
>>>>> for launch control:
>>>>>
>>>>> 1. Allow everything.  This seems quite sensible to me.
>>>>>
>>>>> 2. Allow some things, and make sure that VMs have at least as
>>>>> restrictive a policy as host root has.  After all, what's the point of
>>>>> restricting enclaves in the host if host code can simply spawn a
>>>>> little VM to run otherwise-disallowed enclaves?
>>>>
>>>>
>>>>
>>>> What's the current SGX driver launch control policy? Yes allow
>>>> everything
>>>> works for KVM so lets skip this. Are we going to support allowing
>>>> several
>>>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE
>>>> staff but I don't know details.
>>>>
>>>> I am trying to find a way that we can both not break host launch control
>>>> policy, and be consistent to HW behavior (from guest's view). Currently
>>>> we
>>>> can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>>>> either
>>>> enabled or disabled. I introduced an Qemu parameter 'lewr' for this
>>>> purpose.
>>>> Actually I introduced below Qemu SGX parameters for creating guest:
>>>>
>>>>         -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>>>
>>>> where 'epc' specifies guest's EPC size, lehash specifies (initial) value
>>>> of
>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is
>>>> allowed
>>>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>>>>
>>>> If host only allows one single LE to run, KVM can add a restrict that
>>>> only
>>>> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>>>> disabled, so that only host allowed (single) hash can be used by guest.
>>>> From
>>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single)
>>>> hash.
>>>>
>>>> If host allows several LEs (not but everything), and if we create guest
>>>> with
>>>> 'lewr', then the behavior is not consistent with HW behavior, as from
>>>> guest's hardware's point of view, we can actually run any LE but we have
>>>> to
>>>> tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn to
>>>> some
>>>> specific values. One compromise solution is we don't allow to create
>>>> guest
>>>> with 'lewr' specified, and at the meantime, only allow to create guest
>>>> with
>>>> host approved hashes specified in 'lehash'. This will make guest's
>>>> behavior
>>>> consistent to HW behavior but only allows guest to run one LE (which is
>>>> specified by 'lehash' when guest is created).
>>>
>>>
>>> I'm not sure I entirely agree for a couple reasons.
>>>
>>> 1. I wouldn't be surprised if the kernel ends up implementing a policy
>>> in which it checks all enclaves (not just LEs) for acceptability.  In
>>> fact, if the kernel sticks with the "no LE at all or just
>>> kernel-internal LE", then checking enclaves directly against some
>>> admin- or distro-provided signer list seems reasonable.  This type of
>>> policy can't be forwarded to a guest by restricting allowed LE
>>> signers.  But this is mostly speculation since AFAIK no one has
>>> seriously proposed any particular policy support and the plan was to
>>> not have this for the initial implementation.
>>>
>>> 2. While matching hardware behavior is nice in principle, there
>>> doesn't seem to be useful hardware behavior to match here.  If the
>>> host had a list of five allowed LE signers, how exactly would it
>>> restrict the MSRs?  They're not written atomically, so you can't
>>> directly tell what's being written.
>>
>>
>> In this case I actually plan to just allow creating guest with guest's
>> IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is
>> specified, creating guest will fail. And we only allow creating guest with
>> host allowed hash values (with 'lehash=hash-value'), and if 'hash-value'
>> specified by 'lehash' is not allowed by host, we also fail to create guest.
>>
>> We can only allow creating guest with 'lewr' specified when host allows
>> anything.
>>
>> But in this way, we are restricting guest OS's ability to run LE, as only
>> one LE, that is specified by 'lehash' parameter, can be run. But I think
>> this won't hurt much, as multiple guests still are able to run different
>> LEs?
>>
>> Also, the only way to fail an MSR
>>>
>>> write is to send #GP, and Windows (and even Linux) may not expect
>>> that.  Linux doesn't panic due to #GP on MSR writes these days, but
>>> you still get a big fat warning.  I wouldn't be at all surprised if
>>> Windows BSODs.
>>
>>
>> We cannot allow writing some particular value to MSRs successfully, while
>> injecting #GP when writing other values to the same MSRs. So #GP is not
>> option.
>>
>> ENCLS[EINIT], on the other hand, returns an actual
>>>
>>> error code.  I'm not sure that a sensible error code exists
>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps),
>>
>>
>> Looks no such error code exists. And we cannot return such error code to
>> guest as such error code is only supposed to be valid when ENCLS is run in
>> hypervisor.
>>
>> but SGX_INVALID_EINITTOKEN seems
>>>
>>> to mean, more or less, "the CPU thinks you're not authorized to do
>>> this", so forcing that error code could be entirely reasonable.
>>>
>>> If the host policy is to allow a list of LE signers, you could return
>>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
>>> the list.
>>
>>
>> But this would be inconsistent with HW behavior. If the hash value in
>> guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT, EINIT
>> is not supposed to return SGX_INVALID_EINITTOKEN.
>>
>> I think from VMM's perspective, emulating HW behavior to be consistent
>> with real HW behavior is very important.
>>
>> Paolo, would you provide your comments?
>
>
> Hi all,
>
> This has been quite for a while and I'd like to start discussion again.
> Jarkko told me that currently he only supports one LE in SGX driver, but I
> am not sure whether he is going to extend in the future or not. I think this
> might also depend on requirements from customers.
>
> Andy,
>
> If we only support one LE in driver, then we can only support the same LE
> for all KVM guests, according to your comments that host kernel launch
> control policy should also apply to KVM guests? WOuld you comments more?

I'm not at all convinced that, going forward, Linux's host-side launch
control policy will be entirely contained in the LE.  I'm also not
convinced that non-Linux guests will function at all under this type
of policy -- what if FreeBSD's LE is different for whatever reason?

>
> Jarkko,
>
> Could you help to clarify the whole launch control policy in host side so
> that we can have a better understanding together?
>
> Thanks,
> -Kai
>
>>
>>>
>>> --Andy
>>>
>
Kai Huang June 6, 2017, 10:51 p.m. UTC | #29
On 6/7/2017 9:22 AM, Andy Lutomirski wrote:
> On Tue, Jun 6, 2017 at 1:52 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>
>>
>> On 5/18/2017 7:45 PM, Huang, Kai wrote:
>>>
>>>
>>>
>>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
>>>>
>>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> I am not sure whether the cost of writing to 4 MSRs would be
>>>>>>> *extremely*
>>>>>>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load,
>>>>>>> writing
>>>>>>> to several MSRs, etc.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>>>>> unusualy slow.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>>>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>>>>>> will take the mutex, compare the percpu variable to the desired
>>>>>>>> value,
>>>>>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>>>>>
>>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its
>>>>>>>> in-memory
>>>>>>>> state but *not* changing the MSRs.  KVM will trap and emulate EINIT
>>>>>>>> to
>>>>>>>> support the same handling as the host.  There is no action required
>>>>>>>> at
>>>>>>>> all on KVM guest entry and exit.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This is doable, but SGX driver needs to do those things and expose
>>>>>>> interfaces for KVM to use. In terms of the percpu data, it is nice to
>>>>>>> have,
>>>>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>>>>>>> performance critical path. We can simply read old value from MSRs out
>>>>>>> and
>>>>>>> compare whether the old equals to the new.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I think the SGX driver should probably live in arch/x86, and the
>>>>>> interface could be a simple percpu variable that is exported (from the
>>>>>> main kernel image, not from a module).
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>>>>>>> other reasons: someone is going to want to implement policy for what
>>>>>>>> enclaves are allowed that applies to guests as well as the host.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I am not very convinced why "what enclaves are allowed" in host would
>>>>>>> apply
>>>>>>> to guest. Can you elaborate? I mean in general virtualization just
>>>>>>> focus
>>>>>>> emulating hardware behavior. If a native machine is able to run any
>>>>>>> LE,
>>>>>>> the
>>>>>>> virtual machine should be able to as well (of course, with guest's
>>>>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>>>>
>>>>>>
>>>>>>
>>>>>> I strongly disagree.  I can imagine two classes of sensible policies
>>>>>> for launch control:
>>>>>>
>>>>>> 1. Allow everything.  This seems quite sensible to me.
>>>>>>
>>>>>> 2. Allow some things, and make sure that VMs have at least as
>>>>>> restrictive a policy as host root has.  After all, what's the point of
>>>>>> restricting enclaves in the host if host code can simply spawn a
>>>>>> little VM to run otherwise-disallowed enclaves?
>>>>>
>>>>>
>>>>>
>>>>> What's the current SGX driver launch control policy? Yes allow
>>>>> everything
>>>>> works for KVM so lets skip this. Are we going to support allowing
>>>>> several
>>>>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE
>>>>> staff but I don't know details.
>>>>>
>>>>> I am trying to find a way that we can both not break host launch control
>>>>> policy, and be consistent to HW behavior (from guest's view). Currently
>>>>> we
>>>>> can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>>>>> either
>>>>> enabled or disabled. I introduced an Qemu parameter 'lewr' for this
>>>>> purpose.
>>>>> Actually I introduced below Qemu SGX parameters for creating guest:
>>>>>
>>>>>          -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>>>>
>>>>> where 'epc' specifies guest's EPC size, lehash specifies (initial) value
>>>>> of
>>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is
>>>>> allowed
>>>>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>>>>>
>>>>> If host only allows one single LE to run, KVM can add a restrict that
>>>>> only
>>>>> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>>>>> disabled, so that only host allowed (single) hash can be used by guest.
>>>>> From
>>>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
>>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single)
>>>>> hash.
>>>>>
>>>>> If host allows several LEs (not but everything), and if we create guest
>>>>> with
>>>>> 'lewr', then the behavior is not consistent with HW behavior, as from
>>>>> guest's hardware's point of view, we can actually run any LE but we have
>>>>> to
>>>>> tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn to
>>>>> some
>>>>> specific values. One compromise solution is we don't allow to create
>>>>> guest
>>>>> with 'lewr' specified, and at the meantime, only allow to create guest
>>>>> with
>>>>> host approved hashes specified in 'lehash'. This will make guest's
>>>>> behavior
>>>>> consistent to HW behavior but only allows guest to run one LE (which is
>>>>> specified by 'lehash' when guest is created).
>>>>
>>>>
>>>> I'm not sure I entirely agree for a couple reasons.
>>>>
>>>> 1. I wouldn't be surprised if the kernel ends up implementing a policy
>>>> in which it checks all enclaves (not just LEs) for acceptability.  In
>>>> fact, if the kernel sticks with the "no LE at all or just
>>>> kernel-internal LE", then checking enclaves directly against some
>>>> admin- or distro-provided signer list seems reasonable.  This type of
>>>> policy can't be forwarded to a guest by restricting allowed LE
>>>> signers.  But this is mostly speculation since AFAIK no one has
>>>> seriously proposed any particular policy support and the plan was to
>>>> not have this for the initial implementation.
>>>>
>>>> 2. While matching hardware behavior is nice in principle, there
>>>> doesn't seem to be useful hardware behavior to match here.  If the
>>>> host had a list of five allowed LE signers, how exactly would it
>>>> restrict the MSRs?  They're not written atomically, so you can't
>>>> directly tell what's being written.
>>>
>>>
>>> In this case I actually plan to just allow creating guest with guest's
>>> IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is
>>> specified, creating guest will fail. And we only allow creating guest with
>>> host allowed hash values (with 'lehash=hash-value'), and if 'hash-value'
>>> specified by 'lehash' is not allowed by host, we also fail to create guest.
>>>
>>> We can only allow creating guest with 'lewr' specified when host allows
>>> anything.
>>>
>>> But in this way, we are restricting guest OS's ability to run LE, as only
>>> one LE, that is specified by 'lehash' parameter, can be run. But I think
>>> this won't hurt much, as multiple guests still are able to run different
>>> LEs?
>>>
>>> Also, the only way to fail an MSR
>>>>
>>>> write is to send #GP, and Windows (and even Linux) may not expect
>>>> that.  Linux doesn't panic due to #GP on MSR writes these days, but
>>>> you still get a big fat warning.  I wouldn't be at all surprised if
>>>> Windows BSODs.
>>>
>>>
>>> We cannot allow writing some particular value to MSRs successfully, while
>>> injecting #GP when writing other values to the same MSRs. So #GP is not
>>> option.
>>>
>>> ENCLS[EINIT], on the other hand, returns an actual
>>>>
>>>> error code.  I'm not sure that a sensible error code exists
>>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps),
>>>
>>>
>>> Looks no such error code exists. And we cannot return such error code to
>>> guest as such error code is only supposed to be valid when ENCLS is run in
>>> hypervisor.
>>>
>>> but SGX_INVALID_EINITTOKEN seems
>>>>
>>>> to mean, more or less, "the CPU thinks you're not authorized to do
>>>> this", so forcing that error code could be entirely reasonable.
>>>>
>>>> If the host policy is to allow a list of LE signers, you could return
>>>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
>>>> the list.
>>>
>>>
>>> But this would be inconsistent with HW behavior. If the hash value in
>>> guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT, EINIT
>>> is not supposed to return SGX_INVALID_EINITTOKEN.
>>>
>>> I think from VMM's perspective, emulating HW behavior to be consistent
>>> with real HW behavior is very important.
>>>
>>> Paolo, would you provide your comments?
>>
>>
>> Hi all,
>>
>> This has been quite for a while and I'd like to start discussion again.
>> Jarkko told me that currently he only supports one LE in SGX driver, but I
>> am not sure whether he is going to extend in the future or not. I think this
>> might also depend on requirements from customers.
>>
>> Andy,
>>
>> If we only support one LE in driver, then we can only support the same LE
>> for all KVM guests, according to your comments that host kernel launch
>> control policy should also apply to KVM guests? WOuld you comments more?
> 
> I'm not at all convinced that, going forward, Linux's host-side launch
> control policy will be entirely contained in the LE.  I'm also not
> convinced that non-Linux guests will function at all under this type
> of policy -- what if FreeBSD's LE is different for whatever reason?

I am not convinced either. I think we need Jarkko to elaborate on how 
the host-side launch control policy is implemented, or whether there is 
any policy at all. I also tried to read the SGX driver code, but it 
looks like I couldn't find any implementation related to this.

Hi Jarkko,

Can you elaborate on this?

Thanks,
-Kai
> 
>>
>> Jarkko,
>>
>> Could you help to clarify the whole launch control policy in host side so
>> that we can have a better understanding together?
>>
>> Thanks,
>> -Kai
>>
>>>
>>>>
>>>> --Andy
>>>>
>>
>
Cohen, Haim June 7, 2017, 2:45 p.m. UTC | #30
On 6/6/2017 6:52 PM, Huang, Kai wrote:
>On 6/7/2017 9:22 AM, Andy Lutomirski wrote:

>> On Tue, Jun 6, 2017 at 1:52 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>
>> [earlier quoted thread snipped]
>>
>>> If we only support one LE in the driver, then we can only support the same
>>> LE for all KVM guests, according to your comment that the host kernel
>>> launch control policy should also apply to KVM guests? Would you comment
>>> more?
>>
>> I'm not at all convinced that, going forward, Linux's host-side launch
>> control policy will be entirely contained in the LE.  I'm also not
>> convinced that non-Linux guests will function at all under this type
>> of policy -- what if FreeBSD's LE is different for whatever reason?
>
>I am not convinced either. I think we need Jarkko to elaborate on how
>the host-side launch control policy is implemented, or whether there is
>any policy at all. I also tried to read the SGX driver code, but it
>looks like I couldn't find any implementation related to this.
>
>Hi Jarkko,
>
>Can you elaborate on this?
>
>Thanks,
>-Kai

I don't think the kernel's LE policy is relevant here.
As long as you allow the guest OS to set the IA32_SGXLEPUBKEYHASHn MSRs, either directly or via the VMM 'lehash' value, you don't need support for more than one LE in the kernel.
The host kernel will have one LE, and the guest kernel will have another "one" LE that may have a different hash value.
I agree with Andy that different OS distributions are likely to have different LE hash values - so the guest OS may require a different hash setting.
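
In KVM terms that just means each guest carries its own cached view of the hash MSRs, e.g. (a sketch only -- the struct and field names here are made up, not taken from the patch):

	/* per-guest cached view of IA32_SGXLEPUBKEYHASH0..3 (sketch) */
	struct guest_sgx_state {
		u64 lepubkeyhash[4];	/* seeded from 'lehash', or the default */
		bool lewr;		/* may the guest rewrite them at runtime? */
	};

The host kernel's own LE hash never has to match any of these; it only has to be in the hardware MSRs around the host's own EINIT.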

Jarkko Sakkinen June 8, 2017, 12:31 p.m. UTC | #31
On Wed, Jun 07, 2017 at 08:52:42AM +1200, Huang, Kai wrote:
> 
> 
> On 5/18/2017 7:45 PM, Huang, Kai wrote:
> > 
> > 
> > On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
> > > On Mon, May 15, 2017 at 5:48 PM, Huang, Kai
> > > <kai.huang@linux.intel.com> wrote:
> > > > 
> > > > 
> > > > On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
> > > > > 
> > > > > On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
> > > > > wrote:
> > > > > > 
> > > > > > I am not sure whether the cost of writing to 4 MSRs
> > > > > > would be *extremely*
> > > > > > slow, as when vcpu is schedule in, KVM is already doing vmcs_load,
> > > > > > writing
> > > > > > to several MSRs, etc.
> > > > > 
> > > > > 
> > > > > I'm speculating that these MSRs may be rather unoptimized and hence
> > > > > unusualy slow.
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> > > > > > > with whatever lock is needed (probably just a mutex).  Users of EINIT
> > > > > > > will take the mutex, compare the percpu variable to
> > > > > > > the desired value,
> > > > > > > and, if it's different, do WRMSR and update the percpu variable.
> > > > > > > 
> > > > > > > KVM will implement writes to SGXLEPUBKEYHASH by
> > > > > > > updating its in-memory
> > > > > > > state but *not* changing the MSRs.  KVM will trap
> > > > > > > and emulate EINIT to
> > > > > > > support the same handling as the host.  There is no
> > > > > > > action required at
> > > > > > > all on KVM guest entry and exit.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > This is doable, but SGX driver needs to do those things and expose
> > > > > > interfaces for KVM to use. In terms of the percpu data, it is nice to
> > > > > > have,
> > > > > > but I am not sure whether it is mandatory, as IMO EINIT is not even in
> > > > > > performance critical path. We can simply read old value
> > > > > > from MSRs out and
> > > > > > compare whether the old equals to the new.
> > > > > 
> > > > > 
> > > > > I think the SGX driver should probably live in arch/x86, and the
> > > > > interface could be a simple percpu variable that is exported (from the
> > > > > main kernel image, not from a module).
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > FWIW, I think that KVM will, in the long run, want to trap EINIT for
> > > > > > > other reasons: someone is going to want to implement policy for what
> > > > > > > enclaves are allowed that applies to guests as well as the host.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > I am not very convinced why "what enclaves are allowed" in host would
> > > > > > apply
> > > > > > to guest. Can you elaborate? I mean in general
> > > > > > virtualization just focus
> > > > > > emulating hardware behavior. If a native machine is able
> > > > > > to run any LE,
> > > > > > the
> > > > > > virtual machine should be able to as well (of course, with guest's
> > > > > > IA32_FEATURE_CONTROL[bit 17] set).
> > > > > 
> > > > > 
> > > > > I strongly disagree.  I can imagine two classes of sensible policies
> > > > > for launch control:
> > > > > 
> > > > > 1. Allow everything.  This seems quite sensible to me.
> > > > > 
> > > > > 2. Allow some things, and make sure that VMs have at least as
> > > > > restrictive a policy as host root has.  After all, what's the point of
> > > > > restricting enclaves in the host if host code can simply spawn a
> > > > > little VM to run otherwise-disallowed enclaves?
> > > > 
> > > > 
> > > > What's the current SGX driver launch control policy? Yes allow
> > > > everything
> > > > works for KVM so lets skip this. Are we going to support
> > > > allowing several
> > > > LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE
> > > > staff but I don't know details.
> > > > 
> > > > I am trying to find a way that we can both not break host launch control
> > > > policy, and be consistent to HW behavior (from guest's view).
> > > > Currently we
> > > > can create a KVM guest with runtime change to
> > > > IA32_SGXLEPUBKEYHASHn either
> > > > enabled or disabled. I introduced an Qemu parameter 'lewr' for
> > > > this purpose.
> > > > Actually I introduced below Qemu SGX parameters for creating guest:
> > > > 
> > > >         -sgx epc=<size>,lehash='SHA-256 hash',lewr
> > > > 
> > > > where 'epc' specifies guest's EPC size, lehash specifies
> > > > (initial) value of
> > > > guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether
> > > > guest is allowed
> > > > to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
> > > > 
> > > > If host only allows one single LE to run, KVM can add a restrict
> > > > that only
> > > > allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
> > > > disabled, so that only host allowed (single) hash can be used by
> > > > guest. From
> > > > guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
> > > > IA32_SGXLEPUBKEYHASHn with default value to be host allowed
> > > > (single) hash.
> > > > 
> > > > If host allows several LEs (not but everything), and if we
> > > > create guest with
> > > > 'lewr', then the behavior is not consistent with HW behavior, as from
> > > > guest's hardware's point of view, we can actually run any LE but
> > > > we have to
> > > > tell guest that you are only allowed to change
> > > > IA32_SGXLEPUBKEYHASHn to some
> > > > specific values. One compromise solution is we don't allow to
> > > > create guest
> > > > with 'lewr' specified, and at the meantime, only allow to create
> > > > guest with
> > > > host approved hashes specified in 'lehash'. This will make
> > > > guest's behavior
> > > > consistent to HW behavior but only allows guest to run one LE (which is
> > > > specified by 'lehash' when guest is created).
> > > 
> > > I'm not sure I entirely agree for a couple reasons.
> > > 
> > > 1. I wouldn't be surprised if the kernel ends up implementing a policy
> > > in which it checks all enclaves (not just LEs) for acceptability.  In
> > > fact, if the kernel sticks with the "no LE at all or just
> > > kernel-internal LE", then checking enclaves directly against some
> > > admin- or distro-provided signer list seems reasonable.  This type of
> > > policy can't be forwarded to a guest by restricting allowed LE
> > > signers.  But this is mostly speculation since AFAIK no one has
> > > seriously proposed any particular policy support and the plan was to
> > > not have this for the initial implementation.
> > > 
> > > 2. While matching hardware behavior is nice in principle, there
> > > doesn't seem to be useful hardware behavior to match here.  If the
> > > host had a list of five allowed LE signers, how exactly would it
> > > restrict the MSRs?  They're not written atomically, so you can't
> > > directly tell what's being written.
> > 
> > In this case I actually plan to just allow creating guest with guest's
> > IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is
> > specified, creating guest will fail. And we only allow creating guest
> > with host allowed hash values (with 'lehash=hash-value'), and if
> > 'hash-value' specified by 'lehash' is not allowed by host, we also fail
> > to create guest.
> > 
> > We can only allow creating guest with 'lewr' specified when host allows
> > anything.
> > 
> > But in this way, we are restricting guest OS's ability to run LE, as
> > only one LE, that is specified by 'lehash' parameter, can be run. But I
> > think this won't hurt much, as multiple guests still are able to run
> > different LEs?
> > 
> > Also, the only way to fail an MSR
> > > write is to send #GP, and Windows (and even Linux) may not expect
> > > that.  Linux doesn't panic due to #GP on MSR writes these days, but
> > > you still get a big fat warning.  I wouldn't be at all surprised if
> > > Windows BSODs.
> > 
> > We cannot allow writing some particular value to MSRs successfully,
> > while injecting #GP when writing other values to the same MSRs. So #GP
> > is not option.
> > 
> > ENCLS[EINIT], on the other hand, returns an actual
> > > error code.  I'm not sure that a sensible error code exists
> > > ("SGX_HYPERVISOR_SAID_NO?", perhaps),
> > 
> > Looks no such error code exists. And we cannot return such error code to
> > guest as such error code is only supposed to be valid when ENCLS is run
> > in hypervisor.
> > 
> > but SGX_INVALID_EINITTOKEN seems
> > > to mean, more or less, "the CPU thinks you're not authorized to do
> > > this", so forcing that error code could be entirely reasonable.
> > > 
> > > If the host policy is to allow a list of LE signers, you could return
> > > SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
> > > the list.
> > 
> > But this would be inconsistent with HW behavior. If the hash value in
> > guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT,
> > EINIT is not supposed to return SGX_INVALID_EINITTOKEN.
> > 
> > I think from VMM's perspective, emulating HW behavior to be consistent
> > with real HW behavior is very important.
> > 
> > Paolo, would you provide your comments?
> 
> Hi all,
> 
> This has been quite for a while and I'd like to start discussion again.
> Jarkko told me that currently he only supports one LE in SGX driver, but I
> am not sure whether he is going to extend in the future or not. I think this
> might also depend on requirements from customers.
> 
> Andy,
> 
> If we only support one LE in driver, then we can only support the same LE
> for all KVM guests, according to your comments that host kernel launch
> control policy should also apply to KVM guests? WOuld you comments more?
> 
> Jarkko,
> 
> Could you help to clarify the whole launch control policy in host side so
> that we can have a better understanding together?
> 
> Thanks,
> -Kai

So, I have a pass-through LE. It creates an EINITTOKEN for anything.
Couldn't the VMM keep virtual values for the MSRs and ask the host-side
LE to create a token when it needs to?
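
I.e., roughly something like below (only a sketch of the idea -- the
names are made up, not an actual driver interface):

	/* the guest's IA32_SGXLEPUBKEYHASHn values live only in VMM memory */
	u64 guest_lepubkeyhash[4];

	/* on a trapped EINIT, ask the host's pass-through LE for a token */
	int vmm_handle_einit(struct sigstruct *sig, void *secs)
	{
		struct einittoken token;
		int ret;

		ret = sgx_le_create_token(sig, &token);	/* host-side LE */
		if (ret)
			return ret;
		return vmm_run_einit(sig, &token, secs);
	}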

/Jarkko
Kai Huang June 8, 2017, 11:47 p.m. UTC | #32
On 6/9/2017 12:31 AM, Jarkko Sakkinen wrote:
> On Wed, Jun 07, 2017 at 08:52:42AM +1200, Huang, Kai wrote:
>>
>>
>> On 5/18/2017 7:45 PM, Huang, Kai wrote:
>>>
>>>
>>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
>>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai
>>>> <kai.huang@linux.intel.com> wrote:
>>>>>
>>>>>
>>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>>>>
>>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I am not sure whether the cost of writing to 4 MSRs
>>>>>>> would be *extremely*
>>>>>>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load,
>>>>>>> writing
>>>>>>> to several MSRs, etc.
>>>>>>
>>>>>>
>>>>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>>>>> unusualy slow.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>>>>>>>> with whatever lock is needed (probably just a mutex).  Users of EINIT
>>>>>>>> will take the mutex, compare the percpu variable to
>>>>>>>> the desired value,
>>>>>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>>>>>
>>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by
>>>>>>>> updating its in-memory
>>>>>>>> state but *not* changing the MSRs.  KVM will trap
>>>>>>>> and emulate EINIT to
>>>>>>>> support the same handling as the host.  There is no
>>>>>>>> action required at
>>>>>>>> all on KVM guest entry and exit.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This is doable, but SGX driver needs to do those things and expose
>>>>>>> interfaces for KVM to use. In terms of the percpu data, it is nice to
>>>>>>> have,
>>>>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in
>>>>>>> performance critical path. We can simply read old value
>>>>>>> from MSRs out and
>>>>>>> compare whether the old equals to the new.
>>>>>>
>>>>>>
>>>>>> I think the SGX driver should probably live in arch/x86, and the
>>>>>> interface could be a simple percpu variable that is exported (from the
>>>>>> main kernel image, not from a module).
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>>>>>>>> other reasons: someone is going to want to implement policy for what
>>>>>>>> enclaves are allowed that applies to guests as well as the host.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I am not very convinced why "what enclaves are allowed" in host would
>>>>>>> apply
>>>>>>> to guest. Can you elaborate? I mean in general
>>>>>>> virtualization just focus
>>>>>>> emulating hardware behavior. If a native machine is able
>>>>>>> to run any LE,
>>>>>>> the
>>>>>>> virtual machine should be able to as well (of course, with guest's
>>>>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>>>>
>>>>>>
>>>>>> I strongly disagree.  I can imagine two classes of sensible policies
>>>>>> for launch control:
>>>>>>
>>>>>> 1. Allow everything.  This seems quite sensible to me.
>>>>>>
>>>>>> 2. Allow some things, and make sure that VMs have at least as
>>>>>> restrictive a policy as host root has.  After all, what's the point of
>>>>>> restricting enclaves in the host if host code can simply spawn a
>>>>>> little VM to run otherwise-disallowed enclaves?
>>>>>
>>>>>
>>>>> What's the current SGX driver launch control policy? Yes allow
>>>>> everything
>>>>> works for KVM so lets skip this. Are we going to support
>>>>> allowing several
>>>>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE
>>>>> staff but I don't know details.
>>>>>
>>>>> I am trying to find a way that we can both not break host launch control
>>>>> policy, and be consistent to HW behavior (from guest's view).
>>>>> Currently we
>>>>> can create a KVM guest with runtime change to
>>>>> IA32_SGXLEPUBKEYHASHn either
>>>>> enabled or disabled. I introduced an Qemu parameter 'lewr' for
>>>>> this purpose.
>>>>> Actually I introduced below Qemu SGX parameters for creating guest:
>>>>>
>>>>>          -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>>>>
>>>>> where 'epc' specifies guest's EPC size, lehash specifies
>>>>> (initial) value of
>>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether
>>>>> guest is allowed
>>>>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>>>>>
>>>>> If host only allows one single LE to run, KVM can add a restrict
>>>>> that only
>>>>> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn
>>>>> disabled, so that only host allowed (single) hash can be used by
>>>>> guest. From
>>>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has
>>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed
>>>>> (single) hash.
>>>>>
>>>>> If host allows several LEs (not but everything), and if we
>>>>> create guest with
>>>>> 'lewr', then the behavior is not consistent with HW behavior, as from
>>>>> guest's hardware's point of view, we can actually run any LE but
>>>>> we have to
>>>>> tell guest that you are only allowed to change
>>>>> IA32_SGXLEPUBKEYHASHn to some
>>>>> specific values. One compromise solution is we don't allow to
>>>>> create guest
>>>>> with 'lewr' specified, and at the meantime, only allow to create
>>>>> guest with
>>>>> host approved hashes specified in 'lehash'. This will make
>>>>> guest's behavior
>>>>> consistent to HW behavior but only allows guest to run one LE (which is
>>>>> specified by 'lehash' when guest is created).
>>>>
>>>> I'm not sure I entirely agree for a couple reasons.
>>>>
>>>> 1. I wouldn't be surprised if the kernel ends up implementing a policy
>>>> in which it checks all enclaves (not just LEs) for acceptability.  In
>>>> fact, if the kernel sticks with the "no LE at all or just
>>>> kernel-internal LE", then checking enclaves directly against some
>>>> admin- or distro-provided signer list seems reasonable.  This type of
>>>> policy can't be forwarded to a guest by restricting allowed LE
>>>> signers.  But this is mostly speculation since AFAIK no one has
>>>> seriously proposed any particular policy support and the plan was to
>>>> not have this for the initial implementation.
>>>>
>>>> 2. While matching hardware behavior is nice in principle, there
>>>> doesn't seem to be useful hardware behavior to match here.  If the
>>>> host had a list of five allowed LE signers, how exactly would it
>>>> restrict the MSRs?  They're not written atomically, so you can't
>>>> directly tell what's being written.
>>>
>>> In this case I actually plan to just allow creating guest with guest's
>>> IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is
>>> specified, creating guest will fail. And we only allow creating guest
>>> with host allowed hash values (with 'lehash=hash-value'), and if
>>> 'hash-value' specified by 'lehash' is not allowed by host, we also fail
>>> to create guest.
>>>
>>> We can only allow creating guest with 'lewr' specified when host allows
>>> anything.
>>>
>>> But in this way, we are restricting guest OS's ability to run LE, as
>>> only one LE, that is specified by 'lehash' parameter, can be run. But I
>>> think this won't hurt much, as multiple guests still are able to run
>>> different LEs?
>>>
>>> Also, the only way to fail an MSR
>>>> write is to send #GP, and Windows (and even Linux) may not expect
>>>> that.  Linux doesn't panic due to #GP on MSR writes these days, but
>>>> you still get a big fat warning.  I wouldn't be at all surprised if
>>>> Windows BSODs.
>>>
>>> We cannot allow writing some particular value to MSRs successfully,
>>> while injecting #GP when writing other values to the same MSRs. So #GP
>>> is not option.
>>>
>>> ENCLS[EINIT], on the other hand, returns an actual
>>>> error code.  I'm not sure that a sensible error code exists
>>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps),
>>>
>>> Looks no such error code exists. And we cannot return such error code to
>>> guest as such error code is only supposed to be valid when ENCLS is run
>>> in hypervisor.
>>>
>>> but SGX_INVALID_EINITTOKEN seems
>>>> to mean, more or less, "the CPU thinks you're not authorized to do
>>>> this", so forcing that error code could be entirely reasonable.
>>>>
>>>> If the host policy is to allow a list of LE signers, you could return
>>>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
>>>> the list.
>>>
>>> But this would be inconsistent with HW behavior. If the hash value in
>>> guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT,
>>> EINIT is not supposed to return SGX_INVALID_EINITTOKEN.
>>>
>>> I think from VMM's perspective, emulating HW behavior to be consistent
>>> with real HW behavior is very important.
>>>
>>> Paolo, would you provide your comments?
>>
>> Hi all,
>>
>> This has been quite for a while and I'd like to start discussion again.
>> Jarkko told me that currently he only supports one LE in SGX driver, but I
>> am not sure whether he is going to extend in the future or not. I think this
>> might also depend on requirements from customers.
>>
>> Andy,
>>
>> If we only support one LE in driver, then we can only support the same LE
>> for all KVM guests, according to your comments that host kernel launch
>> control policy should also apply to KVM guests? WOuld you comments more?
>>
>> Jarkko,
>>
>> Could you help to clarify the whole launch control policy in host side so
>> that we can have a better understanding together?
>>
>> Thanks,
>> -Kai
> 
> So. I have pass through LE. It creates EINITTOKEN for anything. Couldn't
> VMM keep virtual values for MSRs and ask host side LE create token when
> it needs to?

Hi Jarkko,

Thanks for replying. The VMM doesn't need the driver to generate an 
EINITTOKEN. The EINITTOKEN comes from the guest too; the VMM gets it 
when it traps EINIT.

I think Andy's comment is: if the host SGX driver only allows someone's 
LE to run (e.g., an LE from Intel, Red Hat, etc.), then the SGX driver 
should govern LEs from KVM guests as well, by checking the SIGSTRUCT 
from the KVM guest (the SIGSTRUCT is obtained by KVM by trapping the 
guest's EINIT).

In my understanding, although you only allow one LE in the kernel, you 
won't limit whose LE can be run (basically the kernel can run an LE 
signed by anyone, just only one LE while the kernel is running), so I 
don't see any limitation on KVM guests here.

But it may still be better if the SGX driver can provide a function like:

     int sgx_validate_sigstruct(struct sigstruct *sig);

for KVM to call, in case the driver is changed (e.g., to only allow LEs 
from particular signers to run), but this is not necessary now. The KVM 
changes can be done later when the driver makes the changes.

Andy,

Am I understanding correctly? Does this make sense to you?

Thanks,
-Kai

> 
> /Jarkko
>
Andy Lutomirski June 8, 2017, 11:53 p.m. UTC | #33
On Thu, Jun 8, 2017 at 4:47 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 6/9/2017 12:31 AM, Jarkko Sakkinen wrote:
>>
>> On Wed, Jun 07, 2017 at 08:52:42AM +1200, Huang, Kai wrote:
>>>
>>>
>>>
>>> On 5/18/2017 7:45 PM, Huang, Kai wrote:
>>>>
>>>>
>>>>
>>>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
>>>>>
>>>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai
>>>>> <kai.huang@linux.intel.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai
>>>>>>> <kai.huang@linux.intel.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not sure whether the cost of writing to 4 MSRs
>>>>>>>> would be *extremely*
>>>>>>>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load,
>>>>>>>> writing
>>>>>>>> to several MSRs, etc.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>>>>>> unusualy slow.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH
>>>>>>>>> along
>>>>>>>>> with whatever lock is needed (probably just a mutex).  Users of
>>>>>>>>> EINIT
>>>>>>>>> will take the mutex, compare the percpu variable to
>>>>>>>>> the desired value,
>>>>>>>>> and, if it's different, do WRMSR and update the percpu variable.
>>>>>>>>>
>>>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by
>>>>>>>>> updating its in-memory
>>>>>>>>> state but *not* changing the MSRs.  KVM will trap
>>>>>>>>> and emulate EINIT to
>>>>>>>>> support the same handling as the host.  There is no
>>>>>>>>> action required at
>>>>>>>>> all on KVM guest entry and exit.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> This is doable, but SGX driver needs to do those things and expose
>>>>>>>> interfaces for KVM to use. In terms of the percpu data, it is nice
>>>>>>>> to
>>>>>>>> have,
>>>>>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even
>>>>>>>> in
>>>>>>>> performance critical path. We can simply read old value
>>>>>>>> from MSRs out and
>>>>>>>> compare whether the old equals to the new.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I think the SGX driver should probably live in arch/x86, and the
>>>>>>> interface could be a simple percpu variable that is exported (from
>>>>>>> the
>>>>>>> main kernel image, not from a module).
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT
>>>>>>>>> for
>>>>>>>>> other reasons: someone is going to want to implement policy for
>>>>>>>>> what
>>>>>>>>> enclaves are allowed that applies to guests as well as the host.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not very convinced why "what enclaves are allowed" in host
>>>>>>>> would
>>>>>>>> apply
>>>>>>>> to guest. Can you elaborate? I mean in general
>>>>>>>> virtualization just focus
>>>>>>>> emulating hardware behavior. If a native machine is able
>>>>>>>> to run any LE,
>>>>>>>> the
>>>>>>>> virtual machine should be able to as well (of course, with guest's
>>>>>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I strongly disagree.  I can imagine two classes of sensible policies
>>>>>>> for launch control:
>>>>>>>
>>>>>>> 1. Allow everything.  This seems quite sensible to me.
>>>>>>>
>>>>>>> 2. Allow some things, and make sure that VMs have at least as
>>>>>>> restrictive a policy as host root has.  After all, what's the point
>>>>>>> of
>>>>>>> restricting enclaves in the host if host code can simply spawn a
>>>>>>> little VM to run otherwise-disallowed enclaves?
>>>>>>
>>>>>>
>>>>>>
>>>>>> What's the current SGX driver launch control policy? Yes allow
>>>>>> everything
>>>>>> works for KVM so lets skip this. Are we going to support
>>>>>> allowing several
>>>>>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel
>>>>>> LE
>>>>>> staff but I don't know details.
>>>>>>
>>>>>> I am trying to find a way that we can both not break host launch
>>>>>> control
>>>>>> policy, and be consistent to HW behavior (from guest's view).
>>>>>> Currently we
>>>>>> can create a KVM guest with runtime change to
>>>>>> IA32_SGXLEPUBKEYHASHn either
>>>>>> enabled or disabled. I introduced an Qemu parameter 'lewr' for
>>>>>> this purpose.
>>>>>> Actually I introduced below Qemu SGX parameters for creating guest:
>>>>>>
>>>>>>          -sgx epc=<size>,lehash='SHA-256 hash',lewr
>>>>>>
>>>>>> where 'epc' specifies guest's EPC size, lehash specifies
>>>>>> (initial) value of
>>>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether
>>>>>> guest is allowed
>>>>>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime.
>>>>>>
>>>>>> If host only allows one single LE to run, KVM can add a restrict
>>>>>> that only
>>>>>> allows to create KVM guest with runtime change to
>>>>>> IA32_SGXLEPUBKEYHASHn
>>>>>> disabled, so that only host allowed (single) hash can be used by
>>>>>> guest. From
>>>>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and
>>>>>> has
>>>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed
>>>>>> (single) hash.
>>>>>>
>>>>>> If host allows several LEs (not but everything), and if we
>>>>>> create guest with
>>>>>> 'lewr', then the behavior is not consistent with HW behavior, as from
>>>>>> guest's hardware's point of view, we can actually run any LE but
>>>>>> we have to
>>>>>> tell guest that you are only allowed to change
>>>>>> IA32_SGXLEPUBKEYHASHn to some
>>>>>> specific values. One compromise solution is we don't allow to
>>>>>> create guest
>>>>>> with 'lewr' specified, and at the meantime, only allow to create
>>>>>> guest with
>>>>>> host approved hashes specified in 'lehash'. This will make
>>>>>> guest's behavior
>>>>>> consistent to HW behavior but only allows guest to run one LE (which
>>>>>> is
>>>>>> specified by 'lehash' when guest is created).
>>>>>
>>>>>
>>>>> I'm not sure I entirely agree for a couple reasons.
>>>>>
>>>>> 1. I wouldn't be surprised if the kernel ends up implementing a policy
>>>>> in which it checks all enclaves (not just LEs) for acceptability.  In
>>>>> fact, if the kernel sticks with the "no LE at all or just
>>>>> kernel-internal LE", then checking enclaves directly against some
>>>>> admin- or distro-provided signer list seems reasonable.  This type of
>>>>> policy can't be forwarded to a guest by restricting allowed LE
>>>>> signers.  But this is mostly speculation since AFAIK no one has
>>>>> seriously proposed any particular policy support and the plan was to
>>>>> not have this for the initial implementation.
>>>>>
>>>>> 2. While matching hardware behavior is nice in principle, there
>>>>> doesn't seem to be useful hardware behavior to match here.  If the
>>>>> host had a list of five allowed LE signers, how exactly would it
>>>>> restrict the MSRs?  They're not written atomically, so you can't
>>>>> directly tell what's being written.
>>>>
>>>>
>>>> In this case I actually plan to just allow creating guest with guest's
>>>> IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is
>>>> specified, creating guest will fail. And we only allow creating guest
>>>> with host allowed hash values (with 'lehash=hash-value'), and if
>>>> 'hash-value' specified by 'lehash' is not allowed by host, we also fail
>>>> to create guest.
>>>>
>>>> We can only allow creating guest with 'lewr' specified when host allows
>>>> anything.
>>>>
>>>> But in this way, we are restricting guest OS's ability to run LE, as
>>>> only one LE, that is specified by 'lehash' parameter, can be run. But I
>>>> think this won't hurt much, as multiple guests still are able to run
>>>> different LEs?
>>>>
>>>> Also, the only way to fail an MSR
>>>>>
>>>>> write is to send #GP, and Windows (and even Linux) may not expect
>>>>> that.  Linux doesn't panic due to #GP on MSR writes these days, but
>>>>> you still get a big fat warning.  I wouldn't be at all surprised if
>>>>> Windows BSODs.
>>>>
>>>>
>>>> We cannot allow writing some particular value to MSRs successfully,
>>>> while injecting #GP when writing other values to the same MSRs. So #GP
>>>> is not option.
>>>>
>>>> ENCLS[EINIT], on the other hand, returns an actual
>>>>>
>>>>> error code.  I'm not sure that a sensible error code exists
>>>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps),
>>>>
>>>>
>>>> Looks no such error code exists. And we cannot return such error code to
>>>> guest as such error code is only supposed to be valid when ENCLS is run
>>>> in hypervisor.
>>>>
>>>> but SGX_INVALID_EINITTOKEN seems
>>>>>
>>>>> to mean, more or less, "the CPU thinks you're not authorized to do
>>>>> this", so forcing that error code could be entirely reasonable.
>>>>>
>>>>> If the host policy is to allow a list of LE signers, you could return
>>>>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
>>>>> the list.
>>>>
>>>>
>>>> But this would be inconsistent with HW behavior. If the hash value in
>>>> guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT,
>>>> EINIT is not supposed to return SGX_INVALID_EINITTOKEN.
>>>>
>>>> I think from VMM's perspective, emulating HW behavior to be consistent
>>>> with real HW behavior is very important.
>>>>
>>>> Paolo, would you provide your comments?
>>>
>>>
>>> Hi all,
>>>
>>> This has been quite for a while and I'd like to start discussion again.
>>> Jarkko told me that currently he only supports one LE in SGX driver, but
>>> I
>>> am not sure whether he is going to extend in the future or not. I think
>>> this
>>> might also depend on requirements from customers.
>>>
>>> Andy,
>>>
>>> If we only support one LE in driver, then we can only support the same LE
>>> for all KVM guests, according to your comments that host kernel launch
>>> control policy should also apply to KVM guests? WOuld you comments more?
>>>
>>> Jarkko,
>>>
>>> Could you help to clarify the whole launch control policy in host side so
>>> that we can have a better understanding together?
>>>
>>> Thanks,
>>> -Kai
>>
>>
>> So. I have pass through LE. It creates EINITTOKEN for anything. Couldn't
>> VMM keep virtual values for MSRs and ask host side LE create token when
>> it needs to?
>
>
> Hi Jarkko,
>
> Thanks for replying. VMM doesn't need driver to generate EINITTOKEN. The
> EINITTOKEN is from guest too, upon VMM traps EINIT.
>
> I think Andy's comments is, if host SGX driver only allows someone's LE to
> run (ex, LE from Intel, Redhat, etc...), then SGX driver should also govern
> LEs from KVM guests as well, by checking SIGSTRUCT from KVM guest (the
> SIGSTRUCT is provided by KVM by trapping guest's EINIT).
>
> In my understanding, although you only allows one LE in kernel, but you
> won't limit who's LE can be run (basically kernel can run LE signed by
> anyone, but just one LE when kernel is running), so I don't see there is any
> limitation to KVM guests here.
>
> But it may still be better if SGX driver can provide function like:
>
>     int sgx_validate_sigstruct(struct sigstruct *sig);
>
> for KVM to call, in case driver is changed (ex, to only allows LEs from some
> particular ones to run), but this is not necessary now. KVM changes can be
> done later when driver make the changes.
>
> Andy,
>
> Am I understanding correctly? Does this make sense to you?

My understanding is that the kernel will not (at least initially)
allow users to supply an LE.  The kernel will handle launches all by
itself.  How it does so is an implementation detail, and it seems to
me that it will cause compatibility issues if guests have to use the
host's LE.

sgx_validate_sigstruct(...) sounds like it could be a good approach to me.

>
> Thanks,
> -Kai
>
>>
>> /Jarkko
>>
>
Cohen, Haim June 9, 2017, 3:38 p.m. UTC | #34
On 6/8/2017 7:53 PM, Andy Lutomirski wrote:
>> Andy,
>>
>> Am I understanding correctly? Does this make sense to you?
>
>My understanding is that the kernel will not (at least initially) allow users to supply
>an LE.  The kernel will handle launches all by itself.  How it does so is an
>implementation detail, and it seems to me that it will cause compatibility issues if
>guests have to use the host's LE.
>
>sgx_validate_sigstruct(...) sounds like it could be a good approach to me.
>

I don't think the kernel needs to support more than one LE.
To my understanding, the guest kernel may include a different LE and will require a different LE hash, either by configuring 'lehash' when launching the guest, or by the guest kernel setting the hash value.
I assume that the VMM should not block these MSR writes and should allow the guest to set the hash values for its own LE (by caching the MSR writes and setting the values on EINIT).

So I don't see why the host kernel should provide an EINITTOKEN for the guests.
Moreover, I guess that the approach of the LE providing tokens to any enclave is only the initial implementation. Different distributions may decide to add some logic to the token generation, so it might be that the host LE or the guest LE will not provide a token to some enclave, and I don't think we'll want to override that kernel logic.
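
For concreteness, a minimal sketch of what caching those MSR writes could look
like on the KVM side (the guest_lc_enabled() helper and the sgx_lepubkeyhash[]
field are hypothetical names, not taken from the actual patch):

    #include <linux/kvm_host.h>

    #define MSR_IA32_SGXLEPUBKEYHASH0 0x0000008c    /* ...HASH3 is 0x8f */

    /*
     * Sketch: emulate a guest write to IA32_SGXLEPUBKEYHASH{0-3}.  The value
     * is only cached in the vcpu; the real MSRs are written later, when a
     * trapped EINIT (or a vcpu load) actually needs them.
     */
    static int vmx_set_sgx_lepubkeyhash(struct kvm_vcpu *vcpu, u32 msr, u64 data)
    {
            struct vcpu_vmx *vmx = to_vmx(vcpu);

            /* Inject #GP unless the guest was given runtime launch control
             * ('lewr'), i.e. guest IA32_FEATURE_CONTROL[17] is set. */
            if (!guest_lc_enabled(vcpu))
                    return 1;

            vmx->sgx_lepubkeyhash[msr - MSR_IA32_SGXLEPUBKEYHASH0] = data;
            return 0;
    }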

>> Thanks,
>> -Kai
>>
>>> /Jarkko
Jarkko Sakkinen June 10, 2017, 12:23 p.m. UTC | #35
On Fri, Jun 09, 2017 at 11:47:13AM +1200, Huang, Kai wrote:
> In my understanding, although you only allows one LE in kernel, but you
> won't limit who's LE can be run (basically kernel can run LE signed by
> anyone, but just one LE when kernel is running), so I don't see there is any
> limitation to KVM guests here.
> 
> But it may still be better if SGX driver can provide function like:
> 
>     int sgx_validate_sigstruct(struct sigstruct *sig);
> 
> for KVM to call, in case driver is changed (ex, to only allows LEs from some
> particular ones to run), but this is not necessary now. KVM changes can be
> done later when driver make the changes.
> 
> Andy,
> 
> Am I understanding correctly? Does this make sense to you?
> 
> Thanks,
> -Kai

Nope. I don't even understand the *beginnings* of what that function would
do. I don't understand what the validation means here and what the VMM would
do if that function reports "success".

How would that work on a system where the MSRs cannot be changed?

In that kind of system the host OS must generate an EINITTOKEN for the LE
running inside the guest and maintain completely virtualized MSR
values for the guest.

/Jarkko
Kai Huang June 11, 2017, 10:45 p.m. UTC | #36
On 6/11/2017 12:23 AM, Jarkko Sakkinen wrote:
> On Fri, Jun 09, 2017 at 11:47:13AM +1200, Huang, Kai wrote:
>> In my understanding, although you only allows one LE in kernel, but you
>> won't limit who's LE can be run (basically kernel can run LE signed by
>> anyone, but just one LE when kernel is running), so I don't see there is any
>> limitation to KVM guests here.
>>
>> But it may still be better if SGX driver can provide function like:
>>
>>      int sgx_validate_sigstruct(struct sigstruct *sig);
>>
>> for KVM to call, in case driver is changed (ex, to only allows LEs from some
>> particular ones to run), but this is not necessary now. KVM changes can be
>> done later when driver make the changes.
>>
>> Andy,
>>
>> Am I understanding correctly? Does this make sense to you?
>>
>> Thanks,
>> -Kai
> 
> Nope. I don't even understand the *beginnings* what that function would
> do. I don't understand what the validation means here and what VMM would
> do if that functions reports "success".

The validation means either the sigstruct->modulus or 
SHA256(sigstruct->modulus) should be in an 'approved whitelist' 
maintained by the kernel (which I know doesn't exist now, but Andy somewhat 
suggested we may or should have one in the future); otherwise the 
function returns an error to indicate the LE from the guest is "unapproved 
by the host kernel/driver".

Andy, would you explain here?

> 
> How that would work on a system where MSRs cannot be changed?

This is simple: we simply won't allow the guest to choose its own 
IA32_SGXLEPUBKEYHASHn by specifying a 'lehash' value in the Qemu parameters 
when creating the guest.

To elaborate, currently in my design Qemu has the new parameters below to 
support SGX:

	# qemu-system-x86_64 -sgx epc=<size>,lehash=<sha-256 hash>,lewr

'epc=<size>' obviously specifies the guest's EPC size, 'lehash' specifies 
the guest's initial IA32_SGXLEPUBKEYHASHn (similar to the value configured 
in BIOS for a real machine), and 'lewr' specifies whether the guest's 
IA32_SGXLEPUBKEYHASHn can be changed by the OS at runtime. 'lehash' and 
'lewr' are optional.

If the MSRs cannot be changed on the physical machine, then we will fail to 
create the guest if either 'lehash' or 'lewr' is specified.
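
As a rough sketch of that check (hypothetical option and helper names; not
the actual Qemu code):

    #include <stdio.h>
    #include <stdbool.h>

    /*
     * Sketch: fail guest creation when the host cannot change
     * IA32_SGXLEPUBKEYHASHn (host IA32_FEATURE_CONTROL[17] not set) but the
     * user asked for 'lehash' or 'lewr'.
     */
    static int sgx_check_lc_options(bool host_lc_writable,
                                    bool lehash_set, bool lewr)
    {
            if (!host_lc_writable && (lehash_set || lewr)) {
                    fprintf(stderr, "SGX: host IA32_SGXLEPUBKEYHASHn are not "
                            "writable; 'lehash'/'lewr' cannot be supported\n");
                    return -1;
            }
            return 0;
    }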

> 
> In that kind of system the host OS must generate EINITTOKEN for the LE
> running on inside the guest and maintain completely virtualized MSR
> values for the guest.

The host OS will not generate an EINITTOKEN for the guest under any 
circumstances, as the EINITTOKEN will always come from the guest's EINIT 
instruction. KVM traps EINIT from the guest, gets both the SIGSTRUCT and the 
EINITTOKEN from the EINIT leaf, updates the MSRs, and runs EINIT on behalf 
of the guest.

Btw, the purpose of KVM trapping EINIT is to flush the guest's virtual 
IA32_SGXLEPUBKEYHASHn to the physical MSRs before running EINIT. In fact 
KVM doesn't even need to trap EINIT, but can simply update the guest's MSR 
values to the real MSRs when the vcpu is scheduled in, provided the SGX 
driver updates the host LE's hash to the MSRs before EINIT on the host. KVM 
is not trying to guarantee that EINIT runs successfully here, but simply to 
emulate the guest's IA32_SGXLEPUBKEYHASHn and EINIT in the guest.
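
A hedged sketch of that trap-EINIT path (the __einit() wrapper, the field
names, and the MSR constant are assumptions for illustration):

    /*
     * Sketch: on a trapped ENCLS[EINIT] from the guest, flush the vcpu's
     * cached hash values to the real MSRs, run EINIT on the guest's behalf,
     * and report whatever the hardware returns -- success or failure.
     */
    static int handle_guest_einit(struct kvm_vcpu *vcpu, struct sigstruct *sig,
                                  struct einittoken *token, void *secs)
    {
            struct vcpu_vmx *vmx = to_vmx(vcpu);
            int i;

            for (i = 0; i < 4; i++)
                    wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i,
                           vmx->sgx_lepubkeyhash[i]);

            /* sig/token/secs have already been copied/mapped from guest
             * memory by the caller; the result is injected back into the
             * guest as EINIT's return value. */
            return __einit(sig, token, secs);
    }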

Thanks,
-Kai

> 
> /Jarkko
>
Jarkko Sakkinen June 12, 2017, 8:36 a.m. UTC | #37
On Mon, Jun 12, 2017 at 10:45:07AM +1200, Huang, Kai wrote:
> > > But it may still be better if SGX driver can provide function like:
> > > 
> > >      int sgx_validate_sigstruct(struct sigstruct *sig);
> > > 
> > > for KVM to call, in case driver is changed (ex, to only allows LEs from some
> > > particular ones to run), but this is not necessary now. KVM changes can be
> > > done later when driver make the changes.
> > > 
> > > Andy,
> > > 
> > > Am I understanding correctly? Does this make sense to you?
> > > 
> > > Thanks,
> > > -Kai
> > 
> > Nope. I don't even understand the *beginnings* what that function would
> > do. I don't understand what the validation means here and what VMM would
> > do if that functions reports "success".
> 
> The validation means either the sigstruct->modulus or
> SHA256(sigstruct->modulus) should be in a 'approved white-list' maintained
> by kernel (which I know doesn't exist now, but Andy some kind suggested we
> may or should have, in the future I guess), otherwise the function returns
> error to indicate the LE from guest is "unapproved by host kernel/driver".
> 
> Andy, would you explain here?

That can be considered, but I still have zero idea what this function is
and what its relation to the whitelist would be.

> > How that would work on a system where MSRs cannot be changed?
> 
> This is simple, we simply won't allow guest to choose its own
> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
> creating the guest.

Why not? You could have virtual MSRs and ask the host LE to generate a token
if they match the modulus.

> To elaborate, currently in my design Qemu has below new parameters to
> support SGX:
> 
> 	# qemu-system-x86_64 -sgx, epc=<size>,lehash=<sha-256 hash>,lewr
> 
> The 'epc=<size>' specifies guest's EPC size obviously, lehash specifies
> guest's initial IA32_SGXLEPUBKEYHASHn (similar to the value configured in
> BIOS for real machine), and 'lewr' specifies whether guest's
> IA32_SGXLEPUBKEYHASHn can be changed by OS at runtime. The 'lehash' and
> 'lewr' are optional.
> 
> If MSRs cannot be changed on physical machine, then we will fail to create
> guest if either 'lehash' or 'lewr' is specified when creating the guest.
> 
> > 
> > In that kind of system the host OS must generate EINITTOKEN for the LE
> > running on inside the guest and maintain completely virtualized MSR
> > values for the guest.
> 
> The host OS will not generate EINITTOKEN for guest in any circumstances, as
> EINITTOKEN will always be from guest's EINIT instruction. KVM traps EINIT
> from guest and gets both SIGSTRUCT and EINITTOKEN from the EINIT leaf,
> update MSRs, and run EINIT on behalf of guest.

Seriously, this sounds like a stupid constraint, or I'm not getting something
(which might also be the case). If you trap EINIT anyway, you could
create a special case for the guest LE.

/Jarkko
Kai Huang June 12, 2017, 9:53 a.m. UTC | #38
On 6/12/2017 8:36 PM, Jarkko Sakkinen wrote:
> On Mon, Jun 12, 2017 at 10:45:07AM +1200, Huang, Kai wrote:
>>>> But it may still be better if SGX driver can provide function like:
>>>>
>>>>       int sgx_validate_sigstruct(struct sigstruct *sig);
>>>>
>>>> for KVM to call, in case driver is changed (ex, to only allows LEs from some
>>>> particular ones to run), but this is not necessary now. KVM changes can be
>>>> done later when driver make the changes.
>>>>
>>>> Andy,
>>>>
>>>> Am I understanding correctly? Does this make sense to you?
>>>>
>>>> Thanks,
>>>> -Kai
>>>
>>> Nope. I don't even understand the *beginnings* what that function would
>>> do. I don't understand what the validation means here and what VMM would
>>> do if that functions reports "success".
>>
>> The validation means either the sigstruct->modulus or
>> SHA256(sigstruct->modulus) should be in a 'approved white-list' maintained
>> by kernel (which I know doesn't exist now, but Andy some kind suggested we
>> may or should have, in the future I guess), otherwise the function returns
>> error to indicate the LE from guest is "unapproved by host kernel/driver".
>>
>> Andy, would you explain here?
> 
> That can be considered but I still have zero idea what this function is
> and what its relation to whitelist would be.

The relation is that this function only returns success when 
sigstruct->modulus (or SHA256(sigstruct->modulus)) matches a value in 
the 'whitelist'...

The VMM does nothing if the function returns success; it continues to run 
EINIT on behalf of the guest, and injects the result into the guest (either 
success or failure; the VMM doesn't care about the result but needs to 
report it to the guest to reflect HW behavior).

If this function returns an error, the VMM needs to report some error code 
to the guest. The VMM doesn't even need to run EINIT anymore.

But IMO we don't have to add this function now. If your driver chooses to 
have such a 'whitelist', we can do this in the future.
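
To make those semantics concrete, a sketch of what such a function could
look like if the driver ever grows a whitelist (the list and the sgx_sha256()
helper are hypothetical; an empty list means "allow everything", matching
today's driver behavior):

    #include <linux/list.h>
    #include <linux/string.h>
    #include <linux/errno.h>

    struct sgx_le_wl_entry {
            struct list_head list;
            u8 mrsigner[32];        /* SHA256(sigstruct->modulus) */
    };

    static LIST_HEAD(sgx_le_whitelist);

    int sgx_validate_sigstruct(struct sigstruct *sig)
    {
            struct sgx_le_wl_entry *e;
            u8 mrsigner[32];

            if (list_empty(&sgx_le_whitelist))
                    return 0;       /* no policy: allow everything */

            sgx_sha256(sig->modulus, sizeof(sig->modulus), mrsigner);

            list_for_each_entry(e, &sgx_le_whitelist, list)
                    if (!memcmp(e->mrsigner, mrsigner, sizeof(mrsigner)))
                            return 0;

            /* The VMM would report an error to the guest, e.g. by forcing
             * SGX_INVALID_EINITTOKEN, and skip running EINIT. */
            return -EACCES;
    }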

> 
>>> How that would work on a system where MSRs cannot be changed?
>>
>> This is simple, we simply won't allow guest to choose its own
>> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
>> creating the guest.
> 
> Why not? You could have virtual MSRs and ask host LE to generate token
> if they match to modulus.

The guest has its own LE running inside, and the guest's LE will generate 
tokens for enclaves in the guest. The host will not generate tokens for the 
guest under any circumstances, because this is entirely the guest's behavior.

Virtualization is only about emulating hardware's behavior, not about 
assuming, depending on, or changing the guest's SW behavior. We are not 
trying to make sure EINIT runs successfully in the guest; instead, we are 
trying to make sure EINIT behaves just the same as it does on a physical 
machine -- either success or failure, according to the guest's sigstruct 
and token.

If EINIT in the guest is supposed to fail (e.g., an incorrect token is 
generated by the guest's LE), KVM will need to inject the expected error 
into the guest, to reflect the real hardware behavior. One more example: if 
KVM chooses not to trap EINIT at all, how can you provide a token generated 
by the host LE to the guest? You simply cannot. The host LE will never 
generate a token for the guest, under any circumstance.

> 
>> To elaborate, currently in my design Qemu has below new parameters to
>> support SGX:
>>
>> 	# qemu-system-x86_64 -sgx, epc=<size>,lehash=<sha-256 hash>,lewr
>>
>> The 'epc=<size>' specifies guest's EPC size obviously, lehash specifies
>> guest's initial IA32_SGXLEPUBKEYHASHn (similar to the value configured in
>> BIOS for real machine), and 'lewr' specifies whether guest's
>> IA32_SGXLEPUBKEYHASHn can be changed by OS at runtime. The 'lehash' and
>> 'lewr' are optional.
>>
>> If MSRs cannot be changed on physical machine, then we will fail to create
>> guest if either 'lehash' or 'lewr' is specified when creating the guest.
>>
>>>
>>> In that kind of system the host OS must generate EINITTOKEN for the LE
>>> running on inside the guest and maintain completely virtualized MSR
>>> values for the guest.
>>
>> The host OS will not generate EINITTOKEN for guest in any circumstances, as
>> EINITTOKEN will always be from guest's EINIT instruction. KVM traps EINIT
>> from guest and gets both SIGSTRUCT and EINITTOKEN from the EINIT leaf,
>> update MSRs, and run EINIT on behalf of guest.
> 
> Seriously sounds like a stupid constraint or I'm not getting something
> (which also might be the case). If you anyway trap EINIT, you could
> create a special case for guest LE.

This is not a constraint; KVM has to emulate hardware correctly. For 
this part please see my explanation above.

And let me explain the purpose of trapping EINIT again here.

When the guest is about to run EINIT, if the guest's 
SHA256(sigstruct->modulus) matches the guest's virtual 
IA32_SGXLEPUBKEYHASHn (and if the other fields are correctly populated in 
the sigstruct and token as well), KVM needs to make sure that EINIT will 
run successfully in the guest, even if the physical IA32_SGXLEPUBKEYHASHn 
are not equal to the guest's virtual MSRs at that particular time. This is 
because, given the same conditions, EINIT would run successfully on a 
physical machine. KVM needs to emulate the right HW behavior.

How to make sure the guest's EINIT will run successfully in this case? We 
need to write the guest's virtual MSR values to the physical MSRs before 
the guest runs EINIT. The whole purpose of KVM trapping EINIT from the 
guest is to guarantee that before EINIT, KVM can update the guest's virtual 
MSRs to the physical MSRs.

Like I said before, KVM doesn't even need to trap EINIT for this purpose; 
e.g., KVM can update the guest's virtual MSRs to the real MSRs when the 
vcpu is scheduled in (which is exactly what I did in this patch, btw), but 
this is considered performance unfriendly, so Andy and Sean suggested we 
should do this when trapping EINIT from the guest.

Another problem with updating the MSRs when the vcpu is scheduled in is 
that this will break the SGX driver if the driver does some performance 
optimization in terms of updating the MSRs, such as keeping per-cpu 
variables for the MSRs' current values, and only updating the MSRs if the 
per-cpu value is not equal to the value you want to write. Because of this, 
Andy and Sean also suggested that the SGX driver can provide a function 
that handles updating the MSRs and running EINIT together under a mutex, 
and KVM simply calls that function (of course KVM will provide the 
sigstruct and token by trapping EINIT from the guest). This can cover the 
MSR updates and EINIT for both the host and KVM guests. IMO this is a good 
idea as well, and you should consider doing it this way.
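
A sketch of that suggested driver helper -- mutex-serialized EINIT with a
per-cpu MSR cache -- might look like the following (all names are invented,
and a real version must also keep the task pinned to one CPU across the
compare/WRMSR/EINIT sequence):

    static DEFINE_MUTEX(sgx_einit_lock);
    static DEFINE_PER_CPU(u64, sgx_lehash_cache[4]);

    /* Sketch: callable by both host SGX code and KVM's trapped-EINIT path. */
    int sgx_einit(struct sigstruct *sig, struct einittoken *token,
                  void *secs, const u64 lehash[4])
    {
            int i, ret;

            mutex_lock(&sgx_einit_lock);
            for (i = 0; i < 4; i++) {
                    /* Only touch the (possibly slow) MSRs when needed. */
                    if (this_cpu_read(sgx_lehash_cache[i]) != lehash[i]) {
                            wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, lehash[i]);
                            this_cpu_write(sgx_lehash_cache[i], lehash[i]);
                    }
            }
            ret = __einit(sig, token, secs);        /* assumed ENCLS wrapper */
            mutex_unlock(&sgx_einit_lock);
            return ret;
    }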

Thanks,
-Kai

> 
> /Jarkko
>
Andy Lutomirski June 12, 2017, 4:24 p.m. UTC | #39
On Mon, Jun 12, 2017 at 2:53 AM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>
>>>> How that would work on a system where MSRs cannot be changed?
>>>
>>>
>>> This is simple, we simply won't allow guest to choose its own
>>> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
>>> creating the guest.
>>
>>
>> Why not? You could have virtual MSRs and ask host LE to generate token
>> if they match to modulus.
>
>
> The guest has its own LE running inside, and guest's LE will generate token
> for enclaves in guest. The host will not generate token for guest in any
> circumstances, because this is totally guest's behavior.
>
> Virtualization is only about to emulate hardware's behavior, but not to
> assume, or depend on, or change guest's SW behavior. We are not trying to
> make sure EINIT will run successfully in guest, instead, we are trying to
> make sure EINIT will be just the same behavior as it runs on physical
> machine -- either success or failure, according to guest's sigstruct and
> token.

I disagree.  Virtualization can do whatever it wants, but a pretty
strong constraint is that the guest should be at least reasonably
functional.  A Windows guest, for example, shouldn't BSOD.  But the host
most certainly can restrict what the guest can do.  If a guest is
given pass-through access to a graphics card, the host is well within
its rights to impose thermal policy, prevent reflashing, etc.
Similarly, if a guest is given access to SGX, the host can and should
impose required policy on the guest.  If this means that an EINIT that
would have succeeded at host CPL 0 fails in the guest, so be it.

Of course, there isn't much in the way of host policy right now, so
this may not require any particular action until interesting host
policy shows up.

> This is not constraint, but KVM has to emulate hardware correctly. For this
> part please see my explanation above.
>
> And let me explain the purpose of trapping EINIT again here.
>
> When guest is about to run EINIT, if guest's SHA256(sigstruct->modulus)
> matches guest's virtual IA32_SGXLEPUBKEYHASHn (and if others are correctly
> populated in sigstruct and token as well), KVM needs to make sure that EINIT
> will run successfully in guest, even physical IA32_SGXLEPUBKEYHASHn are not
> equal to guest's virtual MSRs at this particular time.

True, so long as it doesn't contradict host policy to do so.

> This is because given
> the same condition, the EINIT will run successfully on physical machine. KVM
> needs to emulate the right HW behavior.

No.  The host needs to do this because KVM needs to work and be
useful, not because KVM needs to precisely match CPU behavior as seen
by VMX root.

To avoid confusion, I don't believe I've ever said that guests should
be restricted in which LEs they can use.  The types of restrictions
I'm talking about are that, if the host prevents user code from
running, say, a provisioning enclave that isn't whitelisted, then the
guest should have the same restriction applied.  This type of
restriction *can't* be usefully done by restricting acceptable MSR
values, but it's trivial by trapping EINIT.
Kai Huang June 12, 2017, 10:08 p.m. UTC | #40
On 6/13/2017 4:24 AM, Andy Lutomirski wrote:
> On Mon, Jun 12, 2017 at 2:53 AM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> \
>>>
>>>>> How that would work on a system where MSRs cannot be changed?
>>>>
>>>>
>>>> This is simple, we simply won't allow guest to choose its own
>>>> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
>>>> creating the guest.
>>>
>>>
>>> Why not? You could have virtual MSRs and ask host LE to generate token
>>> if they match to modulus.
>>
>>
>> The guest has its own LE running inside, and guest's LE will generate token
>> for enclaves in guest. The host will not generate token for guest in any
>> circumstances, because this is totally guest's behavior.
>>
>> Virtualization is only about to emulate hardware's behavior, but not to
>> assume, or depend on, or change guest's SW behavior. We are not trying to
>> make sure EINIT will run successfully in guest, instead, we are trying to
>> make sure EINIT will be just the same behavior as it runs on physical
>> machine -- either success or failure, according to guest's sigstruct and
>> token.
> 
> I disagree.  Virtualization can do whatever it wants, but a pretty
> strong constraint is that the guest should be at least reasonably
> function.  

Virtualization can only do whatever it wants on the parts that it can 
trap and emulate, and such *whatever* should not break the HW behavior 
presented to the guest. This is a fundamental thing for virtualization. I 
don't know whether there are some *minor* cases where the HW behavior 
emulation is not fully respected, but I believe that if they exist, those 
cases are extremely rare, and we certainly have a good reason why such 
cases are OK.

Anyway I'll leave this part to the KVM maintainers.

Paolo, Radim,

Sorry this thread is a little bit long by now. Can you comment on this?

> A Windows guest, for example, shouldn't BSOD.  But the host
> most certainly can restrict what the guest can do.  If a guest is
> given pass-through access to a graphics card, the host is well within
> its rights to impose thermal policy, prevent reflashing, etc.

You need to see whether those policies are provided by PCIe configuration 
space or by the device's registers. If policies are implemented via PCIe 
configuration space (which is trapped and emulated by VMM/Qemu), you 
certainly can apply restrictions by emulating PCIe configuration space 
accesses. But if those policies are done by device registers, which the 
driver controls entirely, it's totally up to the driver and the host cannot 
apply any policies. You can choose whether or not to pass through the 
device's specific BARs (e.g., you cannot pass through the BARs containing 
MSI), but once the BARs are passed through to the guest, you cannot control 
the guest's behavior.


> Similarly, if a guest is given access to SGX, the host can and should
> impose required policy on the guest.  If this means that an EINIT that
> would have succeeded at host CPL 0 fails in the guest, so be it.

This is completely different, and I disagree. If EINIT can run at host 
CPL 0, it can run at guest CPL 0 as well -- unless hardware doesn't 
support this, in which case we cannot support SGX virtualization at all.

The exception is if HW provides, for example, some specific capability 
bits that control whether EINIT can be run at CPL 0 or not, and the 
hypervisor is able to trap those bits; then the hypervisor can manipulate 
them to make the guest think HW doesn't allow EINIT to run at CPL 0, in 
which case it is quite reasonable that EINIT cannot run at CPL 0 in the 
guest (because it is HW behavior).

> 
> Of course, there isn't much in the way of host policy right now, so
> this may not require any particular action until interesting host
> policy shows up.
> 
>> This is not constraint, but KVM has to emulate hardware correctly. For this
>> part please see my explanation above.
>>
>> And let me explain the purpose of trapping EINIT again here.
>>
>> When guest is about to run EINIT, if guest's SHA256(sigstruct->modulus)
>> matches guest's virtual IA32_SGXLEPUBKEYHASHn (and if others are correctly
>> populated in sigstruct and token as well), KVM needs to make sure that EINIT
>> will run successfully in guest, even physical IA32_SGXLEPUBKEYHASHn are not
>> equal to guest's virtual MSRs at this particular time.
> 
> True, so long as it doesn't contradict host policy to do so.
> 
>> This is because given
>> the same condition, the EINIT will run successfully on physical machine. KVM
>> needs to emulate the right HW behavior.
> 
> No.  The host needs to do this because KVM needs to work and be
> useful, not because KVM needs to precisely match CPU behavior as seen
> by VMX root.

If we need to break HW behavior to make SGX useful, the maintainers may 
choose not to support SGX virtualization, as from a HW point of view this 
feature simply cannot be virtualized.

Anyway I'll leave this to the KVM maintainers to determine.

> 
> To avoid confusion, I don't believe I've ever said that guests should
> be restricted in which LEs they can use.  The types of restrictions
> I'm talking about are that, if the host prevents user code from
> running, say, a provisioning enclave that isn't whitelisted, then the
> guest should have the same restriction applied.  This type of
> restriction *can't* be usefully done by restricting acceptable MSR
> values, but it's trivial by trapping EINIT.

OK. Sorry I didn't get your point before. I thought it was about 
restricting LEs.

I don't know whether the SGX driver will have restrictions on running the 
provisioning enclave. In my understanding the provisioning enclave always 
comes from Intel. However I am not an expert here and may be wrong. Can you 
point out *exactly* what restrictions in the host must/should be applied to 
the guest, so that Jarkko can know whether he will support those 
restrictions or not? Otherwise I don't think we even need to talk about 
this topic at the current stage.

Thanks,
-Kai

>
Andy Lutomirski June 12, 2017, 11 p.m. UTC | #41
On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
> I don't know whether SGX driver will have restrict on running provisioning
> enclave. In my understanding provisioning enclave is always from Intel.
> However I am not expert here and probably be wrong. Can you point out
> *exactly* what restricts in host must/should be applied to guest so that
> Jarkko can know whether he will support those restricts or not? Otherwise I
> don't think we even need to talk about this topic at current stage.
>

The whole point is that I don't know.  But here are two types of
restriction I can imagine demand for:

1. Only a particular approved provisioning enclave may run (be it
Intel's or otherwise -- with a non-Intel LE, I think you can launch a
non-Intel provisioning enclave).  This would be done to restrict what
types of remote attestation can be done. (Intel supplies a remote
attestation service that uses some contractual policy that I don't
know.  Maybe a system owner wants a different policy applied to ISVs.)
 Imposing this policy on guests more or less requires filtering EINIT.

2. For kiosk-ish or single-purpose applications, I can imagine that
you would want to allow a specific list of enclave signers or even
enclave hashes. Maybe you would allow exactly one enclave hash.  You
could kludge this up with a restrictive LE policy, but you could also
do it for real by implementing the specific restriction in the kernel.
Then you'd want to impose it on the guest, and you'd do it by
filtering EINIT.

For the time being, I don't expect either policy to be implemented
right away, but I bet that something like this will eventually happen.
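
For case 1, a sketch of such an EINIT filter could look like this (the
attribute bit position is per the SDM; the whitelist helper is hypothetical):

    #define SGX_ATTR_PROVISIONKEY   BIT_ULL(4)      /* per SDM, as an assumption */

    /*
     * Sketch: host policy hook run on every trapped guest EINIT.  Enclaves
     * requesting the PROVISIONKEY must have an approved signer; on error,
     * KVM would fail the guest's EINIT, e.g. with SGX_INVALID_EINITTOKEN.
     */
    static int sgx_guest_einit_policy(const struct sigstruct *sig)
    {
            if (!(sig->attributes & SGX_ATTR_PROVISIONKEY))
                    return 0;       /* no restriction on ordinary enclaves */

            return signer_is_whitelisted(sig) ? 0 : -EPERM; /* assumed helper */
    }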
Jarkko Sakkinen June 13, 2017, 6:57 p.m. UTC | #42
On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote:
> > > This is simple, we simply won't allow guest to choose its own
> > > IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
> > > creating the guest.
> > 
> > Why not? You could have virtual MSRs and ask host LE to generate token
> > if they match to modulus.
> 
> The guest has its own LE running inside, and guest's LE will generate token
> for enclaves in guest. The host will not generate token for guest in any
> circumstances, because this is totally guest's behavior.

Why can't the host LE generate the token without the guest knowing it and
supply it with EINIT?
> > Seriously sounds like a stupid constraint or I'm not getting something
> > (which also might be the case). If you anyway trap EINIT, you could
> > create a special case for guest LE.
> 
> This is not constraint, but KVM has to emulate hardware correctly. For this
> part please see my explanation above.

I'm being totally honest with you now: your explanation makes absolutely
zero sense to me. You don't need 1000+ words to explain the scenarios
where the "host as a delegate LE" approach would go wrong.

Please just pinpoint the scenarios where it goes wrong. I'll ignore
the text below.

/Jarkko
Jarkko Sakkinen June 13, 2017, 7:05 p.m. UTC | #43
On Tue, Jun 13, 2017 at 09:57:18PM +0300, Jarkko Sakkinen wrote:
> On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote:
> > > > This is simple, we simply won't allow guest to choose its own
> > > > IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
> > > > creating the guest.
> > > 
> > > Why not? You could have virtual MSRs and ask host LE to generate token
> > > if they match to modulus.
> > 
> > The guest has its own LE running inside, and guest's LE will generate token
> > for enclaves in guest. The host will not generate token for guest in any
> > circumstances, because this is totally guest's behavior.
> 
> Why can't host LE generate the token without guest knowning it and
> supply it with EINIT?
> > > Seriously sounds like a stupid constraint or I'm not getting something
> > > (which also might be the case). If you anyway trap EINIT, you could
> > > create a special case for guest LE.
> > 
> > This is not constraint, but KVM has to emulate hardware correctly. For this
> > part please see my explanation above.
> 
> I'm being now totally honest to your: your explanation makes absolutely
> zero sense to me. You don't need a 1000+ words to explain the scenarios
> where "host as a delegate LE" approach would go wrong.
> 
> Please just pinpoint the scenarios where it goes wrong. I'll ignore
> the text below.
> 
> /Jarkko

When I've been reading this discussion the biggest lesson for me has
been that this is a new argument for having an in-kernel LE, in addition
to what Andy has stated before: the MSRs *never* need to be updated on
behalf of the guest.

/Jarkko
Sean Christopherson June 13, 2017, 8:13 p.m. UTC | #44
On Tue, 2017-06-13 at 22:05 +0300, Jarkko Sakkinen wrote:
> On Tue, Jun 13, 2017 at 09:57:18PM +0300, Jarkko Sakkinen wrote:
> > 
> > On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote:
> > > 
> > > > 
> > > > > 
> > > > > This is simple, we simply won't allow guest to choose its own
> > > > > IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter
> > > > > when
> > > > > creating the guest.
> > > > Why not? You could have virtual MSRs and ask host LE to generate token
> > > > if they match to modulus.
> > > The guest has its own LE running inside, and guest's LE will generate
> > > token
> > > for enclaves in guest. The host will not generate token for guest in any
> > > circumstances, because this is totally guest's behavior.
> > Why can't host LE generate the token without guest knowning it and
> > supply it with EINIT?
> > > 
> > > > 
> > > > Seriously sounds like a stupid constraint or I'm not getting something
> > > > (which also might be the case). If you anyway trap EINIT, you could
> > > > create a special case for guest LE.
> > > This is not constraint, but KVM has to emulate hardware correctly. For
> > > this
> > > part please see my explanation above.
> > I'm being now totally honest to your: your explanation makes absolutely
> > zero sense to me. You don't need a 1000+ words to explain the scenarios
> > where "host as a delegate LE" approach would go wrong.
> > 
> > Please just pinpoint the scenarios where it goes wrong. I'll ignore
> > the text below.
> > 
> > /Jarkko
> When I've been reading this discussion the biggest lesson for me has
> been that this is a new argument for having in-kernel LE in addition
> to what Andy has stated before: the MSRs *never* need to be updated on
> behalf of the guest.
> 
> /Jarkko

The MSRs need to be written to run an LE in the guest; an EINITTOKEN can't be
used to EINIT an enclave that is requesting access to the EINITTOKENKEY, i.e. an
LE.  Preventing the guest from running its own LE is not an option, as the owner
of the LE, e.g. the guest kernel or a userspace daemon, will likely disable SGX if
its LE fails to run (including any ECALLs into the LE).  Allowing a guest to run
an LE doesn't mean the host can't ignore/discard the guest's EINITTOKENs,
assuming the host traps EINIT.
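
For illustration, the special-casing Sean describes could key off the
SIGSTRUCT's requested attributes, roughly like this (bit position per the
SDM; treat it as an assumption):

    #define SGX_ATTR_EINITTOKENKEY  BIT_ULL(5)

    /*
     * Sketch: an enclave asking for the EINITTOKENKEY is a launch enclave
     * and cannot be launched via a token, so a trapping VMM must write the
     * guest's hash values to the real MSRs (where writable) and EINIT it
     * tokenless; for non-LE enclaves the host may ignore/replace the guest's
     * token per its own policy.
     */
    static bool guest_enclave_is_le(const struct sigstruct *sig)
    {
            return !!(sig->attributes & SGX_ATTR_EINITTOKENKEY);
    }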
Kai Huang June 13, 2017, 11:28 p.m. UTC | #45
On 6/14/2017 6:57 AM, Jarkko Sakkinen wrote:
> On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote:
>>>> This is simple, we simply won't allow guest to choose its own
>>>> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
>>>> creating the guest.
>>>
>>> Why not? You could have virtual MSRs and ask host LE to generate token
>>> if they match to modulus.
>>
>> The guest has its own LE running inside, and guest's LE will generate token
>> for enclaves in guest. The host will not generate token for guest in any
>> circumstances, because this is totally guest's behavior.
> 
> Why can't host LE generate the token without guest knowning it and
> supply it with EINIT?

I have said many times: virtualization is only about emulating hardware 
behavior. The same software that runs on a native machine is supposed to 
run in the guest. I don't care how you implement the host side -- whether 
the LE delegates tokens for other enclaves or not. If your implementation 
works on the host, the exact same driver works in the guest. As I said 
several times, I don't even have to trap EINIT, in which case you simply 
cannot generate a token for the guest's EINIT, because there is no EINIT 
from the guest!


>>> Seriously sounds like a stupid constraint or I'm not getting something
>>> (which also might be the case). If you anyway trap EINIT, you could
>>> create a special case for guest LE.
>>
>> This is not constraint, but KVM has to emulate hardware correctly. For this
>> part please see my explanation above.
> 
> I'm being now totally honest to your: your explanation makes absolutely
> zero sense to me. You don't need a 1000+ words to explain the scenarios
> where "host as a delegate LE" approach would go wrong.

This doesn't require 1000+ words. This is simple, and I have explained it:

this is not a constraint; KVM has to emulate hardware correctly.

The additional 1000+ words that I spent lots of time typing were trying 
to explain to you why we (at least Andy, Sean, and Haim, I believe) agreed 
to trap EINIT.


> 
> Please just pinpoint the scenarios where it goes wrong. I'll ignore
> the text below.

Of course you can, if you already understand why we agreed to trap EINIT.

I think Andy's comments just made things more complicated. He is trying 
to solve problems that don't exist in your implementation, and I believe 
we can handle them in the future if they come up.

So let's focus on how to handle updating the MSRs and EINIT. I believe 
Andy, Sean, and I are all on the same page; I am not sure whether you are.

For my part, I am perfectly fine not trapping EINIT (which is exactly 
what this patch does), but then you have to guarantee correctness on the 
host side.

> 
> /Jarkko
>
Jarkko Sakkinen June 14, 2017, 9:37 a.m. UTC | #46
On Tue, Jun 13, 2017 at 01:13:04PM -0700, Sean Christopherson wrote:
 
> The MSRs need to be written to run a LE in the guest, EINITTOKEN can't be used
> to EINIT an enclave that is requesting access to the EINITTOKENKEY, i.e. a LE.
> Preventing the guest from running its own LE is not an option, as the owner of
> the LE, e.g. guest kernel or userspace daemon, will likely disable SGX if its LE
> fails to run (including any ECALLS into the LE).  Allowing a guest to run a LE
> doesn't mean the host can't ignore/discard the guest's EINITTOKENs, assuming the
> host traps EINIT.

[I started a one-week leave today but will peek at the MLs occasionally,
so expect some delay in my follow-up responses]

Please, let's not use the term ECALL in these discussions. It's neither
a hardware concept nor a kernel-specific one; it's an abstraction that
exists only in the Intel SDK. I have neither ECALLs nor OCALLs in my LE,
for example. There are enough moving parts without such an abstraction.

I'm looking at the section "EINIT - Initialize an Enclave for Execution"
from the SDM. I'm not seeing a branch in the pseudo code that checks for
ATTRIBUTES.EINITTOKENKEY.

39.1.4 states that "Only Launch Enclaves are allowed to launch without a
valid token." I'm not sure what I should deduce from that because that
statement is *incorrect*. If you control the MSRs, you can launch
anything you want to launch. I guess we should make a bug report of this
section as it's complete nonsense?

The table 41-56 does not show any key material bound to key hash defined
in the MSRs.

Instead of teaching me stuff that I already know, I would just like to
be pointed to where the "side-effect" is that creates the constraint you
are claiming. I can then update the documentation so that we don't have
to go through this discussion anymore :-)

/Jarkko
Jarkko Sakkinen June 14, 2017, 9:44 a.m. UTC | #47
On Wed, Jun 14, 2017 at 11:28:32AM +1200, Huang, Kai wrote:
> 
> 
> On 6/14/2017 6:57 AM, Jarkko Sakkinen wrote:
> > On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote:
> > > > > This is simple, we simply won't allow guest to choose its own
> > > > > IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when
> > > > > creating the guest.
> > > > 
> > > > Why not? You could have virtual MSRs and ask host LE to generate token
> > > > if they match to modulus.
> > > 
> > > The guest has its own LE running inside, and guest's LE will generate token
> > > for enclaves in guest. The host will not generate token for guest in any
> > > circumstances, because this is totally guest's behavior.
> > 
> > Why can't host LE generate the token without guest knowing it and
> > supply it with EINIT?
> 
> I have said many times, virtualization is only about to emulate hardware
> behavior. The same software runs in native machine is supposed to run in
> guest. I don't care on host how you implement -- whether LE delegates token
> for other enclave or not. Your implementation works on host, the exact
> driver works in guest. I said several times, I even don't have to trap
> EINIT, in which case you simply cannot generate token for EINIT from guest,
> because there is no EINIT from guest!
> 
> 
> > > > Seriously sounds like a stupid constraint or I'm not getting something
> > > > (which also might be the case). If you anyway trap EINIT, you could
> > > > create a special case for guest LE.
> > > 
> > > This is not a constraint, but KVM has to emulate hardware correctly. For this
> > > part please see my explanation above.
> > 
> > I'm being now totally honest with you: your explanation makes absolutely
> > zero sense to me. You don't need a 1000+ words to explain the scenarios
> > where "host as a delegate LE" approach would go wrong.
> 
> This doesn't require 1000+ words.. This is simple, and I have explained:
> 
> This is not constraint, but KVM has to emulate hardware correctly.
> 
> The additional 1000+ words that I spent lots of time on typing is trying to
> explain to you -- why we (at least Andy, Sean, Haim I believe) agreed to
> choose to trap EINIT.
> 
> 
> > 
> > Please just pinpoint the scenarios where it goes wrong. I'll ignore
> > the text below.
> 
> Of course you can, if you already understand why we agreed to choose to trap
> EINIT.
> 
> I think Andy's comments just made things more complicated. He is trying to
> solve problems that don't exist in your implementation, and I believe we can
> handle this in the future, if those problems come up.
> 
> So let's focus on how to handle updating MSRs and EINIT. I believe Andy, me,
> and Sean are all on the same page. Not sure whether you are or not.
> 
> For me I am perfectly fine not to trap EINIT (which is exactly this patch
> did), but you have to guarantee the correctness of host side.
> 
> > 
> > /Jarkko
> > 

I'm not yet seeing why the MSRs would ever need to be updated.

See my response to Sean for details.

There's probably some detail in the SDM that I'm not observing.

/Jarkko
Sean Christopherson June 14, 2017, 3:11 p.m. UTC | #48
Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> On Tue, Jun 13, 2017 at 01:13:04PM -0700, Sean Christopherson wrote:
>  
> > The MSRs need to be written to run a LE in the guest, EINITTOKEN can't be
> > used to EINIT an enclave that is requesting access to the EINITTOKENKEY,
> > i.e. a LE. Preventing the guest from running its own LE is not an option,
> > as the owner of the LE, e.g. guest kernel or userspace daemon, will likely
> > disable SGX if its LE fails to run (including any ECALLS into the LE).
> > Allowing a guest to run a LE doesn't mean the host can't ignore/discard the
> > guest's EINITTOKENs, assuming the host traps EINIT.
> 
> [I started one week leave today but will peek MLs seldomly so except
> some delay in my follow up responses]
> 
> Please, lets not use the term ECALL in these discussions. It's neither
> hardware nor kernel specific concept. It's abstraction that exists only
> in the Intel SDK. I have neither ECALLs nor OCALLs in my LE for example.
> There are enough moving parts without such abstraction.
> 
> I'm looking at the section "EINIT - Initialize an Enclave for Execution"
> from the SDM. I'm not seeing a branch in the pseudo code that checks for
> ATTRIBUTES.EINITTOKENKEY.

(* if controlled ATTRIBUTES are set, SIGSTRUCT must be signed using an authorized key *)
CONTROLLED_ATTRIBUTES <- 0000000000000020H;
IF (((DS:RCX.ATTRIBUTES & CONTROLLED_ATTRIBUTES) != 0) and (TMP_MRSIGNER != IA32_SGXLEPUBKEYHASH))
    RFLAG.ZF <- 1;
    RAX <- SGX_INVALID_ATTRIBUTE;
    GOTO EXIT;
FI;

Bit 5, i.e. 20H, corresponds to the EINITTOKENKEY.  This is also covered in the
text description under Intel SGX Launch Control Configuration - "The hash of the
public key used to sign the SIGSTRUCT of the Launch Enclave must equal the value
in the IA32_SGXLEPUBKEYHASH MSRs."
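
(In plain C, the branch quoted above amounts to the check below -- an
illustrative sketch; the function and macro names are mine, not the SDM's.)

#include <stdint.h>
#include <string.h>

#define SGX_ATTR_EINITTOKENKEY	(1ULL << 5)	/* bit 5 == 20H above */

/*
 * EINIT fails with SGX_INVALID_ATTRIBUTE when the enclave requests a
 * "controlled" attribute (EINITTOKENKEY, i.e. it is a launch enclave)
 * and its MRSIGNER differs from IA32_SGXLEPUBKEYHASH0..3.
 */
static int einit_attribute_check_fails(uint64_t attributes,
				       const uint64_t mrsigner[4],
				       const uint64_t pubkeyhash[4])
{
	return (attributes & SGX_ATTR_EINITTOKENKEY) &&
	       memcmp(mrsigner, pubkeyhash, 4 * sizeof(uint64_t)) != 0;
}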


> 39.1.4 states that "Only Launch Enclaves are allowed to launch without a
> valid token." I'm not sure what I should deduce from that because that
> statement is *incorrect*. If you control the MSRs, you can launch
> anything you want to launch. I guess we should make a bug report of this
> section as it's complete nonsense?

I wouldn't call it complete nonsense; there are far more egregious ambiguities
in the SDM.  If you read the statement in the context of someone learning about
SGX, it makes perfect sense: if it's not a launch enclave, it needs a token.
Sure, rewording the statement to something like "Only enclaves whose public key
hash equals the value in the IA32_SGXLEPUBKEYHASH MSRs are allowed to launch
without a token." is technically more accurate, but I wouldn't describe the
current wording as "complete nonsense".  


> The table 41-56 does not show any key material bound to key hash defined
> in the MSRs.
> 
> Instead of teaching me stuff that I already know I would just like to
> get pinpointed where is the "side-effect" that makes the constraint that
> you are claiming. I can then update the documentation so that we don't
> have to go through this discussion anymore :-)
> 
> /Jarkko
Jarkko Sakkinen June 14, 2017, 5:03 p.m. UTC | #49
On Wed, Jun 14, 2017 at 03:11:34PM +0000, Christopherson, Sean J wrote:
> Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> > On Tue, Jun 13, 2017 at 01:13:04PM -0700, Sean Christopherson wrote:
> >  
> > > The MSRs need to be written to run a LE in the guest, EINITTOKEN can't be
> > > used to EINIT an enclave that is requesting access to the EINITTOKENKEY,
> > > i.e. a LE. Preventing the guest from running its own LE is not an option,
> > > as the owner of the LE, e.g. guest kernel or userspace daemon, will likely
> > > disable SGX if its LE fails to run (including any ECALLS into the LE).
> > > Allowing a guest to run a LE doesn't mean the host can't ignore/discard the
> > > guest's EINITTOKENs, assuming the host traps EINIT.
> > 
> > [I started one week leave today but will peek MLs seldomly so except
> > some delay in my follow up responses]
> > 
> > Please, lets not use the term ECALL in these discussions. It's neither
> > hardware nor kernel specific concept. It's abstraction that exists only
> > in the Intel SDK. I have neither ECALLs nor OCALLs in my LE for example.
> > There are enough moving parts without such abstraction.
> > 
> > I'm looking at the section "EINIT - Initialize an Enclave for Execution"
> > from the SDM. I'm not seeing a branch in the pseudo code that checks for
> > ATTRIBUTES.EINITTOKENKEY.
> 
> (* if controlled ATTRIBUTES are set, SIGSTRUCT must be signed using an authorized key *)
> CONTROLLED_ATTRIBUTES <- 0000000000000020H;
> IF (((DS:RCX.ATTRIBUTES & CONTROLLED_ATTRIBUTES) != 0) and (TMP_MRSIGNER != IA32_SGXLEPUBKEYHASH))
>     RFLAG.ZF <- 1;
>     RAX <- SGX_INVALID_ATTRIBUTE;
>     GOTO EXIT;
> FI;
> 
> Bit 5, i.e. 20H, corresponds to the EINITTOKENKEY.  This is also covered in the
> text description under Intel SGX Launch Control Configuration - "The hash of the
> public key used to sign the SIGSTRUCT of the Launch Enclave must equal the value
> in the IA32_SGXLEPUBKEYHASH MSRs."

Thanks. I wonder why the naming is ambiguous (the value is exactly the
same as that of ATTRIBUTES.EINITTOKENKEY but the name is different),
but there it is.

> > 39.1.4 states that "Only Launch Enclaves are allowed to launch without a
> > valid token." I'm not sure what I should deduce from that because that
> > statement is *incorrect*. If you control the MSRs, you can launch
> > anything you want to launch. I guess we should make a bug report of this
> > section as it's complete nonsense?
> 
> I wouldn't call it complete nonsense, there are far more egregious ambiguities
> in the SDM.  If you read the statement in the context of someone learning about
> SGX, it makes perfect sense: if it's not a launch enclave, it needs a token.
> Sure, rewording the statement to something like "Only enclaves whose public key
> hash equals the value in the IA32_SGXLEPUBKEYHASH MSRs are allowed to launch
> without a token." is technically more accurate, but I wouldn't describe the
> current wording as "complete nonsense".  

Agreed! That was a harsh overstatement.

I think that in this kind of material accuracy still matters, especially 
when cryptography is involved.

I'll make updates to intel_sgx.rst. It's good to have this documented when 
the virtualization stuff is upstreamed.

/Jarkko
Kai Huang June 16, 2017, 3:46 a.m. UTC | #50
On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>
>> I don't know whether SGX driver will have restrict on running provisioning
>> enclave. In my understanding provisioning enclave is always from Intel.
>> However I am not expert here and probably be wrong. Can you point out
>> *exactly* what restricts in host must/should be applied to guest so that
>> Jarkko can know whether he will support those restricts or not? Otherwise I
>> don't think we even need to talk about this topic at current stage.
>>
> 
> The whole point is that I don't know.  But here are two types of
> restriction I can imagine demand for:
> 
> 1. Only a particular approved provisioning enclave may run (be it
> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
> non-Intel provisioning enclave).  This would be done to restrict what
> types of remote attestation can be done. (Intel supplies a remote
> attestation service that uses some contractual policy that I don't
> know.  Maybe a system owner wants a different policy applied to ISVs.)
>   Imposing this policy on guests more or less requires filtering EINIT.

Hi Andy,

Sorry for the late reply.

What is the issue if the host and guest run provisioning enclaves from 
different vendors -- for example, the host runs Intel's provisioning 
enclave and the guest runs another vendor's? Or different guests run 
provisioning enclaves from different vendors?

One reason I am asking is that on Xen (where we don't have the concept 
of a *host*), it's likely that we won't apply any policy in the Xen 
hypervisor at all, and guests will be able to run any enclave from any 
signer as they wish.

Sorry that I don't understand (or have kind of forgotten) the issues here.

> 
> 2. For kiosk-ish or single-purpose applications, I can imagine that
> you would want to allow a specific list of enclave signers or even
> enclave hashes. Maybe you would allow exactly one enclave hash.  You
> could kludge this up with a restrictive LE policy, but you could also
> do it for real by implementing the specific restriction in the kernel.
> Then you'd want to impose it on the guest, and you'd do it by
> filtering EINIT.

Assuming the enclave hash means the measurement of the enclave, and 
assuming we have a policy that only allows enclaves from one signer to 
run, would you also elaborate on the issue if the host and guest run 
enclaves from different signers? If the host has such a policy, and we 
allow creating guests on such a host, I think that typically we will 
have the same policy in the guest (vetted by the guest's kernel). The 
owner of that host should be aware of the risk (if there is any) of 
creating a guest and running enclaves inside it.

Thanks,
-Kai

> 
> For the time being, I don't expect either policy to be implemented
> right away, but I bet that something like this will eventually happen.
>
Andy Lutomirski June 16, 2017, 4:11 a.m. UTC | #51
On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
>>
>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com>
>> wrote:
>>>
>>>
>>> I don't know whether SGX driver will have restrict on running
>>> provisioning
>>> enclave. In my understanding provisioning enclave is always from Intel.
>>> However I am not expert here and probably be wrong. Can you point out
>>> *exactly* what restricts in host must/should be applied to guest so that
>>> Jarkko can know whether he will support those restricts or not? Otherwise
>>> I
>>> don't think we even need to talk about this topic at current stage.
>>>
>>
>> The whole point is that I don't know.  But here are two types of
>> restriction I can imagine demand for:
>>
>> 1. Only a particular approved provisioning enclave may run (be it
>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
>> non-Intel provisioning enclave).  This would be done to restrict what
>> types of remote attestation can be done. (Intel supplies a remote
>> attestation service that uses some contractual policy that I don't
>> know.  Maybe a system owner wants a different policy applied to ISVs.)
>>   Imposing this policy on guests more or less requires filtering EINIT.
>
>
> Hi Andy,
>
> Sorry for late reply.
>
> What is the issue if host and guest run provisioning enclave from different
> vendor, for example, host runs intel's provisioning enclave, and guest runs
> other vendor's provisioning enclave? Or different guests run provisioning
> enclaves from different vendors?

There's no issue unless someone has tried to impose a policy.  There
is clearly at least some interest in having policies that affect what
enclaves can run -- otherwise there wouldn't be LEs in the first
place.

>
> One reason I am asking is that, on Xen (where we don't have concept of
> *host*), it's likely that we won't apply any policy at Xen hypervisor at
> all, and guests will be able to run any enclave from any signer as their
> wish.

That seems entirely reasonable.  Someone may eventually ask Xen to add
support for SGX enclave restrictions, in which case you'll either have
to tell them that it won't happen or implement it.

>
> Sorry that I don't understand (or kind of forgot) the issues here.
>
>>
>> 2. For kiosk-ish or single-purpose applications, I can imagine that
>> you would want to allow a specific list of enclave signers or even
>> enclave hashes. Maybe you would allow exactly one enclave hash.  You
>> could kludge this up with a restrictive LE policy, but you could also
>> do it for real by implementing the specific restriction in the kernel.
>> Then you'd want to impose it on the guest, and you'd do it by
>> filtering EINIT.
>
> Assuming the enclave hash means measurement of enclave, and assuming we have
> a policy that we only allow enclave from one signer to run, would you also
> elaborate the issue that, if host and guest run enclaves from different
> signer? If host has such policy, and we are allowing creating guests on such
> host, I think that typically we will have the same policy in the guest

Yes, I presume this too, but.

> (vetted by guest's kernel). The owner of that host should be aware of the
> risk (if there's any) by creating guest and run enclave inside it.

No.  The host does not trust the guest in general.  If the host has a
policy that the only enclave that shall run is X, that doesn't mean
that the host shall reject all enclaves requested by the normal
userspace API except X but that, if /dev/kvm is used, then the user is
magically trusted to not load a guest that fails to respect the host
policy.  It means that the only enclave that shall run is X regardless
of what interface is used.  The host must only allow X to be loaded by
its userspace and the host must only allow X to be loaded by a guest.
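
(To make that concrete: with EINIT trapped, the host can funnel both its
own userspace's EINIT requests and the guest's trapped EINITs through one
chokepoint. A hypothetical kernel-style sketch -- sgx_allowed_signer and
the helper name are made up for illustration, not proposed code.)

/* Allowlist of a single signer hash, configured by the host admin. */
static u64 sgx_allowed_signer[4];

/*
 * Called for EINIT requests from host userspace *and* for trapped
 * guest EINITs, so one policy covers both interfaces.
 */
static bool host_policy_allows(const u64 mrsigner[4])
{
	return !memcmp(mrsigner, sgx_allowed_signer, 4 * sizeof(u64));
}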
Kai Huang June 16, 2017, 4:33 a.m. UTC | #52
On 6/16/2017 4:11 PM, Andy Lutomirski wrote:
> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>>
>>
>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
>>>
>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com>
>>> wrote:
>>>>
>>>>
>>>> I don't know whether SGX driver will have restrict on running
>>>> provisioning
>>>> enclave. In my understanding provisioning enclave is always from Intel.
>>>> However I am not expert here and probably be wrong. Can you point out
>>>> *exactly* what restricts in host must/should be applied to guest so that
>>>> Jarkko can know whether he will support those restricts or not? Otherwise
>>>> I
>>>> don't think we even need to talk about this topic at current stage.
>>>>
>>>
>>> The whole point is that I don't know.  But here are two types of
>>> restriction I can imagine demand for:
>>>
>>> 1. Only a particular approved provisioning enclave may run (be it
>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
>>> non-Intel provisioning enclave).  This would be done to restrict what
>>> types of remote attestation can be done. (Intel supplies a remote
>>> attestation service that uses some contractual policy that I don't
>>> know.  Maybe a system owner wants a different policy applied to ISVs.)
>>>    Imposing this policy on guests more or less requires filtering EINIT.
>>
>>
>> Hi Andy,
>>
>> Sorry for late reply.
>>
>> What is the issue if host and guest run provisioning enclave from different
>> vendor, for example, host runs intel's provisioning enclave, and guest runs
>> other vendor's provisioning enclave? Or different guests run provisioning
>> enclaves from different vendors?
> 
> There's no issue unless someone has tried to impose a policy.  There
> is clearly at least some interest in having policies that affect what
> enclaves can run -- otherwise there wouldn't be LEs in the first
> place.
> 
>>
>> One reason I am asking is that, on Xen (where we don't have concept of
>> *host*), it's likely that we won't apply any policy at Xen hypervisor at
>> all, and guests will be able to run any enclave from any signer as their
>> wish.
> 
> That seems entirely reasonable.  Someone may eventually ask Xen to add
> support for SGX enclave restrictions, in which case you'll either have
> to tell them that it won't happen or implement it.
> 
>>
>> Sorry that I don't understand (or kind of forgot) the issues here.
>>
>>>
>>> 2. For kiosk-ish or single-purpose applications, I can imagine that
>>> you would want to allow a specific list of enclave signers or even
>>> enclave hashes. Maybe you would allow exactly one enclave hash.  You
>>> could kludge this up with a restrictive LE policy, but you could also
>>> do it for real by implementing the specific restriction in the kernel.
>>> Then you'd want to impose it on the guest, and you'd do it by
>>> filtering EINIT.
>>
>> Assuming the enclave hash means measurement of enclave, and assuming we have
>> a policy that we only allow enclave from one signer to run, would you also
>> elaborate the issue that, if host and guest run enclaves from different
>> signer? If host has such policy, and we are allowing creating guests on such
>> host, I think that typically we will have the same policy in the guest
> 
> Yes, I presume this too, but.
> 
>> (vetted by guest's kernel). The owner of that host should be aware of the
>> risk (if there's any) by creating guest and run enclave inside it.
> 
> No.  The host does not trust the guest in general.  If the host has a

I agree.

> policy that the only enclave that shall run is X, that doesn't mean
> that the host shall reject all enclaves requested by the normal
> userspace API except X but that, if /dev/kvm is used, then the user is
> magically trusted to not load a guest that fails to respect the host
> policy.  It means that the only enclave that shall run is X regardless
> of what interface is used.  The host must only allow X to be loaded by
> its userspace and the host must only allow X to be loaded by a guest.
> 

This is a theoretical concern. I think your statement makes sense only 
if we have a specific example proving there is an actual risk in 
allowing the guest to exceed the X approved by the host.

I will dig through your previous emails to see whether you have listed 
such real cases (I have kind of forgotten, sorry), but if you don't 
mind, you could list such cases here.

Thanks,
-Kai
Kai Huang June 16, 2017, 9:34 a.m. UTC | #53
On 6/16/2017 4:33 PM, Huang, Kai wrote:
> 
> 
> On 6/16/2017 4:11 PM, Andy Lutomirski wrote:
>> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai 
>> <kai.huang@linux.intel.com> wrote:
>>>
>>>
>>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
>>>>
>>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> I don't know whether SGX driver will have restrict on running
>>>>> provisioning
>>>>> enclave. In my understanding provisioning enclave is always from 
>>>>> Intel.
>>>>> However I am not expert here and probably be wrong. Can you point out
>>>>> *exactly* what restricts in host must/should be applied to guest so 
>>>>> that
>>>>> Jarkko can know whether he will support those restricts or not? 
>>>>> Otherwise
>>>>> I
>>>>> don't think we even need to talk about this topic at current stage.
>>>>>
>>>>
>>>> The whole point is that I don't know.  But here are two types of
>>>> restriction I can imagine demand for:
>>>>
>>>> 1. Only a particular approved provisioning enclave may run (be it
>>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
>>>> non-Intel provisioning enclave).  This would be done to restrict what
>>>> types of remote attestation can be done. (Intel supplies a remote
>>>> attestation service that uses some contractual policy that I don't
>>>> know.  Maybe a system owner wants a different policy applied to ISVs.)
>>>>    Imposing this policy on guests more or less requires filtering 
>>>> EINIT.
>>>
>>>
>>> Hi Andy,
>>>
>>> Sorry for late reply.
>>>
>>> What is the issue if host and guest run provisioning enclave from 
>>> different
>>> vendor, for example, host runs intel's provisioning enclave, and 
>>> guest runs
>>> other vendor's provisioning enclave? Or different guests run 
>>> provisioning
>>> enclaves from different vendors?
>>
>> There's no issue unless someone has tried to impose a policy.  There
>> is clearly at least some interest in having policies that affect what
>> enclaves can run -- otherwise there wouldn't be LEs in the first
>> place.
>>
>>>
>>> One reason I am asking is that, on Xen (where we don't have concept of
>>> *host*), it's likely that we won't apply any policy at Xen hypervisor at
>>> all, and guests will be able to run any enclave from any signer as their
>>> wish.
>>
>> That seems entirely reasonable.  Someone may eventually ask Xen to add
>> support for SGX enclave restrictions, in which case you'll either have
>> to tell them that it won't happen or implement it.
>>
>>>
>>> Sorry that I don't understand (or kind of forgot) the issues here.
>>>
>>>>
>>>> 2. For kiosk-ish or single-purpose applications, I can imagine that
>>>> you would want to allow a specific list of enclave signers or even
>>>> enclave hashes. Maybe you would allow exactly one enclave hash.  You
>>>> could kludge this up with a restrictive LE policy, but you could also
>>>> do it for real by implementing the specific restriction in the kernel.
>>>> Then you'd want to impose it on the guest, and you'd do it by
>>>> filtering EINIT.
>>>
>>> Assuming the enclave hash means measurement of enclave, and assuming 
>>> we have
>>> a policy that we only allow enclave from one signer to run, would you 
>>> also
>>> elaborate the issue that, if host and guest run enclaves from different
>>> signer? If host has such policy, and we are allowing creating guests 
>>> on such
>>> host, I think that typically we will have the same policy in the guest
>>
>> Yes, I presume this too, but.
>>
>>> (vetted by guest's kernel). The owner of that host should be aware of 
>>> the
>>> risk (if there's any) by creating guest and run enclave inside it.
>>
>> No.  The host does not trust the guest in general.  If the host has a
> 
> I agree.
> 
>> policy that the only enclave that shall run is X, that doesn't mean
>> that the host shall reject all enclaves requested by the normal
>> userspace API except X but that, if /dev/kvm is used, then the user is
>> magically trusted to not load a guest that fails to respect the host
>> policy.  It means that the only enclave that shall run is X regardless
>> of what interface is used.  The host must only allow X to be loaded by
>> its userspace and the host must only allow X to be loaded by a guest.
>>
> 
> This is theoretical thing. I think your statement makes sense only if we 
> have specific example that can prove there's actual risk when allowing 
> guest to exceed X approved by host.
> 
> I will dig more in your previous emails to see whether you have listed 
> such real cases (I some kind forgot sorry) but if you don't mind, you 
> can list such cases here.

Hi Andy,

I found an example you listed in a previous email, but it is related 
not to host policy but to SGX's key architecture. I quote it below:

"Concretely, imagine I write an enclave that seals my TLS client
certificate's private key and offers an API to sign TLS certificate
requests with it.  This way, if my system is compromised, an attacker
can use the certificate only so long as they have access to my
machine.  If I kick them out or if they merely get the ability to read
the sealed data but not to execute code, the private key should still
be safe.  But, if this system is a VM guest, the attacker could run
the exact same enclave on another guest on the same physical CPU and
sign using my key.  Whoops!"

I think you will have this problem even if you apply the strictest 
policy on both host and guest -- only allowing enclaves from one signer 
to run. This is indeed a flaw, but virtualization cannot do anything to 
solve it -- unless we don't support virtualization at all :)

Sorry, I am just trying to find out whether there is a real case that 
truly requires applying the host's policy to the guest, and that would 
cause problems if we didn't.

Thanks,
-Kai

> 
> Thanks,
> -Kai
Andy Lutomirski June 16, 2017, 4:03 p.m. UTC | #54
On Thu, Jun 15, 2017 at 9:33 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 6/16/2017 4:11 PM, Andy Lutomirski wrote:
>>
>> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com>
>> wrote:
>>>
>>>
>>>
>>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
>>>>
>>>>
>>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> I don't know whether SGX driver will have restrict on running
>>>>> provisioning
>>>>> enclave. In my understanding provisioning enclave is always from Intel.
>>>>> However I am not expert here and probably be wrong. Can you point out
>>>>> *exactly* what restricts in host must/should be applied to guest so
>>>>> that
>>>>> Jarkko can know whether he will support those restricts or not?
>>>>> Otherwise
>>>>> I
>>>>> don't think we even need to talk about this topic at current stage.
>>>>>
>>>>
>>>> The whole point is that I don't know.  But here are two types of
>>>> restriction I can imagine demand for:
>>>>
>>>> 1. Only a particular approved provisioning enclave may run (be it
>>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
>>>> non-Intel provisioning enclave).  This would be done to restrict what
>>>> types of remote attestation can be done. (Intel supplies a remote
>>>> attestation service that uses some contractual policy that I don't
>>>> know.  Maybe a system owner wants a different policy applied to ISVs.)
>>>>    Imposing this policy on guests more or less requires filtering EINIT.
>>>
>>>
>>>
>>> Hi Andy,
>>>
>>> Sorry for late reply.
>>>
>>> What is the issue if host and guest run provisioning enclave from
>>> different
>>> vendor, for example, host runs intel's provisioning enclave, and guest
>>> runs
>>> other vendor's provisioning enclave? Or different guests run provisioning
>>> enclaves from different vendors?
>>
>>
>> There's no issue unless someone has tried to impose a policy.  There
>> is clearly at least some interest in having policies that affect what
>> enclaves can run -- otherwise there wouldn't be LEs in the first
>> place.
>>
>>>
>>> One reason I am asking is that, on Xen (where we don't have concept of
>>> *host*), it's likely that we won't apply any policy at Xen hypervisor at
>>> all, and guests will be able to run any enclave from any signer as their
>>> wish.
>>
>>
>> That seems entirely reasonable.  Someone may eventually ask Xen to add
>> support for SGX enclave restrictions, in which case you'll either have
>> to tell them that it won't happen or implement it.
>>
>>>
>>> Sorry that I don't understand (or kind of forgot) the issues here.
>>>
>>>>
>>>> 2. For kiosk-ish or single-purpose applications, I can imagine that
>>>> you would want to allow a specific list of enclave signers or even
>>>> enclave hashes. Maybe you would allow exactly one enclave hash.  You
>>>> could kludge this up with a restrictive LE policy, but you could also
>>>> do it for real by implementing the specific restriction in the kernel.
>>>> Then you'd want to impose it on the guest, and you'd do it by
>>>> filtering EINIT.
>>>
>>>
>>> Assuming the enclave hash means measurement of enclave, and assuming we
>>> have
>>> a policy that we only allow enclave from one signer to run, would you
>>> also
>>> elaborate the issue that, if host and guest run enclaves from different
>>> signer? If host has such policy, and we are allowing creating guests on
>>> such
>>> host, I think that typically we will have the same policy in the guest
>>
>>
>> Yes, I presume this too, but.
>>
>>> (vetted by guest's kernel). The owner of that host should be aware of the
>>> risk (if there's any) by creating guest and run enclave inside it.
>>
>>
>> No.  The host does not trust the guest in general.  If the host has a
>
>
> I agree.
>
>> policy that the only enclave that shall run is X, that doesn't mean
>> that the host shall reject all enclaves requested by the normal
>> userspace API except X but that, if /dev/kvm is used, then the user is
>> magically trusted to not load a guest that fails to respect the host
>> policy.  It means that the only enclave that shall run is X regardless
>> of what interface is used.  The host must only allow X to be loaded by
>> its userspace and the host must only allow X to be loaded by a guest.
>>
>
> This is theoretical thing. I think your statement makes sense only if we
> have specific example that can prove there's actual risk when allowing guest
> to exceed X approved by host.

I would turn this around.  Can you come up with any example where the
host would have a restrictive policy but the policy should not be
enforced for guests?
Andy Lutomirski June 16, 2017, 4:25 p.m. UTC | #55
On Thu, Jun 15, 2017 at 9:33 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>
>
> On 6/16/2017 4:11 PM, Andy Lutomirski wrote:
>>
>> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com>
>> wrote:
>>>
>>>
>>>
>>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
>>>>
>>>>
>>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> I don't know whether SGX driver will have restrict on running
>>>>> provisioning
>>>>> enclave. In my understanding provisioning enclave is always from Intel.
>>>>> However I am not expert here and probably be wrong. Can you point out
>>>>> *exactly* what restricts in host must/should be applied to guest so
>>>>> that
>>>>> Jarkko can know whether he will support those restricts or not?
>>>>> Otherwise
>>>>> I
>>>>> don't think we even need to talk about this topic at current stage.
>>>>>
>>>>
>>>> The whole point is that I don't know.  But here are two types of
>>>> restriction I can imagine demand for:
>>>>
>>>> 1. Only a particular approved provisioning enclave may run (be it
>>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
>>>> non-Intel provisioning enclave).  This would be done to restrict what
>>>> types of remote attestation can be done. (Intel supplies a remote
>>>> attestation service that uses some contractual policy that I don't
>>>> know.  Maybe a system owner wants a different policy applied to ISVs.)
>>>>    Imposing this policy on guests more or less requires filtering EINIT.
>>>
>>>
>>>
>>> Hi Andy,
>>>
>>> Sorry for late reply.
>>>
>>> What is the issue if host and guest run provisioning enclave from
>>> different
>>> vendor, for example, host runs intel's provisioning enclave, and guest
>>> runs
>>> other vendor's provisioning enclave? Or different guests run provisioning
>>> enclaves from different vendors?
>>
>>
>> There's no issue unless someone has tried to impose a policy.  There
>> is clearly at least some interest in having policies that affect what
>> enclaves can run -- otherwise there wouldn't be LEs in the first
>> place.
>>
>>>
>>> One reason I am asking is that, on Xen (where we don't have concept of
>>> *host*), it's likely that we won't apply any policy at Xen hypervisor at
>>> all, and guests will be able to run any enclave from any signer as their
>>> wish.
>>
>>
>> That seems entirely reasonable.  Someone may eventually ask Xen to add
>> support for SGX enclave restrictions, in which case you'll either have
>> to tell them that it won't happen or implement it.
>>
>>>
>>> Sorry that I don't understand (or kind of forgot) the issues here.
>>>
>>>>
>>>> 2. For kiosk-ish or single-purpose applications, I can imagine that
>>>> you would want to allow a specific list of enclave signers or even
>>>> enclave hashes. Maybe you would allow exactly one enclave hash.  You
>>>> could kludge this up with a restrictive LE policy, but you could also
>>>> do it for real by implementing the specific restriction in the kernel.
>>>> Then you'd want to impose it on the guest, and you'd do it by
>>>> filtering EINIT.
>>>
>>>
>>> Assuming the enclave hash means measurement of enclave, and assuming we
>>> have
>>> a policy that we only allow enclave from one signer to run, would you
>>> also
>>> elaborate the issue that, if host and guest run enclaves from different
>>> signer? If host has such policy, and we are allowing creating guests on
>>> such
>>> host, I think that typically we will have the same policy in the guest
>>
>>
>> Yes, I presume this too, but.
>>
>>> (vetted by guest's kernel). The owner of that host should be aware of the
>>> risk (if there's any) by creating guest and run enclave inside it.
>>
>>
>> No.  The host does not trust the guest in general.  If the host has a
>
>
> I agree.
>
>> policy that the only enclave that shall run is X, that doesn't mean
>> that the host shall reject all enclaves requested by the normal
>> userspace API except X but that, if /dev/kvm is used, then the user is
>> magically trusted to not load a guest that fails to respect the host
>> policy.  It means that the only enclave that shall run is X regardless
>> of what interface is used.  The host must only allow X to be loaded by
>> its userspace and the host must only allow X to be loaded by a guest.
>>
>
> This is theoretical thing. I think your statement makes sense only if we
> have specific example that can prove there's actual risk when allowing guest
> to exceed X approved by host.
>
> I will dig more in your previous emails to see whether you have listed such
> real cases (I some kind forgot sorry) but if you don't mind, you can list
> such cases here.

I'm operating under the assumption that some kind of policy exists in
the first place.  I can imagine everything working fairly well without
any real policy, but apparently there are vendors who want restrictive
policies.  What I can't imagine is anyone who wants a restrictive
policy but is then okay with the host only partially enforcing it.

>
> Thanks,
> -Kai
Sean Christopherson June 16, 2017, 4:31 p.m. UTC | #56
Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 15, 2017 at 9:33 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> >
> >
> > On 6/16/2017 4:11 PM, Andy Lutomirski wrote:
> >>
> >> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com>
> >> wrote:
> >>>
> >>>
> >>>
> >>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
> >>>>
> >>>>
> >>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com>
> >>>> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> I don't know whether SGX driver will have restrict on running
> >>>>> provisioning
> >>>>> enclave. In my understanding provisioning enclave is always from Intel.
> >>>>> However I am not expert here and probably be wrong. Can you point out
> >>>>> *exactly* what restricts in host must/should be applied to guest so
> >>>>> that
> >>>>> Jarkko can know whether he will support those restricts or not?
> >>>>> Otherwise
> >>>>> I
> >>>>> don't think we even need to talk about this topic at current stage.
> >>>>>
> >>>>
> >>>> The whole point is that I don't know.  But here are two types of
> >>>> restriction I can imagine demand for:
> >>>>
> >>>> 1. Only a particular approved provisioning enclave may run (be it
> >>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
> >>>> non-Intel provisioning enclave).  This would be done to restrict what
> >>>> types of remote attestation can be done. (Intel supplies a remote
> >>>> attestation service that uses some contractual policy that I don't
> >>>> know.  Maybe a system owner wants a different policy applied to ISVs.)
> >>>>    Imposing this policy on guests more or less requires filtering EINIT.
> >>>
> >>>
> >>>
> >>> Hi Andy,
> >>>
> >>> Sorry for late reply.
> >>>
> >>> What is the issue if host and guest run provisioning enclave from
> >>> different
> >>> vendor, for example, host runs intel's provisioning enclave, and guest
> >>> runs
> >>> other vendor's provisioning enclave? Or different guests run provisioning
> >>> enclaves from different vendors?
> >>
> >>
> >> There's no issue unless someone has tried to impose a policy.  There
> >> is clearly at least some interest in having policies that affect what
> >> enclaves can run -- otherwise there wouldn't be LEs in the first
> >> place.
> >>
> >>>
> >>> One reason I am asking is that, on Xen (where we don't have concept of
> >>> *host*), it's likely that we won't apply any policy at Xen hypervisor at
> >>> all, and guests will be able to run any enclave from any signer as their
> >>> wish.
> >>
> >>
> >> That seems entirely reasonable.  Someone may eventually ask Xen to add
> >> support for SGX enclave restrictions, in which case you'll either have
> >> to tell them that it won't happen or implement it.
> >>
> >>>
> >>> Sorry that I don't understand (or kind of forgot) the issues here.
> >>>
> >>>>
> >>>> 2. For kiosk-ish or single-purpose applications, I can imagine that
> >>>> you would want to allow a specific list of enclave signers or even
> >>>> enclave hashes. Maybe you would allow exactly one enclave hash.  You
> >>>> could kludge this up with a restrictive LE policy, but you could also
> >>>> do it for real by implementing the specific restriction in the kernel.
> >>>> Then you'd want to impose it on the guest, and you'd do it by
> >>>> filtering EINIT.
> >>>
> >>>
> >>> Assuming the enclave hash means measurement of enclave, and assuming we
> >>> have
> >>> a policy that we only allow enclave from one signer to run, would you
> >>> also
> >>> elaborate the issue that, if host and guest run enclaves from different
> >>> signer? If host has such policy, and we are allowing creating guests on
> >>> such
> >>> host, I think that typically we will have the same policy in the guest
> >>
> >>
> >> Yes, I presume this too, but.
> >>
> >>> (vetted by guest's kernel). The owner of that host should be aware of the
> >>> risk (if there's any) by creating guest and run enclave inside it.
> >>
> >>
> >> No.  The host does not trust the guest in general.  If the host has a
> >
> >
> > I agree.
> >
> >> policy that the only enclave that shall run is X, that doesn't mean
> >> that the host shall reject all enclaves requested by the normal
> >> userspace API except X but that, if /dev/kvm is used, then the user is
> >> magically trusted to not load a guest that fails to respect the host
> >> policy.  It means that the only enclave that shall run is X regardless
> >> of what interface is used.  The host must only allow X to be loaded by
> >> its userspace and the host must only allow X to be loaded by a guest.
> >>
> >
> > This is theoretical thing. I think your statement makes sense only if we
> > have specific example that can prove there's actual risk when allowing guest
> > to exceed X approved by host.
> >
> > I will dig more in your previous emails to see whether you have listed such
> > real cases (I some kind forgot sorry) but if you don't mind, you can list
> > such cases here.
> 
> I'm operating under the assumption that some kind of policy exists in
> the first place.  I can imagine everything working fairly well without
> any real policy, but apparently there are vendors who want restrictive
> policies.  What I can't imagine is anyone who wants a restrictive
> policy but is then okay with the host only partially enforcing it.

I think there is a certain amount of inception going on here, i.e. the only
reason we're discussing LE enforced policies in the kernel is because the LE
architecture exists and can't be disabled.  The LE, as originally designed,
is intended to be a way for *userspace* to control what code can run on the
system, e.g. to provide a hook for anti-virus/malware to inspect an enclave
since it's impossible to inspect an enclave once it is running.

The kernel doesn't need an LE to restrict what enclaves can run, e.g. it can
perform inspection at any point during the initialization process.  This is
true for guest enclaves as well since the kernel can trap EINIT.  By making
the LE kernel-only we've bastardized the concept of the LE and have negated
the primary value provided by an LE[1][2].  In my opinion, the discussion of
the kernel's launch policies is much ado about nothing, e.g. if supported by
hardware, I think we'd opt to disable launch control completely.

[1] On a system with unlocked IA32_SGXLEPUBKEYHASH MSRs, the only value added
by using a LE to enforce the kernel's policies is defense-in-depth, e.g. an
attacker can't hide malicious code in an enclave even if it gains control of
the kernel.  I think this is a very minor benefit since running in an enclave
doesn't grant any new privileges and doesn't persist across system reset.

[2] I think it's safe to assume that any use case that requires locked hash
MSRs is out of scope for this discussion, given that the upstream kernel will
require unlocked MSRs.
Andy Lutomirski June 16, 2017, 4:43 p.m. UTC | #57
On Fri, Jun 16, 2017 at 9:31 AM, Christopherson, Sean J
<sean.j.christopherson@intel.com> wrote:
> I think there is a certain amount of inception going on here, i.e. the only
> reason we're discussing LE enforced policies in the kernel is because the LE
> architecture exists and can't be disabled.  The LE, as originally designed,
> is intended to be a way for *userspace* to control what code can run on the
> system, e.g. to provide a hook for anti-virus/malware to inspect an enclave
> since it's impossible to inspect an enclave once it is running.
>
> The kernel doesn't need an LE to restrict what enclaves can run, e.g. it can
> perform inspection at any point during the initialization process.  This is
> true for guest enclaves as well since the kernel can trap EINIT.  By making
> the LE kernel-only we've bastardized the concept of the LE and have negated
> the primary value provided by an LE[1][2].  In my opinion, the discussion of
> the kernel's launch policies is much ado about nothing, e.g. if supported by
> hardware, I think we'd opt to disable launch control completely.

Agreed.

I don't think I've ever said that the kernel should implement
restrictions on what enclaves should run [1].  All I've said is that
(a) if the kernel does implement restrictions like this, it should
apply them to guests as well and (b) that the kernel should probably
trap EINIT because that's the most sensible way to deal with the MSRs.

[1] With the possible exception of provisioning enclaves.  I'm still
not convinced that anyone except root should be allowed to run an
exclave with the provision bit set, as that bit gives access to the
provisioning key, which is rather special.  From memory, it bypasses
the owner epoch, and it may have privacy issues.  Maybe this is a
nonissue, but I'd like to see someone seriously analyze how
provisioning enclaves that may not be signed by Intel affect the
overall security of the system and how Linux should handle them.  SGX
was designed under the assumption that provisioning enclaves would
only ever be signed by Intel, and that's not the case any more, and
dealing with this intelligently may require some thought.
Sean Christopherson July 19, 2017, 3:04 p.m. UTC | #58
On Thu, 2017-05-11 at 23:11 -0700, Andy Lutomirski wrote:
> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> > 
> > > Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> > > with whatever lock is needed (probably just a mutex).  Users of EINIT
> > > will take the mutex, compare the percpu variable to the desired value,
> > > and, if it's different, do WRMSR and update the percpu variable.
> > > 
> > > KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
> > > state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
> > > support the same handling as the host.  There is no action required at
> > > all on KVM guest entry and exit.
> > 
> > This is doable, but SGX driver needs to do those things and expose
> > interfaces for KVM to use. In terms of the percpu data, it is nice to have,
> > but I am not sure whether it is mandatory, as IMO EINIT is not even in
> > performance critical path. We can simply read old value from MSRs out and
> > compare whether the old equals to the new.
> I think the SGX driver should probably live in arch/x86, and the
> interface could be a simple percpu variable that is exported (from the
> main kernel image, not from a module).

Jarkko, what are your thoughts on moving the SGX code into arch/x86 and removing
the option to build it as a module?  This would simplify the KVM and EPC cgroup
implementations.
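
(For reference, the percpu-cache idea Andy describes in the quote above
might look something like this sketch. It assumes callers already run
with preemption disabled around ENCLS[EINIT]; the names are illustrative,
not code from this series.)

static DEFINE_PER_CPU(u64 [4], sgx_lepubkeyhash_cache);

static void sgx_update_lepubkeyhash(const u64 *hash)
{
	int i;

	WARN_ON_ONCE(preemptible());

	/*
	 * Rewrite the MSRs only when the desired hash differs from what
	 * this CPU last wrote. NB: a real version must also handle the
	 * initial/reset MSR state, e.g. by forcing the first write on
	 * each CPU.
	 */
	for (i = 0; i < 4; i++) {
		if (hash[i] != this_cpu_read(sgx_lepubkeyhash_cache[i])) {
			wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, hash[i]);
			this_cpu_write(sgx_lepubkeyhash_cache[i], hash[i]);
		}
	}
}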
diff mbox

Patch

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e3770f570bb9..70482b951b0f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -417,6 +417,11 @@ 
 #define MSR_IA32_TSC_ADJUST             0x0000003b
 #define MSR_IA32_BNDCFGS		0x00000d90
 
+#define MSR_IA32_SGXLEPUBKEYHASH0	0x0000008c
+#define MSR_IA32_SGXLEPUBKEYHASH1	0x0000008d
+#define MSR_IA32_SGXLEPUBKEYHASH2	0x0000008e
+#define MSR_IA32_SGXLEPUBKEYHASH3	0x0000008f
+
 #define MSR_IA32_XSS			0x00000da0
 
 #define FEATURE_CONTROL_LOCKED				(1<<0)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a16539594a99..c96332b9dd44 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -656,6 +656,9 @@  struct vcpu_vmx {
 	 */
 	u64 msr_ia32_feature_control;
 	u64 msr_ia32_feature_control_valid_bits;
+
+	/* SGX Launch Control public key hash */
+	u64 msr_ia32_sgxlepubkeyhash[4];
 };
 
 enum segment_cache_field {
@@ -2244,6 +2247,70 @@  static void decache_tsc_multiplier(struct vcpu_vmx *vmx)
 	vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
 }
 
+static bool cpu_sgx_lepubkeyhash_writable(void)
+{
+	u64 val, sgx_lc_enabled_mask = (FEATURE_CONTROL_LOCKED |
+			FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE);
+
+	rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
+
+	return ((val & sgx_lc_enabled_mask) == sgx_lc_enabled_mask);
+}
+
+static bool vmx_sgx_lc_disabled_in_bios(struct kvm_vcpu *vcpu)
+{
+	return (to_vmx(vcpu)->msr_ia32_feature_control & FEATURE_CONTROL_LOCKED)
+		&& (!(to_vmx(vcpu)->msr_ia32_feature_control &
+				FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE));
+}
+
+#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH0		0xa6053e051270b7ac
+#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH1	        0x6cfbe8ba8b3b413d
+#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH2		0xc4916d99f2b3735d
+#define	SGX_INTEL_DEFAULT_LEPUBKEYHASH3		0xd4f8c05909f9bb3b
+
+static void vmx_sgx_init_lepubkeyhash(struct kvm_vcpu *vcpu)
+{
+	u64 h0, h1, h2, h3;
+
+	/*
+	 * If runtime launch control is enabled (IA32_SGXLEPUBKEYHASHn is
+	 * writable), we set guest's default value to be Intel's default
+	 * hash (which is a fixed value and can be hard-coded). Otherwise,
+	 * guest can only use machine's IA32_SGXLEPUBKEYHASHn so set guest's
+	 * default to that.
+	 */
+	if (cpu_sgx_lepubkeyhash_writable()) {
+		h0 = SGX_INTEL_DEFAULT_LEPUBKEYHASH0;
+		h1 = SGX_INTEL_DEFAULT_LEPUBKEYHASH1;
+		h2 = SGX_INTEL_DEFAULT_LEPUBKEYHASH2;
+		h3 = SGX_INTEL_DEFAULT_LEPUBKEYHASH3;
+	}
+	else {
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH0, h0);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH1, h1);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH2, h2);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH3, h3);
+	}
+
+	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0] = h0;
+	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1] = h1;
+	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2] = h2;
+	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3] = h3;
+}
+
+static void vmx_sgx_lepubkeyhash_load(struct kvm_vcpu *vcpu)
+{
+	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0,
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0]);
+	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH1,
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1]);
+	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH2,
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2]);
+	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH3,
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3]);
+}
+
 /*
  * Switches to specified vcpu, until a matching vcpu_put(), but assumes
  * vcpu mutex is already taken.
@@ -2316,6 +2383,14 @@  static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	vmx_vcpu_pi_load(vcpu, cpu);
 	vmx->host_pkru = read_pkru();
+
+	/*
+	 * Load guest's SGX LE pubkey hash if runtime launch control is
+	 * enabled.
+	 */
+	if (guest_cpuid_has_sgx_launch_control(vcpu) &&
+			cpu_sgx_lepubkeyhash_writable())
+		vmx_sgx_lepubkeyhash_load(vcpu);
 }
 
 static void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
@@ -3225,6 +3300,19 @@  static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_FEATURE_CONTROL:
 		msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control;
 		break;
+	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
+		/*
+		 * SDM 35.1 Model-Specific Registers, table 35-2.
+		 * Read permitted if CPUID.0x12.0:EAX[0] = 1. (We have
+		 * guaranteed this will be true if guest_cpuid_has_sgx
+		 * is true.)
+		 */
+		if (!guest_cpuid_has_sgx(vcpu))
+			return 1;
+		msr_info->data =
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_info->index -
+			MSR_IA32_SGXLEPUBKEYHASH0];
+		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!nested_vmx_allowed(vcpu))
 			return 1;
@@ -3344,6 +3432,37 @@  static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * SGX has been enabled in BIOS before using SGX.
 		 */
 		break;
+	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
+		/*
+		 * SDM 35.1 Model-Specific Registers, table 35-2.
+		 * - If CPUID.0x7.0:ECX[30] = 1, FEATURE_CONTROL[17] is
+		 * available.
+		 * - Write permitted if CPUID.0x12.0:EAX[0] = 1 &&
+		 * FEATURE_CONTROL[17] = 1 && FEATURE_CONTROL[0] = 1.
+		 */
+		if (!guest_cpuid_has_sgx(vcpu) ||
+				!guest_cpuid_has_sgx_launch_control(vcpu))
+			return 1;
+		/*
+		 * Don't let userspace set guest's IA32_SGXLEPUBKEYHASHn,
+		 * if machine's IA32_SGXLEPUBKEYHASHn cannot be changed at
+		 * runtime. Note to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash are
+		 * set to default in vmx_create_vcpu therefore guest is able
+		 * to get the machine's IA32_SGXLEPUBKEYHASHn by rdmsr in
+		 * guest.
+		 */
+		if (!cpu_sgx_lepubkeyhash_writable())
+			return 1;
+		/*
+		 * If guest's FEATURE_CONTROL[17] is not set, guest's
+		 * IA32_SGXLEPUBKEYHASHn are not writeable from guest.
+		 */
+		if (vmx_sgx_lc_disabled_in_bios(vcpu) &&
+				!msr_info->host_initiated)
+			return 1;
+		to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_index -
+			MSR_IA32_SGXLEPUBKEYHASH0] = data;
+		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!msr_info->host_initiated)
 			return 1; /* they are read-only */
@@ -9305,6 +9424,10 @@  static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 		vmx->nested.vpid02 = allocate_vpid();
 	}
 
+	/* Set vcpu's default IA32_SGXLEPUBKEYHASHn */
+	if (enable_sgx && boot_cpu_has(X86_FEATURE_SGX_LAUNCH_CONTROL))
+		vmx_sgx_init_lepubkeyhash(&vmx->vcpu);
+
 	vmx->nested.posted_intr_nv = -1;
 	vmx->nested.current_vmptr = -1ull;
 	vmx->nested.current_vmcs12 = NULL;