Message ID | 20170508052434.3627-9-kai.huang@linux.intel.com (mailing list archive)
---|---
State | New, archived
Hi Paolo/Radim,

I'd like to start a discussion regarding IA32_SGXLEPUBKEYHASHn handling here. I have also copied the SGX driver mailing list (which it looks like I should have done when sending out this series, sorry) and Sean, Haim and Haitao from Intel so we can have a better discussion.

Basically, IA32_SGXLEPUBKEYHASHn (or, more generally, SGX Launch Control) allows us to run different Launch Enclaves (LEs) signed with different RSA keys. Only when the value of IA32_SGXLEPUBKEYHASHn matches the key used to sign the LE can the LE be initialized (specifically, by EINIT) successfully. So before calling EINIT for an LE, we have to make sure IA32_SGXLEPUBKEYHASHn contains the matching value. One relevant fact is that only EINIT uses IA32_SGXLEPUBKEYHASHn; after EINIT, other ENCLS/ENCLU leaf functions (e.g., EGETKEY) run correctly even if the MSRs are changed to other values.

To support KVM guests running their own LEs, KVM traps IA32_SGXLEPUBKEYHASHn MSR writes and keeps the values in the vcpu internally, and KVM needs to write the cached values to the real MSRs before the guest runs EINIT. The problem is that the host side also runs LEs, probably multiple LEs (it seems the SGX driver currently plans to run a single in-kernel LE, but I am not familiar with the details, and IMO we should not assume the host will only run one LE). Therefore, if KVM changes the physical MSRs for a guest, the host may not be able to run its LE, as it may not rewrite the right values back to the MSRs.

There are two approaches to make the host and KVM guests work together:

1. Anyone who wants to run an LE is responsible for writing the correct value to IA32_SGXLEPUBKEYHASHn. My current patch is based on this assumption. For a KVM guest, naturally, we write the cached values to the real MSRs when the vcpu is scheduled in. For the host, the SGX driver should write its own value to the MSRs when it performs EINIT for an LE.
One argument against this approach is that a KVM guest should never have an impact on the host side, meaning the host should not need to be aware of such MSR changes. In that case, if the host does some performance optimization and does not update the MSRs proactively, the physical MSRs may contain incorrect values when the host runs EINIT. Instead, KVM should be responsible for restoring the original MSRs, which brings us to approach 2 below.

2. KVM restores the MSRs after changing them for the guest. The simplest way for KVM to do this is: 1) save the original physical MSRs and write the guest's values before VMENTRY; 2) on VMEXIT, write the original values back to the physical MSRs. To me this approach is also arguable, as a KVM guest is actually just a normal process (OK, maybe not that normal), and a KVM guest should be treated the same as other processes that run LEs, which means approach 1 is also reasonable. And approach 2 has more performance impact than approach 1 for KVM, as it reads/writes IA32_SGXLEPUBKEYHASHn during each VMEXIT/VMENTRY, while approach 1 only writes the MSRs when the vcpu is scheduled in, which is less frequent.

I'd like to hear all your comments, and hopefully we can reach some agreement on this.

Another thing, not quite related to selecting an approach above: whichever of approach 1 or approach 2 we choose, KVM still suffers the performance loss of writing (and/or reading) the IA32_SGXLEPUBKEYHASHn MSRs, either when the vcpu is scheduled in or during each VMEXIT/VMENTRY. Given that IA32_SGXLEPUBKEYHASHn is only used by EINIT, we can actually optimize by trapping EINIT from the guest and only updating the MSRs on EINIT VMEXIT. This works for approach 1, but for approach 2 we would have to do something tricky during VMEXIT/VMENTRY to check whether the MSRs have been changed by an EINIT VMEXIT, and only restore the original values if an EINIT VMEXIT has happened. The guest's LE continues to run even after the physical MSRs are changed back to their original values.
But trapping ENCLS requires either: 1) KVM running ENCLS on behalf of the guest, in which case we have to reconstruct and remap the guest's ENCLS parameters and skip the ENCLS for the guest; or 2) using MTF to let the guest run ENCLS again, while still trapping ENCLS. Either case would introduce more complicated, and potentially more buggy, code, and I don't think we should do this just to save the time of a few MSR writes. If we need to turn on ENCLS VMEXIT anyway, we can optimize this.

Thank you in advance.

Thanks,
-Kai

On 5/8/2017 5:24 PM, Kai Huang wrote:
> If SGX runtime launch control is enabled on the host (IA32_FEATURE_CONTROL[17]
> is set), KVM can support running multiple guests, each running an LE signed
> with a different RSA pubkey. KVM traps IA32_SGXLEPUBKEYHASHn MSR writes from
> the guest and keeps the values in the vcpu internally, and when the vcpu is
> scheduled in, KVM writes those values to the real IA32_SGXLEPUBKEYHASHn MSRs.
>
> Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
> ---
>  arch/x86/include/asm/msr-index.h |   5 ++
>  arch/x86/kvm/vmx.c               | 123 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 128 insertions(+)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e3770f570bb9..70482b951b0f 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -417,6 +417,11 @@
>  #define MSR_IA32_TSC_ADJUST             0x0000003b
>  #define MSR_IA32_BNDCFGS                0x00000d90
>
> +#define MSR_IA32_SGXLEPUBKEYHASH0       0x0000008c
> +#define MSR_IA32_SGXLEPUBKEYHASH1       0x0000008d
> +#define MSR_IA32_SGXLEPUBKEYHASH2       0x0000008e
> +#define MSR_IA32_SGXLEPUBKEYHASH3       0x0000008f
> +
>  #define MSR_IA32_XSS                    0x00000da0
>
>  #define FEATURE_CONTROL_LOCKED          (1<<0)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a16539594a99..c96332b9dd44 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -656,6 +656,9 @@ struct vcpu_vmx {
>          */
>         u64 msr_ia32_feature_control;
>         u64 msr_ia32_feature_control_valid_bits;
> +
> +       /* SGX Launch Control public key hash */
> +       u64 msr_ia32_sgxlepubkeyhash[4];
>  };
>
>  enum segment_cache_field {
> @@ -2244,6 +2247,70 @@ static void decache_tsc_multiplier(struct vcpu_vmx *vmx)
>         vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
>  }
>
> +static bool cpu_sgx_lepubkeyhash_writable(void)
> +{
> +       u64 val, sgx_lc_enabled_mask = (FEATURE_CONTROL_LOCKED |
> +                       FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE);
> +
> +       rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
> +
> +       return ((val & sgx_lc_enabled_mask) == sgx_lc_enabled_mask);
> +}
> +
> +static bool vmx_sgx_lc_disabled_in_bios(struct kvm_vcpu *vcpu)
> +{
> +       return (to_vmx(vcpu)->msr_ia32_feature_control & FEATURE_CONTROL_LOCKED)
> +               && (!(to_vmx(vcpu)->msr_ia32_feature_control &
> +                       FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE));
> +}
> +
> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH0 0xa6053e051270b7ac
> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH1 0x6cfbe8ba8b3b413d
> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH2 0xc4916d99f2b3735d
> +#define SGX_INTEL_DEFAULT_LEPUBKEYHASH3 0xd4f8c05909f9bb3b
> +
> +static void vmx_sgx_init_lepubkeyhash(struct kvm_vcpu *vcpu)
> +{
> +       u64 h0, h1, h2, h3;
> +
> +       /*
> +        * If runtime launch control is enabled (IA32_SGXLEPUBKEYHASHn is
> +        * writable), we set guest's default value to be Intel's default
> +        * hash (which is a fixed value and can be hard-coded). Otherwise,
> +        * guest can only use machine's IA32_SGXLEPUBKEYHASHn, so set
> +        * guest's default to that.
> +        */
> +       if (cpu_sgx_lepubkeyhash_writable()) {
> +               h0 = SGX_INTEL_DEFAULT_LEPUBKEYHASH0;
> +               h1 = SGX_INTEL_DEFAULT_LEPUBKEYHASH1;
> +               h2 = SGX_INTEL_DEFAULT_LEPUBKEYHASH2;
> +               h3 = SGX_INTEL_DEFAULT_LEPUBKEYHASH3;
> +       } else {
> +               rdmsrl(MSR_IA32_SGXLEPUBKEYHASH0, h0);
> +               rdmsrl(MSR_IA32_SGXLEPUBKEYHASH1, h1);
> +               rdmsrl(MSR_IA32_SGXLEPUBKEYHASH2, h2);
> +               rdmsrl(MSR_IA32_SGXLEPUBKEYHASH3, h3);
> +       }
> +
> +       to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0] = h0;
> +       to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1] = h1;
> +       to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2] = h2;
> +       to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3] = h3;
> +}
> +
> +static void vmx_sgx_lepubkeyhash_load(struct kvm_vcpu *vcpu)
> +{
> +       wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0,
> +               to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0]);
> +       wrmsrl(MSR_IA32_SGXLEPUBKEYHASH1,
> +               to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1]);
> +       wrmsrl(MSR_IA32_SGXLEPUBKEYHASH2,
> +               to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2]);
> +       wrmsrl(MSR_IA32_SGXLEPUBKEYHASH3,
> +               to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3]);
> +}
> +
>  /*
>   * Switches to specified vcpu, until a matching vcpu_put(), but assumes
>   * vcpu mutex is already taken.
> @@ -2316,6 +2383,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>
>         vmx_vcpu_pi_load(vcpu, cpu);
>         vmx->host_pkru = read_pkru();
> +
> +       /*
> +        * Load guest's SGX LE pubkey hash if runtime launch control is
> +        * enabled.
> +        */
> +       if (guest_cpuid_has_sgx_launch_control(vcpu) &&
> +           cpu_sgx_lepubkeyhash_writable())
> +               vmx_sgx_lepubkeyhash_load(vcpu);
>  }
>
>  static void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
> @@ -3225,6 +3300,19 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>         case MSR_IA32_FEATURE_CONTROL:
>                 msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control;
>                 break;
> +       case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
> +               /*
> +                * SDM 35.1 Model-Specific Registers, table 35-2.
> +                * Read permitted if CPUID.0x12.0:EAX[0] = 1. (We have
> +                * guaranteed this will be true if guest_cpuid_has_sgx
> +                * is true.)
> +                */
> +               if (!guest_cpuid_has_sgx(vcpu))
> +                       return 1;
> +               msr_info->data =
> +                       to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_info->index -
> +                               MSR_IA32_SGXLEPUBKEYHASH0];
> +               break;
>         case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
>                 if (!nested_vmx_allowed(vcpu))
>                         return 1;
> @@ -3344,6 +3432,37 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                  * SGX has been enabled in BIOS before using SGX.
>                  */
>                 break;
> +       case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
> +               /*
> +                * SDM 35.1 Model-Specific Registers, table 35-2.
> +                * - If CPUID.0x7.0:ECX[30] = 1, FEATURE_CONTROL[17] is
> +                *   available.
> +                * - Write permitted if CPUID.0x12.0:EAX[0] = 1 &&
> +                *   FEATURE_CONTROL[17] = 1 && FEATURE_CONTROL[0] = 1.
> +                */
> +               if (!guest_cpuid_has_sgx(vcpu) ||
> +                   !guest_cpuid_has_sgx_launch_control(vcpu))
> +                       return 1;
> +               /*
> +                * Don't let userspace set guest's IA32_SGXLEPUBKEYHASHn
> +                * if machine's IA32_SGXLEPUBKEYHASHn cannot be changed at
> +                * runtime. Note to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash are
> +                * set to the defaults in vmx_create_vcpu, therefore guest
> +                * is able to get the machine's IA32_SGXLEPUBKEYHASHn by
> +                * rdmsr in guest.
> +                */
> +               if (!cpu_sgx_lepubkeyhash_writable())
> +                       return 1;
> +               /*
> +                * If guest's FEATURE_CONTROL[17] is not set, guest's
> +                * IA32_SGXLEPUBKEYHASHn are not writable from guest.
> +                */
> +               if (vmx_sgx_lc_disabled_in_bios(vcpu) &&
> +                   !msr_info->host_initiated)
> +                       return 1;
> +               to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_index -
> +                       MSR_IA32_SGXLEPUBKEYHASH0] = data;
> +               break;
>         case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
>                 if (!msr_info->host_initiated)
>                         return 1; /* they are read-only */
> @@ -9305,6 +9424,10 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>                 vmx->nested.vpid02 = allocate_vpid();
>         }
>
> +       /* Set vcpu's default IA32_SGXLEPUBKEYHASHn */
> +       if (enable_sgx && boot_cpu_has(X86_FEATURE_SGX_LAUNCH_CONTROL))
> +               vmx_sgx_init_lepubkeyhash(&vmx->vcpu);
> +
>         vmx->nested.posted_intr_nv = -1;
>         vmx->nested.current_vmptr = -1ull;
>         vmx->nested.current_vmcs12 = NULL;
[resending due to some kind of kernel.org glitch -- sorry if anyone gets duplicates]

On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> My current patch is based on this assumption. For KVM guest, naturally, we
> will write the cached value to real MSRs when vcpu is scheduled in. For
> host, SGX driver should write its own value to MSRs when it performs EINIT
> for LE.

This seems unnecessarily slow (perhaps *extremely* slow) to me. I
would propose a totally different solution:

Have a percpu variable that stores the current SGXLEPUBKEYHASH along
with whatever lock is needed (probably just a mutex). Users of EINIT
will take the mutex, compare the percpu variable to the desired value,
and, if it's different, do WRMSR and update the percpu variable.

KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
state but *not* changing the MSRs. KVM will trap and emulate EINIT to
support the same handling as the host. There is no action required at
all on KVM guest entry and exit.

FWIW, I think that KVM will, in the long run, want to trap EINIT for
other reasons: someone is going to want to implement policy for what
enclaves are allowed that applies to guests as well as the host.
Also, some day Intel may fix its architectural design flaw [1] by
allowing EINIT to personalize the enclave's keying, and, if it's done
by a new argument to EINIT instead of an MSR, KVM will have to trap
EINIT to handle it.

> One argument against this approach is KVM guest should never have impact on
> host side, meaning host should not be aware of such MSR change

As a somewhat generic comment, I don't like this approach to KVM
development. KVM mucks with lots of important architectural control
registers, and, in all too many cases, it tries to do so independently
of the other arch/x86 code. This ends up causing all kinds of grief.

Can't KVM and the real x86 arch code cooperate for real? The host and
the KVM code are in arch/x86 in the same source tree.

> 2. KVM should restore MSRs after changing for guest.

No, this is IMO silly. Don't restore it on each exit, in the user
return hook, or anywhere else. Just make sure the host knows it was
changed.

> Another thing is, not quite related to selecting which approach above, and
> either we choose approach 1 or approach 2, KVM still suffers the performance
> loss of writing (and/or reading) to IA32_SGXLEPUBKEYHASHn MSRs, either when
> vcpu scheduled in or during each VMEXIT/VMENTRY. Given the fact that the
> IA32_SGXLEPUBKEYHASHn will only be used by EINIT, we can actually do some
> optimization by trapping EINIT from guest and only update MSRs in EINIT
> VMEXIT.

Yep.

> But trapping ENCLS requires either 1) KVM to run ENCLS on behalf of guest,
> in which case we have to reconstruct and remap guest's ENCLS parameters and
> skip the ENCLS for guest; 2) using MTF to let guest to run ENCLS again,
> while still trapping ENCLS.

I would advocate for the former approach. (But you can't remap the
parameters due to TOCTOU issues, locking, etc. Just copy them. I
don't see why this is any more complicated than emulating any other
instruction that accesses memory.)

If necessary for some reason, trap EINIT when the SGXLEPUBKEYHASH is
wrong and then clear the exit flag once the MSRs are in sync. You'll
need to be careful to avoid races in which the host's value leaks into
the guest. I think you'll find that this is more complicated, less
flexible, and less performant than just handling ENCLS[EINIT] directly
in the host.

[1] Guests that steal sealed data from each other or from the host can
manipulate that data without compromising the hypervisor by simply
loading the same enclave that its rightful owner would use. If you're
trying to use SGX to protect your crypto credentials so that, if
stolen, they can't be used outside the guest, I would consider this to
be a major flaw. It breaks the security model in a multi-tenant cloud
situation. I've complained about it before.
Hi Andy,

Thanks for your helpful comments! See my reply below, and some questions as well.

On 5/12/2017 3:28 PM, Andy Lutomirski wrote:
> [resending due to some kind of kernel.org glitch -- sorry if anyone
> gets duplicates]
>
> On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>> My current patch is based on this assumption. For KVM guest, naturally, we
>> will write the cached value to real MSRs when vcpu is scheduled in. For
>> host, SGX driver should write its own value to MSRs when it performs EINIT
>> for LE.
>
> This seems unnecessarily slow (perhaps *extremely* slow) to me. I
> would propose a totally different solution:

I am not sure whether the cost of writing to 4 MSRs would be *extremely* slow, as when a vcpu is scheduled in, KVM is already doing vmcs_load, writing to several MSRs, etc.

> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> with whatever lock is needed (probably just a mutex). Users of EINIT
> will take the mutex, compare the percpu variable to the desired value,
> and, if it's different, do WRMSR and update the percpu variable.
>
> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
> state but *not* changing the MSRs. KVM will trap and emulate EINIT to
> support the same handling as the host. There is no action required at
> all on KVM guest entry and exit.

This is doable, but the SGX driver needs to do those things and expose interfaces for KVM to use. In terms of the percpu data, it is nice to have, but I am not sure whether it is mandatory, as IMO EINIT is not even in a performance-critical path. We can simply read the old values out of the MSRs and compare whether the old equals the new.

> FWIW, I think that KVM will, in the long run, want to trap EINIT for
> other reasons: someone is going to want to implement policy for what
> enclaves are allowed that applies to guests as well as the host.

I am not very convinced why "what enclaves are allowed" on the host would apply to the guest. Can you elaborate? I mean, in general, virtualization just focuses on emulating hardware behavior. If a native machine is able to run any LE, the virtual machine should be able to as well (of course, with the guest's IA32_FEATURE_CONTROL[bit 17] set).

> Also, some day Intel may fix its architectural design flaw [1] by
> allowing EINIT to personalize the enclave's keying, and, if it's done
> by a new argument to EINIT instead of an MSR, KVM will have to trap
> EINIT to handle it.

This flaw looks like a different issue from the one above (host enclave policy applying to the guest)?

>> One argument against this approach is KVM guest should never have impact on
>> host side, meaning host should not be aware of such MSR change
>
> As a somewhat generic comment, I don't like this approach to KVM
> development. KVM mucks with lots of important architectural control
> registers, and, in all too many cases, it tries to do so independently
> of the other arch/x86 code. This ends up causing all kinds of grief.
>
> Can't KVM and the real x86 arch code cooperate for real? The host and
> the KVM code are in arch/x86 in the same source tree.

Currently the host SGX driver, which is pretty much self-contained, implements all the SGX-related stuff.

>> 2. KVM should restore MSRs after changing for guest.
>
> No, this is IMO silly. Don't restore it on each exit, in the user
> return hook, or anywhere else. Just make sure the host knows it was
> changed.
>
>> Another thing is, not quite related to selecting which approach above, and
>> either we choose approach 1 or approach 2, KVM still suffers the performance
>> loss of writing (and/or reading) to IA32_SGXLEPUBKEYHASHn MSRs, either when
>> vcpu scheduled in or during each VMEXIT/VMENTRY. Given the fact that the
>> IA32_SGXLEPUBKEYHASHn will only be used by EINIT, we can actually do some
>> optimization by trapping EINIT from guest and only update MSRs in EINIT
>> VMEXIT.
>
> Yep.
>
>> But trapping ENCLS requires either 1) KVM to run ENCLS on behalf of guest,
>> in which case we have to reconstruct and remap guest's ENCLS parameters and
>> skip the ENCLS for guest; 2) using MTF to let guest to run ENCLS again,
>> while still trapping ENCLS.
>
> I would advocate for the former approach. (But you can't remap the
> parameters due to TOCTOU issues, locking, etc. Just copy them. I
> don't see why this is any more complicated than emulating any other
> instruction that accesses memory.)

No, we cannot just copy. Because all addresses in the guest's ENCLS parameters are guest virtual addresses, we cannot use them to execute ENCLS in KVM. If any guest virtual address is used in the ENCLS parameters, for example PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc., we have to remap them to KVM's virtual address space.

Btw, what is the TOCTOU issue? And would you also elaborate on the locking issue?

> If necessary for some reason, trap EINIT when the SGXLEPUBKEYHASH is
> wrong and then clear the exit flag once the MSRs are in sync. You'll
> need to be careful to avoid races in which the host's value leaks into
> the guest. I think you'll find that this is more complicated, less
> flexible, and less performant than just handling ENCLS[EINIT] directly
> in the host.

Sorry, I don't quite follow this part. Why would the host's value leak into the guest? I suppose the *value* means the host's IA32_SGXLEPUBKEYHASHn? The guest's MSR reads/writes are always trapped and emulated by KVM.

> [1] Guests that steal sealed data from each other or from the host can
> manipulate that data without compromising the hypervisor by simply
> loading the same enclave that its rightful owner would use. If you're
> trying to use SGX to protect your crypto credentials so that, if
> stolen, they can't be used outside the guest, I would consider this to
> be a major flaw. It breaks the security model in a multi-tenant cloud
> situation. I've complained about it before.

It looks like potentially only the guest's IA32_SGXLEPUBKEYHASHn may be leaked? In that case, even if it is leaked, it seems we cannot dig anything out beyond the hash value?

Thanks,
-Kai
On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> I am not sure whether the cost of writing to 4 MSRs would be *extremely*
> slow, as when vcpu is scheduled in, KVM is already doing vmcs_load, writing
> to several MSRs, etc.

I'm speculating that these MSRs may be rather unoptimized and hence
unusually slow.

>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
>> with whatever lock is needed (probably just a mutex). Users of EINIT
>> will take the mutex, compare the percpu variable to the desired value,
>> and, if it's different, do WRMSR and update the percpu variable.
>>
>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
>> state but *not* changing the MSRs. KVM will trap and emulate EINIT to
>> support the same handling as the host. There is no action required at
>> all on KVM guest entry and exit.
>
> This is doable, but SGX driver needs to do those things and expose
> interfaces for KVM to use. In terms of the percpu data, it is nice to have,
> but I am not sure whether it is mandatory, as IMO EINIT is not even in
> performance critical path. We can simply read old value from MSRs out and
> compare whether the old equals to the new.

I think the SGX driver should probably live in arch/x86, and the
interface could be a simple percpu variable that is exported (from the
main kernel image, not from a module).

>> FWIW, I think that KVM will, in the long run, want to trap EINIT for
>> other reasons: someone is going to want to implement policy for what
>> enclaves are allowed that applies to guests as well as the host.
>
> I am not very convinced why "what enclaves are allowed" in host would apply
> to guest. Can you elaborate? I mean in general virtualization just focus
> emulating hardware behavior. If a native machine is able to run any LE, the
> virtual machine should be able to as well (of course, with guest's
> IA32_FEATURE_CONTROL[bit 17] set).

I strongly disagree. I can imagine two classes of sensible policies
for launch control:

1. Allow everything. This seems quite sensible to me.

2. Allow some things, and make sure that VMs have at least as
restrictive a policy as host root has. After all, what's the point of
restricting enclaves in the host if host code can simply spawn a
little VM to run otherwise-disallowed enclaves?

>> Also, some day Intel may fix its architectural design flaw [1] by
>> allowing EINIT to personalize the enclave's keying, and, if it's done
>> by a new argument to EINIT instead of an MSR, KVM will have to trap
>> EINIT to handle it.
>
> Looks this flaw is not the same issue as above (host enclave policy applies
> to guest)?

It's related. Without this flaw, it might make sense to apply a looser
policy in the guest than in the host. With this flaw, I think your
policy fails to have any real effect if you don't enforce it on
guests.

>>> One argument against this approach is KVM guest should never have impact
>>> on host side, meaning host should not be aware of such MSR change
>>
>> As a somewhat generic comment, I don't like this approach to KVM
>> development. KVM mucks with lots of important architectural control
>> registers, and, in all too many cases, it tries to do so independently
>> of the other arch/x86 code. This ends up causing all kinds of grief.
>>
>> Can't KVM and the real x86 arch code cooperate for real? The host and
>> the KVM code are in arch/x86 in the same source tree.
>
> Currently on host SGX driver, which is pretty much self-contained,
> implements all SGX related stuff.

I will probably NAK this if it comes my way for inclusion upstream.
Just because it can be self-contained doesn't mean it should be
self-contained.

>> I would advocate for the former approach. (But you can't remap the
>> parameters due to TOCTOU issues, locking, etc. Just copy them. I
>> don't see why this is any more complicated than emulating any other
>> instruction that accesses memory.)
>
> No you cannot just copy. Because all address in guest's ENCLS parameters are
> guest's virtual address, we cannot use them to execute ENCLS in KVM. If any
> guest virtual addresses is used in ENCLS parameters, for example,
> PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's
> virtual address.
>
> Btw, what is TOCTOU issue? would you also elaborate locking issue?

I was partially mis-remembering how this worked. It looks like
SIGSTRUCT and EINITTOKEN could be copied, but SECS would have to be
mapped. If KVM applied some policy to the launchable enclaves, it
would want to make sure that it only looks at fields that are copied,
to make sure that the enclave that gets launched is the one it
verified. The locking issue I'm imagining is that the SECS (or
whatever else might be mapped) doesn't disappear and get reused for
something else while it's mapped in the host. Presumably KVM has an
existing mechanism for this, but maybe SECS is special because it's
not quite normal memory IIRC.

>> If necessary for some reason, trap EINIT when the SGXLEPUBKEYHASH is
>> wrong and then clear the exit flag once the MSRs are in sync. You'll
>> need to be careful to avoid races in which the host's value leaks into
>> the guest. I think you'll find that this is more complicated, less
>> flexible, and less performant than just handling ENCLS[EINIT] directly
>> in the host.
>
> Sorry I don't quite follow this part. Why would host's value leaks into
> guest? I suppose the *value* means host's IA32_SGXLEPUBKEYHASHn? guest's MSR
> read/write is always trapped and emulated by KVM.

You'd need to make sure that this sequence of events doesn't happen:

- Guest does EINIT and it exits.
- Host updates the MSRs and the ENCLS-exiting bitmap.
- Guest is preempted before it retries EINIT.
- A different host thread launches an enclave, thus changing the MSRs.
- Guest resumes and runs EINIT without exiting, with the wrong MSR values.

>> [1] Guests that steal sealed data from each other or from the host can
>> manipulate that data without compromising the hypervisor by simply
>> loading the same enclave that its rightful owner would use. If you're
>> trying to use SGX to protect your crypto credentials so that, if
>> stolen, they can't be used outside the guest, I would consider this to
>> be a major flaw. It breaks the security model in a multi-tenant cloud
>> situation. I've complained about it before.
>
> Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In this
> case even it is leaked looks we cannot dig anything out just the hash value?

Not sure what you mean. Are you asking about the lack of guest
personalization?

Concretely, imagine I write an enclave that seals my TLS client
certificate's private key and offers an API to sign TLS certificate
requests with it. This way, if my system is compromised, an attacker
can use the certificate only so long as they have access to my
machine. If I kick them out or if they merely get the ability to read
the sealed data but not to execute code, the private key should still
be safe. But, if this system is a VM guest, the attacker could run
the exact same enclave on another guest on the same physical CPU and
sign using my key. Whoops!
Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> > I am not sure whether the cost of writing to 4 MSRs would be *extremely*
> > slow, as when vcpu is scheduled in, KVM is already doing vmcs_load, writing
> > to several MSRs, etc.
>
> I'm speculating that these MSRs may be rather unoptimized and hence
> unusually slow.

Good speculation :) We've been told to expect that writing the hash MSRs
will be at least 2.5x slower than normal MSRs.

> >> Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> >> with whatever lock is needed (probably just a mutex). Users of EINIT
> >> will take the mutex, compare the percpu variable to the desired value,
> >> and, if it's different, do WRMSR and update the percpu variable.
> >>
> >> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
> >> state but *not* changing the MSRs. KVM will trap and emulate EINIT to
> >> support the same handling as the host. There is no action required at
> >> all on KVM guest entry and exit.
> >
> > This is doable, but SGX driver needs to do those things and expose
> > interfaces for KVM to use. In terms of the percpu data, it is nice to have,
> > but I am not sure whether it is mandatory, as IMO EINIT is not even in
> > performance critical path. We can simply read old value from MSRs out and
> > compare whether the old equals to the new.
>
> I think the SGX driver should probably live in arch/x86, and the
> interface could be a simple percpu variable that is exported (from the
> main kernel image, not from a module).

Agreed, this would make life easier for future SGX code that can't be
self-contained in the driver, e.g. an EPC cgroup. Future architectural
enhancements might also require tighter integration with the kernel.

> >> I would advocate for the former approach. (But you can't remap the
> >> parameters due to TOCTOU issues, locking, etc. Just copy them. I
> >> don't see why this is any more complicated than emulating any other
> >> instruction that accesses memory.)
> >
> > No you cannot just copy. Because all address in guest's ENCLS parameters are
> > guest's virtual address, we cannot use them to execute ENCLS in KVM. If any
> > guest virtual addresses is used in ENCLS parameters, for example,
> > PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's
> > virtual address.
> >
> > Btw, what is TOCTOU issue? would you also elaborate locking issue?
>
> I was partially mis-remembering how this worked. It looks like
> SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be
> mapped. If KVM applied some policy to the launchable enclaves, it
> would want to make sure that it only looks at fields that are copied
> to make sure that the enclave that gets launched is the one it
> verified. The locking issue I'm imagining is that the SECS (or
> whatever else might be mapped) doesn't disappear and get reused for
> something else while it's mapped in the host. Presumably KVM has an
> existing mechanism for this, but maybe SECS is special because it's
> not quite normal memory IIRC.

Mapping the SECS in the host should not be an issue, AFAIK there aren't
any restrictions on the VA passed to EINIT as long as it resolves to a
SECS page in the EPCM, e.g. the SGX driver maps the SECS for EINIT with
an arbitrary VA.

I don't think emulating EINIT introduces any TOCTOU race conditions that
wouldn't already exist. Evicting the SECS or modifying the page tables
on a different thread while executing EINIT is either a guest kernel bug
or bizarre behavior that the guest can already handle. Similarly, KVM
would need special handling for evicting a guest's SECS, regardless of
EINIT emulation.

> >> [1] Guests that steal sealed data from each other or from the host can
> >> manipulate that data without compromising the hypervisor by simply
> >> loading the same enclave that its rightful owner would use. If you're
> >> trying to use SGX to protect your crypto credentials so that, if
> >> stolen, they can't be used outside the guest, I would consider this to
> >> be a major flaw. It breaks the security model in a multi-tenant cloud
> >> situation. I've complained about it before.
> >
> > Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In this
> > case even it is leaked looks we cannot dig anything out just the hash value?
>
> Not sure what you mean. Are you asking about the lack of guest
> personalization?
>
> Concretely, imagine I write an enclave that seals my TLS client
> certificate's private key and offers an API to sign TLS certificate
> requests with it. This way, if my system is compromised, an attacker
> can use the certificate only so long as they have access to my
> machine. If I kick them out or if they merely get the ability to read
> the sealed data but not to execute code, the private key should still
> be safe. But, if this system is a VM guest, the attacker could run
> the exact same enclave on another guest on the same physical CPU and
> sign using my key. Whoops!

I know this issue has been raised internally as well, but I don't know
the status of the situation. I'll follow up and provide any information
I can.
Christopherson, Sean J <sean.j.christopherson@intel.com> wrote: > Andy Lutomirski <luto@kernel.org> wrote: > > On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> > > >> [1] Guests that steal sealed data from each other or from the host can > > >> manipulate that data without compromising the hypervisor by simply > > >> loading the same enclave that its rightful owner would use. If you're > > >> trying to use SGX to protect your crypto credentials so that, if > > >> stolen, they can't be used outside the guest, I would consider this to > > >> be a major flaw. It breaks the security model in a multi-tenant cloud > > >> situation. I've complained about it before. > > >> > > > > > > Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In > > > this case even it is leaked looks we cannot dig anything out just the > > > hash value? > > > > Not sure what you mean. Are you asking about the lack of guest > > personalization? > > > > Concretely, imagine I write an enclave that seals my TLS client > > certificate's private key and offers an API to sign TLS certificate > > requests with it. This way, if my system is compromised, an attacker > > can use the certificate only so long as they have access to my > > machine. If I kick them out or if they merely get the ability to read > > the sealed data but not to execute code, the private key should still > > be safe. But, if this system is a VM guest, the attacker could run > > the exact same enclave on another guest on the same physical CPU and > > sign using my key. Whoops! > > I know this issue has been raised internally as well, but I don't know > the status of the situation. I'll follow up and provide any information > I can. So, the key players are well aware of the value added by per-VM keys, but, ultimately, shipping this feature is dependent on having strong requests from customers.
On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote: > [resending due to some kind of kernel.org glitch -- sorry if anyone > gets duplicates] > > On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: > > My current patch is based on this assumption. For KVM guest, naturally, we > > will write the cached value to real MSRs when vcpu is scheduled in. For > > host, SGX driver should write its own value to MSRs when it performs EINIT > > for LE. > > This seems unnecessarily slow (perhaps *extremely* slow) to me. I > would propose a totally different solution: > > Have a percpu variable that stores the current SGXLEPUBKEYHASH along > with whatever lock is needed (probably just a mutex). Users of EINIT > will take the mutex, compare the percpu variable to the desired value, > and, if it's different, do WRMSR and update the percpu variable. This is exactly what I've been suggesting internally: trap EINIT and check the value and write conditionally. I think this would be the best starting point. /Jarkko
On 5/16/2017 12:46 AM, Jarkko Sakkinen wrote: > On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote: >> [resending due to some kind of kernel.org glitch -- sorry if anyone >> gets duplicates] >> >> On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: >>> My current patch is based on this assumption. For KVM guest, naturally, we >>> will write the cached value to real MSRs when vcpu is scheduled in. For >>> host, SGX driver should write its own value to MSRs when it performs EINIT >>> for LE. >> >> This seems unnecessarily slow (perhaps *extremely* slow) to me. I >> would propose a totally different solution: >> >> Have a percpu variable that stores the current SGXLEPUBKEYHASH along >> with whatever lock is needed (probably just a mutex). Users of EINIT >> will take the mutex, compare the percpu variable to the desired value, >> and, if it's different, do WRMSR and update the percpu variable. > > This is exactly what I've been suggesting internally: trap EINIT and > check the value and write conditionally. > > I think this would be the best starting point. OK. Assuming we are going to have this percpu variable for IA32_SGXLEPUBKEYHASHn, I suppose KVM will also update the guest's value to this percpu variable after KVM writes the guest's value to the hardware MSRs? And the host (SGX driver) needs to do the same thing (check the value and write conditionally), correct? Thanks, -Kai > > /Jarkko >
On 5/12/2017 6:11 PM, Andy Lutomirski wrote: > On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: >> I am not sure whether the cost of writing to 4 MSRs would be *extremely* >> slow, as when vcpu is schedule in, KVM is already doing vmcs_load, writing >> to several MSRs, etc. > > I'm speculating that these MSRs may be rather unoptimized and hence > unusualy slow. > >> >>> >>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along >>> with whatever lock is needed (probably just a mutex). Users of EINIT >>> will take the mutex, compare the percpu variable to the desired value, >>> and, if it's different, do WRMSR and update the percpu variable. >>> >>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory >>> state but *not* changing the MSRs. KVM will trap and emulate EINIT to >>> support the same handling as the host. There is no action required at >>> all on KVM guest entry and exit. >> >> >> This is doable, but SGX driver needs to do those things and expose >> interfaces for KVM to use. In terms of the percpu data, it is nice to have, >> but I am not sure whether it is mandatory, as IMO EINIT is not even in >> performance critical path. We can simply read old value from MSRs out and >> compare whether the old equals to the new. > > I think the SGX driver should probably live in arch/x86, and the > interface could be a simple percpu variable that is exported (from the > main kernel image, not from a module). > >> >>> >>> FWIW, I think that KVM will, in the long run, want to trap EINIT for >>> other reasons: someone is going to want to implement policy for what >>> enclaves are allowed that applies to guests as well as the host. >> >> >> I am not very convinced why "what enclaves are allowed" in host would apply >> to guest. Can you elaborate? I mean in general virtualization just focus >> emulating hardware behavior. 
If a native machine is able to run any LE, the >> virtual machine should be able to as well (of course, with guest's >> IA32_FEATURE_CONTROL[bit 17] set). > > I strongly disagree. I can imagine two classes of sensible policies > for launch control: > > 1. Allow everything. This seems quite sensible to me. > > 2. Allow some things, and make sure that VMs have at least as > restrictive a policy as host root has. After all, what's the point of > restricting enclaves in the host if host code can simply spawn a > little VM to run otherwise-disallowed enclaves? What's the current SGX driver launch control policy? Yes, allowing everything works for KVM, so let's skip that case. Are we going to support allowing several LEs, or just a single LE? I know Jarkko is doing in-kernel LE work but I don't know the details. I am trying to find a way that both preserves the host's launch control policy and stays consistent with HW behavior (from the guest's view). Currently we can create a KVM guest with runtime changes to IA32_SGXLEPUBKEYHASHn either enabled or disabled. I introduced a Qemu parameter 'lewr' for this purpose. Specifically, I introduced the Qemu SGX parameters below for creating a guest: -sgx epc=<size>,lehash='SHA-256 hash',lewr where 'epc' specifies the guest's EPC size, 'lehash' specifies the (initial) value of the guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether the guest is allowed to change its IA32_SGXLEPUBKEYHASHn at runtime. If the host only allows a single LE to run, KVM can add a restriction that only allows creating KVM guests with runtime changes to IA32_SGXLEPUBKEYHASHn disabled, so that only the host's allowed (single) hash can be used by the guest. From the guest's view, it simply has IA32_FEATURE_CONTROL[bit 17] cleared and IA32_SGXLEPUBKEYHASHn defaulting to the host's allowed (single) hash. 
If the host allows several LEs (but not everything), and if we create a guest with 'lewr', then the behavior is not consistent with HW behavior: from the guest's point of view, the hardware can actually run any LE, but we have to tell the guest that it is only allowed to change IA32_SGXLEPUBKEYHASHn to some specific values. One compromise is to disallow creating a guest with 'lewr' specified and, at the same time, only allow creating guests whose 'lehash' is a host-approved hash. This makes the guest's behavior consistent with HW behavior but only allows the guest to run one LE (the one specified by 'lehash' when the guest is created). I'd like to hear comments from you guys. Paolo, do you also have comments here from KVM's side? Thanks, -Kai > >> >>> Also, some day Intel may fix its architectural design flaw [1] by >>> allowing EINIT to personalize the enclave's keying, and, if it's done >>> by a new argument to EINIT instead of an MSR, KVM will have to trap >>> EINIT to handle it. >> >> >> Looks this flaw is not the same issue as above (host enclave policy applies >> to guest)? > > It's related. Without this flaw, it might make sense to apply looser > policy in the guest as in the host. With this flaw, I think your > policy fails to have any real effect if you don't enforce it on > guests. > >> >>> >>>> >>>> One argument against this approach is KVM guest should never have impact >>>> on >>>> host side, meaning host should not be aware of such MSR change >>> >>> >>> As a somewhat generic comment, I don't like this approach to KVM >>> development. KVM mucks with lots of important architectural control >>> registers, and, in all too many cases, it tries to do so independently >>> of the other arch/x86 code. This ends up causing all kinds of grief. >>> >>> Can't KVM and the real x86 arch code cooperate for real? The host and >>> the KVM code are in arch/x86 in the same source tree. 
>> >> >> Currently on host SGX driver, which is pretty much self-contained, >> implements all SGX related staff. > > I will probably NAK this if it comes my way for inclusion upstream. > Just because it can be self-contained doesn't mean it should be > self-contained. > >>> >>> I would advocate for the former approach. (But you can't remap the >>> parameters due to TOCTOU issues, locking, etc. Just copy them. I >>> don't see why this is any more complicated than emulating any other >>> instruction that accesses memory.) >> >> >> No you cannot just copy. Because all address in guest's ENCLS parameters are >> guest's virtual address, we cannot use them to execute ENCLS in KVM. If any >> guest virtual addresses is used in ENCLS parameters, for example, >> PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's >> virtual address. >> >> Btw, what is TOCTOU issue? would you also elaborate locking issue? > > I was partially mis-remembering how this worked. It looks like > SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be > mapped. If KVM applied some policy to the launchable enclaves, it > would want to make sure that it only looks at fields that are copied > to make sure that the enclave that gets launched is the one it > verified. The locking issue I'm imagining is that the SECS (or > whatever else might be mapped) doesn't disappear and get reused for > something else while it's mapped in the host. Presumably KVM has an > existing mechanism for this, but maybe SECS is special because it's > not quite normal memory IIRC. > >> >>> >>> If necessary for some reason, trap EINIT when the SGXLEPUBKEYKASH is >>> wrong and then clear the exit flag once the MSRs are in sync. You'll >>> need to be careful to avoid races in which the host's value leaks into >>> the guest. I think you'll find that this is more complicated, less >>> flexible, and less performant than just handling ENCLS[EINIT] directly >>> in the host. 
>> >> >> Sorry I don't quite follow this part. Why would host's value leaks into >> guest? I suppose the *value* means host's IA32_SGXLEPUBKEYHASHn? guest's MSR >> read/write is always trapped and emulated by KVM. > > You'd need to make sure that this sequence of events doesn't happen: > > - Guest does EINIT and it exits. > - Host updates the MSRs and the ENCLS-exiting bitmap. > - Guest is preempted before it retries EINIT. > - A different host thread launches an enclave, thus changing the MSRs. > - Guest resumes and runs EINIT without exiting with the wrong MSR values. > >> >>> >>> [1] Guests that steal sealed data from each other or from the host can >>> manipulate that data without compromising the hypervisor by simply >>> loading the same enclave that its rightful owner would use. If you're >>> trying to use SGX to protect your crypto credentials so that, if >>> stolen, they can't be used outside the guest, I would consider this to >>> be a major flaw. It breaks the security model in a multi-tenant cloud >>> situation. I've complained about it before. >>> >> >> Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In this >> case even it is leaked looks we cannot dig anything out just the hash value? > > Not sure what you mean. Are you asking about the lack of guest personalization? > > Concretely, imagine I write an enclave that seals my TLS client > certificate's private key and offers an API to sign TLS certificate > requests with it. This way, if my system is compromised, an attacker > can use the certificate only so long as they have access to my > machine. If I kick them out or if they merely get the ability to read > the sealed data but not to execute code, the private key should still > be safe. But, if this system is a VM guest, the attacker could run > the exact same enclave on another guest on the same physical CPU and > sign using my key. Whoops! >
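The QEMU options Kai describes earlier in this thread would be used along these lines. This is a hypothetical invocation: the `-sgx` option and its `epc`/`lehash`/`lewr` sub-parameters are from Kai's proposal, not from any shipped QEMU, and the hash value and disk image are placeholders.

```
# Guest with 64M of EPC, an initial LE signer hash, and runtime writes
# to IA32_SGXLEPUBKEYHASHn allowed ('lewr' present):
qemu-system-x86_64 -enable-kvm -m 2G \
    -sgx epc=64M,lehash='<SHA-256 hash>',lewr \
    -drive file=guest.img,format=raw

# Locked-down variant: no 'lewr', so the guest sees
# IA32_FEATURE_CONTROL[bit 17] clear and can only run the LE
# matching 'lehash':
qemu-system-x86_64 -enable-kvm -m 2G \
    -sgx epc=64M,lehash='<SHA-256 hash>' \
    -drive file=guest.img,format=raw
```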
On 5/13/2017 6:48 AM, Christopherson, Sean J wrote: > Andy Lutomirski <luto@kernel.org> wrote: >> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: >>> I am not sure whether the cost of writing to 4 MSRs would be *extremely* >>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load, writing >>> to several MSRs, etc. >> >> I'm speculating that these MSRs may be rather unoptimized and hence >> unusualy slow. >> > > Good speculation :) We've been told to expect that writing the hash MSRs > will be at least 2.5x slower than normal MSRs. > >>> >>>> >>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along >>>> with whatever lock is needed (probably just a mutex). Users of EINIT >>>> will take the mutex, compare the percpu variable to the desired value, >>>> and, if it's different, do WRMSR and update the percpu variable. >>>> >>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory >>>> state but *not* changing the MSRs. KVM will trap and emulate EINIT to >>>> support the same handling as the host. There is no action required at >>>> all on KVM guest entry and exit. >>> >>> >>> This is doable, but SGX driver needs to do those things and expose >>> interfaces for KVM to use. In terms of the percpu data, it is nice to have, >>> but I am not sure whether it is mandatory, as IMO EINIT is not even in >>> performance critical path. We can simply read old value from MSRs out and >>> compare whether the old equals to the new. >> >> I think the SGX driver should probably live in arch/x86, and the >> interface could be a simple percpu variable that is exported (from the >> main kernel image, not from a module). >> > > Agreed, this would make life easier for future SGX code that can't be > self-contained in the driver, e.g. EPC cgroup. Future architectural > enhancements might also require tighter integration with the kernel. > > >>>> >>>> I would advocate for the former approach. 
(But you can't remap the >>>> parameters due to TOCTOU issues, locking, etc. Just copy them. I >>>> don't see why this is any more complicated than emulating any other >>>> instruction that accesses memory.) >>> >>> >>> No you cannot just copy. Because all address in guest's ENCLS parameters are >>> guest's virtual address, we cannot use them to execute ENCLS in KVM. If any >>> guest virtual addresses is used in ENCLS parameters, for example, >>> PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's >>> virtual address. >>> >>> Btw, what is TOCTOU issue? would you also elaborate locking issue? >> >> I was partially mis-remembering how this worked. It looks like >> SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be >> mapped. If KVM applied some policy to the launchable enclaves, it >> would want to make sure that it only looks at fields that are copied >> to make sure that the enclave that gets launched is the one it >> verified. The locking issue I'm imagining is that the SECS (or >> whatever else might be mapped) doesn't disappear and get reused for >> something else while it's mapped in the host. Presumably KVM has an >> existing mechanism for this, but maybe SECS is special because it's >> not quite normal memory IIRC. I am thinking we might not need to check value in SIGSTRUCT or EINITTOKEN, as KVM needs to emulate guest IA32_SGXLEPUBKEYHASHn write. If we decide we should apply host's policy to KVM guest, it seems we can do the check when trapping guest write to IA32_SGXLEPUBKEYHASHn, and only host approved values (may be a white-list or something?) are allowed to be written in the guest. Thanks, -Kai >> > > Mapping the SECS in the host should not be an issue, AFAIK there aren't > any restrictions on the VA passed to EINIT as long as it resolves to a > SECS page in the EPCM, e.g. the SGX driver maps the SECS for EINIT with > an arbitrary VA. 
> > I don't think emulating EINIT introduces any TOCTOU race conditions that > wouldn't already exist. Evicting the SECS or modifying the page tables > on a different thread while executing EINIT is either a guest kernel bug > or bizarre behavior that the guest can already handle. Similarly, KVM > would need special handling for evicting a guest's SECS, regardless of > EINIT emulation. Agreed. > >>>> [1] Guests that steal sealed data from each other or from the host can >>>> manipulate that data without compromising the hypervisor by simply >>>> loading the same enclave that its rightful owner would use. If you're >>>> trying to use SGX to protect your crypto credentials so that, if >>>> stolen, they can't be used outside the guest, I would consider this to >>>> be a major flaw. It breaks the security model in a multi-tenant cloud >>>> situation. I've complained about it before. >>>> >>> >>> Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In this >>> case even it is leaked looks we cannot dig anything out just the hash value? >> >> Not sure what you mean. Are you asking about the lack of guest >> personalization? >> >> Concretely, imagine I write an enclave that seals my TLS client >> certificate's private key and offers an API to sign TLS certificate >> requests with it. This way, if my system is compromised, an attacker >> can use the certificate only so long as they have access to my >> machine. If I kick them out or if they merely get the ability to read >> the sealed data but not to execute code, the private key should still >> be safe. But, if this system is a VM guest, the attacker could run >> the exact same enclave on another guest on the same physical CPU and >> sign using my key. Whoops! > > I know this issue has been raised internally as well, but I don't know > the status of the situation. I'll follow up and provide any information > I can. >
On 5/13/2017 6:48 AM, Christopherson, Sean J wrote: > Andy Lutomirski <luto@kernel.org> wrote: >> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: >>> I am not sure whether the cost of writing to 4 MSRs would be *extremely* >>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load, writing >>> to several MSRs, etc. >> >> I'm speculating that these MSRs may be rather unoptimized and hence >> unusualy slow. >> > > Good speculation :) We've been told to expect that writing the hash MSRs > will be at least 2.5x slower than normal MSRs. > >>> >>>> >>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along >>>> with whatever lock is needed (probably just a mutex). Users of EINIT >>>> will take the mutex, compare the percpu variable to the desired value, >>>> and, if it's different, do WRMSR and update the percpu variable. >>>> >>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory >>>> state but *not* changing the MSRs. KVM will trap and emulate EINIT to >>>> support the same handling as the host. There is no action required at >>>> all on KVM guest entry and exit. >>> >>> >>> This is doable, but SGX driver needs to do those things and expose >>> interfaces for KVM to use. In terms of the percpu data, it is nice to have, >>> but I am not sure whether it is mandatory, as IMO EINIT is not even in >>> performance critical path. We can simply read old value from MSRs out and >>> compare whether the old equals to the new. >> >> I think the SGX driver should probably live in arch/x86, and the >> interface could be a simple percpu variable that is exported (from the >> main kernel image, not from a module). >> > > Agreed, this would make life easier for future SGX code that can't be > self-contained in the driver, e.g. EPC cgroup. Future architectural > enhancements might also require tighter integration with the kernel. I think this is better as well. In this way we can leverage SGX code more easily. 
Some SGX detection code can be done in arch/x86 as well, so that other code can quickly access, e.g., SGX capabilities. Another thing: the SDM actually says SGX CPUID is per-thread, and we should not assume SGX CPUID will report the same info on all processors. I think it's better to check this as well. Moving SGX detection to identify_cpu would make this easier. Thanks, -Kai > > >>>> >>>> I would advocate for the former approach. (But you can't remap the >>>> parameters due to TOCTOU issues, locking, etc. Just copy them. I >>>> don't see why this is any more complicated than emulating any other >>>> instruction that accesses memory.) >>> >>> >>> No you cannot just copy. Because all address in guest's ENCLS parameters are >>> guest's virtual address, we cannot use them to execute ENCLS in KVM. If any >>> guest virtual addresses is used in ENCLS parameters, for example, >>> PAGEINFO.SECS, PAGEINFO.SECINFO/PCMD, etc, you have to remap them to KVM's >>> virtual address. >>> >>> Btw, what is TOCTOU issue? would you also elaborate locking issue? >> >> I was partially mis-remembering how this worked. It looks like >> SIGSTRUCT and EINITTOKEN could be copied but SECS would have to be >> mapped. If KVM applied some policy to the launchable enclaves, it >> would want to make sure that it only looks at fields that are copied >> to make sure that the enclave that gets launched is the one it >> verified. The locking issue I'm imagining is that the SECS (or >> whatever else might be mapped) doesn't disappear and get reused for >> something else while it's mapped in the host. Presumably KVM has an >> existing mechanism for this, but maybe SECS is special because it's >> not quite normal memory IIRC. >> > > Mapping the SECS in the host should not be an issue, AFAIK there aren't > any restrictions on the VA passed to EINIT as long as it resolves to a > SECS page in the EPCM, e.g. the SGX driver maps the SECS for EINIT with > an arbitrary VA. 
> > I don't think emulating EINIT introduces any TOCTOU race conditions that > wouldn't already exist. Evicting the SECS or modifying the page tables > on a different thread while executing EINIT is either a guest kernel bug > or bizarre behavior that the guest can already handle. Similarly, KVM > would need special handling for evicting a guest's SECS, regardless of > EINIT emulation. > >>>> [1] Guests that steal sealed data from each other or from the host can >>>> manipulate that data without compromising the hypervisor by simply >>>> loading the same enclave that its rightful owner would use. If you're >>>> trying to use SGX to protect your crypto credentials so that, if >>>> stolen, they can't be used outside the guest, I would consider this to >>>> be a major flaw. It breaks the security model in a multi-tenant cloud >>>> situation. I've complained about it before. >>>> >>> >>> Looks potentially only guest's IA32_SGXLEPUBKEYHASHn may be leaked? In this >>> case even it is leaked looks we cannot dig anything out just the hash value? >> >> Not sure what you mean. Are you asking about the lack of guest >> personalization? >> >> Concretely, imagine I write an enclave that seals my TLS client >> certificate's private key and offers an API to sign TLS certificate >> requests with it. This way, if my system is compromised, an attacker >> can use the certificate only so long as they have access to my >> machine. If I kick them out or if they merely get the ability to read >> the sealed data but not to execute code, the private key should still >> be safe. But, if this system is a VM guest, the attacker could run >> the exact same enclave on another guest on the same physical CPU and >> sign using my key. Whoops! > > I know this issue has been raised internally as well, but I don't know > the status of the situation. I'll follow up and provide any information > I can. >
On 16/05/2017 02:48, Huang, Kai wrote: > > > If host only allows one single LE to run, KVM can add a restrict that > only allows to create KVM guest with runtime change to > IA32_SGXLEPUBKEYHASHn disabled, so that only host allowed (single) hash > can be used by guest. From guest's view, it simply has > IA32_FEATURE_CONTROL[bit17] cleared and has IA32_SGXLEPUBKEYHASHn with > default value to be host allowed (single) hash. > > If host allows several LEs (not but everything), and if we create guest > with 'lewr', then the behavior is not consistent with HW behavior, as > from guest's hardware's point of view, we can actually run any LE but we > have to tell guest that you are only allowed to change > IA32_SGXLEPUBKEYHASHn to some specific values. One compromise solution > is we don't allow to create guest with 'lewr' specified, and at the > meantime, only allow to create guest with host approved hashes specified > in 'lehash'. This will make guest's behavior consistent to HW behavior > but only allows guest to run one LE (which is specified by 'lehash' when > guest is created). > > I'd like to hear comments from you guys. > > Paolo, do you also have comments here from KVM's side? I would start with read-only LE hash (same as the host), which is a valid configuration anyway. Then later we can trap EINIT to emulate IA32_SGXLEPUBKEYHASHn. Paolo
On 16/05/2017 01:56, Huang, Kai wrote: >>> >>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along >>> with whatever lock is needed (probably just a mutex). Users of EINIT >>> will take the mutex, compare the percpu variable to the desired value, >>> and, if it's different, do WRMSR and update the percpu variable. >> >> This is exactly what I've been suggesting internally: trap EINIT and >> check the value and write conditionally. >> >> I think this would be the best starting point. > > OK. Assuming we are going to have this percpu variable for > IA32_SGXLEPUBKEYHASHn, I suppose KVM also will update guest's value to > this percpu variable after KVM writes guest's value to hardware MSR? And > host (SGX driver) need to do the same thing (check the value and write > conditionally), correct? The percpu variable is just an optimization. If EINIT is not performance critical, you could even do the WRMSR unconditionally; what matters is having a mutex that covers both WRMSR and EINIT. Thanks, Paolo
On Mon, May 15, 2017 at 5:48 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: > > > On 5/12/2017 6:11 PM, Andy Lutomirski wrote: >> >> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> >> wrote: >>> >>> I am not sure whether the cost of writing to 4 MSRs would be *extremely* >>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load, >>> writing >>> to several MSRs, etc. >> >> >> I'm speculating that these MSRs may be rather unoptimized and hence >> unusualy slow. >> >>> >>>> >>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along >>>> with whatever lock is needed (probably just a mutex). Users of EINIT >>>> will take the mutex, compare the percpu variable to the desired value, >>>> and, if it's different, do WRMSR and update the percpu variable. >>>> >>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory >>>> state but *not* changing the MSRs. KVM will trap and emulate EINIT to >>>> support the same handling as the host. There is no action required at >>>> all on KVM guest entry and exit. >>> >>> >>> >>> This is doable, but SGX driver needs to do those things and expose >>> interfaces for KVM to use. In terms of the percpu data, it is nice to >>> have, >>> but I am not sure whether it is mandatory, as IMO EINIT is not even in >>> performance critical path. We can simply read old value from MSRs out and >>> compare whether the old equals to the new. >> >> >> I think the SGX driver should probably live in arch/x86, and the >> interface could be a simple percpu variable that is exported (from the >> main kernel image, not from a module). >> >>> >>>> >>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for >>>> other reasons: someone is going to want to implement policy for what >>>> enclaves are allowed that applies to guests as well as the host. >>> >>> >>> >>> I am not very convinced why "what enclaves are allowed" in host would >>> apply >>> to guest. Can you elaborate? 
I mean in general virtualization just focus >>> emulating hardware behavior. If a native machine is able to run any LE, >>> the >>> virtual machine should be able to as well (of course, with guest's >>> IA32_FEATURE_CONTROL[bit 17] set). >> >> >> I strongly disagree. I can imagine two classes of sensible policies >> for launch control: >> >> 1. Allow everything. This seems quite sensible to me. >> >> 2. Allow some things, and make sure that VMs have at least as >> restrictive a policy as host root has. After all, what's the point of >> restricting enclaves in the host if host code can simply spawn a >> little VM to run otherwise-disallowed enclaves? > > > What's the current SGX driver launch control policy? Yes allow everything > works for KVM so lets skip this. Are we going to support allowing several > LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE > staff but I don't know details. > > I am trying to find a way that we can both not break host launch control > policy, and be consistent to HW behavior (from guest's view). Currently we > can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn either > enabled or disabled. I introduced an Qemu parameter 'lewr' for this purpose. > Actually I introduced below Qemu SGX parameters for creating guest: > > -sgx epc=<size>,lehash='SHA-256 hash',lewr > > where 'epc' specifies guest's EPC size, lehash specifies (initial) value of > guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is allowed > to change guest's IA32_SGXLEPUBKEYHASHn at runtime. > > If host only allows one single LE to run, KVM can add a restrict that only > allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn > disabled, so that only host allowed (single) hash can be used by guest. From > guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has > IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single) hash. 
> > If host allows several LEs (not but everything), and if we create guest with > 'lewr', then the behavior is not consistent with HW behavior, as from > guest's hardware's point of view, we can actually run any LE but we have to > tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn to some > specific values. One compromise solution is we don't allow to create guest > with 'lewr' specified, and at the meantime, only allow to create guest with > host approved hashes specified in 'lehash'. This will make guest's behavior > consistent to HW behavior but only allows guest to run one LE (which is > specified by 'lehash' when guest is created). I'm not sure I entirely agree for a couple reasons. 1. I wouldn't be surprised if the kernel ends up implementing a policy in which it checks all enclaves (not just LEs) for acceptability. In fact, if the kernel sticks with the "no LE at all or just kernel-internal LE", then checking enclaves directly against some admin- or distro-provided signer list seems reasonable. This type of policy can't be forwarded to a guest by restricting allowed LE signers. But this is mostly speculation since AFAIK no one has seriously proposed any particular policy support and the plan was to not have this for the initial implementation. 2. While matching hardware behavior is nice in principle, there doesn't seem to be useful hardware behavior to match here. If the host had a list of five allowed LE signers, how exactly would it restrict the MSRs? They're not written atomically, so you can't directly tell what's being written. Also, the only way to fail an MSR write is to send #GP, and Windows (and even Linux) may not expect that. Linux doesn't panic due to #GP on MSR writes these days, but you still get a big fat warning. I wouldn't be at all surprised if Windows BSODs. ENCLS[EINIT], on the other hand, returns an actual error code. 
I'm not sure that a sensible error code exists ("SGX_HYPERVISOR_SAID_NO?", perhaps), but SGX_INVALID_EINITTOKEN seems to mean, more or less, "the CPU thinks you're not authorized to do this", so forcing that error code could be entirely reasonable. If the host policy is to allow a list of LE signers, you could return SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in the list. --Andy
On Tue, 2017-05-16 at 11:56 +1200, Huang, Kai wrote: > > On 5/16/2017 12:46 AM, Jarkko Sakkinen wrote: > > > > On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote: > > > > > > [resending due to some kind of kernel.org glitch -- sorry if anyone > > > gets duplicates] > > > > > > On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> > > > wrote: > > > > > > > > My current patch is based on this assumption. For KVM guest, naturally, > > > > we > > > > will write the cached value to real MSRs when vcpu is scheduled in. For > > > > host, SGX driver should write its own value to MSRs when it performs > > > > EINIT > > > > for LE. > > > This seems unnecessarily slow (perhaps *extremely* slow) to me. I > > > would propose a totally different solution: > > > > > > Have a percpu variable that stores the current SGXLEPUBKEYHASH along > > > with whatever lock is needed (probably just a mutex). Users of EINIT > > > will take the mutex, compare the percpu variable to the desired value, > > > and, if it's different, do WRMSR and update the percpu variable. > > This is exactly what I've been suggesting internally: trap EINIT and > > check the value and write conditionally. > > > > I think this would be the best starting point. > OK. Assuming we are going to have this percpu variable for > IA32_SGXLEPUBKEYHASHn, I suppose KVM also will update guest's value to > this percpu variable after KVM writes guest's value to hardware MSR? And > host (SGX driver) need to do the same thing (check the value and write > conditionally), correct? > > Thanks, > -Kai Yes, the percpu variable is simply a cache so that the kernel doesn't have to do four RDMSRs every time it wants to do EINIT. KVM would still maintain shadow copies of the MSRs for each vcpu for emulating RDMSR, WRMSR and EINIT. I don't think KVM would even need to be aware of the percpu variable, i.e. 
the entire lock->(rd/wr)msr->EINIT->unlock sequence can probably be encapsulated in a single function that is called from both the primary SGX driver and from KVM.
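The percpu-cache idea discussed above can be sketched in C. This is a minimal user-space simulation, not driver code: the function names, the per-CPU array, and the stubbed WRMSR counter are all hypothetical, and a real implementation would use the kernel's percpu and MSR APIs plus the mutex Andy mentioned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS 4

/* Simulated per-CPU cache of IA32_SGXLEPUBKEYHASH0..3 (hypothetical layout). */
static uint64_t lepubkeyhash_cache[NR_CPUS][4];
static int wrmsr_count;   /* counts simulated WRMSRs, for illustration only */

/* Stand-in for the four WRMSRs; a real implementation would write the
 * actual IA32_SGXLEPUBKEYHASHn MSRs here. */
static void write_lepubkeyhash_msrs(const uint64_t hash[4])
{
    wrmsr_count += 4;
}

/* Core idea: before EINIT, compare the desired hash against the per-CPU
 * cached value and only do the (slow) WRMSRs when it differs.  A mutex
 * (elided here) would serialize EINIT users. */
void sgx_update_lepubkeyhash(int cpu, const uint64_t hash[4])
{
    if (memcmp(lepubkeyhash_cache[cpu], hash, sizeof(uint64_t) * 4) != 0) {
        write_lepubkeyhash_msrs(hash);
        memcpy(lepubkeyhash_cache[cpu], hash, sizeof(uint64_t) * 4);
    }
    /* else: the MSRs already hold the right value, skip the WRMSRs */
}
```

Both the SGX driver's EINIT path and KVM's trapped-EINIT path would call something like this just before executing ENCLS[EINIT], which is what makes the single shared function Sean describes possible.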
On 5/17/2017 12:09 PM, Andy Lutomirski wrote: > On Mon, May 15, 2017 at 5:48 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: >> >> >> On 5/12/2017 6:11 PM, Andy Lutomirski wrote: >>> >>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> >>> wrote: >>>> >>>> I am not sure whether the cost of writing to 4 MSRs would be *extremely* >>>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load, >>>> writing >>>> to several MSRs, etc. >>> >>> >>> I'm speculating that these MSRs may be rather unoptimized and hence >>> unusualy slow. >>> >>>> >>>>> >>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along >>>>> with whatever lock is needed (probably just a mutex). Users of EINIT >>>>> will take the mutex, compare the percpu variable to the desired value, >>>>> and, if it's different, do WRMSR and update the percpu variable. >>>>> >>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory >>>>> state but *not* changing the MSRs. KVM will trap and emulate EINIT to >>>>> support the same handling as the host. There is no action required at >>>>> all on KVM guest entry and exit. >>>> >>>> >>>> >>>> This is doable, but SGX driver needs to do those things and expose >>>> interfaces for KVM to use. In terms of the percpu data, it is nice to >>>> have, >>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in >>>> performance critical path. We can simply read old value from MSRs out and >>>> compare whether the old equals to the new. >>> >>> >>> I think the SGX driver should probably live in arch/x86, and the >>> interface could be a simple percpu variable that is exported (from the >>> main kernel image, not from a module). >>> >>>> >>>>> >>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for >>>>> other reasons: someone is going to want to implement policy for what >>>>> enclaves are allowed that applies to guests as well as the host. 
>>>> >>>> >>>> >>>> I am not very convinced why "what enclaves are allowed" in host would >>>> apply >>>> to guest. Can you elaborate? I mean in general virtualization just focus >>>> emulating hardware behavior. If a native machine is able to run any LE, >>>> the >>>> virtual machine should be able to as well (of course, with guest's >>>> IA32_FEATURE_CONTROL[bit 17] set). >>> >>> >>> I strongly disagree. I can imagine two classes of sensible policies >>> for launch control: >>> >>> 1. Allow everything. This seems quite sensible to me. >>> >>> 2. Allow some things, and make sure that VMs have at least as >>> restrictive a policy as host root has. After all, what's the point of >>> restricting enclaves in the host if host code can simply spawn a >>> little VM to run otherwise-disallowed enclaves? >> >> >> What's the current SGX driver launch control policy? Yes allow everything >> works for KVM so lets skip this. Are we going to support allowing several >> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE >> staff but I don't know details. >> >> I am trying to find a way that we can both not break host launch control >> policy, and be consistent to HW behavior (from guest's view). Currently we >> can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn either >> enabled or disabled. I introduced an Qemu parameter 'lewr' for this purpose. >> Actually I introduced below Qemu SGX parameters for creating guest: >> >> -sgx epc=<size>,lehash='SHA-256 hash',lewr >> >> where 'epc' specifies guest's EPC size, lehash specifies (initial) value of >> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is allowed >> to change guest's IA32_SGXLEPUBKEYHASHn at runtime. >> >> If host only allows one single LE to run, KVM can add a restrict that only >> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn >> disabled, so that only host allowed (single) hash can be used by guest. 
From >> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has >> IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single) hash. >> >> If host allows several LEs (not but everything), and if we create guest with >> 'lewr', then the behavior is not consistent with HW behavior, as from >> guest's hardware's point of view, we can actually run any LE but we have to >> tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn to some >> specific values. One compromise solution is we don't allow to create guest >> with 'lewr' specified, and at the meantime, only allow to create guest with >> host approved hashes specified in 'lehash'. This will make guest's behavior >> consistent to HW behavior but only allows guest to run one LE (which is >> specified by 'lehash' when guest is created). > > I'm not sure I entirely agree for a couple reasons. > > 1. I wouldn't be surprised if the kernel ends up implementing a policy > in which it checks all enclaves (not just LEs) for acceptability. In > fact, if the kernel sticks with the "no LE at all or just > kernel-internal LE", then checking enclaves directly against some > admin- or distro-provided signer list seems reasonable. This type of > policy can't be forwarded to a guest by restricting allowed LE > signers. But this is mostly speculation since AFAIK no one has > seriously proposed any particular policy support and the plan was to > not have this for the initial implementation. > > 2. While matching hardware behavior is nice in principle, there > doesn't seem to be useful hardware behavior to match here. If the > host had a list of five allowed LE signers, how exactly would it > restrict the MSRs? They're not written atomically, so you can't > directly tell what's being written. In this case I actually plan to just allow creating guest with guest's IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is specified, creating guest will fail. 
And we only allow creating a guest with host-allowed hash values (via 'lehash=hash-value'); if the 'hash-value' specified by 'lehash' is not allowed by the host, we also fail to create the guest. We can only allow creating a guest with 'lewr' specified when the host allows everything. But in this way, we are restricting the guest OS's ability to run LEs, as only one LE, the one specified by the 'lehash' parameter, can be run. But I think this won't hurt much, as multiple guests are still able to run different LEs?

> Also, the only way to fail an MSR write is to send #GP, and Windows
> (and even Linux) may not expect that. Linux doesn't panic due to #GP
> on MSR writes these days, but you still get a big fat warning. I
> wouldn't be at all surprised if Windows BSODs.

We cannot allow writing some particular values to the MSRs successfully while injecting #GP when writing other values to the same MSRs. So #GP is not an option.

> ENCLS[EINIT], on the other hand, returns an actual error code. I'm not
> sure that a sensible error code exists ("SGX_HYPERVISOR_SAID_NO?",
> perhaps),

It looks like no such error code exists. And we cannot return such an error code to the guest, as it would only be valid when ENCLS is run in the hypervisor.

> but SGX_INVALID_EINITTOKEN seems to mean, more or less, "the CPU
> thinks you're not authorized to do this", so forcing that error code
> could be entirely reasonable.
>
> If the host policy is to allow a list of LE signers, you could return
> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
> the list.

But this would be inconsistent with HW behavior. If the hash value in guest's IA32_SGXLEPUBKEYHASHn matches the one passed to EINIT, EINIT is not supposed to return SGX_INVALID_EINITTOKEN. I think from the VMM's perspective, emulating HW behavior consistently with real HW is very important.

Paolo, would you provide your comments?

> --Andy
On 5/17/2017 2:21 AM, Paolo Bonzini wrote: > > > On 16/05/2017 02:48, Huang, Kai wrote: >> >> >> If host only allows one single LE to run, KVM can add a restrict that >> only allows to create KVM guest with runtime change to >> IA32_SGXLEPUBKEYHASHn disabled, so that only host allowed (single) hash >> can be used by guest. From guest's view, it simply has >> IA32_FEATURE_CONTROL[bit17] cleared and has IA32_SGXLEPUBKEYHASHn with >> default value to be host allowed (single) hash. >> >> If host allows several LEs (not but everything), and if we create guest >> with 'lewr', then the behavior is not consistent with HW behavior, as >> from guest's hardware's point of view, we can actually run any LE but we >> have to tell guest that you are only allowed to change >> IA32_SGXLEPUBKEYHASHn to some specific values. One compromise solution >> is we don't allow to create guest with 'lewr' specified, and at the >> meantime, only allow to create guest with host approved hashes specified >> in 'lehash'. This will make guest's behavior consistent to HW behavior >> but only allows guest to run one LE (which is specified by 'lehash' when >> guest is created). >> >> I'd like to hear comments from you guys. >> >> Paolo, do you also have comments here from KVM's side? > > I would start with read-only LE hash (same as the host), which is a > valid configuration anyway. Then later we can trap EINIT to emulate > IA32_SGXLEPUBKEYHASHn. You mean we can start with creating guest without Qemu 'lewr' parameter support, and always disallowing guest to change IA32_SGXLEPUBKEYHASHn? Even in this way, KVM still needs to emulate IA32_SGXLEPUBKEYHASHn (just allow MSR reading but not writing), and write guest's value to physical MSRs when running guest (trapping EINIT and write MSRs during EINIT is really just performance optimization). Because host can run multiple LEs and change MSRs. 
Your suggestion only works when runtime changes to IA32_SGXLEPUBKEYHASHn are disabled on the host (meaning the physical machine).

Thanks,
-Kai

> Paolo
On 5/18/2017 2:21 AM, Sean Christopherson wrote: > On Tue, 2017-05-16 at 11:56 +1200, Huang, Kai wrote: >> >> On 5/16/2017 12:46 AM, Jarkko Sakkinen wrote: >>> >>> On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote: >>>> >>>> [resending due to some kind of kernel.org glitch -- sorry if anyone >>>> gets duplicates] >>>> >>>> On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> >>>> wrote: >>>>> >>>>> My current patch is based on this assumption. For KVM guest, naturally, >>>>> we >>>>> will write the cached value to real MSRs when vcpu is scheduled in. For >>>>> host, SGX driver should write its own value to MSRs when it performs >>>>> EINIT >>>>> for LE. >>>> This seems unnecessarily slow (perhaps *extremely* slow) to me. I >>>> would propose a totally different solution: >>>> >>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along >>>> with whatever lock is needed (probably just a mutex). Users of EINIT >>>> will take the mutex, compare the percpu variable to the desired value, >>>> and, if it's different, do WRMSR and update the percpu variable. >>> This is exactly what I've been suggesting internally: trap EINIT and >>> check the value and write conditionally. >>> >>> I think this would be the best starting point. >> OK. Assuming we are going to have this percpu variable for >> IA32_SGXLEPUBKEYHASHn, I suppose KVM also will update guest's value to >> this percpu variable after KVM writes guest's value to hardware MSR? And >> host (SGX driver) need to do the same thing (check the value and write >> conditionally), correct? >> >> Thanks, >> -Kai > > Yes, the percpu variable is simply a cache so that the kernel doesn't have to do > four RDMSRs every time it wants to do EINIT. KVM would still maintain shadow > copies of the MSRs for each vcpu for emulating RDMSR, WRMSR and EINIT. I don't > think KVM would even need to be aware of the percpu variable, i.e. 
> the entire lock->(rd/wr)msr->EINIT->unlock sequence can probably be
> encapsulated in a single function that is called from both the primary
> SGX driver and from KVM.

You are making the assumption that KVM will run ENCLS on behalf of the guest. :)

If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I actually prefer using MTF, as with MTF we don't have to do all the remapping of guest virtual addresses to KVM virtual addresses, since we don't need to look into the guest's ENCLS parameters. But if we do need to look into the guest's ENCLS parameters, for example, to locate the physical SECS page, or to update physical EPC page info (that KVM needs to maintain), maybe we can choose to run ENCLS on behalf of the guest.

But if we are going to run ENCLS on behalf of the guest, I think providing a single function which does the MSR writes and EINIT for KVM would be a good idea.

Thanks,
-Kai
On 18/05/2017 09:54, Huang, Kai wrote: >> >> I would start with read-only LE hash (same as the host), which is a >> valid configuration anyway. Then later we can trap EINIT to emulate >> IA32_SGXLEPUBKEYHASHn. > > You mean we can start with creating guest without Qemu 'lewr' parameter > support, and always disallowing guest to change IA32_SGXLEPUBKEYHASHn? > Even in this way, KVM still needs to emulate IA32_SGXLEPUBKEYHASHn (just > allow MSR reading but not writing), and write guest's value to physical > MSRs when running guest (trapping EINIT and write MSRs during EINIT is > really just performance optimization). Because host can run multiple LEs > and change MSRs. Oh, I didn't know this. So I guess there isn't much benefit in skipping the trapping of EINIT. Paolo > Your suggestion only works when runtime change to > IA32_SGXLEPUBKEYHASHn is disabled on host (meaning physical machine).
On Tue, May 16, 2017 at 11:56:38AM +1200, Huang, Kai wrote: > > > On 5/16/2017 12:46 AM, Jarkko Sakkinen wrote: > > On Thu, May 11, 2017 at 08:28:37PM -0700, Andy Lutomirski wrote: > > > [resending due to some kind of kernel.org glitch -- sorry if anyone > > > gets duplicates] > > > > > > On Thu, May 11, 2017 at 5:32 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: > > > > My current patch is based on this assumption. For KVM guest, naturally, we > > > > will write the cached value to real MSRs when vcpu is scheduled in. For > > > > host, SGX driver should write its own value to MSRs when it performs EINIT > > > > for LE. > > > > > > This seems unnecessarily slow (perhaps *extremely* slow) to me. I > > > would propose a totally different solution: > > > > > > Have a percpu variable that stores the current SGXLEPUBKEYHASH along > > > with whatever lock is needed (probably just a mutex). Users of EINIT > > > will take the mutex, compare the percpu variable to the desired value, > > > and, if it's different, do WRMSR and update the percpu variable. > > > > This is exactly what I've been suggesting internally: trap EINIT and > > check the value and write conditionally. > > > > I think this would be the best starting point. > > OK. Assuming we are going to have this percpu variable for > IA32_SGXLEPUBKEYHASHn, I suppose KVM also will update guest's value to this > percpu variable after KVM writes guest's value to hardware MSR? And host > (SGX driver) need to do the same thing (check the value and write > conditionally), correct? > > Thanks, > -Kai This how I would understand it, yes. /Jarkko
On Thu, May 18, 2017 at 1:14 AM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> You are making assumption that KVM will run ENCLS on behalf of guest. :)
>
> If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I
> actually prefer to using MTF, as with MTF we don't have to do all the
> remapping guest's virtual address to KVM's virtual address thing, if we
> don't need to look into guest's ENCLS parameter. But if we need to look into
> guest's ENCLS parameters, for example, to locate physical SECS page, or to
> update physical EPC page's info (that KVM needs to maintain), maybe we can
> choose running ENCLS on behalf of guest.

After thinking about this a bit, I don't see how MTF helps. Currently, KVM works kind of like this:

    local_irq_disable();
    set up stuff;
    VMRESUME;
    restore some host state;
    local_irq_enable();

If the guest is going to run with the EINIT-exiting bit clear, the only way I see this working is to modify KVM along the lines of:

    local_irq_disable();
    set up stuff;
    if (condition here) {
        WRMSR to SGXLEPUBKEYHASH;
        update percpu shadow copy;
        clear EINIT-exiting bit;
    } else {
        set EINIT-exiting bit;
    }
    VMRESUME;
    restore some host state;
    local_irq_enable();

where "condition here" might be something like "the last VMRESUME exited due to EINIT".

I don't see how MTF helps much. And if I were the KVM maintainer, I would probably prefer to trap EINIT instead of adding a special case to the main vm entry code.
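The trap-and-emulate EINIT flow Andy prefers can be sketched as a user-space simulation. Everything here is hypothetical (the structure layout, the `do_einit` stub, and the numeric value of SGX_INVALID_EINITTOKEN are illustrative); the point is only the ordering: on an EINIT exit, load the guest's shadow hash into the physical MSRs, run EINIT on the guest's behalf, and hand the result back in RAX, with no MSR work on any other entry or exit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SGX_SUCCESS             0
#define SGX_INVALID_EINITTOKEN  16   /* illustrative value */

struct vcpu {
    uint64_t lepubkeyhash[4];   /* shadow of guest's IA32_SGXLEPUBKEYHASHn */
    uint64_t rax;               /* result returned to the guest */
};

static uint64_t phys_lepubkeyhash[4];   /* simulated physical MSRs */

/* Stand-in for running ENCLS[EINIT] on the host: succeed iff the physical
 * MSRs hold the hash of the enclave's signer. */
static uint64_t do_einit(const uint64_t signer_hash[4])
{
    return memcmp(phys_lepubkeyhash, signer_hash,
                  sizeof(phys_lepubkeyhash)) == 0
               ? SGX_SUCCESS : SGX_INVALID_EINITTOKEN;
}

/* On an EINIT vmexit: propagate the guest's shadow hash to the physical
 * MSRs (in reality via the cached conditional-WRMSR helper), run EINIT on
 * the guest's behalf, and reflect the result back in guest RAX. */
void handle_einit_vmexit(struct vcpu *v, const uint64_t signer_hash[4])
{
    memcpy(phys_lepubkeyhash, v->lepubkeyhash, sizeof(phys_lepubkeyhash));
    v->rax = do_einit(signer_hash);
}
```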
On 5/21/2017 9:55 AM, Andy Lutomirski wrote: > On Thu, May 18, 2017 at 1:14 AM, Huang, Kai <kai.huang@linux.intel.com> wrote: >> You are making assumption that KVM will run ENCLS on behalf of guest. :) >> >> If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I >> actually prefer to using MTF, as with MTF we don't have to do all the >> remapping guest's virtual address to KVM's virtual address thing, if we >> don't need to look into guest's ENCLS parameter. But if we need to look into >> guest's ENCLS parameters, for example, to locate physical SECS page, or to >> update physical EPC page's info (that KVM needs to maintain), maybe we can >> choose running ENCLS on behalf of guest. > > After thinking about this a bit, I don't see how MTF helps. > Currently, KVM works kind of like this: > > local_irq_disable(); > set up stuff; > VMRESUME; > restore some host state; > local_irq_enable(); > > If the guest is going to run with the EINIT-exiting bit clear, the > only way I see this working is to modify KVM along the lines of: > > local_irq_disable(); > set up stuff; > if (condition here) { > WRMSR to SGXLEPUBKEYHASH; > update percpu shadow copyl > clear EINIT-exiting bit; > } else { > set EINIT-exiting bit; > } > VMRESUME; > restore some host state; > local_irq_enable(); > > where "condition here" might be something like "the last VMRESUME > exited due to EINIT". > > I don't see how MTF helps much. And if I were the KVM maintainer, I > would probably prefer to trap EINIT instead of adding a special case > to the main vm entry code. > Hi Andy, Thanks for your comments. However I didn't intend to use MTF in your way. The idea of using MTF (along with ENCLS VMEXIT) is, by turning on MTF VMEXIT upon ENCLS VMEXIT, we are able to mark a single step VMEXIT after ENCLS so that ENCLS can run in guest as single step. Let me explain how the two approaches work below in general, so that we can decide which is better. 
Only trapping EINIT in order to update IA32_SGXLEPUBKEYHASHn is relatively simple, but I'd compare the two in a more general way, assuming we may want to trap more ENCLS leaves in the future in order to, ex, track EPC/enclave status/info, to support, ex, EPC oversubscription between KVM guests.

The diagrams below show the basic idea of the two approaches.

    --------------------------------------------------------------
    |                           ENCLS                            |
    --------------------------------------------------------------
                   |          /|\
      ENCLS VMEXIT |           | VMENTRY
                  \|/          |

    1) identify which ENCLS leaf (RAX)
    2) reconstruct/remap guest's ENCLS parameters, ex:
       - remap any guest VA (virtual address) to KVM VA
       - reconstruct PAGEINFO
    3) do whatever needed before ENCLS, ex:
       - updating MSRs before EINIT
    4) run ENCLS on behalf of guest, and skip ENCLS
    5) emulate ENCLS result (succeeded or not)
       - update guest's RAX-RDX.
       - and/or inject #GP (or #UD).
    6) do whatever needed after ENCLS, ex:
       - updating EPC/Enclave status/info

    1) Run ENCLS on behalf of guest

    --------------------------------------------------------------
    |                           ENCLS                            |
    --------------------------------------------------------------
            | /|\                            | /|\
      ENCLS | | VMENTRY                  MTF | | VMENTRY
     VMEXIT | |                       VMEXIT | |
           \|/|                             \|/|

    1) Turn off ENCLS VMEXIT            1) Turn off MTF VMEXIT
    2) turn on MTF VMEXIT               2) Turn on ENCLS VMEXIT
    3) cache ENCLS parameters           3) check whether ENCLS has run
    4) do whatever needed before           (ENCLS changes RAX)
       ENCLS                            4) check whether ENCLS succeeded
                                           or not.
                                        5) do whatever needed after ENCLS

    2) Using MTF

The concern with running ENCLS on behalf of the guest is emulating ENCLS errors. KVM needs to *correctly* emulate ENCLS errors to the guest, so that the error we inject reflects the right behavior as if ENCLS had run in the guest. Running ENCLS in root mode may potentially behave differently from running it in non-root mode, therefore we have to go through all possible error codes to make sure we can emulate them.
And for some error codes, ex, SGX_LOCKFAIL, we can handle them in KVM and don't have to inject an error to the guest at all. So the point is we have to go through all error codes to make sure KVM can emulate ENCLS error codes correctly for the guest. Another argument is that Intel may add new error codes in the future when more SGX functionality is introduced, so emulating error codes may become a burden.

Using MTF is also a little bit tricky: when we turn on MTF VMEXIT upon ENCLS VMEXIT, the MTF won't necessarily be pending at the end of that ENCLS. For example, MTF may be pending at the end of an interrupt (I cannot recall exactly) if an event is pending during VMENTRY from the ENCLS VMEXIT. Therefore we have to do an additional check that this MTF VMEXIT really happened after ENCLS ran (step 3 above). And depending on what we need to do, we may need to check whether ENCLS succeeded or not in the guest, which is also tricky, as ENCLS can fail by either setting an error code in RAX, or by generating #GP or #UD (step 4 above). We may still need to do the gva->gpa->hpa translation, ex, in order to locate the EPC/SECS page and update its status, depending on the purpose of trapping ENCLS.

But by using MTF, we don't have to worry about ENCLS error emulation, as ENCLS runs in the guest, so the root-mode vs. non-root-mode difference doesn't matter. I think this is the major reason to use MTF.
On 5/23/2017 5:43 PM, Huang, Kai wrote: > > > On 5/21/2017 9:55 AM, Andy Lutomirski wrote: >> On Thu, May 18, 2017 at 1:14 AM, Huang, Kai >> <kai.huang@linux.intel.com> wrote: >>> You are making assumption that KVM will run ENCLS on behalf of guest. :) >>> >>> If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I >>> actually prefer to using MTF, as with MTF we don't have to do all the >>> remapping guest's virtual address to KVM's virtual address thing, if we >>> don't need to look into guest's ENCLS parameter. But if we need to >>> look into >>> guest's ENCLS parameters, for example, to locate physical SECS page, >>> or to >>> update physical EPC page's info (that KVM needs to maintain), maybe >>> we can >>> choose running ENCLS on behalf of guest. >> >> After thinking about this a bit, I don't see how MTF helps. >> Currently, KVM works kind of like this: >> >> local_irq_disable(); >> set up stuff; >> VMRESUME; >> restore some host state; >> local_irq_enable(); >> >> If the guest is going to run with the EINIT-exiting bit clear, the >> only way I see this working is to modify KVM along the lines of: >> >> local_irq_disable(); >> set up stuff; >> if (condition here) { >> WRMSR to SGXLEPUBKEYHASH; >> update percpu shadow copyl >> clear EINIT-exiting bit; >> } else { >> set EINIT-exiting bit; >> } >> VMRESUME; >> restore some host state; >> local_irq_enable(); >> >> where "condition here" might be something like "the last VMRESUME >> exited due to EINIT". >> >> I don't see how MTF helps much. And if I were the KVM maintainer, I >> would probably prefer to trap EINIT instead of adding a special case >> to the main vm entry code. >> > > Hi Andy, > > Thanks for your comments. However I didn't intend to use MTF in your > way. The idea of using MTF (along with ENCLS VMEXIT) is, by turning on > MTF VMEXIT upon ENCLS VMEXIT, we are able to mark a single step VMEXIT > after ENCLS so that ENCLS can run in guest as single step. 
> > Let me explain how the two approaches work below in general, so that we > can decide which is better. Only trapping EINIT in order to update > IA32_SGXLEPUBKEYHASHn is relatively simpler but I'd compare the two in > more general way, assuming we may want to trap more ENCLS in order to, > ex, track EPC/Enclave status/info, in the future to support, ex, EPC > oversubscription between KVM guests. > > Below diagram shows the basic idea of the two approaches. > > > -------------------------------------------------------------- > | ENCLS | > -------------------------------------------------------------- > | /|\ > ENCLS VMEXIT | | VMENTRY > | | > \|/ | Looks the diagrams are broken. Sorry. I need to verify before sending next time. But looks they are still understandable? Thanks, Kai > > 1) identify which ENCLS leaf (RAX) > 2) reconstruct/remap guest's ENCLS parameters, ex: > - remap any guest VA (virtual address) to KVM VA > - reconstruct PAGEINFO > 3) do whatever needed before ENCLS, ex: > - updating MSRs before EINIT > 4) run ENCLS on behalf of guest, and skip ENCLS > 5) emulate ENCLS result (succeeded or not) > - update guest's RAX-RDX. > - and/or inject #GP (or #UD). > 6) do whatever needed after ENCLS, ex: > - updating EPC/Enclave status/info > > 1) Run ENCLS on behalf of guest > > > -------------------------------------------------------------- > | ENCLS | > -------------------------------------------------------------- > |/|\ |/|\ > ENCLS VMEXIT | | VMENTRY MTF VMEXIT | | VMENTRY > | | | | > \|/| \|/| > 1) Turn off EMCLS VMEXIT 1) Turn off MTF VMEXIT > 2) turn on MTF VMEXIT 2) Turn on ENCLS VMEXIT > 3) cache ENCLS parameters 3) check whether ENCLS has run > (ENCLS changes RAX) 4) check whether ENCLS succeeded > 4) do whatever needed before or not. > ENCLS 5) do whatever needed after ENCLS > > 2) Using MTF > > The concern of running ENCLS on behalf of guest is emulating ENCLS > error. 
KVM needs to *correctly* emulate ENCLS error to guest so that the > error we inject to guest can reflect the right behavior as if ENCLS run > in guest. Running ENCLS in root-mode may be potentially different > running ENCLS in non-root mode, therefore we have to go through all > possible error codes to make sure we can emulate. And for some error > code, ex, SGX_LOCKFAIL, we can handle it in KVM and don't have to inject > error to guest. So the point is we have to go through all error code to > make sure KVM can emulate ENCLS error code correctly for guest. > Another argument is Intel may add new error codes in the future when > more SGX functionalities are introduced, so emulating error code may be > a burden. > > Using MTF is also a little bit tricky, as when we turn on MTF VMEXIT > upon ENCLS VMEXIT, the MTF won't be absolutely pending at end of that > ENCLS. For example, MTF may be pending at end of interrupt (cannot > recall exactly) if event is pending during VMENTRY from ENCLS VMEXIT. > Therefore we have to do additional thing to check whether this MTF > VMEXIT really happens after ENCLS run (step 3 above). And depending on > what we need to do, we may need to check whether ENCLS succeeded or not > in guest, which is also tricky, as ENCLS can fail in either setting > error code in RAX, or generating #GP or #UD (step 4 above). We may still > need to do gva->gpa->hpa, ex, in order to locate EPC/SECS page and > update status, depending on the purpose of trapping ENCLS. > > But by using MTF, we don't have to worry about ENCLS error emulation, as > ENCLS runs in guest, thus we don't need to worry about this root-mode > and non-root mode difference. I think this is the major reason that we > want to use MTF. >
On Mon, May 22, 2017 at 10:43 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: > > > On 5/21/2017 9:55 AM, Andy Lutomirski wrote: >> >> On Thu, May 18, 2017 at 1:14 AM, Huang, Kai <kai.huang@linux.intel.com> >> wrote: >>> >>> You are making assumption that KVM will run ENCLS on behalf of guest. :) >>> >>> If we don't need to look into guest's SIGSTRUCT, EINITTOKEN, etc, then I >>> actually prefer to using MTF, as with MTF we don't have to do all the >>> remapping guest's virtual address to KVM's virtual address thing, if we >>> don't need to look into guest's ENCLS parameter. But if we need to look >>> into >>> guest's ENCLS parameters, for example, to locate physical SECS page, or >>> to >>> update physical EPC page's info (that KVM needs to maintain), maybe we >>> can >>> choose running ENCLS on behalf of guest. >> >> >> After thinking about this a bit, I don't see how MTF helps. >> Currently, KVM works kind of like this: >> >> local_irq_disable(); >> set up stuff; >> VMRESUME; >> restore some host state; >> local_irq_enable(); >> >> If the guest is going to run with the EINIT-exiting bit clear, the >> only way I see this working is to modify KVM along the lines of: >> >> local_irq_disable(); >> set up stuff; >> if (condition here) { >> WRMSR to SGXLEPUBKEYHASH; >> update percpu shadow copyl >> clear EINIT-exiting bit; >> } else { >> set EINIT-exiting bit; >> } >> VMRESUME; >> restore some host state; >> local_irq_enable(); >> >> where "condition here" might be something like "the last VMRESUME >> exited due to EINIT". >> >> I don't see how MTF helps much. And if I were the KVM maintainer, I >> would probably prefer to trap EINIT instead of adding a special case >> to the main vm entry code. >> > > Hi Andy, > > Thanks for your comments. However I didn't intend to use MTF in your way. 
> The idea of using MTF (along with ENCLS VMEXIT) is, by turning on MTF
> VMEXIT upon ENCLS VMEXIT, we are able to arrange a single-step VMEXIT
> after ENCLS so that ENCLS can run in the guest as a single step.
>
> Let me explain how the two approaches work below in general, so that we
> can decide which is better. Only trapping EINIT in order to update
> IA32_SGXLEPUBKEYHASHn is relatively simple, but I'd compare the two in
> a more general way, assuming we may want to trap more ENCLS leaves in
> order to, e.g., track EPC/enclave status/info in the future to support,
> e.g., EPC oversubscription between KVM guests.
>
> Below diagram shows the basic idea of the two approaches.
> ...
>
> --------------------------------------------------------------
> |                           ENCLS                            |
> --------------------------------------------------------------
>            |/|\                          |/|\
> ENCLS VMEXIT | | VMENTRY      MTF VMEXIT | | VMENTRY
>              | |                         | |
>             \|/|                        \|/|
> 1) Turn off ENCLS VMEXIT        1) Turn off MTF VMEXIT
> 2) Turn on MTF VMEXIT           2) Turn on ENCLS VMEXIT
> 3) Cache ENCLS parameters       3) Check whether ENCLS has run
>    (ENCLS changes RAX)          4) Check whether ENCLS succeeded
> 4) Do whatever is needed           or not
>    before ENCLS                 5) Do whatever is needed after
>                                    ENCLS
>
> 2) Using MTF
> ...
> Using MTF is also a little bit tricky: when we turn on MTF VMEXIT upon
> ENCLS VMEXIT, the MTF won't necessarily be pending at the end of that
> ENCLS. For example, MTF may be pending at the end of an interrupt
> (cannot recall exactly) if an event is pending during VMENTRY from the
> ENCLS VMEXIT. Therefore we have to do additional work to check whether
> this MTF VMEXIT really happened after ENCLS ran (step 3 above). And
> depending on what we need to do, we may need to check whether ENCLS
> succeeded in the guest, which is also tricky, as ENCLS can fail either
> by setting an error code in RAX, or by generating #GP or #UD (step 4
> above). We may still need to do gva->gpa->hpa translation, e.g., to
> locate the EPC/SECS page and update its status, depending on the
> purpose of trapping ENCLS.
I think there are some issues here. First, you're making a big assumption that, when you resume the guest with MTF set, the instruction that gets executed is still ENCLS[EINIT]. That's not guaranteed as is -- you could race against another vCPU that changes the instruction, the instruction could be in IO space, host userspace could be messing with you, etc. Second, I don't think there's any precedent at all in KVM for doing this. Third, you still need to make sure that the MSRs retain the value you want them to have by the time ENCLS happens. I think that, by the time you resolve all of these issues, it'll look a lot like the pseudocode I emailed out, and MTF won't be necessary any more. > > But by using MTF, we don't have to worry about ENCLS error emulation, as > ENCLS runs in guest, thus we don't need to worry about this root-mode and > non-root mode difference. I think this is the major reason that we want to > use MTF. I don't see why error emulation is hard. If the host does ENCLS on behalf of the guest and it returns an error, can't you return exactly the same error to the guest with no further processing? The only tricky case is where the host rejects due to its own policy and you have to choose an error code. --Andy
On 23/05/2017 18:34, Andy Lutomirski wrote:
>
>> [...]
>
> I think there are some issues here.
>
> First, you're making a big assumption that, when you resume the guest
> with MTF set, the instruction that gets executed is still
> ENCLS[EINIT]. That's not guaranteed as is -- you could race against
> another vCPU that changes the instruction, the instruction could be in
> IO space, host userspace could be messing with you, etc. Second, I
> don't think there's any precedent at all in KVM for doing this.
> Third, you still need to make sure that the MSRs retain the value you
> want them to have by the time ENCLS happens. I think that, by the
> time you resolve all of these issues, it'll look a lot like the
> pseudocode I emailed out, and MTF won't be necessary any more.

Agreed. Emulation in the host is better.

Paolo
On 5/24/2017 4:43 AM, Paolo Bonzini wrote:
>
> [...]
>
> Agreed. Emulation in the host is better.

Hi Andy/Paolo,

Thanks for comments. I'll follow your suggestion in v2.

Thanks,
-Kai

>
> Paolo
>
On 5/18/2017 7:45 PM, Huang, Kai wrote: > > > On 5/17/2017 12:09 PM, Andy Lutomirski wrote: >> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai >> <kai.huang@linux.intel.com> wrote: >>> >>> >>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote: >>>> >>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> >>>> wrote: >>>>> >>>>> I am not sure whether the cost of writing to 4 MSRs would be >>>>> *extremely* >>>>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load, >>>>> writing >>>>> to several MSRs, etc. >>>> >>>> >>>> I'm speculating that these MSRs may be rather unoptimized and hence >>>> unusualy slow. >>>> >>>>> >>>>>> >>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along >>>>>> with whatever lock is needed (probably just a mutex). Users of EINIT >>>>>> will take the mutex, compare the percpu variable to the desired >>>>>> value, >>>>>> and, if it's different, do WRMSR and update the percpu variable. >>>>>> >>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its >>>>>> in-memory >>>>>> state but *not* changing the MSRs. KVM will trap and emulate >>>>>> EINIT to >>>>>> support the same handling as the host. There is no action >>>>>> required at >>>>>> all on KVM guest entry and exit. >>>>> >>>>> >>>>> >>>>> This is doable, but SGX driver needs to do those things and expose >>>>> interfaces for KVM to use. In terms of the percpu data, it is nice to >>>>> have, >>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in >>>>> performance critical path. We can simply read old value from MSRs >>>>> out and >>>>> compare whether the old equals to the new. >>>> >>>> >>>> I think the SGX driver should probably live in arch/x86, and the >>>> interface could be a simple percpu variable that is exported (from the >>>> main kernel image, not from a module). 
>>>> >>>>> >>>>>> >>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for >>>>>> other reasons: someone is going to want to implement policy for what >>>>>> enclaves are allowed that applies to guests as well as the host. >>>>> >>>>> >>>>> >>>>> I am not very convinced why "what enclaves are allowed" in host would >>>>> apply >>>>> to guest. Can you elaborate? I mean in general virtualization just >>>>> focus >>>>> emulating hardware behavior. If a native machine is able to run any >>>>> LE, >>>>> the >>>>> virtual machine should be able to as well (of course, with guest's >>>>> IA32_FEATURE_CONTROL[bit 17] set). >>>> >>>> >>>> I strongly disagree. I can imagine two classes of sensible policies >>>> for launch control: >>>> >>>> 1. Allow everything. This seems quite sensible to me. >>>> >>>> 2. Allow some things, and make sure that VMs have at least as >>>> restrictive a policy as host root has. After all, what's the point of >>>> restricting enclaves in the host if host code can simply spawn a >>>> little VM to run otherwise-disallowed enclaves? >>> >>> >>> What's the current SGX driver launch control policy? Yes allow >>> everything >>> works for KVM so lets skip this. Are we going to support allowing >>> several >>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE >>> staff but I don't know details. >>> >>> I am trying to find a way that we can both not break host launch control >>> policy, and be consistent to HW behavior (from guest's view). >>> Currently we >>> can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn >>> either >>> enabled or disabled. I introduced an Qemu parameter 'lewr' for this >>> purpose. 
>>> Actually I introduced below Qemu SGX parameters for creating guest: >>> >>> -sgx epc=<size>,lehash='SHA-256 hash',lewr >>> >>> where 'epc' specifies guest's EPC size, lehash specifies (initial) >>> value of >>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is >>> allowed >>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime. >>> >>> If host only allows one single LE to run, KVM can add a restrict that >>> only >>> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn >>> disabled, so that only host allowed (single) hash can be used by >>> guest. From >>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has >>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single) >>> hash. >>> >>> If host allows several LEs (not but everything), and if we create >>> guest with >>> 'lewr', then the behavior is not consistent with HW behavior, as from >>> guest's hardware's point of view, we can actually run any LE but we >>> have to >>> tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn >>> to some >>> specific values. One compromise solution is we don't allow to create >>> guest >>> with 'lewr' specified, and at the meantime, only allow to create >>> guest with >>> host approved hashes specified in 'lehash'. This will make guest's >>> behavior >>> consistent to HW behavior but only allows guest to run one LE (which is >>> specified by 'lehash' when guest is created). >> >> I'm not sure I entirely agree for a couple reasons. >> >> 1. I wouldn't be surprised if the kernel ends up implementing a policy >> in which it checks all enclaves (not just LEs) for acceptability. In >> fact, if the kernel sticks with the "no LE at all or just >> kernel-internal LE", then checking enclaves directly against some >> admin- or distro-provided signer list seems reasonable. This type of >> policy can't be forwarded to a guest by restricting allowed LE >> signers. 
But this is mostly speculation since AFAIK no one has >> seriously proposed any particular policy support and the plan was to >> not have this for the initial implementation. >> >> 2. While matching hardware behavior is nice in principle, there >> doesn't seem to be useful hardware behavior to match here. If the >> host had a list of five allowed LE signers, how exactly would it >> restrict the MSRs? They're not written atomically, so you can't >> directly tell what's being written. > > In this case I actually plan to just allow creating guest with guest's > IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is > specified, creating guest will fail. And we only allow creating guest > with host allowed hash values (with 'lehash=hash-value'), and if > 'hash-value' specified by 'lehash' is not allowed by host, we also fail > to create guest. > > We can only allow creating guest with 'lewr' specified when host allows > anything. > > But in this way, we are restricting guest OS's ability to run LE, as > only one LE, that is specified by 'lehash' parameter, can be run. But I > think this won't hurt much, as multiple guests still are able to run > different LEs? > > Also, the only way to fail an MSR >> write is to send #GP, and Windows (and even Linux) may not expect >> that. Linux doesn't panic due to #GP on MSR writes these days, but >> you still get a big fat warning. I wouldn't be at all surprised if >> Windows BSODs. > > We cannot allow writing some particular value to MSRs successfully, > while injecting #GP when writing other values to the same MSRs. So #GP > is not option. > > ENCLS[EINIT], on the other hand, returns an actual >> error code. I'm not sure that a sensible error code exists >> ("SGX_HYPERVISOR_SAID_NO?", perhaps), > > Looks no such error code exists. And we cannot return such error code to > guest as such error code is only supposed to be valid when ENCLS is run > in hypervisor. 
> > but SGX_INVALID_EINITTOKEN seems
>> to mean, more or less, "the CPU thinks you're not authorized to do
>> this", so forcing that error code could be entirely reasonable.
>>
>> If the host policy is to allow a list of LE signers, you could return
>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in
>> the list.
>
> But this would be inconsistent with HW behavior. If the hash value in
> the guest's IA32_SGXLEPUBKEYHASHn matches the one passed by EINIT,
> EINIT is not supposed to return SGX_INVALID_EINITTOKEN.
>
> I think from the VMM's perspective, emulating HW behavior so that it
> is consistent with real HW behavior is very important.
>
> Paolo, would you provide your comments?

Hi all,

This has been quiet for a while and I'd like to start the discussion
again. Jarkko told me that currently he only supports one LE in the SGX
driver, but I am not sure whether he is going to extend that in the
future or not. I think this might also depend on requirements from
customers.

Andy,

If we only support one LE in the driver, then we can only support the
same LE for all KVM guests, according to your comments that the host
kernel launch control policy should also apply to KVM guests? Would you
comment more?

Jarkko,

Could you help to clarify the whole launch control policy on the host
side so that we can have a better understanding together?

Thanks,
-Kai

>
>>
>> --Andy
>>
On Tue, Jun 6, 2017 at 1:52 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> [...]
> [...]
>
> Andy,
>
> If we only support one LE in the driver, then we can only support the
> same LE for all KVM guests, according to your comments that the host
> kernel launch control policy should also apply to KVM guests? Would
> you comment more?

I'm not at all convinced that, going forward, Linux's host-side launch
control policy will be entirely contained in the LE. I'm also not
convinced that non-Linux guests will function at all under this type
of policy -- what if FreeBSD's LE is different for whatever reason?

>
> Jarkko,
>
> Could you help to clarify the whole launch control policy in host side
> so that we can have a better understanding together?
>
> Thanks,
> -Kai
>
>>
>>>
>>> --Andy
>>>
>
On 6/7/2017 9:22 AM, Andy Lutomirski wrote:
> On Tue, Jun 6, 2017 at 1:52 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
>> [...]
>>> >>> ENCLS[EINIT], on the other hand, returns an actual >>>> >>>> error code. I'm not sure that a sensible error code exists >>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps), >>> >>> >>> Looks no such error code exists. And we cannot return such error code to >>> guest as such error code is only supposed to be valid when ENCLS is run in >>> hypervisor. >>> >>> but SGX_INVALID_EINITTOKEN seems >>>> >>>> to mean, more or less, "the CPU thinks you're not authorized to do >>>> this", so forcing that error code could be entirely reasonable. >>>> >>>> If the host policy is to allow a list of LE signers, you could return >>>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in >>>> the list. >>> >>> >>> But this would be inconsistent with HW behavior. If the hash value in >>> guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT, EINIT >>> is not supposed to return SGX_INVALID_EINITTOKEN. >>> >>> I think from VMM's perspective, emulating HW behavior to be consistent >>> with real HW behavior is very important. >>> >>> Paolo, would you provide your comments? >> >> >> Hi all, >> >> This has been quite for a while and I'd like to start discussion again. >> Jarkko told me that currently he only supports one LE in SGX driver, but I >> am not sure whether he is going to extend in the future or not. I think this >> might also depend on requirements from customers. >> >> Andy, >> >> If we only support one LE in driver, then we can only support the same LE >> for all KVM guests, according to your comments that host kernel launch >> control policy should also apply to KVM guests? WOuld you comments more? > > I'm not at all convinced that, going forward, Linux's host-side launch > control policy will be entirely contained in the LE. I'm also not > convinced that non-Linux guests will function at all under this type > of policy -- what if FreeBSD's LE is different for whatever reason? I am not convinced either. 
I think we need Jarkko to elaborate on how the host-side launch control policy is implemented, or whether there is any policy at all. I also tried to read the SGX driver code, but it looks like I couldn't find any implementation regarding this. Hi Jarkko, Can you elaborate on this? Thanks, -Kai > >> >> Jarkko, >> >> Could you help to clarify the whole launch control policy in host side so >> that we can have a better understanding together? >> >> Thanks, >> -Kai >> >>> >>>> >>>> --Andy >>>> >> >
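[Editorial sketch] The scheme debated in the message above can be made concrete. Below is an illustrative C simulation, not actual KVM or SGX driver code: on an EINIT VMEXIT, the VMM checks the guest's SIGSTRUCT signer against a host allowlist, forces SGX_INVALID_EINITTOKEN when rejected, and otherwise writes the guest's cached hash to the (here simulated) physical MSRs before running EINIT. All structure and function names are assumptions for illustration; only the error-code name and "4 MSRs" layout come from the thread.

```c
/*
 * Hypothetical sketch of the EINIT-trapping policy discussed above.
 * The physical MSR write and the real ENCLS[EINIT] are simulated.
 */
#include <stdint.h>
#include <string.h>

#define SGX_SUCCESS             0
#define SGX_INVALID_EINITTOKEN  2   /* architectural EINIT error code */

struct le_hash { uint64_t h[4]; };  /* IA32_SGXLEPUBKEYHASH0..3 */

/* Host policy: a single allowed signer hash, as in a single-LE driver. */
static const struct le_hash host_allowed = { { 0x11, 0x22, 0x33, 0x44 } };

static int host_policy_allows(const struct le_hash *signer)
{
    return memcmp(signer, &host_allowed, sizeof(*signer)) == 0;
}

/* Simulated physical MSRs; a real VMM would use wrmsrl() here. */
struct le_hash phys_msrs;

/* Emulate EINIT on behalf of the guest. */
int handle_einit_exit(const struct le_hash *guest_msrs,
                      const struct le_hash *sigstruct_signer)
{
    if (!host_policy_allows(sigstruct_signer))
        return SGX_INVALID_EINITTOKEN;   /* "the CPU said no" */

    /* Write the guest's cached hash to the real MSRs, then run EINIT. */
    phys_msrs = *guest_msrs;
    return SGX_SUCCESS;                  /* stands in for ENCLS[EINIT] */
}
```

Note this is exactly the behavior Kai objects to as inconsistent with bare hardware: real silicon would accept any signer once the MSRs match the SIGSTRUCT.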
On 6/6/2017 6:52 PM, Huang, Kai wrote: >On 6/7/2017 9:22 AM, Andy Lutomirski wrote: >> On Tue, Jun 6, 2017 at 1:52 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: >>> >>> >>> On 5/18/2017 7:45 PM, Huang, Kai wrote: >>>> >>>> >>>> >>>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote: >>>>> >>>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai >>>>> <kai.huang@linux.intel.com> >>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote: >>>>>>> >>>>>>> >>>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai >>>>>>> <kai.huang@linux.intel.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> I am not sure whether the cost of writing to 4 MSRs would be >>>>>>>> *extremely* >>>>>>>> slow, as when vcpu is schedule in, KVM is already doing >>>>>>>> vmcs_load, writing to several MSRs, etc. >>>>>>> >>>>>>> >>>>>>> >>>>>>> I'm speculating that these MSRs may be rather unoptimized and hence >>>>>>> unusualy slow. >>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH >along >>>>>>>>> with whatever lock is needed (probably just a mutex). Users of EINIT >>>>>>>>> will take the mutex, compare the percpu variable to the desired >>>>>>>>> value, >>>>>>>>> and, if it's different, do WRMSR and update the percpu variable. >>>>>>>>> >>>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its >>>>>>>>> in-memory >>>>>>>>> state but *not* changing the MSRs. KVM will trap and emulate EINIT >>>>>>>>> to >>>>>>>>> support the same handling as the host. There is no action required >>>>>>>>> at >>>>>>>>> all on KVM guest entry and exit. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> This is doable, but SGX driver needs to do those things and expose >>>>>>>> interfaces for KVM to use. In terms of the percpu data, it is nice to >>>>>>>> have, >>>>>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in >>>>>>>> performance critical path. 
We can simply read old value from MSRs out >>>>>>>> and >>>>>>>> compare whether the old equals to the new. >>>>>>> >>>>>>> >>>>>>> >>>>>>> I think the SGX driver should probably live in arch/x86, and the >>>>>>> interface could be a simple percpu variable that is exported (from the >>>>>>> main kernel image, not from a module). >>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for >>>>>>>>> other reasons: someone is going to want to implement policy for what >>>>>>>>> enclaves are allowed that applies to guests as well as the host. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I am not very convinced why "what enclaves are allowed" in host would >>>>>>>> apply >>>>>>>> to guest. Can you elaborate? I mean in general virtualization just >>>>>>>> focus >>>>>>>> emulating hardware behavior. If a native machine is able to run any >>>>>>>> LE, >>>>>>>> the >>>>>>>> virtual machine should be able to as well (of course, with guest's >>>>>>>> IA32_FEATURE_CONTROL[bit 17] set). >>>>>>> >>>>>>> >>>>>>> >>>>>>> I strongly disagree. I can imagine two classes of sensible policies >>>>>>> for launch control: >>>>>>> >>>>>>> 1. Allow everything. This seems quite sensible to me. >>>>>>> >>>>>>> 2. Allow some things, and make sure that VMs have at least as >>>>>>> restrictive a policy as host root has. After all, what's the point of >>>>>>> restricting enclaves in the host if host code can simply spawn a >>>>>>> little VM to run otherwise-disallowed enclaves? >>>>>> >>>>>> >>>>>> >>>>>> What's the current SGX driver launch control policy? Yes allow >>>>>> everything >>>>>> works for KVM so lets skip this. Are we going to support allowing >>>>>> several >>>>>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE >>>>>> staff but I don't know details. >>>>>> >>>>>> I am trying to find a way that we can both not break host launch control >>>>>> policy, and be consistent to HW behavior (from guest's view). 
Currently >>>>>> we >>>>>> can create a KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn >>>>>> either >>>>>> enabled or disabled. I introduced an Qemu parameter 'lewr' for this >>>>>> purpose. >>>>>> Actually I introduced below Qemu SGX parameters for creating guest: >>>>>> >>>>>> -sgx epc=<size>,lehash='SHA-256 hash',lewr >>>>>> >>>>>> where 'epc' specifies guest's EPC size, lehash specifies (initial) value >>>>>> of >>>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest is >>>>>> allowed >>>>>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime. >>>>>> >>>>>> If host only allows one single LE to run, KVM can add a restrict that >>>>>> only >>>>>> allows to create KVM guest with runtime change to >IA32_SGXLEPUBKEYHASHn >>>>>> disabled, so that only host allowed (single) hash can be used by guest. >>>>>> From >>>>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and >has >>>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed (single) >>>>>> hash. >>>>>> >>>>>> If host allows several LEs (not but everything), and if we create guest >>>>>> with >>>>>> 'lewr', then the behavior is not consistent with HW behavior, as from >>>>>> guest's hardware's point of view, we can actually run any LE but we have >>>>>> to >>>>>> tell guest that you are only allowed to change IA32_SGXLEPUBKEYHASHn >to >>>>>> some >>>>>> specific values. One compromise solution is we don't allow to create >>>>>> guest >>>>>> with 'lewr' specified, and at the meantime, only allow to create guest >>>>>> with >>>>>> host approved hashes specified in 'lehash'. This will make guest's >>>>>> behavior >>>>>> consistent to HW behavior but only allows guest to run one LE (which is >>>>>> specified by 'lehash' when guest is created). >>>>> >>>>> >>>>> I'm not sure I entirely agree for a couple reasons. >>>>> >>>>> 1. I wouldn't be surprised if the kernel ends up implementing a policy >>>>> in which it checks all enclaves (not just LEs) for acceptability. 
In >>>>> fact, if the kernel sticks with the "no LE at all or just >>>>> kernel-internal LE", then checking enclaves directly against some >>>>> admin- or distro-provided signer list seems reasonable. This type of >>>>> policy can't be forwarded to a guest by restricting allowed LE >>>>> signers. But this is mostly speculation since AFAIK no one has >>>>> seriously proposed any particular policy support and the plan was to >>>>> not have this for the initial implementation. >>>>> >>>>> 2. While matching hardware behavior is nice in principle, there >>>>> doesn't seem to be useful hardware behavior to match here. If the >>>>> host had a list of five allowed LE signers, how exactly would it >>>>> restrict the MSRs? They're not written atomically, so you can't >>>>> directly tell what's being written. >>>> >>>> >>>> In this case I actually plan to just allow creating guest with guest's >>>> IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is >>>> specified, creating guest will fail. And we only allow creating guest with >>>> host allowed hash values (with 'lehash=hash-value'), and if 'hash-value' >>>> specified by 'lehash' is not allowed by host, we also fail to create guest. >>>> >>>> We can only allow creating guest with 'lewr' specified when host allows >>>> anything. >>>> >>>> But in this way, we are restricting guest OS's ability to run LE, as only >>>> one LE, that is specified by 'lehash' parameter, can be run. But I think >>>> this won't hurt much, as multiple guests still are able to run different >>>> LEs? >>>> >>>> Also, the only way to fail an MSR >>>>> >>>>> write is to send #GP, and Windows (and even Linux) may not expect >>>>> that. Linux doesn't panic due to #GP on MSR writes these days, but >>>>> you still get a big fat warning. I wouldn't be at all surprised if >>>>> Windows BSODs. >>>> >>>> >>>> We cannot allow writing some particular value to MSRs successfully, while >>>> injecting #GP when writing other values to the same MSRs. 
So #GP is not >>>> option. >>>> >>>> ENCLS[EINIT], on the other hand, returns an actual >>>>> >>>>> error code. I'm not sure that a sensible error code exists >>>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps), >>>> >>>> >>>> Looks no such error code exists. And we cannot return such error code to >>>> guest as such error code is only supposed to be valid when ENCLS is run in >>>> hypervisor. >>>> >>>> but SGX_INVALID_EINITTOKEN seems >>>>> >>>>> to mean, more or less, "the CPU thinks you're not authorized to do >>>>> this", so forcing that error code could be entirely reasonable. >>>>> >>>>> If the host policy is to allow a list of LE signers, you could return >>>>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in >>>>> the list. >>>> >>>> >>>> But this would be inconsistent with HW behavior. If the hash value in >>>> guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT, >EINIT >>>> is not supposed to return SGX_INVALID_EINITTOKEN. >>>> >>>> I think from VMM's perspective, emulating HW behavior to be consistent >>>> with real HW behavior is very important. >>>> >>>> Paolo, would you provide your comments? >>> >>> >>> Hi all, >>> >>> This has been quite for a while and I'd like to start discussion again. >>> Jarkko told me that currently he only supports one LE in SGX driver, but I >>> am not sure whether he is going to extend in the future or not. I think this >>> might also depend on requirements from customers. >>> >>> Andy, >>> >>> If we only support one LE in driver, then we can only support the same LE >>> for all KVM guests, according to your comments that host kernel launch >>> control policy should also apply to KVM guests? WOuld you comments more? >> >> I'm not at all convinced that, going forward, Linux's host-side launch >> control policy will be entirely contained in the LE. 
I'm also not >> convinced that non-Linux guests will function at all under this type >> of policy -- what if FreeBSD's LE is different for whatever reason? > >I am not convinced either. I think we need Jarkko to elaborate how is >host side launch control policy implemented, or is there any policy at >all. I also tried to read SGX driver code but looks I couldn't find any >implementation regarding to this. > >Hi Jarkko, > >Can you elaborate on this? > >Thanks, >-Kai I don't think the kernel's LE policy is relevant here. As long as you allow the guest OS to set the IA32_SGXLEPUBKEYHASHn MSRs, either directly or via the VMM 'lehash' value, you don't need support for more than one LE in the kernel. The host kernel will have one LE, and the guest kernel has another "one" LE that may have a different hash value. I agree with Andy that different OS distributions are likely to have different LE hash values, so the guest OS may require a different hash setting. >> >> >>> >>> Jarkko, >>> >>> Could you help to clarify the whole launch control policy in host side so >>> that we can have a better understanding together? >>> >>> Thanks, >>> -Kai >>> >>>> >>>>> >>>>> --Andy >>>>> >>> >>
On Wed, Jun 07, 2017 at 08:52:42AM +1200, Huang, Kai wrote: > > > On 5/18/2017 7:45 PM, Huang, Kai wrote: > > > > > > On 5/17/2017 12:09 PM, Andy Lutomirski wrote: > > > On Mon, May 15, 2017 at 5:48 PM, Huang, Kai > > > <kai.huang@linux.intel.com> wrote: > > > > > > > > > > > > On 5/12/2017 6:11 PM, Andy Lutomirski wrote: > > > > > > > > > > On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> > > > > > wrote: > > > > > > > > > > > > I am not sure whether the cost of writing to 4 MSRs > > > > > > would be *extremely* > > > > > > slow, as when vcpu is schedule in, KVM is already doing vmcs_load, > > > > > > writing > > > > > > to several MSRs, etc. > > > > > > > > > > > > > > > I'm speculating that these MSRs may be rather unoptimized and hence > > > > > unusualy slow. > > > > > > > > > > > > > > > > > > > > > > > > > Have a percpu variable that stores the current SGXLEPUBKEYHASH along > > > > > > > with whatever lock is needed (probably just a mutex). Users of EINIT > > > > > > > will take the mutex, compare the percpu variable to > > > > > > > the desired value, > > > > > > > and, if it's different, do WRMSR and update the percpu variable. > > > > > > > > > > > > > > KVM will implement writes to SGXLEPUBKEYHASH by > > > > > > > updating its in-memory > > > > > > > state but *not* changing the MSRs. KVM will trap > > > > > > > and emulate EINIT to > > > > > > > support the same handling as the host. There is no > > > > > > > action required at > > > > > > > all on KVM guest entry and exit. > > > > > > > > > > > > > > > > > > > > > > > > This is doable, but SGX driver needs to do those things and expose > > > > > > interfaces for KVM to use. In terms of the percpu data, it is nice to > > > > > > have, > > > > > > but I am not sure whether it is mandatory, as IMO EINIT is not even in > > > > > > performance critical path. We can simply read old value > > > > > > from MSRs out and > > > > > > compare whether the old equals to the new. 
> > > > > > > > > > > > > > > I think the SGX driver should probably live in arch/x86, and the > > > > > interface could be a simple percpu variable that is exported (from the > > > > > main kernel image, not from a module). > > > > > > > > > > > > > > > > > > > > > > > > > FWIW, I think that KVM will, in the long run, want to trap EINIT for > > > > > > > other reasons: someone is going to want to implement policy for what > > > > > > > enclaves are allowed that applies to guests as well as the host. > > > > > > > > > > > > > > > > > > > > > > > > I am not very convinced why "what enclaves are allowed" in host would > > > > > > apply > > > > > > to guest. Can you elaborate? I mean in general > > > > > > virtualization just focus > > > > > > emulating hardware behavior. If a native machine is able > > > > > > to run any LE, > > > > > > the > > > > > > virtual machine should be able to as well (of course, with guest's > > > > > > IA32_FEATURE_CONTROL[bit 17] set). > > > > > > > > > > > > > > > I strongly disagree. I can imagine two classes of sensible policies > > > > > for launch control: > > > > > > > > > > 1. Allow everything. This seems quite sensible to me. > > > > > > > > > > 2. Allow some things, and make sure that VMs have at least as > > > > > restrictive a policy as host root has. After all, what's the point of > > > > > restricting enclaves in the host if host code can simply spawn a > > > > > little VM to run otherwise-disallowed enclaves? > > > > > > > > > > > > What's the current SGX driver launch control policy? Yes allow > > > > everything > > > > works for KVM so lets skip this. Are we going to support > > > > allowing several > > > > LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE > > > > staff but I don't know details. > > > > > > > > I am trying to find a way that we can both not break host launch control > > > > policy, and be consistent to HW behavior (from guest's view). 
> > > > Currently we > > > > can create a KVM guest with runtime change to > > > > IA32_SGXLEPUBKEYHASHn either > > > > enabled or disabled. I introduced an Qemu parameter 'lewr' for > > > > this purpose. > > > > Actually I introduced below Qemu SGX parameters for creating guest: > > > > > > > > -sgx epc=<size>,lehash='SHA-256 hash',lewr > > > > > > > > where 'epc' specifies guest's EPC size, lehash specifies > > > > (initial) value of > > > > guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether > > > > guest is allowed > > > > to change guest's IA32_SGXLEPUBKEYHASHn at runtime. > > > > > > > > If host only allows one single LE to run, KVM can add a restrict > > > > that only > > > > allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn > > > > disabled, so that only host allowed (single) hash can be used by > > > > guest. From > > > > guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has > > > > IA32_SGXLEPUBKEYHASHn with default value to be host allowed > > > > (single) hash. > > > > > > > > If host allows several LEs (not but everything), and if we > > > > create guest with > > > > 'lewr', then the behavior is not consistent with HW behavior, as from > > > > guest's hardware's point of view, we can actually run any LE but > > > > we have to > > > > tell guest that you are only allowed to change > > > > IA32_SGXLEPUBKEYHASHn to some > > > > specific values. One compromise solution is we don't allow to > > > > create guest > > > > with 'lewr' specified, and at the meantime, only allow to create > > > > guest with > > > > host approved hashes specified in 'lehash'. This will make > > > > guest's behavior > > > > consistent to HW behavior but only allows guest to run one LE (which is > > > > specified by 'lehash' when guest is created). > > > > > > I'm not sure I entirely agree for a couple reasons. > > > > > > 1. 
I wouldn't be surprised if the kernel ends up implementing a policy > > > in which it checks all enclaves (not just LEs) for acceptability. In > > > fact, if the kernel sticks with the "no LE at all or just > > > kernel-internal LE", then checking enclaves directly against some > > > admin- or distro-provided signer list seems reasonable. This type of > > > policy can't be forwarded to a guest by restricting allowed LE > > > signers. But this is mostly speculation since AFAIK no one has > > > seriously proposed any particular policy support and the plan was to > > > not have this for the initial implementation. > > > > > > 2. While matching hardware behavior is nice in principle, there > > > doesn't seem to be useful hardware behavior to match here. If the > > > host had a list of five allowed LE signers, how exactly would it > > > restrict the MSRs? They're not written atomically, so you can't > > > directly tell what's being written. > > > > In this case I actually plan to just allow creating guest with guest's > > IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is > > specified, creating guest will fail. And we only allow creating guest > > with host allowed hash values (with 'lehash=hash-value'), and if > > 'hash-value' specified by 'lehash' is not allowed by host, we also fail > > to create guest. > > > > We can only allow creating guest with 'lewr' specified when host allows > > anything. > > > > But in this way, we are restricting guest OS's ability to run LE, as > > only one LE, that is specified by 'lehash' parameter, can be run. But I > > think this won't hurt much, as multiple guests still are able to run > > different LEs? > > > > Also, the only way to fail an MSR > > > write is to send #GP, and Windows (and even Linux) may not expect > > > that. Linux doesn't panic due to #GP on MSR writes these days, but > > > you still get a big fat warning. I wouldn't be at all surprised if > > > Windows BSODs. 
> > > > We cannot allow writing some particular value to MSRs successfully, > > while injecting #GP when writing other values to the same MSRs. So #GP > > is not option. > > > > ENCLS[EINIT], on the other hand, returns an actual > > > error code. I'm not sure that a sensible error code exists > > > ("SGX_HYPERVISOR_SAID_NO?", perhaps), > > > > Looks no such error code exists. And we cannot return such error code to > > guest as such error code is only supposed to be valid when ENCLS is run > > in hypervisor. > > > > but SGX_INVALID_EINITTOKEN seems > > > to mean, more or less, "the CPU thinks you're not authorized to do > > > this", so forcing that error code could be entirely reasonable. > > > > > > If the host policy is to allow a list of LE signers, you could return > > > SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in > > > the list. > > > > But this would be inconsistent with HW behavior. If the hash value in > > guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT, > > EINIT is not supposed to return SGX_INVALID_EINITTOKEN. > > > > I think from VMM's perspective, emulating HW behavior to be consistent > > with real HW behavior is very important. > > > > Paolo, would you provide your comments? > > Hi all, > > This has been quite for a while and I'd like to start discussion again. > Jarkko told me that currently he only supports one LE in SGX driver, but I > am not sure whether he is going to extend in the future or not. I think this > might also depend on requirements from customers. > > Andy, > > If we only support one LE in driver, then we can only support the same LE > for all KVM guests, according to your comments that host kernel launch > control policy should also apply to KVM guests? WOuld you comments more? > > Jarkko, > > Could you help to clarify the whole launch control policy in host side so > that we can have a better understanding together? > > Thanks, > -Kai So. I have pass through LE. 
It creates an EINITTOKEN for anything. Couldn't the VMM keep virtual values for the MSRs and ask the host-side LE to create a token when it needs one? /Jarkko
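[Editorial sketch] The "keep virtual values for MSRs" part of Jarkko's suggestion matches what the series already does: guest WRMSR to IA32_SGXLEPUBKEYHASH0..3 (MSRs 0x8C-0x8F) only updates vcpu-local state, and no physical MSR is touched until EINIT is actually emulated. A minimal simulation, with hypothetical structure names standing in for the real KVM vcpu state:

```c
/* Sketch of purely virtual guest IA32_SGXLEPUBKEYHASHn handling. */
#include <stdint.h>

#define MSR_IA32_SGXLEPUBKEYHASH0 0x8c   /* HASH0..3 are 0x8c-0x8f */

struct vcpu_sgx_state {
    uint64_t le_pubkey_hash[4];   /* guest's virtual MSR values */
    int      lewr;                /* guest may rewrite them at runtime */
};

/* Returns 0 on success, -1 to inject #GP into the guest. */
int sgx_set_msr(struct vcpu_sgx_state *s, uint32_t msr, uint64_t data)
{
    uint32_t i = msr - MSR_IA32_SGXLEPUBKEYHASH0;

    if (i > 3 || !s->lewr)
        return -1;                /* not writable for this guest */
    s->le_pubkey_hash[i] = data;  /* no wrmsrl() here at all */
    return 0;
}
```

The open question in the thread is only the second half of the suggestion, i.e. whether the host LE should mint tokens for the guest; as Kai notes next, the EINITTOKEN comes from the guest itself.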
On 6/9/2017 12:31 AM, Jarkko Sakkinen wrote: > On Wed, Jun 07, 2017 at 08:52:42AM +1200, Huang, Kai wrote: >> >> >> On 5/18/2017 7:45 PM, Huang, Kai wrote: >>> >>> >>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote: >>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai >>>> <kai.huang@linux.intel.com> wrote: >>>>> >>>>> >>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote: >>>>>> >>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> >>>>>> wrote: >>>>>>> >>>>>>> I am not sure whether the cost of writing to 4 MSRs >>>>>>> would be *extremely* >>>>>>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load, >>>>>>> writing >>>>>>> to several MSRs, etc. >>>>>> >>>>>> >>>>>> I'm speculating that these MSRs may be rather unoptimized and hence >>>>>> unusualy slow. >>>>>> >>>>>>> >>>>>>>> >>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH along >>>>>>>> with whatever lock is needed (probably just a mutex). Users of EINIT >>>>>>>> will take the mutex, compare the percpu variable to >>>>>>>> the desired value, >>>>>>>> and, if it's different, do WRMSR and update the percpu variable. >>>>>>>> >>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by >>>>>>>> updating its in-memory >>>>>>>> state but *not* changing the MSRs. KVM will trap >>>>>>>> and emulate EINIT to >>>>>>>> support the same handling as the host. There is no >>>>>>>> action required at >>>>>>>> all on KVM guest entry and exit. >>>>>>> >>>>>>> >>>>>>> >>>>>>> This is doable, but SGX driver needs to do those things and expose >>>>>>> interfaces for KVM to use. In terms of the percpu data, it is nice to >>>>>>> have, >>>>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even in >>>>>>> performance critical path. We can simply read old value >>>>>>> from MSRs out and >>>>>>> compare whether the old equals to the new. 
>>>>>> >>>>>> >>>>>> I think the SGX driver should probably live in arch/x86, and the >>>>>> interface could be a simple percpu variable that is exported (from the >>>>>> main kernel image, not from a module). >>>>>> >>>>>>> >>>>>>>> >>>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT for >>>>>>>> other reasons: someone is going to want to implement policy for what >>>>>>>> enclaves are allowed that applies to guests as well as the host. >>>>>>> >>>>>>> >>>>>>> >>>>>>> I am not very convinced why "what enclaves are allowed" in host would >>>>>>> apply >>>>>>> to guest. Can you elaborate? I mean in general >>>>>>> virtualization just focus >>>>>>> emulating hardware behavior. If a native machine is able >>>>>>> to run any LE, >>>>>>> the >>>>>>> virtual machine should be able to as well (of course, with guest's >>>>>>> IA32_FEATURE_CONTROL[bit 17] set). >>>>>> >>>>>> >>>>>> I strongly disagree. I can imagine two classes of sensible policies >>>>>> for launch control: >>>>>> >>>>>> 1. Allow everything. This seems quite sensible to me. >>>>>> >>>>>> 2. Allow some things, and make sure that VMs have at least as >>>>>> restrictive a policy as host root has. After all, what's the point of >>>>>> restricting enclaves in the host if host code can simply spawn a >>>>>> little VM to run otherwise-disallowed enclaves? >>>>> >>>>> >>>>> What's the current SGX driver launch control policy? Yes allow >>>>> everything >>>>> works for KVM so lets skip this. Are we going to support >>>>> allowing several >>>>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel LE >>>>> staff but I don't know details. >>>>> >>>>> I am trying to find a way that we can both not break host launch control >>>>> policy, and be consistent to HW behavior (from guest's view). >>>>> Currently we >>>>> can create a KVM guest with runtime change to >>>>> IA32_SGXLEPUBKEYHASHn either >>>>> enabled or disabled. I introduced an Qemu parameter 'lewr' for >>>>> this purpose. 
>>>>> Actually I introduced below Qemu SGX parameters for creating guest: >>>>> >>>>> -sgx epc=<size>,lehash='SHA-256 hash',lewr >>>>> >>>>> where 'epc' specifies guest's EPC size, lehash specifies >>>>> (initial) value of >>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether >>>>> guest is allowed >>>>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime. >>>>> >>>>> If host only allows one single LE to run, KVM can add a restrict >>>>> that only >>>>> allows to create KVM guest with runtime change to IA32_SGXLEPUBKEYHASHn >>>>> disabled, so that only host allowed (single) hash can be used by >>>>> guest. From >>>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and has >>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed >>>>> (single) hash. >>>>> >>>>> If host allows several LEs (not but everything), and if we >>>>> create guest with >>>>> 'lewr', then the behavior is not consistent with HW behavior, as from >>>>> guest's hardware's point of view, we can actually run any LE but >>>>> we have to >>>>> tell guest that you are only allowed to change >>>>> IA32_SGXLEPUBKEYHASHn to some >>>>> specific values. One compromise solution is we don't allow to >>>>> create guest >>>>> with 'lewr' specified, and at the meantime, only allow to create >>>>> guest with >>>>> host approved hashes specified in 'lehash'. This will make >>>>> guest's behavior >>>>> consistent to HW behavior but only allows guest to run one LE (which is >>>>> specified by 'lehash' when guest is created). >>>> >>>> I'm not sure I entirely agree for a couple reasons. >>>> >>>> 1. I wouldn't be surprised if the kernel ends up implementing a policy >>>> in which it checks all enclaves (not just LEs) for acceptability. In >>>> fact, if the kernel sticks with the "no LE at all or just >>>> kernel-internal LE", then checking enclaves directly against some >>>> admin- or distro-provided signer list seems reasonable. 
This type of >>>> policy can't be forwarded to a guest by restricting allowed LE >>>> signers. But this is mostly speculation since AFAIK no one has >>>> seriously proposed any particular policy support and the plan was to >>>> not have this for the initial implementation. >>>> >>>> 2. While matching hardware behavior is nice in principle, there >>>> doesn't seem to be useful hardware behavior to match here. If the >>>> host had a list of five allowed LE signers, how exactly would it >>>> restrict the MSRs? They're not written atomically, so you can't >>>> directly tell what's being written. >>> >>> In this case I actually plan to just allow creating guest with guest's >>> IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is >>> specified, creating guest will fail. And we only allow creating guest >>> with host allowed hash values (with 'lehash=hash-value'), and if >>> 'hash-value' specified by 'lehash' is not allowed by host, we also fail >>> to create guest. >>> >>> We can only allow creating guest with 'lewr' specified when host allows >>> anything. >>> >>> But in this way, we are restricting guest OS's ability to run LE, as >>> only one LE, that is specified by 'lehash' parameter, can be run. But I >>> think this won't hurt much, as multiple guests still are able to run >>> different LEs? >>> >>> Also, the only way to fail an MSR >>>> write is to send #GP, and Windows (and even Linux) may not expect >>>> that. Linux doesn't panic due to #GP on MSR writes these days, but >>>> you still get a big fat warning. I wouldn't be at all surprised if >>>> Windows BSODs. >>> >>> We cannot allow writing some particular value to MSRs successfully, >>> while injecting #GP when writing other values to the same MSRs. So #GP >>> is not option. >>> >>> ENCLS[EINIT], on the other hand, returns an actual >>>> error code. I'm not sure that a sensible error code exists >>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps), >>> >>> Looks no such error code exists. 
And we cannot return such error code to >>> guest as such error code is only supposed to be valid when ENCLS is run >>> in hypervisor. >>> >>> but SGX_INVALID_EINITTOKEN seems >>>> to mean, more or less, "the CPU thinks you're not authorized to do >>>> this", so forcing that error code could be entirely reasonable. >>>> >>>> If the host policy is to allow a list of LE signers, you could return >>>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in >>>> the list. >>> >>> But this would be inconsistent with HW behavior. If the hash value in >>> guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT, >>> EINIT is not supposed to return SGX_INVALID_EINITTOKEN. >>> >>> I think from VMM's perspective, emulating HW behavior to be consistent >>> with real HW behavior is very important. >>> >>> Paolo, would you provide your comments? >> >> Hi all, >> >> This has been quite for a while and I'd like to start discussion again. >> Jarkko told me that currently he only supports one LE in SGX driver, but I >> am not sure whether he is going to extend in the future or not. I think this >> might also depend on requirements from customers. >> >> Andy, >> >> If we only support one LE in driver, then we can only support the same LE >> for all KVM guests, according to your comments that host kernel launch >> control policy should also apply to KVM guests? WOuld you comments more? >> >> Jarkko, >> >> Could you help to clarify the whole launch control policy in host side so >> that we can have a better understanding together? >> >> Thanks, >> -Kai > > So. I have pass through LE. It creates EINITTOKEN for anything. Couldn't > VMM keep virtual values for MSRs and ask host side LE create token when > it needs to? Hi Jarkko, Thanks for replying. VMM doesn't need driver to generate EINITTOKEN. The EINITTOKEN is from guest too, upon VMM traps EINIT. 
I think Andy's comment is: if the host SGX driver only allows certain parties' LEs to run (e.g. LEs from Intel, Redhat, etc.), then the SGX driver should also govern LEs from KVM guests, by checking the SIGSTRUCT from the KVM guest (the SIGSTRUCT is provided by KVM by trapping the guest's EINIT). In my understanding, although you only allow one LE in the kernel, you won't limit whose LE can be run (basically the kernel can run an LE signed by anyone, just only one LE while the kernel is running), so I don't see any limitation on KVM guests here. But it may still be better if the SGX driver can provide a function like: int sgx_validate_sigstruct(struct sigstruct *sig); for KVM to call, in case the driver is changed (e.g. to only allow LEs from some particular parties to run), but this is not necessary now. The KVM changes can be done later when the driver makes that change. Andy, Am I understanding correctly? Does this make sense to you? Thanks, -Kai > > /Jarkko >
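A minimal userspace sketch of what such a validation hook could look like. Everything here is hypothetical: the `struct sigstruct` layout is reduced to just the signer's modulus, and a single compiled-in entry stands in for whatever admin-provided whitelist the driver might maintain.

```c
#include <assert.h>
#include <string.h>

#define SGX_MODULUS_SIZE 384  /* 3072-bit RSA modulus */

struct sigstruct {
    unsigned char modulus[SGX_MODULUS_SIZE];
    /* ... header, signature, enclave measurement, etc. omitted ... */
};

/* Hypothetical admin-provided list of approved LE signer moduli
 * (a single all-zero entry here, purely for demonstration). */
static const unsigned char approved_modulus[SGX_MODULUS_SIZE];

/* Return 0 if the LE's signer is approved by the host, -1 otherwise. */
static int sgx_validate_sigstruct(const struct sigstruct *sig)
{
    return memcmp(sig->modulus, approved_modulus, SGX_MODULUS_SIZE) == 0
           ? 0 : -1;
}
```

A real driver-side implementation would of course take its whitelist from policy configuration (and likely compare SHA-256 digests of the modulus) rather than a compiled-in array.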
On Thu, Jun 8, 2017 at 4:47 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: > > > On 6/9/2017 12:31 AM, Jarkko Sakkinen wrote: >> >> On Wed, Jun 07, 2017 at 08:52:42AM +1200, Huang, Kai wrote: >>> >>> >>> >>> On 5/18/2017 7:45 PM, Huang, Kai wrote: >>>> >>>> >>>> >>>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote: >>>>> >>>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai >>>>> <kai.huang@linux.intel.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote: >>>>>>> >>>>>>> >>>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai >>>>>>> <kai.huang@linux.intel.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> I am not sure whether the cost of writing to 4 MSRs >>>>>>>> would be *extremely* >>>>>>>> slow, as when vcpu is schedule in, KVM is already doing vmcs_load, >>>>>>>> writing >>>>>>>> to several MSRs, etc. >>>>>>> >>>>>>> >>>>>>> >>>>>>> I'm speculating that these MSRs may be rather unoptimized and hence >>>>>>> unusualy slow. >>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH >>>>>>>>> along >>>>>>>>> with whatever lock is needed (probably just a mutex). Users of >>>>>>>>> EINIT >>>>>>>>> will take the mutex, compare the percpu variable to >>>>>>>>> the desired value, >>>>>>>>> and, if it's different, do WRMSR and update the percpu variable. >>>>>>>>> >>>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by >>>>>>>>> updating its in-memory >>>>>>>>> state but *not* changing the MSRs. KVM will trap >>>>>>>>> and emulate EINIT to >>>>>>>>> support the same handling as the host. There is no >>>>>>>>> action required at >>>>>>>>> all on KVM guest entry and exit. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> This is doable, but SGX driver needs to do those things and expose >>>>>>>> interfaces for KVM to use. In terms of the percpu data, it is nice >>>>>>>> to >>>>>>>> have, >>>>>>>> but I am not sure whether it is mandatory, as IMO EINIT is not even >>>>>>>> in >>>>>>>> performance critical path. 
We can simply read old value >>>>>>>> from MSRs out and >>>>>>>> compare whether the old equals to the new. >>>>>>> >>>>>>> >>>>>>> >>>>>>> I think the SGX driver should probably live in arch/x86, and the >>>>>>> interface could be a simple percpu variable that is exported (from >>>>>>> the >>>>>>> main kernel image, not from a module). >>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT >>>>>>>>> for >>>>>>>>> other reasons: someone is going to want to implement policy for >>>>>>>>> what >>>>>>>>> enclaves are allowed that applies to guests as well as the host. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I am not very convinced why "what enclaves are allowed" in host >>>>>>>> would >>>>>>>> apply >>>>>>>> to guest. Can you elaborate? I mean in general >>>>>>>> virtualization just focus >>>>>>>> emulating hardware behavior. If a native machine is able >>>>>>>> to run any LE, >>>>>>>> the >>>>>>>> virtual machine should be able to as well (of course, with guest's >>>>>>>> IA32_FEATURE_CONTROL[bit 17] set). >>>>>>> >>>>>>> >>>>>>> >>>>>>> I strongly disagree. I can imagine two classes of sensible policies >>>>>>> for launch control: >>>>>>> >>>>>>> 1. Allow everything. This seems quite sensible to me. >>>>>>> >>>>>>> 2. Allow some things, and make sure that VMs have at least as >>>>>>> restrictive a policy as host root has. After all, what's the point >>>>>>> of >>>>>>> restricting enclaves in the host if host code can simply spawn a >>>>>>> little VM to run otherwise-disallowed enclaves? >>>>>> >>>>>> >>>>>> >>>>>> What's the current SGX driver launch control policy? Yes allow >>>>>> everything >>>>>> works for KVM so lets skip this. Are we going to support >>>>>> allowing several >>>>>> LEs, or just allowing one single LE? I know Jarkko is doing in-kernel >>>>>> LE >>>>>> staff but I don't know details. 
>>>>>> >>>>>> I am trying to find a way that we can both not break host launch >>>>>> control >>>>>> policy, and be consistent to HW behavior (from guest's view). >>>>>> Currently we >>>>>> can create a KVM guest with runtime change to >>>>>> IA32_SGXLEPUBKEYHASHn either >>>>>> enabled or disabled. I introduced an Qemu parameter 'lewr' for >>>>>> this purpose. >>>>>> Actually I introduced below Qemu SGX parameters for creating guest: >>>>>> >>>>>> -sgx epc=<size>,lehash='SHA-256 hash',lewr >>>>>> >>>>>> where 'epc' specifies guest's EPC size, lehash specifies >>>>>> (initial) value of >>>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether >>>>>> guest is allowed >>>>>> to change guest's IA32_SGXLEPUBKEYHASHn at runtime. >>>>>> >>>>>> If host only allows one single LE to run, KVM can add a restrict >>>>>> that only >>>>>> allows to create KVM guest with runtime change to >>>>>> IA32_SGXLEPUBKEYHASHn >>>>>> disabled, so that only host allowed (single) hash can be used by >>>>>> guest. From >>>>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit17] cleared and >>>>>> has >>>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed >>>>>> (single) hash. >>>>>> >>>>>> If host allows several LEs (not but everything), and if we >>>>>> create guest with >>>>>> 'lewr', then the behavior is not consistent with HW behavior, as from >>>>>> guest's hardware's point of view, we can actually run any LE but >>>>>> we have to >>>>>> tell guest that you are only allowed to change >>>>>> IA32_SGXLEPUBKEYHASHn to some >>>>>> specific values. One compromise solution is we don't allow to >>>>>> create guest >>>>>> with 'lewr' specified, and at the meantime, only allow to create >>>>>> guest with >>>>>> host approved hashes specified in 'lehash'. This will make >>>>>> guest's behavior >>>>>> consistent to HW behavior but only allows guest to run one LE (which >>>>>> is >>>>>> specified by 'lehash' when guest is created). 
>>>>> >>>>> >>>>> I'm not sure I entirely agree for a couple reasons. >>>>> >>>>> 1. I wouldn't be surprised if the kernel ends up implementing a policy >>>>> in which it checks all enclaves (not just LEs) for acceptability. In >>>>> fact, if the kernel sticks with the "no LE at all or just >>>>> kernel-internal LE", then checking enclaves directly against some >>>>> admin- or distro-provided signer list seems reasonable. This type of >>>>> policy can't be forwarded to a guest by restricting allowed LE >>>>> signers. But this is mostly speculation since AFAIK no one has >>>>> seriously proposed any particular policy support and the plan was to >>>>> not have this for the initial implementation. >>>>> >>>>> 2. While matching hardware behavior is nice in principle, there >>>>> doesn't seem to be useful hardware behavior to match here. If the >>>>> host had a list of five allowed LE signers, how exactly would it >>>>> restrict the MSRs? They're not written atomically, so you can't >>>>> directly tell what's being written. >>>> >>>> >>>> In this case I actually plan to just allow creating guest with guest's >>>> IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). If 'lewr' is >>>> specified, creating guest will fail. And we only allow creating guest >>>> with host allowed hash values (with 'lehash=hash-value'), and if >>>> 'hash-value' specified by 'lehash' is not allowed by host, we also fail >>>> to create guest. >>>> >>>> We can only allow creating guest with 'lewr' specified when host allows >>>> anything. >>>> >>>> But in this way, we are restricting guest OS's ability to run LE, as >>>> only one LE, that is specified by 'lehash' parameter, can be run. But I >>>> think this won't hurt much, as multiple guests still are able to run >>>> different LEs? >>>> >>>> Also, the only way to fail an MSR >>>>> >>>>> write is to send #GP, and Windows (and even Linux) may not expect >>>>> that. 
Linux doesn't panic due to #GP on MSR writes these days, but >>>>> you still get a big fat warning. I wouldn't be at all surprised if >>>>> Windows BSODs. >>>> >>>> >>>> We cannot allow writing some particular value to MSRs successfully, >>>> while injecting #GP when writing other values to the same MSRs. So #GP >>>> is not option. >>>> >>>> ENCLS[EINIT], on the other hand, returns an actual >>>>> >>>>> error code. I'm not sure that a sensible error code exists >>>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps), >>>> >>>> >>>> Looks no such error code exists. And we cannot return such error code to >>>> guest as such error code is only supposed to be valid when ENCLS is run >>>> in hypervisor. >>>> >>>> but SGX_INVALID_EINITTOKEN seems >>>>> >>>>> to mean, more or less, "the CPU thinks you're not authorized to do >>>>> this", so forcing that error code could be entirely reasonable. >>>>> >>>>> If the host policy is to allow a list of LE signers, you could return >>>>> SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that isn't in >>>>> the list. >>>> >>>> >>>> But this would be inconsistent with HW behavior. If the hash value in >>>> guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by EINIT, >>>> EINIT is not supposed to return SGX_INVALID_EINITTOKEN. >>>> >>>> I think from VMM's perspective, emulating HW behavior to be consistent >>>> with real HW behavior is very important. >>>> >>>> Paolo, would you provide your comments? >>> >>> >>> Hi all, >>> >>> This has been quite for a while and I'd like to start discussion again. >>> Jarkko told me that currently he only supports one LE in SGX driver, but >>> I >>> am not sure whether he is going to extend in the future or not. I think >>> this >>> might also depend on requirements from customers. 
>>> >>> Andy, >>> >>> If we only support one LE in driver, then we can only support the same LE >>> for all KVM guests, according to your comments that host kernel launch >>> control policy should also apply to KVM guests? WOuld you comments more? >>> >>> Jarkko, >>> >>> Could you help to clarify the whole launch control policy in host side so >>> that we can have a better understanding together? >>> >>> Thanks, >>> -Kai >> >> >> So. I have pass through LE. It creates EINITTOKEN for anything. Couldn't >> VMM keep virtual values for MSRs and ask host side LE create token when >> it needs to? > > > Hi Jarkko, > > Thanks for replying. VMM doesn't need driver to generate EINITTOKEN. The > EINITTOKEN is from guest too, upon VMM traps EINIT. > > I think Andy's comments is, if host SGX driver only allows someone's LE to > run (ex, LE from Intel, Redhat, etc...), then SGX driver should also govern > LEs from KVM guests as well, by checking SIGSTRUCT from KVM guest (the > SIGSTRUCT is provided by KVM by trapping guest's EINIT). > > In my understanding, although you only allows one LE in kernel, but you > won't limit who's LE can be run (basically kernel can run LE signed by > anyone, but just one LE when kernel is running), so I don't see there is any > limitation to KVM guests here. > > But it may still be better if SGX driver can provide function like: > > int sgx_validate_sigstruct(struct sigstruct *sig); > > for KVM to call, in case driver is changed (ex, to only allows LEs from some > particular ones to run), but this is not necessary now. KVM changes can be > done later when driver make the changes. > > Andy, > > Am I understanding correctly? Does this make sense to you? My understanding is that the kernel will not (at least initially) allow users to supply an LE. The kernel will handle launches all by itself. How it does so is an implementation detail, and it seems to me that it will cause compatibility issues if guests have to use the host's LE. 
sgx_validate_sigstruct(...) sounds like it could be a good approach to me. > > Thanks, > -Kai > >> >> /Jarkko >> >
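Andy's suggestion quoted above (a percpu cache of the current SGXLEPUBKEYHASH, with a WRMSR only on mismatch) could be sketched roughly as below. This is a userspace simulation: the per-cpu state is reduced to a single CPU's variable, and the WRMSR is stubbed out with a counter, since the real code would run in ring 0 inside the driver with a mutex held around the update plus the EINIT itself.

```c
#include <assert.h>
#include <stdint.h>

/* Stub: a real implementation would do
 * wrmsr(MSR_IA32_SGXLEPUBKEYHASH0 + i, val). */
static uint64_t phys_msr[4];
static unsigned int wrmsr_count;

static void wrmsr_hash(int i, uint64_t val)
{
    phys_msr[i] = val;
    wrmsr_count++;
}

/* Simulated per-cpu cache of what the MSRs currently hold. */
static uint64_t cached_hash[4];

/*
 * Before EINIT: bring the MSRs to 'want', touching only the registers
 * that actually differ from the cached value, so the (possibly slow)
 * WRMSRs are skipped entirely when the hash is already in place.
 */
static void sgx_update_lepubkeyhash(const uint64_t want[4])
{
    for (int i = 0; i < 4; i++) {
        if (cached_hash[i] != want[i]) {
            wrmsr_hash(i, want[i]);
            cached_hash[i] = want[i];
        }
    }
}
```

KVM's EINIT emulation and the host's own EINIT path could both funnel through this one helper, which is what makes the "no action on guest entry/exit" property fall out.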
On 6/8/2017 7:53 PM, Andy Lutomirski wrote: > >On Thu, Jun 8, 2017 at 4:47 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: >> >> >> On 6/9/2017 12:31 AM, Jarkko Sakkinen wrote: >>> >>> On Wed, Jun 07, 2017 at 08:52:42AM +1200, Huang, Kai wrote: >>>> >>>> >>>> >>>> On 5/18/2017 7:45 PM, Huang, Kai wrote: >>>>> >>>>> >>>>> >>>>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote: >>>>>> >>>>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai >>>>>> <kai.huang@linux.intel.com> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote: >>>>>>>> >>>>>>>> >>>>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai >>>>>>>> <kai.huang@linux.intel.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> I am not sure whether the cost of writing to 4 MSRs would be >>>>>>>>> *extremely* slow, as when vcpu is schedule in, KVM is already >>>>>>>>> doing vmcs_load, writing to several MSRs, etc. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I'm speculating that these MSRs may be rather unoptimized and >>>>>>>> hence unusualy slow. >>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH >>>>>>>>>> along with whatever lock is needed (probably just a mutex). >>>>>>>>>> Users of EINIT will take the mutex, compare the percpu >>>>>>>>>> variable to the desired value, and, if it's different, do >>>>>>>>>> WRMSR and update the percpu variable. >>>>>>>>>> >>>>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its >>>>>>>>>> in-memory state but *not* changing the MSRs. KVM will trap >>>>>>>>>> and emulate EINIT to support the same handling as the host. >>>>>>>>>> There is no action required at all on KVM guest entry and >>>>>>>>>> exit. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> This is doable, but SGX driver needs to do those things and >>>>>>>>> expose interfaces for KVM to use. 
In terms of the percpu data, >>>>>>>>> it is nice to have, but I am not sure whether it is mandatory, >>>>>>>>> as IMO EINIT is not even in performance critical path. We can >>>>>>>>> simply read old value from MSRs out and compare whether the old >>>>>>>>> equals to the new. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I think the SGX driver should probably live in arch/x86, and the >>>>>>>> interface could be a simple percpu variable that is exported >>>>>>>> (from the main kernel image, not from a module). >>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> FWIW, I think that KVM will, in the long run, want to trap >>>>>>>>>> EINIT for other reasons: someone is going to want to implement >>>>>>>>>> policy for what enclaves are allowed that applies to guests as >>>>>>>>>> well as the host. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I am not very convinced why "what enclaves are allowed" in host >>>>>>>>> would apply to guest. Can you elaborate? I mean in general >>>>>>>>> virtualization just focus emulating hardware behavior. If a >>>>>>>>> native machine is able to run any LE, the virtual machine >>>>>>>>> should be able to as well (of course, with guest's >>>>>>>>> IA32_FEATURE_CONTROL[bit 17] set). >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I strongly disagree. I can imagine two classes of sensible >>>>>>>> policies for launch control: >>>>>>>> >>>>>>>> 1. Allow everything. This seems quite sensible to me. >>>>>>>> >>>>>>>> 2. Allow some things, and make sure that VMs have at least as >>>>>>>> restrictive a policy as host root has. After all, what's the >>>>>>>> point of restricting enclaves in the host if host code can >>>>>>>> simply spawn a little VM to run otherwise-disallowed enclaves? >>>>>>> >>>>>>> >>>>>>> >>>>>>> What's the current SGX driver launch control policy? Yes allow >>>>>>> everything works for KVM so lets skip this. Are we going to >>>>>>> support allowing several LEs, or just allowing one single LE? 
I >>>>>>> know Jarkko is doing in-kernel LE staff but I don't know details. >>>>>>> >>>>>>> I am trying to find a way that we can both not break host launch >>>>>>> control policy, and be consistent to HW behavior (from guest's >>>>>>> view). >>>>>>> Currently we >>>>>>> can create a KVM guest with runtime change to >>>>>>> IA32_SGXLEPUBKEYHASHn either enabled or disabled. I introduced an >>>>>>> Qemu parameter 'lewr' for this purpose. >>>>>>> Actually I introduced below Qemu SGX parameters for creating guest: >>>>>>> >>>>>>> -sgx epc=<size>,lehash='SHA-256 hash',lewr >>>>>>> >>>>>>> where 'epc' specifies guest's EPC size, lehash specifies >>>>>>> (initial) value of >>>>>>> guest's IA32_SGXLEPUBKEYHASHn, and 'lewr' specifies whether guest >>>>>>> is allowed to change guest's IA32_SGXLEPUBKEYHASHn at runtime. >>>>>>> >>>>>>> If host only allows one single LE to run, KVM can add a restrict >>>>>>> that only allows to create KVM guest with runtime change to >>>>>>> IA32_SGXLEPUBKEYHASHn disabled, so that only host allowed >>>>>>> (single) hash can be used by guest. From guest's view, it simply >>>>>>> has IA32_FEATURE_CONTROL[bit17] cleared and has >>>>>>> IA32_SGXLEPUBKEYHASHn with default value to be host allowed >>>>>>> (single) hash. >>>>>>> >>>>>>> If host allows several LEs (not but everything), and if we create >>>>>>> guest with 'lewr', then the behavior is not consistent with HW >>>>>>> behavior, as from guest's hardware's point of view, we can >>>>>>> actually run any LE but we have to tell guest that you are only >>>>>>> allowed to change IA32_SGXLEPUBKEYHASHn to some specific values. >>>>>>> One compromise solution is we don't allow to create guest with >>>>>>> 'lewr' specified, and at the meantime, only allow to create guest >>>>>>> with host approved hashes specified in 'lehash'. This will make >>>>>>> guest's behavior consistent to HW behavior but only allows guest >>>>>>> to run one LE (which is specified by 'lehash' when guest is >>>>>>> created). 
>>>>>> >>>>>> >>>>>> I'm not sure I entirely agree for a couple reasons. >>>>>> >>>>>> 1. I wouldn't be surprised if the kernel ends up implementing a >>>>>> policy in which it checks all enclaves (not just LEs) for >>>>>> acceptability. In fact, if the kernel sticks with the "no LE at >>>>>> all or just kernel-internal LE", then checking enclaves directly >>>>>> against some >>>>>> admin- or distro-provided signer list seems reasonable. This type >>>>>> of policy can't be forwarded to a guest by restricting allowed LE >>>>>> signers. But this is mostly speculation since AFAIK no one has >>>>>> seriously proposed any particular policy support and the plan was >>>>>> to not have this for the initial implementation. >>>>>> >>>>>> 2. While matching hardware behavior is nice in principle, there >>>>>> doesn't seem to be useful hardware behavior to match here. If the >>>>>> host had a list of five allowed LE signers, how exactly would it >>>>>> restrict the MSRs? They're not written atomically, so you can't >>>>>> directly tell what's being written. >>>>> >>>>> >>>>> In this case I actually plan to just allow creating guest with >>>>> guest's IA32_SGXLEPUBKEYHASHn disabled (without 'lewr' specified). >>>>> If 'lewr' is specified, creating guest will fail. And we only allow >>>>> creating guest with host allowed hash values (with >>>>> 'lehash=hash-value'), and if 'hash-value' specified by 'lehash' is >>>>> not allowed by host, we also fail to create guest. >>>>> >>>>> We can only allow creating guest with 'lewr' specified when host >>>>> allows anything. >>>>> >>>>> But in this way, we are restricting guest OS's ability to run LE, >>>>> as only one LE, that is specified by 'lehash' parameter, can be >>>>> run. But I think this won't hurt much, as multiple guests still are >>>>> able to run different LEs? >>>>> >>>>> Also, the only way to fail an MSR >>>>>> >>>>>> write is to send #GP, and Windows (and even Linux) may not expect >>>>>> that. 
Linux doesn't panic due to #GP on MSR writes these days, >>>>>> but you still get a big fat warning. I wouldn't be at all >>>>>> surprised if Windows BSODs. >>>>> >>>>> >>>>> We cannot allow writing some particular value to MSRs successfully, >>>>> while injecting #GP when writing other values to the same MSRs. So >>>>> #GP is not option. >>>>> >>>>> ENCLS[EINIT], on the other hand, returns an actual >>>>>> >>>>>> error code. I'm not sure that a sensible error code exists >>>>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps), >>>>> >>>>> >>>>> Looks no such error code exists. And we cannot return such error >>>>> code to guest as such error code is only supposed to be valid when >>>>> ENCLS is run in hypervisor. >>>>> >>>>> but SGX_INVALID_EINITTOKEN seems >>>>>> >>>>>> to mean, more or less, "the CPU thinks you're not authorized to do >>>>>> this", so forcing that error code could be entirely reasonable. >>>>>> >>>>>> If the host policy is to allow a list of LE signers, you could >>>>>> return SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE >>>>>> that isn't in the list. >>>>> >>>>> >>>>> But this would be inconsistent with HW behavior. If the hash value >>>>> in guest's IA32_SGXLEPUBKEYHASHn is matched with the one passed by >>>>> EINIT, EINIT is not supposed to return SGX_INVALID_EINITTOKEN. >>>>> >>>>> I think from VMM's perspective, emulating HW behavior to be >>>>> consistent with real HW behavior is very important. >>>>> >>>>> Paolo, would you provide your comments? >>>> >>>> >>>> Hi all, >>>> >>>> This has been quite for a while and I'd like to start discussion again. >>>> Jarkko told me that currently he only supports one LE in SGX driver, >>>> but I am not sure whether he is going to extend in the future or >>>> not. I think this might also depend on requirements from customers. 
>>>> >>>> Andy, >>>> >>>> If we only support one LE in driver, then we can only support the >>>> same LE for all KVM guests, according to your comments that host >>>> kernel launch control policy should also apply to KVM guests? WOuld you >comments more? >>>> >>>> Jarkko, >>>> >>>> Could you help to clarify the whole launch control policy in host >>>> side so that we can have a better understanding together? >>>> >>>> Thanks, >>>> -Kai >>> >>> >>> So. I have pass through LE. It creates EINITTOKEN for anything. >>> Couldn't VMM keep virtual values for MSRs and ask host side LE create >>> token when it needs to? >> >> >> Hi Jarkko, >> >> Thanks for replying. VMM doesn't need driver to generate EINITTOKEN. >> The EINITTOKEN is from guest too, upon VMM traps EINIT. >> >> I think Andy's comments is, if host SGX driver only allows someone's >> LE to run (ex, LE from Intel, Redhat, etc...), then SGX driver should >> also govern LEs from KVM guests as well, by checking SIGSTRUCT from >> KVM guest (the SIGSTRUCT is provided by KVM by trapping guest's EINIT). >> >> In my understanding, although you only allows one LE in kernel, but >> you won't limit who's LE can be run (basically kernel can run LE >> signed by anyone, but just one LE when kernel is running), so I don't >> see there is any limitation to KVM guests here. >> >> But it may still be better if SGX driver can provide function like: >> >> int sgx_validate_sigstruct(struct sigstruct *sig); >> >> for KVM to call, in case driver is changed (ex, to only allows LEs >> from some particular ones to run), but this is not necessary now. KVM >> changes can be done later when driver make the changes. >> >> Andy, >> >> Am I understanding correctly? Does this make sense to you? > >My understanding is that the kernel will not (at least initially) allow users to supply >an LE. The kernel will handle launches all by itself. 
How it does so is an >implementation detail, and it seems to me that it will cause compatibility issues if >guests have to use the host's LE. > >sgx_validate_sigstruct(...) sounds like it could be a good approach to me. > I don't think the kernel needs to support more than one LE. To my understanding, the guest kernel may include a different LE and will require a different LE hash, either by configuring 'lehash' when launching the guest, or by the guest kernel setting the hash value. I assume that the VMM should not block these MSR writes and should allow the guest to set the hash values for its own LE (by caching the MSR writes and setting the values on EINIT). So I don't see why the host kernel should provide EINITTOKEN for the guests. Moreover, I guess that the approach of the LE providing tokens to any enclave is only the initial implementation, and different distributions may decide to add some logic to the token generation, so it might be that the host LE or the guest LE will not provide a token to some enclave, and I don't think we'll want to override this kernel logic. >> >> Thanks, >> -Kai >> >>> >>> /Jarkko >>> >>
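The MSR-caching behavior described here (guest writes to IA32_SGXLEPUBKEYHASHn are recorded in the vcpu and never touch hardware until an EINIT needs them) might look roughly like this on the VMM side. The struct and function names are made up for the sketch; the MSR indices 0x8C-0x8F are the architectural ones.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_SGXLEPUBKEYHASH0 0x8c   /* MSRs 0x8c..0x8f */

struct vcpu_sgx {
    uint64_t lepubkeyhash[4];   /* the guest's view of the four MSRs */
};

/* WRMSR trap handler: cache the value only; the physical MSRs are
 * deliberately left untouched until an EINIT actually needs them. */
static int lehash_wrmsr(struct vcpu_sgx *v, uint32_t msr, uint64_t data)
{
    uint32_t i = msr - MSR_IA32_SGXLEPUBKEYHASH0;

    if (i > 3)
        return -1;               /* not one of our MSRs */
    v->lepubkeyhash[i] = data;
    return 0;
}

/* RDMSR trap handler: read back the cached (virtual) value, so the
 * guest always sees what it last wrote. */
static uint64_t lehash_rdmsr(const struct vcpu_sgx *v, uint32_t msr)
{
    return v->lepubkeyhash[msr - MSR_IA32_SGXLEPUBKEYHASH0];
}
```

With this, no #GP ever has to be injected for these MSRs and nothing happens on VMENTRY/VMEXIT; the cached values are only pushed to hardware from the EINIT trap.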
On Fri, Jun 09, 2017 at 11:47:13AM +1200, Huang, Kai wrote: > In my understanding, although you only allows one LE in kernel, but you > won't limit who's LE can be run (basically kernel can run LE signed by > anyone, but just one LE when kernel is running), so I don't see there is any > limitation to KVM guests here. > > But it may still be better if SGX driver can provide function like: > > int sgx_validate_sigstruct(struct sigstruct *sig); > > for KVM to call, in case driver is changed (ex, to only allows LEs from some > particular ones to run), but this is not necessary now. KVM changes can be > done later when driver make the changes. > > Andy, > > Am I understanding correctly? Does this make sense to you? > > Thanks, > -Kai Nope. I don't even understand the *beginnings* what that function would do. I don't understand what the validation means here and what VMM would do if that functions reports "success". How that would work on a system where MSRs cannot be changed? In that kind of system the host OS must generate EINITTOKEN for the LE running on inside the guest and maintain completely virtualized MSR values for the guest. /Jarkko
On 6/11/2017 12:23 AM, Jarkko Sakkinen wrote: > On Fri, Jun 09, 2017 at 11:47:13AM +1200, Huang, Kai wrote: >> In my understanding, although you only allows one LE in kernel, but you >> won't limit who's LE can be run (basically kernel can run LE signed by >> anyone, but just one LE when kernel is running), so I don't see there is any >> limitation to KVM guests here. >> >> But it may still be better if SGX driver can provide function like: >> >> int sgx_validate_sigstruct(struct sigstruct *sig); >> >> for KVM to call, in case driver is changed (ex, to only allows LEs from some >> particular ones to run), but this is not necessary now. KVM changes can be >> done later when driver make the changes. >> >> Andy, >> >> Am I understanding correctly? Does this make sense to you? >> >> Thanks, >> -Kai > > Nope. I don't even understand the *beginnings* what that function would > do. I don't understand what the validation means here and what VMM would > do if that functions reports "success". The validation means that either the sigstruct->modulus or SHA256(sigstruct->modulus) should be in an 'approved white-list' maintained by the kernel (which I know doesn't exist now, but Andy kind of suggested we may or should have one in the future, I guess); otherwise the function returns an error to indicate the LE from the guest is "unapproved by the host kernel/driver". Andy, would you explain here? > > How that would work on a system where MSRs cannot be changed? This is simple: we simply won't allow the guest to choose its own IA32_SGXLEPUBKEYHASHn by specifying a 'lehash' value in the Qemu parameters when creating the guest. 
To elaborate, currently in my design Qemu has the below new parameters to support SGX: # qemu-system-x86_64 -sgx epc=<size>,lehash=<sha-256 hash>,lewr The 'epc=<size>' specifies the guest's EPC size obviously, 'lehash' specifies the guest's initial IA32_SGXLEPUBKEYHASHn (similar to the value configured in BIOS for a real machine), and 'lewr' specifies whether the guest's IA32_SGXLEPUBKEYHASHn can be changed by the OS at runtime. 'lehash' and 'lewr' are optional. If the MSRs cannot be changed on the physical machine, then we will fail to create the guest if either 'lehash' or 'lewr' is specified when creating the guest. > > In that kind of system the host OS must generate EINITTOKEN for the LE > running on inside the guest and maintain completely virtualized MSR > values for the guest. The host OS will not generate EINITTOKEN for the guest in any circumstances, as the EINITTOKEN will always come from the guest's EINIT instruction. KVM traps EINIT from the guest, gets both the SIGSTRUCT and EINITTOKEN from the EINIT leaf, updates the MSRs, and runs EINIT on behalf of the guest. Btw, the purpose of having KVM trap EINIT is to update the guest's virtual IA32_SGXLEPUBKEYHASHn to the physical MSRs before running EINIT. In fact KVM doesn't even need to trap EINIT, but could simply update the guest's MSR values to the real MSRs when the vcpu is scheduled in, if the SGX driver updates the host LE's hash to the MSRs before EINIT on the host. KVM is not trying to guarantee that EINIT runs successfully here, but simply to emulate the guest's IA32_SGXLEPUBKEYHASHn and EINIT in the guest. Thanks, -Kai > > /Jarkko >
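The trap-and-run-on-behalf flow described above can be sketched as follows. The EINIT leaf itself is stubbed with a toy check (compare only hash word 0, ignore the token) so the sketch is runnable in userspace; the names, the error value, and the reduced vcpu struct are all just for illustration.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t phys_hash[4];   /* stand-in for the physical MSRs */

/* Toy stand-in for the EINIT leaf: like hardware, fail when the LE
 * signer's hash doesn't match the MSRs.  Error value arbitrary here. */
#define SGX_INVALID_EINITTOKEN 16

static int do_einit(uint64_t sig_hash0, uint64_t einittoken)
{
    (void)einittoken;           /* token checking elided in this sketch */
    return phys_hash[0] == sig_hash0 ? 0 : SGX_INVALID_EINITTOKEN;
}

struct vcpu_sgx { uint64_t lepubkeyhash[4]; };

/* EINIT VM-exit handler: load the guest's cached hash into the real
 * MSRs (a wrmsr in real code), run EINIT on the guest's SIGSTRUCT and
 * EINITTOKEN, and hand the leaf's result back to the guest unchanged,
 * whether it succeeded or failed. */
static int handle_guest_einit(const struct vcpu_sgx *v,
                              uint64_t sig_hash0, uint64_t einittoken)
{
    for (int i = 0; i < 4; i++)
        phys_hash[i] = v->lepubkeyhash[i];
    return do_einit(sig_hash0, einittoken);
}
```

Note the handler never invents a result: whatever the leaf returns is what the guest sees, which is the "emulate hardware, don't change guest SW behavior" point made in this message.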
On Mon, Jun 12, 2017 at 10:45:07AM +1200, Huang, Kai wrote: > > > But it may still be better if SGX driver can provide function like: > > > > > > int sgx_validate_sigstruct(struct sigstruct *sig); > > > > > > for KVM to call, in case driver is changed (ex, to only allows LEs from some > > > particular ones to run), but this is not necessary now. KVM changes can be > > > done later when driver make the changes. > > > > > > Andy, > > > > > > Am I understanding correctly? Does this make sense to you? > > > > > > Thanks, > > > -Kai > > > > Nope. I don't even understand the *beginnings* what that function would > > do. I don't understand what the validation means here and what VMM would > > do if that functions reports "success". > > The validation means either the sigstruct->modulus or > SHA256(sigstruct->modulus) should be in a 'approved white-list' maintained > by kernel (which I know doesn't exist now, but Andy some kind suggested we > may or should have, in the future I guess), otherwise the function returns > error to indicate the LE from guest is "unapproved by host kernel/driver". > > Andy, would you explain here? That can be considered but I still have zero idea what this function is and what its relation to whitelist would be. > > How that would work on a system where MSRs cannot be changed? > > This is simple, we simply won't allow guest to choose its own > IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when > creating the guest. Why not? You could have virtual MSRs and ask host LE to generate token if they match to modulus. 
> To elaborate, currently in my design Qemu has below new parameters to > support SGX: > > # qemu-system-x86_64 -sgx, epc=<size>,lehash=<sha-256 hash>,lewr > > The 'epc=<size>' specifies guest's EPC size obviously, lehash specifies > guest's initial IA32_SGXLEPUBKEYHASHn (similar to the value configured in > BIOS for real machine), and 'lewr' specifies whether guest's > IA32_SGXLEPUBKEYHASHn can be changed by OS at runtime. The 'lehash' and > 'lewr' are optional. > > If MSRs cannot be changed on physical machine, then we will fail to create > guest if either 'lehash' or 'lewr' is specified when creating the guest. > > > > > In that kind of system the host OS must generate EINITTOKEN for the LE > > running on inside the guest and maintain completely virtualized MSR > > values for the guest. > > The host OS will not generate EINITTOKEN for guest in any circumstances, as > EINITTOKEN will always be from guest's EINIT instruction. KVM traps EINIT > from guest and gets both SIGSTRUCT and EINITTOKEN from the EINIT leaf, > update MSRs, and run EINIT on behalf of guest. Seriously sounds like a stupid constraint or I'm not getting something (which also might be the case). If you anyway trap EINIT, you could create a special case for guest LE. /Jarkko
On 6/12/2017 8:36 PM, Jarkko Sakkinen wrote: > On Mon, Jun 12, 2017 at 10:45:07AM +1200, Huang, Kai wrote: >>>> But it may still be better if SGX driver can provide function like: >>>> >>>> int sgx_validate_sigstruct(struct sigstruct *sig); >>>> >>>> for KVM to call, in case driver is changed (ex, to only allows LEs from some >>>> particular ones to run), but this is not necessary now. KVM changes can be >>>> done later when driver make the changes. >>>> >>>> Andy, >>>> >>>> Am I understanding correctly? Does this make sense to you? >>>> >>>> Thanks, >>>> -Kai >>> >>> Nope. I don't even understand the *beginnings* what that function would >>> do. I don't understand what the validation means here and what VMM would >>> do if that functions reports "success". >> >> The validation means either the sigstruct->modulus or >> SHA256(sigstruct->modulus) should be in a 'approved white-list' maintained >> by kernel (which I know doesn't exist now, but Andy some kind suggested we >> may or should have, in the future I guess), otherwise the function returns >> error to indicate the LE from guest is "unapproved by host kernel/driver". >> >> Andy, would you explain here? > > That can be considered but I still have zero idea what this function is > and what its relation to whitelist would be. The relation is that this function only returns success when sigstruct->modulus (or SHA256(sigstruct->modulus)) matches a value in the 'white-list'... If the function returns success, VMM does nothing special: it continues to run EINIT on behalf of the guest and injects the result into the guest (either success or failure; VMM doesn't care about the result but needs to report it to the guest to reflect HW behavior). If the function returns an error, VMM reports some error code to the guest, and doesn't even need to run EINIT anymore. But IMO we don't have to add this function now. If your driver chooses to have such a 'white-list', we can do this in the future.
> >>> How that would work on a system where MSRs cannot be changed? >> >> This is simple, we simply won't allow guest to choose its own >> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when >> creating the guest. > > Why not? You could have virtual MSRs and ask host LE to generate token > if they match to modulus. The guest has its own LE running inside, and the guest's LE will generate tokens for enclaves in the guest. The host will not generate tokens for the guest under any circumstances, because this is entirely the guest's behavior. Virtualization is only about emulating hardware behavior, not about assuming, depending on, or changing the guest's SW behavior. We are not trying to make sure EINIT will run successfully in the guest; instead, we are trying to make sure EINIT shows exactly the same behavior as it does on a physical machine -- either success or failure, according to the guest's sigstruct and token. If EINIT in the guest is supposed to fail (ex, an incorrect token was generated by the guest's LE), KVM will need to inject the expected error into the guest, to reflect real hardware behavior. One more example: if KVM chose not to trap EINIT at all, how could you provide a token generated by the host LE to the guest? You simply couldn't. The host LE will never generate a token for the guest, under any circumstance. > >> To elaborate, currently in my design Qemu has below new parameters to >> support SGX: >> >> # qemu-system-x86_64 -sgx, epc=<size>,lehash=<sha-256 hash>,lewr >> >> The 'epc=<size>' specifies guest's EPC size obviously, lehash specifies >> guest's initial IA32_SGXLEPUBKEYHASHn (similar to the value configured in >> BIOS for real machine), and 'lewr' specifies whether guest's >> IA32_SGXLEPUBKEYHASHn can be changed by OS at runtime. The 'lehash' and >> 'lewr' are optional. >> >> If MSRs cannot be changed on physical machine, then we will fail to create >> guest if either 'lehash' or 'lewr' is specified when creating the guest.
>> >>> >>> In that kind of system the host OS must generate EINITTOKEN for the LE >>> running on inside the guest and maintain completely virtualized MSR >>> values for the guest. >> >> The host OS will not generate EINITTOKEN for guest in any circumstances, as >> EINITTOKEN will always be from guest's EINIT instruction. KVM traps EINIT >> from guest and gets both SIGSTRUCT and EINITTOKEN from the EINIT leaf, >> update MSRs, and run EINIT on behalf of guest. > > Seriously sounds like a stupid constraint or I'm not getting something > (which also might be the case). If you anyway trap EINIT, you could > create a special case for guest LE. This is not a constraint; KVM has to emulate hardware correctly. For this part please see my explanation above. And let me explain the purpose of trapping EINIT again here. When the guest is about to run EINIT, if the guest's SHA256(sigstruct->modulus) matches the guest's virtual IA32_SGXLEPUBKEYHASHn (and if the other fields in the sigstruct and token are correctly populated as well), KVM needs to make sure that EINIT will run successfully in the guest, even if the physical IA32_SGXLEPUBKEYHASHn are not equal to the guest's virtual MSRs at that particular time. This is because, given the same conditions, EINIT would run successfully on a physical machine, and KVM needs to emulate the right HW behavior. How do we make sure the guest's EINIT runs successfully in this case? We need to write the guest's virtual MSR values to the physical MSRs before the guest runs EINIT. The whole purpose of trapping EINIT from the guest is to guarantee that, before EINIT, KVM can write the guest's virtual MSR values to the physical MSRs. Like I said before, KVM doesn't even need to trap EINIT for this purpose; ex, KVM can write the guest's virtual MSRs to the real MSRs when the vcpu is scheduled in (which is exactly what I did in this patch, btw), but this is considered performance unfriendly, so Andy and Sean suggested we do it when trapping EINIT from the guest.
Another problem with updating the MSRs when the vcpu is scheduled in is that it will break the SGX driver if the driver does some performance optimization around updating the MSRs, such as keeping per-cpu variables with the current MSR values and only updating the MSRs when the per-cpu value differs from the value to be written. Because of this, Andy and Sean also suggested that the SGX driver provide a function that updates the MSRs and runs EINIT together, protected by a mutex, and that KVM simply call that function (of course KVM will provide the sigstruct and token by trapping EINIT from the guest). This covers MSR updates and EINIT for both the host and KVM guests. IMO this is a good idea as well, and you should consider doing it this way. Thanks, -Kai > > /Jarkko >
On Mon, Jun 12, 2017 at 2:53 AM, Huang, Kai <kai.huang@linux.intel.com> wrote: >> >>>> How that would work on a system where MSRs cannot be changed? >>> >>> >>> This is simple, we simply won't allow guest to choose its own >>> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when >>> creating the guest. >> >> >> Why not? You could have virtual MSRs and ask host LE to generate token >> if they match to modulus. > > > The guest has its own LE running inside, and guest's LE will generate token > for enclaves in guest. The host will not generate token for guest in any > circumstances, because this is totally guest's behavior. > > Virtualization is only about to emulate hardware's behavior, but not to > assume, or depend on, or change guest's SW behavior. We are not trying to > make sure EINIT will run successfully in guest, instead, we are trying to > make sure EINIT will be just the same behavior as it runs on physical > machine -- either success or failure, according to guest's sigstruct and > token. I disagree. Virtualization can do whatever it wants, but a pretty strong constraint is that the guest should be at least reasonably functional. A Windows guest, for example, shouldn't BSOD. But the host most certainly can restrict what the guest can do. If a guest is given pass-through access to a graphics card, the host is well within its rights to impose thermal policy, prevent reflashing, etc. Similarly, if a guest is given access to SGX, the host can and should impose required policy on the guest. If this means that an EINIT that would have succeeded at host CPL 0 fails in the guest, so be it. Of course, there isn't much in the way of host policy right now, so this may not require any particular action until interesting host policy shows up. > This is not constraint, but KVM has to emulate hardware correctly. For this > part please see my explanation above. > > And let me explain the purpose of trapping EINIT again here.
> > When guest is about to run EINIT, if guest's SHA256(sigstruct->modulus) > matches guest's virtual IA32_SGXLEPUBKEYHASHn (and if others are correctly > populated in sigstruct and token as well), KVM needs to make sure that EINIT > will run successfully in guest, even physical IA32_SGXLEPUBKEYHASHn are not > equal to guest's virtual MSRs at this particular time. True, so long as it doesn't contradict host policy to do so. > This is because given > the same condition, the EINIT will run successfully on physical machine. KVM > needs to emulate the right HW behavior. No. The host needs to do this because KVM needs to work and be useful, not because KVM needs to precisely match CPU behavior as seen by VMX root. To avoid confusion, I don't believe I've ever said that guests should be restricted in which LEs they can use. The types of restrictions I'm talking about are that, if the host prevents user code from running, say, a provisioning enclave that isn't whitelisted, then the guest should have the same restriction applied. This type of restriction *can't* be usefully done by restricting acceptable MSR values, but it's trivial by trapping EINIT.
On 6/13/2017 4:24 AM, Andy Lutomirski wrote: > On Mon, Jun 12, 2017 at 2:53 AM, Huang, Kai <kai.huang@linux.intel.com> wrote: >>> >>>>> How that would work on a system where MSRs cannot be changed? >>>> >>>> >>>> This is simple, we simply won't allow guest to choose its own >>>> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when >>>> creating the guest. >>> >>> >>> Why not? You could have virtual MSRs and ask host LE to generate token >>> if they match to modulus. >> >> >> The guest has its own LE running inside, and guest's LE will generate token >> for enclaves in guest. The host will not generate token for guest in any >> circumstances, because this is totally guest's behavior. >> >> Virtualization is only about to emulate hardware's behavior, but not to >> assume, or depend on, or change guest's SW behavior. We are not trying to >> make sure EINIT will run successfully in guest, instead, we are trying to >> make sure EINIT will be just the same behavior as it runs on physical >> machine -- either success or failure, according to guest's sigstruct and >> token. > > I disagree. Virtualization can do whatever it wants, but a pretty > strong constraint is that the guest should be at least reasonably > function. Virtualization can only do whatever it wants with the parts it can trap and emulate, and such *whatever* should not break the HW behavior presented to the guest. This is fundamental to virtualization. I don't know whether there are some *minor* cases where the HW behavior emulation is not fully respected, but I believe, if they exist, those cases are extremely rare, and we certainly have a good reason why such cases are OK. Anyway I'll leave this part to the KVM maintainers. Paolo, Radim, sorry this thread has gotten a little long by now. Can you comment on this? > A Windows guest, for example, shouldn't BSOD. But the host > most certainly can restrict what the guest can do.
> If a guest is > given pass-through access to a graphics card, the host is well within > its rights to impose thermal policy, prevent reflashing, etc. You need to see whether those policies are provided via the PCIE configuration space or via the device's registers. If the policies are implemented via the PCIE configuration space (which is trapped and emulated by VMM/Qemu), you certainly can apply restrictions by emulating PCIE configuration space accesses. But if those policies are done via device registers, which the driver controls entirely, it's totally up to the driver and the host cannot apply any policies. You can choose whether or not to pass through specific BARs of the device (ex, you cannot pass through the BARs containing the MSI registers), but once the BARs are passed through to the guest, you cannot control the guest's behavior. > Similarly, if a guest is given access to SGX, the host can and should > impose required policy on the guest. If this means that an EINIT that > would have succeeded at host CPL 0 fails in the guest, so be it. This is completely different, and I disagree. If EINIT can run at host CPL 0, it can run at the guest's CPL 0 as well, unless the hardware doesn't support this, in which case we cannot even support SGX virtualization. The exception is: if the HW provided, for example, some specific capability bits that control whether EINIT can run at CPL 0, and the hypervisor were able to trap those bits, then the hypervisor could manipulate them to make the guest think the HW doesn't allow EINIT at CPL 0, in which case it would be quite reasonable that EINIT cannot run at CPL 0 in the guest (because that is the HW behavior). > > Of course, there isn't much in the way of host policy right now, so > this may not require any particular action until interesting host > policy shows up. > >> This is not constraint, but KVM has to emulate hardware correctly. For this >> part please see my explanation above. >> >> And let me explain the purpose of trapping EINIT again here.
>> >> When guest is about to run EINIT, if guest's SHA256(sigstruct->modulus) >> matches guest's virtual IA32_SGXLEPUBKEYHASHn (and if others are correctly >> populated in sigstruct and token as well), KVM needs to make sure that EINIT >> will run successfully in guest, even physical IA32_SGXLEPUBKEYHASHn are not >> equal to guest's virtual MSRs at this particular time. > > True, so long as it doesn't contradict host policy to do so. > >> This is because given >> the same condition, the EINIT will run successfully on physical machine. KVM >> needs to emulate the right HW behavior. > > No. The host needs to do this because KVM needs to work and be > useful, not because KVM needs to precisely match CPU behavior as seen > by VMX root. If we need to break HW behavior to make SGX useful, the maintainers may choose not to support SGX virtualization, as from the HW point of view the feature simply cannot be virtualized. Anyway I'll leave this to the KVM maintainers to determine. > > To avoid confusion, I don't believe I've ever said that guests should > be restricted in which LEs they can use. The types of restrictions > I'm talking about are that, if the host prevents user code from > running, say, a provisioning enclave that isn't whitelisted, then the > guest should have the same restriction applied. This type of > restriction *can't* be usefully done by restricting acceptable MSR > values, but it's trivial by trapping EINIT. OK. Sorry I didn't get your point before. I thought it was about restricting the LE. I don't know whether the SGX driver will have restrictions on running the provisioning enclave. In my understanding the provisioning enclave always comes from Intel. However I am not an expert here and may well be wrong. Can you point out *exactly* which restrictions on the host must/should be applied to the guest, so that Jarkko can know whether he will support those restrictions or not? Otherwise I don't think we even need to talk about this topic at the current stage. Thanks, -Kai >
On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: > > I don't know whether SGX driver will have restrict on running provisioning > enclave. In my understanding provisioning enclave is always from Intel. > However I am not expert here and probably be wrong. Can you point out > *exactly* what restricts in host must/should be applied to guest so that > Jarkko can know whether he will support those restricts or not? Otherwise I > don't think we even need to talk about this topic at current stage. > The whole point is that I don't know. But here are two types of restriction I can imagine demand for: 1. Only a particular approved provisioning enclave may run (be it Intel's or otherwise -- with a non-Intel LE, I think you can launch a non-Intel provisioning enclave). This would be done to restrict what types of remote attestation can be done. (Intel supplies a remote attestation service that uses some contractual policy that I don't know. Maybe a system owner wants a different policy applied to ISVs.) Imposing this policy on guests more or less requires filtering EINIT. 2. For kiosk-ish or single-purpose applications, I can imagine that you would want to allow a specific list of enclave signers or even enclave hashes. Maybe you would allow exactly one enclave hash. You could kludge this up with a restrictive LE policy, but you could also do it for real by implementing the specific restriction in the kernel. Then you'd want to impose it on the guest, and you'd do it by filtering EINIT. For the time being, I don't expect either policy to be implemented right away, but I bet that something like this will eventually happen.
On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote: > > > This is simple, we simply won't allow guest to choose its own > > > IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when > > > creating the guest. > > > > Why not? You could have virtual MSRs and ask host LE to generate token > > if they match to modulus. > > The guest has its own LE running inside, and guest's LE will generate token > for enclaves in guest. The host will not generate token for guest in any > circumstances, because this is totally guest's behavior. Why can't the host LE generate the token without the guest knowing it, and supply it with EINIT? > > Seriously sounds like a stupid constraint or I'm not getting something > > (which also might be the case). If you anyway trap EINIT, you could > > create a special case for guest LE. > > This is not constraint, but KVM has to emulate hardware correctly. For this > part please see my explanation above. I'm being totally honest with you now: your explanation makes absolutely zero sense to me. You don't need 1000+ words to explain the scenarios where the "host as a delegate LE" approach would go wrong. Please just pinpoint the scenarios where it goes wrong. I'll ignore the text below. /Jarkko
On Tue, Jun 13, 2017 at 09:57:18PM +0300, Jarkko Sakkinen wrote: > On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote: > > > > This is simple, we simply won't allow guest to choose its own > > > > IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when > > > > creating the guest. > > > > > > Why not? You could have virtual MSRs and ask host LE to generate token > > > if they match to modulus. > > > > The guest has its own LE running inside, and guest's LE will generate token > > for enclaves in guest. The host will not generate token for guest in any > > circumstances, because this is totally guest's behavior. > > Why can't host LE generate the token without guest knowning it and > supply it with EINIT? > > > Seriously sounds like a stupid constraint or I'm not getting something > > > (which also might be the case). If you anyway trap EINIT, you could > > > create a special case for guest LE. > > > > This is not constraint, but KVM has to emulate hardware correctly. For this > > part please see my explanation above. > > I'm being now totally honest to your: your explanation makes absolutely > zero sense to me. You don't need a 1000+ words to explain the scenarios > where "host as a delegate LE" approach would go wrong. > > Please just pinpoint the scenarios where it goes wrong. I'll ignore > the text below. > > /Jarkko When I've been reading this discussion the biggest lesson for me has been that this is a new argument for having in-kernel LE in addition to what Andy has stated before: the MSRs *never* need to be updated on behalf of the guest. /Jarkko
On Tue, 2017-06-13 at 22:05 +0300, Jarkko Sakkinen wrote: > On Tue, Jun 13, 2017 at 09:57:18PM +0300, Jarkko Sakkinen wrote: > > > > On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote: > > > > > > > > > > > > > > > > > This is simple, we simply won't allow guest to choose its own > > > > > IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter > > > > > when > > > > > creating the guest. > > > > Why not? You could have virtual MSRs and ask host LE to generate token > > > > if they match to modulus. > > > The guest has its own LE running inside, and guest's LE will generate > > > token > > > for enclaves in guest. The host will not generate token for guest in any > > > circumstances, because this is totally guest's behavior. > > Why can't host LE generate the token without guest knowning it and > > supply it with EINIT? > > > > > > > > > > > Seriously sounds like a stupid constraint or I'm not getting something > > > > (which also might be the case). If you anyway trap EINIT, you could > > > > create a special case for guest LE. > > > This is not constraint, but KVM has to emulate hardware correctly. For > > > this > > > part please see my explanation above. > > I'm being now totally honest to your: your explanation makes absolutely > > zero sense to me. You don't need a 1000+ words to explain the scenarios > > where "host as a delegate LE" approach would go wrong. > > > > Please just pinpoint the scenarios where it goes wrong. I'll ignore > > the text below. > > > > /Jarkko > When I've been reading this discussion the biggest lesson for me has > been that this is a new argument for having in-kernel LE in addition > to what Andy has stated before: the MSRs *never* need to be updated on > behalf of the guest. > > /Jarkko The MSRs need to be written to run a LE in the guest, EINITTOKEN can't be used to EINIT an enclave that is requesting access to the EINITTOKENKEY, i.e. a LE. 
Preventing the guest from running its own LE is not an option, as the owner of the LE, e.g. guest kernel or userspace daemon, will likely disable SGX if its LE fails to run (including any ECALLS into the LE). Allowing a guest to run a LE doesn't mean the host can't ignore/discard the guest's EINITTOKENs, assuming the host traps EINIT.
On 6/14/2017 6:57 AM, Jarkko Sakkinen wrote: > On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote: >>>> This is simple, we simply won't allow guest to choose its own >>>> IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when >>>> creating the guest. >>> >>> Why not? You could have virtual MSRs and ask host LE to generate token >>> if they match to modulus. >> >> The guest has its own LE running inside, and guest's LE will generate token >> for enclaves in guest. The host will not generate token for guest in any >> circumstances, because this is totally guest's behavior. > > Why can't host LE generate the token without guest knowning it and > supply it with EINIT? I have said many times: virtualization is only about emulating hardware behavior. The same software that runs on a native machine is supposed to run in the guest. I don't care how you implement the host side -- whether the LE delegates tokens for other enclaves or not. If your implementation works on the host, the exact same driver works in the guest. As I said several times, I don't even have to trap EINIT, in which case you simply cannot generate a token for the guest's EINIT, because there is no EINIT from the guest! >>> Seriously sounds like a stupid constraint or I'm not getting something >>> (which also might be the case). If you anyway trap EINIT, you could >>> create a special case for guest LE. >> >> This is not constraint, but KVM has to emulate hardware correctly. For this >> part please see my explanation above. > > I'm being now totally honest to your: your explanation makes absolutely > zero sense to me. You don't need a 1000+ words to explain the scenarios > where "host as a delegate LE" approach would go wrong. This doesn't require 1000+ words. It is simple, and I have explained it: this is not a constraint; KVM has to emulate hardware correctly.
The additional 1000+ words that I spent lots of time typing were trying to explain to you why we (at least Andy, Sean, and Haim, I believe) agreed to trap EINIT. > > Please just pinpoint the scenarios where it goes wrong. I'll ignore > the text below. Of course you can, if you already understand why we agreed to trap EINIT. I think Andy's comments just made things more complicated. He is trying to solve problems that don't exist in your implementation, and I believe we can handle them in the future, if they come up. So let's focus on how to handle updating the MSRs and EINIT. I believe Andy, Sean, and I are all on the same page; I'm not sure whether you are or not. For me, I am perfectly fine not trapping EINIT (which is exactly what this patch does), but then you have to guarantee the correctness of the host side. > > /Jarkko >
On Tue, Jun 13, 2017 at 01:13:04PM -0700, Sean Christopherson wrote: > The MSRs need to be written to run a LE in the guest, EINITTOKEN can't be used > to EINIT an enclave that is requesting access to the EINITTOKENKEY, i.e. a LE. > Preventing the guest from running its own LE is not an option, as the owner of > the LE, e.g. guest kernel or userspace daemon, will likely disable SGX if its LE > fails to run (including any ECALLS into the LE). Allowing a guest to run a LE > doesn't mean the host can't ignore/discard the guest's EINITTOKENs, assuming the > host traps EINIT. [I started one week of leave today but will peek at the MLs occasionally, so expect some delay in my follow-up responses.] Please, let's not use the term ECALL in these discussions. It's neither a hardware nor a kernel-specific concept. It's an abstraction that exists only in the Intel SDK. I have neither ECALLs nor OCALLs in my LE, for example. There are enough moving parts without that abstraction. I'm looking at the section "EINIT - Initialize an Enclave for Execution" from the SDM. I'm not seeing a branch in the pseudo code that checks for ATTRIBUTES.EINITTOKENKEY. 39.1.4 states that "Only Launch Enclaves are allowed to launch without a valid token." I'm not sure what I should deduce from that because that statement is *incorrect*. If you control the MSRs, you can launch anything you want to launch. I guess we should make a bug report of this section as it's complete nonsense? The table 41-56 does not show any key material bound to the key hash defined in the MSRs. Instead of teaching me stuff that I already know, I would just like to be pointed to exactly where the "side-effect" is that creates the constraint you are claiming. I can then update the documentation so that we don't have to go through this discussion anymore :-) /Jarkko
On Wed, Jun 14, 2017 at 11:28:32AM +1200, Huang, Kai wrote: > > > On 6/14/2017 6:57 AM, Jarkko Sakkinen wrote: > > On Mon, Jun 12, 2017 at 09:53:41PM +1200, Huang, Kai wrote: > > > > > This is simple, we simply won't allow guest to choose its own > > > > > IA32_SGXLEPUBKEYHASHn by specifying 'lehash' value in Qemu parameter when > > > > > creating the guest. > > > > > > > > Why not? You could have virtual MSRs and ask host LE to generate token > > > > if they match to modulus. > > > > > > The guest has its own LE running inside, and guest's LE will generate token > > > for enclaves in guest. The host will not generate token for guest in any > > > circumstances, because this is totally guest's behavior. > > > > Why can't host LE generate the token without guest knowning it and > > supply it with EINIT? > > I have said many times, virtualization is only about to emulate hardware > behavior. The same software runs in native machine is supposed to run in > guest. I don't care on host how you implement -- whether LE delegates token > for other enclave or not. Your implementation works on host, the exact > driver works in guest. I said several times, I even don't have to trap > EINIT, in which case you simply cannot generate token for EINIT from guest, > because there is no EINIT from guest! > > > > > > Seriously sounds like a stupid constraint or I'm not getting something > > > > (which also might be the case). If you anyway trap EINIT, you could > > > > create a special case for guest LE. > > > > > > This is not constraint, but KVM has to emulate hardware correctly. For this > > > part please see my explanation above. > > > > I'm being now totally honest to your: your explanation makes absolutely > > zero sense to me. You don't need a 1000+ words to explain the scenarios > > where "host as a delegate LE" approach would go wrong. > > This doesn't require 1000+ words.. 
> This is simple, and I have explained: > > This is not constraint, but KVM has to emulate hardware correctly. > > The additional 1000+ words that I spent lots of time on typing is trying to > explain to you -- why we (at least Andy, Sean, Haim I believe) agreed to > choose to trap EINIT. > > > > > Please just pinpoint the scenarios where it goes wrong. I'll ignore > > the text below. > > Of course you can, if you already understand why we agreed to choose to trap > EINIT. > > I think Andy's comments just made things more complicated. He is trying to > solve problems that don't exist in your implementation, and I believe we can > handle this in the future, if those problems come up. > > So let's focus on how to handle updating MSRs and EINIT. I believe Andy, me, > and Sean are all on the same page. Not sure whether you are or not. > > For me I am perfectly fine not to trap EINIT (which is exactly this patch > did), but you have to guarantee the correctness of host side. > > > > > /Jarkko > > I'm not yet seeing why the MSRs would ever need to be updated. See my response to Sean for details. There's probably some detail in the SDM that I'm not observing. /Jarkko
Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > On Tue, Jun 13, 2017 at 01:13:04PM -0700, Sean Christopherson wrote: > > > The MSRs need to be written to run a LE in the guest, EINITTOKEN can't be > > used to EINIT an enclave that is requesting access to the EINITTOKENKEY, > > i.e. a LE. Preventing the guest from running its own LE is not an option, > > as the owner of the LE, e.g. guest kernel or userspace daemon, will likely > > disable SGX if its LE fails to run (including any ECALLS into the LE). > > Allowing a guest to run a LE doesn't mean the host can't ignore/discard the > > guest's EINITTOKENs, assuming the host traps EINIT. > > [I started one week leave today but will peek MLs seldomly so except > some delay in my follow up responses] > > Please, lets not use the term ECALL in these discussions. It's neither > hardware nor kernel specific concept. It's abstraction that exists only > in the Intel SDK. I have neither ECALLs nor OCALLs in my LE for example. > There are enough moving parts without such abstraction. > > I'm looking at the section "EINIT - Initialize an Enclave for Execution" > from the SDM. I'm not seeing a branch in the pseudo code that checks for > ATTRIBUTES.EINITTOKENKEY.

(* if controlled ATTRIBUTES are set, SIGSTRUCT must be signed using an authorized key *)
CONTROLLED_ATTRIBUTES <- 0000000000000020H;
IF (((DS:RCX.ATTRIBUTES & CONTROLLED_ATTRIBUTES) != 0) and
    (TMP_MRSIGNER != IA32_SGXLEPUBKEYHASH))
    RFLAG.ZF <- 1;
    RAX <- SGX_INVALID_ATTRIBUTE;
    GOTO EXIT;
FI;

Bit 5, i.e. 20H, corresponds to the EINITTOKENKEY. This is also covered in the text description under Intel SGX Launch Control Configuration - "The hash of the public key used to sign the SIGSTRUCT of the Launch Enclave must equal the value in the IA32_SGXLEPUBKEYHASH MSRs." > 39.1.4 states that "Only Launch Enclaves are allowed to launch without a > valid token." I'm not sure what I should deduce from that because that > statement is *incorrect*.
If you control the MSRs, you can launch > anything you want to launch. I guess we should make a bug report of this > section as it's complete nonsense? I wouldn't call it complete nonsense, there are far more egregious ambiguities in the SDM. If you read the statement in the context of someone learning about SGX, it makes perfect sense: if it's not a launch enclave, it needs a token. Sure, rewording the statement to something like "Only enclaves whose public key hash equals the value in the IA32_SGXLEPUBKEYHASH MSRs are allowed to launch without a token." is technically more accurate, but I wouldn't describe the current wording as "complete nonsense". > The table 41-56 does not show any key material bound to key hash defined > in the MSRs. > > Instead of teaching me stuff that I already know I would just like to > get pinpointed where is the "side-effect" that makes the constraint that > you are claiming. I can then update the documentation so that we don't > have to go through this discussion anymore :-) > > /Jarkko
On Wed, Jun 14, 2017 at 03:11:34PM +0000, Christopherson, Sean J wrote: > Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > On Tue, Jun 13, 2017 at 01:13:04PM -0700, Sean Christopherson wrote: > > > > > The MSRs need to be written to run a LE in the guest, EINITTOKEN can't be > > > used to EINIT an enclave that is requesting access to the EINITTOKENKEY, > > > i.e. a LE. Preventing the guest from running its own LE is not an option, > > > as the owner of the LE, e.g. guest kernel or userspace daemon, will likely > > > disable SGX if its LE fails to run (including any ECALLS into the LE). > > > Allowing a guest to run a LE doesn't mean the host can't ignore/discard the > > > guest's EINITTOKENs, assuming the host traps EINIT. > > > > [I started one week leave today but will peek MLs seldomly so except > > some delay in my follow up responses] > > > > Please, lets not use the term ECALL in these discussions. It's neither > > hardware nor kernel specific concept. It's abstraction that exists only > > in the Intel SDK. I have neither ECALLs nor OCALLs in my LE for example. > > There are enough moving parts without such abstraction. > > > > I'm looking at the section "EINIT - Initialize an Enclave for Execution" > > from the SDM. I'm not seeing a branch in the pseudo code that checks for > > ATTRIBUTES.EINITTOKENKEY. > > (* if controlled ATTRIBUTES are set, SIGSTRUCT must be signed using an authorized key *) > CONTROLLED_ATTRIBUTES <- 0000000000000020H; > IF (((DS:RCX.ATTRIBUTES & CONTROLLED_ATTRIBUTES) != 0) and (TMP_MRSIGNER != IA32_SGXLEPUBKEYHASH)) > RFLAG.ZF <- 1; > RAX <- SGX_INVALID_ATTRIBUTE; > GOTO EXIT; > FI; > > Bit 5, i.e. 20H, corresponds to the EINITTOKENKEY. This is also covered in the > text description under Intel SGX Launch Control Configuration - "The hash of the > public key used to sign the SIGSTRUCT of the Launch Enclave must equal the value > in the IA32_SGXLEPUBKEYHASH MSRs." Thanks. 
I wonder why the naming is ambiguous (the value is exactly the same as the value of ATTRIBUTES.EINITTOKENKEY but the name is different) but there it is. > > 39.1.4 states that "Only Launch Enclaves are allowed to launch without a > > valid token." I'm not sure what I should deduce from that because that > > statement is *incorrect*. If you control the MSRs, you can launch > > anything you want to launch. I guess we should make a bug report of this > > section as it's complete nonsense? > > I wouldn't call it complete nonsense, there are far more egregious ambiguities > in the SDM. If you read the statement in the context of someone learning about > SGX, it makes perfect sense: if it's not a launch enclave, it needs a token. > Sure, rewording the statement to something like "Only enclaves whose public key > hash equals the value in the IA32_SGXLEPUBKEYHASH MSRs are allowed to launch > without a token." is technically more accurate, but I wouldn't describe the > current wording as "complete nonsense". Agreed! That was a harsh overstatement. I do think accuracy still matters in this kind of material when cryptography is involved. I'll make updates to intel_sgx.rst. It's good to have it documented when the virtualization stuff is upstreamed. /Jarkko
On 6/13/2017 11:00 AM, Andy Lutomirski wrote: > On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: >> >> I don't know whether SGX driver will have restrict on running provisioning >> enclave. In my understanding provisioning enclave is always from Intel. >> However I am not expert here and probably be wrong. Can you point out >> *exactly* what restricts in host must/should be applied to guest so that >> Jarkko can know whether he will support those restricts or not? Otherwise I >> don't think we even need to talk about this topic at current stage. >> > > The whole point is that I don't know. But here are two types of > restriction I can imagine demand for: > > 1. Only a particular approved provisioning enclave may run (be it > Intel's or otherwise -- with a non-Intel LE, I think you can launch a > non-Intel provisioning enclave). This would be done to restrict what > types of remote attestation can be done. (Intel supplies a remote > attestation service that uses some contractual policy that I don't > know. Maybe a system owner wants a different policy applied to ISVs.) > Imposing this policy on guests more or less requires filtering EINIT. Hi Andy, Sorry for the late reply. What is the issue if host and guest run provisioning enclaves from different vendors, for example, the host runs Intel's provisioning enclave and a guest runs another vendor's provisioning enclave? Or different guests run provisioning enclaves from different vendors? One reason I am asking is that, on Xen (where we don't have the concept of a *host*), it's likely that we won't apply any policy at the Xen hypervisor at all, and guests will be able to run any enclave from any signer as they wish. Sorry that I don't understand (or have somewhat forgotten) the issues here. > > 2. For kiosk-ish or single-purpose applications, I can imagine that > you would want to allow a specific list of enclave signers or even > enclave hashes. Maybe you would allow exactly one enclave hash. 
You > could kludge this up with a restrictive LE policy, but you could also > do it for real by implementing the specific restriction in the kernel. > Then you'd want to impose it on the guest, and you'd do it by > filtering EINIT. Assuming the enclave hash means the measurement of the enclave, and assuming we have a policy that only allows enclaves from one signer to run, would you also elaborate on what the issue is if host and guest run enclaves from different signers? If the host has such a policy, and we allow creating guests on such a host, I think that typically we will have the same policy in the guest (vetted by the guest's kernel). The owner of that host should be aware of the risk (if there is any) of creating a guest and running enclaves inside it. Thanks, -Kai > > For the time being, I don't expect either policy to be implemented > right away, but I bet that something like this will eventually happen. >
On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: > > > On 6/13/2017 11:00 AM, Andy Lutomirski wrote: >> >> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com> >> wrote: >>> >>> >>> I don't know whether SGX driver will have restrict on running >>> provisioning >>> enclave. In my understanding provisioning enclave is always from Intel. >>> However I am not expert here and probably be wrong. Can you point out >>> *exactly* what restricts in host must/should be applied to guest so that >>> Jarkko can know whether he will support those restricts or not? Otherwise >>> I >>> don't think we even need to talk about this topic at current stage. >>> >> >> The whole point is that I don't know. But here are two types of >> restriction I can imagine demand for: >> >> 1. Only a particular approved provisioning enclave may run (be it >> Intel's or otherwise -- with a non-Intel LE, I think you can launch a >> non-Intel provisioning enclave). This would be done to restrict what >> types of remote attestation can be done. (Intel supplies a remote >> attestation service that uses some contractual policy that I don't >> know. Maybe a system owner wants a different policy applied to ISVs.) >> Imposing this policy on guests more or less requires filtering EINIT. > > > Hi Andy, > > Sorry for late reply. > > What is the issue if host and guest run provisioning enclave from different > vendor, for example, host runs intel's provisioning enclave, and guest runs > other vendor's provisioning enclave? Or different guests run provisioning > enclaves from different vendors? There's no issue unless someone has tried to impose a policy. There is clearly at least some interest in having policies that affect what enclaves can run -- otherwise there wouldn't be LEs in the first place. 
> > One reason I am asking is that, on Xen (where we don't have concept of > *host*), it's likely that we won't apply any policy at Xen hypervisor at > all, and guests will be able to run any enclave from any signer as their > wish. That seems entirely reasonable. Someone may eventually ask Xen to add support for SGX enclave restrictions, in which case you'll either have to tell them that it won't happen or implement it. > > Sorry that I don't understand (or kind of forgot) the issues here. > >> >> 2. For kiosk-ish or single-purpose applications, I can imagine that >> you would want to allow a specific list of enclave signers or even >> enclave hashes. Maybe you would allow exactly one enclave hash. You >> could kludge this up with a restrictive LE policy, but you could also >> do it for real by implementing the specific restriction in the kernel. >> Then you'd want to impose it on the guest, and you'd do it by >> filtering EINIT. > > Assuming the enclave hash means measurement of enclave, and assuming we have > a policy that we only allow enclave from one signer to run, would you also > elaborate the issue that, if host and guest run enclaves from different > signer? If host has such policy, and we are allowing creating guests on such > host, I think that typically we will have the same policy in the guest Yes, I presume this too, but. > (vetted by guest's kernel). The owner of that host should be aware of the > risk (if there's any) by creating guest and run enclave inside it. No. The host does not trust the guest in general. If the host has a policy that the only enclave that shall run is X, that doesn't mean that the host shall reject all enclaves requested by the normal userspace API except X but that, if /dev/kvm is used, then the user is magically trusted to not load a guest that fails to respect the host policy. It means that the only enclave that shall run is X regardless of what interface is used. 
The host must only allow X to be loaded by its userspace and the host must only allow X to be loaded by a guest.
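The enforcement model Andy describes -- one policy, applied identically whether an enclave arrives via the native userspace API or via a guest -- can be sketched as a single check shared between both paths. Everything below (names, the allowlist layout, the two entry points) is hypothetical illustration, not an actual SGX driver or KVM API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HASH_LEN 32  /* SHA-256 of the enclave signer's public key */

/* Hypothetical allowlist: the single approved signer "X"
 * (remaining bytes default to zero for this sketch). */
static const unsigned char approved_signers[][HASH_LEN] = {
    { 0xaa, 0xbb },
};

/* One policy function, so userspace and guests face the same rule. */
static bool host_policy_allows(const unsigned char mrsigner[HASH_LEN])
{
    size_t n = sizeof(approved_signers) / sizeof(approved_signers[0]);
    for (size_t i = 0; i < n; i++)
        if (memcmp(mrsigner, approved_signers[i], HASH_LEN) == 0)
            return true;
    return false;
}

/* Native path: consulted before the driver executes EINIT for userspace. */
static bool sgx_ioctl_einit_allowed(const unsigned char mrsigner[HASH_LEN])
{
    return host_policy_allows(mrsigner);
}

/* Guest path: consulted from the hypervisor's trapped-EINIT handler. */
static bool kvm_einit_trap_allowed(const unsigned char mrsigner[HASH_LEN])
{
    return host_policy_allows(mrsigner);
}
```

The point of routing both callers through `host_policy_allows()` is exactly the one made above: using /dev/kvm must not become a side door around the host's enclave policy.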
On 6/16/2017 4:11 PM, Andy Lutomirski wrote: > On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: >> >> >> On 6/13/2017 11:00 AM, Andy Lutomirski wrote: >>> >>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com> >>> wrote: >>>> >>>> >>>> I don't know whether SGX driver will have restrict on running >>>> provisioning >>>> enclave. In my understanding provisioning enclave is always from Intel. >>>> However I am not expert here and probably be wrong. Can you point out >>>> *exactly* what restricts in host must/should be applied to guest so that >>>> Jarkko can know whether he will support those restricts or not? Otherwise >>>> I >>>> don't think we even need to talk about this topic at current stage. >>>> >>> >>> The whole point is that I don't know. But here are two types of >>> restriction I can imagine demand for: >>> >>> 1. Only a particular approved provisioning enclave may run (be it >>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a >>> non-Intel provisioning enclave). This would be done to restrict what >>> types of remote attestation can be done. (Intel supplies a remote >>> attestation service that uses some contractual policy that I don't >>> know. Maybe a system owner wants a different policy applied to ISVs.) >>> Imposing this policy on guests more or less requires filtering EINIT. >> >> >> Hi Andy, >> >> Sorry for late reply. >> >> What is the issue if host and guest run provisioning enclave from different >> vendor, for example, host runs intel's provisioning enclave, and guest runs >> other vendor's provisioning enclave? Or different guests run provisioning >> enclaves from different vendors? > > There's no issue unless someone has tried to impose a policy. There > is clearly at least some interest in having policies that affect what > enclaves can run -- otherwise there wouldn't be LEs in the first > place. 
> >> >> One reason I am asking is that, on Xen (where we don't have concept of >> *host*), it's likely that we won't apply any policy at Xen hypervisor at >> all, and guests will be able to run any enclave from any signer as their >> wish. > > That seems entirely reasonable. Someone may eventually ask Xen to add > support for SGX enclave restrictions, in which case you'll either have > to tell them that it won't happen or implement it. > >> >> Sorry that I don't understand (or kind of forgot) the issues here. >> >>> >>> 2. For kiosk-ish or single-purpose applications, I can imagine that >>> you would want to allow a specific list of enclave signers or even >>> enclave hashes. Maybe you would allow exactly one enclave hash. You >>> could kludge this up with a restrictive LE policy, but you could also >>> do it for real by implementing the specific restriction in the kernel. >>> Then you'd want to impose it on the guest, and you'd do it by >>> filtering EINIT. >> >> Assuming the enclave hash means measurement of enclave, and assuming we have >> a policy that we only allow enclave from one signer to run, would you also >> elaborate the issue that, if host and guest run enclaves from different >> signer? If host has such policy, and we are allowing creating guests on such >> host, I think that typically we will have the same policy in the guest > > Yes, I presume this too, but. > >> (vetted by guest's kernel). The owner of that host should be aware of the >> risk (if there's any) by creating guest and run enclave inside it. > > No. The host does not trust the guest in general. If the host has a I agree. > policy that the only enclave that shall run is X, that doesn't mean > that the host shall reject all enclaves requested by the normal > userspace API except X but that, if /dev/kvm is used, then the user is > magically trusted to not load a guest that fails to respect the host > policy. 
It means that the only enclave that shall run is X regardless > of what interface is used. The host must only allow X to be loaded by > its userspace and the host must only allow X to be loaded by a guest. > This is a theoretical concern. I think your statement makes sense only if we have a specific example that proves there's an actual risk in allowing a guest to exceed the X approved by the host. I will dig through your previous emails to see whether you have listed such real cases (I have somewhat forgotten, sorry), but if you don't mind, you can list such cases here. Thanks, -Kai
On 6/16/2017 4:33 PM, Huang, Kai wrote: > > > On 6/16/2017 4:11 PM, Andy Lutomirski wrote: >> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai >> <kai.huang@linux.intel.com> wrote: >>> >>> >>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote: >>>> >>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com> >>>> wrote: >>>>> >>>>> >>>>> I don't know whether SGX driver will have restrict on running >>>>> provisioning >>>>> enclave. In my understanding provisioning enclave is always from >>>>> Intel. >>>>> However I am not expert here and probably be wrong. Can you point out >>>>> *exactly* what restricts in host must/should be applied to guest so >>>>> that >>>>> Jarkko can know whether he will support those restricts or not? >>>>> Otherwise >>>>> I >>>>> don't think we even need to talk about this topic at current stage. >>>>> >>>> >>>> The whole point is that I don't know. But here are two types of >>>> restriction I can imagine demand for: >>>> >>>> 1. Only a particular approved provisioning enclave may run (be it >>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a >>>> non-Intel provisioning enclave). This would be done to restrict what >>>> types of remote attestation can be done. (Intel supplies a remote >>>> attestation service that uses some contractual policy that I don't >>>> know. Maybe a system owner wants a different policy applied to ISVs.) >>>> Imposing this policy on guests more or less requires filtering >>>> EINIT. >>> >>> >>> Hi Andy, >>> >>> Sorry for late reply. >>> >>> What is the issue if host and guest run provisioning enclave from >>> different >>> vendor, for example, host runs intel's provisioning enclave, and >>> guest runs >>> other vendor's provisioning enclave? Or different guests run >>> provisioning >>> enclaves from different vendors? >> >> There's no issue unless someone has tried to impose a policy. 
There >> is clearly at least some interest in having policies that affect what >> enclaves can run -- otherwise there wouldn't be LEs in the first >> place. >> >>> >>> One reason I am asking is that, on Xen (where we don't have concept of >>> *host*), it's likely that we won't apply any policy at Xen hypervisor at >>> all, and guests will be able to run any enclave from any signer as their >>> wish. >> >> That seems entirely reasonable. Someone may eventually ask Xen to add >> support for SGX enclave restrictions, in which case you'll either have >> to tell them that it won't happen or implement it. >> >>> >>> Sorry that I don't understand (or kind of forgot) the issues here. >>> >>>> >>>> 2. For kiosk-ish or single-purpose applications, I can imagine that >>>> you would want to allow a specific list of enclave signers or even >>>> enclave hashes. Maybe you would allow exactly one enclave hash. You >>>> could kludge this up with a restrictive LE policy, but you could also >>>> do it for real by implementing the specific restriction in the kernel. >>>> Then you'd want to impose it on the guest, and you'd do it by >>>> filtering EINIT. >>> >>> Assuming the enclave hash means measurement of enclave, and assuming >>> we have >>> a policy that we only allow enclave from one signer to run, would you >>> also >>> elaborate the issue that, if host and guest run enclaves from different >>> signer? If host has such policy, and we are allowing creating guests >>> on such >>> host, I think that typically we will have the same policy in the guest >> >> Yes, I presume this too, but. >> >>> (vetted by guest's kernel). The owner of that host should be aware of >>> the >>> risk (if there's any) by creating guest and run enclave inside it. >> >> No. The host does not trust the guest in general. If the host has a > > I agree. 
> >> policy that the only enclave that shall run is X, that doesn't mean >> that the host shall reject all enclaves requested by the normal >> userspace API except X but that, if /dev/kvm is used, then the user is >> magically trusted to not load a guest that fails to respect the host >> policy. It means that the only enclave that shall run is X regardless >> of what interface is used. The host must only allow X to be loaded by >> its userspace and the host must only allow X to be loaded by a guest. >> > > This is theoretical thing. I think your statement makes sense only if we > have specific example that can prove there's actual risk when allowing > guest to exceed X approved by host. > > I will dig more in your previous emails to see whether you have listed > such real cases (I some kind forgot sorry) but if you don't mind, you > can list such cases here. Hi Andy, I found an example you listed in a previous email, but it is related not to host policy but to an issue in SGX's key architecture. I quote it below: "Concretely, imagine I write an enclave that seals my TLS client certificate's private key and offers an API to sign TLS certificate requests with it. This way, if my system is compromised, an attacker can use the certificate only so long as they have access to my machine. If I kick them out or if they merely get the ability to read the sealed data but not to execute code, the private key should still be safe. But, if this system is a VM guest, the attacker could run the exact same enclave on another guest on the same physical CPU and sign using my key. Whoops!" I think you will have this problem even if you apply the strictest policy at both host and guest -- only allowing enclaves from one signer to run. 
This is indeed a flaw, but virtualization cannot do anything to solve it -- unless we don't support virtualization at all :) Sorry, I am just trying to find out whether there's a real case that truly requires applying the host's policy to the guest, where we would have a problem if we don't. Thanks, -Kai > > Thanks, > -Kai
On Thu, Jun 15, 2017 at 9:33 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: > > > On 6/16/2017 4:11 PM, Andy Lutomirski wrote: >> >> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com> >> wrote: >>> >>> >>> >>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote: >>>> >>>> >>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com> >>>> wrote: >>>>> >>>>> >>>>> >>>>> I don't know whether SGX driver will have restrict on running >>>>> provisioning >>>>> enclave. In my understanding provisioning enclave is always from Intel. >>>>> However I am not expert here and probably be wrong. Can you point out >>>>> *exactly* what restricts in host must/should be applied to guest so >>>>> that >>>>> Jarkko can know whether he will support those restricts or not? >>>>> Otherwise >>>>> I >>>>> don't think we even need to talk about this topic at current stage. >>>>> >>>> >>>> The whole point is that I don't know. But here are two types of >>>> restriction I can imagine demand for: >>>> >>>> 1. Only a particular approved provisioning enclave may run (be it >>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a >>>> non-Intel provisioning enclave). This would be done to restrict what >>>> types of remote attestation can be done. (Intel supplies a remote >>>> attestation service that uses some contractual policy that I don't >>>> know. Maybe a system owner wants a different policy applied to ISVs.) >>>> Imposing this policy on guests more or less requires filtering EINIT. >>> >>> >>> >>> Hi Andy, >>> >>> Sorry for late reply. >>> >>> What is the issue if host and guest run provisioning enclave from >>> different >>> vendor, for example, host runs intel's provisioning enclave, and guest >>> runs >>> other vendor's provisioning enclave? Or different guests run provisioning >>> enclaves from different vendors? >> >> >> There's no issue unless someone has tried to impose a policy. 
There >> is clearly at least some interest in having policies that affect what >> enclaves can run -- otherwise there wouldn't be LEs in the first >> place. >> >>> >>> One reason I am asking is that, on Xen (where we don't have concept of >>> *host*), it's likely that we won't apply any policy at Xen hypervisor at >>> all, and guests will be able to run any enclave from any signer as their >>> wish. >> >> >> That seems entirely reasonable. Someone may eventually ask Xen to add >> support for SGX enclave restrictions, in which case you'll either have >> to tell them that it won't happen or implement it. >> >>> >>> Sorry that I don't understand (or kind of forgot) the issues here. >>> >>>> >>>> 2. For kiosk-ish or single-purpose applications, I can imagine that >>>> you would want to allow a specific list of enclave signers or even >>>> enclave hashes. Maybe you would allow exactly one enclave hash. You >>>> could kludge this up with a restrictive LE policy, but you could also >>>> do it for real by implementing the specific restriction in the kernel. >>>> Then you'd want to impose it on the guest, and you'd do it by >>>> filtering EINIT. >>> >>> >>> Assuming the enclave hash means measurement of enclave, and assuming we >>> have >>> a policy that we only allow enclave from one signer to run, would you >>> also >>> elaborate the issue that, if host and guest run enclaves from different >>> signer? If host has such policy, and we are allowing creating guests on >>> such >>> host, I think that typically we will have the same policy in the guest >> >> >> Yes, I presume this too, but. >> >>> (vetted by guest's kernel). The owner of that host should be aware of the >>> risk (if there's any) by creating guest and run enclave inside it. >> >> >> No. The host does not trust the guest in general. If the host has a > > > I agree. 
> >> policy that the only enclave that shall run is X, that doesn't mean >> that the host shall reject all enclaves requested by the normal >> userspace API except X but that, if /dev/kvm is used, then the user is >> magically trusted to not load a guest that fails to respect the host >> policy. It means that the only enclave that shall run is X regardless >> of what interface is used. The host must only allow X to be loaded by >> its userspace and the host must only allow X to be loaded by a guest. >> > > This is theoretical thing. I think your statement makes sense only if we > have specific example that can prove there's actual risk when allowing guest > to exceed X approved by host. I would turn this around. Can you come up with any example where the host would have a restrictive policy but that the policy should not be enforced for guests?
On Thu, Jun 15, 2017 at 9:33 PM, Huang, Kai <kai.huang@linux.intel.com> wrote: > > > On 6/16/2017 4:11 PM, Andy Lutomirski wrote: >> >> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com> >> wrote: >>> >>> >>> >>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote: >>>> >>>> >>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com> >>>> wrote: >>>>> >>>>> >>>>> >>>>> I don't know whether SGX driver will have restrict on running >>>>> provisioning >>>>> enclave. In my understanding provisioning enclave is always from Intel. >>>>> However I am not expert here and probably be wrong. Can you point out >>>>> *exactly* what restricts in host must/should be applied to guest so >>>>> that >>>>> Jarkko can know whether he will support those restricts or not? >>>>> Otherwise >>>>> I >>>>> don't think we even need to talk about this topic at current stage. >>>>> >>>> >>>> The whole point is that I don't know. But here are two types of >>>> restriction I can imagine demand for: >>>> >>>> 1. Only a particular approved provisioning enclave may run (be it >>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a >>>> non-Intel provisioning enclave). This would be done to restrict what >>>> types of remote attestation can be done. (Intel supplies a remote >>>> attestation service that uses some contractual policy that I don't >>>> know. Maybe a system owner wants a different policy applied to ISVs.) >>>> Imposing this policy on guests more or less requires filtering EINIT. >>> >>> >>> >>> Hi Andy, >>> >>> Sorry for late reply. >>> >>> What is the issue if host and guest run provisioning enclave from >>> different >>> vendor, for example, host runs intel's provisioning enclave, and guest >>> runs >>> other vendor's provisioning enclave? Or different guests run provisioning >>> enclaves from different vendors? >> >> >> There's no issue unless someone has tried to impose a policy. 
There >> is clearly at least some interest in having policies that affect what >> enclaves can run -- otherwise there wouldn't be LEs in the first >> place. >> >>> >>> One reason I am asking is that, on Xen (where we don't have concept of >>> *host*), it's likely that we won't apply any policy at Xen hypervisor at >>> all, and guests will be able to run any enclave from any signer as their >>> wish. >> >> >> That seems entirely reasonable. Someone may eventually ask Xen to add >> support for SGX enclave restrictions, in which case you'll either have >> to tell them that it won't happen or implement it. >> >>> >>> Sorry that I don't understand (or kind of forgot) the issues here. >>> >>>> >>>> 2. For kiosk-ish or single-purpose applications, I can imagine that >>>> you would want to allow a specific list of enclave signers or even >>>> enclave hashes. Maybe you would allow exactly one enclave hash. You >>>> could kludge this up with a restrictive LE policy, but you could also >>>> do it for real by implementing the specific restriction in the kernel. >>>> Then you'd want to impose it on the guest, and you'd do it by >>>> filtering EINIT. >>> >>> >>> Assuming the enclave hash means measurement of enclave, and assuming we >>> have >>> a policy that we only allow enclave from one signer to run, would you >>> also >>> elaborate the issue that, if host and guest run enclaves from different >>> signer? If host has such policy, and we are allowing creating guests on >>> such >>> host, I think that typically we will have the same policy in the guest >> >> >> Yes, I presume this too, but. >> >>> (vetted by guest's kernel). The owner of that host should be aware of the >>> risk (if there's any) by creating guest and run enclave inside it. >> >> >> No. The host does not trust the guest in general. If the host has a > > > I agree. 
> >> policy that the only enclave that shall run is X, that doesn't mean >> that the host shall reject all enclaves requested by the normal >> userspace API except X but that, if /dev/kvm is used, then the user is >> magically trusted to not load a guest that fails to respect the host >> policy. It means that the only enclave that shall run is X regardless >> of what interface is used. The host must only allow X to be loaded by >> its userspace and the host must only allow X to be loaded by a guest. >> > > This is theoretical thing. I think your statement makes sense only if we > have specific example that can prove there's actual risk when allowing guest > to exceed X approved by host. > > I will dig more in your previous emails to see whether you have listed such > real cases (I some kind forgot sorry) but if you don't mind, you can list > such cases here. I'm operating under the assumption that some kind of policy exists in the first place. I can imagine everything working fairly well without any real policy, but apparently there are vendors who want restrictive policies. What I can't imagine is anyone who wants a restrictive policy but is then okay with the host only partially enforcing it. > > Thanks, > -Kai
Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 15, 2017 at 9:33 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> > On 6/16/2017 4:11 PM, Andy Lutomirski wrote:
> >> On Thu, Jun 15, 2017 at 8:46 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> >>> On 6/13/2017 11:00 AM, Andy Lutomirski wrote:
> >>>> On Mon, Jun 12, 2017 at 3:08 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> >>>>> I don't know whether SGX driver will have restrict on running
> >>>>> provisioning enclave. In my understanding provisioning enclave is
> >>>>> always from Intel. However I am not expert here and probably be
> >>>>> wrong. Can you point out *exactly* what restricts in host
> >>>>> must/should be applied to guest so that Jarkko can know whether he
> >>>>> will support those restricts or not? Otherwise I don't think we
> >>>>> even need to talk about this topic at current stage.
> >>>>
> >>>> The whole point is that I don't know.  But here are two types of
> >>>> restriction I can imagine demand for:
> >>>>
> >>>> 1. Only a particular approved provisioning enclave may run (be it
> >>>> Intel's or otherwise -- with a non-Intel LE, I think you can launch a
> >>>> non-Intel provisioning enclave).  This would be done to restrict what
> >>>> types of remote attestation can be done.  (Intel supplies a remote
> >>>> attestation service that uses some contractual policy that I don't
> >>>> know.  Maybe a system owner wants a different policy applied to ISVs.)
> >>>> Imposing this policy on guests more or less requires filtering EINIT.
> >>>
> >>> Hi Andy,
> >>>
> >>> Sorry for late reply.
> >>>
> >>> What is the issue if host and guest run provisioning enclave from
> >>> different vendor, for example, host runs intel's provisioning enclave,
> >>> and guest runs other vendor's provisioning enclave? Or different
> >>> guests run provisioning enclaves from different vendors?
> >>
> >> There's no issue unless someone has tried to impose a policy.  There
> >> is clearly at least some interest in having policies that affect what
> >> enclaves can run -- otherwise there wouldn't be LEs in the first
> >> place.
> >>
> >>> One reason I am asking is that, on Xen (where we don't have concept of
> >>> *host*), it's likely that we won't apply any policy at Xen hypervisor
> >>> at all, and guests will be able to run any enclave from any signer as
> >>> their wish.
> >>
> >> That seems entirely reasonable.  Someone may eventually ask Xen to add
> >> support for SGX enclave restrictions, in which case you'll either have
> >> to tell them that it won't happen or implement it.
> >>
> >>> Sorry that I don't understand (or kind of forgot) the issues here.
> >>>
> >>>> 2. For kiosk-ish or single-purpose applications, I can imagine that
> >>>> you would want to allow a specific list of enclave signers or even
> >>>> enclave hashes.  Maybe you would allow exactly one enclave hash.  You
> >>>> could kludge this up with a restrictive LE policy, but you could also
> >>>> do it for real by implementing the specific restriction in the kernel.
> >>>> Then you'd want to impose it on the guest, and you'd do it by
> >>>> filtering EINIT.
> >>>
> >>> Assuming the enclave hash means measurement of enclave, and assuming
> >>> we have a policy that we only allow enclave from one signer to run,
> >>> would you also elaborate the issue that, if host and guest run
> >>> enclaves from different signer? If host has such policy, and we are
> >>> allowing creating guests on such host, I think that typically we will
> >>> have the same policy in the guest
> >>
> >> Yes, I presume this too, but.
> >>
> >>> (vetted by guest's kernel). The owner of that host should be aware of
> >>> the risk (if there's any) by creating guest and run enclave inside it.
> >>
> >> No.  The host does not trust the guest in general.  If the host has a
> >
> > I agree.
> >
> >> policy that the only enclave that shall run is X, that doesn't mean
> >> that the host shall reject all enclaves requested by the normal
> >> userspace API except X but that, if /dev/kvm is used, then the user is
> >> magically trusted to not load a guest that fails to respect the host
> >> policy.  It means that the only enclave that shall run is X regardless
> >> of what interface is used.  The host must only allow X to be loaded by
> >> its userspace and the host must only allow X to be loaded by a guest.
> >
> > This is theoretical thing. I think your statement makes sense only if
> > we have specific example that can prove there's actual risk when
> > allowing guest to exceed X approved by host.
> >
> > I will dig more in your previous emails to see whether you have listed
> > such real cases (I some kind forgot sorry) but if you don't mind, you
> > can list such cases here.
>
> I'm operating under the assumption that some kind of policy exists in
> the first place.  I can imagine everything working fairly well without
> any real policy, but apparently there are vendors who want restrictive
> policies.  What I can't imagine is anyone who wants a restrictive
> policy but is then okay with the host only partially enforcing it.

I think there is a certain amount of inception going on here, i.e. the
only reason we're discussing LE enforced policies in the kernel is
because the LE architecture exists and can't be disabled.  The LE, as
originally designed, is intended to be a way for *userspace* to control
what code can run on the system, e.g. to provide a hook for
anti-virus/malware to inspect an enclave, since it's impossible to
inspect an enclave once it is running.

The kernel doesn't need an LE to restrict what enclaves can run, e.g.
it can perform inspection at any point during the initialization
process.  This is true for guest enclaves as well, since the kernel can
trap EINIT.  By making the LE kernel-only we've bastardized the concept
of the LE and have negated the primary value provided by an LE[1][2].
In my opinion, the discussion of the kernel's launch policies is much
ado about nothing, e.g. if supported by hardware, I think we'd opt to
disable launch control completely.

[1] On a system with unlocked IA32_SGXLEPUBKEYHASH MSRs, the only value
added by using an LE to enforce the kernel's policies is
defense-in-depth, e.g. an attacker can't hide malicious code in an
enclave even if it gains control of the kernel.  I think this is a very
minor benefit, since running in an enclave doesn't grant any new
privileges and doesn't persist across system reset.

[2] I think it's safe to assume that any use case that requires locked
hash MSRs is out of scope for this discussion, given that the upstream
kernel will require unlocked MSRs.
On Fri, Jun 16, 2017 at 9:31 AM, Christopherson, Sean J
<sean.j.christopherson@intel.com> wrote:
> I think there is a certain amount of inception going on here, i.e. the only
> reason we're discussing LE enforced policies in the kernel is because the LE
> architecture exists and can't be disabled.  The LE, as originally designed,
> is intended to be a way for *userspace* to control what code can run on the
> system, e.g. to provide a hook for anti-virus/malware to inspect an enclave
> since it's impossible to inspect an enclave once it is running.
>
> The kernel doesn't need an LE to restrict what enclaves can run, e.g. it can
> perform inspection at any point during the initialization process.  This is
> true for guest enclaves as well since the kernel can trap EINIT.  By making
> the LE kernel-only we've bastardized the concept of the LE and have negated
> the primary value provided by an LE[1][2].  In my opinion, the discussion of
> the kernel's launch policies is much ado about nothing, e.g. if supported by
> hardware, I think we'd opt to disable launch control completely.

Agreed.  I don't think I've ever said that the kernel should implement
restrictions on what enclaves should run [1].  All I've said is that
(a) if the kernel does implement restrictions like this, it should
apply them to guests as well, and (b) the kernel should probably trap
EINIT because that's the most sensible way to deal with the MSRs.

[1] With the possible exception of provisioning enclaves.  I'm still
not convinced that anyone except root should be allowed to run an
enclave with the provision bit set, as that bit gives access to the
provisioning key, which is rather special.  From memory, it bypasses
the owner epoch, and it may have privacy issues.  Maybe this is a
nonissue, but I'd like to see someone seriously analyze how
provisioning enclaves that may not be signed by Intel affect the
overall security of the system and how Linux should handle them.
SGX was designed under the assumption that provisioning enclaves would only ever be signed by Intel; that's no longer the case, and dealing with this intelligently may require some thought.
On Thu, 2017-05-11 at 23:11 -0700, Andy Lutomirski wrote:
> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai <kai.huang@linux.intel.com> wrote:
> > > Have a percpu variable that stores the current SGXLEPUBKEYHASH along
> > > with whatever lock is needed (probably just a mutex).  Users of EINIT
> > > will take the mutex, compare the percpu variable to the desired value,
> > > and, if it's different, do WRMSR and update the percpu variable.
> > >
> > > KVM will implement writes to SGXLEPUBKEYHASH by updating its in-memory
> > > state but *not* changing the MSRs.  KVM will trap and emulate EINIT to
> > > support the same handling as the host.  There is no action required at
> > > all on KVM guest entry and exit.
> >
> > This is doable, but SGX driver needs to do those things and expose
> > interfaces for KVM to use. In terms of the percpu data, it is nice to
> > have, but I am not sure whether it is mandatory, as IMO EINIT is not
> > even in performance critical path. We can simply read old value from
> > MSRs out and compare whether the old equals to the new.
>
> I think the SGX driver should probably live in arch/x86, and the
> interface could be a simple percpu variable that is exported (from the
> main kernel image, not from a module).

Jarkko, what are your thoughts on moving the SGX code into arch/x86 and
removing the option to build it as a module?  This would simplify the
KVM and EPC cgroup implementations.
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e3770f570bb9..70482b951b0f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -417,6 +417,11 @@
 #define MSR_IA32_TSC_ADJUST		0x0000003b
 #define MSR_IA32_BNDCFGS		0x00000d90
 
+#define MSR_IA32_SGXLEPUBKEYHASH0	0x0000008c
+#define MSR_IA32_SGXLEPUBKEYHASH1	0x0000008d
+#define MSR_IA32_SGXLEPUBKEYHASH2	0x0000008e
+#define MSR_IA32_SGXLEPUBKEYHASH3	0x0000008f
+
 #define MSR_IA32_XSS			0x00000da0
 
 #define FEATURE_CONTROL_LOCKED		(1<<0)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a16539594a99..c96332b9dd44 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -656,6 +656,9 @@ struct vcpu_vmx {
 	 */
 	u64 msr_ia32_feature_control;
 	u64 msr_ia32_feature_control_valid_bits;
+
+	/* SGX Launch Control public key hash */
+	u64 msr_ia32_sgxlepubkeyhash[4];
 };
 
 enum segment_cache_field {
@@ -2244,6 +2247,70 @@ static void decache_tsc_multiplier(struct vcpu_vmx *vmx)
 	vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
 }
 
+static bool cpu_sgx_lepubkeyhash_writable(void)
+{
+	u64 val, sgx_lc_enabled_mask = (FEATURE_CONTROL_LOCKED |
+			FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE);
+
+	rdmsrl(MSR_IA32_FEATURE_CONTROL, val);
+
+	return ((val & sgx_lc_enabled_mask) == sgx_lc_enabled_mask);
+}
+
+static bool vmx_sgx_lc_disabled_in_bios(struct kvm_vcpu *vcpu)
+{
+	return (to_vmx(vcpu)->msr_ia32_feature_control & FEATURE_CONTROL_LOCKED)
+		&& (!(to_vmx(vcpu)->msr_ia32_feature_control &
+			FEATURE_CONTROL_SGX_LAUNCH_CONTROL_ENABLE));
+}
+
+#define SGX_INTEL_DEFAULT_LEPUBKEYHASH0	0xa6053e051270b7ac
+#define SGX_INTEL_DEFAULT_LEPUBKEYHASH1	0x6cfbe8ba8b3b413d
+#define SGX_INTEL_DEFAULT_LEPUBKEYHASH2	0xc4916d99f2b3735d
+#define SGX_INTEL_DEFAULT_LEPUBKEYHASH3	0xd4f8c05909f9bb3b
+
+static void vmx_sgx_init_lepubkeyhash(struct kvm_vcpu *vcpu)
+{
+	u64 h0, h1, h2, h3;
+
+	/*
+	 * If runtime launch control is enabled (IA32_SGXLEPUBKEYHASHn is
+	 * writable), set the guest's default value to Intel's default
+	 * hash (which is a fixed value and can be hard-coded). Otherwise,
+	 * the guest can only use the machine's IA32_SGXLEPUBKEYHASHn, so
+	 * set the guest's default to that.
+	 */
+	if (cpu_sgx_lepubkeyhash_writable()) {
+		h0 = SGX_INTEL_DEFAULT_LEPUBKEYHASH0;
+		h1 = SGX_INTEL_DEFAULT_LEPUBKEYHASH1;
+		h2 = SGX_INTEL_DEFAULT_LEPUBKEYHASH2;
+		h3 = SGX_INTEL_DEFAULT_LEPUBKEYHASH3;
+	} else {
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH0, h0);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH1, h1);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH2, h2);
+		rdmsrl(MSR_IA32_SGXLEPUBKEYHASH3, h3);
+	}
+
+	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0] = h0;
+	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1] = h1;
+	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2] = h2;
+	to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3] = h3;
+}
+
+static void vmx_sgx_lepubkeyhash_load(struct kvm_vcpu *vcpu)
+{
+	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0,
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[0]);
+	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH1,
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[1]);
+	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH2,
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[2]);
+	wrmsrl(MSR_IA32_SGXLEPUBKEYHASH3,
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[3]);
+}
+
 /*
  * Switches to specified vcpu, until a matching vcpu_put(), but assumes
  * vcpu mutex is already taken.
@@ -2316,6 +2383,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	vmx_vcpu_pi_load(vcpu, cpu);
 
 	vmx->host_pkru = read_pkru();
+
+	/*
+	 * Load guest's SGX LE pubkey hash if runtime launch control is
+	 * enabled.
+	 */
+	if (guest_cpuid_has_sgx_launch_control(vcpu) &&
+			cpu_sgx_lepubkeyhash_writable())
+		vmx_sgx_lepubkeyhash_load(vcpu);
 }
 
 static void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
@@ -3225,6 +3300,19 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_FEATURE_CONTROL:
 		msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control;
 		break;
+	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
+		/*
+		 * SDM 35.1 Model-Specific Registers, table 35-2.
+		 * Read permitted if CPUID.0x12.0:EAX[0] = 1. (We have
+		 * guaranteed this will be true if guest_cpuid_has_sgx
+		 * is true.)
+		 */
+		if (!guest_cpuid_has_sgx(vcpu))
+			return 1;
+		msr_info->data =
+			to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_info->index -
+				MSR_IA32_SGXLEPUBKEYHASH0];
+		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!nested_vmx_allowed(vcpu))
 			return 1;
@@ -3344,6 +3432,37 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * SGX has been enabled in BIOS before using SGX.
 		 */
 		break;
+	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
+		/*
+		 * SDM 35.1 Model-Specific Registers, table 35-2.
+		 * - If CPUID.0x7.0:ECX[30] = 1, FEATURE_CONTROL[17] is
+		 *   available.
+		 * - Write permitted if CPUID.0x12.0:EAX[0] = 1 &&
+		 *   FEATURE_CONTROL[17] = 1 && FEATURE_CONTROL[0] = 1.
+		 */
+		if (!guest_cpuid_has_sgx(vcpu) ||
+				!guest_cpuid_has_sgx_launch_control(vcpu))
+			return 1;
+		/*
+		 * Don't let userspace set guest's IA32_SGXLEPUBKEYHASHn
+		 * if machine's IA32_SGXLEPUBKEYHASHn cannot be changed at
+		 * runtime. Note to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash are
+		 * set to default in vmx_create_vcpu, therefore guest is
+		 * able to get the machine's IA32_SGXLEPUBKEYHASHn by rdmsr
+		 * in guest.
+		 */
+		if (!cpu_sgx_lepubkeyhash_writable())
+			return 1;
+		/*
+		 * If guest's FEATURE_CONTROL[17] is not set, guest's
+		 * IA32_SGXLEPUBKEYHASHn are not writeable from guest.
+		 */
+		if (!vmx_sgx_lc_disabled_in_bios(vcpu) &&
+				!msr_info->host_initiated)
+			return 1;
+		to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash[msr_index -
+			MSR_IA32_SGXLEPUBKEYHASH0] = data;
+		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!msr_info->host_initiated)
 			return 1; /* they are read-only */
@@ -9305,6 +9424,10 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 		vmx->nested.vpid02 = allocate_vpid();
 	}
 
+	/* Set vcpu's default IA32_SGXLEPUBKEYHASHn */
+	if (enable_sgx && boot_cpu_has(X86_FEATURE_SGX_LAUNCH_CONTROL))
+		vmx_sgx_init_lepubkeyhash(&vmx->vcpu);
+
 	vmx->nested.posted_intr_nv = -1;
 	vmx->nested.current_vmptr = -1ull;
 	vmx->nested.current_vmcs12 = NULL;
If SGX runtime launch control is enabled on the host (IA32_FEATURE_CONTROL[17] is set), KVM can support running multiple guests, each running an LE signed with a different RSA pubkey. KVM traps IA32_SGXLEPUBKEYHASHn MSR writes from the guest and keeps the values in the vcpu internally, and when the vcpu is scheduled in, KVM writes those values to the real IA32_SGXLEPUBKEYHASHn MSRs.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
---
 arch/x86/include/asm/msr-index.h |   5 ++
 arch/x86/kvm/vmx.c               | 123 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 128 insertions(+)