Message ID | 20200421215316.56503-1-jarkko.sakkinen@linux.intel.com (mailing list archive) |
---|---|
Headers | show |
Series | Intel SGX foundations | expand |
On 4/21/20 2:52 PM, Jarkko Sakkinen wrote: > v29: > * The selftest has been moved to selftests/sgx. Because SGX is an execution > environment of its own, it really isn't a great fit with more "standard" > x86 tests. > > The RSA key is now generated on fly and the whole signing process has > been made as part of the enclave loader instead of signing the enclave > during the compilation time. > > Finally, the enclave loader loads now the test enclave directly from its > ELF file, which means that ELF file does not need to be coverted as raw > binary during the build process. > * Version the mm_list instead of using on synchronize_mm() when adding new > entries. We hold the write lock for the mm_struct, and dup_mm() can thus > deadlock with the page reclaimer, which could hold the lock for the old > mm_struct. > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can > be used to reserve the address range. Now /dev/sgx supports only opaque > mappings to the (initialized) enclave data. > * Make the vDSO callable directly from C by preserving RBX and taking leaf > from RCX. Hi all, I've been producing Fedora 32 kernel builds with the SGX patches applied for a few of weeks and I've just produced a build with this latest revision[1]. We've been using these kernels to enable SGX for some of our development/test machines. We wanted to offer them here in the hopes that others might find them useful for testing the SGX patchsets on their own machines to send feedback to this list. Please note that these are *not* meant to replace your distro kernel and these are for testing purposes only. I'll continue to upload builds to a Fedora Copr[2] as long as the patches continue to apply cleanly to the Fedora kernels. Best, Connor [1] https://download.copr.fedorainfracloud.org/results/npmccallum/enarx/fedora-32-x86_64/01344404-kernel/ [2] https://copr.fedorainfracloud.org/coprs/npmccallum/enarx/ [3] This is the packaging branch that I work from and rebase on top of the f32 kernels: https://github.com/connorkuehl/fedora-kernel-enarx-pkg/commits/f32-enarx
On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote: Good day, I hope the weekend is going well for everyone. > Intel(R) SGX is a set of CPU instructions that can be used by applications > to set aside private regions of code and data. The code outside the enclave > is disallowed to access the memory inside the enclave by the CPU access > control. > > ... [ elided ] .. > > The current implementation requires that the firmware sets > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > decide what enclaves it wants run. The implementation does not create > any bottlenecks to support read-only MSRs later on. It seems highly unlikely that a driver implementation with any type of support for read-only launch control registers would ever get into the kernel. All one needs to do is review the conversations that Matthew Garrett's lockdown patches engender to get a sense of that, ie: https://lwn.net/Articles/818277/ As a result, the proposed SGX driver needs support for cryptographic policy management before it goes into the kernel. Either the patch that we have offered or something equivalent. Absent that, the driver won't address the full needs of the development community implementing runtimes. In addition it also poses security and privacy issues that are well documented in the literature. As an aside, for those who haven't spent the last 5+ years of their life working with this technology. SGX2 hardware platforms have the ability to allow unrestricted code execution in enclave context. No amount of LSM or IMA interventions can provide any control over that. In fact, the Confidential Computing Consortium, sponsored by none other then the Linux Foundation, has at its fundamental tenant, the notion of developing an eco-system that allows the execution of code and processing of data, over which, the owner or administrator of the platform has no visibility or control. It would seem only logical that adversarial interests would indulge themseleves in those capabilities as well. With respect to SGX and these issues, cryptographic policy management is important for the same reason that 2-factor authentication is now considered standard practice in the security industry. We appreciate, Jarkko, that you feel that such infrastructure is optional, like virtualization support, and is something that can go in after the driver is mainlined. As the diffstat for our patch points out, the proposed support has virtually no impact on the driver, the same cannot be said for virtualization capabilities. Moreover, adding support for key based policy management later would require the addition of another ioctl in order to avoid ABI compatibility issues. The current initialization ioctl is best suited, from an engineering perspective, to support this type of infrastructure. In fact, the necessary support was removed from the ioctl for political reasons rather then for any valid engineering rationale on flexible launch control platforms, particularly with our patch or an equivalent approach. For the benefit of the kernel community at large, I will follow up this e-mail with a copy of our patch for review. In case anyone misses it, or it is corrupted, the patch can be pulled from the following URL: ftp://ftp.enjellic.com/pub/sgx/kernel/SFLC-current.patch We believe the patch or an equivalent approach deserves consideration for the following reasons: 1.) It does not modify the default behavior of the driver. ie. any enclave will be initialized that is presented. 2.) It enables needed functionality only at the discretion and control of the platform owner/administrator. 3.) The impact on the architecture of the driver is negligible. In closing, it is important to note that the proposed SGX driver is not available as a module. This effectively excludes any alternative implementations of the driver without replacement of the kernel at large. It also means that any platform, with SGX hardware support, running a kernel with this driver, has the potential for the security/privacy issues noted above. If key based policy management is not allowed, then the driver needs to be re-architected to have modular support so that alternative implementations or the absence of any driver support are at least tenable. Hopefully this is a reasoned technical approach to what has been a long standing issue surrounding this technology. Best wishes for a productive week. Dr. Greg As always, Dr. Greg Wettstein, Ph.D, Worker SGX secured infrastructure and Enjellic Systems Development, LLC autonomously self-defensive 4206 N. 19th Ave. platforms. Fargo, ND 58102 PH: 701-281-1686 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "Opportunity is missed by most people because it is dressed in overalls and looks like work." -- Thomas Edison
The patch I discussed in my previous mail.
--
Implement cryptographic initialization control.
This patch introduces the ability of the platform owner to
implement cryptographically controlled enclave initialization
policy. This functionality provides the platform owner with the
ability to use the identity modulus signature of an enclave
signer (SHA256 hash of the modulus of the signing key) to gate
access to enclave initialization, rather then simply relying
on discretionary access controls that are applied to the SGX
relevant device driver nodes.
The following policy functionality is introduced in this commit:
1.) Control over which keys are allowed to initialize an
enclave.
2.) Control over which keys are allowed to implement launch
enclaves.
3.) Control over which keys are allowed to initialize enclaves
that have access to the PROVISION_KEY attribute.
For each policy type a plurality of key signatures are allowed.
Absent an intent by the platform owner/administrator to use
cryptographic initialization policies, this functionality does
not change the standard behavior of the driver, which is to
allow any enclave presented to the driver to be initialized.
Cryptographic initialization policy is accessed through the
following three pseudo-files that are implemented by this patch:
/sys/kernel/security/signing_keys
/sys/kernel/security/launch_keys
/sys/kernel/security/provisioning_keys
Policy keys are registered with the driver by writing the identity
modulus signature to these files in simple hexadecimal format, ie:
0000000000000000000000000000000000000000000000000000000000000000
The current list of policy keys can be displayed by reading the
contents of the pseudo-files.
In addition to a key signature, the following keywords are
accepted as valid entries for a policy file:
clear
lock
The 'clear' keyword causes all existing entries in a policy list
to be deleted.
The 'lock' keyword causes any further modifications or access to
a policy list to be denied.
All of the policy code is implemented in a single file, policy.c,
with minimal impact to the driver at large. Since the calculation
of the identity modulus signature needed to program a launch control
register is effectively a policy decision, the code to compute the
signature was moved from the ioctl.c file to the policy.c file.
In order to support a plurality of launch keys the code that
loops over initialization attempts was pushed downward into a new
function that is named as follows:
sgx_try_init()
Primarily to avoid excessive indentation that would otherwise be
needed in the sgx_encl_init() function.
Signed-off-by: Dr. Greg <greg@enjellic.com>
---
arch/x86/Kconfig | 1 +
arch/x86/include/uapi/asm/sgx.h | 1 +
arch/x86/kernel/cpu/sgx/Makefile | 3 +-
arch/x86/kernel/cpu/sgx/driver.c | 8 +
arch/x86/kernel/cpu/sgx/driver.h | 2 +
arch/x86/kernel/cpu/sgx/encl.h | 2 +
arch/x86/kernel/cpu/sgx/ioctl.c | 82 ++++---
arch/x86/kernel/cpu/sgx/policy.c | 513 +++++++++++++++++++++++++++++++++++++++
8 files changed, 573 insertions(+), 39 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8bbb4313fae3..5cfde1d36dc9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1955,6 +1955,7 @@ config INTEL_SGX
depends on X86_64 && CPU_SUP_INTEL
select SRCU
select MMU_NOTIFIER
+ select SECURITYFS
help
Intel(R) SGX is a set of CPU instructions that can be used by
applications to set aside private regions of code and data, referred
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index e196cfd44b70..1c3cdbce533d 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -63,6 +63,7 @@ struct sgx_enclave_add_pages {
*/
struct sgx_enclave_init {
__u64 sigstruct;
+ __u64 token;
};
/**
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index f8d32da3a67a..d8ee2a889ca1 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -3,4 +3,5 @@ obj-y += \
encl.o \
ioctl.o \
main.o \
- reclaim.o
+ reclaim.o \
+ policy.o
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 997a7f4117c5..d4330b32c243 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -205,5 +205,13 @@ int __init sgx_drv_init(void)
return ret;
}
+ ret = sgx_policy_fs_init();
+ if (ret) {
+ pr_err("SGX policy fs creation failed with %d.\n", ret);
+ misc_deregister(&sgx_dev_provision);
+ misc_deregister(&sgx_dev_enclave);
+ return ret;
+ }
+
return 0;
}
diff --git a/arch/x86/kernel/cpu/sgx/driver.h b/arch/x86/kernel/cpu/sgx/driver.h
index 72747d01c046..8293f4d12e82 100644
--- a/arch/x86/kernel/cpu/sgx/driver.h
+++ b/arch/x86/kernel/cpu/sgx/driver.h
@@ -29,4 +29,6 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
int sgx_drv_init(void);
+int sgx_policy_fs_init(void);
+u64 *sgx_policy_get_launch_signer(u64 *signature);
#endif /* __ARCH_X86_SGX_DRIVER_H__ */
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 44b353aa8866..b5be1f2233ea 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -124,4 +124,6 @@ unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
bool sgx_va_page_full(struct sgx_va_page *va_page);
+int sgx_policy_get_params(struct sgx_encl *encl, void *modulus, u64 *signer,
+ int *signcnt);
#endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 12e1496f8a8b..e960be8f924c 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -556,31 +556,6 @@ static long sgx_ioc_enclave_add_pages(struct sgx_encl *encl, void __user *arg)
return ret;
}
-static int __sgx_get_key_hash(struct crypto_shash *tfm, const void *modulus,
- void *hash)
-{
- SHASH_DESC_ON_STACK(shash, tfm);
-
- shash->tfm = tfm;
-
- return crypto_shash_digest(shash, modulus, SGX_MODULUS_SIZE, hash);
-}
-
-static int sgx_get_key_hash(const void *modulus, void *hash)
-{
- struct crypto_shash *tfm;
- int ret;
-
- tfm = crypto_alloc_shash("sha256", 0, CRYPTO_ALG_ASYNC);
- if (IS_ERR(tfm))
- return PTR_ERR(tfm);
-
- ret = __sgx_get_key_hash(tfm, modulus, hash);
-
- crypto_free_shash(tfm);
- return ret;
-}
-
static void sgx_update_lepubkeyhash_msrs(u64 *lepubkeyhash, bool enforce)
{
u64 *cache;
@@ -611,22 +586,14 @@ static int sgx_einit(struct sgx_sigstruct *sigstruct, void *token,
return ret;
}
-static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
- void *token)
+static int sgx_try_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
+ void *token, u64 *lepubkeyhash)
+
{
- u64 mrsigner[4];
int ret;
int i;
int j;
- /* Check that the required attributes have been authorized. */
- if (encl->secs_attributes & ~encl->allowed_attributes)
- return -EACCES;
-
- ret = sgx_get_key_hash(sigstruct->modulus, mrsigner);
- if (ret)
- return ret;
-
mutex_lock(&encl->lock);
if (atomic_read(&encl->flags) & SGX_ENCL_INITIALIZED) {
@@ -637,7 +604,7 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
for (i = 0; i < SGX_EINIT_SLEEP_COUNT; i++) {
for (j = 0; j < SGX_EINIT_SPIN_COUNT; j++) {
ret = sgx_einit(sigstruct, token, encl->secs.epc_page,
- mrsigner);
+ lepubkeyhash);
if (ret == SGX_UNMASKED_EVENT)
continue;
else
@@ -673,6 +640,36 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
return ret;
}
+static int sgx_encl_init(struct sgx_encl *encl,
+ struct sgx_sigstruct *sigstruct, void *token)
+{
+ u64 mrsigner[4];
+ u64 *signer;
+ int ret;
+ int signcnt = 1;
+
+ /* Configure the launch policy. */
+ ret = sgx_policy_get_params(encl, sigstruct->modulus, mrsigner,
+ &signcnt);
+ if (ret)
+ return ret;
+
+ /* Check that the required attributes have been authorized. */
+ if (encl->secs_attributes & ~encl->allowed_attributes)
+ return -EACCES;
+
+ signer = mrsigner;
+ while (signcnt--) {
+ ret = sgx_try_init(encl, sigstruct, token, signer);
+ if (!ret)
+ return ret;
+ if (signcnt)
+ signer = sgx_policy_get_launch_signer(signer);
+ }
+
+ return ret;
+}
+
/**
* sgx_ioc_enclave_init - handler for %SGX_IOC_ENCLAVE_INIT
*
@@ -708,7 +705,16 @@ static long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
sigstruct = kmap(initp_page);
token = (void *)((unsigned long)sigstruct + PAGE_SIZE / 2);
- memset(token, 0, SGX_LAUNCH_TOKEN_SIZE);
+ if (!einit.token)
+ memset(token, 0, SGX_LAUNCH_TOKEN_SIZE);
+ else {
+ if (copy_from_user((uint8_t *) token,
+ (void __user *) einit.token,
+ SGX_LAUNCH_TOKEN_SIZE)) {
+ ret = -EFAULT;
+ goto out;
+ }
+ }
if (copy_from_user(sigstruct, (void __user *)einit.sigstruct,
sizeof(*sigstruct))) {
diff --git a/arch/x86/kernel/cpu/sgx/policy.c b/arch/x86/kernel/cpu/sgx/policy.c
new file mode 100644
index 000000000000..04ff13e4ce73
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/policy.c
@@ -0,0 +1,513 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+// Copyright(c) Enjellic Systems Development, LLC
+
+#define KEY_SIZE 32
+
+#include <linux/types.h>
+#include <linux/seq_file.h>
+#include <linux/atomic.h>
+#include <linux/security.h>
+#include "driver.h"
+#include "encl.h"
+
+static struct dentry *sgx_fs;
+
+struct list_key {
+ struct list_head list;
+ u64 key[KEY_SIZE / 8];
+};
+
+struct list_key_iterator {
+ char *type;
+ atomic_t *opencount;
+ unsigned int *count;
+ struct mutex *lock;
+ struct list_head *list;
+ bool *lockfile;
+};
+
+static struct dentry *launch_keys;
+static atomic_t launch_keys_opencount = ATOMIC_INIT(1);
+static unsigned int launch_keys_count;
+static bool launch_keys_locked;
+static DEFINE_MUTEX(launch_key_list_mutex);
+static LIST_HEAD(launch_key_list);
+
+static struct dentry *provision_keys;
+static atomic_t provision_keys_opencount = ATOMIC_INIT(1);
+static unsigned int provision_keys_count;
+static bool provision_keys_locked;
+static DEFINE_MUTEX(provision_key_list_mutex);
+static LIST_HEAD(provision_key_list);
+
+static struct dentry *signing_keys;
+static atomic_t signing_keys_opencount = ATOMIC_INIT(1);
+static unsigned int signing_keys_count;
+static bool signing_keys_locked;
+static DEFINE_MUTEX(signing_key_list_mutex);
+static LIST_HEAD(signing_key_list);
+
+/**
+ * have_signer - Verify the presence presence of a key signer.
+ *
+ * @signature: Pointer to signature of signer.
+ *
+ * Return:
+ * 0 Signer signature was not found.
+ * 1 Signer signature was found.
+ */
+static bool have_signer(struct list_head *keylist, struct mutex *lock,
+ uint8_t *signature)
+{
+ bool retn = false;
+ struct list_key *kp;
+
+ mutex_lock(lock);
+ list_for_each_entry(kp, keylist, list) {
+ pr_debug("%s: Checking signer=%*phN, ks=%*phN\n", __func__,
+ KEY_SIZE, signature, KEY_SIZE, kp->key);
+ if (memcmp(kp->key, signature, KEY_SIZE) == 0) {
+ retn = true;
+ goto done;
+ }
+ }
+
+ done:
+ mutex_unlock(lock);
+ return retn;
+}
+
+static int process_write_key(const char __user *buf, size_t datalen,
+ unsigned int *keycnt, struct mutex *lock,
+ struct list_head *keylist)
+{
+ ssize_t retn;
+
+ char *p, keybufr[KEY_SIZE*2 + 1], key[KEY_SIZE];
+
+ struct list_key *kp;
+
+ if (datalen != sizeof(keybufr)) {
+ retn = -EINVAL;
+ goto done;
+ }
+
+ memset(keybufr, '\0', sizeof(keybufr));
+ if (copy_from_user(keybufr, buf, datalen)) {
+ retn = -EFAULT;
+ goto done;
+ }
+
+ p = strchr(keybufr, '\n');
+ if (!p) {
+ retn = -EINVAL;
+ goto done;
+ }
+ *p = '\0';
+ if (hex2bin(key, keybufr, sizeof(key))) {
+ retn = -EINVAL;
+ goto done;
+ }
+
+ kp = kzalloc(sizeof(*kp), GFP_KERNEL);
+ if (!kp) {
+ retn = -ENOMEM;
+ goto done;
+ }
+ memcpy(kp->key, key, sizeof(kp->key));
+
+ mutex_lock(lock);
+ list_add_tail(&kp->list, keylist);
+ ++*keycnt;
+ mutex_unlock(lock);
+
+ retn = datalen;
+ pr_debug("%s: Added key: %*phN\n", __func__, KEY_SIZE, key);
+
+ done:
+ return retn;
+}
+
+static int process_lock(const char __user *buf, size_t datalen, bool *lockfile)
+{
+ char bufr[5];
+
+ if (datalen != strlen("lock") + 1)
+ return 0;
+
+ memset(bufr, '\0', sizeof(bufr));
+ if (copy_from_user(bufr, buf, datalen-1))
+ return -EFAULT;
+ if (strcmp(bufr, "lock") != 0)
+ return 0;
+
+ *lockfile = true;
+ return datalen;
+}
+
+static int process_clear(const char __user *buf, size_t datalen, char *type,
+ unsigned int *keycnt, struct mutex *lock,
+ struct list_head *keylist)
+{
+ char bufr[6];
+ struct list_key *kp, *kp_tmp;
+
+ if (datalen != strlen("clear") + 1)
+ return 0;
+
+ memset(bufr, '\0', sizeof(bufr));
+ if (copy_from_user(bufr, buf, datalen-1))
+ return -EFAULT;
+ if (strcmp(bufr, "clear") != 0)
+ return 0;
+
+ mutex_lock(lock);
+ list_for_each_entry_safe(kp, kp_tmp, keylist, list) {
+ pr_debug("[%s]: Freeing signature: %*phN\n", __FILE__,
+ KEY_SIZE, kp->key);
+ list_del(&kp->list);
+ kfree(kp);
+ }
+ *keycnt = 0;
+ mutex_unlock(lock);
+
+ pr_info("Cleared %s signatures.\n", type);
+ return datalen;
+}
+
+static void *key_start(struct seq_file *c, loff_t *pos)
+{
+ struct list_key_iterator *ki = (struct list_key_iterator *) c->private;
+
+ if (*pos >= *ki->count)
+ return NULL;
+
+ mutex_lock(ki->lock);
+ return seq_list_start(ki->list, *pos);
+}
+
+static void *key_next(struct seq_file *c, void *p, loff_t *pos)
+{
+ struct list_key_iterator *ki = (struct list_key_iterator *) c->private;
+
+ return seq_list_next(p, ki->list, pos);
+}
+
+static void key_stop(struct seq_file *c, void *p)
+{
+ struct list_key_iterator *ki = (struct list_key_iterator *) c->private;
+
+ mutex_unlock(ki->lock);
+}
+
+static int key_show(struct seq_file *c, void *key)
+{
+ struct list_key *kp;
+
+ kp = list_entry(key, struct list_key, list);
+ seq_printf(c, "%*phN\n", KEY_SIZE, kp->key);
+ return 0;
+}
+
+static const struct seq_operations keys_seqops = {
+ .start = key_start,
+ .next = key_next,
+ .stop = key_stop,
+ .show = key_show
+};
+
+static ssize_t write_keys(struct file *file, const char __user *buf,
+ size_t datalen, loff_t *ppos)
+{
+ struct seq_file *s = file->private_data;
+ struct list_key_iterator *ki = (struct list_key_iterator *) s->private;
+ ssize_t retn;
+
+ if (*ppos != 0)
+ return -EINVAL;
+
+ retn = process_lock(buf, datalen, ki->lockfile);
+ if (retn != 0)
+ return retn;
+
+ retn = process_clear(buf, datalen, ki->type, ki->count, ki->lock,
+ ki->list);
+ if (retn != 0)
+ return retn;
+
+ retn = process_write_key(buf, datalen, ki->count, ki->lock, ki->list);
+ return retn;
+}
+
+static int release_keys(struct inode *inode, struct file *file)
+{
+ struct seq_file *s = file->private_data;
+ struct list_key_iterator *ki = (struct list_key_iterator *) s->private;
+
+ atomic_set(ki->opencount, 1);
+ seq_release_private(inode, file);
+ return 0;
+}
+
+static int open_launch_keys(struct inode *inode, struct file *filp)
+{
+ struct list_key_iterator *ki;
+
+ if (launch_keys_locked)
+ return -EACCES;
+
+ if (!atomic_dec_and_test(&launch_keys_opencount))
+ return -EBUSY;
+
+ ki = __seq_open_private(filp, &keys_seqops, sizeof(*ki));
+ if (!ki)
+ return -ENOMEM;
+
+ ki->type = "launch control";
+ ki->opencount = &launch_keys_opencount;
+ ki->count = &launch_keys_count;
+ ki->lock = &launch_key_list_mutex;
+ ki->list = &launch_key_list;
+ ki->lockfile = &launch_keys_locked;
+
+ return 0;
+}
+
+static const struct file_operations launch_keys_ops = {
+ .open = open_launch_keys,
+ .write = write_keys,
+ .release = release_keys,
+ .read = seq_read,
+ .llseek = seq_lseek,
+};
+
+/* Provisioning control. */
+
+static int open_provision_keys(struct inode *inode, struct file *filp)
+{
+ struct list_key_iterator *ki;
+
+ if (provision_keys_locked)
+ return -EACCES;
+
+ if (!atomic_dec_and_test(&provision_keys_opencount))
+ return -EBUSY;
+
+ ki = __seq_open_private(filp, &keys_seqops, sizeof(*ki));
+ if (!ki)
+ return -ENOMEM;
+
+ ki->type = "provisioning control";
+ ki->opencount = &provision_keys_opencount;
+ ki->count = &provision_keys_count;
+ ki->lock = &provision_key_list_mutex;
+ ki->list = &provision_key_list;
+
+ return 0;
+}
+
+static const struct file_operations provision_keys_ops = {
+ .open = open_provision_keys,
+ .write = write_keys,
+ .release = release_keys,
+ .read = seq_read,
+ .llseek = seq_lseek,
+};
+
+/* Signing control. */
+
+static int open_signing_keys(struct inode *inode, struct file *filp)
+{
+ struct list_key_iterator *ki;
+
+ if (signing_keys_locked)
+ return -EACCES;
+
+ if (!atomic_dec_and_test(&signing_keys_opencount))
+ return -EBUSY;
+
+ ki = __seq_open_private(filp, &keys_seqops, sizeof(*ki));
+ if (!ki)
+ return -ENOMEM;
+
+ ki->type = "signing control";
+ ki->opencount = &signing_keys_opencount;
+ ki->count = &signing_keys_count;
+ ki->lock = &signing_key_list_mutex;
+ ki->list = &signing_key_list;
+
+ return 0;
+}
+
+static const struct file_operations signing_keys_ops = {
+ .open = open_signing_keys,
+ .write = write_keys,
+ .release = release_keys,
+ .read = seq_read,
+ .llseek = seq_lseek,
+};
+
+static int __sgx_get_key_hash(struct crypto_shash *tfm, const void *modulus,
+ void *hash)
+{
+ SHASH_DESC_ON_STACK(shash, tfm);
+
+ shash->tfm = tfm;
+
+ return crypto_shash_digest(shash, modulus, SGX_MODULUS_SIZE, hash);
+}
+
+static int sgx_get_key_hash(const void *modulus, void *hash)
+{
+ struct crypto_shash *tfm;
+ int ret;
+
+ tfm = crypto_alloc_shash("sha256", 0, CRYPTO_ALG_ASYNC);
+ if (IS_ERR(tfm))
+ return PTR_ERR(tfm);
+
+ ret = __sgx_get_key_hash(tfm, modulus, hash);
+
+ crypto_free_shash(tfm);
+ return ret;
+}
+
+/**
+ * sgx_policy_get_params
+ *
+ * This function sets the cryptographically configured initialization
+ * policy parameters. These include the identity modulus signature to
+ * be used as well as the configuration of the allowed enclave
+ * attributes.
+ *
+ * Return:
+ * 0 on success.
+ * -errno otherwise
+ */
+
+int sgx_policy_get_params(struct sgx_encl *encl, void *modulus, u64 *signer,
+ int *signcnt)
+{
+ int retn = -EINVAL;
+ uint8_t mrsigner[KEY_SIZE];
+ struct list_key *kp;
+
+ retn = sgx_get_key_hash(modulus, mrsigner);
+ if (retn)
+ goto no_signer;
+
+ if (provision_keys_count > 0 &&
+ have_signer(&provision_key_list, &provision_key_list_mutex,
+ mrsigner))
+ encl->allowed_attributes |= SGX_ATTR_PROVISIONKEY;
+
+ if (signing_keys_count > 0 &&
+ have_signer(&signing_key_list, &signing_key_list_mutex, mrsigner))
+ goto have_signer;
+
+ if (encl->secs_attributes & SGX_ATTR_EINITTOKENKEY &&
+ launch_keys_count > 0) {
+ if (have_signer(&launch_key_list, &launch_key_list_mutex,
+ mrsigner)) {
+ encl->allowed_attributes |= SGX_ATTR_EINITTOKENKEY;
+ goto have_signer;
+ } else
+ goto no_signer;
+ }
+
+ if (launch_keys_count > 0) {
+ *signcnt = launch_keys_count;
+ kp = list_first_entry(&launch_key_list, struct list_key, list);
+ memcpy(mrsigner, kp->key, KEY_SIZE);
+ }
+
+ have_signer:
+ memcpy(signer, mrsigner, KEY_SIZE);
+ pr_debug("%s: Using signer: %*phN\n", __func__, KEY_SIZE, signer);
+ return 0;
+ no_signer:
+ memset(signer, '\0', KEY_SIZE);
+ return retn;
+}
+
+/**
+ * sgx_policy_get_launch_signer - Iterate through list of enclave authorizers.
+ *
+ * @signer: The last returned enclave signer.
+ *
+ * This function iterates through the list of enclave signers from the
+ * last signature. Calling the function with a NULL value
+ * resets the iteration to the beginning of the list.
+ *
+ * Return:
+ * NULL indicates end of list
+ * non-NULL the next enclave signature on the list.
+ */
+
+u64 *sgx_policy_get_launch_signer(u64 *signer)
+{
+ bool seeking = false;
+ u64 *retn = NULL;
+ struct list_key *kp;
+
+ if (!signer) {
+ kp = list_first_entry(&launch_key_list, struct list_key, list);
+ return kp->key;
+ }
+ kp = list_last_entry(&launch_key_list, struct list_key, list);
+ if (memcmp(kp->key, signer, sizeof(kp->key)) == 0)
+ return NULL;
+
+ mutex_lock(&launch_key_list_mutex);
+ list_for_each_entry(kp, &launch_key_list, list) {
+ if (seeking) {
+ retn = kp->key;
+ goto done;
+ }
+ pr_debug("%s: Skipping: %*phN\n", __func__, KEY_SIZE, kp->key);
+ if (memcmp(kp->key, signer, KEY_SIZE) == 0)
+ seeking = true;
+ }
+
+ done:
+ mutex_unlock(&launch_key_list_mutex);
+ return retn;
+}
+
+int __init sgx_policy_fs_init(void)
+{
+ int retn = -1;
+
+ sgx_fs = securityfs_create_dir("sgx", NULL);
+ if (IS_ERR(sgx_fs)) {
+ retn = PTR_ERR(sgx_fs);
+ goto err;
+ }
+
+ launch_keys = securityfs_create_file("launch_keys", 0600, sgx_fs,
+ NULL, &launch_keys_ops);
+ if (IS_ERR(launch_keys)) {
+ retn = PTR_ERR(launch_keys);
+ goto err;
+ }
+
+ provision_keys = securityfs_create_file("provisioning_keys", 0600,
+ sgx_fs, NULL,
+ &provision_keys_ops);
+ if (IS_ERR(provision_keys)) {
+ retn = PTR_ERR(provision_keys);
+ goto err;
+ }
+
+ signing_keys = securityfs_create_file("signing_keys", 0600, sgx_fs,
+ NULL, &signing_keys_ops);
+ if (IS_ERR(signing_keys)) {
+ retn = PTR_ERR(signing_keys);
+ goto err;
+ }
+
+ return 0;
+
+ err:
+ return retn;
+}
On Wed, Apr 22, 2020 at 09:48:58AM -0700, Connor Kuehl wrote: > On 4/21/20 2:52 PM, Jarkko Sakkinen wrote: > > v29: > > * The selftest has been moved to selftests/sgx. Because SGX is an execution > > environment of its own, it really isn't a great fit with more "standard" > > x86 tests. > > > > The RSA key is now generated on fly and the whole signing process has > > been made as part of the enclave loader instead of signing the enclave > > during the compilation time. > > > > Finally, the enclave loader loads now the test enclave directly from its > > ELF file, which means that ELF file does not need to be coverted as raw > > binary during the build process. > > * Version the mm_list instead of using on synchronize_mm() when adding new > > entries. We hold the write lock for the mm_struct, and dup_mm() can thus > > deadlock with the page reclaimer, which could hold the lock for the old > > mm_struct. > > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can > > be used to reserve the address range. Now /dev/sgx supports only opaque > > mappings to the (initialized) enclave data. > > * Make the vDSO callable directly from C by preserving RBX and taking leaf > > from RCX. > > Hi all, > > I've been producing Fedora 32 kernel builds with the SGX patches applied for > a few of weeks and I've just produced a build with this latest revision[1]. > We've been using these kernels to enable SGX for some of our > development/test machines. Thanks a lot! /Jarkko
On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote: > On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote: > > Good day, I hope the weekend is going well for everyone. > > > Intel(R) SGX is a set of CPU instructions that can be used by applications > > to set aside private regions of code and data. The code outside the enclave > > is disallowed to access the memory inside the enclave by the CPU access > > control. > > > > ... [ elided ] .. > > > > The current implementation requires that the firmware sets > > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > > decide what enclaves it wants run. The implementation does not create > > any bottlenecks to support read-only MSRs later on. > > It seems highly unlikely that a driver implementation with any type of > support for read-only launch control registers would ever get into the > kernel. All one needs to do is review the conversations that Matthew > Garrett's lockdown patches engender to get a sense of that, ie: > > https://lwn.net/Articles/818277/ We do not require read-only MSRs. /Jarkko
On Wed, Apr 29, 2020 at 08:23:29AM +0300, Jarkko Sakkinen wrote: > On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote: > > On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote: > > > > Good day, I hope the weekend is going well for everyone. > > > > > Intel(R) SGX is a set of CPU instructions that can be used by applications > > > to set aside private regions of code and data. The code outside the enclave > > > is disallowed to access the memory inside the enclave by the CPU access > > > control. > > > > > > ... [ elided ] .. > > > > > > The current implementation requires that the firmware sets > > > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > > > decide what enclaves it wants run. The implementation does not create > > > any bottlenecks to support read-only MSRs later on. > > > > It seems highly unlikely that a driver implementation with any type of > > support for read-only launch control registers would ever get into the > > kernel. All one needs to do is review the conversations that Matthew > > Garrett's lockdown patches engender to get a sense of that, ie: > > > > https://lwn.net/Articles/818277/ > > We do not require read-only MSRs. Greg is pointing out the opposite, that supporting read-only MSRs is highly unlikely to ever be supported in the mainline kernel.
On 2020-04-21 23:52, Jarkko Sakkinen wrote: > Intel(R) SGX is a set of CPU instructions that can be used by applications > to set aside private regions of code and data. The code outside the enclave > is disallowed to access the memory inside the enclave by the CPU access > control. > > There is a new hardware unit in the processor called Memory Encryption > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define > one or many MEE regions that can hold enclave data by configuring them with > PRMRR registers. > > The MEE automatically encrypts the data leaving the processor package to > the MEE regions. The data is encrypted using a random key whose life-time > is exactly one power cycle. > > The current implementation requires that the firmware sets > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > decide what enclaves it wants run. The implementation does not create > any bottlenecks to support read-only MSRs later on. > > You can tell if your CPU supports SGX by looking into /proc/cpuinfo: > > cat /proc/cpuinfo | grep sgx Let's merge this. -- Jethro Beekman | Fortanix
On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote: > In closing, it is important to note that the proposed SGX driver is > not available as a module. This effectively excludes any alternative > implementations of the driver without replacement of the kernel at > large. No it doesn't. The SGX subsytem won't allocate EPC pages unless userspace creates an enclave, i.e. preventing unprivileged userspace from accessing /dev/sgx/enclave will allow loading an alternative out-of-tree SGX module. Yes, SGX sanitizes the EPC on boot, but that's arguably a good thing for out-of-tree modules. And if you want to get crafty and squash in-kernel SGX altogether, boot with "clearcpuid=<SGX_LC>" and/or "clearcpuid=<SGX>" to disable in-kernel support entirely. SGX won't be correctly enumerated in /proc/cpuinfo relative to the existence of an out-of-tree module, but that seems like a very minor issue if you're running with a completely different SGX driver. > It also means that any platform, with SGX hardware support, > running a kernel with this driver, has the potential for the > security/privacy issues noted above. Unless I'm mistaken, /dev/sgx is root-only by default. There are far scarier mechanisms available to root for hosing the system. > If key based policy management is not allowed, then the driver needs > to be re-architected to have modular support so that alternative > implementations or the absence of any driver support are at least > tenable. As above, using an alternative implementation is teneble, albeit a bit kludgy.
On Wed, Apr 29, 2020 at 05:27:48PM +0200, Jethro Beekman wrote: > On 2020-04-21 23:52, Jarkko Sakkinen wrote: > > Intel(R) SGX is a set of CPU instructions that can be used by applications > > to set aside private regions of code and data. The code outside the enclave > > is disallowed to access the memory inside the enclave by the CPU access > > control. > > > > There is a new hardware unit in the processor called Memory Encryption > > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define > > one or many MEE regions that can hold enclave data by configuring them with > > PRMRR registers. > > > > The MEE automatically encrypts the data leaving the processor package to > > the MEE regions. The data is encrypted using a random key whose life-time > > is exactly one power cycle. > > > > The current implementation requires that the firmware sets > > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > > decide what enclaves it wants run. The implementation does not create > > any bottlenecks to support read-only MSRs later on. > > > > You can tell if your CPU supports SGX by looking into /proc/cpuinfo: > > > > cat /proc/cpuinfo | grep sgx > > Let's merge this. So can I tag reviewed-by's? /Jarkko
On Wed, Apr 29, 2020 at 08:14:59AM -0700, Sean Christopherson wrote: > On Wed, Apr 29, 2020 at 08:23:29AM +0300, Jarkko Sakkinen wrote: > > On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote: > > > On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote: > > > > > > Good day, I hope the weekend is going well for everyone. > > > > > > > Intel(R) SGX is a set of CPU instructions that can be used by applications > > > > to set aside private regions of code and data. The code outside the enclave > > > > is disallowed to access the memory inside the enclave by the CPU access > > > > control. > > > > > > > > ... [ elided ] .. > > > > > > > > The current implementation requires that the firmware sets > > > > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > > > > decide what enclaves it wants run. The implementation does not create > > > > any bottlenecks to support read-only MSRs later on. > > > > > > It seems highly unlikely that a driver implementation with any type of > > > support for read-only launch control registers would ever get into the > > > kernel. All one needs to do is review the conversations that Matthew > > > Garrett's lockdown patches engender to get a sense of that, ie: > > > > > > https://lwn.net/Articles/818277/ > > > > We do not require read-only MSRs. > > Greg is pointing out the opposite, that supporting read-only MSRs is highly > unlikely to ever be supported in the mainline kernel. In a nutshell, what is wrong in the current code changes and what *exactly* should we change? This is way too high level at the moment at least for my limited brain capacity. /Jarkko
On 2020-04-30 05:46, Jarkko Sakkinen wrote: > On Wed, Apr 29, 2020 at 05:27:48PM +0200, Jethro Beekman wrote: >> On 2020-04-21 23:52, Jarkko Sakkinen wrote: >>> Intel(R) SGX is a set of CPU instructions that can be used by applications >>> to set aside private regions of code and data. The code outside the enclave >>> is disallowed to access the memory inside the enclave by the CPU access >>> control. >>> >>> There is a new hardware unit in the processor called Memory Encryption >>> Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define >>> one or many MEE regions that can hold enclave data by configuring them with >>> PRMRR registers. >>> >>> The MEE automatically encrypts the data leaving the processor package to >>> the MEE regions. The data is encrypted using a random key whose life-time >>> is exactly one power cycle. >>> >>> The current implementation requires that the firmware sets >>> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can >>> decide what enclaves it wants run. The implementation does not create >>> any bottlenecks to support read-only MSRs later on. >>> >>> You can tell if your CPU supports SGX by looking into /proc/cpuinfo: >>> >>> cat /proc/cpuinfo | grep sgx >> >> Let's merge this. > > So can I tag reviewed-by's? > No, but you already have my tested-by's. If it helps I can try to review some patches, but 1) I know nothing about kernel coding guidelines and best practices and 2) I know little about most kernel internals, so I won't be able to review every patch. -- Jethro Beekman | Fortanix
On Thu, Apr 30, 2020 at 09:19:48AM +0200, Jethro Beekman wrote: > On 2020-04-30 05:46, Jarkko Sakkinen wrote: > > On Wed, Apr 29, 2020 at 05:27:48PM +0200, Jethro Beekman wrote: > >> On 2020-04-21 23:52, Jarkko Sakkinen wrote: > >>> Intel(R) SGX is a set of CPU instructions that can be used by applications > >>> to set aside private regions of code and data. The code outside the enclave > >>> is disallowed to access the memory inside the enclave by the CPU access > >>> control. > >>> > >>> There is a new hardware unit in the processor called Memory Encryption > >>> Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define > >>> one or many MEE regions that can hold enclave data by configuring them with > >>> PRMRR registers. > >>> > >>> The MEE automatically encrypts the data leaving the processor package to > >>> the MEE regions. The data is encrypted using a random key whose life-time > >>> is exactly one power cycle. > >>> > >>> The current implementation requires that the firmware sets > >>> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > >>> decide what enclaves it wants run. The implementation does not create > >>> any bottlenecks to support read-only MSRs later on. > >>> > >>> You can tell if your CPU supports SGX by looking into /proc/cpuinfo: > >>> > >>> cat /proc/cpuinfo | grep sgx > >> > >> Let's merge this. > > > > So can I tag reviewed-by's? > > > > No, but you already have my tested-by's. > > If it helps I can try to review some patches, but 1) I know nothing > about kernel coding guidelines and best practices and 2) I know little > about most kernel internals, so I won't be able to review every patch. Ackd-by *acknowledges* that the patches work for you. I think that would be then the correct choice for the driver patch and patches before that. Lets go with that if that is cool for you of course. Did you run the selftest only or possibly also some internal Fortanix tests? /Jarkko
On 2020-04-30 10:23, Jarkko Sakkinen wrote: > On Thu, Apr 30, 2020 at 09:19:48AM +0200, Jethro Beekman wrote: >> On 2020-04-30 05:46, Jarkko Sakkinen wrote: >>> On Wed, Apr 29, 2020 at 05:27:48PM +0200, Jethro Beekman wrote: >>>> On 2020-04-21 23:52, Jarkko Sakkinen wrote: >>>>> Intel(R) SGX is a set of CPU instructions that can be used by applications >>>>> to set aside private regions of code and data. The code outside the enclave >>>>> is disallowed to access the memory inside the enclave by the CPU access >>>>> control. >>>>> >>>>> There is a new hardware unit in the processor called Memory Encryption >>>>> Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define >>>>> one or many MEE regions that can hold enclave data by configuring them with >>>>> PRMRR registers. >>>>> >>>>> The MEE automatically encrypts the data leaving the processor package to >>>>> the MEE regions. The data is encrypted using a random key whose life-time >>>>> is exactly one power cycle. >>>>> >>>>> The current implementation requires that the firmware sets >>>>> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can >>>>> decide what enclaves it wants run. The implementation does not create >>>>> any bottlenecks to support read-only MSRs later on. >>>>> >>>>> You can tell if your CPU supports SGX by looking into /proc/cpuinfo: >>>>> >>>>> cat /proc/cpuinfo | grep sgx >>>> >>>> Let's merge this. >>> >>> So can I tag reviewed-by's? >>> >> >> No, but you already have my tested-by's. >> >> If it helps I can try to review some patches, but 1) I know nothing >> about kernel coding guidelines and best practices and 2) I know little >> about most kernel internals, so I won't be able to review every patch. > > Ackd-by *acknowledges* that the patches work for you. I think that would > be then the correct choice for the driver patch and patches before that. > > Lets go with that if that is cool for you of course. > > Did you run the selftest only or possibly also some internal Fortanix > tests? > v29 patches 2 through 18: Acked-by: Jethro Beekman <jethro@fortanix.com> I only ran production SGX software. I didn't run the self test. -- Jethro Beekman | Fortanix
On Thu, Apr 30, 2020 at 06:59:11AM +0300, Jarkko Sakkinen wrote: Good afternoon, I hope the weekend is going well for everyone. > On Wed, Apr 29, 2020 at 08:14:59AM -0700, Sean Christopherson wrote: > > On Wed, Apr 29, 2020 at 08:23:29AM +0300, Jarkko Sakkinen wrote: > > > On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote: > > > > On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote: > > > > > > > > The current implementation requires that the firmware sets > > > > > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately > > > > > the kernel can decide what enclaves it wants run. The > > > > > implementation does not create any bottlenecks to support > > > > > read-only MSRs later on. > > > > > > > It seems highly unlikely that a driver implementation with any type of > > > > support for read-only launch control registers would ever get into the > > > > kernel. All one needs to do is review the conversations that Matthew > > > > Garrett's lockdown patches engender to get a sense of that, ie: > > > > > > > > https://lwn.net/Articles/818277/ > > > > > > We do not require read-only MSRs. > > > > Greg is pointing out the opposite, that supporting read-only MSRs is highly > > unlikely to ever be supported in the mainline kernel. > In a nutshell, what is wrong in the current code changes and what > *exactly* should we change? This is way too high level at the moment > at least for my limited brain capacity. In a nutshell, the driver needs our patch that implements cryptographic policy management. A patch that: 1.) Does not change the default behavior of the driver. 2.) Implements capabilities that are consistent with what the hardware was designed to do, but only at the discretion of the platform owner. 3.) Has no impact on the driver architecture. The only consumer for this driver are SGX runtimes. To our knowledge, there exist only two complete runtime implementations, Intel's and ours. It us unclear why our approach to the use of the technology should be discriminated against when it doesn't impact the other user community. The Linux kernel that I have worked on and supported since 1992 has always focused on technical rationale and meritocracy rather then politics. We would be interested in why our proposal for the driver fails on the former grounds rather then the latter. > /Jarkko Best wishes for a productive week. Dr. Greg As always, Dr. Greg Wettstein, Ph.D, Worker Artisans in autonomously Enjellic Systems Development, LLC self-defensive IOT platforms 4206 N. 19th Ave. and edge devices. Fargo, ND 58102 PH: 701-281-1686 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "The best way to predict the future is to invent it." -- Alan Kay
> On May 2, 2020, at 4:05 PM, Dr. Greg <greg@enjellic.com> wrote: > > On Thu, Apr 30, 2020 at 06:59:11AM +0300, Jarkko Sakkinen wrote: > > Good afternoon, I hope the weekend is going well for everyone. > >>> On Wed, Apr 29, 2020 at 08:14:59AM -0700, Sean Christopherson wrote: >>> On Wed, Apr 29, 2020 at 08:23:29AM +0300, Jarkko Sakkinen wrote: >>>> On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote: >>>>> On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote: >>>> >>>>>> The current implementation requires that the firmware sets >>>>>> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately >>>>>> the kernel can decide what enclaves it wants run. The >>>>>> implementation does not create any bottlenecks to support >>>>>> read-only MSRs later on. >>>> >>>>> It seems highly unlikely that a driver implementation with any type of >>>>> support for read-only launch control registers would ever get into the >>>>> kernel. All one needs to do is review the conversations that Matthew >>>>> Garrett's lockdown patches engender to get a sense of that, ie: >>>>> >>>>> https://lwn.net/Articles/818277/ >>>> >>>> We do not require read-only MSRs. >>> >>> Greg is pointing out the opposite, that supporting read-only MSRs is highly >>> unlikely to ever be supported in the mainline kernel. > >> In a nutshell, what is wrong in the current code changes and what >> *exactly* should we change? This is way too high level at the moment >> at least for my limited brain capacity. > > In a nutshell, the driver needs our patch that implements > cryptographic policy management. > > A patch that: > > 1.) Does not change the default behavior of the driver. > > 2.) Implements capabilities that are consistent with what the hardware > was designed to do, but only at the discretion of the platform owner. > > 3.) Has no impact on the driver architecture. > > The only consumer for this driver are SGX runtimes. To our knowledge, > there exist only two complete runtime implementations, Intel's and > ours. It us unclear why our approach to the use of the technology > should be discriminated against when it doesn't impact the other user > community. Can you clarify how exactly this patch set discriminates against your stack? > > The Linux kernel that I have worked on and supported since 1992 has > always focused on technical rationale and meritocracy rather then > politics. We would be interested in why our proposal for the driver > fails on the former grounds rather then the latter. > >> /Jarkko > > Best wishes for a productive week. > > Dr. Greg > > As always, > Dr. Greg Wettstein, Ph.D, Worker Artisans in autonomously > Enjellic Systems Development, LLC self-defensive IOT platforms > 4206 N. 19th Ave. and edge devices. > Fargo, ND 58102 > PH: 701-281-1686 EMAIL: greg@enjellic.com > ------------------------------------------------------------------------------ > "The best way to predict the future is to invent it." > -- Alan Kay
On Sat, May 02, 2020 at 05:48:30PM -0700, Andy Lutomirski wrote: Good morning, I hope the week is starting well for everyone. > > On May 2, 2020, at 4:05 PM, Dr. Greg <greg@enjellic.com> wrote: > > In a nutshell, the driver needs our patch that implements > > cryptographic policy management. > > > > A patch that: > > > > 1.) Does not change the default behavior of the driver. > > > > 2.) Implements capabilities that are consistent with what the hardware > > was designed to do, but only at the discretion of the platform owner. > > > > 3.) Has no impact on the driver architecture. > > > > The only consumer for this driver are SGX runtimes. To our knowledge, > > there exist only two complete runtime implementations, Intel's and > > ours. It us unclear why our approach to the use of the technology > > should be discriminated against when it doesn't impact the other user > > community. > Can you clarify how exactly this patch set discriminates against > your stack? The driver has no provisions for implementing cryptographically based SGX policy management of any type. Our stack is extremely lightweight with no external dependencies and is used in privacy and security sensitive applications, including financial services of certain types. There is a desire in this, and other venues, to use cloud and edge resources with a strong guarantee that the platforms have only had a known set of behaviors. The current DAC based controls in the driver are insufficient to provide those guarantees. I believe I have discussed our use of SGX previously. In a nutshell, we use SGX based modeling engines to supervise kernel based behavioral namespaces, one enclave per namespace. The closest equivalent work that we have seen may be the IPE architecture advanced by Deven Bowers at Microsoft but we address a number of issues that work does not, including non-kernel based behavioral supervision. We support the concern over hardware locked platforms and do not disagree with the driver not supporting those platforms. That being said, there is no technical rationale for not providing cryptographic policy management on FLC based platforms, as I believe our patch demonstrates. Best wishes for a productive week. Dr. Greg As always, Dr. Greg Wettstein, Ph.D, Worker Artisans in autonomously Enjellic Systems Development, LLC self-defensive platforms. 4206 N. 19th Ave. Fargo, ND 58102 PH: 701-281-1686 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "Can't they? A 64bit number incremented every millisecond can grow for half a billion years. As far as I'm concerned, that is forever." -- Neil Brown linux-raid
On Thu, Apr 30, 2020 at 04:12:07PM +0200, Jethro Beekman wrote: > On 2020-04-30 10:23, Jarkko Sakkinen wrote: > > On Thu, Apr 30, 2020 at 09:19:48AM +0200, Jethro Beekman wrote: > >> On 2020-04-30 05:46, Jarkko Sakkinen wrote: > >>> On Wed, Apr 29, 2020 at 05:27:48PM +0200, Jethro Beekman wrote: > >>>> On 2020-04-21 23:52, Jarkko Sakkinen wrote: > >>>>> Intel(R) SGX is a set of CPU instructions that can be used by applications > >>>>> to set aside private regions of code and data. The code outside the enclave > >>>>> is disallowed to access the memory inside the enclave by the CPU access > >>>>> control. > >>>>> > >>>>> There is a new hardware unit in the processor called Memory Encryption > >>>>> Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define > >>>>> one or many MEE regions that can hold enclave data by configuring them with > >>>>> PRMRR registers. > >>>>> > >>>>> The MEE automatically encrypts the data leaving the processor package to > >>>>> the MEE regions. The data is encrypted using a random key whose life-time > >>>>> is exactly one power cycle. > >>>>> > >>>>> The current implementation requires that the firmware sets > >>>>> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > >>>>> decide what enclaves it wants run. The implementation does not create > >>>>> any bottlenecks to support read-only MSRs later on. > >>>>> > >>>>> You can tell if your CPU supports SGX by looking into /proc/cpuinfo: > >>>>> > >>>>> cat /proc/cpuinfo | grep sgx > >>>> > >>>> Let's merge this. > >>> > >>> So can I tag reviewed-by's? > >>> > >> > >> No, but you already have my tested-by's. > >> > >> If it helps I can try to review some patches, but 1) I know nothing > >> about kernel coding guidelines and best practices and 2) I know little > >> about most kernel internals, so I won't be able to review every patch. > > > > Ackd-by *acknowledges* that the patches work for you. I think that would > > be then the correct choice for the driver patch and patches before that. > > > > Lets go with that if that is cool for you of course. > > > > Did you run the selftest only or possibly also some internal Fortanix > > tests? > > > > v29 patches 2 through 18: > > Acked-by: Jethro Beekman <jethro@fortanix.com> > > I only ran production SGX software. I didn't run the self test. That's great to hear thank you. Updated my tree accordingly. /Jarkko
On 4/21/20 2:52 PM, Jarkko Sakkinen wrote: > Intel(R) SGX is a set of CPU instructions that can be used by applications > to set aside private regions of code and data. The code outside the enclave > is disallowed to access the memory inside the enclave by the CPU access > control. > > There is a new hardware unit in the processor called Memory Encryption > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define > one or many MEE regions that can hold enclave data by configuring them with > PRMRR registers. > > The MEE automatically encrypts the data leaving the processor package to > the MEE regions. The data is encrypted using a random key whose life-time > is exactly one power cycle. > > The current implementation requires that the firmware sets > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > decide what enclaves it wants run. The implementation does not create > any bottlenecks to support read-only MSRs later on. > > You can tell if your CPU supports SGX by looking into /proc/cpuinfo: > > cat /proc/cpuinfo | grep sgx > > v29: > * The selftest has been moved to selftests/sgx. Because SGX is an execution > environment of its own, it really isn't a great fit with more "standard" > x86 tests. > > The RSA key is now generated on fly and the whole signing process has > been made as part of the enclave loader instead of signing the enclave > during the compilation time. > > Finally, the enclave loader loads now the test enclave directly from its > ELF file, which means that ELF file does not need to be coverted as raw > binary during the build process. > * Version the mm_list instead of using on synchronize_mm() when adding new > entries. We hold the write lock for the mm_struct, and dup_mm() can thus > deadlock with the page reclaimer, which could hold the lock for the old > mm_struct. > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can > be used to reserve the address range. Now /dev/sgx supports only opaque > mappings to the (initialized) enclave data. > * Make the vDSO callable directly from C by preserving RBX and taking leaf > from RCX. > Tested with the Open Enclave SDK on top of Intel PSW. Specifically built the Intel PSW with changes to support /dev/sgx mapping[1] new in v29. Tested-by: Jordan Hand <jorhand@linux.microsoft.com> [1] https://github.com/intel/linux-sgx/pull/530
Tested on Enarx. This requires a patch[0] for v29 support. Tested-by: Nathaniel McCallum <npmccallum@redhat.com> However, we did uncover a small usability issue. See below. [0]: https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662 On Tue, Apr 21, 2020 at 5:53 PM Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > Intel(R) SGX is a set of CPU instructions that can be used by applications > to set aside private regions of code and data. The code outside the enclave > is disallowed to access the memory inside the enclave by the CPU access > control. > > There is a new hardware unit in the processor called Memory Encryption > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define > one or many MEE regions that can hold enclave data by configuring them with > PRMRR registers. > > The MEE automatically encrypts the data leaving the processor package to > the MEE regions. The data is encrypted using a random key whose life-time > is exactly one power cycle. > > The current implementation requires that the firmware sets > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > decide what enclaves it wants run. The implementation does not create > any bottlenecks to support read-only MSRs later on. > > You can tell if your CPU supports SGX by looking into /proc/cpuinfo: > > cat /proc/cpuinfo | grep sgx > > v29: > * The selftest has been moved to selftests/sgx. Because SGX is an execution > environment of its own, it really isn't a great fit with more "standard" > x86 tests. > > The RSA key is now generated on fly and the whole signing process has > been made as part of the enclave loader instead of signing the enclave > during the compilation time. > > Finally, the enclave loader loads now the test enclave directly from its > ELF file, which means that ELF file does not need to be coverted as raw > binary during the build process. > * Version the mm_list instead of using on synchronize_mm() when adding new > entries. We hold the write lock for the mm_struct, and dup_mm() can thus > deadlock with the page reclaimer, which could hold the lock for the old > mm_struct. > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can > be used to reserve the address range. Now /dev/sgx supports only opaque > mappings to the (initialized) enclave data. The statement "Any mapping..." isn't actually true. Enarx creates a large enclave (currently 64GiB). This worked when we created a file-backed mapping on /dev/sgx/enclave. However, switching to an anonymous mapping fails with ENOMEM. We suspect this is because the kernel attempts to allocate all the pages and zero them but there is insufficient RAM available. We currently work around this by creating a shared mapping on /dev/zero. If we want to keep this mmap() strategy, we probably don't want to advise mmap(ANON) if it allocates all the memory for the enclave ahead of time, even if it won't be used. This would be wasteful. OTOH, having to mmap("/dev/zero") seems a bit awkward. Currently, the selftest uses mmap(ANON) and the patch message above recommends it. > * Make the vDSO callable directly from C by preserving RBX and taking leaf > from RCX. > > v28: > * Documented to Documentation/x86/sgx.rst how the kernel manages the > enclave ownership. > * Removed non-LC flow from sgx_einit(). > * Removed struct sgx_einittoken since only the size of the corresponding > microarchitectural structure is used in the series ATM. > > v27: > * Disallow RIE processes to use enclaves as there could a permission > conflict between VMA and enclave permissions. > * In the documentation, replace "grep /proc/cpuinfo" with > "grep sgx /proc/cpuinfo". > > v26: > * Fixed the commit author in "x86/sgx: Linux Enclave Driver", which was > changed in v25 by mistake. > * Addressed a bunch of grammar mistakes in sgx.rst (thanks Randy once > again for such a detailed feedback). > * Added back the MAINTAINERS update commit, which was mistakenly removed > in v25. > * EREMOVE's for SECS cannot be done while sanitizing an EPC section. The > CPU does not allow to remove a SECS page before all of its children have > been removed, and a child page can be in some other section than the one > currently being processed. Thus, removed special SECS processing from > sgx_sanitize_page() and instead put sections through it twice. In the > 2nd round the lists should only contain SECS pages. > > v25: > * Fix a double-free issue when SGX_IOC_ENCLAVE_ADD_PAGES > fails on executing ENCLS[EADD]. The rollback path executed > radix_tree_delete() on the same address twice when this happened. > * Return -EINTR instead of -ERESTARTSYS in SGX_IOC_ENCLAVE_ADD_PAGES when > a signal is pending. > * As requested by Borislav, move the CPUID 0x12 features to their own word > in cpufeatures. > * Sean fixed a bug from sgx_reclaimer_write() where sgx_encl_put_backing() > was called with an uninitialized pointer when sgx_encl_get_backing() > fails. > * Migrated /dev/sgx/* to misc. This is future-proof as struct miscdevice > has 'groups' for setting up sysfs attributes for the device. > * Use device_initcall instead of subsys_initcall so that misc_class is > initialized before SGX is initialized. > * Return -EACCES in SGX_IOC_ENCLAVE_INIT when caller tries to select > enclave attributes that we the kernel does not allow it to set instead > of -EINVAL. > * Unless SGX public key MSRs are writable always deny the feature from > Linux. Previously this was only denied from driver. How VMs should be > supported is not really part of initial patch set, which makes this > an obvious choice. > * Cleaned up and refined documentation to be more approachable. > > v24: > * Reclaim unmeasured and TCS pages (regression in v23). > * Replace usages of GFP_HIGHUSER with GFP_KERNEL. > * Return -EIO on when EADD or EEXTEND fails in %SGX_IOC_ENCLAVE_ADD_PAGES > and use the same rollback (destroy enclave). This can happen when host > suspends itself unknowingly to a VM running enclaves. From -EIO the user > space can deduce what happened. > * Have a separate @count in struct sgx_enclave_add_pages to output number > of bytes processed instead of overwriting the input parameters for > clarity and more importantly that the API provides means for partial > processing (@count could be less than @length in success case). > > v23: > * Replace SGX_ENCLAVE_ADD_PAGE with SGX_ENCLAVE_ADD_PAGES. Replace @mrmask > with %SGX_PAGE_MEASURE flag. > * Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page. > * In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail > (because of OOM) even in legit behaviour and after EBLOCK the reclaiming > flow can be only reverted by killing the whole enclave. > * Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact > it should have been bit 6 (Table 37-3 in the SDM). > * Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX > error code. > > v22: > * Refined bunch commit messages and added associated SDM references as > many of them were too exhausting and some outdated. > * Alignment checks have been removed from mmap() because it does not define the > ELRANGE. VMAs only act as windows to the enclave. The semantics compare > somewhat how mmap() works with regular files. > * We now require user space addresses given to SGX_IOC_ENCLAVE_ADD_PAGE to be > page aligned so that we can pass the page directly to EADD and do not have > to do an extra copy. This was made effectively possible by removing the > worker thread for adding pages. > * The selftest build files have been refined throughout of various glitches > and work properly in a cross compilation environment such as BuildRoot. > In addition, libcalls fail the build with an assertion in the linker > script, if they end up to the enclave binary. > * CONFIG_INTEL_SGX_DRIVER has been removed because you cannot use SGX core > for anything without having the driver. This could change when KVM support > is added. > * We require zero permissions in SECINFO for TCS pages because the CPU > overwrites SECINFO flags with zero permissions and measures the page > only after that. Allowing to pass TCS with non-zero permissions would > cause mismatching measurement between the one provided in SIGSTRUCT and > the one computed by the CPU. > * Obviously lots of small fixes and clean ups (does make sense to > document them all). > > v21: > * Check on mmap() that the VMA does cover an area that does not have > enclave pages. Only mapping with PROT_NONE can do that to reserve > initial address space for an enclave. > * Check om mmap() and mprotect() that the VMA permissions do not > surpass the enclave permissions. > * Remove two refcounts from vma_close(): mm_list and encl->refcount. > Enclave refcount is only need for swapper/enclave sync and we can > remove mm_list refcount by destroying mm_struct when the process > is closed. By not having vm_close() the Linux MM can merge VMAs. > * Do not naturally align MAP_FIXED address. > * Numerous small fixes and clean ups. > * Use SRCU for synchronizing the list of mm_struct's. > * Move to stack based call convention in the vDSO. > > v20: > * Fine-tune Kconfig messages and spacing and remove MMU_NOTIFIER > dependency as MMU notifiers are no longer used in the driver. > * Use mm_users instead of mm_count as refcount for mm_struct as mm_count > only protects from deleting mm_struct, not removing its contents. > * Sanitize EPC when the reclaimer thread starts by doing EREMOVE for all > of them. They could be in initialized state when the kernel starts > because it might be spawned by kexec(). > * Documentation overhaul. > * Use a device /dev/sgx/provision for delivering the provision token > instead of securityfs. > * Create a reference to the enclave when already when opening > /dev/sgx/enclave. The file is then associated with this enclave only. > mmap() can be done at free at any point and always get a reference to > the enclave. To summarize the file now represents the enclave. > > v19: > * Took 3-4 months but in some sense this was more like a rewrite of most > of the corners of the source code. If I've forgotten to deal with some > feedback, please don't shout me. Make a remark and I will fix it for > the next version. Hopefully there won't be this big turnovers anymore. > * Validate SECS attributes properly against CPUID given attributes and > against allowed attributes. SECS attributes are the ones that are > enforced whereas SIGSTRUCT attributes tell what is required to run > the enclave. > * Add KSS (Key Sharing Support) to the enclave attributes. > * Deny MAP_PRIVATE as an enclave is always a shared memory entity. > * Revert back to shmem backing storage so that it can be easily shared > by multiple processes. > * Split the recognization of an ENCLS leaf failure by using three > functions to detect it: encsl_faulted(), encls_returned_code() and > sgx_failed(). encls_failed() is only caused by a spurious expections that > should never happen. Thus, it is not defined as an inline function in > order to easily insert a kprobe to it. > * Move low-level enclave management routines, page fault handler and page > reclaiming routines from driver to the core. These cannot be separated > from each other as they are heavily interdependent. The rationale is that > the core does not call any code from the driver. > * Allow the driver to be compiled as a module now that it no code is using > its routines and it only uses exported symbols. Now the driver is > essentially just a thin ioctl layer. > * Reworked the driver to maintain a list of mm_struct's. The VMA callbacks > add new entries to this list as the process is forked. Each entry has > its own refcount because they have a different life-cycle as the enclave > does. In effect @tgid and @mm have been removed from struct sgx_encl > and we allow forking by removing VM_DONTCOPY from vm flags. > * Generate a cpu mask in the reclaimer from the cpu mask's of all > mm_struct's. This will kick out the hardware threads out of the enclave > from multiple processes. It is not a local variable because it would > eat too much of the stack space but instead a field in struct > sgx_encl. > * Allow forking i.e. remove VM_DONTCOPY. I did not change the API > because the old API scaled to the workload that Andy described. The > codebase is now mostly API independent i.e. changing the API is a > small task. For me the proper trigger to chanage it is a as concrete > as possible workload that cannot be fulfilled. I hope you understand > my thinking here. I don't want to change anything w/o proper basis > but I'm ready to change anything if there is a proper basis. I do > not have any kind of attachment to any particular type of API. > * Add Sean's vDSO ENCLS(EENTER) patches and update selftest to use the > new vDSO. > > v18: > * Update the ioctl-number.txt. > * Move the driver under arch/x86. > * Add SGX features (SGX, SGX1, SGX2) to the disabled-features.h. > * Rename the selftest as test_sgx (previously sgx-selftest). > * In order to enable process accounting, swap EPC pages and PCMD's to a VMA > instead of shmem. > * Allow only to initialize and run enclaves with a subset of > {DEBUG, MODE64BIT} set. > * Add SGX_IOC_ENCLAVE_SET_ATTRIBUTE to allow an enclave to have privileged > attributes e.g. PROVISIONKEY. > > v17: > * Add a simple selftest. > * Fix a null pointer dereference to section->pages when its > allocation fails. > * Add Sean's description of the exception handling to the documentation. > > v16: > * Fixed SOB's in the commits that were a bit corrupted in v15. > * Implemented exceptio handling properly to detect_sgx(). > * Use GENMASK() to define SGX_CPUID_SUB_LEAF_TYPE_MASK. > * Updated the documentation to use rst definition lists. > * Added the missing Documentation/x86/index.rst, which has a link to > intel_sgx.rst. Now the SGX and uapi documentation is properly generated > with 'make htmldocs'. > * While enumerating EPC sections, if an undefined section is found, fail > the driver initialization instead of continuing the initialization. > * Issue a warning if there are more than %SGX_MAX_EPC_SECTIONS. > * Remove copyright notice from arch/x86/include/asm/sgx.h. > * Migrated from ioremap_cache() to memremap(). > > v15: > * Split into more digestable size patches. > * Lots of small fixes and clean ups. > * Signal a "plain" SIGSEGV on an EPCM violation. > > v14: > * Change the comment about X86_FEATURE_SGX_LC from “SGX launch > configuration” to “SGX launch control”. > * Move the SGX-related CPU feature flags as part of the Linux defined > virtual leaf 8. > * Add SGX_ prefix to the constants defining the ENCLS leaf functions. > * Use GENMASK*() and BIT*() in sgx_arch.h instead of raw hex numbers. > * Refine the long description for CONFIG_INTEL_SGX_CORE. > * Do not use pr_*_ratelimited() in the driver. The use of the rate limited > versions is legacy cruft from the prototyping phase. > * Detect sleep with SGX_INVALID_EINIT_TOKEN instead of counting power > cycles. > * Manually prefix with “sgx:” in the core SGX code instead of redefining > pr_fmt. > * Report if IA32_SGXLEPUBKEYHASHx MSRs are not writable in the driver > instead of core because it is a driver requirement. > * Change prompt to bool in the entry for CONFIG_INTEL_SGX_CORE because the > default is ‘n’. > * Rename struct sgx_epc_bank as struct sgx_epc_section in order to match > the SDM. > * Allocate struct sgx_epc_page instances one at a time. > * Use “__iomem void *” pointers for the mapped EPC memory consistently. > * Retry once on SGX_INVALID_TOKEN in sgx_einit() instead of counting power > cycles. > * Call enclave swapping operations directly from the driver instead of > calling them .indirectly through struct sgx_epc_page_ops because indirect > calls are not required yet as the patch set does not contain the KVM > support. > * Added special signal SEGV_SGXERR to notify about SGX EPCM violation > errors. > > v13: > * Always use SGX_CPUID constant instead of a hardcoded value. > * Simplified and documented the macros and functions for ENCLS leaves. > * Enable sgx_free_page() to free active enclave pages on demand > in order to allow sgx_invalidate() to delete enclave pages. > It no longer performs EREMOVE if a page is in the process of > being reclaimed. > * Use PM notifier per enclave so that we don't have to traverse > the global list of active EPC pages to find enclaves. > * Removed unused SGX_LE_ROLLBACK constant from uapi/asm/sgx.h > * Always use ioremap() to map EPC banks as we only support 64-bit kernel. > * Invalidate IA32_SGXLEPUBKEYHASH cache used by sgx_einit() when going > to sleep. > > v12: > * Split to more narrow scoped commits in order to ease the review process and > use co-developed-by tag for co-authors of commits instead of listing them in > the source files. > * Removed cruft EXPORT_SYMBOL() declarations and converted to static variables. > * Removed in-kernel LE i.e. this version of the SGX software stack only > supports unlocked IA32_SGXLEPUBKEYHASHx MSRs. > * Refined documentation on launching enclaves, swapping and enclave > construction. > * Refined sgx_arch.h to include alignment information for every struct that > requires it and removed structs that are not needed without an LE. > * Got rid of SGX_CPUID. > * SGX detection now prints log messages about firmware configuration issues. > > v11: > * Polished ENCLS wrappers with refined exception handling. > * ksgxswapd was not stopped (regression in v5) in > sgx_page_cache_teardown(), which causes a leaked kthread after driver > deinitialization. > * Shutdown sgx_le_proxy when going to suspend because its EPC pages will be > invalidated when resuming, which will cause it not function properly > anymore. > * Set EINITTOKEN.VALID to zero for a token that is passed when > SGXLEPUBKEYHASH matches MRSIGNER as alloc_page() does not give a zero > page. > * Fixed the check in sgx_edbgrd() for a TCS page. Allowed to read offsets > around the flags field, which causes a #GP. Only flags read is readable. > * On read access memcpy() call inside sgx_vma_access() had src and dest > parameters in wrong order. > * The build issue with CONFIG_KASAN is now fixed. Added undefined symbols > to LE even if “KASAN_SANITIZE := false” was set in the makefile. > * Fixed a regression in the #PF handler. If a page has > SGX_ENCL_PAGE_RESERVED flag the #PF handler should unconditionally fail. > It did not, which caused weird races when trying to change other parts of > swapping code. > * EPC management has been refactored to a flat LRU cache and moved to > arch/x86. The swapper thread reads a cluster of EPC pages and swaps all > of them. It can now swap from multiple enclaves in the same round. > * For the sake of consistency with SGX_IOC_ENCLAVE_ADD_PAGE, return -EINVAL > when an enclave is already initialized or dead instead of zero. > > v10: > * Cleaned up anon inode based IPC between the ring-0 and ring-3 parts > of the driver. > * Unset the reserved flag from an enclave page if EDBGRD/WR fails > (regression in v6). > * Close the anon inode when LE is stopped (regression in v9). > * Update the documentation with a more detailed description of SGX. > > v9: > * Replaced kernel-LE IPC based on pipes with an anonymous inode. > The driver does not require anymore new exports. > > v8: > * Check that public key MSRs match the LE public key hash in the > driver initialization when the MSRs are read-only. > * Fix the race in VA slot allocation by checking the fullness > immediately after succeesful allocation. > * Fix the race in hash mrsigner calculation between the launch > enclave and user enclaves by having a separate lock for hash > calculation. > > v7: > * Fixed offset calculation in sgx_edbgr/wr(). Address was masked with PAGE_MASK > when it should have been masked with ~PAGE_MASK. > * Fixed a memory leak in sgx_ioc_enclave_create(). > * Simplified swapping code by using a pointer array for a cluster > instead of a linked list. > * Squeezed struct sgx_encl_page to 32 bytes. > * Fixed deferencing of an RSA key on OpenSSL 1.1.0. > * Modified TC's CMAC to use kernel AES-NI. Restructured the code > a bit in order to better align with kernel conventions. > > v6: > * Fixed semaphore underrun when accessing /dev/sgx from the launch enclave. > * In sgx_encl_create() s/IS_ERR(secs)/IS_ERR(encl)/. > * Removed virtualization chapter from the documentation. > * Changed the default filename for the signing key as signing_key.pem. > * Reworked EPC management in a way that instead of a linked list of > struct sgx_epc_page instances there is an array of integers that > encodes address and bank of an EPC page (the same data as 'pa' field > earlier). The locking has been moved to the EPC bank level instead > of a global lock. > * Relaxed locking requirements for EPC management. EPC pages can be > released back to the EPC bank concurrently. > * Cleaned up ptrace() code. > * Refined commit messages for new architectural constants. > * Sorted includes in every source file. > * Sorted local variable declarations according to the line length in > every function. > * Style fixes based on Darren's comments to sgx_le.c. > > v5: > * Described IPC between the Launch Enclave and kernel in the commit messages. > * Fixed all relevant checkpatch.pl issues that I have forgot fix in earlier > versions except those that exist in the imported TinyCrypt code. > * Fixed spelling mistakes in the documentation. > * Forgot to check the return value of sgx_drv_subsys_init(). > * Encapsulated properly page cache init and teardown. > * Collect epc pages to a temp list in sgx_add_epc_bank > * Removed SGX_ENCLAVE_INIT_ARCH constant. > > v4: > * Tied life-cycle of the sgx_le_proxy process to /dev/sgx. > * Removed __exit annotation from sgx_drv_subsys_exit(). > * Fixed a leak of a backing page in sgx_process_add_page_req() in the > case when vm_insert_pfn() fails. > * Removed unused symbol exports for sgx_page_cache.c. > * Updated sgx_alloc_page() to require encl parameter and documented the > behavior (Sean Christopherson). > * Refactored a more lean API for sgx_encl_find() and documented the behavior. > * Moved #PF handler to sgx_fault.c. > * Replaced subsys_system_register() with plain bus_register(). > * Retry EINIT 2nd time only if MSRs are not locked. > > v3: > * Check that FEATURE_CONTROL_LOCKED and FEATURE_CONTROL_SGX_ENABLE are set. > * Return -ERESTARTSYS in __sgx_encl_add_page() when sgx_alloc_page() fails. > * Use unused bits in epc_page->pa to store the bank number. > * Removed #ifdef for WQ_NONREENTRANT. > * If mmu_notifier_register() fails with -EINTR, return -ERESTARTSYS. > * Added --remove-section=.got.plt to objcopy flags in order to prevent a > dummy .got.plt, which will cause an inconsistent size for the LE. > * Documented sgx_encl_* functions. > * Added remark about AES implementation used inside the LE. > * Removed redundant sgx_sys_exit() from le/main.c. > * Fixed struct sgx_secinfo alignment from 128 to 64 bytes. > * Validate miscselect in sgx_encl_create(). > * Fixed SSA frame size calculation to take the misc region into account. > * Implemented consistent exception handling to __encls() and __encls_ret(). > * Implemented a proper device model in order to allow sysfs attributes > and in-kernel API. > * Cleaned up various "find enclave" implementations to the unified > sgx_encl_find(). > * Validate that vm_pgoff is zero. > * Discard backing pages with shmem_truncate_range() after EADD. > * Added missing EEXTEND operations to LE signing and launch. > * Fixed SSA size for GPRS region from 168 to 184 bytes. > * Fixed the checks for TCS flags. Now DBGOPTIN is allowed. > * Check that TCS addresses are in ELRANGE and not just page aligned. > * Require kernel to be compiled with X64_64 and CPU_SUP_INTEL. > * Fixed an incorrect value for SGX_ATTR_DEBUG from 0x01 to 0x02. > > v2: > * get_rand_uint32() changed the value of the pointer instead of value > where it is pointing at. > * Launch enclave incorrectly used sigstruct attributes-field instead of > enclave attributes-field. > * Removed unused struct sgx_add_page_req from sgx_ioctl.c > * Removed unused sgx_has_sgx2. > * Updated arch/x86/include/asm/sgx.h so that it provides stub > implementations when sgx in not enabled. > * Removed cruft rdmsr-calls from sgx_set_pubkeyhash_msrs(). > * return -ENOMEM in sgx_alloc_page() when VA pages consume too much space > * removed unused global sgx_nr_pids > * moved sgx_encl_release to sgx_encl.c > * return -ERESTARTSYS instead of -EINTR in sgx_encl_init() > > Jarkko Sakkinen (10): > x86/sgx: Update MAINTAINERS > x86/sgx: Add SGX microarchitectural data structures > x86/sgx: Add wrappers for ENCLS leaf functions > x86/sgx: Add functions to allocate and free EPC pages > x86/sgx: Linux Enclave Driver > x86/sgx: Add provisioning > x86/sgx: Add a page reclaimer > x86/sgx: ptrace() support for the SGX driver > selftests/x86: Add a selftest for SGX > docs: x86/sgx: Document SGX micro architecture and kernel internals > > Sean Christopherson (10): > x86/cpufeatures: x86/msr: Add Intel SGX hardware bits > x86/cpufeatures: x86/msr: Intel SGX Launch Control hardware bits > x86/mm: x86/sgx: Signal SIGSEGV with PF_SGX > x86/cpu/intel: Detect SGX support > x86/sgx: Enumerate and track EPC sections > mm: Introduce vm_ops->may_mprotect() > x86/vdso: Add support for exception fixup in vDSO functions > x86/fault: Add helper function to sanitize error code > x86/traps: Attempt to fixup exceptions in vDSO before signaling > x86/vdso: Implement a vDSO for Intel SGX enclave call > > .../userspace-api/ioctl/ioctl-number.rst | 1 + > Documentation/x86/index.rst | 1 + > Documentation/x86/sgx.rst | 206 +++++ > MAINTAINERS | 11 + > arch/x86/Kconfig | 14 + > arch/x86/entry/vdso/Makefile | 8 +- > arch/x86/entry/vdso/extable.c | 46 + > arch/x86/entry/vdso/extable.h | 29 + > arch/x86/entry/vdso/vdso-layout.lds.S | 9 +- > arch/x86/entry/vdso/vdso.lds.S | 1 + > arch/x86/entry/vdso/vdso2c.h | 58 +- > arch/x86/entry/vdso/vsgx_enter_enclave.S | 131 +++ > arch/x86/include/asm/cpufeature.h | 5 +- > arch/x86/include/asm/cpufeatures.h | 8 +- > arch/x86/include/asm/disabled-features.h | 18 +- > arch/x86/include/asm/enclu.h | 8 + > arch/x86/include/asm/msr-index.h | 8 + > arch/x86/include/asm/required-features.h | 2 +- > arch/x86/include/asm/traps.h | 1 + > arch/x86/include/asm/vdso.h | 5 + > arch/x86/include/uapi/asm/sgx.h | 175 ++++ > arch/x86/kernel/cpu/Makefile | 1 + > arch/x86/kernel/cpu/common.c | 4 + > arch/x86/kernel/cpu/feat_ctl.c | 32 +- > arch/x86/kernel/cpu/sgx/Makefile | 6 + > arch/x86/kernel/cpu/sgx/arch.h | 343 ++++++++ > arch/x86/kernel/cpu/sgx/driver.c | 209 +++++ > arch/x86/kernel/cpu/sgx/driver.h | 32 + > arch/x86/kernel/cpu/sgx/encl.c | 756 +++++++++++++++++ > arch/x86/kernel/cpu/sgx/encl.h | 128 +++ > arch/x86/kernel/cpu/sgx/encls.h | 238 ++++++ > arch/x86/kernel/cpu/sgx/ioctl.c | 800 ++++++++++++++++++ > arch/x86/kernel/cpu/sgx/main.c | 280 ++++++ > arch/x86/kernel/cpu/sgx/reclaim.c | 471 +++++++++++ > arch/x86/kernel/cpu/sgx/sgx.h | 108 +++ > arch/x86/kernel/traps.c | 14 + > arch/x86/mm/fault.c | 45 +- > include/linux/mm.h | 2 + > mm/mprotect.c | 14 +- > tools/arch/x86/include/asm/cpufeatures.h | 7 +- > tools/testing/selftests/Makefile | 1 + > tools/testing/selftests/sgx/.gitignore | 2 + > tools/testing/selftests/sgx/Makefile | 53 ++ > tools/testing/selftests/sgx/call.S | 54 ++ > tools/testing/selftests/sgx/defines.h | 21 + > tools/testing/selftests/sgx/load.c | 282 ++++++ > tools/testing/selftests/sgx/main.c | 199 +++++ > tools/testing/selftests/sgx/main.h | 38 + > tools/testing/selftests/sgx/sigstruct.c | 395 +++++++++ > tools/testing/selftests/sgx/test_encl.c | 20 + > tools/testing/selftests/sgx/test_encl.lds | 40 + > .../selftests/sgx/test_encl_bootstrap.S | 89 ++ > 52 files changed, 5398 insertions(+), 31 deletions(-) > create mode 100644 Documentation/x86/sgx.rst > create mode 100644 arch/x86/entry/vdso/extable.c > create mode 100644 arch/x86/entry/vdso/extable.h > create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S > create mode 100644 arch/x86/include/asm/enclu.h > create mode 100644 arch/x86/include/uapi/asm/sgx.h > create mode 100644 arch/x86/kernel/cpu/sgx/Makefile > create mode 100644 arch/x86/kernel/cpu/sgx/arch.h > create mode 100644 arch/x86/kernel/cpu/sgx/driver.c > create mode 100644 arch/x86/kernel/cpu/sgx/driver.h > create mode 100644 arch/x86/kernel/cpu/sgx/encl.c > create mode 100644 arch/x86/kernel/cpu/sgx/encl.h > create mode 100644 arch/x86/kernel/cpu/sgx/encls.h > create mode 100644 arch/x86/kernel/cpu/sgx/ioctl.c > create mode 100644 arch/x86/kernel/cpu/sgx/main.c > create mode 100644 arch/x86/kernel/cpu/sgx/reclaim.c > create mode 100644 arch/x86/kernel/cpu/sgx/sgx.h > create mode 100644 tools/testing/selftests/sgx/.gitignore > create mode 100644 tools/testing/selftests/sgx/Makefile > create mode 100644 tools/testing/selftests/sgx/call.S > create mode 100644 tools/testing/selftests/sgx/defines.h > create mode 100644 tools/testing/selftests/sgx/load.c > create mode 100644 tools/testing/selftests/sgx/main.c > create mode 100644 tools/testing/selftests/sgx/main.h > create mode 100644 tools/testing/selftests/sgx/sigstruct.c > create mode 100644 tools/testing/selftests/sgx/test_encl.c > create mode 100644 tools/testing/selftests/sgx/test_encl.lds > create mode 100644 tools/testing/selftests/sgx/test_encl_bootstrap.S > > -- > 2.25.1 >
On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote: > Tested on Enarx. This requires a patch[0] for v29 support. > > Tested-by: Nathaniel McCallum <npmccallum@redhat.com> > > However, we did uncover a small usability issue. See below. > > [0]: https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662 ... > > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can > > be used to reserve the address range. Now /dev/sgx supports only opaque > > mappings to the (initialized) enclave data. > > The statement "Any mapping..." isn't actually true. > > Enarx creates a large enclave (currently 64GiB). This worked when we > created a file-backed mapping on /dev/sgx/enclave. However, switching > to an anonymous mapping fails with ENOMEM. We suspect this is because > the kernel attempts to allocate all the pages and zero them but there > is insufficient RAM available. We currently work around this by > creating a shared mapping on /dev/zero. Hmm, the kernel shouldn't actually allocate physical pages unless they're written. I'll see if I can reproduce. > If we want to keep this mmap() strategy, we probably don't want to > advise mmap(ANON) if it allocates all the memory for the enclave ahead > of time, even if it won't be used. This would be wasteful. > > OTOH, having to mmap("/dev/zero") seems a bit awkward.
Greg, "Dr. Greg" <greg@enjellic.com> writes: > As an aside, for those who haven't spent the last 5+ years of their > life working with this technology. SGX2 hardware platforms have the > ability to allow unrestricted code execution in enclave context. Unrestricted code execution even before loaded? Unrestricted by priviledge levels? > No amount of LSM or IMA interventions can provide any control over > that. They can restrict what is started and loaded before anything SGX happens. If you want to make real technical arguments then please be specific and precise and spare us the handwaving marketing speak. > In fact, the Confidential Computing Consortium, sponsored by none > other then the Linux Foundation, has at its fundamental tenant, And that's relevant to the technical question in which way? > the notion of developing an eco-system that allows the execution of > code and processing of data, over which, the owner or administrator of > the platform has no visibility or control. It would seem only logical > that adversarial interests would indulge themseleves in those > capabilities as well. > > With respect to SGX and these issues, cryptographic policy management > is important for the same reason that 2-factor authentication is now > considered standard practice in the security industry. > > We appreciate, Jarkko, that you feel that such infrastructure is > optional, like virtualization support, and is something that can go in > after the driver is mainlined. As the diffstat for our patch points > out, the proposed support has virtually no impact on the driver, the > same cannot be said for virtualization capabilities. The diffstat of your patch is irrelevant. What's relevant is the fact that it is completely unreviewed and that it creates yet another user space visible ABI with a noticable lack of documentation. > Moreover, adding support for key based policy management later would > require the addition of another ioctl in order to avoid ABI > compatibility issues. And that's a problem because? > The current initialization ioctl is best suited, from an engineering > perspective, to support this type of infrastructure. What's wrong with having IOCTL_INIT_TYPE_A and IOCTL_INIT_TYPE_B? Nothing at all. It's pretty straight forward and in fact a better solution than a duct taped multiplexing all in one IOCTL_INIT_PONIES approach. > In fact, the necessary support was removed from the ioctl for > political reasons rather then for any valid engineering rationale on > flexible launch control platforms, particularly with our patch or an > equivalent approach. You're surely making a convincing technical argument by claiming that this was a political decision. The amount of non-technical, i.e. political arguments in your mail is definitely larger than the technical content. > Hopefully this is a reasoned technical approach to what has been a > long standing issue surrounding this technology. It's an approach which guarantees that the driver will miss the next merge window. If that's your intention, then please let us know. Merging the current set of patches does not prevent anything you want to be added. It's an extension to the base implementation and not a prerequisite. > Best wishes for a productive week. > > Dr. Greg Thanks a lot for the best wishes. Unfortunately reading this email was not necessarily productive for me, but I surely wish that you can make productive use of my reply. Thanks, tglx > ------------------------------------------------------------------------------ > "Opportunity is missed by most people because it is dressed in overalls > and looks like work." > -- Thomas Edison ------------------------------------------------------------------------------ "Failure is simply the opportunity to begin again, this time more intelligently" -- Henry Ford
On Wed, 06 May 2020 17:14:22 -0500, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote: >> Tested on Enarx. This requires a patch[0] for v29 support. >> >> Tested-by: Nathaniel McCallum <npmccallum@redhat.com> >> >> However, we did uncover a small usability issue. See below. >> >> [0]: >> https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662 > > ... > >> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. >> anonymous) can >> > be used to reserve the address range. Now /dev/sgx supports only >> opaque >> > mappings to the (initialized) enclave data. >> >> The statement "Any mapping..." isn't actually true. >> >> Enarx creates a large enclave (currently 64GiB). This worked when we >> created a file-backed mapping on /dev/sgx/enclave. However, switching >> to an anonymous mapping fails with ENOMEM. We suspect this is because >> the kernel attempts to allocate all the pages and zero them but there >> is insufficient RAM available. We currently work around this by >> creating a shared mapping on /dev/zero. > > Hmm, the kernel shouldn't actually allocate physical pages unless they're > written. I'll see if I can reproduce. > For larger size mmap, I think it requires enabling vm overcommit mode 1: echo 1 | sudo tee /proc/sys/vm/overcommit_memory >> If we want to keep this mmap() strategy, we probably don't want to >> advise mmap(ANON) if it allocates all the memory for the enclave ahead >> of time, even if it won't be used. This would be wasteful. >> >> OTOH, having to mmap("/dev/zero") seems a bit awkward.
On Thu, May 7, 2020 at 1:03 AM Haitao Huang <haitao.huang@linux.intel.com> wrote: > > On Wed, 06 May 2020 17:14:22 -0500, Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > > On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote: > >> Tested on Enarx. This requires a patch[0] for v29 support. > >> > >> Tested-by: Nathaniel McCallum <npmccallum@redhat.com> > >> > >> However, we did uncover a small usability issue. See below. > >> > >> [0]: > >> https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662 > > > > ... > > > >> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. > >> anonymous) can > >> > be used to reserve the address range. Now /dev/sgx supports only > >> opaque > >> > mappings to the (initialized) enclave data. > >> > >> The statement "Any mapping..." isn't actually true. > >> > >> Enarx creates a large enclave (currently 64GiB). This worked when we > >> created a file-backed mapping on /dev/sgx/enclave. However, switching > >> to an anonymous mapping fails with ENOMEM. We suspect this is because > >> the kernel attempts to allocate all the pages and zero them but there > >> is insufficient RAM available. We currently work around this by > >> creating a shared mapping on /dev/zero. > > > > Hmm, the kernel shouldn't actually allocate physical pages unless they're > > written. I'll see if I can reproduce. > > > > For larger size mmap, I think it requires enabling vm overcommit mode 1: > echo 1 | sudo tee /proc/sys/vm/overcommit_memory Which means the default experience isn't good.
On Wed, May 06, 2020 at 09:39:55AM -0700, Jordan Hand wrote: Good afternoon, I hope the week is going well for everyone. > On 4/21/20 2:52 PM, Jarkko Sakkinen wrote: > > Make the vDSO callable directly from C by preserving RBX and taking leaf > > from RCX. > Tested with the Open Enclave SDK on top of Intel PSW. Specifically built > the Intel PSW with changes to support /dev/sgx mapping[1] new in v29. > > Tested-by: Jordan Hand <jorhand@linux.microsoft.com> > > [1] https://github.com/intel/linux-sgx/pull/530 Did you re-wire your SDK to convert all your ECALL and exception handling to the new VDSO architecture? Failures in enclave loading and initialization demonstrate themselves pretty clearly and are in the domain of the PSW being used. If there are going to be subtle SGX application operability issues that need to be found they will be in the new ECALL and exception handling mechanisms. Have a good remainder of the day. Dr. Greg As always, Dr. Greg Wettstein, Ph.D, Worker Artisans in autonomously Enjellic Systems Development, LLC self-defensive IOT platforms 4206 N. 19th Ave. and edge devices. Fargo, ND 58102 PH: 701-281-1686 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "Davidsen's first rule of system administration: He learns to swim fastest who is thrown in the deepest water." -- Bill Davidsen
On Thu, May 07, 2020 at 12:49:15PM -0400, Nathaniel McCallum wrote: > On Thu, May 7, 2020 at 1:03 AM Haitao Huang > <haitao.huang@linux.intel.com> wrote: > > > > On Wed, 06 May 2020 17:14:22 -0500, Sean Christopherson > > <sean.j.christopherson@intel.com> wrote: > > > > > On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote: > > >> Tested on Enarx. This requires a patch[0] for v29 support. > > >> > > >> Tested-by: Nathaniel McCallum <npmccallum@redhat.com> > > >> > > >> However, we did uncover a small usability issue. See below. > > >> > > >> [0]: > > >> https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662 > > > > > > ... > > > > > >> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. > > >> anonymous) can > > >> > be used to reserve the address range. Now /dev/sgx supports only > > >> opaque > > >> > mappings to the (initialized) enclave data. > > >> > > >> The statement "Any mapping..." isn't actually true. Yeah, this definitely misleading. I haven't looked at our most recent docs, but I'm going to go out on a limb and assume we haven't documented the preferred mechanism for carving out virtual memory for the enclave. That absolutely should be done. > > >> Enarx creates a large enclave (currently 64GiB). This worked when we > > >> created a file-backed mapping on /dev/sgx/enclave. However, switching > > >> to an anonymous mapping fails with ENOMEM. We suspect this is because > > >> the kernel attempts to allocate all the pages and zero them but there > > >> is insufficient RAM available. We currently work around this by > > >> creating a shared mapping on /dev/zero. > > > > > > Hmm, the kernel shouldn't actually allocate physical pages unless they're > > > written. I'll see if I can reproduce. > > > > > > > For larger size mmap, I think it requires enabling vm overcommit mode 1: > > echo 1 | sudo tee /proc/sys/vm/overcommit_memory It shouldn't unless the initial mmap is "broken". Not truly broken, but broken in the sense that what Enarx is asking for is not actually what it desires. > Which means the default experience isn't good. What PROT_* and MAP_* flags are passed to mmap()? Overcommit only applies to VM_WRITE (a.k.a. PROT_WRITE) && !VM_SHARED && !VM_NORESERVED and, ignoring rlimits, VM expansion only applies to VM_WRITE && !VM_SHARED && !VM_STACK So hopefully Enarx is doing something like base = mmap(NULL, 64gb, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); because that means this is effectively a userspace bug. This goes back to my comment about the mmap() being "broken". Userspace is asking for a writable, private mapping, in which case it absolutely should be accounted. If using base = mmap(NULL, 64gb, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); works, then updating the SGX docs to better explain how to establish ELRANGE is sufficient (we need to that in any case). If the above still fails then something else is in play.
On Thu, 07 May 2020 11:49:15 -0500, Nathaniel McCallum <npmccallum@redhat.com> wrote: > On Thu, May 7, 2020 at 1:03 AM Haitao Huang > <haitao.huang@linux.intel.com> wrote: >> >> On Wed, 06 May 2020 17:14:22 -0500, Sean Christopherson >> <sean.j.christopherson@intel.com> wrote: >> >> > On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote: >> >> Tested on Enarx. This requires a patch[0] for v29 support. >> >> >> >> Tested-by: Nathaniel McCallum <npmccallum@redhat.com> >> >> >> >> However, we did uncover a small usability issue. See below. >> >> >> >> [0]: >> >> >> https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662 >> > >> > ... >> > >> >> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. >> >> anonymous) can >> >> > be used to reserve the address range. Now /dev/sgx supports only >> >> opaque >> >> > mappings to the (initialized) enclave data. >> >> >> >> The statement "Any mapping..." isn't actually true. >> >> >> >> Enarx creates a large enclave (currently 64GiB). This worked when we >> >> created a file-backed mapping on /dev/sgx/enclave. However, switching >> >> to an anonymous mapping fails with ENOMEM. We suspect this is because >> >> the kernel attempts to allocate all the pages and zero them but there >> >> is insufficient RAM available. We currently work around this by >> >> creating a shared mapping on /dev/zero. >> > >> > Hmm, the kernel shouldn't actually allocate physical pages unless >> they're >> > written. I'll see if I can reproduce. >> > >> >> For larger size mmap, I think it requires enabling vm overcommit mode 1: >> echo 1 | sudo tee /proc/sys/vm/overcommit_memory > > Which means the default experience isn't good. > Yes, it is not good default. But this is not sgx specific IIUC. Normal applications would have the same issue if they ask for large mapping than whatever limit kernel enforces by default.
On Thu, 07 May 2020 14:34:59 -0500, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > On Thu, May 07, 2020 at 12:49:15PM -0400, Nathaniel McCallum wrote: >> On Thu, May 7, 2020 at 1:03 AM Haitao Huang >> <haitao.huang@linux.intel.com> wrote: >> > >> > On Wed, 06 May 2020 17:14:22 -0500, Sean Christopherson >> > <sean.j.christopherson@intel.com> wrote: >> > >> > > On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote: >> > >> Tested on Enarx. This requires a patch[0] for v29 support. >> > >> >> > >> Tested-by: Nathaniel McCallum <npmccallum@redhat.com> >> > >> >> > >> However, we did uncover a small usability issue. See below. >> > >> >> > >> [0]: >> > >> >> https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662 >> > > >> > > ... >> > > >> > >> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. >> > >> anonymous) can >> > >> > be used to reserve the address range. Now /dev/sgx supports >> only >> > >> opaque >> > >> > mappings to the (initialized) enclave data. >> > >> >> > >> The statement "Any mapping..." isn't actually true. > > Yeah, this definitely misleading. I haven't looked at our most recent > docs, > but I'm going to go out on a limb and assume we haven't documented the > preferred mechanism for carving out virtual memory for the enclave. That > absolutely should be done. > >> > >> Enarx creates a large enclave (currently 64GiB). This worked when >> we >> > >> created a file-backed mapping on /dev/sgx/enclave. However, >> switching >> > >> to an anonymous mapping fails with ENOMEM. We suspect this is >> because >> > >> the kernel attempts to allocate all the pages and zero them but >> there >> > >> is insufficient RAM available. We currently work around this by >> > >> creating a shared mapping on /dev/zero. >> > > >> > > Hmm, the kernel shouldn't actually allocate physical pages unless >> they're >> > > written. I'll see if I can reproduce. >> > > >> > >> > For larger size mmap, I think it requires enabling vm overcommit mode >> 1: >> > echo 1 | sudo tee /proc/sys/vm/overcommit_memory > > It shouldn't unless the initial mmap is "broken". Not truly broken, but > broken in the sense that what Enarx is asking for is not actually what it > desires. > So I tried, this passes with mode 1 but fail with ENOMEM by default: mmap(NULL, 0x100000000000UL, PROT_NONE, MAP_SHARED| MAP_ANONYMOUS, -1, 0);
On Thu, May 07, 2020 at 05:35:31PM -0500, Haitao Huang wrote: > On Thu, 07 May 2020 14:34:59 -0500, Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > >On Thu, May 07, 2020 at 12:49:15PM -0400, Nathaniel McCallum wrote: > >>> For larger size mmap, I think it requires enabling vm overcommit mode > >>1: > >>> echo 1 | sudo tee /proc/sys/vm/overcommit_memory > > > >It shouldn't unless the initial mmap is "broken". Not truly broken, but > >broken in the sense that what Enarx is asking for is not actually what it > >desires. > > > So I tried, this passes with mode 1 but fail with ENOMEM by default: > > mmap(NULL, 0x100000000000UL, PROT_NONE, MAP_SHARED| MAP_ANONYMOUS, -1, 0); Ah, fudge. shmem_zero_setup() triggers shmem_acct_size() and thus __vm_enough_memory(). Which I should have rememered because I've stared at that code several times when dealing with the enclave's backing store. I wasn't seeing the issue because I happened to use MAP_PRIVATE. So, bad analysis, good conclusion, i.e. the kernel is still doing the right thing, it's just not ideal for userspace. Jarkko, we should update the docs and selftest to recommend and use PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS or PROT_NONE, MAP_SHARED | MAP_NORESERVE | MAP_ANONYMOUS" when carving out ELRANGE, with an explicit comment that all the normal rules for mapping memory still apply.
On Wed, Apr 29, 2020 at 8:30 AM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Sun, Apr 26, 2020 at 11:57:53AM -0500, Dr. Greg wrote: > > In closing, it is important to note that the proposed SGX driver is > > not available as a module. This effectively excludes any alternative > > implementations of the driver without replacement of the kernel at > > large. > > No it doesn't. The SGX subsytem won't allocate EPC pages unless userspace > creates an enclave, i.e. preventing unprivileged userspace from accessing > /dev/sgx/enclave will allow loading an alternative out-of-tree SGX module. > Yes, SGX sanitizes the EPC on boot, but that's arguably a good thing for > out-of-tree modules. > > And if you want to get crafty and squash in-kernel SGX altogether, boot > with "clearcpuid=<SGX_LC>" and/or "clearcpuid=<SGX>" to disable in-kernel > support entirely. SGX won't be correctly enumerated in /proc/cpuinfo > relative to the existence of an out-of-tree module, but that seems like a > very minor issue if you're running with a completely different SGX driver. > > > It also means that any platform, with SGX hardware support, > > running a kernel with this driver, has the potential for the > > security/privacy issues noted above. > > Unless I'm mistaken, /dev/sgx is root-only by default. There are far > scarier mechanisms available to root for hosing the system. > > > If key based policy management is not allowed, then the driver needs > > to be re-architected to have modular support so that alternative > > implementations or the absence of any driver support are at least > > tenable. > > As above, using an alternative implementation is teneble, albeit a bit > kludgy. It is worth noting that, if someone actualy does this, and a future kernel patch breaks it, the upstream developers are unlikely to apologize or even feel particularly bad. See, for example, the current situation with VirtualBox.
On 5/7/20 11:06 AM, Dr. Greg wrote: > On Wed, May 06, 2020 at 09:39:55AM -0700, Jordan Hand wrote: > > Good afternoon, I hope the week is going well for everyone. > >> On 4/21/20 2:52 PM, Jarkko Sakkinen wrote: >>> Make the vDSO callable directly from C by preserving RBX and taking leaf >>> from RCX. > >> Tested with the Open Enclave SDK on top of Intel PSW. Specifically built >> the Intel PSW with changes to support /dev/sgx mapping[1] new in v29. >> >> Tested-by: Jordan Hand <jorhand@linux.microsoft.com> >> >> [1] https://github.com/intel/linux-sgx/pull/530 > > Did you re-wire your SDK to convert all your ECALL and exception > handling to the new VDSO architecture? > No. We have many users of our SDK who rely on the out-of-tree driver and will for the foreseeable future. I aim to support both in-tree and out-of-tree with minimal code diff. > > Failures in enclave loading and initialization demonstrate themselves > pretty clearly and are in the domain of the PSW being used. If there > are going to be subtle SGX application operability issues that need to > be found they will be in the new ECALL and exception handling > mechanisms. Fair enough, no I have not tested those mechanisms. Apologies, I should have removed that line from the quoted text. -Jordan
On Thu, May 07, 2020 at 02:41:30AM +0200, Thomas Gleixner wrote: > Greg, Good morning Thomas, I hope the week has gone well for you, the same to everyone else reading this. > "Dr. Greg" <greg@enjellic.com> writes: > > As an aside, for those who haven't spent the last 5+ years of their > > life working with this technology. SGX2 hardware platforms have the > > ability to allow unrestricted code execution in enclave context. > Unrestricted code execution even before loaded? Unrestricted by > priviledge levels? The LSM/IMA infrastructure will have no visibility into code that will be executed and data processed in enclave context, see immediately below. > > No amount of LSM or IMA interventions can provide any control over > > that. > They can restrict what is started and loaded before anything SGX > happens. At best, the visibility and inspection will be over a standard bootstrap loader of some type is in no way related to the actual code that will be executed or data processed in enclave context. Furthermore, given what is becoming the excepted software architecture for the SGX industry, that code will largely have full access to system resources. > If you want to make real technical arguments then please be specific > and precise and spare us the handwaving marketing speak. Thomas, you don't know me from boo, those that do know me and have worked for me would tell you with absolute certainty that being a 'handwavy marketing type' is the complete antithesis of my character and how I practice engineering. Andy wanted to know why I felt the current driver disadvantaged our work, I provided a technical summary, omitted completely in your response, that indicated how we are using SGX technology in a manner that is inherently different from the rest of the industry. > > In fact, the Confidential Computing Consortium, sponsored by none > > other then the Linux Foundation, has at its fundamental tenant, > And that's relevant to the technical question in which way? It speaks directly to the primary marketing objective that is driving the economics of what this technology is going to turn into. The metaphor for objective, that now seems to be generally accepted, is 'Runtime Encryption'. One of the most common threads on the SGX developer's list is how developers can restrict the ability of other's to see what their enclaves are doing or the code in them. The standard response is 'nothing', a security context has to be established through remote attestation and the confidential material then conveyed into the enclave using appropriate confidentiality and integrity protections. As further technical evidence for all of this, I would refer readers to the following tag in the Intel Linux SGX SDK: sgx_2.1.3 That tags the release that introduced the implementation of Protected Code Loader (PCL) support. This allows developers to create enclaves that are encrypted at rest and then decrypted, loaded and executed by a second bootstrap loader enclave that protects the confidentiality of the first enclave. SGX2 hardware support makes the concept of protected code loading, at once, more practical, more efficient and closes a visibility vulnerability that the current PCL model has. The impact of this has to be viewed in the context of the economic market forces that are driving 'runtime encryption'. The original SGX programming model promoted by Intel was to partition applications so that sensitive data, and the code that operated on it, were inside of an enclave. That approach was doomed by the economic factors driving software development which are roughly as follows: 1.) Ease of development. 2.) Cost of development. 3.) Time to market. 4.) Return on investment. 5.) Security In about that order, although there are probably a half dozen other factors between 4 and 5. As a result, the 'runtime encryption' industry moved to a library OS model where unmodified applications can be run in an enclave along along with full interpreter environments such as Java, Python, Javascript and WASM. In this architecture all of the requests for operating system resources are shimmed through OCALL's to be serviced by the OS. If you combine all of these factors and influences, you end up with, looking forward, to an architecture that will favor a small bootstrap loader that downloads and executes code, over protected channels, that has almost full access to operating system resources. In this model the only visibility that the platform owner has, and by extension the LSM/IMA infrastructure, is the bootstrap loader itself. This architecture will also be driven by how attractive the concept is to the devops model and cloud orchestration in general. Our technical arguement is that it does not seem unreasonable for the Linux driver, at the discretion of the platform owner, to be able to implement the equivalent of 2-factor authentication over the initiation of such execution. I apologize for the detail and hope it is at once both suitably technical and 'non-hand-wavy'. > > the notion of developing an eco-system that allows the execution of > > code and processing of data, over which, the owner or administrator of > > the platform has no visibility or control. It would seem only logical > > that adversarial interests would indulge themseleves in those > > capabilities as well. > > > > With respect to SGX and these issues, cryptographic policy management > > is important for the same reason that 2-factor authentication is now > > considered standard practice in the security industry. > > > > We appreciate, Jarkko, that you feel that such infrastructure is > > optional, like virtualization support, and is something that can go in > > after the driver is mainlined. As the diffstat for our patch points > > out, the proposed support has virtually no impact on the driver, the > > same cannot be said for virtualization capabilities. > The diffstat of your patch is irrelevant. What's relevant is the > fact that it is completely unreviewed and that it creates yet > another user space visible ABI with a noticable lack of > documentation. A number of points on this issue. If anyone cares to review the patch, it has a 73 line commit message that describes how the architecture works. That would obviously be embellished and added to the general documentation. We posted the initial concept implementation of this infrastructure 14 months ago. Andy, rightly so, indicated the design was unclean. The simplification of the SGX driver at large over the last year allowed a much more straight forward implementation of the patch. This version of the patch has been posted twice in the last three months, in response to the two major architectural revisions to the driver that have occurred. Jarkko indicated a year ago that our approach would 'bloat' the driver. A common criticism of patches in general on LKML is that they complicate sub-systems. I believe that diffstats are generally recognized as cognitive indicators of the amount of bloat, complexity and impact that a proposed patch introduces. As to lack of review, we would certainly welcome technical and API comments but we cannot force them. Candidly, the only people capable of working with the patch are groups that have full and complete runtime implementations that they can modify and test and that group is extremely limited. > > Moreover, adding support for key based policy management later would > > require the addition of another ioctl in order to avoid ABI > > compatibility issues. > And that's a problem because? See below. > > The current initialization ioctl is best suited, from an engineering > > perspective, to support this type of infrastructure. > What's wrong with having IOCTL_INIT_TYPE_A and IOCTL_INIT_TYPE_B? > > Nothing at all. It's pretty straight forward and in fact a better > solution than a duct taped multiplexing all in one IOCTL_INIT_PONIES > approach. I believe that a review of our patch would indicate that a 'duct taped multplexing INIT_PONIES ioctl' is not a technically honest assessment of what we are proposing. There is no flag or multi-case variable implementation, the patch simply re-adds a pointer to a structure that was in the previously out of tree driver. In fact, anyone who reviews the patch will see that the current driver creates a pointer in the ioctl call to pass downward into the hardware initialization routines. Our code simply replaces that pointer with one supplied by userspace. That being said, we could certainly wire up a second ioctl and use that. Candidly, under normal circumstances, that would arguably raise a bloat accusation, since why would a second ioctl be implemented when there is an already fully functional and mono-purpose ioctl that triggers enclave initialization. > > In fact, the necessary support was removed from the ioctl for > > political reasons rather then for any valid engineering rationale on > > flexible launch control platforms, particularly with our patch or an > > equivalent approach. > You're surely making a convincing technical argument by claiming > that this was a political decision. The amount of non-technical, > i.e. political arguments in your mail is definitely larger than the > technical content. There is the adage out here in the Upper Midwest, shared elsewhere, that you shouldn't bring a knife to a gunfight, to date the issue of cryptographic policy management has been exclusively political and decidedly non-technical. We have tried to respond by demonstrating, with code, that a minimum impact technical approach is possible. To date the only response has been that we need to get this driver into the kernel as fast as we can and then deal with other issues. Candidly, this is equivalent to, 'lets hurry up and ship it and then we can worry about bugs and security issues', that plagues the rest of the software industry. > > Hopefully this is a reasoned technical approach to what has been a > > long standing issue surrounding this technology. > It's an approach which guarantees that the driver will miss the next > merge window. If that's your intention, then please let us know. I believe that a dispassionate observer, reviewing the last 2-3 years of LKML conversations surrounding SGX, would not conclude that our actions have delayed the driver. Candidly, the issue may be coming down to a question as to whether or not the driver should go into the kernel. I pride myself on extreme technical honesty, I assume everyone else does, so it probably needs to be taken into consideration that it is now common knowledge that Intel is dropping support for SGX, at least on the client side: https://software.intel.com/en-us/forums/intel-software-guard-extensions-intel-sgx/topic/850599 https://www.techspot.com/news/84506-leaked-intel-rocket-lake-s-diagram-highlights-platform.html My understanding is that getting SGX enabled on Comet Lake platforms requires a discussion with vendors regarding a customized BIOS/firmware implementation for the platform that enables the SGX technology that is now being left silent on the chip/firmware. A pity really, the technology arguably had a lot of legs with respect to the capabilities that it brought to edge security but that is another topic in and of itself. SGX takes a lot of expensive silicon real estate. Whether or not that price continues to get paid is going to depend on whether or not there is sufficient market 'pull' from entities who want to push code and data up into the cloud so that it can be executed without anyone knowing what it is doing. Hence our continued advocacy for a driver architecture that allows stronger protections to be applied to that process, without affecting the default behavior of the driver. At the very least a modular form of the driver should be considered that would allow alternate implementations. Sean indicated that there was a 'kludgy' approach that would allow an alternate modular implementation alongside the in-kernel driver. I believe that Andy has already indicated that may not be an advisable approach. > Merging the current set of patches does not prevent anything you want to > be added. It's an extension to the base implementation and not a > prerequisite. My response to that is that a method for the driver to implement the equivalent of 2-factor authentication over unrestricted code execution and data manipulation, that does not affect the standard driver behavior, needs to be part of the base implementation. > > Best wishes for a productive week. > > > > Dr. Greg > Thanks a lot for the best wishes. Unfortunately reading this email was > not necessarily productive for me, but I surely wish that you can make > productive use of my reply. I'm sorry that you found reading my e-mail to be a waste of your time, hopefully the time you took to respond has now allowed everyone reading along at home to enjoy a thorough review of the issues at hand. In a precise and non-handwavy sort of fashion.... > Thanks, > > tglx At the risk of wasting more of everyone's time, best wishes for a productive weekend. Dr. Greg As always, Dr. Greg Wettstein, Ph.D, Worker Developers of autonomously Enjellic Systems Development, LLC self-defensive IOT platforms 4206 N. 19th Ave. and edge devices. Fargo, ND 58102 PH: 701-281-1686 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats." -- Howard Aiken
Adding some Google folks to the party. On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote: > Intel(R) SGX is a set of CPU instructions that can be used by applications > to set aside private regions of code and data. The code outside the enclave > is disallowed to access the memory inside the enclave by the CPU access > control. > > There is a new hardware unit in the processor called Memory Encryption > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define > one or many MEE regions that can hold enclave data by configuring them with > PRMRR registers. > > The MEE automatically encrypts the data leaving the processor package to > the MEE regions. The data is encrypted using a random key whose life-time > is exactly one power cycle. > > The current implementation requires that the firmware sets > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > decide what enclaves it wants run. The implementation does not create > any bottlenecks to support read-only MSRs later on. > > You can tell if your CPU supports SGX by looking into /proc/cpuinfo: > > cat /proc/cpuinfo | grep sgx > > v29: > * The selftest has been moved to selftests/sgx. Because SGX is an execution > environment of its own, it really isn't a great fit with more "standard" > x86 tests. > > The RSA key is now generated on fly and the whole signing process has > been made as part of the enclave loader instead of signing the enclave > during the compilation time. > > Finally, the enclave loader loads now the test enclave directly from its > ELF file, which means that ELF file does not need to be coverted as raw > binary during the build process. > * Version the mm_list instead of using on synchronize_mm() when adding new > entries. We hold the write lock for the mm_struct, and dup_mm() can thus > deadlock with the page reclaimer, which could hold the lock for the old > mm_struct. > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can > be used to reserve the address range. Now /dev/sgx supports only opaque > mappings to the (initialized) enclave data. > * Make the vDSO callable directly from C by preserving RBX and taking leaf > from RCX. > > v28: > * Documented to Documentation/x86/sgx.rst how the kernel manages the > enclave ownership. > * Removed non-LC flow from sgx_einit(). > * Removed struct sgx_einittoken since only the size of the corresponding > microarchitectural structure is used in the series ATM. > > v27: > * Disallow RIE processes to use enclaves as there could a permission > conflict between VMA and enclave permissions. > * In the documentation, replace "grep /proc/cpuinfo" with > "grep sgx /proc/cpuinfo". > > v26: > * Fixed the commit author in "x86/sgx: Linux Enclave Driver", which was > changed in v25 by mistake. > * Addressed a bunch of grammar mistakes in sgx.rst (thanks Randy once > again for such a detailed feedback). > * Added back the MAINTAINERS update commit, which was mistakenly removed > in v25. > * EREMOVE's for SECS cannot be done while sanitizing an EPC section. The > CPU does not allow to remove a SECS page before all of its children have > been removed, and a child page can be in some other section than the one > currently being processed. Thus, removed special SECS processing from > sgx_sanitize_page() and instead put sections through it twice. In the > 2nd round the lists should only contain SECS pages. > > v25: > * Fix a double-free issue when SGX_IOC_ENCLAVE_ADD_PAGES > fails on executing ENCLS[EADD]. The rollback path executed > radix_tree_delete() on the same address twice when this happened. > * Return -EINTR instead of -ERESTARTSYS in SGX_IOC_ENCLAVE_ADD_PAGES when > a signal is pending. > * As requested by Borislav, move the CPUID 0x12 features to their own word > in cpufeatures. > * Sean fixed a bug from sgx_reclaimer_write() where sgx_encl_put_backing() > was called with an uninitialized pointer when sgx_encl_get_backing() > fails. > * Migrated /dev/sgx/* to misc. This is future-proof as struct miscdevice > has 'groups' for setting up sysfs attributes for the device. > * Use device_initcall instead of subsys_initcall so that misc_class is > initialized before SGX is initialized. > * Return -EACCES in SGX_IOC_ENCLAVE_INIT when caller tries to select > enclave attributes that we the kernel does not allow it to set instead > of -EINVAL. > * Unless SGX public key MSRs are writable always deny the feature from > Linux. Previously this was only denied from driver. How VMs should be > supported is not really part of initial patch set, which makes this > an obvious choice. > * Cleaned up and refined documentation to be more approachable. > > v24: > * Reclaim unmeasured and TCS pages (regression in v23). > * Replace usages of GFP_HIGHUSER with GFP_KERNEL. > * Return -EIO on when EADD or EEXTEND fails in %SGX_IOC_ENCLAVE_ADD_PAGES > and use the same rollback (destroy enclave). This can happen when host > suspends itself unknowingly to a VM running enclaves. From -EIO the user > space can deduce what happened. > * Have a separate @count in struct sgx_enclave_add_pages to output number > of bytes processed instead of overwriting the input parameters for > clarity and more importantly that the API provides means for partial > processing (@count could be less than @length in success case). > > v23: > * Replace SGX_ENCLAVE_ADD_PAGE with SGX_ENCLAVE_ADD_PAGES. Replace @mrmask > with %SGX_PAGE_MEASURE flag. > * Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page. > * In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail > (because of OOM) even in legit behaviour and after EBLOCK the reclaiming > flow can be only reverted by killing the whole enclave. > * Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact > it should have been bit 6 (Table 37-3 in the SDM). > * Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX > error code. > > v22: > * Refined bunch commit messages and added associated SDM references as > many of them were too exhausting and some outdated. > * Alignment checks have been removed from mmap() because it does not define the > ELRANGE. VMAs only act as windows to the enclave. The semantics compare > somewhat how mmap() works with regular files. > * We now require user space addresses given to SGX_IOC_ENCLAVE_ADD_PAGE to be > page aligned so that we can pass the page directly to EADD and do not have > to do an extra copy. This was made effectively possible by removing the > worker thread for adding pages. > * The selftest build files have been refined throughout of various glitches > and work properly in a cross compilation environment such as BuildRoot. > In addition, libcalls fail the build with an assertion in the linker > script, if they end up to the enclave binary. > * CONFIG_INTEL_SGX_DRIVER has been removed because you cannot use SGX core > for anything without having the driver. This could change when KVM support > is added. > * We require zero permissions in SECINFO for TCS pages because the CPU > overwrites SECINFO flags with zero permissions and measures the page > only after that. Allowing to pass TCS with non-zero permissions would > cause mismatching measurement between the one provided in SIGSTRUCT and > the one computed by the CPU. > * Obviously lots of small fixes and clean ups (does make sense to > document them all). > > v21: > * Check on mmap() that the VMA does cover an area that does not have > enclave pages. Only mapping with PROT_NONE can do that to reserve > initial address space for an enclave. > * Check om mmap() and mprotect() that the VMA permissions do not > surpass the enclave permissions. > * Remove two refcounts from vma_close(): mm_list and encl->refcount. > Enclave refcount is only need for swapper/enclave sync and we can > remove mm_list refcount by destroying mm_struct when the process > is closed. By not having vm_close() the Linux MM can merge VMAs. > * Do not naturally align MAP_FIXED address. > * Numerous small fixes and clean ups. > * Use SRCU for synchronizing the list of mm_struct's. > * Move to stack based call convention in the vDSO. > > v20: > * Fine-tune Kconfig messages and spacing and remove MMU_NOTIFIER > dependency as MMU notifiers are no longer used in the driver. > * Use mm_users instead of mm_count as refcount for mm_struct as mm_count > only protects from deleting mm_struct, not removing its contents. > * Sanitize EPC when the reclaimer thread starts by doing EREMOVE for all > of them. They could be in initialized state when the kernel starts > because it might be spawned by kexec(). > * Documentation overhaul. > * Use a device /dev/sgx/provision for delivering the provision token > instead of securityfs. > * Create a reference to the enclave when already when opening > /dev/sgx/enclave. The file is then associated with this enclave only. > mmap() can be done at free at any point and always get a reference to > the enclave. To summarize the file now represents the enclave. > > v19: > * Took 3-4 months but in some sense this was more like a rewrite of most > of the corners of the source code. If I've forgotten to deal with some > feedback, please don't shout me. Make a remark and I will fix it for > the next version. Hopefully there won't be this big turnovers anymore. > * Validate SECS attributes properly against CPUID given attributes and > against allowed attributes. SECS attributes are the ones that are > enforced whereas SIGSTRUCT attributes tell what is required to run > the enclave. > * Add KSS (Key Sharing Support) to the enclave attributes. > * Deny MAP_PRIVATE as an enclave is always a shared memory entity. > * Revert back to shmem backing storage so that it can be easily shared > by multiple processes. > * Split the recognization of an ENCLS leaf failure by using three > functions to detect it: encsl_faulted(), encls_returned_code() and > sgx_failed(). encls_failed() is only caused by a spurious expections that > should never happen. Thus, it is not defined as an inline function in > order to easily insert a kprobe to it. > * Move low-level enclave management routines, page fault handler and page > reclaiming routines from driver to the core. These cannot be separated > from each other as they are heavily interdependent. The rationale is that > the core does not call any code from the driver. > * Allow the driver to be compiled as a module now that it no code is using > its routines and it only uses exported symbols. Now the driver is > essentially just a thin ioctl layer. > * Reworked the driver to maintain a list of mm_struct's. The VMA callbacks > add new entries to this list as the process is forked. Each entry has > its own refcount because they have a different life-cycle as the enclave > does. In effect @tgid and @mm have been removed from struct sgx_encl > and we allow forking by removing VM_DONTCOPY from vm flags. > * Generate a cpu mask in the reclaimer from the cpu mask's of all > mm_struct's. This will kick out the hardware threads out of the enclave > from multiple processes. It is not a local variable because it would > eat too much of the stack space but instead a field in struct > sgx_encl. > * Allow forking i.e. remove VM_DONTCOPY. I did not change the API > because the old API scaled to the workload that Andy described. The > codebase is now mostly API independent i.e. changing the API is a > small task. For me the proper trigger to chanage it is a as concrete > as possible workload that cannot be fulfilled. I hope you understand > my thinking here. I don't want to change anything w/o proper basis > but I'm ready to change anything if there is a proper basis. I do > not have any kind of attachment to any particular type of API. > * Add Sean's vDSO ENCLS(EENTER) patches and update selftest to use the > new vDSO. > > v18: > * Update the ioctl-number.txt. > * Move the driver under arch/x86. > * Add SGX features (SGX, SGX1, SGX2) to the disabled-features.h. > * Rename the selftest as test_sgx (previously sgx-selftest). > * In order to enable process accounting, swap EPC pages and PCMD's to a VMA > instead of shmem. > * Allow only to initialize and run enclaves with a subset of > {DEBUG, MODE64BIT} set. > * Add SGX_IOC_ENCLAVE_SET_ATTRIBUTE to allow an enclave to have privileged > attributes e.g. PROVISIONKEY. > > v17: > * Add a simple selftest. > * Fix a null pointer dereference to section->pages when its > allocation fails. > * Add Sean's description of the exception handling to the documentation. > > v16: > * Fixed SOB's in the commits that were a bit corrupted in v15. > * Implemented exceptio handling properly to detect_sgx(). > * Use GENMASK() to define SGX_CPUID_SUB_LEAF_TYPE_MASK. > * Updated the documentation to use rst definition lists. > * Added the missing Documentation/x86/index.rst, which has a link to > intel_sgx.rst. Now the SGX and uapi documentation is properly generated > with 'make htmldocs'. > * While enumerating EPC sections, if an undefined section is found, fail > the driver initialization instead of continuing the initialization. > * Issue a warning if there are more than %SGX_MAX_EPC_SECTIONS. > * Remove copyright notice from arch/x86/include/asm/sgx.h. > * Migrated from ioremap_cache() to memremap(). > > v15: > * Split into more digestable size patches. > * Lots of small fixes and clean ups. > * Signal a "plain" SIGSEGV on an EPCM violation. > > v14: > * Change the comment about X86_FEATURE_SGX_LC from “SGX launch > configuration” to “SGX launch control”. > * Move the SGX-related CPU feature flags as part of the Linux defined > virtual leaf 8. > * Add SGX_ prefix to the constants defining the ENCLS leaf functions. > * Use GENMASK*() and BIT*() in sgx_arch.h instead of raw hex numbers. > * Refine the long description for CONFIG_INTEL_SGX_CORE. > * Do not use pr_*_ratelimited() in the driver. The use of the rate limited > versions is legacy cruft from the prototyping phase. > * Detect sleep with SGX_INVALID_EINIT_TOKEN instead of counting power > cycles. > * Manually prefix with “sgx:” in the core SGX code instead of redefining > pr_fmt. > * Report if IA32_SGXLEPUBKEYHASHx MSRs are not writable in the driver > instead of core because it is a driver requirement. > * Change prompt to bool in the entry for CONFIG_INTEL_SGX_CORE because the > default is ‘n’. > * Rename struct sgx_epc_bank as struct sgx_epc_section in order to match > the SDM. > * Allocate struct sgx_epc_page instances one at a time. > * Use “__iomem void *” pointers for the mapped EPC memory consistently. > * Retry once on SGX_INVALID_TOKEN in sgx_einit() instead of counting power > cycles. > * Call enclave swapping operations directly from the driver instead of > calling them .indirectly through struct sgx_epc_page_ops because indirect > calls are not required yet as the patch set does not contain the KVM > support. > * Added special signal SEGV_SGXERR to notify about SGX EPCM violation > errors. > > v13: > * Always use SGX_CPUID constant instead of a hardcoded value. > * Simplified and documented the macros and functions for ENCLS leaves. > * Enable sgx_free_page() to free active enclave pages on demand > in order to allow sgx_invalidate() to delete enclave pages. > It no longer performs EREMOVE if a page is in the process of > being reclaimed. > * Use PM notifier per enclave so that we don't have to traverse > the global list of active EPC pages to find enclaves. > * Removed unused SGX_LE_ROLLBACK constant from uapi/asm/sgx.h > * Always use ioremap() to map EPC banks as we only support 64-bit kernel. > * Invalidate IA32_SGXLEPUBKEYHASH cache used by sgx_einit() when going > to sleep. > > v12: > * Split to more narrow scoped commits in order to ease the review process and > use co-developed-by tag for co-authors of commits instead of listing them in > the source files. > * Removed cruft EXPORT_SYMBOL() declarations and converted to static variables. > * Removed in-kernel LE i.e. this version of the SGX software stack only > supports unlocked IA32_SGXLEPUBKEYHASHx MSRs. > * Refined documentation on launching enclaves, swapping and enclave > construction. > * Refined sgx_arch.h to include alignment information for every struct that > requires it and removed structs that are not needed without an LE. > * Got rid of SGX_CPUID. > * SGX detection now prints log messages about firmware configuration issues. > > v11: > * Polished ENCLS wrappers with refined exception handling. > * ksgxswapd was not stopped (regression in v5) in > sgx_page_cache_teardown(), which causes a leaked kthread after driver > deinitialization. > * Shutdown sgx_le_proxy when going to suspend because its EPC pages will be > invalidated when resuming, which will cause it not function properly > anymore. > * Set EINITTOKEN.VALID to zero for a token that is passed when > SGXLEPUBKEYHASH matches MRSIGNER as alloc_page() does not give a zero > page. > * Fixed the check in sgx_edbgrd() for a TCS page. Allowed to read offsets > around the flags field, which causes a #GP. Only flags read is readable. > * On read access memcpy() call inside sgx_vma_access() had src and dest > parameters in wrong order. > * The build issue with CONFIG_KASAN is now fixed. Added undefined symbols > to LE even if “KASAN_SANITIZE := false” was set in the makefile. > * Fixed a regression in the #PF handler. If a page has > SGX_ENCL_PAGE_RESERVED flag the #PF handler should unconditionally fail. > It did not, which caused weird races when trying to change other parts of > swapping code. > * EPC management has been refactored to a flat LRU cache and moved to > arch/x86. The swapper thread reads a cluster of EPC pages and swaps all > of them. It can now swap from multiple enclaves in the same round. > * For the sake of consistency with SGX_IOC_ENCLAVE_ADD_PAGE, return -EINVAL > when an enclave is already initialized or dead instead of zero. > > v10: > * Cleaned up anon inode based IPC between the ring-0 and ring-3 parts > of the driver. > * Unset the reserved flag from an enclave page if EDBGRD/WR fails > (regression in v6). > * Close the anon inode when LE is stopped (regression in v9). > * Update the documentation with a more detailed description of SGX. > > v9: > * Replaced kernel-LE IPC based on pipes with an anonymous inode. > The driver does not require anymore new exports. > > v8: > * Check that public key MSRs match the LE public key hash in the > driver initialization when the MSRs are read-only. > * Fix the race in VA slot allocation by checking the fullness > immediately after succeesful allocation. > * Fix the race in hash mrsigner calculation between the launch > enclave and user enclaves by having a separate lock for hash > calculation. > > v7: > * Fixed offset calculation in sgx_edbgr/wr(). Address was masked with PAGE_MASK > when it should have been masked with ~PAGE_MASK. > * Fixed a memory leak in sgx_ioc_enclave_create(). > * Simplified swapping code by using a pointer array for a cluster > instead of a linked list. > * Squeezed struct sgx_encl_page to 32 bytes. > * Fixed deferencing of an RSA key on OpenSSL 1.1.0. > * Modified TC's CMAC to use kernel AES-NI. Restructured the code > a bit in order to better align with kernel conventions. > > v6: > * Fixed semaphore underrun when accessing /dev/sgx from the launch enclave. > * In sgx_encl_create() s/IS_ERR(secs)/IS_ERR(encl)/. > * Removed virtualization chapter from the documentation. > * Changed the default filename for the signing key as signing_key.pem. > * Reworked EPC management in a way that instead of a linked list of > struct sgx_epc_page instances there is an array of integers that > encodes address and bank of an EPC page (the same data as 'pa' field > earlier). The locking has been moved to the EPC bank level instead > of a global lock. > * Relaxed locking requirements for EPC management. EPC pages can be > released back to the EPC bank concurrently. > * Cleaned up ptrace() code. > * Refined commit messages for new architectural constants. > * Sorted includes in every source file. > * Sorted local variable declarations according to the line length in > every function. > * Style fixes based on Darren's comments to sgx_le.c. > > v5: > * Described IPC between the Launch Enclave and kernel in the commit messages. > * Fixed all relevant checkpatch.pl issues that I have forgot fix in earlier > versions except those that exist in the imported TinyCrypt code. > * Fixed spelling mistakes in the documentation. > * Forgot to check the return value of sgx_drv_subsys_init(). > * Encapsulated properly page cache init and teardown. > * Collect epc pages to a temp list in sgx_add_epc_bank > * Removed SGX_ENCLAVE_INIT_ARCH constant. > > v4: > * Tied life-cycle of the sgx_le_proxy process to /dev/sgx. > * Removed __exit annotation from sgx_drv_subsys_exit(). > * Fixed a leak of a backing page in sgx_process_add_page_req() in the > case when vm_insert_pfn() fails. > * Removed unused symbol exports for sgx_page_cache.c. > * Updated sgx_alloc_page() to require encl parameter and documented the > behavior (Sean Christopherson). > * Refactored a more lean API for sgx_encl_find() and documented the behavior. > * Moved #PF handler to sgx_fault.c. > * Replaced subsys_system_register() with plain bus_register(). > * Retry EINIT 2nd time only if MSRs are not locked. > > v3: > * Check that FEATURE_CONTROL_LOCKED and FEATURE_CONTROL_SGX_ENABLE are set. > * Return -ERESTARTSYS in __sgx_encl_add_page() when sgx_alloc_page() fails. > * Use unused bits in epc_page->pa to store the bank number. > * Removed #ifdef for WQ_NONREENTRANT. > * If mmu_notifier_register() fails with -EINTR, return -ERESTARTSYS. > * Added --remove-section=.got.plt to objcopy flags in order to prevent a > dummy .got.plt, which will cause an inconsistent size for the LE. > * Documented sgx_encl_* functions. > * Added remark about AES implementation used inside the LE. > * Removed redundant sgx_sys_exit() from le/main.c. > * Fixed struct sgx_secinfo alignment from 128 to 64 bytes. > * Validate miscselect in sgx_encl_create(). > * Fixed SSA frame size calculation to take the misc region into account. > * Implemented consistent exception handling to __encls() and __encls_ret(). > * Implemented a proper device model in order to allow sysfs attributes > and in-kernel API. > * Cleaned up various "find enclave" implementations to the unified > sgx_encl_find(). > * Validate that vm_pgoff is zero. > * Discard backing pages with shmem_truncate_range() after EADD. > * Added missing EEXTEND operations to LE signing and launch. > * Fixed SSA size for GPRS region from 168 to 184 bytes. > * Fixed the checks for TCS flags. Now DBGOPTIN is allowed. > * Check that TCS addresses are in ELRANGE and not just page aligned. > * Require kernel to be compiled with X64_64 and CPU_SUP_INTEL. > * Fixed an incorrect value for SGX_ATTR_DEBUG from 0x01 to 0x02. > > v2: > * get_rand_uint32() changed the value of the pointer instead of value > where it is pointing at. > * Launch enclave incorrectly used sigstruct attributes-field instead of > enclave attributes-field. > * Removed unused struct sgx_add_page_req from sgx_ioctl.c > * Removed unused sgx_has_sgx2. > * Updated arch/x86/include/asm/sgx.h so that it provides stub > implementations when sgx in not enabled. > * Removed cruft rdmsr-calls from sgx_set_pubkeyhash_msrs(). > * return -ENOMEM in sgx_alloc_page() when VA pages consume too much space > * removed unused global sgx_nr_pids > * moved sgx_encl_release to sgx_encl.c > * return -ERESTARTSYS instead of -EINTR in sgx_encl_init() > > Jarkko Sakkinen (10): > x86/sgx: Update MAINTAINERS > x86/sgx: Add SGX microarchitectural data structures > x86/sgx: Add wrappers for ENCLS leaf functions > x86/sgx: Add functions to allocate and free EPC pages > x86/sgx: Linux Enclave Driver > x86/sgx: Add provisioning > x86/sgx: Add a page reclaimer > x86/sgx: ptrace() support for the SGX driver > selftests/x86: Add a selftest for SGX > docs: x86/sgx: Document SGX micro architecture and kernel internals > > Sean Christopherson (10): > x86/cpufeatures: x86/msr: Add Intel SGX hardware bits > x86/cpufeatures: x86/msr: Intel SGX Launch Control hardware bits > x86/mm: x86/sgx: Signal SIGSEGV with PF_SGX > x86/cpu/intel: Detect SGX support > x86/sgx: Enumerate and track EPC sections > mm: Introduce vm_ops->may_mprotect() > x86/vdso: Add support for exception fixup in vDSO functions > x86/fault: Add helper function to sanitize error code > x86/traps: Attempt to fixup exceptions in vDSO before signaling > x86/vdso: Implement a vDSO for Intel SGX enclave call > > .../userspace-api/ioctl/ioctl-number.rst | 1 + > Documentation/x86/index.rst | 1 + > Documentation/x86/sgx.rst | 206 +++++ > MAINTAINERS | 11 + > arch/x86/Kconfig | 14 + > arch/x86/entry/vdso/Makefile | 8 +- > arch/x86/entry/vdso/extable.c | 46 + > arch/x86/entry/vdso/extable.h | 29 + > arch/x86/entry/vdso/vdso-layout.lds.S | 9 +- > arch/x86/entry/vdso/vdso.lds.S | 1 + > arch/x86/entry/vdso/vdso2c.h | 58 +- > arch/x86/entry/vdso/vsgx_enter_enclave.S | 131 +++ > arch/x86/include/asm/cpufeature.h | 5 +- > arch/x86/include/asm/cpufeatures.h | 8 +- > arch/x86/include/asm/disabled-features.h | 18 +- > arch/x86/include/asm/enclu.h | 8 + > arch/x86/include/asm/msr-index.h | 8 + > arch/x86/include/asm/required-features.h | 2 +- > arch/x86/include/asm/traps.h | 1 + > arch/x86/include/asm/vdso.h | 5 + > arch/x86/include/uapi/asm/sgx.h | 175 ++++ > arch/x86/kernel/cpu/Makefile | 1 + > arch/x86/kernel/cpu/common.c | 4 + > arch/x86/kernel/cpu/feat_ctl.c | 32 +- > arch/x86/kernel/cpu/sgx/Makefile | 6 + > arch/x86/kernel/cpu/sgx/arch.h | 343 ++++++++ > arch/x86/kernel/cpu/sgx/driver.c | 209 +++++ > arch/x86/kernel/cpu/sgx/driver.h | 32 + > arch/x86/kernel/cpu/sgx/encl.c | 756 +++++++++++++++++ > arch/x86/kernel/cpu/sgx/encl.h | 128 +++ > arch/x86/kernel/cpu/sgx/encls.h | 238 ++++++ > arch/x86/kernel/cpu/sgx/ioctl.c | 800 ++++++++++++++++++ > arch/x86/kernel/cpu/sgx/main.c | 280 ++++++ > arch/x86/kernel/cpu/sgx/reclaim.c | 471 +++++++++++ > arch/x86/kernel/cpu/sgx/sgx.h | 108 +++ > arch/x86/kernel/traps.c | 14 + > arch/x86/mm/fault.c | 45 +- > include/linux/mm.h | 2 + > mm/mprotect.c | 14 +- > tools/arch/x86/include/asm/cpufeatures.h | 7 +- > tools/testing/selftests/Makefile | 1 + > tools/testing/selftests/sgx/.gitignore | 2 + > tools/testing/selftests/sgx/Makefile | 53 ++ > tools/testing/selftests/sgx/call.S | 54 ++ > tools/testing/selftests/sgx/defines.h | 21 + > tools/testing/selftests/sgx/load.c | 282 ++++++ > tools/testing/selftests/sgx/main.c | 199 +++++ > tools/testing/selftests/sgx/main.h | 38 + > tools/testing/selftests/sgx/sigstruct.c | 395 +++++++++ > tools/testing/selftests/sgx/test_encl.c | 20 + > tools/testing/selftests/sgx/test_encl.lds | 40 + > .../selftests/sgx/test_encl_bootstrap.S | 89 ++ > 52 files changed, 5398 insertions(+), 31 deletions(-) > create mode 100644 Documentation/x86/sgx.rst > create mode 100644 arch/x86/entry/vdso/extable.c > create mode 100644 arch/x86/entry/vdso/extable.h > create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S > create mode 100644 arch/x86/include/asm/enclu.h > create mode 100644 arch/x86/include/uapi/asm/sgx.h > create mode 100644 arch/x86/kernel/cpu/sgx/Makefile > create mode 100644 arch/x86/kernel/cpu/sgx/arch.h > create mode 100644 arch/x86/kernel/cpu/sgx/driver.c > create mode 100644 arch/x86/kernel/cpu/sgx/driver.h > create mode 100644 arch/x86/kernel/cpu/sgx/encl.c > create mode 100644 arch/x86/kernel/cpu/sgx/encl.h > create mode 100644 arch/x86/kernel/cpu/sgx/encls.h > create mode 100644 arch/x86/kernel/cpu/sgx/ioctl.c > create mode 100644 arch/x86/kernel/cpu/sgx/main.c > create mode 100644 arch/x86/kernel/cpu/sgx/reclaim.c > create mode 100644 arch/x86/kernel/cpu/sgx/sgx.h > create mode 100644 tools/testing/selftests/sgx/.gitignore > create mode 100644 tools/testing/selftests/sgx/Makefile > create mode 100644 tools/testing/selftests/sgx/call.S > create mode 100644 tools/testing/selftests/sgx/defines.h > create mode 100644 tools/testing/selftests/sgx/load.c > create mode 100644 tools/testing/selftests/sgx/main.c > create mode 100644 tools/testing/selftests/sgx/main.h > create mode 100644 tools/testing/selftests/sgx/sigstruct.c > create mode 100644 tools/testing/selftests/sgx/test_encl.c > create mode 100644 tools/testing/selftests/sgx/test_encl.lds > create mode 100644 tools/testing/selftests/sgx/test_encl_bootstrap.S > > -- > 2.25.1 >
On Fri, May 08, 2020 at 02:02:26PM -0500, Dr. Greg wrote: > On Thu, May 07, 2020 at 02:41:30AM +0200, Thomas Gleixner wrote: > > The diffstat of your patch is irrelevant. What's relevant is the > > fact that it is completely unreviewed and that it creates yet > > another user space visible ABI with a noticable lack of > > documentation. ... > As to lack of review, we would certainly welcome technical and API > comments but we cannot force them. Thomas' point isn't that your code needs to be reviewed, it's that trying to squeeze it into the initial merge will, not might, _will_ push out the acceptance of SGX. The same is true for other features we have lined up, e.g. KVM and cgroup support. It's not a slight on your code, it's just reality at this point. > In fact, anyone who reviews the patch will see that the current driver > creates a pointer in the ioctl call to pass downward into the hardware > initialization routines. Our code simply replaces that pointer with > one supplied by userspace. Personally, I'm in favor of adding a reserved placeholder for a token so as to avoid adding a second ioctl() in the event that something gets upstreamed that wants the token. Ditto for a file descriptor for the backing store in sgx_enclave_create. But, I have contributed exactly zero ioctls to the kernel, so take that with a big block of salt :-) > At the very least a modular form of the driver should be considered > that would allow alternate implementations. Sean indicated that there > was a 'kludgy' approach that would allow an alternate modular > implementation alongside the in-kernel driver. I believe that Andy > has already indicated that may not be an advisable approach. Modularizing the the driver does nothing for your use case. The "driver" is a relatively lightweight wrapper, removing that is akin to making /dev/sgx/enclave inaccessible, i.e. it doesn't make the EPC tracking and core code go away. Modularizing the whole thing is flat out not an option, as doing so is not compatible with an EPC cgroup and adds unnecessary complexity to KVM enabling, both of which are highly desired features by pretty much everyone that plans on using SGX. Andy is right to caution against playing games to squish in-kernel things, but the virtualization snafu is a completely different beast. E.g. SGX doesn't require fiddling with CR4, won't corrupt random memory across kexec(), doesn't block INIT, etc... And I'm not advocating long-term shenanigans, I just wanted to point out that there are options for running out-of-tree drivers in the short term, e.g. until proper policy controls land upstream.
On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote: > Intel(R) SGX is a set of CPU instructions that can be used by applications > to set aside private regions of code and data. The code outside the enclave > is disallowed to access the memory inside the enclave by the CPU access > control. > > There is a new hardware unit in the processor called Memory Encryption > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define > one or many MEE regions that can hold enclave data by configuring them with > PRMRR registers. > > The MEE automatically encrypts the data leaving the processor package to > the MEE regions. The data is encrypted using a random key whose life-time > is exactly one power cycle. > > The current implementation requires that the firmware sets > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > decide what enclaves it wants run. The implementation does not create > any bottlenecks to support read-only MSRs later on. > > You can tell if your CPU supports SGX by looking into /proc/cpuinfo: > > cat /proc/cpuinfo | grep sgx Tested-by: Chunyang Hui <sanqian.hcy@antfin.com> Occlum project (https://github.com/occlum/occlum) is a libOS built on top of Intel SGX feature. We ran Occlum tests using patch v29 on SGX hardware with the Flexible Launch Control (FLC) feature and didn't find any problems. As Occlum core developers, we would like these patches to be merged soon. > v29: > * The selftest has been moved to selftests/sgx. Because SGX is an execution > environment of its own, it really isn't a great fit with more "standard" > x86 tests. > > The RSA key is now generated on fly and the whole signing process has > been made as part of the enclave loader instead of signing the enclave > during the compilation time. > > Finally, the enclave loader loads now the test enclave directly from its > ELF file, which means that ELF file does not need to be coverted as raw > binary during the build process. > * Version the mm_list instead of using on synchronize_mm() when adding new > entries. We hold the write lock for the mm_struct, and dup_mm() can thus > deadlock with the page reclaimer, which could hold the lock for the old > mm_struct. > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can > be used to reserve the address range. Now /dev/sgx supports only opaque > mappings to the (initialized) enclave data. > * Make the vDSO callable directly from C by preserving RBX and taking leaf > from RCX. > > v28: > * Documented to Documentation/x86/sgx.rst how the kernel manages the > enclave ownership. > * Removed non-LC flow from sgx_einit(). > * Removed struct sgx_einittoken since only the size of the corresponding > microarchitectural structure is used in the series ATM. > > v27: > * Disallow RIE processes to use enclaves as there could a permission > conflict between VMA and enclave permissions. > * In the documentation, replace "grep /proc/cpuinfo" with > "grep sgx /proc/cpuinfo". > > v26: > * Fixed the commit author in "x86/sgx: Linux Enclave Driver", which was > changed in v25 by mistake. > * Addressed a bunch of grammar mistakes in sgx.rst (thanks Randy once > again for such a detailed feedback). > * Added back the MAINTAINERS update commit, which was mistakenly removed > in v25. > * EREMOVE's for SECS cannot be done while sanitizing an EPC section. The > CPU does not allow to remove a SECS page before all of its children have > been removed, and a child page can be in some other section than the one > currently being processed. Thus, removed special SECS processing from > sgx_sanitize_page() and instead put sections through it twice. In the > 2nd round the lists should only contain SECS pages. > > v25: > * Fix a double-free issue when SGX_IOC_ENCLAVE_ADD_PAGES > fails on executing ENCLS[EADD]. The rollback path executed > radix_tree_delete() on the same address twice when this happened. > * Return -EINTR instead of -ERESTARTSYS in SGX_IOC_ENCLAVE_ADD_PAGES when > a signal is pending. > * As requested by Borislav, move the CPUID 0x12 features to their own word > in cpufeatures. > * Sean fixed a bug from sgx_reclaimer_write() where sgx_encl_put_backing() > was called with an uninitialized pointer when sgx_encl_get_backing() > fails. > * Migrated /dev/sgx/* to misc. This is future-proof as struct miscdevice > has 'groups' for setting up sysfs attributes for the device. > * Use device_initcall instead of subsys_initcall so that misc_class is > initialized before SGX is initialized. > * Return -EACCES in SGX_IOC_ENCLAVE_INIT when caller tries to select > enclave attributes that we the kernel does not allow it to set instead > of -EINVAL. > * Unless SGX public key MSRs are writable always deny the feature from > Linux. Previously this was only denied from driver. How VMs should be > supported is not really part of initial patch set, which makes this > an obvious choice. > * Cleaned up and refined documentation to be more approachable. > > v24: > * Reclaim unmeasured and TCS pages (regression in v23). > * Replace usages of GFP_HIGHUSER with GFP_KERNEL. > * Return -EIO on when EADD or EEXTEND fails in %SGX_IOC_ENCLAVE_ADD_PAGES > and use the same rollback (destroy enclave). This can happen when host > suspends itself unknowingly to a VM running enclaves. From -EIO the user > space can deduce what happened. > * Have a separate @count in struct sgx_enclave_add_pages to output number > of bytes processed instead of overwriting the input parameters for > clarity and more importantly that the API provides means for partial > processing (@count could be less than @length in success case). > > v23: > * Replace SGX_ENCLAVE_ADD_PAGE with SGX_ENCLAVE_ADD_PAGES. Replace @mrmask > with %SGX_PAGE_MEASURE flag. > * Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page. > * In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail > (because of OOM) even in legit behaviour and after EBLOCK the reclaiming > flow can be only reverted by killing the whole enclave. > * Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact > it should have been bit 6 (Table 37-3 in the SDM). > * Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX > error code. > > v22: > * Refined bunch commit messages and added associated SDM references as > many of them were too exhausting and some outdated. > * Alignment checks have been removed from mmap() because it does not define the > ELRANGE. VMAs only act as windows to the enclave. The semantics compare > somewhat how mmap() works with regular files. > * We now require user space addresses given to SGX_IOC_ENCLAVE_ADD_PAGE to be > page aligned so that we can pass the page directly to EADD and do not have > to do an extra copy. This was made effectively possible by removing the > worker thread for adding pages. > * The selftest build files have been refined throughout of various glitches > and work properly in a cross compilation environment such as BuildRoot. > In addition, libcalls fail the build with an assertion in the linker > script, if they end up to the enclave binary. > * CONFIG_INTEL_SGX_DRIVER has been removed because you cannot use SGX core > for anything without having the driver. This could change when KVM support > is added. > * We require zero permissions in SECINFO for TCS pages because the CPU > overwrites SECINFO flags with zero permissions and measures the page > only after that. Allowing to pass TCS with non-zero permissions would > cause mismatching measurement between the one provided in SIGSTRUCT and > the one computed by the CPU. > * Obviously lots of small fixes and clean ups (does make sense to > document them all). > > v21: > * Check on mmap() that the VMA does cover an area that does not have > enclave pages. Only mapping with PROT_NONE can do that to reserve > initial address space for an enclave. > * Check om mmap() and mprotect() that the VMA permissions do not > surpass the enclave permissions. > * Remove two refcounts from vma_close(): mm_list and encl->refcount. > Enclave refcount is only need for swapper/enclave sync and we can > remove mm_list refcount by destroying mm_struct when the process > is closed. By not having vm_close() the Linux MM can merge VMAs. > * Do not naturally align MAP_FIXED address. > * Numerous small fixes and clean ups. > * Use SRCU for synchronizing the list of mm_struct's. > * Move to stack based call convention in the vDSO. > > v20: > * Fine-tune Kconfig messages and spacing and remove MMU_NOTIFIER > dependency as MMU notifiers are no longer used in the driver. > * Use mm_users instead of mm_count as refcount for mm_struct as mm_count > only protects from deleting mm_struct, not removing its contents. > * Sanitize EPC when the reclaimer thread starts by doing EREMOVE for all > of them. They could be in initialized state when the kernel starts > because it might be spawned by kexec(). > * Documentation overhaul. > * Use a device /dev/sgx/provision for delivering the provision token > instead of securityfs. > * Create a reference to the enclave when already when opening > /dev/sgx/enclave. The file is then associated with this enclave only. > mmap() can be done at free at any point and always get a reference to > the enclave. To summarize the file now represents the enclave. > > v19: > * Took 3-4 months but in some sense this was more like a rewrite of most > of the corners of the source code. If I've forgotten to deal with some > feedback, please don't shout me. Make a remark and I will fix it for > the next version. Hopefully there won't be this big turnovers anymore. > * Validate SECS attributes properly against CPUID given attributes and > against allowed attributes. SECS attributes are the ones that are > enforced whereas SIGSTRUCT attributes tell what is required to run > the enclave. > * Add KSS (Key Sharing Support) to the enclave attributes. > * Deny MAP_PRIVATE as an enclave is always a shared memory entity. > * Revert back to shmem backing storage so that it can be easily shared > by multiple processes. > * Split the recognization of an ENCLS leaf failure by using three > functions to detect it: encsl_faulted(), encls_returned_code() and > sgx_failed(). encls_failed() is only caused by a spurious expections that > should never happen. Thus, it is not defined as an inline function in > order to easily insert a kprobe to it. > * Move low-level enclave management routines, page fault handler and page > reclaiming routines from driver to the core. These cannot be separated > from each other as they are heavily interdependent. The rationale is that > the core does not call any code from the driver. > * Allow the driver to be compiled as a module now that it no code is using > its routines and it only uses exported symbols. Now the driver is > essentially just a thin ioctl layer. > * Reworked the driver to maintain a list of mm_struct's. The VMA callbacks > add new entries to this list as the process is forked. Each entry has > its own refcount because they have a different life-cycle as the enclave > does. In effect @tgid and @mm have been removed from struct sgx_encl > and we allow forking by removing VM_DONTCOPY from vm flags. > * Generate a cpu mask in the reclaimer from the cpu mask's of all > mm_struct's. This will kick out the hardware threads out of the enclave > from multiple processes. It is not a local variable because it would > eat too much of the stack space but instead a field in struct > sgx_encl. > * Allow forking i.e. remove VM_DONTCOPY. I did not change the API > because the old API scaled to the workload that Andy described. The > codebase is now mostly API independent i.e. changing the API is a > small task. For me the proper trigger to chanage it is a as concrete > as possible workload that cannot be fulfilled. I hope you understand > my thinking here. I don't want to change anything w/o proper basis > but I'm ready to change anything if there is a proper basis. I do > not have any kind of attachment to any particular type of API. > * Add Sean's vDSO ENCLS(EENTER) patches and update selftest to use the > new vDSO. > > v18: > * Update the ioctl-number.txt. > * Move the driver under arch/x86. > * Add SGX features (SGX, SGX1, SGX2) to the disabled-features.h. > * Rename the selftest as test_sgx (previously sgx-selftest). > * In order to enable process accounting, swap EPC pages and PCMD's to a VMA > instead of shmem. > * Allow only to initialize and run enclaves with a subset of > {DEBUG, MODE64BIT} set. > * Add SGX_IOC_ENCLAVE_SET_ATTRIBUTE to allow an enclave to have privileged > attributes e.g. PROVISIONKEY. > > v17: > * Add a simple selftest. > * Fix a null pointer dereference to section->pages when its > allocation fails. > * Add Sean's description of the exception handling to the documentation. > > v16: > * Fixed SOB's in the commits that were a bit corrupted in v15. > * Implemented exceptio handling properly to detect_sgx(). > * Use GENMASK() to define SGX_CPUID_SUB_LEAF_TYPE_MASK. > * Updated the documentation to use rst definition lists. > * Added the missing Documentation/x86/index.rst, which has a link to > intel_sgx.rst. Now the SGX and uapi documentation is properly generated > with 'make htmldocs'. > * While enumerating EPC sections, if an undefined section is found, fail > the driver initialization instead of continuing the initialization. > * Issue a warning if there are more than %SGX_MAX_EPC_SECTIONS. > * Remove copyright notice from arch/x86/include/asm/sgx.h. > * Migrated from ioremap_cache() to memremap(). > > v15: > * Split into more digestable size patches. > * Lots of small fixes and clean ups. > * Signal a "plain" SIGSEGV on an EPCM violation. > > v14: > * Change the comment about X86_FEATURE_SGX_LC from ???SGX launch > configuration??? to ???SGX launch control???. > * Move the SGX-related CPU feature flags as part of the Linux defined > virtual leaf 8. > * Add SGX_ prefix to the constants defining the ENCLS leaf functions. > * Use GENMASK*() and BIT*() in sgx_arch.h instead of raw hex numbers. > * Refine the long description for CONFIG_INTEL_SGX_CORE. > * Do not use pr_*_ratelimited() in the driver. The use of the rate limited > versions is legacy cruft from the prototyping phase. > * Detect sleep with SGX_INVALID_EINIT_TOKEN instead of counting power > cycles. > * Manually prefix with ???sgx:??? in the core SGX code instead of redefining > pr_fmt. > * Report if IA32_SGXLEPUBKEYHASHx MSRs are not writable in the driver > instead of core because it is a driver requirement. > * Change prompt to bool in the entry for CONFIG_INTEL_SGX_CORE because the > default is ???n???. > * Rename struct sgx_epc_bank as struct sgx_epc_section in order to match > the SDM. > * Allocate struct sgx_epc_page instances one at a time. > * Use ???__iomem void *??? pointers for the mapped EPC memory consistently. > * Retry once on SGX_INVALID_TOKEN in sgx_einit() instead of counting power > cycles. > * Call enclave swapping operations directly from the driver instead of > calling them .indirectly through struct sgx_epc_page_ops because indirect > calls are not required yet as the patch set does not contain the KVM > support. > * Added special signal SEGV_SGXERR to notify about SGX EPCM violation > errors. > > v13: > * Always use SGX_CPUID constant instead of a hardcoded value. > * Simplified and documented the macros and functions for ENCLS leaves. > * Enable sgx_free_page() to free active enclave pages on demand > in order to allow sgx_invalidate() to delete enclave pages. > It no longer performs EREMOVE if a page is in the process of > being reclaimed. > * Use PM notifier per enclave so that we don't have to traverse > the global list of active EPC pages to find enclaves. > * Removed unused SGX_LE_ROLLBACK constant from uapi/asm/sgx.h > * Always use ioremap() to map EPC banks as we only support 64-bit kernel. > * Invalidate IA32_SGXLEPUBKEYHASH cache used by sgx_einit() when going > to sleep. > > v12: > * Split to more narrow scoped commits in order to ease the review process and > use co-developed-by tag for co-authors of commits instead of listing them in > the source files. > * Removed cruft EXPORT_SYMBOL() declarations and converted to static variables. > * Removed in-kernel LE i.e. this version of the SGX software stack only > supports unlocked IA32_SGXLEPUBKEYHASHx MSRs. > * Refined documentation on launching enclaves, swapping and enclave > construction. > * Refined sgx_arch.h to include alignment information for every struct that > requires it and removed structs that are not needed without an LE. > * Got rid of SGX_CPUID. > * SGX detection now prints log messages about firmware configuration issues. > > v11: > * Polished ENCLS wrappers with refined exception handling. > * ksgxswapd was not stopped (regression in v5) in > sgx_page_cache_teardown(), which causes a leaked kthread after driver > deinitialization. > * Shutdown sgx_le_proxy when going to suspend because its EPC pages will be > invalidated when resuming, which will cause it not function properly > anymore. > * Set EINITTOKEN.VALID to zero for a token that is passed when > SGXLEPUBKEYHASH matches MRSIGNER as alloc_page() does not give a zero > page. > * Fixed the check in sgx_edbgrd() for a TCS page. Allowed to read offsets > around the flags field, which causes a #GP. Only flags read is readable. > * On read access memcpy() call inside sgx_vma_access() had src and dest > parameters in wrong order. > * The build issue with CONFIG_KASAN is now fixed. Added undefined symbols > to LE even if ???KASAN_SANITIZE := false??? was set in the makefile. > * Fixed a regression in the #PF handler. If a page has > SGX_ENCL_PAGE_RESERVED flag the #PF handler should unconditionally fail. > It did not, which caused weird races when trying to change other parts of > swapping code. > * EPC management has been refactored to a flat LRU cache and moved to > arch/x86. The swapper thread reads a cluster of EPC pages and swaps all > of them. It can now swap from multiple enclaves in the same round. > * For the sake of consistency with SGX_IOC_ENCLAVE_ADD_PAGE, return -EINVAL > when an enclave is already initialized or dead instead of zero. > > v10: > * Cleaned up anon inode based IPC between the ring-0 and ring-3 parts > of the driver. > * Unset the reserved flag from an enclave page if EDBGRD/WR fails > (regression in v6). > * Close the anon inode when LE is stopped (regression in v9). > * Update the documentation with a more detailed description of SGX. > > v9: > * Replaced kernel-LE IPC based on pipes with an anonymous inode. > The driver does not require anymore new exports. > > v8: > * Check that public key MSRs match the LE public key hash in the > driver initialization when the MSRs are read-only. > * Fix the race in VA slot allocation by checking the fullness > immediately after succeesful allocation. > * Fix the race in hash mrsigner calculation between the launch > enclave and user enclaves by having a separate lock for hash > calculation. > > v7: > * Fixed offset calculation in sgx_edbgr/wr(). Address was masked with PAGE_MASK > when it should have been masked with ~PAGE_MASK. > * Fixed a memory leak in sgx_ioc_enclave_create(). > * Simplified swapping code by using a pointer array for a cluster > instead of a linked list. > * Squeezed struct sgx_encl_page to 32 bytes. > * Fixed deferencing of an RSA key on OpenSSL 1.1.0. > * Modified TC's CMAC to use kernel AES-NI. Restructured the code > a bit in order to better align with kernel conventions. > > v6: > * Fixed semaphore underrun when accessing /dev/sgx from the launch enclave. > * In sgx_encl_create() s/IS_ERR(secs)/IS_ERR(encl)/. > * Removed virtualization chapter from the documentation. > * Changed the default filename for the signing key as signing_key.pem. > * Reworked EPC management in a way that instead of a linked list of > struct sgx_epc_page instances there is an array of integers that > encodes address and bank of an EPC page (the same data as 'pa' field > earlier). The locking has been moved to the EPC bank level instead > of a global lock. > * Relaxed locking requirements for EPC management. EPC pages can be > released back to the EPC bank concurrently. > * Cleaned up ptrace() code. > * Refined commit messages for new architectural constants. > * Sorted includes in every source file. > * Sorted local variable declarations according to the line length in > every function. > * Style fixes based on Darren's comments to sgx_le.c. > > v5: > * Described IPC between the Launch Enclave and kernel in the commit messages. > * Fixed all relevant checkpatch.pl issues that I have forgot fix in earlier > versions except those that exist in the imported TinyCrypt code. > * Fixed spelling mistakes in the documentation. > * Forgot to check the return value of sgx_drv_subsys_init(). > * Encapsulated properly page cache init and teardown. > * Collect epc pages to a temp list in sgx_add_epc_bank > * Removed SGX_ENCLAVE_INIT_ARCH constant. > > v4: > * Tied life-cycle of the sgx_le_proxy process to /dev/sgx. > * Removed __exit annotation from sgx_drv_subsys_exit(). > * Fixed a leak of a backing page in sgx_process_add_page_req() in the > case when vm_insert_pfn() fails. > * Removed unused symbol exports for sgx_page_cache.c. > * Updated sgx_alloc_page() to require encl parameter and documented the > behavior (Sean Christopherson). > * Refactored a more lean API for sgx_encl_find() and documented the behavior. > * Moved #PF handler to sgx_fault.c. > * Replaced subsys_system_register() with plain bus_register(). > * Retry EINIT 2nd time only if MSRs are not locked. > > v3: > * Check that FEATURE_CONTROL_LOCKED and FEATURE_CONTROL_SGX_ENABLE are set. > * Return -ERESTARTSYS in __sgx_encl_add_page() when sgx_alloc_page() fails. > * Use unused bits in epc_page->pa to store the bank number. > * Removed #ifdef for WQ_NONREENTRANT. > * If mmu_notifier_register() fails with -EINTR, return -ERESTARTSYS. > * Added --remove-section=.got.plt to objcopy flags in order to prevent a > dummy .got.plt, which will cause an inconsistent size for the LE. > * Documented sgx_encl_* functions. > * Added remark about AES implementation used inside the LE. > * Removed redundant sgx_sys_exit() from le/main.c. > * Fixed struct sgx_secinfo alignment from 128 to 64 bytes. > * Validate miscselect in sgx_encl_create(). > * Fixed SSA frame size calculation to take the misc region into account. > * Implemented consistent exception handling to __encls() and __encls_ret(). > * Implemented a proper device model in order to allow sysfs attributes > and in-kernel API. > * Cleaned up various "find enclave" implementations to the unified > sgx_encl_find(). > * Validate that vm_pgoff is zero. > * Discard backing pages with shmem_truncate_range() after EADD. > * Added missing EEXTEND operations to LE signing and launch. > * Fixed SSA size for GPRS region from 168 to 184 bytes. > * Fixed the checks for TCS flags. Now DBGOPTIN is allowed. > * Check that TCS addresses are in ELRANGE and not just page aligned. > * Require kernel to be compiled with X64_64 and CPU_SUP_INTEL. > * Fixed an incorrect value for SGX_ATTR_DEBUG from 0x01 to 0x02. > > v2: > * get_rand_uint32() changed the value of the pointer instead of value > where it is pointing at. > * Launch enclave incorrectly used sigstruct attributes-field instead of > enclave attributes-field. > * Removed unused struct sgx_add_page_req from sgx_ioctl.c > * Removed unused sgx_has_sgx2. > * Updated arch/x86/include/asm/sgx.h so that it provides stub > implementations when sgx in not enabled. > * Removed cruft rdmsr-calls from sgx_set_pubkeyhash_msrs(). > * return -ENOMEM in sgx_alloc_page() when VA pages consume too much space > * removed unused global sgx_nr_pids > * moved sgx_encl_release to sgx_encl.c > * return -ERESTARTSYS instead of -EINTR in sgx_encl_init() > > Jarkko Sakkinen (10): > x86/sgx: Update MAINTAINERS > x86/sgx: Add SGX microarchitectural data structures > x86/sgx: Add wrappers for ENCLS leaf functions > x86/sgx: Add functions to allocate and free EPC pages > x86/sgx: Linux Enclave Driver > x86/sgx: Add provisioning > x86/sgx: Add a page reclaimer > x86/sgx: ptrace() support for the SGX driver > selftests/x86: Add a selftest for SGX > docs: x86/sgx: Document SGX micro architecture and kernel internals > > Sean Christopherson (10): > x86/cpufeatures: x86/msr: Add Intel SGX hardware bits > x86/cpufeatures: x86/msr: Intel SGX Launch Control hardware bits > x86/mm: x86/sgx: Signal SIGSEGV with PF_SGX > x86/cpu/intel: Detect SGX support > x86/sgx: Enumerate and track EPC sections > mm: Introduce vm_ops->may_mprotect() > x86/vdso: Add support for exception fixup in vDSO functions > x86/fault: Add helper function to sanitize error code > x86/traps: Attempt to fixup exceptions in vDSO before signaling > x86/vdso: Implement a vDSO for Intel SGX enclave call > > .../userspace-api/ioctl/ioctl-number.rst | 1 + > Documentation/x86/index.rst | 1 + > Documentation/x86/sgx.rst | 206 +++++ > MAINTAINERS | 11 + > arch/x86/Kconfig | 14 + > arch/x86/entry/vdso/Makefile | 8 +- > arch/x86/entry/vdso/extable.c | 46 + > arch/x86/entry/vdso/extable.h | 29 + > arch/x86/entry/vdso/vdso-layout.lds.S | 9 +- > arch/x86/entry/vdso/vdso.lds.S | 1 + > arch/x86/entry/vdso/vdso2c.h | 58 +- > arch/x86/entry/vdso/vsgx_enter_enclave.S | 131 +++ > arch/x86/include/asm/cpufeature.h | 5 +- > arch/x86/include/asm/cpufeatures.h | 8 +- > arch/x86/include/asm/disabled-features.h | 18 +- > arch/x86/include/asm/enclu.h | 8 + > arch/x86/include/asm/msr-index.h | 8 + > arch/x86/include/asm/required-features.h | 2 +- > arch/x86/include/asm/traps.h | 1 + > arch/x86/include/asm/vdso.h | 5 + > arch/x86/include/uapi/asm/sgx.h | 175 ++++ > arch/x86/kernel/cpu/Makefile | 1 + > arch/x86/kernel/cpu/common.c | 4 + > arch/x86/kernel/cpu/feat_ctl.c | 32 +- > arch/x86/kernel/cpu/sgx/Makefile | 6 + > arch/x86/kernel/cpu/sgx/arch.h | 343 ++++++++ > arch/x86/kernel/cpu/sgx/driver.c | 209 +++++ > arch/x86/kernel/cpu/sgx/driver.h | 32 + > arch/x86/kernel/cpu/sgx/encl.c | 756 +++++++++++++++++ > arch/x86/kernel/cpu/sgx/encl.h | 128 +++ > arch/x86/kernel/cpu/sgx/encls.h | 238 ++++++ > arch/x86/kernel/cpu/sgx/ioctl.c | 800 ++++++++++++++++++ > arch/x86/kernel/cpu/sgx/main.c | 280 ++++++ > arch/x86/kernel/cpu/sgx/reclaim.c | 471 +++++++++++ > arch/x86/kernel/cpu/sgx/sgx.h | 108 +++ > arch/x86/kernel/traps.c | 14 + > arch/x86/mm/fault.c | 45 +- > include/linux/mm.h | 2 + > mm/mprotect.c | 14 +- > tools/arch/x86/include/asm/cpufeatures.h | 7 +- > tools/testing/selftests/Makefile | 1 + > tools/testing/selftests/sgx/.gitignore | 2 + > tools/testing/selftests/sgx/Makefile | 53 ++ > tools/testing/selftests/sgx/call.S | 54 ++ > tools/testing/selftests/sgx/defines.h | 21 + > tools/testing/selftests/sgx/load.c | 282 ++++++ > tools/testing/selftests/sgx/main.c | 199 +++++ > tools/testing/selftests/sgx/main.h | 38 + > tools/testing/selftests/sgx/sigstruct.c | 395 +++++++++ > tools/testing/selftests/sgx/test_encl.c | 20 + > tools/testing/selftests/sgx/test_encl.lds | 40 + > .../selftests/sgx/test_encl_bootstrap.S | 89 ++ > 52 files changed, 5398 insertions(+), 31 deletions(-) > create mode 100644 Documentation/x86/sgx.rst > create mode 100644 arch/x86/entry/vdso/extable.c > create mode 100644 arch/x86/entry/vdso/extable.h > create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S > create mode 100644 arch/x86/include/asm/enclu.h > create mode 100644 arch/x86/include/uapi/asm/sgx.h > create mode 100644 arch/x86/kernel/cpu/sgx/Makefile > create mode 100644 arch/x86/kernel/cpu/sgx/arch.h > create mode 100644 arch/x86/kernel/cpu/sgx/driver.c > create mode 100644 arch/x86/kernel/cpu/sgx/driver.h > create mode 100644 arch/x86/kernel/cpu/sgx/encl.c > create mode 100644 arch/x86/kernel/cpu/sgx/encl.h > create mode 100644 arch/x86/kernel/cpu/sgx/encls.h > create mode 100644 arch/x86/kernel/cpu/sgx/ioctl.c > create mode 100644 arch/x86/kernel/cpu/sgx/main.c > create mode 100644 arch/x86/kernel/cpu/sgx/reclaim.c > create mode 100644 arch/x86/kernel/cpu/sgx/sgx.h > create mode 100644 tools/testing/selftests/sgx/.gitignore > create mode 100644 tools/testing/selftests/sgx/Makefile > create mode 100644 tools/testing/selftests/sgx/call.S > create mode 100644 tools/testing/selftests/sgx/defines.h > create mode 100644 tools/testing/selftests/sgx/load.c > create mode 100644 tools/testing/selftests/sgx/main.c > create mode 100644 tools/testing/selftests/sgx/main.h > create mode 100644 tools/testing/selftests/sgx/sigstruct.c > create mode 100644 tools/testing/selftests/sgx/test_encl.c > create mode 100644 tools/testing/selftests/sgx/test_encl.lds > create mode 100644 tools/testing/selftests/sgx/test_encl_bootstrap.S > > -- > 2.25.1 >
On Tue, May 12, 2020 at 07:55:58PM +0800, Hui, Chunyang wrote: > > You can tell if your CPU supports SGX by looking into /proc/cpuinfo: > > > > cat /proc/cpuinfo | grep sgx > > Tested-by: Chunyang Hui <sanqian.hcy@antfin.com> > Occlum project (https://github.com/occlum/occlum) is a libOS built > on top of Intel SGX feature. We ran Occlum tests using patch v29 on > SGX hardware with the Flexible Launch Control (FLC) feature and > didn't find any problems. As Occlum core developers, we would like > these patches to be merged soon. Do you use the Intel PSW or your own? Are you using the standard ECALL interface or did the tests run with the new VDSO entry and exception handler? Have a good day. Dr. Greg As always, Dr. Greg Wettstein, Ph.D, Worker Autonomously self-defensive Enjellic Systems Development, LLC IOT platforms and edge devices. 4206 N. 19th Ave. Fargo, ND 58102 PH: 701-281-1686 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "I had far rather walk, as I do, in daily terror of eternity, than feel that this was only a children's game in which all of the contestants would get equally worthless prizes in the end." -- T. S. Elliot
On Wed, 2020-05-06 at 17:42 -0400, Nathaniel McCallum wrote: > Tested on Enarx. This requires a patch[0] for v29 support. > > Tested-by: Nathaniel McCallum <npmccallum@redhat.com> Thank you. Update in my tree. Sean, I'll fixed that whitespace issue too in my tree. General question: maybe it would be easiest that I issue a pull request once everyone feels that the series is ready to be pulled and stop sending new versions of the series? /Jarkko
On Thu, 2020-05-14 at 01:14 +0300, Jarkko Sakkinen wrote: > On Wed, 2020-05-06 at 17:42 -0400, Nathaniel McCallum wrote: > > Tested on Enarx. This requires a patch[0] for v29 support. > > > > Tested-by: Nathaniel McCallum <npmccallum@redhat.com> > > Thank you. Update in my tree. > > Sean, I'll fixed that whitespace issue too in my tree. > > General question: maybe it would be easiest that I issue a pull request > once everyone feels that the series is ready to be pulled and stop sending > new versions of the series? My honest feelings about the series ATM are: 1. It is not perfect like no code never is and there are always issues. 2. Some things are very well matured, even more so than in a lot of mainline code I've seen. I'm particularly happy how the locking code has been converged. 3. Not worried to maintain the code in its current state. It is manageable. /Jarkko
On Wed, 2020-05-06 at 09:39 -0700, Jordan Hand wrote: > On 4/21/20 2:52 PM, Jarkko Sakkinen wrote: > > Intel(R) SGX is a set of CPU instructions that can be used by applications > > to set aside private regions of code and data. The code outside the enclave > > is disallowed to access the memory inside the enclave by the CPU access > > control. > > > > There is a new hardware unit in the processor called Memory Encryption > > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define > > one or many MEE regions that can hold enclave data by configuring them with > > PRMRR registers. > > > > The MEE automatically encrypts the data leaving the processor package to > > the MEE regions. The data is encrypted using a random key whose life-time > > is exactly one power cycle. > > > > The current implementation requires that the firmware sets > > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > > decide what enclaves it wants run. The implementation does not create > > any bottlenecks to support read-only MSRs later on. > > > > You can tell if your CPU supports SGX by looking into /proc/cpuinfo: > > > > cat /proc/cpuinfo | grep sgx > > > > v29: > > * The selftest has been moved to selftests/sgx. Because SGX is an execution > > environment of its own, it really isn't a great fit with more "standard" > > x86 tests. > > > > The RSA key is now generated on fly and the whole signing process has > > been made as part of the enclave loader instead of signing the enclave > > during the compilation time. > > > > Finally, the enclave loader loads now the test enclave directly from its > > ELF file, which means that ELF file does not need to be coverted as raw > > binary during the build process. > > * Version the mm_list instead of using on synchronize_mm() when adding new > > entries. We hold the write lock for the mm_struct, and dup_mm() can thus > > deadlock with the page reclaimer, which could hold the lock for the old > > mm_struct. > > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g. anonymous) can > > be used to reserve the address range. Now /dev/sgx supports only opaque > > mappings to the (initialized) enclave data. > > * Make the vDSO callable directly from C by preserving RBX and taking leaf > > from RCX. > > > > Tested with the Open Enclave SDK on top of Intel PSW. Specifically built > the Intel PSW with changes to support /dev/sgx mapping[1] new in v29. > > Tested-by: Jordan Hand <jorhand@linux.microsoft.com> > > [1] https://github.com/intel/linux-sgx/pull/530 Thank you! /Jarkko
On 2020-05-14 00:14, Jarkko Sakkinen wrote: > General question: maybe it would be easiest that I issue a pull request > once everyone feels that the series is ready to be pulled and stop sending > new versions of the series? Sounds good -- Jethro Beekman | Fortanix
On Fri, May 08, 2020 at 12:56:35PM -0700, Sean Christopherson wrote: Good morning, I hope the week is proceeding well for everyone. > On Fri, May 08, 2020 at 02:02:26PM -0500, Dr. Greg wrote: > > On Thu, May 07, 2020 at 02:41:30AM +0200, Thomas Gleixner wrote: > > > The diffstat of your patch is irrelevant. What's relevant is the > > > fact that it is completely unreviewed and that it creates yet > > > another user space visible ABI with a noticable lack of > > > documentation. > > ... > > > As to lack of review, we would certainly welcome technical and API > > comments but we cannot force them. > Thomas' point isn't that your code needs to be reviewed, it's that > trying to squeeze it into the initial merge will, not might, _will_ > push out the acceptance of SGX. The same is true for other features > we have lined up, e.g. KVM and cgroup support. It's not a slight on > your code, it's just reality at this point. For the record and for whatever it is worth at this point. The patch has been available in its present form and function for around 14 months. So there was plenty of time for its consideration and integration into what is being prepared as the final merge candidate. > > In fact, anyone who reviews the patch will see that the current driver > > creates a pointer in the ioctl call to pass downward into the hardware > > initialization routines. Our code simply replaces that pointer with > > one supplied by userspace. > Personally, I'm in favor of adding a reserved placeholder for a > token so as to avoid adding a second ioctl() in the event that > something gets upstreamed that wants the token. Ditto for a file > descriptor for the backing store in sgx_enclave_create. That would be a very low cost and forward looking addition. > > At the very least a modular form of the driver should be > > considered that would allow alternate implementations. Sean > > indicated that there was a 'kludgy' approach that would allow an > > alternate modular implementation alongside the in-kernel driver. > > I believe that Andy has already indicated that may not be an > > advisable approach. > Modularizing the the driver does nothing for your use case. The > "driver" is a relatively lightweight wrapper, removing that is akin > to making /dev/sgx/enclave inaccessible, i.e. it doesn't make the > EPC tracking and core code go away. Modularizing the whole thing is > flat out not an option, as doing so is not compatible with an EPC > cgroup and adds unnecessary complexity to KVM enabling, both of > which are highly desired features by pretty much everyone that plans > on using SGX. All well taken technical points, but they don't speak directly to the issue at hand. The potential security issue in all of this is access to /dev/sgx/enclave, not the EPC tracking and core code. Here in a nutshell is the paradox the kernel faces: No one seems to be disputing the fact that the primary focus of this driver will be to support the notion of 'runtime encryption' and Confidential Computing. The whole premise of the concept and economic predicate of the initiative is that the owner/manager of the platform has no visibility into what is being done on the platform. If the Linux community believes that standard platform security controls can make qualitatively good judgements on what enclave based execution is doing, it is effectively making the statement that the very concept of runtime encryption and by extension Confidential Computing is invalid. If we accept the concept that Confidential Computing is valid then we admit the technology provides the opportunity for unsupervised code execution and data manipulation. Our premise in all of this is that one page of code implementing three linked lists is a small price to pay to provide platform owners with the opportunity to optionally implement what is effectively 2-factor authentication over this process. Going forward, if we are intellectually honest, all of this suggests that adding complexity to the driver with LSM controls makes little sense technically. Since, by definition and economic intent, the technology provides tools and infrastructure that allows these controls to be evaded. The notion that entities with adversarial intent would not try and take advantage of this flies in the face of everything we know about security. > Andy is right to caution against playing games to squish in-kernel > things, but the virtualization snafu is a completely different > beast. E.g. SGX doesn't require fiddling with CR4, won't corrupt > random memory across kexec(), doesn't block INIT, etc... And I'm > not advocating long-term shenanigans, I just wanted to point out > that there are options for running out-of-tree drivers in the short > term, e.g. until proper policy controls land upstream. It appears that the distributions, at least IBM/RHEL, are going to compile this driver in and backport it as well. What we would recommend at this point is that you and Jarkko do the Linux community and beyond a favor and wire up a simple kernel command-line parameter that controls the ability of the driver to be used, ie. enables/disables access to /dev/sgx/enclave. Distributions or others can set this command-line parameter by default to 'disable' and avoid any possibility of the technology being used for nefarious purposes. Since the technology now appears to be focused only on the cloud providers they will certainly be capable of configuring their implementations to change the default. In essence, make the kernel's behavior secure by default. Best wishes for a pleasant weekend to everyone. Dr. Greg As always, Dr. Greg Wettstein, Ph.D, Worker Artisans in autonomously Enjellic Systems Development, LLC self-defensive IOT platforms 4206 N. 19th Ave. and edge devices. Fargo, ND 58102 PH: 701-281-1686 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "Intellectuals solve problems; geniuses prevent them." -- Albert Einstein
On Tue, 2020-05-12 at 19:55 +0800, Hui, Chunyang wrote: > On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote: > > Intel(R) SGX is a set of CPU instructions that can be used by applications > > to set aside private regions of code and data. The code outside the enclave > > is disallowed to access the memory inside the enclave by the CPU access > > control. > > > > There is a new hardware unit in the processor called Memory Encryption > > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define > > one or many MEE regions that can hold enclave data by configuring them with > > PRMRR registers. > > > > The MEE automatically encrypts the data leaving the processor package to > > the MEE regions. The data is encrypted using a random key whose life-time > > is exactly one power cycle. > > > > The current implementation requires that the firmware sets > > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > > decide what enclaves it wants run. The implementation does not create > > any bottlenecks to support read-only MSRs later on. > > > > You can tell if your CPU supports SGX by looking into /proc/cpuinfo: > > > > cat /proc/cpuinfo | grep sgx > > Tested-by: Chunyang Hui <sanqian.hcy@antfin.com> > > Occlum project (https://github.com/occlum/occlum) is a libOS built on top of > Intel SGX feature. We ran Occlum tests using patch v29 on SGX hardware with > the Flexible Launch Control (FLC) feature and didn't find any problems. > As Occlum core developers, we would like these patches to be merged soon. Great, thanks adding tested by to the driver and reclaimer patch. /Jarkko
On Thu, May 14, 2020 at 04:16:37AM -0500, Dr. Greg wrote: > What we would recommend at this point is that you and Jarkko do the > Linux community and beyond a favor and wire up a simple kernel > command-line parameter that controls the ability of the driver to be > used, ie. enables/disables access to /dev/sgx/enclave. I'm not opposed to adding a kernel param to disable SGX. At one point there was a proposal to extend clearcpuid to allow disabling multiple feature bits, but it looks like that went the way of the dodo. Note, such a param would disable SGX entirely, e.g. clear the feature bit in /proc/cpuinfo and prevent any in-kernel SGX code from running.
On Thu, May 14, 2020 at 09:15:59AM -0700, Sean Christopherson wrote: > I'm not opposed to adding a kernel param to disable SGX. At one point > there was a proposal to extend clearcpuid to allow disabling multiple > feature bits, but it looks like that went the way of the dodo. > > Note, such a param would disable SGX entirely, e.g. clear the feature bit > in /proc/cpuinfo and prevent any in-kernel SGX code from running. It is a usual practice for big features like SGX to add a "nosgx" cmdline param to disable it in case something goes south. We do this for all features - see all "no*" switches in Documentation/admin-guide/kernel-parameters.txt Thx.
On Fri, May 8, 2020 at 12:08 PM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > Adding some Google folks to the party. Thanks, Sean. > On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote: > > Intel(R) SGX is a set of CPU instructions that can be used by applications > > to set aside private regions of code and data. The code outside the enclave > > is disallowed to access the memory inside the enclave by the CPU access > > control. > > > > There is a new hardware unit in the processor called Memory Encryption > > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define > > one or many MEE regions that can hold enclave data by configuring them with > > PRMRR registers. > > > > The MEE automatically encrypts the data leaving the processor package to > > the MEE regions. The data is encrypted using a random key whose life-time > > is exactly one power cycle. > > > > The current implementation requires that the firmware sets > > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > > decide what enclaves it wants run. The implementation does not create > > any bottlenecks to support read-only MSRs later on. > > > > You can tell if your CPU supports SGX by looking into /proc/cpuinfo: > > > > cat /proc/cpuinfo | grep sgx We applied the v29 patches to Linux 5.6.0, then tested on Xeon(R) E-2186G with Asylo (http://asylo.dev). Looks good. All Asylo tests pass. Tested-by: Seth Moore <sethmo@google.com>
Borislav Petkov <bp@alien8.de> writes: > On Thu, May 14, 2020 at 09:15:59AM -0700, Sean Christopherson wrote: >> I'm not opposed to adding a kernel param to disable SGX. At one point >> there was a proposal to extend clearcpuid to allow disabling multiple >> feature bits, but it looks like that went the way of the dodo. clearcpuid is a trainwreck which should have never happened. >> Note, such a param would disable SGX entirely, e.g. clear the feature bit >> in /proc/cpuinfo and prevent any in-kernel SGX code from running. > > It is a usual practice for big features like SGX to add a > "nosgx" cmdline param to disable it in case something goes > south. We do this for all features - see all "no*" switches in > Documentation/admin-guide/kernel-parameters.txt Correct. Thanks, tglx
Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> writes: > > General question: maybe it would be easiest that I issue a pull request > once everyone feels that the series is ready to be pulled and stop sending > new versions of the series? Might be the easiest for you, but I prefer a final series in email. Thanks, tglx
On Thu, 2020-05-14 at 09:15 -0700, Sean Christopherson wrote: > On Thu, May 14, 2020 at 04:16:37AM -0500, Dr. Greg wrote: > > What we would recommend at this point is that you and Jarkko do the > > Linux community and beyond a favor and wire up a simple kernel > > command-line parameter that controls the ability of the driver to be > > used, ie. enables/disables access to /dev/sgx/enclave. > > I'm not opposed to adding a kernel param to disable SGX. At one point > there was a proposal to extend clearcpuid to allow disabling multiple > feature bits, but it looks like that went the way of the dodo. > > Note, such a param would disable SGX entirely, e.g. clear the feature bit > in /proc/cpuinfo and prevent any in-kernel SGX code from running. Greg, you are free to submit a patch for review that adds any possible kernel command line parameter SGX and beyond. SGX support does not "wire up" anything that would prevent reviewing such patches. /Jarkko
On Thu, 2020-05-14 at 21:30 +0200, Thomas Gleixner wrote: > Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> writes: > > General question: maybe it would be easiest that I issue a pull request > > once everyone feels that the series is ready to be pulled and stop sending > > new versions of the series? > > Might be the easiest for you, but I prefer a final series in email. > > Thanks, > > tglx For me both are just as easy or as hard :-) Just wanted to query the preference. /Jarkko
On Thu, 2020-05-14 at 12:05 -0700, Seth Moore wrote: > On Fri, May 8, 2020 at 12:08 PM Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > Adding some Google folks to the party. > > Thanks, Sean. > > > On Wed, Apr 22, 2020 at 12:52:56AM +0300, Jarkko Sakkinen wrote: > > > Intel(R) SGX is a set of CPU instructions that can be used by applications > > > to set aside private regions of code and data. The code outside the enclave > > > is disallowed to access the memory inside the enclave by the CPU access > > > control. > > > > > > There is a new hardware unit in the processor called Memory Encryption > > > Engine (MEE) starting from the Skylake microacrhitecture. BIOS can define > > > one or many MEE regions that can hold enclave data by configuring them with > > > PRMRR registers. > > > > > > The MEE automatically encrypts the data leaving the processor package to > > > the MEE regions. The data is encrypted using a random key whose life-time > > > is exactly one power cycle. > > > > > > The current implementation requires that the firmware sets > > > IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can > > > decide what enclaves it wants run. The implementation does not create > > > any bottlenecks to support read-only MSRs later on. > > > > > > You can tell if your CPU supports SGX by looking into /proc/cpuinfo: > > > > > > cat /proc/cpuinfo | grep sgx > > We applied the v29 patches to Linux 5.6.0, then tested on Xeon(R) E-2186G > with Asylo (http://asylo.dev). > > Looks good. All Asylo tests pass. > > Tested-by: Seth Moore <sethmo@google.com> Thanks. /Jarkko
On Thu, 2020-05-14 at 18:20 +0200, Borislav Petkov wrote: > On Thu, May 14, 2020 at 09:15:59AM -0700, Sean Christopherson wrote: > > I'm not opposed to adding a kernel param to disable SGX. At one point > > there was a proposal to extend clearcpuid to allow disabling multiple > > feature bits, but it looks like that went the way of the dodo. > > > > Note, such a param would disable SGX entirely, e.g. clear the feature bit > > in /proc/cpuinfo and prevent any in-kernel SGX code from running. > > It is a usual practice for big features like SGX to add a > "nosgx" cmdline param to disable it in case something goes > south. We do this for all features - see all "no*" switches in > Documentation/admin-guide/kernel-parameters.txt Uh oh, should probably address this. Should I send v31 today with a "nosgx" patch added? Sorry for missing this one :-/ /Jarkko
On Fri, May 15, 2020 at 12:28:54PM +0300, Jarkko Sakkinen wrote: > Uh oh, should probably address this. Should I send v31 today with a "nosgx" > patch added? Sorry for missing this one :-/ Not the whole thing - just the one patch as a reply to your thread pls. Thx.
On Fri, 2020-05-15 at 11:42 +0200, Borislav Petkov wrote: > On Fri, May 15, 2020 at 12:28:54PM +0300, Jarkko Sakkinen wrote: > > Uh oh, should probably address this. Should I send v31 today with a "nosgx" > > patch added? Sorry for missing this one :-/ > > Not the whole thing - just the one patch as a reply to your thread pls. > > Thx. OK, cool, thank you. /Jarkko
On Thu, May 14, 2020 at 5:17 AM Dr. Greg <greg@enjellic.com> wrote: > > On Fri, May 08, 2020 at 12:56:35PM -0700, Sean Christopherson wrote: > > Good morning, I hope the week is proceeding well for everyone. > > > On Fri, May 08, 2020 at 02:02:26PM -0500, Dr. Greg wrote: > > > On Thu, May 07, 2020 at 02:41:30AM +0200, Thomas Gleixner wrote: > > > > The diffstat of your patch is irrelevant. What's relevant is the > > > > fact that it is completely unreviewed and that it creates yet > > > > another user space visible ABI with a noticable lack of > > > > documentation. > > > > ... > > > > > As to lack of review, we would certainly welcome technical and API > > > comments but we cannot force them. > > > Thomas' point isn't that your code needs to be reviewed, it's that > > trying to squeeze it into the initial merge will, not might, _will_ > > push out the acceptance of SGX. The same is true for other features > > we have lined up, e.g. KVM and cgroup support. It's not a slight on > > your code, it's just reality at this point. > > For the record and for whatever it is worth at this point. The patch > has been available in its present form and function for around 14 > months. > > So there was plenty of time for its consideration and integration into > what is being prepared as the final merge candidate. > > > > In fact, anyone who reviews the patch will see that the current driver > > > creates a pointer in the ioctl call to pass downward into the hardware > > > initialization routines. Our code simply replaces that pointer with > > > one supplied by userspace. > > > Personally, I'm in favor of adding a reserved placeholder for a > > token so as to avoid adding a second ioctl() in the event that > > something gets upstreamed that wants the token. Ditto for a file > > descriptor for the backing store in sgx_enclave_create. > > That would be a very low cost and forward looking addition. > > > > At the very least a modular form of the driver should be > > > considered that would allow alternate implementations. Sean > > > indicated that there was a 'kludgy' approach that would allow an > > > alternate modular implementation alongside the in-kernel driver. > > > I believe that Andy has already indicated that may not be an > > > advisable approach. > > > Modularizing the the driver does nothing for your use case. The > > "driver" is a relatively lightweight wrapper, removing that is akin > > to making /dev/sgx/enclave inaccessible, i.e. it doesn't make the > > EPC tracking and core code go away. Modularizing the whole thing is > > flat out not an option, as doing so is not compatible with an EPC > > cgroup and adds unnecessary complexity to KVM enabling, both of > > which are highly desired features by pretty much everyone that plans > > on using SGX. > > All well taken technical points, but they don't speak directly to the > issue at hand. The potential security issue in all of this is access > to /dev/sgx/enclave, not the EPC tracking and core code. > > Here in a nutshell is the paradox the kernel faces: > > No one seems to be disputing the fact that the primary focus of this > driver will be to support the notion of 'runtime encryption' and > Confidential Computing. The whole premise of the concept and economic > predicate of the initiative is that the owner/manager of the platform > has no visibility into what is being done on the platform. > > If the Linux community believes that standard platform security > controls can make qualitatively good judgements on what enclave based > execution is doing, it is effectively making the statement that the > very concept of runtime encryption and by extension Confidential > Computing is invalid. > > If we accept the concept that Confidential Computing is valid then we > admit the technology provides the opportunity for unsupervised code > execution and data manipulation. > > Our premise in all of this is that one page of code implementing three > linked lists is a small price to pay to provide platform owners with > the opportunity to optionally implement what is effectively 2-factor > authentication over this process. > > Going forward, if we are intellectually honest, all of this suggests > that adding complexity to the driver with LSM controls makes little > sense technically. Since, by definition and economic intent, the > technology provides tools and infrastructure that allows these > controls to be evaded. > > The notion that entities with adversarial intent would not try and > take advantage of this flies in the face of everything we know about > security. > > > Andy is right to caution against playing games to squish in-kernel > > things, but the virtualization snafu is a completely different > > beast. E.g. SGX doesn't require fiddling with CR4, won't corrupt > > random memory across kexec(), doesn't block INIT, etc... And I'm > > not advocating long-term shenanigans, I just wanted to point out > > that there are options for running out-of-tree drivers in the short > > term, e.g. until proper policy controls land upstream. > > It appears that the distributions, at least IBM/RHEL, are going to > compile this driver in and backport it as well. The (Red Hat sponsored) Enarx project will continue building an unofficial, unsupported version of the Fedora kernel with the SGX patches[0] until such time as the patches are upstream. Once upstream, I intend to propose that the feature be enabled in the stock Fedora kernel. Enarx requires EDMM support as a prerequisite to being production ready. Therefore, we are likely to continue building this custom Fedora kernel with the latest patches until such point as EDMM support lands upstream. This also implies that I have no current plan to request an SGX backport to a RHEL kernel until such time as it supports our full needs. Disclaimer: I do not control RHEL or Fedora kernel features. None of the above constitutes a commitment to deliver anything. [0]: https://copr.fedorainfracloud.org/coprs/npmccallum/enarx/package/kernel/ > What we would recommend at this point is that you and Jarkko do the > Linux community and beyond a favor and wire up a simple kernel > command-line parameter that controls the ability of the driver to be > used, ie. enables/disables access to /dev/sgx/enclave. > > Distributions or others can set this command-line parameter by default > to 'disable' and avoid any possibility of the technology being used > for nefarious purposes. Since the technology now appears to be > focused only on the cloud providers they will certainly be capable of > configuring their implementations to change the default. > > In essence, make the kernel's behavior secure by default. > > Best wishes for a pleasant weekend to everyone. > > Dr. Greg > > As always, > Dr. Greg Wettstein, Ph.D, Worker Artisans in autonomously > Enjellic Systems Development, LLC self-defensive IOT platforms > 4206 N. 19th Ave. and edge devices. > Fargo, ND 58102 > PH: 701-281-1686 EMAIL: greg@enjellic.com > ------------------------------------------------------------------------------ > "Intellectuals solve problems; geniuses prevent them." > -- Albert Einstein >
On Fri, 2020-05-15 at 15:54 -0400, Nathaniel McCallum wrote: > The (Red Hat sponsored) Enarx project will continue building an > unofficial, unsupported version of the Fedora kernel with the SGX > patches[0] until such time as the patches are upstream. Once upstream, > I intend to propose that the feature be enabled in the stock Fedora > kernel. > > Enarx requires EDMM support as a prerequisite to being production > ready. Therefore, we are likely to continue building this custom > Fedora kernel with the latest patches until such point as EDMM support > lands upstream. This also implies that I have no current plan to > request an SGX backport to a RHEL kernel until such time as it > supports our full needs. > > Disclaimer: I do not control RHEL or Fedora kernel features. None of > the above constitutes a commitment to deliver anything. > > [0]: https://copr.fedorainfracloud.org/coprs/npmccallum/enarx/package/kernel/ SGX is somewhat self-contained feature, i.e. should be easy to backport for any recent kernel. Only the vDSO is outside of arch/x86/kernel/cpu/sgx. /Jarkko
Hi! > > > At the very least a modular form of the driver should be > > > considered that would allow alternate implementations. Sean > > > indicated that there was a 'kludgy' approach that would allow an > > > alternate modular implementation alongside the in-kernel driver. > > > I believe that Andy has already indicated that may not be an > > > advisable approach. > > > Modularizing the the driver does nothing for your use case. The > > "driver" is a relatively lightweight wrapper, removing that is akin > > to making /dev/sgx/enclave inaccessible, i.e. it doesn't make the Well... SGX is proprietary feature of Intel. I don't see any effort for standartization so that other architectures could use it. Yet it provides userspace interface... You clearly want distros to enable it, but that will waste memory on non-Intel systems. That is not good. > Here in a nutshell is the paradox the kernel faces: > > No one seems to be disputing the fact that the primary focus of this > driver will be to support the notion of 'runtime encryption' and > Confidential Computing. The whole premise of the concept and economic > predicate of the initiative is that the owner/manager of the platform > has no visibility into what is being done on the platform. Well, in my eyes preventing owner of the machine from accessing all its parts is pretty questionable. Physics says it is impossible, many tried, and many failed. Why it should be different this time? > If the Linux community believes that standard platform security > controls can make qualitatively good judgements on what enclave based > execution is doing, it is effectively making the statement that the > very concept of runtime encryption and by extension Confidential > Computing is invalid. And yes, I believe that concept of Confidential Computing is invalid.. and we should simply not merge this support. It provides false sense of security, and I'm afraid big companies will try to force people to use it. "DRM, now with hardware support". "Finally advertisments you can not skip". "Just run this piece of code on your machine to access your bank account. Trust us!" :-(. Pavel
From: Pavel Machek > Sent: 24 May 2020 22:27 .. > It provides false sense of security, and I'm afraid big companies will try to force > people to use it. "DRM, now with hardware support". "Finally advertisments you can > not skip". "Just run this piece of code on your machine to access your bank account. > Trust us!" Hey malware guys, here is somewhere you can hide your code making it very difficult to find. You can then use the hardware disk encryption that people think gives them security to encrypt all the files. Job done... David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Thu, May 07, 2020 at 05:25:55PM -0700, Sean Christopherson wrote: > Ah, fudge. shmem_zero_setup() triggers shmem_acct_size() and thus > __vm_enough_memory(). Which I should have rememered because I've stared > at that code several times when dealing with the enclave's backing store. > I wasn't seeing the issue because I happened to use MAP_PRIVATE. > > So, bad analysis, good conclusion, i.e. the kernel is still doing the > right thing, it's just not ideal for userspace. > > > Jarkko, we should update the docs and selftest to recommend and use > > PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS > > or > > PROT_NONE, MAP_SHARED | MAP_NORESERVE | MAP_ANONYMOUS" > > when carving out ELRANGE, with an explicit comment that all the normal > rules for mapping memory still apply. Ugh, had forgotten this. OK, I guess this comment explains it all: " /* * shmem_file_setup pre-accounts the whole fixed size of a VM object, * for shared memory and for shared anonymous (/dev/zero) mappings * (unless MAP_NORESERVE and sysctl_overcommit_memory <= 1), * consistent with the pre-accounting of private mappings ... */ static inline int shmem_acct_size(unsigned long flags, loff_t size) " /Jarkko
On Thu, May 28, 2020 at 02:15:18PM +0300, Jarkko Sakkinen wrote: > On Thu, May 07, 2020 at 05:25:55PM -0700, Sean Christopherson wrote: > > Ah, fudge. shmem_zero_setup() triggers shmem_acct_size() and thus > > __vm_enough_memory(). Which I should have rememered because I've stared > > at that code several times when dealing with the enclave's backing store. > > I wasn't seeing the issue because I happened to use MAP_PRIVATE. > > > > So, bad analysis, good conclusion, i.e. the kernel is still doing the > > right thing, it's just not ideal for userspace. > > > > > > Jarkko, we should update the docs and selftest to recommend and use > > > > PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS > > > > or > > > > PROT_NONE, MAP_SHARED | MAP_NORESERVE | MAP_ANONYMOUS" > > > > when carving out ELRANGE, with an explicit comment that all the normal > > rules for mapping memory still apply. > > Ugh, had forgotten this. > > OK, I guess this comment explains it all: > > " > /* > * shmem_file_setup pre-accounts the whole fixed size of a VM object, > * for shared memory and for shared anonymous (/dev/zero) mappings > * (unless MAP_NORESERVE and sysctl_overcommit_memory <= 1), > * consistent with the pre-accounting of private mappings ... > */ > static inline int shmem_acct_size(unsigned long flags, loff_t size) > " Do not agree though that any documentation should be produced but the selftest should have correct parameters, yes. Instructions on how to reserve a range of addresses simply does not belong to SGX documentation because it is not SGX related in the first place. The patterns you showed are universal. I'll fix just the selftest for v31. /Jarkko