Message ID | 4ce06608b5351f65f4e6bc6fc87c88a71215a2e7.1644274683.git.reinette.chatre@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | x86/sgx and selftests/sgx: Support SGX2 |
On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:
> In the initial (SGX1) version of SGX, pages in an enclave need to be
> created with permissions that support all usages of the pages, from the
> time the enclave is initialized until it is unloaded. For example,
> pages used by a JIT compiler or when code needs to otherwise be
> relocated need to always have RWX permissions.
>
> SGX2 includes a new function ENCLS[EMODPR] that is run from the kernel
> and can be used to restrict the EPCM permissions of regular enclave
> pages within an initialized enclave.
>
> Introduce ioctl() SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS to support
> restricting EPCM permissions. With this ioctl() the user specifies
> a page range and the permissions to be applied to all pages in
> the provided range. After checking the new permissions (more detail
> below) the page table entries are reset and any new page
> table entries will contain the new, restricted, permissions.
> ENCLS[EMODPR] is run to restrict the EPCM permissions followed by
> the ENCLS[ETRACK] flow that will ensure no cached
> linear-to-physical address mappings to the changed pages remain.
>
> It is possible for the permission change request to fail on any
> page within the provided range, either with an error encountered
> by the kernel or by the SGX hardware while running
> ENCLS[EMODPR]. To support partial success the ioctl() returns an
> error code based on failures encountered by the kernel as well
> as two result output parameters: one for the number of pages
> that were successfully changed and one for the SGX return code.
>
> Checking user provided new permissions
> ======================================
>
> Enclave page permission changes need to be approached with care and
> for this reason permission changes are only allowed if the new
> permissions are the same or more restrictive than the vetted
> permissions. No additional checking is done to ensure that the
> permissions are actually being restricted. This is because the
> enclave may have relaxed the EPCM permissions from within
> the enclave without letting the kernel know. An attempt to relax
> permissions using this call will be ignored by the hardware.
>
> For example, together with the support for relaxing of EPCM permissions,
> enclave pages added with the vetted permissions in brackets below
> are allowed to have permissions as follows:
> * (RWX) => RW => R => RX => RWX
> * (RW) => R => RW
> * (RX) => R => RX
>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V1:
> - Change terminology to use "relax" instead of "extend" to refer to
>   the case when enclave page permissions are added (Dave).
> - Use ioctl() in commit message (Dave).
> - Add examples on what permissions would be allowed (Dave).
> - Split enclave page permission changes into two ioctl()s, one for
>   permission restricting (SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS)
>   and one for permission relaxing (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS)
>   (Jarkko).
> - In support of the ioctl() name change the following names have been
>   changed:
>   struct sgx_page_modp -> struct sgx_enclave_restrict_perm
>   sgx_ioc_page_modp() -> sgx_ioc_enclave_restrict_perm()
>   sgx_page_modp() -> sgx_enclave_restrict_perm()
> - ioctl() takes entire secinfo as input instead of
>   page permissions only (Jarkko).
> - Fix kernel-doc to include () in function name.
> - Create and use utility for the ETRACK flow.
> - Fixups in comments
> - Move kernel-doc to function that provides documentation for
>   Documentation/x86/sgx.rst.
> - Remove redundant comment.
> - Make explicit which members of struct sgx_enclave_restrict_perm
>   are for output (Dave).
>
>  arch/x86/include/uapi/asm/sgx.h |  21 +++
>  arch/x86/kernel/cpu/sgx/encl.c  |   4 +-
>  arch/x86/kernel/cpu/sgx/encl.h  |   3 +
>  arch/x86/kernel/cpu/sgx/ioctl.c | 229 ++++++++++++++++++++++++++++++++
>  4 files changed, 255 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> index 5c678b27bb72..b0ffb80bc67f 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -31,6 +31,8 @@ enum sgx_page_flags {
>  	_IO(SGX_MAGIC, 0x04)
>  #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
>  	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
> +	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
>
>  /**
>   * struct sgx_enclave_create - parameter structure for the
> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
>  	__u64 count;
>  };
>
> +/**
> + * struct sgx_enclave_restrict_perm - parameters for ioctl
> + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> + * @offset: starting page offset (page aligned relative to enclave base
> + * address defined in SECS)
> + * @length: length of memory (multiple of the page size)
> + * @secinfo: address for the SECINFO data containing the new permission bits
> + * for pages in range described by @offset and @length
> + * @result: (output) SGX result code of ENCLS[EMODPR] function
> + * @count: (output) bytes successfully changed (multiple of page size)
> + */
> +struct sgx_enclave_restrict_perm {
> +	__u64 offset;
> +	__u64 length;
> +	__u64 secinfo;
> +	__u64 result;
> +	__u64 count;
> +};
> +
>  struct sgx_enclave_run;
>
>  /**
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 8da813504249..a5d4a7efb986 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -90,8 +90,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
>  	return epc_page;
>  }
>
> -static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> -						unsigned long addr)
> +struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> +					 unsigned long addr)
>  {
>  	struct sgx_epc_page *epc_page;
>  	struct sgx_encl_page *entry;
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index cb9f16d457ac..848a28d28d3d 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -120,4 +120,7 @@ void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
>  bool sgx_va_page_full(struct sgx_va_page *va_page);
>  void sgx_encl_free_epc_page(struct sgx_epc_page *page);
>
> +struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> +					 unsigned long addr);
> +
>  #endif /* _X86_ENCL_H */
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 9cc6af404bf6..23bdf558b231 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -894,6 +894,232 @@ static long sgx_ioc_enclave_relax_perm(struct sgx_encl *encl, void __user *arg)
>  	return ret;
>  }
>
> +/*
> + * Some SGX functions require that no cached linear-to-physical address
> + * mappings are present before they can succeed. Collaborate with
> + * hardware via ENCLS[ETRACK] to ensure that all cached
> + * linear-to-physical address mappings belonging to all threads of
> + * the enclave are cleared. See sgx_encl_cpumask() for details.
> + */
> +static int sgx_enclave_etrack(struct sgx_encl *encl)
> +{
> +	void *epc_virt;
> +	int ret;
> +
> +	epc_virt = sgx_get_epc_virt_addr(encl->secs.epc_page);
> +	ret = __etrack(epc_virt);
> +	if (ret) {
> +		/*
> +		 * ETRACK only fails when there is an OS issue. For
> +		 * example, two consecutive ETRACK was sent without
> +		 * completed IPI between.
> +		 */
> +		pr_err_once("ETRACK returned %d (0x%x)", ret, ret);
> +		/*
> +		 * Send IPIs to kick CPUs out of the enclave and
> +		 * try ETRACK again.
> +		 */
> +		on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
> +		ret = __etrack(epc_virt);
> +		if (ret) {
> +			pr_err_once("ETRACK repeat returned %d (0x%x)",
> +				    ret, ret);
> +			return -EFAULT;
> +		}
> +	}
> +	on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
> +
> +	return 0;
> +}
> +
> +/**
> + * sgx_enclave_restrict_perm() - Restrict EPCM permissions and align OS view
> + * @encl: Enclave to which the pages belong.
> + * @modp: Checked parameters from user on which pages need modifying.
> + * @secinfo_perm: New (validated) permission bits.
> + *
> + * Return:
> + * - 0: Success.
> + * - -errno: Otherwise.
> + */
> +static long sgx_enclave_restrict_perm(struct sgx_encl *encl,
> +				      struct sgx_enclave_restrict_perm *modp,
> +				      u64 secinfo_perm)
> +{
> +	unsigned long vm_prot, run_prot_restore;
> +	struct sgx_encl_page *entry;
> +	struct sgx_secinfo secinfo;
> +	unsigned long addr;
> +	unsigned long c;
> +	void *epc_virt;
> +	int ret;
> +
> +	memset(&secinfo, 0, sizeof(secinfo));
> +	secinfo.flags = secinfo_perm;
> +
> +	vm_prot = vm_prot_from_secinfo(secinfo_perm);
> +
> +	for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
> +		addr = encl->base + modp->offset + c;
> +
> +		mutex_lock(&encl->lock);
> +
> +		entry = sgx_encl_load_page(encl, addr);
> +		if (IS_ERR(entry)) {
> +			ret = PTR_ERR(entry) == -EBUSY ? -EAGAIN : -EFAULT;
> +			goto out_unlock;
> +		}
> +
> +		/*
> +		 * Changing EPCM permissions is only supported on regular
> +		 * SGX pages. Attempting this change on other pages will
> +		 * result in #PF.
> +		 */
> +		if (entry->type != SGX_PAGE_TYPE_REG) {
> +			ret = -EINVAL;
> +			goto out_unlock;
> +		}
> +
> +		/*
> +		 * Do not verify if current runtime protection bits are what
> +		 * is being requested. The enclave may have relaxed EPCM
> +		 * permissions calls without letting the kernel know and
> +		 * thus permission restriction may still be needed even if
> +		 * from the kernel's perspective the permissions are unchanged.
> +		 */
> +
> +		/* New permissions should never exceed vetted permissions. */
> +		if ((entry->vm_max_prot_bits & vm_prot) != vm_prot) {
> +			ret = -EPERM;
> +			goto out_unlock;
> +		}
> +
> +		/* Make sure page stays around while releasing mutex. */
> +		if (sgx_unmark_page_reclaimable(entry->epc_page)) {
> +			ret = -EAGAIN;
> +			goto out_unlock;
> +		}
> +
> +		/*
> +		 * Change runtime protection before zapping PTEs to ensure
> +		 * any new #PF uses new permissions. EPCM permissions (if
> +		 * needed) not changed yet.
> +		 */
> +		run_prot_restore = entry->vm_run_prot_bits;
> +		entry->vm_run_prot_bits = vm_prot;
> +
> +		mutex_unlock(&encl->lock);
> +		/*
> +		 * Do not keep encl->lock because of dependency on
> +		 * mmap_lock acquired in sgx_zap_enclave_ptes().
> +		 */
> +		sgx_zap_enclave_ptes(encl, addr);
> +
> +		mutex_lock(&encl->lock);
> +
> +		/* Change EPCM permissions. */
> +		epc_virt = sgx_get_epc_virt_addr(entry->epc_page);
> +		ret = __emodpr(&secinfo, epc_virt);
> +		if (encls_faulted(ret)) {
> +			/*
> +			 * All possible faults should be avoidable:
> +			 * parameters have been checked, will only change
> +			 * permissions of a regular page, and no concurrent
> +			 * SGX1/SGX2 ENCLS instructions since these
> +			 * are protected with mutex.
> +			 */
> +			pr_err_once("EMODPR encountered exception %d\n",
> +				    ENCLS_TRAPNR(ret));
> +			ret = -EFAULT;
> +			goto out_prot_restore;
> +		}
> +		if (encls_failed(ret)) {
> +			modp->result = ret;
> +			ret = -EFAULT;
> +			goto out_prot_restore;
> +		}
> +
> +		ret = sgx_enclave_etrack(encl);
> +		if (ret) {
> +			ret = -EFAULT;
> +			goto out_reclaim;
> +		}
> +
> +		sgx_mark_page_reclaimable(entry->epc_page);
> +		mutex_unlock(&encl->lock);
> +	}
> +
> +	ret = 0;
> +	goto out;
> +
> +out_prot_restore:
> +	entry->vm_run_prot_bits = run_prot_restore;
> +out_reclaim:
> +	sgx_mark_page_reclaimable(entry->epc_page);
> +out_unlock:
> +	mutex_unlock(&encl->lock);
> +out:
> +	modp->count = c;
> +
> +	return ret;
> +}
> +
> +/**
> + * sgx_ioc_enclave_restrict_perm() - handler for
> + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> + * @encl: an enclave pointer
> + * @arg: userspace pointer to a &struct sgx_enclave_restrict_perm
> + * instance
> + *
> + * SGX2 distinguishes between relaxing and restricting the enclave page
> + * permissions maintained by the hardware (EPCM permissions) of pages
> + * belonging to an initialized enclave (after SGX_IOC_ENCLAVE_INIT).
> + *
> + * EPCM permissions cannot be restricted from within the enclave, the enclave
> + * requires the kernel to run the privileged level 0 instructions ENCLS[EMODPR]
> + * and ENCLS[ETRACK]. An attempt to relax EPCM permissions with this call
> + * will be ignored by the hardware.
> + *
> + * Enclave page permissions are not allowed to exceed the maximum vetted
> + * permissions maintained in &struct sgx_encl_page->vm_max_prot_bits.
> + *
> + * Return:
> + * - 0: Success
> + * - -errno: Otherwise
> + */
> +static long sgx_ioc_enclave_restrict_perm(struct sgx_encl *encl,
> +					  void __user *arg)
> +{
> +	struct sgx_enclave_restrict_perm params;
> +	u64 secinfo_perm;
> +	long ret;
> +
> +	ret = sgx_ioc_sgx2_ready(encl);
> +	if (ret)
> +		return ret;
> +
> +	if (copy_from_user(&params, arg, sizeof(params)))
> +		return -EFAULT;
> +
> +	if (sgx_validate_offset_length(encl, params.offset, params.length))
> +		return -EINVAL;
> +
> +	ret = sgx_perm_from_user_secinfo((void __user *)params.secinfo,
> +					 &secinfo_perm);
> +	if (ret)
> +		return ret;
> +
> +	if (params.result || params.count)
> +		return -EINVAL;
> +
> +	ret = sgx_enclave_restrict_perm(encl, &params, secinfo_perm);
> +
> +	if (copy_to_user(arg, &params, sizeof(params)))
> +		return -EFAULT;
> +
> +	return ret;
> +}
> +
>  long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  {
>  	struct sgx_encl *encl = filep->private_data;
> @@ -918,6 +1144,9 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  	case SGX_IOC_ENCLAVE_RELAX_PERMISSIONS:
>  		ret = sgx_ioc_enclave_relax_perm(encl, (void __user *)arg);
>  		break;
> +	case SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
> +		ret = sgx_ioc_enclave_restrict_perm(encl, (void __user *)arg);
> +		break;
>  	default:
>  		ret = -ENOIOCTLCMD;
>  		break;
> --
> 2.25.1
>

Just a suggestion but these might be a bit less cluttered explanations of
the fields:

/// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
#[repr(C)]
pub struct RelaxPermissions {
    /// In: starting page offset
    offset: u64,
    /// In: length of the address range (multiple of the page size)
    length: u64,
    /// In: SECINFO containing the relaxed permissions
    secinfo: u64,
    /// Out: length of the address range successfully changed
    count: u64,
};

/// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
#[repr(C)]
pub struct RestrictPermissions {
    /// In: starting page offset
    offset: u64,
    /// In: length of the address range (multiple of the page size)
    length: u64,
    /// In: SECINFO containing the restricted permissions
    secinfo: u64,
    /// In: ENCLU[EMODPR] return value
    result: u64,
    /// Out: length of the address range successfully changed
    count: u64,
};

I can live with the current ones too but I rewrote them so that I can
quickly make sense of the fields later. It's Rust code but the point is
the documentation...

Also, it should not be too much trouble to use the struct in user space
code even if the struct names are struct sgx_enclave_relax_permissions and
struct sgx_enclave_restrict_permissions, given that you most likely have
exactly single call-site in the run-time.

Other than that, looks quite good.

BR, Jarkko
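
Editorial note: the vetted-permission test in the patch under review reduces to a bit-mask comparison. The following is only a minimal user-space sketch of that comparison, not kernel code. The `PERM_R`/`PERM_W`/`PERM_X` flags and the `perm_within_vetted()` helper are hypothetical names introduced here for illustration; the kernel operates on its own VM protection bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's read/write/execute protection bits. */
#define PERM_R 0x1u
#define PERM_W 0x2u
#define PERM_X 0x4u

/*
 * Mirrors the shape of the patch's check:
 *   if ((entry->vm_max_prot_bits & vm_prot) != vm_prot) -> -EPERM
 * Returns true when every requested bit is also present in the vetted
 * maximum, i.e. the request is the same as or more restrictive than it.
 */
static bool perm_within_vetted(uint32_t vetted_max, uint32_t requested)
{
	return (vetted_max & requested) == requested;
}
```

Under these assumptions, a page vetted as RW accepts a request for R but rejects RX, which matches the "(RW) => R => RW" line in the commit message.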
Hi Jarkko,

On 2/20/2022 4:49 PM, Jarkko Sakkinen wrote:
> On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:

...

>> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
>> index 5c678b27bb72..b0ffb80bc67f 100644
>> --- a/arch/x86/include/uapi/asm/sgx.h
>> +++ b/arch/x86/include/uapi/asm/sgx.h
>> @@ -31,6 +31,8 @@ enum sgx_page_flags {
>>  	_IO(SGX_MAGIC, 0x04)
>>  #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
>>  	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
>> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
>> +	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
>>
>>  /**
>>   * struct sgx_enclave_create - parameter structure for the
>> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
>>  	__u64 count;
>>  };
>>
>> +/**
>> + * struct sgx_enclave_restrict_perm - parameters for ioctl
>> + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
>> + * @offset: starting page offset (page aligned relative to enclave base
>> + * address defined in SECS)
>> + * @length: length of memory (multiple of the page size)
>> + * @secinfo: address for the SECINFO data containing the new permission bits
>> + * for pages in range described by @offset and @length
>> + * @result: (output) SGX result code of ENCLS[EMODPR] function
>> + * @count: (output) bytes successfully changed (multiple of page size)
>> + */
>> +struct sgx_enclave_restrict_perm {
>> +	__u64 offset;
>> +	__u64 length;
>> +	__u64 secinfo;
>> +	__u64 result;
>> +	__u64 count;
>> +};
>> +
>>  struct sgx_enclave_run;
>>
>>  /**

...

>
> Just a suggestion but these might be a bit less cluttered explanations of
> the fields:
>
> /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
> #[repr(C)]
> pub struct RelaxPermissions {
>     /// In: starting page offset
>     offset: u64,
>     /// In: length of the address range (multiple of the page size)
>     length: u64,
>     /// In: SECINFO containing the relaxed permissions
>     secinfo: u64,
>     /// Out: length of the address range successfully changed
>     count: u64,
> };
>
> /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
> #[repr(C)]
> pub struct RestrictPermissions {
>     /// In: starting page offset
>     offset: u64,
>     /// In: length of the address range (multiple of the page size)
>     length: u64,
>     /// In: SECINFO containing the restricted permissions
>     secinfo: u64,
>     /// In: ENCLU[EMODPR] return value
>     result: u64,
>     /// Out: length of the address range successfully changed
>     count: u64,
> };

In your proposal you shorten the descriptions from the current implementation.
I do consider the removed information valuable since I believe that it helps
users understand the kernel interface requirements without needing to be
familiar with or dig into the kernel code to understand how the provided data
is used.

For example, you shorten offset to "starting page offset", but what was removed
was the requirement that this offset has to be page aligned and what the offset
is relative to. I do believe summarizing these requirements upfront helps
a user space developer by not needing to dig through kernel code later
in order to understand why an -EINVAL was received.

> I can live with the current ones too but I rewrote them so that I can
> quickly make sense of the fields later. It's Rust code but the point is
> the documentation...

Since you do seem to be ok with the current descriptions I would prefer
to keep them.

> Also, it should not be too much trouble to use the struct in user space
> code even if the struct names are struct sgx_enclave_relax_permissions and
> struct sgx_enclave_restrict_permissions, given that you most likely have
> exactly single call-site in the run-time.

Are you requesting that I make the following name changes?
struct sgx_enclave_relax_perm -> struct sgx_enclave_relax_permissions
struct sgx_enclave_restrict_perm -> struct sgx_enclave_restrict_permissions

If so, do you want the function names also written out in this way?
sgx_enclave_relax_perm() -> sgx_enclave_relax_permissions()
sgx_ioc_enclave_relax_perm() -> sgx_ioc_enclave_relax_permissions()
sgx_enclave_restrict_perm() -> sgx_enclave_restrict_permissions()
sgx_ioc_enclave_restrict_perm() -> sgx_ioc_enclave_restrict_permissions()

> Other than that, looks quite good.

Thank you very much for reviewing and testing this work.

Reinette
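
Editorial note: the alignment requirements discussed in the reply above (a page-aligned @offset relative to the enclave base, and a @length that is a multiple of the page size) can be pre-checked in user space before the ioctl() is issued. The sketch below is an assumption-laden illustration: the body of the kernel's sgx_validate_offset_length() is not shown in this thread, so the exact kernel checks may differ. `PAGE_SZ`, `range_looks_valid()` and the `encl_size` parameter are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB enclave pages. */
#define PAGE_SZ 4096ULL

/*
 * Hypothetical pre-check mirroring the documented requirements:
 * - offset is page aligned (relative to the enclave base),
 * - length is a non-zero multiple of the page size,
 * - the range stays inside an enclave of encl_size bytes.
 */
static bool range_looks_valid(uint64_t offset, uint64_t length,
			      uint64_t encl_size)
{
	if (offset % PAGE_SZ)
		return false;
	if (length == 0 || length % PAGE_SZ)
		return false;
	/* Written to avoid overflow in offset + length. */
	if (offset >= encl_size || length > encl_size - offset)
		return false;
	return true;
}
```

A runtime could call such a helper first and avoid a round trip that would only end in -EINVAL.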
On Tue, Feb 22, 2022 at 10:35:04AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
>
> On 2/20/2022 4:49 PM, Jarkko Sakkinen wrote:
> > On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:
>
> ...
>
> >> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> >> index 5c678b27bb72..b0ffb80bc67f 100644
> >> --- a/arch/x86/include/uapi/asm/sgx.h
> >> +++ b/arch/x86/include/uapi/asm/sgx.h
> >> @@ -31,6 +31,8 @@ enum sgx_page_flags {
> >>  	_IO(SGX_MAGIC, 0x04)
> >>  #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
> >>  	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
> >> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
> >> +	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
> >>
> >>  /**
> >>   * struct sgx_enclave_create - parameter structure for the
> >> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
> >>  	__u64 count;
> >>  };
> >>
> >> +/**
> >> + * struct sgx_enclave_restrict_perm - parameters for ioctl
> >> + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> >> + * @offset: starting page offset (page aligned relative to enclave base
> >> + * address defined in SECS)
> >> + * @length: length of memory (multiple of the page size)
> >> + * @secinfo: address for the SECINFO data containing the new permission bits
> >> + * for pages in range described by @offset and @length
> >> + * @result: (output) SGX result code of ENCLS[EMODPR] function
> >> + * @count: (output) bytes successfully changed (multiple of page size)
> >> + */
> >> +struct sgx_enclave_restrict_perm {
> >> +	__u64 offset;
> >> +	__u64 length;
> >> +	__u64 secinfo;
> >> +	__u64 result;
> >> +	__u64 count;
> >> +};
> >> +
> >>  struct sgx_enclave_run;
> >>
> >>  /**
>
> ...
>
> >
> > Just a suggestion but these might be a bit less cluttered explanations of
> > the fields:
> >
> > /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure
> > #[repr(C)]
> > pub struct RelaxPermissions {
> >     /// In: starting page offset
> >     offset: u64,
> >     /// In: length of the address range (multiple of the page size)
> >     length: u64,
> >     /// In: SECINFO containing the relaxed permissions
> >     secinfo: u64,
> >     /// Out: length of the address range successfully changed
> >     count: u64,
> > };
> >
> > /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure
> > #[repr(C)]
> > pub struct RestrictPermissions {
> >     /// In: starting page offset
> >     offset: u64,
> >     /// In: length of the address range (multiple of the page size)
> >     length: u64,
> >     /// In: SECINFO containing the restricted permissions
> >     secinfo: u64,
> >     /// In: ENCLU[EMODPR] return value
> >     result: u64,
> >     /// Out: length of the address range successfully changed
> >     count: u64,
> > };
>
> In your proposal you shorten the descriptions from the current implementation.
> I do consider the removed information valuable since I believe that it helps
> users understand the kernel interface requirements without needing to be
> familiar with or dig into the kernel code to understand how the provided data
> is used.
>
> For example, you shorten offset to "starting page offset", but what was removed
> was the requirement that this offset has to be page aligned and what the offset
> is relative to. I do believe summarizing these requirements upfront helps
> a user space developer by not needing to dig through kernel code later
> in order to understand why an -EINVAL was received.
>
>
> > I can live with the current ones too but I rewrote them so that I can
> > quickly make sense of the fields later. It's Rust code but the point is
> > the documentation...
>
> Since you do seem to be ok with the current descriptions I would prefer
> to keep them.

Yeah, they are fine to me.

> > Also, it should not be too much trouble to use the struct in user space
> > code even if the struct names are struct sgx_enclave_relax_permissions and
> > struct sgx_enclave_restrict_permissions, given that you most likely have
> > exactly single call-site in the run-time.
>
> Are you requesting that I make the following name changes?
> struct sgx_enclave_relax_perm -> struct sgx_enclave_relax_permissions
> struct sgx_enclave_restrict_perm -> struct sgx_enclave_restrict_permissions
>
> If so, do you want the function names also written out in this way?
> sgx_enclave_relax_perm() -> sgx_enclave_relax_permissions()
> sgx_ioc_enclave_relax_perm() -> sgx_ioc_enclave_relax_permissions()
> sgx_enclave_restrict_perm() -> sgx_enclave_restrict_permissions()
> sgx_ioc_enclave_restrict_perm() -> sgx_ioc_enclave_restrict_permissions()

Yes, unless you have a specific reason to shorten them :-)

> > Other than that, looks quite good.
>
> Thank you very much for reviewing and testing this work.

NP

> Reinette

BR, Jarkko
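
Editorial note: the @result and @count output fields quoted above exist to report partial success. One way a caller might resume after a partial failure is sketched below. This is only an illustration under stated assumptions: `struct restrict_perm_req` is a local copy of the field layout from the patch, and `restrict_perm_advance()` is a hypothetical helper, not part of any kernel or SDK API. The reset of both output fields reflects the handler in the patch, which returns -EINVAL if either is non-zero on entry.

```c
#include <stdint.h>

/* Local mirror of the struct sgx_enclave_restrict_perm layout from the patch. */
struct restrict_perm_req {
	uint64_t offset;   /* in: page-aligned offset from the enclave base */
	uint64_t length;   /* in: multiple of the page size */
	uint64_t secinfo;  /* in: user address of the SECINFO data */
	uint64_t result;   /* out: SGX result code of ENCLS[EMODPR] */
	uint64_t count;    /* out: bytes successfully changed */
};

/*
 * Hypothetical helper: skip the bytes that were already changed so the
 * request can be resubmitted for the remainder. Clears the output fields
 * because the handler rejects non-zero result/count. Returns the number
 * of bytes still to be processed.
 */
static uint64_t restrict_perm_advance(struct restrict_perm_req *req)
{
	uint64_t done = req->count;

	if (done > req->length)
		done = req->length;

	req->offset += done;
	req->length -= done;
	req->result = 0;
	req->count = 0;

	return req->length;
}
```

Whether a retry makes sense depends on the error: -EAGAIN suggests a transient condition, while a non-zero @result carries the hardware's own reason for the failure.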
Hi All,

Regarding the recent update of splitting the page permissions change
request into two IOCTLS (RELAX and RESTRICT), can we combine them into
one? That is, revert to how it was done in the v1 version?

Why? Currently in Gramine (a library OS for unmodified applications,
https://gramineproject.io/) with the new proposed change, one needs to
store the page permission for each page or range of pages. And for every
request of `mmap` or `mprotect`, Gramine would have to do a lookup of the
page permissions for the request range and then call the respective IOCTL,
either RESTRICT or RELAX. This seems a little overwhelming.

Request: Instead, can we do `EMODPE`, call `RESTRICT` IOCTL, and then do
an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
With this approach, we can avoid storing page permissions and simplify
the implementation.

I understand RESTRICT IOCTL would do an `EMODPR` and trigger `ETRACK` flows
to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
not sure what will be the performance impact. Is there any data point to
see the performance impact?

Thanks,
-Vijay

> -----Original Message-----
> From: Jarkko Sakkinen <jarkko@kernel.org>
> Sent: Sunday, February 20, 2022 4:50 PM
> To: Reinette Chatre <reinette.chatre@intel.com>
> Cc: dave.hansen@linux.intel.com; tglx@linutronix.de; bp@alien8.de;
> luto@kernel.org; mingo@redhat.com; linux-sgx@vger.kernel.org;
> x86@kernel.org; seanjc@google.com; kai.huang@intel.com;
> cathy.zhang@intel.com; cedric.xing@intel.com; haitao.huang@intel.com;
> mark.shanahan@intel.com; hpa@zytor.com; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page
> permissions
>
> On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote:
> > In the initial (SGX1) version of SGX, pages in an enclave need to be
> > created with permissions that support all usages of the pages, from
> > the time the enclave is initialized until it is unloaded. For example,
> > pages used by a JIT compiler or when code needs to otherwise be
> > relocated need to always have RWX permissions.
> >
> > SGX2 includes a new function ENCLS[EMODPR] that is run from the kernel
> > and can be used to restrict the EPCM permissions of regular enclave
> > pages within an initialized enclave.
> >
> > Introduce ioctl() SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS to support
> > restricting EPCM permissions. With this ioctl() the user specifies a
> > page range and the permissions to be applied to all pages in the
> > provided range. After checking the new permissions (more detail
> > below) the page table entries are reset and any new page table entries
> > will contain the new, restricted, permissions.
> > ENCLS[EMODPR] is run to restrict the EPCM permissions followed by the
> > ENCLS[ETRACK] flow that will ensure no cached linear-to-physical
> > address mappings to the changed pages remain.
> >
> > It is possible for the permission change request to fail on any page
> > within the provided range, either with an error encountered by the
> > kernel or by the SGX hardware while running ENCLS[EMODPR]. To support
> > partial success the ioctl() returns an error code based on failures
> > encountered by the kernel as well as two result output parameters: one
> > for the number of pages that were successfully changed and one for the
> > SGX return code.
> >
> > Checking user provided new permissions
> > ======================================
> >
> > Enclave page permission changes need to be approached with care and
> > for this reason permission changes are only allowed if the new
> > permissions are the same or more restrictive than the vetted
> > permissions. No additional checking is done to ensure that the
> > permissions are actually being restricted. This is because the enclave
> > may have relaxed the EPCM permissions from within the enclave without
> > letting the kernel know. An attempt to relax permissions using this
> > call will be ignored by the hardware.
> >
> > For example, together with the support for relaxing of EPCM
> > permissions, enclave pages added with the vetted permissions in
> > brackets below are allowed to have permissions as follows:
> > * (RWX) => RW => R => RX => RWX
> > * (RW) => R => RW
> > * (RX) => R => RX
> >
> > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > ---
> > Changes since V1:
> > - Change terminology to use "relax" instead of "extend" to refer to
> >   the case when enclave page permissions are added (Dave).
> > - Use ioctl() in commit message (Dave).
> > - Add examples on what permissions would be allowed (Dave).
> > - Split enclave page permission changes into two ioctl()s, one for
> >   permission restricting (SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS)
> >   and one for permission relaxing (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS)
> >   (Jarkko).
> > - In support of the ioctl() name change the following names have been
> >   changed:
> >   struct sgx_page_modp -> struct sgx_enclave_restrict_perm
> >   sgx_ioc_page_modp() -> sgx_ioc_enclave_restrict_perm()
> >   sgx_page_modp() -> sgx_enclave_restrict_perm()
> > - ioctl() takes entire secinfo as input instead of
> >   page permissions only (Jarkko).
> > - Fix kernel-doc to include () in function name.
> > - Create and use utility for the ETRACK flow.
> > - Fixups in comments
> > - Move kernel-doc to function that provides documentation for
> >   Documentation/x86/sgx.rst.
> > - Remove redundant comment.
> > - Make explicit which members of struct sgx_enclave_restrict_perm
> >   are for output (Dave).
> >
> >  arch/x86/include/uapi/asm/sgx.h |  21 +++
> >  arch/x86/kernel/cpu/sgx/encl.c  |   4 +-
> >  arch/x86/kernel/cpu/sgx/encl.h  |   3 +
> >  arch/x86/kernel/cpu/sgx/ioctl.c | 229 ++++++++++++++++++++++++++++++++
> >  4 files changed, 255 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> > index 5c678b27bb72..b0ffb80bc67f 100644
> > --- a/arch/x86/include/uapi/asm/sgx.h
> > +++ b/arch/x86/include/uapi/asm/sgx.h
> > @@ -31,6 +31,8 @@ enum sgx_page_flags {
> >  	_IO(SGX_MAGIC, 0x04)
> >  #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
> >  	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
> > +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
> > +	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
> >
> >  /**
> >   * struct sgx_enclave_create - parameter structure for the
> > @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
> >  	__u64 count;
> >  };
> >
> > +/**
> > + * struct sgx_enclave_restrict_perm - parameters for ioctl
> > + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
> > + * @offset: starting page offset (page aligned relative to enclave base
> > + * address defined in SECS)
> > + * @length: length of memory (multiple of the page size)
> > + * @secinfo: address for the SECINFO data containing the new permission bits
> > + * for pages in range described by @offset and @length
> > + * @result: (output) SGX result code of ENCLS[EMODPR] function
> > + * @count: (output) bytes successfully changed (multiple of page size)
> > + */
> > +struct sgx_enclave_restrict_perm {
> > +	__u64 offset;
> > +	__u64 length;
> > +	__u64 secinfo;
> > +	__u64 result;
> > +	__u64 count;
> > +};
> > +
> >  struct sgx_enclave_run;
> >
> >  /**
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > index 8da813504249..a5d4a7efb986 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -90,8 +90,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
> >  	return epc_page;
> >  }
> >
> > -static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > -						unsigned long addr)
> > +struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > +					 unsigned long addr)
> >  {
> >  	struct sgx_epc_page *epc_page;
> >  	struct sgx_encl_page *entry;
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> > index cb9f16d457ac..848a28d28d3d 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.h
> > +++ b/arch/x86/kernel/cpu/sgx/encl.h
> > @@ -120,4 +120,7 @@ void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
> >  bool sgx_va_page_full(struct sgx_va_page *va_page);
> >  void sgx_encl_free_epc_page(struct sgx_epc_page *page);
> >
> > +struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > +					 unsigned long addr);
> > +
> >  #endif /* _X86_ENCL_H */
> > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> > index 9cc6af404bf6..23bdf558b231 100644
> > --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> > @@ -894,6 +894,232 @@ static long sgx_ioc_enclave_relax_perm(struct sgx_encl *encl, void __user *arg)
> >  	return ret;
> >  }
> >
> > +/*
> > + * Some SGX functions require that no cached linear-to-physical address
> > + * mappings are present before they can succeed. Collaborate with
> > + * hardware via ENCLS[ETRACK] to ensure that all cached
> > + * linear-to-physical address mappings belonging to all threads of
> > + * the enclave are cleared. See sgx_encl_cpumask() for details.
> > + */
> > +static int sgx_enclave_etrack(struct sgx_encl *encl)
> > +{
> > +	void *epc_virt;
> > +	int ret;
> > +
> > +	epc_virt = sgx_get_epc_virt_addr(encl->secs.epc_page);
> > +	ret = __etrack(epc_virt);
> > +	if (ret) {
> > +		/*
> > +		 * ETRACK only fails when there is an OS issue. For
> > +		 * example, two consecutive ETRACK was sent without
> > +		 * completed IPI between.
> > +		 */
> > +		pr_err_once("ETRACK returned %d (0x%x)", ret, ret);
> > +		/*
> > +		 * Send IPIs to kick CPUs out of the enclave and
> > +		 * try ETRACK again.
> > +		 */
> > +		on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
> > +		ret = __etrack(epc_virt);
> > +		if (ret) {
> > +			pr_err_once("ETRACK repeat returned %d (0x%x)",
> > +				    ret, ret);
> > +			return -EFAULT;
> > +		}
> > +	}
> > +	on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * sgx_enclave_restrict_perm() - Restrict EPCM permissions and align OS view
> > + * @encl: Enclave to which the pages belong.
> > + * @modp: Checked parameters from user on which pages need modifying.
> > + * @secinfo_perm: New (validated) permission bits.
> > + *
> > + * Return:
> > + * - 0: Success.
> > + * - -errno: Otherwise.
> > + */
> > +static long sgx_enclave_restrict_perm(struct sgx_encl *encl,
> > +				      struct sgx_enclave_restrict_perm *modp,
> > +				      u64 secinfo_perm)
> > +{
> > +	unsigned long vm_prot, run_prot_restore;
> > +	struct sgx_encl_page *entry;
> > +	struct sgx_secinfo secinfo;
> > +	unsigned long addr;
> > +	unsigned long c;
> > +	void *epc_virt;
> > +	int ret;
> > +
> > +	memset(&secinfo, 0, sizeof(secinfo));
> > +	secinfo.flags = secinfo_perm;
> > +
> > +	vm_prot = vm_prot_from_secinfo(secinfo_perm);
> > +
> > +	for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
> > +		addr = encl->base + modp->offset + c;
> > +
> > +		mutex_lock(&encl->lock);
> > +
> > +		entry = sgx_encl_load_page(encl, addr);
> > +		if (IS_ERR(entry)) {
> > +			ret = PTR_ERR(entry) == -EBUSY ? -EAGAIN : -EFAULT;
> > +			goto out_unlock;
> > +		}
> > +
> > +		/*
> > +		 * Changing EPCM permissions is only supported on regular
> > +		 * SGX pages. Attempting this change on other pages will
> > +		 * result in #PF.
> > +		 */
> > +		if (entry->type != SGX_PAGE_TYPE_REG) {
> > +			ret = -EINVAL;
> > +			goto out_unlock;
> > +		}
> > +
> > +		/*
> > +		 * Do not verify if current runtime protection bits are what
> > +		 * is being requested. The enclave may have relaxed EPCM
> > +		 * permissions calls without letting the kernel know and
> > +		 * thus permission restriction may still be needed even if
> > +		 * from the kernel's perspective the permissions are unchanged.
> > +		 */
> > +
> > +		/* New permissions should never exceed vetted permissions. */
> > +		if ((entry->vm_max_prot_bits & vm_prot) != vm_prot) {
> > +			ret = -EPERM;
> > +			goto out_unlock;
> > +		}
> > +
> > +		/* Make sure page stays around while releasing mutex. */
> > +		if (sgx_unmark_page_reclaimable(entry->epc_page)) {
> > +			ret = -EAGAIN;
> > +			goto out_unlock;
> > +		}
> > +
> > +		/*
> > +		 * Change runtime protection before zapping PTEs to ensure
> > +		 * any new #PF uses new permissions. EPCM permissions (if
> > +		 * needed) not changed yet.
> > +		 */
> > +		run_prot_restore = entry->vm_run_prot_bits;
> > +		entry->vm_run_prot_bits = vm_prot;
> > +
> > +		mutex_unlock(&encl->lock);
> > +		/*
> > +		 * Do not keep encl->lock because of dependency on
> > +		 * mmap_lock acquired in sgx_zap_enclave_ptes().
> > +		 */
> > +		sgx_zap_enclave_ptes(encl, addr);
> > +
> > +		mutex_lock(&encl->lock);
> > +
> > +		/* Change EPCM permissions. */
> > +		epc_virt = sgx_get_epc_virt_addr(entry->epc_page);
> > +		ret = __emodpr(&secinfo, epc_virt);
> > +		if (encls_faulted(ret)) {
> > +			/*
> > +			 * All possible faults should be avoidable:
> > +			 * parameters have been checked, will only change
> > +			 * permissions of a regular page, and no concurrent
> > +			 * SGX1/SGX2 ENCLS instructions since these
> > +			 * are protected with mutex.
> > + */ > > + pr_err_once("EMODPR encountered exception > %d\n", > > + ENCLS_TRAPNR(ret)); > > + ret = -EFAULT; > > + goto out_prot_restore; > > + } > > + if (encls_failed(ret)) { > > + modp->result = ret; > > + ret = -EFAULT; > > + goto out_prot_restore; > > + } > > + > > + ret = sgx_enclave_etrack(encl); > > + if (ret) { > > + ret = -EFAULT; > > + goto out_reclaim; > > + } > > + > > + sgx_mark_page_reclaimable(entry->epc_page); > > + mutex_unlock(&encl->lock); > > + } > > + > > + ret = 0; > > + goto out; > > + > > +out_prot_restore: > > + entry->vm_run_prot_bits = run_prot_restore; > > +out_reclaim: > > + sgx_mark_page_reclaimable(entry->epc_page); > > +out_unlock: > > + mutex_unlock(&encl->lock); > > +out: > > + modp->count = c; > > + > > + return ret; > > +} > > + > > +/** > > + * sgx_ioc_enclave_restrict_perm() - handler for > > + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS > > + * @encl: an enclave pointer > > + * @arg: userspace pointer to a &struct sgx_enclave_restrict_perm > > + * instance > > + * > > + * SGX2 distinguishes between relaxing and restricting the enclave > > +page > > + * permissions maintained by the hardware (EPCM permissions) of pages > > + * belonging to an initialized enclave (after SGX_IOC_ENCLAVE_INIT). > > + * > > + * EPCM permissions cannot be restricted from within the enclave, the > > +enclave > > + * requires the kernel to run the privileged level 0 instructions > > +ENCLS[EMODPR] > > + * and ENCLS[ETRACK]. An attempt to relax EPCM permissions with this > > +call > > + * will be ignored by the hardware. > > + * > > + * Enclave page permissions are not allowed to exceed the maximum > > +vetted > > + * permissions maintained in &struct sgx_encl_page->vm_max_prot_bits. 
> > + * > > + * Return: > > + * - 0: Success > > + * - -errno: Otherwise > > + */ > > +static long sgx_ioc_enclave_restrict_perm(struct sgx_encl *encl, > > + void __user *arg) > > +{ > > + struct sgx_enclave_restrict_perm params; > > + u64 secinfo_perm; > > + long ret; > > + > > + ret = sgx_ioc_sgx2_ready(encl); > > + if (ret) > > + return ret; > > + > > + if (copy_from_user(&params, arg, sizeof(params))) > > + return -EFAULT; > > + > > + if (sgx_validate_offset_length(encl, params.offset, params.length)) > > + return -EINVAL; > > + > > + ret = sgx_perm_from_user_secinfo((void __user *)params.secinfo, > > + &secinfo_perm); > > + if (ret) > > + return ret; > > + > > + if (params.result || params.count) > > + return -EINVAL; > > + > > + ret = sgx_enclave_restrict_perm(encl, &params, secinfo_perm); > > + > > + if (copy_to_user(arg, &params, sizeof(params))) > > + return -EFAULT; > > + > > + return ret; > > +} > > + > > long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long > > arg) { > > struct sgx_encl *encl = filep->private_data; @@ -918,6 +1144,9 @@ > > long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) > > case SGX_IOC_ENCLAVE_RELAX_PERMISSIONS: > > ret = sgx_ioc_enclave_relax_perm(encl, (void __user *)arg); > > break; > > + case SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS: > > + ret = sgx_ioc_enclave_restrict_perm(encl, (void __user > *)arg); > > + break; > > default: > > ret = -ENOIOCTLCMD; > > break; > > -- > > 2.25.1 > > > > Just a suggestion but these might be a bit less cluttered explanations of > the fields: > > /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure > #[repr(C)] > pub struct RelaxPermissions { > /// In: starting page offset > offset: u64, > /// In: length of the address range (multiple of the page size) > length: u64, > /// In: SECINFO containing the relaxed permissions > secinfo: u64, > /// Out: length of the address range successfully changed > count: u64, > }; > > /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter
structure > #[repr(C)] > pub struct RestrictPermissions { > /// In: starting page offset > offset: u64, > /// In: length of the address range (multiple of the page size) > length: u64, > /// In: SECINFO containing the restricted permissions > secinfo: u64, > /// In: ENCLU[EMODPR] return value > result: u64, > /// Out: length of the address range successfully changed > count: u64, > }; > > I can live with the current ones too but I rewrote them so that I can > quickly make sense of the fields later. It's Rust code but the point is > the documentation... > > Also, it should not be too much trouble to use the struct in user space > code even if the struct names are struct sgx_enclave_relax_permissions and > struct sgx_enclave_restrict_permissions, given that you most likely have > exactly single call-site in the run-time. > > Other than that, looks quite good. > > BR, Jarkko
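For reference, the field requirements spelled out in the kernel-doc of the patch can be checked in user space before the ioctl() is issued. The following is only a minimal sketch under stated assumptions: the struct name restrict_perm_params, the ENCL_PAGE_SIZE constant and the helper restrict_params_plausible() are hypothetical and not part of the patch, and the body of sgx_validate_offset_length() is not shown in this thread, so the non-zero length check and the bounds check are guesses at what it enforces. Only the rejection of non-zero @result and @count is taken directly from the quoted handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local mirror of the layout proposed in the patch (hypothetical name).
 * A real program would use the definition from the uapi header instead.
 */
struct restrict_perm_params {
	uint64_t offset;  /* in:  page-aligned offset relative to the enclave base */
	uint64_t length;  /* in:  length of the range, multiple of the page size */
	uint64_t secinfo; /* in:  user address of SECINFO with the new permissions */
	uint64_t result;  /* out: SGX result code of ENCLS[EMODPR] */
	uint64_t count;   /* out: bytes successfully changed */
};

/* Assumption: 4 KiB enclave pages. */
#define ENCL_PAGE_SIZE 4096ULL

/* Return true if the kernel is unlikely to reject the request with -EINVAL. */
static bool restrict_params_plausible(const struct restrict_perm_params *p,
				      uint64_t encl_size)
{
	assert(p != NULL);

	/* Stated in the kernel-doc: page-aligned offset, page-multiple length. */
	if (p->offset % ENCL_PAGE_SIZE || p->length % ENCL_PAGE_SIZE)
		return false;
	/* Assumption: an empty range is rejected. */
	if (p->length == 0)
		return false;
	/* Assumption: the range must lie inside the enclave (overflow-safe form). */
	if (p->offset > encl_size || p->length > encl_size - p->offset)
		return false;
	/* From the quoted handler: both output fields must be zero on entry. */
	if (p->result || p->count)
		return false;
	return true;
}
```

A caller would fill in the struct, run this check, and only then issue SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS. On return, @count reports how many bytes were changed even when the call fails part way through the range.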
Hi Jarkko, On 2/23/2022 7:46 AM, Jarkko Sakkinen wrote: > On Tue, Feb 22, 2022 at 10:35:04AM -0800, Reinette Chatre wrote: >> Hi Jarkko, >> >> On 2/20/2022 4:49 PM, Jarkko Sakkinen wrote: >>> On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote: >> >> ... >> >>>> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h >>>> index 5c678b27bb72..b0ffb80bc67f 100644 >>>> --- a/arch/x86/include/uapi/asm/sgx.h >>>> +++ b/arch/x86/include/uapi/asm/sgx.h >>>> @@ -31,6 +31,8 @@ enum sgx_page_flags { >>>> _IO(SGX_MAGIC, 0x04) >>>> #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \ >>>> _IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm) >>>> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \ >>>> + _IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm) >>>> >>>> /** >>>> * struct sgx_enclave_create - parameter structure for the >>>> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm { >>>> __u64 count; >>>> }; >>>> >>>> +/** >>>> + * struct sgx_enclave_restrict_perm - parameters for ioctl >>>> + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS >>>> + * @offset: starting page offset (page aligned relative to enclave base >>>> + * address defined in SECS) >>>> + * @length: length of memory (multiple of the page size) >>>> + * @secinfo: address for the SECINFO data containing the new permission bits >>>> + * for pages in range described by @offset and @length >>>> + * @result: (output) SGX result code of ENCLS[EMODPR] function >>>> + * @count: (output) bytes successfully changed (multiple of page size) >>>> + */ >>>> +struct sgx_enclave_restrict_perm { >>>> + __u64 offset; >>>> + __u64 length; >>>> + __u64 secinfo; >>>> + __u64 result; >>>> + __u64 count; >>>> +}; >>>> + >>>> struct sgx_enclave_run; >>>> >>>> /** >> >> ... 
>> >>> >>> Just a suggestion but these might be a bit less cluttered explanations of >>> the fields: >>> >>> /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure >>> #[repr(C)] >>> pub struct RelaxPermissions { >>> /// In: starting page offset >>> offset: u64, >>> /// In: length of the address range (multiple of the page size) >>> length: u64, >>> /// In: SECINFO containing the relaxed permissions >>> secinfo: u64, >>> /// Out: length of the address range successfully changed >>> count: u64, >>> }; >>> >>> /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure >>> #[repr(C)] >>> pub struct RestrictPermissions { >>> /// In: starting page offset >>> offset: u64, >>> /// In: length of the address range (multiple of the page size) >>> length: u64, >>> /// In: SECINFO containing the restricted permissions >>> secinfo: u64, >>> /// In: ENCLU[EMODPR] return value >>> result: u64, >>> /// Out: length of the address range successfully changed >>> count: u64, >>> }; >> >> In your proposal you shorten the descriptions from the current implementation. >> I do consider the removed information valuable since I believe that it helps >> users understand the kernel interface requirements without needing to be >> familiar with or dig into the kernel code to understand how the provided data >> is used. >> >> For example, you shorten offset to "starting page offset", but what was removed >> was the requirement that this offset has to be page aligned and what the offset >> is relative to. I do believe summarizing these requirements upfront helps >> a user space developer by not needing to dig through kernel code later >> in order to understand why an -EINVAL was received. >> >> >>> I can live with the current ones too but I rewrote them so that I can >>> quickly make sense of the fields later. It's Rust code but the point is >>> the documentation... >> >> Since you do seem to be ok with the current descriptions I would prefer >> to keep them. > > Yeah, they are fine to me. 
> >>> Also, it should not be too much trouble to use the struct in user space >>> code even if the struct names are struct sgx_enclave_relax_permissions and >>> struct sgx_enclave_restrict_permissions, given that you most likely have >>> exactly single call-site in the run-time. >> >> Are you requesting that I make the following name changes? >> struct sgx_enclave_relax_perm -> struct sgx_enclave_relax_permissions >> struct sgx_enclave_restrict_perm -> struct sgx_enclave_restrict_permissions >> >> If so, do you want the function names also written out in this way? >> sgx_enclave_relax_perm() -> sgx_enclave_relax_permissions() >> sgx_ioc_enclave_relax_perm() -> sgx_ioc_enclave_relax_permissions() >> sgx_enclave_restrict_perm() -> sgx_enclave_restrict_permissions() >> sgx_ioc_enclave_restrict_perm() -> sgx_ioc_enclave_restrict_permissions() > > Yes, unless you have a specific reason to shorten them :-) Just aesthetic reasons ... having a long function name can look unbalanced if it has many parameters and if the parameters themselves are long it becomes hard to keep to the required line length. Even so, it does look as though the longest ones can be made to work within 80 characters: sgx_enclave_restrict_permissions(... struct sgx_enclave_restrict_permissions *modp, ...) Other (aesthetic) consequence would be, for example, the core sgx_ioctl() would now have some branches span more lines than other so it would not look as neat as now (this is subjective I know). Apart from the aesthetic reasons I do not have another reason not to make the change and I will do so in the next version. Reinette
Hi Vijay, On 2/23/2022 11:21 AM, Dhanraj, Vijay wrote: > Hi All, > > Regarding the recent update of splitting the page permissions change request > into two IOCTLS (RELAX and RESTRICT), can we combine them into one? That is, > revert to how it was done in the v1 version? While V1 did have a single ioctl() to handle both relaxing and restricting permissions, it was never possible for the kernel to distinguish what the user intended. For this reason, even though there was a single ioctl() in V1, it implemented permission restriction while supporting permission relaxing as a side effect, since the PTEs are flushed and new PTEs will support the new permissions. A consequence was that the V1 SGX_IOC_PAGE_MODP required ENCLU[EACCEPT] from within the enclave even if it was only intended to be used to relax permissions. SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS in V2 is exactly the same as SGX_IOC_PAGE_MODP of V1. > > Why? Currently in Gramine (a library OS for unmodified applications, > https://gramineproject.io/) with the new proposed change, one needs > to store the page permission for each page or range of pages. And for > every request of `mmap` or `mprotect`, Gramine would have to do a lookup > of the page permissions for the request range and then call the respective > IOCTL either RESTRICT or RELAX. This seems a little overwhelming. Gramine would also need to know when to enter the enclave to run EMODPE, which goes hand in hand with running SGX_IOC_ENCLAVE_RELAX_PERMISSIONS. > > Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do an > `EACCEPT` irrespective of RELAX or RESTRICT page permission request? With this > approach, we can avoid storing page permissions and simplify the implementation. This should be possible with the current implementation, similar to the previous implementation, but it is not optimal if EMODPE followed by SGX_IOC_ENCLAVE_RELAX_PERMISSIONS is all that is needed.
> > I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK` flows to do > TLB shootdowns which might not be needed for RELAX IOCTL but I am not sure what > will be the performance impact. Is there any data point to see the performance impact? It can be worse than just that. EMODPR requires the EPC page to be present and thus the page would need to be loaded from swap and decrypted if it is not present. This may also mean that existing EPC pages need to be swapped out (first blocked, then encrypted to backing storage, then the ETRACK flow followed by IPIs to ensure there are no more references to that page) ... before there is space available for needed page to be loaded and decrypted. That only takes care of the EMODPR ... which as you state needs to be followed by the ETRACK flow and IPIs. The above is also just for the OS portion - after that there is the EACCEPT that needs to be run from within the enclave for every page whether permissions were relaxed or restricted. This would be dependent on the implementation - whether the enclave is entered once per EACCEPT or once for all EACCEPTs. All of the above would be unnecessary if permissions were just relaxed from within the enclave while SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS used to perform the OS actions. The performance impact should be easy to determine: run both ioctl()s and compare how long they take. Since you are asking about Gramine this may be best to do in that environment but I can attempt something on your behalf by using the existing SGX selftest infrastructure. As an experiment I modified the existing "unclobbered_vdso_oversubscribed_remove" test case that currently runs the SGX_IOC_ENCLAVE_MODIFY_TYPE on a large memory region to instead run ioctl()s SGX_IOC_ENCLAVE_RELAX_PERMISSIONS and SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS. In my test I ran these ioctl()s on a 4GB memory range to amplify any performance impact since I was just measuring it by printing timestamps from user space. 
My results showed that:
* Running SGX_IOC_ENCLAVE_RELAX_PERMISSIONS on the 4GB region took less than a second. No EACCEPT is needed from user space.
* Running SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS on the 4GB region took about 20 seconds.
* Running EACCEPT on each enclave page took an additional 20 seconds. (Please note that this uses a suboptimal way of entering the enclave for each EACCEPT; it would be more efficient to enter the enclave once and run EACCEPT for each page without exiting the enclave.)

The performance impact seems significant to me.

Reinette
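The measurement described above amounts to taking a timestamp before and after each operation from user space. A minimal sketch of such a harness follows, assuming a POSIX system with clock_gettime(). The helper names elapsed_ns() and time_op_ns() are hypothetical, and the operation being timed is passed in as a callback so the sketch does not depend on SGX hardware or on the selftest code, which is not shown in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed between two samples of the same clock. */
static int64_t elapsed_ns(struct timespec start, struct timespec end)
{
	return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL +
	       (end.tv_nsec - start.tv_nsec);
}

/*
 * Time a single call of op(ctx) and store its return value in *ret.
 * In a real measurement op would issue one of the permission ioctl()s
 * (and, for the restrict case, a separate op would run the EACCEPT pass).
 */
static int64_t time_op_ns(int (*op)(void *), void *ctx, int *ret)
{
	struct timespec t0, t1;

	assert(op != NULL && ret != NULL);

	/* CLOCK_MONOTONIC is not affected by wall-clock adjustments. */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	*ret = op(ctx);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return elapsed_ns(t0, t1);
}
```

Timing the relax ioctl(), the restrict ioctl() and the EACCEPT pass as three separate callbacks would keep the kernel cost and the enclave cost apart, which is how the three numbers above are broken down.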
On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote: > Hi All, > > Regarding the recent update of splitting the page permissions change > request into two IOCTLS (RELAX and RESTRICT), can we combine them into > one? That is, revert to how it was done in the v1 version? They are logically separate complex functionalities: 1. "restrict" calls EMODPR and requires EACCEPT 2. "relax" increases permissions up to vetted ("EADD") and could be combined with EMODPE called inside enclave. I don't think it is a good idea. BR, Jarkko
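With two separate flows, a runtime such as Gramine has to decide per request which one applies, which is the bookkeeping Vijay is concerned about. A minimal sketch of that decision is shown below, assuming the runtime tracks the current permissions of a page as PROT_* bits. The enum perm_change and the function classify_perm_change() are hypothetical names used only for illustration, and the handling of a request that both adds and removes bits is an assumption, since the thread does not discuss that case.

```c
#include <assert.h>
#include <sys/mman.h>

enum perm_change {
	PERM_UNCHANGED, /* nothing to do */
	PERM_RELAX,     /* only adds bits: EMODPE in the enclave plus the RELAX ioctl */
	PERM_RESTRICT,  /* only removes bits: RESTRICT ioctl (EMODPR) plus EACCEPT */
	PERM_MIXED,     /* adds and removes bits: assumed to need both flows */
};

/* Classify a change from the tracked permissions cur to the requested want. */
static enum perm_change classify_perm_change(int cur, int want)
{
	const int mask = PROT_READ | PROT_WRITE | PROT_EXEC;
	int added, removed;

	/* Only R/W/X bits are meaningful for this decision. */
	assert((cur & ~mask) == 0 && (want & ~mask) == 0);

	added = want & ~cur;   /* bits requested but not currently held */
	removed = cur & ~want; /* bits currently held but not requested */

	if (added && removed)
		return PERM_MIXED;
	if (added)
		return PERM_RELAX;
	if (removed)
		return PERM_RESTRICT;
	return PERM_UNCHANGED;
}
```

The lookup of cur is exactly the per-page state that a combined ioctl() would let the runtime avoid, at the cost of always paying for the EMODPR, ETRACK and EACCEPT flow.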
On Wed, Feb 23, 2022 at 11:55:03AM -0800, Reinette Chatre wrote: > Hi Jarkko, > > On 2/23/2022 7:46 AM, Jarkko Sakkinen wrote: > > On Tue, Feb 22, 2022 at 10:35:04AM -0800, Reinette Chatre wrote: > >> Hi Jarkko, > >> > >> On 2/20/2022 4:49 PM, Jarkko Sakkinen wrote: > >>> On Mon, Feb 07, 2022 at 04:45:38PM -0800, Reinette Chatre wrote: > >> > >> ... > >> > >>>> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h > >>>> index 5c678b27bb72..b0ffb80bc67f 100644 > >>>> --- a/arch/x86/include/uapi/asm/sgx.h > >>>> +++ b/arch/x86/include/uapi/asm/sgx.h > >>>> @@ -31,6 +31,8 @@ enum sgx_page_flags { > >>>> _IO(SGX_MAGIC, 0x04) > >>>> #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \ > >>>> _IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm) > >>>> +#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \ > >>>> + _IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm) > >>>> > >>>> /** > >>>> * struct sgx_enclave_create - parameter structure for the > >>>> @@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm { > >>>> __u64 count; > >>>> }; > >>>> > >>>> +/** > >>>> + * struct sgx_enclave_restrict_perm - parameters for ioctl > >>>> + * %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS > >>>> + * @offset: starting page offset (page aligned relative to enclave base > >>>> + * address defined in SECS) > >>>> + * @length: length of memory (multiple of the page size) > >>>> + * @secinfo: address for the SECINFO data containing the new permission bits > >>>> + * for pages in range described by @offset and @length > >>>> + * @result: (output) SGX result code of ENCLS[EMODPR] function > >>>> + * @count: (output) bytes successfully changed (multiple of page size) > >>>> + */ > >>>> +struct sgx_enclave_restrict_perm { > >>>> + __u64 offset; > >>>> + __u64 length; > >>>> + __u64 secinfo; > >>>> + __u64 result; > >>>> + __u64 count; > >>>> +}; > >>>> + > >>>> struct sgx_enclave_run; > >>>> > >>>> /** > >> > >> ... 
> >> > >>> > >>> Just a suggestion but these might be a bit less cluttered explanations of > >>> the fields: > >>> > >>> /// SGX_IOC_ENCLAVE_RELAX_PERMISSIONS parameter structure > >>> #[repr(C)] > >>> pub struct RelaxPermissions { > >>> /// In: starting page offset > >>> offset: u64, > >>> /// In: length of the address range (multiple of the page size) > >>> length: u64, > >>> /// In: SECINFO containing the relaxed permissions > >>> secinfo: u64, > >>> /// Out: length of the address range successfully changed > >>> count: u64, > >>> }; > >>> > >>> /// SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS parameter structure > >>> #[repr(C)] > >>> pub struct RestrictPermissions { > >>> /// In: starting page offset > >>> offset: u64, > >>> /// In: length of the address range (multiple of the page size) > >>> length: u64, > >>> /// In: SECINFO containing the restricted permissions > >>> secinfo: u64, > >>> /// In: ENCLU[EMODPR] return value > >>> result: u64, > >>> /// Out: length of the address range successfully changed > >>> count: u64, > >>> }; > >> > >> In your proposal you shorten the descriptions from the current implementation. > >> I do consider the removed information valuable since I believe that it helps > >> users understand the kernel interface requirements without needing to be > >> familiar with or dig into the kernel code to understand how the provided data > >> is used. > >> > >> For example, you shorten offset to "starting page offset", but what was removed > >> was the requirement that this offset has to be page aligned and what the offset > >> is relative to. I do believe summarizing these requirements upfront helps > >> a user space developer by not needing to dig through kernel code later > >> in order to understand why an -EINVAL was received. > >> > >> > >>> I can live with the current ones too but I rewrote them so that I can > >>> quickly make sense of the fields later. It's Rust code but the point is > >>> the documentation... 
> >> > >> Since you do seem to be ok with the current descriptions I would prefer > >> to keep them. > > > > Yeah, they are fine to me. > > > >>> Also, it should not be too much trouble to use the struct in user space > >>> code even if the struct names are struct sgx_enclave_relax_permissions and > >>> struct sgx_enclave_restrict_permissions, given that you most likely have > >>> exactly single call-site in the run-time. > >> > >> Are you requesting that I make the following name changes? > >> struct sgx_enclave_relax_perm -> struct sgx_enclave_relax_permissions > >> struct sgx_enclave_restrict_perm -> struct sgx_enclave_restrict_permissions > >> > >> If so, do you want the function names also written out in this way? > >> sgx_enclave_relax_perm() -> sgx_enclave_relax_permissions() > >> sgx_ioc_enclave_relax_perm() -> sgx_ioc_enclave_relax_permissions() > >> sgx_enclave_restrict_perm() -> sgx_enclave_restrict_permissions() > >> sgx_ioc_enclave_restrict_perm() -> sgx_ioc_enclave_restrict_permissions() > > > > Yes, unless you have a specific reason to shorten them :-) > > Just aesthetic reasons ... having a long function name can look unbalanced > if it has many parameters and if the parameters themselves are long it > becomes hard to keep to the required line length. > > Even so, it does look as though the longest ones can be made to work within 80 > characters: > sgx_enclave_restrict_permissions(... > struct sgx_enclave_restrict_permissions *modp, > ...) > > Other (aesthetic) consequence would be, for example, the core sgx_ioctl() would > now have some branches span more lines than other so it would not look as neat as > now (this is subjective I know). > > Apart from the aesthetic reasons I do not have another reason not to make the > change and I will do so in the next version. IMHO, for a single call site, aesthetic concerns about alignment are less important than a no-brainer function name. BR, Jarkko
On Mon, Feb 28, 2022 at 01:25:07PM +0100, Jarkko Sakkinen wrote: > On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote: > > Hi All, > > > > Regarding the recent update of splitting the page permissions change > > request into two IOCTLS (RELAX and RESTRICT), can we combine them into > > one? That is, revert to how it was done in the v1 version? > > They are logically separate complex functionalities: > > 1. "restrict" calls EMODPR and requires EACCEPT > 2. "relax" increases permissions up to vetted ("EADD") and could be > combined with EMODPE called inside enclave. > > I don't think it is a good idea. I.e. in the microarchitecture there is no EMODP but two different flows, and thus it is not sane to act as if there were one with that kind of ioctl. This way it is as granular as the hardware is, and I think that is common sense. It would make as much sense as combining ECREATE/EADD/EINIT into a single multi-function ioctl. User space often needs to have at least some logically distinct flows for these anyway. BR, Jarkko
On 2/28/22 04:24, Jarkko Sakkinen wrote: >> Regarding the recent update of splitting the page permissions change >> request into two IOCTLS (RELAX and RESTRICT), can we combine them into >> one? That is, revert to how it was done in the v1 version? > They are logically separate complex functionalities: > > 1. "restrict" calls EMODPR and requires EACCEPT > 2. "relax" increases permissions up to vetted ("EADD") and could be > combined with EMODPE called inside enclave. It would be great to have a _slightly_ better justification than that. Existing permission interfaces like chmod or mprotect() don't have this asymmetry. I think you're saying that the underlying hardware implementation is asymmetric, so the interface should be too. I don't find that argument very convincing. If the hardware interface is arcane and we can make it look more sane in the ioctl() layer, we should do that, asymmetry or not. If we can't make it any more sane, let's say why the ioctl() must or should be asymmetric. The SGX2 page permission mechanism is horribly counterintuitive. *Everybody* that looks at it thinks that it's wrong. That means that we have a lot of work ahead of us to explain the interfaces that get layered on top.
> On 2/28/22 04:24, Jarkko Sakkinen wrote: > >> Regarding the recent update of splitting the page permissions change > >> request into two IOCTLS (RELAX and RESTRICT), can we combine them > >> into one? That is, revert to how it was done in the v1 version? > > They are logically separate complex functionalities: > > > > 1. "restrict" calls EMODPR and requires EACCEPT 2. "relax" increases > > permissions up to vetted ("EADD") and could be > > combined with EMODPE called inside enclave. > > It would be great to have a _slightly_ better justification than that. > Existing permission interfaces like chmod or mprotect() don't have this > asymmetry. > > I think you're saying that the underlying hardware implementation is > asymmetric, so the interface should be too. I don't find that argument very > convincing. If the hardware interface is arcane and we can make it look more > sane in the ioctl() layer, we should that, asymmetry or not. > Very nice analogy with `mprotect`, and I agree with this. It would be simpler from a user space point of view if we could abstract this and maintain a single interface to relax or restrict permissions. But if the committee feels that having two IOCTLs is the way to go, then we will modify Gramine to adopt this approach.
On Mon, Feb 28, 2022 at 07:16:22AM -0800, Dave Hansen wrote: > On 2/28/22 04:24, Jarkko Sakkinen wrote: > >> Regarding the recent update of splitting the page permissions change > >> request into two IOCTLS (RELAX and RESTRICT), can we combine them into > >> one? That is, revert to how it was done in the v1 version? > > They are logically separate complex functionalities: > > > > 1. "restrict" calls EMODPR and requires EACCEPT > > 2. "relax" increases permissions up to vetted ("EADD") and could be > > combined with EMODPE called inside enclave. > > It would be great to have a _slightly_ better justification than that. > Existing permission interfaces like chmod or mprotect() don't have this > asymmetry. > > I think you're saying that the underlying hardware implementation is > asymmetric, so the interface should be too. I don't find that argument > very convincing. If the hardware interface is arcane and we can make it > look more sane in the ioctl() layer, we should that, asymmetry or not. That is my argument, yes. > If we can't make it any more sane, let's say why the ioctl() must or > should be asymmetric. Perhaps underlining this asymmetry in kdoc would be enough. > The SGX2 page permission mechanism is horribly counter intuitive. > *Everybody* that looks at it thinks that it's wrong. That means that we > have a lot of work ahead of us to explain the interfaces that get > layered on top. I fully agree on this :-) With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but obviously new RX pages are now out of the picture: /* * Adding a regular page that is architecturally allowed to only * be created with RW permissions. * TBD: Interface with user space policy to support max permissions * of RWX.
*/ prot = PROT_READ | PROT_WRITE; encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits; If that TBD is left unresolved in the final version, page augmentation risks becoming an API bottleneck, and that risk can then also materialize in the page permission ioctls. I.e. any review comment is now based on not fully known territory: we have one known unknown, and some unknown unknowns from unpredictable effects on future API changes. BR, Jarkko
On Tue, Mar 01, 2022 at 02:26:48PM +0100, Jarkko Sakkinen wrote: > On Mon, Feb 28, 2022 at 07:16:22AM -0800, Dave Hansen wrote: > > On 2/28/22 04:24, Jarkko Sakkinen wrote: > > >> Regarding the recent update of splitting the page permissions change > > >> request into two IOCTLS (RELAX and RESTRICT), can we combine them into > > >> one? That is, revert to how it was done in the v1 version? > > > They are logically separate complex functionalities: > > > > > > 1. "restrict" calls EMODPR and requires EACCEPT > > > 2. "relax" increases permissions up to vetted ("EADD") and could be > > > combined with EMODPE called inside enclave. > > > > It would be great to have a _slightly_ better justification than that. > > Existing permission interfaces like chmod or mprotect() don't have this > > asymmetry. > > > > I think you're saying that the underlying hardware implementation is > > asymmetric, so the interface should be too. I don't find that argument > > very convincing. If the hardware interface is arcane and we can make it > > look more sane in the ioctl() layer, we should that, asymmetry or not. > > That is my argument, yes. > > > If we can't make it any more sane, let's say why the ioctl() must or > > should be asymmetric. > > Perhaps underling this asymmetry in kdoc would be enough. > > > The SGX2 page permission mechanism is horribly counter intuitive. > > *Everybody* that looks at it thinks that it's wrong. That means that we > > have a lot of work ahead of us to explain the interfaces that get > > layered on top. > > I fully agree on this :-) > > With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of > EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but > obviously new RX pages are now out of the picture: > > > /* > * Adding a regular page that is architecturally allowed to only > * be created with RW permissions. > * TBD: Interface with user space policy to support max permissions > * of RWX. 
> */
> prot = PROT_READ | PROT_WRITE;
> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
>
> If that TBD is left out to the final version the page augmentation has a
> risk of a API bottleneck, and that risk can realize then also in the page
> permission ioctls.
>
> I.e. now any review comment is based on not fully known territory, we have
> one known unknown, and some unknown unknowns from unpredictable effect to
> future API changes.

I think the best way to move forward would be to do EAUGs explicitly with
an ioctl that could also include SECINFO for permissions. Then you can
easily do the rest with EACCEPTCOPY inside the enclave.

Putting EAUG in the #PF handler and calling it implicitly is just too flaky
and hard to make deterministic for e.g. the JIT compiler in our use case (not
to mention that JIT is not possible at all because of the inability to create
RX pages).

BR, Jarkko
Hi Jarkko, On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote: >> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of >> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but >> obviously new RX pages are now out of the picture: >> >> >> /* >> * Adding a regular page that is architecturally allowed to only >> * be created with RW permissions. >> * TBD: Interface with user space policy to support max permissions >> * of RWX. >> */ >> prot = PROT_READ | PROT_WRITE; >> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); >> encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits; >> >> If that TBD is left out to the final version the page augmentation has a >> risk of a API bottleneck, and that risk can realize then also in the page >> permission ioctls. >> >> I.e. now any review comment is based on not fully known territory, we have >> one known unknown, and some unknown unknowns from unpredictable effect to >> future API changes. The plan to complete the "TBD" in the above snippet was to follow this work with user policy integration at this location. On a high level the plan was for this to look something like: /* * Adding a regular page that is architecturally allowed to only * be created with RW permissions. * Interface with user space policy to support max permissions * of RWX. */ prot = PROT_READ | PROT_WRITE; encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); if (user space policy allows RWX on dynamically added pages) encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE | PROT_EXEC, 0); else encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE, 0); The work that follows this series aimed to do the integration with user space policy. > I think the best way to move forward would be to do EAUG's explicitly with > an ioctl that could also include secinfo for permissions. Then you can > easily do the rest with EACCEPTCOPY inside the enclave. 
SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for this purpose. It already includes SECINFO which may also be useful if needing to later support EAUG of PT_SS* pages. How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES after enclave initialization on any memory region within the enclave where pages are planned to be added dynamically. This ioctl() calls EAUG to add the new pages with RW permissions and their vm_max_prot_bits can be set to the permissions found in the included SECINFO. This will support later EACCEPTCOPY as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS The big question is whether communicating user policy after enclave initialization via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would appreciate a confirmation on this direction considering the significant history behind this topic. > Putting EAUG to the #PF handler and implicitly call it just too flakky and > hard to make deterministic for e.g. JIT compiler in our use case (not to > mention that JIT is not possible at all because inability to do RX pages). In this series this is indeed not possible because it lacks the user policy integration. JIT will be possible after user policy integration. Reinette
On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote: > Hi Jarkko, > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote: > >> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of > >> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but > >> obviously new RX pages are now out of the picture: > >> > >> > >> /* > >> * Adding a regular page that is architecturally allowed to only > >> * be created with RW permissions. > >> * TBD: Interface with user space policy to support max permissions > >> * of RWX. > >> */ > >> prot = PROT_READ | PROT_WRITE; > >> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); > >> encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits; > >> > >> If that TBD is left out to the final version the page augmentation has a > >> risk of a API bottleneck, and that risk can realize then also in the page > >> permission ioctls. > >> > >> I.e. now any review comment is based on not fully known territory, we have > >> one known unknown, and some unknown unknowns from unpredictable effect to > >> future API changes. > > The plan to complete the "TBD" in the above snippet was to follow this work > with user policy integration at this location. On a high level the plan was > for this to look something like: > > > /* > * Adding a regular page that is architecturally allowed to only > * be created with RW permissions. > * Interface with user space policy to support max permissions > * of RWX. > */ > prot = PROT_READ | PROT_WRITE; > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); > > if (user space policy allows RWX on dynamically added pages) > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE | PROT_EXEC, 0); > else > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE, 0); > > The work that follows this series aimed to do the integration with user > space policy. What do you mean by "user space policy" anyway exactly? 
I'm sorry but I just don't fully understand this. It's too big of a risk to accept this series without X taken care of. Patch series should neither have TODO nor TBD comments IMHO. I don't want to ack a series based on speculation what might happen in the future. > > I think the best way to move forward would be to do EAUG's explicitly with > > an ioctl that could also include secinfo for permissions. Then you can > > easily do the rest with EACCEPTCOPY inside the enclave. > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for > this purpose. It already includes SECINFO which may also be useful if > needing to later support EAUG of PT_SS* pages. You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day. And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is this weird thing added to the #PF handler? Why is it added at all then? > How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES > after enclave initialization on any memory region within the enclave where > pages are planned to be added dynamically. This ioctl() calls EAUG to add the > new pages with RW permissions and their vm_max_prot_bits can be set to the > permissions found in the included SECINFO. This will support later EACCEPTCOPY > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS I don't like this type of re-use of the existing API. > The big question is whether communicating user policy after enclave initialization > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would > appreciate a confirmation on this direction considering the significant history > behind this topic. I have no idea because I don't know what is user space policy. > > Putting EAUG to the #PF handler and implicitly call it just too flakky and > > hard to make deterministic for e.g. JIT compiler in our use case (not to > > mention that JIT is not possible at all because inability to do RX pages). 
>
> In this series this is indeed not possible because it lacks the user policy
> integration. JIT will be possible after user policy integration.

Like this I don't see how this series can be used in practice.

The majority of practical use cases for EDMM boil down to having a way to add
new executable code (not just Enarx).

> Reinette

BR, Jarkko
On Wed, Mar 02, 2022 at 03:05:25AM +0100, Jarkko Sakkinen wrote:
> > The work that follows this series aimed to do the integration with user
> > space policy.
>
> What do you mean by "user space policy" anyway exactly? I'm sorry but I
> just don't fully understand this.
>
> It's too big of a risk to accept this series without X taken care of. Patch
> series should neither have TODO nor TBD comments IMHO. I don't want to ack
> a series based on speculation what might happen in the future.

If I accept this, then I'm kind of pre-acking code without having any idea
what it looks like, whether it can be acked, or whether I'm doing the right
thing for the kernel by acking it.

It's unfortunately a force majeure situation for me. I simply could not ack
this, whether I want to or not.

BR, Jarkko
On Wed, Mar 02, 2022 at 03:11:06AM +0100, Jarkko Sakkinen wrote:
> On Wed, Mar 02, 2022 at 03:05:25AM +0100, Jarkko Sakkinen wrote:
> > > The work that follows this series aimed to do the integration with user
> > > space policy.
> >
> > What do you mean by "user space policy" anyway exactly? I'm sorry but I
> > just don't fully understand this.
> >
> > It's too big of a risk to accept this series without X taken care of. Patch
> > series should neither have TODO nor TBD comments IMHO. I don't want to ack
> > a series based on speculation what might happen in the future.
>
> If I accept this, then I'm kind of pre-acking code that I have no idea what
> it looks like, can it be acked, or am I doing the right thing for the
> kernel by acking this.
>
> It's unfortunately force majeure situation for me. I simply could not ack
> this, whether I want it or not.

I'd actually like to leave the permission change madness completely out of
this patch set, as we all know it is a crazy beast of microarchitecture. For
user space, having that is less critical than having executable pages.

Simply with EAUG/EACCEPTCOPY you can already populate an enclave with any
permissions you had in mind. Augmenting alone would be a logically consistent
patch set that is actually usable for many workloads.

Now there is half-broken augmenting (this is even written down in the TBD
comment) and complex code for EMODPR and EMODT that is usable only for
kselftests and not much else before there is fully working augmenting.

This way we get an actually sound patch set that is easy to review and apply
to the mainline. It is also many times easier for you to iterate on a smaller
set of patches.

After this it is so much easier to start to look at the remaining
functionality, and at the same time the augmenting part can be stress tested
with real-world code and it will mature quickly.

This whole thing *really* needs a serious U-turn on how it is delivered to
the upstream.
Sometimes it is better just to admit that this didn't start on the right foot.

BR, Jarkko
Hi Jarkko, On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote: > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote: >> Hi Jarkko, >> >> On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote: >>>> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version of >>>> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but >>>> obviously new RX pages are now out of the picture: >>>> >>>> >>>> /* >>>> * Adding a regular page that is architecturally allowed to only >>>> * be created with RW permissions. >>>> * TBD: Interface with user space policy to support max permissions >>>> * of RWX. >>>> */ >>>> prot = PROT_READ | PROT_WRITE; >>>> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); >>>> encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits; >>>> >>>> If that TBD is left out to the final version the page augmentation has a >>>> risk of a API bottleneck, and that risk can realize then also in the page >>>> permission ioctls. >>>> >>>> I.e. now any review comment is based on not fully known territory, we have >>>> one known unknown, and some unknown unknowns from unpredictable effect to >>>> future API changes. >> >> The plan to complete the "TBD" in the above snippet was to follow this work >> with user policy integration at this location. On a high level the plan was >> for this to look something like: >> >> >> /* >> * Adding a regular page that is architecturally allowed to only >> * be created with RW permissions. >> * Interface with user space policy to support max permissions >> * of RWX. >> */ >> prot = PROT_READ | PROT_WRITE; >> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); >> >> if (user space policy allows RWX on dynamically added pages) >> encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE | PROT_EXEC, 0); >> else >> encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | PROT_WRITE, 0); >> >> The work that follows this series aimed to do the integration with user >> space policy. 
> > What do you mean by "user space policy" anyway exactly? I'm sorry but I > just don't fully understand this. My apologies - I just assumed that you would need no reminder about this contentious part of SGX history. Essentially it means that, yes, the kernel could theoretically permit any kind of access to any file/page, but some accesses are known to generally be a bad idea - like making memory executable as well as writable - and thus there are additional checks based on what user space permits before the kernel allows such accesses. For example, mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() User policy and SGX has seen significant discussion. Some notable threads: https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/ https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/ > It's too big of a risk to accept this series without X taken care of. Patch > series should neither have TODO nor TBD comments IMHO. I don't want to ack > a series based on speculation what might happen in the future. ok > >>> I think the best way to move forward would be to do EAUG's explicitly with >>> an ioctl that could also include secinfo for permissions. Then you can >>> easily do the rest with EACCEPTCOPY inside the enclave. >> >> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for >> this purpose. It already includes SECINFO which may also be useful if >> needing to later support EAUG of PT_SS* pages. > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day. I could, yes. > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is this weird > thing added to the #PF handler? Why is it added at all then? I was just speculating in my response, there is no plan to extend SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of). 
>> How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES >> after enclave initialization on any memory region within the enclave where >> pages are planned to be added dynamically. This ioctl() calls EAUG to add the >> new pages with RW permissions and their vm_max_prot_bits can be set to the >> permissions found in the included SECINFO. This will support later EACCEPTCOPY >> as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS > > I don't like this type of re-use of the existing API. I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus after considering the user policy question (above) and performance trade-off (more below). > >> The big question is whether communicating user policy after enclave initialization >> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would >> appreciate a confirmation on this direction considering the significant history >> behind this topic. > > I have no idea because I don't know what is user space policy. This discussion is about some enclave usages needing RWX permissions on dynamically added enclave pages. RWX permissions on dynamically added pages is not something that should blindly be allowed for all SGX enclaves but instead the user needs to explicitly allow specific enclaves to have such ability. This is equivalent to (but not the same as) what exists in Linux today with LSM. As seen in mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able to make files and memory be both writable and executable, but it would only do so for those files and memory that the LSM (which is how user policy is communicated, like SELinux) indicates it is allowed, not blindly do so for all files and all memory. >>> Putting EAUG to the #PF handler and implicitly call it just too flakky and >>> hard to make deterministic for e.g. JIT compiler in our use case (not to >>> mention that JIT is not possible at all because inability to do RX pages). 
I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic but from what I understand it would have a performance impact since it would require all memory that may be needed by the enclave be pre-allocated from outside the enclave and not just dynamically allocated from within the enclave at the time it is needed. Would such a performance impact be acceptable? >> In this series this is indeed not possible because it lacks the user policy >> integration. JIT will be possible after user policy integration. > > Like this I don't what this series can be used in practice. > > Majority of practical use cases for EDMM boil down to having a way to add > new executable code (not just Enarx). > Understood. On 3/1/2022 8:03 PM, Jarkko Sakkinen wrote: > I'd actually to leave out permission change madness completely out of this > patch set, as we all know it is a grazy beast of microarchitecture. For > user space having that is less critical than having executable pages. > > Simply with EAUG/EACCEPTCOPY you can already populate enclave with any > permissions you had in mind. Augmenting alone would be logically consistent > patch set that is actually usable for many workloads. Support for permission changes is required in order to support dynamically added pages (EAUG pages) to be made executable. Yes, you could make a dynamically added page have executable EPCM permissions using EACCEPTCOPY but the kernel is still required to make the PTE executable. > Now there is half-broken augmenting (this is even writtend down to the TBD > comment) and complex code for EMODPR and EMODT that is usable only for > kselftests and not much else before there is fully working augmenting. > > This way we get actually sound patch set that is easy to review and apply > to the mainline. It is also factors easier for you to iterate a smaller > set of patches. 
> > After this it is so much easier to start to look at remaining functionality, > and at the same time augmenting part can be stress tested with real-world > code and it will mature quickly. > > This whole thing *really* needs a serious U-turn on how it is delivered to > the upstream. Sometimes it is better just to admit that this didn't start > with the right foot. As mentioned above, from what I understand the support for (as you state) the "majority of practical use cases" on dynamically added pages do require supporting permission changes also. It thus seems to me that it would help consuming this feature if dynamic addition of pages and permission changes are presented together. The SGX2 functionality that remains after that is the changing of page type, which forms part of the page removal flow. In this regard I also find that presenting the page addition flow at the same time as the page removal flow would make these features easier to consume. I think supporting the addition of pages and leaving page removal to "future work" would be similarly frustrating to consume. Reinette
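The point made above, that an executable EPCM permission obtained via EACCEPTCOPY is not enough because the kernel must still make the PTE executable, comes down to two independent checks that both have to pass. A minimal sketch follows, assuming hypothetical `PERM_*` bits and `model_*` helper names. It models the rule only and is not how the kernel or the hardware encodes it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits used only by this model. */
enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

/*
 * An enclave access succeeds only when both the EPCM entry and the page
 * table entry allow every requested permission.
 */
static bool model_access_ok(unsigned int epcm, unsigned int pte,
			    unsigned int wanted)
{
	assert((wanted & ~(unsigned int)(PERM_R | PERM_W | PERM_X)) == 0);
	return ((epcm & pte) & wanted) == wanted;
}

/*
 * The kernel-side rule from the patch description: requested permissions
 * are acceptable only if they do not exceed the vetted maximum.
 */
static bool model_within_max(unsigned int requested, unsigned int max)
{
	return (requested & ~max) == 0;
}
```

With an RX EPCM entry but an RW PTE, `model_access_ok()` rejects execution, which is why page addition alone does not yield executable pages without some way to relax the PTE permissions.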
Hi all, On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre <reinette.chatre@intel.com> wrote: > Hi Jarkko, > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote: >> On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote: >>> Hi Jarkko, >>> >>> On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote: >>>>> With EACCEPTCOPY (kudos to Mark S. for reminding me of this version >>>>> of >>>>> EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but >>>>> obviously new RX pages are now out of the picture: >>>>> >>>>> >>>>> /* >>>>> * Adding a regular page that is architecturally allowed to only >>>>> * be created with RW permissions. >>>>> * TBD: Interface with user space policy to support max permissions >>>>> * of RWX. >>>>> */ >>>>> prot = PROT_READ | PROT_WRITE; >>>>> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); >>>>> encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits; >>>>> >>>>> If that TBD is left out to the final version the page augmentation >>>>> has a >>>>> risk of a API bottleneck, and that risk can realize then also in the >>>>> page >>>>> permission ioctls. >>>>> >>>>> I.e. now any review comment is based on not fully known territory, >>>>> we have >>>>> one known unknown, and some unknown unknowns from unpredictable >>>>> effect to >>>>> future API changes. >>> >>> The plan to complete the "TBD" in the above snippet was to follow this >>> work >>> with user policy integration at this location. On a high level the >>> plan was >>> for this to look something like: >>> >>> >>> /* >>> * Adding a regular page that is architecturally allowed to only >>> * be created with RW permissions. >>> * Interface with user space policy to support max permissions >>> * of RWX. 
>>> */ >>> prot = PROT_READ | PROT_WRITE; >>> encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); >>> >>> if (user space policy allows RWX on dynamically added pages) >>> encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | >>> PROT_WRITE | PROT_EXEC, 0); >>> else >>> encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | >>> PROT_WRITE, 0); >>> >>> The work that follows this series aimed to do the integration with user >>> space policy. >> >> What do you mean by "user space policy" anyway exactly? I'm sorry but I >> just don't fully understand this. > > My apologies - I just assumed that you would need no reminder about this > contentious > part of SGX history. Essentially it means that, yes, the kernel could > theoretically > permit any kind of access to any file/page, but some accesses are known > to generally > be a bad idea - like making memory executable as well as writable - and > thus there > are additional checks based on what user space permits before the kernel > allows > such accesses. > > For example, > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() > > User policy and SGX has seen significant discussion. Some notable > threads: > https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/ > https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/ > >> It's too big of a risk to accept this series without X taken care of. >> Patch >> series should neither have TODO nor TBD comments IMHO. I don't want to >> ack >> a series based on speculation what might happen in the future. > > ok > >> >>>> I think the best way to move forward would be to do EAUG's explicitly >>>> with >>>> an ioctl that could also include secinfo for permissions. Then you can >>>> easily do the rest with EACCEPTCOPY inside the enclave. >>> >>> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for >>> this purpose. 
It already includes SECINFO which may also be useful if >>> needing to later support EAUG of PT_SS* pages. >> >> You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a >> day. > > I could, yes. > >> And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is this >> weird >> thing added to the #PF handler? Why is it added at all then? > > I was just speculating in my response, there is no plan to extend > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of). > >>> How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES >>> after enclave initialization on any memory region within the enclave >>> where >>> pages are planned to be added dynamically. This ioctl() calls EAUG to >>> add the >>> new pages with RW permissions and their vm_max_prot_bits can be set to >>> the >>> permissions found in the included SECINFO. This will support later >>> EACCEPTCOPY >>> as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS >> >> I don't like this type of re-use of the existing API. > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus > after > considering the user policy question (above) and performance trade-off > (more below). > >> >>> The big question is whether communicating user policy after enclave >>> initialization >>> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? >>> I would >>> appreciate a confirmation on this direction considering the >>> significant history >>> behind this topic. >> >> I have no idea because I don't know what is user space policy. > > This discussion is about some enclave usages needing RWX permissions > on dynamically added enclave pages. RWX permissions on dynamically added > pages is > not something that should blindly be allowed for all SGX enclaves but > instead the user > needs to explicitly allow specific enclaves to have such ability. This > is equivalent > to (but not the same as) what exists in Linux today with LSM. 
As seen in
> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able
> to make
> files and memory be both writable and executable, but it would only do
> so for those
> files and memory that the LSM (which is how user policy is communicated,
> like SELinux)
> indicates it is allowed, not blindly do so for all files and all memory.
>
>>>> Putting EAUG to the #PF handler and implicitly call it just too
>>>> flakky and
>>>> hard to make deterministic for e.g. JIT compiler in our use case (not
>>>> to
>>>> mention that JIT is not possible at all because inability to do RX
>>>> pages).
>
> I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic
> but from
> what I understand it would have a performance impact since it would
> require all memory
> that may be needed by the enclave be pre-allocated from outside the
> enclave and not
> just dynamically allocated from within the enclave at the time it is
> needed.
>
> Would such a performance impact be acceptable?
>

User space won't always have enough info to decide whether the pages should
be EAUG'd immediately. In some cases (shared libraries or a JVM, for example)
lots of code/data pages can be mapped but never actually touched. One
enclave/process does not know whether some other, more important
enclave/process would need the EPC.

It should be up to the kernel to make the final decision, as it has the
overall picture of the system's EPC usage and availability.

User space can provide a hint (similar to MAP_POPULATE) to the kernel that
the mmap'd area will soon be needed, and the kernel should EAUG as soon as it
sees fit based on current system usage. Or the kernel could implement some
policy to avoid the #PF triggered by EACCEPT, for example if the system has a
ton of free EPC relative to the amount requested by mmap at the time.

BR
Haitao
Hi Haitao, On 3/3/2022 8:08 AM, Haitao Huang wrote: > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre <reinette.chatre@intel.com> wrote: >> On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote: >>> On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote: >>>> On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote: ... >>>>> I think the best way to move forward would be to do EAUG's explicitly with >>>>> an ioctl that could also include secinfo for permissions. Then you can >>>>> easily do the rest with EACCEPTCOPY inside the enclave. >>>> >>>> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for >>>> this purpose. It already includes SECINFO which may also be useful if >>>> needing to later support EAUG of PT_SS* pages. >>> >>> You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day. >> >> I could, yes. >> >>> And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is this weird >>> thing added to the #PF handler? Why is it added at all then? >> >> I was just speculating in my response, there is no plan to extend >> SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of). >> >>>> How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES >>>> after enclave initialization on any memory region within the enclave where >>>> pages are planned to be added dynamically. This ioctl() calls EAUG to add the >>>> new pages with RW permissions and their vm_max_prot_bits can be set to the >>>> permissions found in the included SECINFO. This will support later EACCEPTCOPY >>>> as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS >>> >>> I don't like this type of re-use of the existing API. >> >> I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus after >> considering the user policy question (above) and performance trade-off (more below). >> >>> >>>> The big question is whether communicating user policy after enclave initialization >>>> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? 
I would >>>> appreciate a confirmation on this direction considering the significant history >>>> behind this topic. >>> >>> I have no idea because I don't know what is user space policy. >> >> This discussion is about some enclave usages needing RWX permissions >> on dynamically added enclave pages. RWX permissions on dynamically added pages is >> not something that should blindly be allowed for all SGX enclaves but instead the user >> needs to explicitly allow specific enclaves to have such ability. This is equivalent >> to (but not the same as) what exists in Linux today with LSM. As seen in >> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able to make >> files and memory be both writable and executable, but it would only do so for those >> files and memory that the LSM (which is how user policy is communicated, like SELinux) >> indicates it is allowed, not blindly do so for all files and all memory. >> >>>>> Putting EAUG to the #PF handler and implicitly call it just too flakky and >>>>> hard to make deterministic for e.g. JIT compiler in our use case (not to >>>>> mention that JIT is not possible at all because inability to do RX pages). >> >> I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic but from >> what I understand it would have a performance impact since it would require all memory >> that may be needed by the enclave be pre-allocated from outside the enclave and not >> just dynamically allocated from within the enclave at the time it is needed. >> >> Would such a performance impact be acceptable? >> > > User space won't always have enough info to decide whether the pages to be EAUG'd immediately. In some cases (shared libraries, JVM for example) lots of code/data pages can be mapped but never actually touched. One enclave/process does not know if any other more important enclave/process would need the EPC. 
> > It should be for kernel to make the final decision as it has overall picture of the system EPC usage and availability. > > User space can provide a hint (similar to MAP_POPULATE) to kernel that the mmap'd area will soon be needed and kernel should EAUG as soon as it sees fit based on current system usage. Or kernel implement some policy to avoid #PF triggered by EACCEPT, for example, if the system has ton of free EPC relative to the requested by mmap at the time. > mmap(...,...,...,MAP_POPULATE,...,...) would be most fitting and ideal since it would enable user space to indicate that the pages would be needed soon and the kernel can then prefault the pages. This is already desirable in the current implementation to avoid the first page fault on pages added via SGX_IOC_ENCLAVE_ADD_PAGES. Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability then I believe that SGX would benefit. Reinette
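For readers unfamiliar with the flag, this is how MAP_POPULATE is used on an ordinary anonymous mapping, where it is supported. It is a minimal Linux-only sketch; `map_prefaulted()` is an illustrative helper name, and as stated above the same flag does not currently work for SGX VMAs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory and ask the kernel to prefault the
 * pages at mmap() time rather than on first touch.
 * Returns NULL on failure.
 */
static void *map_prefaulted(size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

The flag is a hint about timing only: the mapping behaves the same afterwards, but the first access to each page no longer needs to take a page fault.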
On 3/3/22 13:23, Reinette Chatre wrote:
> Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> then I believe that SGX would benefit.

Some Intel folks asked for this quite a while ago. I think it's
entirely doable: add a new vm_ops->populate() function that will allow
ignoring VM_IO|VM_PFNMAP if present.

Or, if nobody wants to waste all of the vm_ops space, just add an
arch_vma_populate() or something which can call over into SGX.

I'll happily review the patches if anyone can put such a beast together.
On Wed, Mar 02, 2022 at 02:57:45PM -0800, Reinette Chatre wrote: > > What do you mean by "user space policy" anyway exactly? I'm sorry but I > > just don't fully understand this. > > My apologies - I just assumed that you would need no reminder about this contentious > part of SGX history. Essentially it means that, yes, the kernel could theoretically > permit any kind of access to any file/page, but some accesses are known to generally > be a bad idea - like making memory executable as well as writable - and thus there > are additional checks based on what user space permits before the kernel allows > such accesses. The device files are limited by a GID (in systemd upstream), which is a "user policy". What you want to add and why augmentation cannot be made complete before the unknown factor is added to the access control? > >>> I think the best way to move forward would be to do EAUG's explicitly with > >>> an ioctl that could also include secinfo for permissions. Then you can > >>> easily do the rest with EACCEPTCOPY inside the enclave. > >> > >> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for > >> this purpose. It already includes SECINFO which may also be useful if > >> needing to later support EAUG of PT_SS* pages. > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day. > > I could, yes. And this enables EACCEPTCOPY pattern nicely. E.g. you can implement mmap() with EAUG and then EACCEPTCOPY feeded with permissions and a zero page: 1. enclave calls back to host to do mmap() 2. host does eaug on given range and enter back to enclave. 3. enclave does eacceptcopy with given permissions and a zero page. > > I don't like this type of re-use of the existing API. > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus after > considering the user policy question (above) and performance trade-off (more below). Ok. 
If adding this would be a bottleneck it would already be present in
"add pages", so whatever limitation there might be, it already exists.
Thus, logically, that could be safely added without worrying about user
policies all that much...

>
> >
> >> The big question is whether communicating user policy after enclave initialization
> >> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would
> >> appreciate a confirmation on this direction considering the significant history
> >> behind this topic.
> >
> > I have no idea because I don't know what is user space policy.
>
> This discussion is about some enclave usages needing RWX permissions
> on dynamically added enclave pages. RWX permissions on dynamically added pages is

I'm not sure if that is actually necessary, if you use EAUG-EACCEPTCOPY
type of pattern. Please correct if I'm wrong.

> not something that should blindly be allowed for all SGX enclaves but instead the user
> needs to explicitly allow specific enclaves to have such ability. This is equivalent
> to (but not the same as) what exists in Linux today with LSM. As seen in
> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able to make
> files and memory be both writable and executable, but it would only do so for those
> files and memory that the LSM (which is how user policy is communicated, like SELinux)
> indicates it is allowed, not blindly do so for all files and all memory.

We could also potentially make LSM hooks to ioctls, if that is ever needed.

And as I said earlier, EAUG ioctl does not make things any worse than they
might be.

> >>> Putting EAUG to the #PF handler and implicitly call it just too flakky and
> >>> hard to make deterministic for e.g. JIT compiler in our use case (not to
> >>> mention that JIT is not possible at all because inability to do RX pages).
>
> I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic but from
> what I understand it would have a performance impact since it would require all memory
> that may be needed by the enclave be pre-allocated from outside the enclave and not
> just dynamically allocated from within the enclave at the time it is needed.
>
> Would such a performance impact be acceptable?

IMHO yes, because a badly behaving enclave can cause the same issue anyway,
and in a less deterministic manner.

BR, Jarkko
On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote: > Hi all, > > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre > <reinette.chatre@intel.com> wrote: > > > Hi Jarkko, > > > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote: > > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote: > > > > Hi Jarkko, > > > > > > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote: > > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of > > > > > > this version of > > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX pages but > > > > > > obviously new RX pages are now out of the picture: > > > > > > > > > > > > > > > > > > /* > > > > > > * Adding a regular page that is architecturally allowed to only > > > > > > * be created with RW permissions. > > > > > > * TBD: Interface with user space policy to support max permissions > > > > > > * of RWX. > > > > > > */ > > > > > > prot = PROT_READ | PROT_WRITE; > > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); > > > > > > encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits; > > > > > > > > > > > > If that TBD is left out to the final version the page > > > > > > augmentation has a > > > > > > risk of a API bottleneck, and that risk can realize then > > > > > > also in the page > > > > > > permission ioctls. > > > > > > > > > > > > I.e. now any review comment is based on not fully known > > > > > > territory, we have > > > > > > one known unknown, and some unknown unknowns from > > > > > > unpredictable effect to > > > > > > future API changes. > > > > > > > > The plan to complete the "TBD" in the above snippet was to > > > > follow this work > > > > with user policy integration at this location. On a high level > > > > the plan was > > > > for this to look something like: > > > > > > > > > > > > /* > > > > * Adding a regular page that is architecturally allowed to only > > > > * be created with RW permissions. 
> > > > * Interface with user space policy to support max permissions > > > > * of RWX. > > > > */ > > > > prot = PROT_READ | PROT_WRITE; > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); > > > > > > > > if (user space policy allows RWX on dynamically added pages) > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | > > > > PROT_WRITE | PROT_EXEC, 0); > > > > else > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | > > > > PROT_WRITE, 0); > > > > > > > > The work that follows this series aimed to do the integration with user > > > > space policy. > > > > > > What do you mean by "user space policy" anyway exactly? I'm sorry but I > > > just don't fully understand this. > > > > My apologies - I just assumed that you would need no reminder about this > > contentious > > part of SGX history. Essentially it means that, yes, the kernel could > > theoretically > > permit any kind of access to any file/page, but some accesses are known > > to generally > > be a bad idea - like making memory executable as well as writable - and > > thus there > > are additional checks based on what user space permits before the kernel > > allows > > such accesses. > > > > For example, > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() > > > > User policy and SGX has seen significant discussion. Some notable > > threads: > > https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/ > > https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/ > > > > > It's too big of a risk to accept this series without X taken care > > > of. Patch > > > series should neither have TODO nor TBD comments IMHO. I don't want > > > to ack > > > a series based on speculation what might happen in the future. 
> > > > ok > > > > > > > > > > I think the best way to move forward would be to do EAUG's > > > > > explicitly with > > > > > an ioctl that could also include secinfo for permissions. Then you can > > > > > easily do the rest with EACCEPTCOPY inside the enclave. > > > > > > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for > > > > this purpose. It already includes SECINFO which may also be useful if > > > > needing to later support EAUG of PT_SS* pages. > > > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it > > > a day. > > > > I could, yes. > > > > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is > > > this weird > > > thing added to the #PF handler? Why is it added at all then? > > > > I was just speculating in my response, there is no plan to extend > > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of). > > > > > > How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES > > > > after enclave initialization on any memory region within the > > > > enclave where > > > > pages are planned to be added dynamically. This ioctl() calls > > > > EAUG to add the > > > > new pages with RW permissions and their vm_max_prot_bits can be > > > > set to the > > > > permissions found in the included SECINFO. This will support > > > > later EACCEPTCOPY > > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS > > > > > > I don't like this type of re-use of the existing API. > > > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus > > after > > considering the user policy question (above) and performance trade-off > > (more below). > > > > > > > > > The big question is whether communicating user policy after > > > > enclave initialization > > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable > > > > to all? I would > > > > appreciate a confirmation on this direction considering the > > > > significant history > > > > behind this topic. 
> > > > > > I have no idea because I don't know what is user space policy. > > > > This discussion is about some enclave usages needing RWX permissions > > on dynamically added enclave pages. RWX permissions on dynamically added > > pages is > > not something that should blindly be allowed for all SGX enclaves but > > instead the user > > needs to explicitly allow specific enclaves to have such ability. This > > is equivalent > > to (but not the same as) what exists in Linux today with LSM. As seen in > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able > > to make > > files and memory be both writable and executable, but it would only do > > so for those > > files and memory that the LSM (which is how user policy is communicated, > > like SELinux) > > indicates it is allowed, not blindly do so for all files and all memory. > > > > > > > Putting EAUG to the #PF handler and implicitly call it just > > > > > too flakky and > > > > > hard to make deterministic for e.g. JIT compiler in our use > > > > > case (not to > > > > > mention that JIT is not possible at all because inability to > > > > > do RX pages). > > > > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic > > but from > > what I understand it would have a performance impact since it would > > require all memory > > that may be needed by the enclave be pre-allocated from outside the > > enclave and not > > just dynamically allocated from within the enclave at the time it is > > needed. > > > > Would such a performance impact be acceptable? > > > > User space won't always have enough info to decide whether the pages to be > EAUG'd immediately. In some cases (shared libraries, JVM for example) lots > of code/data pages can be mapped but never actually touched. One > enclave/process does not know if any other more important enclave/process > would need the EPC. 
>
> It should be for kernel to make the final decision as it has overall picture
> of the system EPC usage and availability.

EAUG ioctl does not give better capabilities for user space to waste
EPC given that EADD ioctl already exists, i.e. your argument is logically
incorrect.

BR, Jarkko
Hi Jarkko, On 3/3/2022 3:12 PM, Jarkko Sakkinen wrote: > On Wed, Mar 02, 2022 at 02:57:45PM -0800, Reinette Chatre wrote: >>> What do you mean by "user space policy" anyway exactly? I'm sorry but I >>> just don't fully understand this. >> >> My apologies - I just assumed that you would need no reminder about this contentious >> part of SGX history. Essentially it means that, yes, the kernel could theoretically >> permit any kind of access to any file/page, but some accesses are known to generally >> be a bad idea - like making memory executable as well as writable - and thus there >> are additional checks based on what user space permits before the kernel allows >> such accesses. > > The device files are limited by a GID (in systemd upstream), which is a > "user policy". > > What you want to add and why augmentation cannot be made complete before > the unknown factor is added to the access control? After studying this part of SGX history I learned that unfortunately none of the existing user policy controls have been found to be a perfect fit for enclaves. Current user policy type permissions are associated with files and processes and enclaves have properties of both. One process can execute multiple enclaves and only one/some of those enclaves may require to execute dirty pages. Associating a permission to execute dirty pages with the process, and thus giving that ability to all of its enclaves, is not ideal. Similarly, the file /dev/sgx_enclave can represent multiple enclaves used by multiple processes and a file permission is similarly too broad. What I was planning to propose and discuss after the SGX2 core enabling was an ability for user space to uniquely identify enclaves that require the ability to execute dirty pages. This identification can be specified by using enclave properties like MRENCLAVE and MRSIGNER. Executing dirty pages would only be allowed for these specific enclaves identified to require this ability. 
A solution like this is possible using the kernel's keys subsystem by introducing a new "enclave_execdirty" key that contains these properties. I have this working as a PoC. Perhaps the SGX_IOC_ENCLAVE_AUGMENT_PAGES what you propose can also be seen as a solution to support user space policy ... instead that it is more fine grained in that it is used to identify specific memory ranges within specific enclaves that are allowed to execute dirty pages. What do you think? >>>>> I think the best way to move forward would be to do EAUG's explicitly with >>>>> an ioctl that could also include secinfo for permissions. Then you can >>>>> easily do the rest with EACCEPTCOPY inside the enclave. >>>> >>>> SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be used for >>>> this purpose. It already includes SECINFO which may also be useful if >>>> needing to later support EAUG of PT_SS* pages. >>> >>> You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it a day. >> >> I could, yes. > > And this enables EACCEPTCOPY pattern nicely. > > E.g. you can implement mmap() with EAUG and then EACCEPTCOPY feeded with > permissions and a zero page: > > 1. enclave calls back to host to do mmap() > 2. host does eaug on given range and enter back to enclave. > 3. enclave does eacceptcopy with given permissions and a zero page. > >>> I don't like this type of re-use of the existing API. >> >> I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is consensus after >> considering the user policy question (above) and performance trade-off (more below). > > Ok. > > If adding this would be a bottleneck it would be already persistent int > "add pages", so whatever limitation there might be, it already exist. Currently this checking is built in as part of "add pages", for example, user space is prevented from circumventing existing protections on the source pages with the "vma->vm_flags & VM_MAYEXEC" check in __sgx_encl_add_page(). 
Further, there is trust here in that the pages added before enclave initialization are accompanied by their secinfo with the permissions of the pages and those values are included in the measurement (MRENCLAVE) of the final enclave. The maximum permissions any enclave page specified during "add pages" may have is "locked down" during this time. Permissions of EAUG pages are not included in the MRENCLAVE of the enclave and there is no backing memory that can be referenced to learn what is already allowed. It is possible that some of the code dynamically loaded into the enclave could indeed be buggy or malicious so effort should be made to only allow executing of dirty pages to those enclaves specified to require the ability. > Thus, logically, that could be safely added without worrying about user > policies all that much... > >> >>> >>>> The big question is whether communicating user policy after enclave initialization >>>> via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable to all? I would >>>> appreciate a confirmation on this direction considering the significant history >>>> behind this topic. >>> >>> I have no idea because I don't know what is user space policy. >> >> This discussion is about some enclave usages needing RWX permissions >> on dynamically added enclave pages. RWX permissions on dynamically added pages is > > I'm not sure if that is actually necessary, if you use EAUG-EACCEPTCOPY > type of pattern. Please correct if I'm wrong. This only takes EPCM permissions into account. The issue comes in when the kernel needs to determine whether it should allow the PTEs pointing to these pages to be executable. To elaborate your example, to use dynamically added RWX pages EAUG->EACCEPTCOPY->SGX_IOC_ENCLAVE_RELAX_PERMISSIONS is required and SGX_IOC_ENCLAVE_RELAX_PERMISSIONS will only allow PTEs that are allowed. 
In the driver sgx_encl_page->vm_max_prot_bits dictates what permissions are allowed and SGX_IOC_ENCLAVE_RELAX_PERMISSIONS will return EPERM if an attempt is made to relax permissions beyond that. When considering the user space policy integration, sgx_encl_page->vm_max_prot_bits will be initialized to reflect allowed permissions, RWX if the enclave is so allowed, in this way EAUG pages can be made executable using SGX_IOC_ENCLAVE_RELAX_PERMISSIONS. >> not something that should blindly be allowed for all SGX enclaves but instead the user >> needs to explicitly allow specific enclaves to have such ability. This is equivalent >> to (but not the same as) what exists in Linux today with LSM. As seen in >> mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is able to make >> files and memory be both writable and executable, but it would only do so for those >> files and memory that the LSM (which is how user policy is communicated, like SELinux) >> indicates it is allowed, not blindly do so for all files and all memory. > > We could also potentially make LSM hooks to ioctls, if that is ever needed. Could you please elaborate? > > And as I said earlier, EAUG ioctl does not make things any worse they might > be. I hope my earlier comments noting the differences with adding pages shine some light here. > >>>>> Putting EAUG to the #PF handler and implicitly call it just too flakky and >>>>> hard to make deterministic for e.g. JIT compiler in our use case (not to >>>>> mention that JIT is not possible at all because inability to do RX pages). >> >> I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more deterministic but from >> what I understand it would have a performance impact since it would require all memory >> that may be needed by the enclave be pre-allocated from outside the enclave and not >> just dynamically allocated from within the enclave at the time it is needed. >> >> Would such a performance impact be acceptable? 
>
> IMHO yes because bad behaving enclave can cause the same issue anyway,
> and more indeterministic manner.

With EAUG pages supported in the page fault handler it is possible to support
both usages. Especially now that Dave provided guidance on how to support
MAP_POPULATE. As I understand, when MAP_POPULATE is supported a usage needing
deterministic behavior can pre-fault all the EAUG pages while those usages
mapping a lot of memory that mostly will go unused are also supported.

Reinette
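The permission rule described in the message above can be modelled in a few lines of user-space C. This is a sketch only, not the driver code: the `model_*` names and `MODEL_*` flags are hypothetical, `max_prot` stands in for sgx_encl_page->vm_max_prot_bits, and the `policy_allows_rwx` argument stands in for whatever user space policy mechanism is eventually chosen.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PROT_READ, PROT_WRITE and PROT_EXEC. */
#define MODEL_READ  0x1u
#define MODEL_WRITE 0x2u
#define MODEL_EXEC  0x4u

/* Simplified model of the per-page permission state. */
struct model_encl_page {
	unsigned int run_prot;	/* permissions currently in effect */
	unsigned int max_prot;	/* upper bound that may ever be granted */
};

/*
 * Model of adding a page dynamically: it starts as RW, and the maximum
 * is RWX only if the (assumed) policy allows it for this enclave.
 */
static void model_eaug_page(struct model_encl_page *page,
			    bool policy_allows_rwx)
{
	assert(page);
	page->run_prot = MODEL_READ | MODEL_WRITE;
	page->max_prot = page->run_prot;
	if (policy_allows_rwx)
		page->max_prot |= MODEL_EXEC;
}

/*
 * Model of a relax request: refuse with -EPERM if any requested bit
 * lies outside the maximum, otherwise apply the request.
 */
static int model_relax_permissions(struct model_encl_page *page,
				   unsigned int requested)
{
	assert(page);
	if (requested & ~page->max_prot)
		return -EPERM;
	page->run_prot = requested;
	return 0;
}
```

In this model a request for RWX on a page whose maximum is RW fails with -EPERM and leaves the page unchanged, which mirrors the behaviour described for SGX_IOC_ENCLAVE_RELAX_PERMISSIONS.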
On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote: >> Hi all, >> >> On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre >> <reinette.chatre@intel.com> wrote: >> >> > Hi Jarkko, >> > >> > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote: >> > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote: >> > > > Hi Jarkko, >> > > > >> > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote: >> > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of >> > > > > > this version of >> > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX >> pages but >> > > > > > obviously new RX pages are now out of the picture: >> > > > > > >> > > > > > >> > > > > > /* >> > > > > > * Adding a regular page that is architecturally allowed to >> only >> > > > > > * be created with RW permissions. >> > > > > > * TBD: Interface with user space policy to support max >> permissions >> > > > > > * of RWX. >> > > > > > */ >> > > > > > prot = PROT_READ | PROT_WRITE; >> > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); >> > > > > > encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits; >> > > > > > >> > > > > > If that TBD is left out to the final version the page >> > > > > > augmentation has a >> > > > > > risk of a API bottleneck, and that risk can realize then >> > > > > > also in the page >> > > > > > permission ioctls. >> > > > > > >> > > > > > I.e. now any review comment is based on not fully known >> > > > > > territory, we have >> > > > > > one known unknown, and some unknown unknowns from >> > > > > > unpredictable effect to >> > > > > > future API changes. >> > > > >> > > > The plan to complete the "TBD" in the above snippet was to >> > > > follow this work >> > > > with user policy integration at this location. 
On a high level >> > > > the plan was >> > > > for this to look something like: >> > > > >> > > > >> > > > /* >> > > > * Adding a regular page that is architecturally allowed to only >> > > > * be created with RW permissions. >> > > > * Interface with user space policy to support max permissions >> > > > * of RWX. >> > > > */ >> > > > prot = PROT_READ | PROT_WRITE; >> > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); >> > > > >> > > > if (user space policy allows RWX on dynamically added >> pages) >> > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | >> > > > PROT_WRITE | PROT_EXEC, 0); >> > > > else >> > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | >> > > > PROT_WRITE, 0); >> > > > >> > > > The work that follows this series aimed to do the integration >> with user >> > > > space policy. >> > > >> > > What do you mean by "user space policy" anyway exactly? I'm sorry >> but I >> > > just don't fully understand this. >> > >> > My apologies - I just assumed that you would need no reminder about >> this >> > contentious >> > part of SGX history. Essentially it means that, yes, the kernel could >> > theoretically >> > permit any kind of access to any file/page, but some accesses are >> known >> > to generally >> > be a bad idea - like making memory executable as well as writable - >> and >> > thus there >> > are additional checks based on what user space permits before the >> kernel >> > allows >> > such accesses. >> > >> > For example, >> > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() >> > >> > User policy and SGX has seen significant discussion. Some notable >> > threads: >> > >> https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/ >> > >> https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/ >> > >> > > It's too big of a risk to accept this series without X taken care >> > > of. 
Patch >> > > series should neither have TODO nor TBD comments IMHO. I don't want >> > > to ack >> > > a series based on speculation what might happen in the future. >> > >> > ok >> > >> > > >> > > > > I think the best way to move forward would be to do EAUG's >> > > > > explicitly with >> > > > > an ioctl that could also include secinfo for permissions. Then >> you can >> > > > > easily do the rest with EACCEPTCOPY inside the enclave. >> > > > >> > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be >> used for >> > > > this purpose. It already includes SECINFO which may also be >> useful if >> > > > needing to later support EAUG of PT_SS* pages. >> > > >> > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it >> > > a day. >> > >> > I could, yes. >> > >> > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is >> > > this weird >> > > thing added to the #PF handler? Why is it added at all then? >> > >> > I was just speculating in my response, there is no plan to extend >> > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of). >> > >> > > > How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES >> > > > after enclave initialization on any memory region within the >> > > > enclave where >> > > > pages are planned to be added dynamically. This ioctl() calls >> > > > EAUG to add the >> > > > new pages with RW permissions and their vm_max_prot_bits can be >> > > > set to the >> > > > permissions found in the included SECINFO. This will support >> > > > later EACCEPTCOPY >> > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS >> > > >> > > I don't like this type of re-use of the existing API. >> > >> > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is >> consensus >> > after >> > considering the user policy question (above) and performance trade-off >> > (more below). 
>> > >> > > >> > > > The big question is whether communicating user policy after >> > > > enclave initialization >> > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable >> > > > to all? I would >> > > > appreciate a confirmation on this direction considering the >> > > > significant history >> > > > behind this topic. >> > > >> > > I have no idea because I don't know what is user space policy. >> > >> > This discussion is about some enclave usages needing RWX permissions >> > on dynamically added enclave pages. RWX permissions on dynamically >> added >> > pages is >> > not something that should blindly be allowed for all SGX enclaves but >> > instead the user >> > needs to explicitly allow specific enclaves to have such ability. This >> > is equivalent >> > to (but not the same as) what exists in Linux today with LSM. As seen >> in >> > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux is >> able >> > to make >> > files and memory be both writable and executable, but it would only do >> > so for those >> > files and memory that the LSM (which is how user policy is >> communicated, >> > like SELinux) >> > indicates it is allowed, not blindly do so for all files and all >> memory. >> > >> > > > > Putting EAUG to the #PF handler and implicitly call it just >> > > > > too flakky and >> > > > > hard to make deterministic for e.g. JIT compiler in our use >> > > > > case (not to >> > > > > mention that JIT is not possible at all because inability to >> > > > > do RX pages). >> > >> > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more >> deterministic >> > but from >> > what I understand it would have a performance impact since it would >> > require all memory >> > that may be needed by the enclave be pre-allocated from outside the >> > enclave and not >> > just dynamically allocated from within the enclave at the time it is >> > needed. >> > >> > Would such a performance impact be acceptable? 
>> >
>>
>> User space won't always have enough info to decide whether the pages to be
>> EAUG'd immediately. In some cases (shared libraries, JVM for example) lots
>> of code/data pages can be mapped but never actually touched. One
>> enclave/process does not know if any other more important enclave/process
>> would need the EPC.
>>
>> It should be for kernel to make the final decision as it has overall picture
>> of the system EPC usage and availability.
>
> EAUG ioctl does not give better capabilities for user space to waste
> EPC given that EADD ioctl already exists, i.e. your argument is logically
> incorrect.

The point of adding EAUG is to allow more efficient use of EPC pages.
Without EAUG, enclaves have to EADD everything upfront into EPC, consuming
a predetermined number of EPC pages, some of which may not be used at all.
With EAUG, enclaves should be able to load minimal pages to get started,
with pages added on #PF as they are actually accessed.

Obviously, as you pointed out, some usages make more sense to pre-EAUG (EAUG
before #PF). But your proposal of supporting only pre-EAUG here essentially
makes EAUG behave almost the same as EADD. If the current implementation
with EAUG on #PF can also use MAP_POPULATE for pre-EAUG (seems possible
based on Dave's comments), then it is flexible enough to cover all cases and
allows the kernel to optimize allocation of EPC pages.

Thanks
Haitao
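The trade-off argued above can be illustrated with a toy page-count model. This is a sketch under loud assumptions: the `model_*` functions are hypothetical, page indices stand in for enclave page addresses, and reclaim, swapping and EPC pressure from other enclaves are ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Pages consumed when every mapped page is added up front. */
static size_t model_pages_upfront(size_t mapped_pages)
{
	return mapped_pages;
}

/*
 * Pages consumed when a page is added only on its first access.
 * `accesses` holds page indices in the order they are touched.
 * Returns (size_t)-1 if the bookkeeping array cannot be allocated.
 */
static size_t model_pages_on_demand(const size_t *accesses, size_t n,
				    size_t mapped_pages)
{
	bool *present = calloc(mapped_pages ? mapped_pages : 1,
			       sizeof(*present));
	size_t used = 0;

	if (!present)
		return (size_t)-1;

	for (size_t i = 0; i < n; i++) {
		size_t idx = accesses[i];

		assert(idx < mapped_pages);
		if (!present[idx]) {
			/* First touch: this is where a fault would add the page. */
			present[idx] = true;
			used++;
		}
	}

	free(present);
	return used;
}
```

With 16 pages mapped and only pages 0, 3 and 7 ever touched, the on-demand count is 3 while the upfront count is 16. The model says nothing about the determinism concern raised earlier in the thread, which is the other side of the trade-off.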
On Thu, Mar 03, 2022 at 10:03:30PM -0600, Haitao Huang wrote: > > On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen <jarkko@kernel.org> > wrote: > > > On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote: > > > Hi all, > > > > > > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre > > > <reinette.chatre@intel.com> wrote: > > > > > > > Hi Jarkko, > > > > > > > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote: > > > > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote: > > > > > > Hi Jarkko, > > > > > > > > > > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote: > > > > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of > > > > > > > > this version of > > > > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX > > > pages but > > > > > > > > obviously new RX pages are now out of the picture: > > > > > > > > > > > > > > > > > > > > > > > > /* > > > > > > > > * Adding a regular page that is architecturally allowed > > > to only > > > > > > > > * be created with RW permissions. > > > > > > > > * TBD: Interface with user space policy to support max > > > permissions > > > > > > > > * of RWX. > > > > > > > > */ > > > > > > > > prot = PROT_READ | PROT_WRITE; > > > > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); > > > > > > > > encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits; > > > > > > > > > > > > > > > > If that TBD is left out to the final version the page > > > > > > > > augmentation has a > > > > > > > > risk of a API bottleneck, and that risk can realize then > > > > > > > > also in the page > > > > > > > > permission ioctls. > > > > > > > > > > > > > > > > I.e. now any review comment is based on not fully known > > > > > > > > territory, we have > > > > > > > > one known unknown, and some unknown unknowns from > > > > > > > > unpredictable effect to > > > > > > > > future API changes. 
> > > > > > > > > > > > The plan to complete the "TBD" in the above snippet was to > > > > > > follow this work > > > > > > with user policy integration at this location. On a high level > > > > > > the plan was > > > > > > for this to look something like: > > > > > > > > > > > > > > > > > > /* > > > > > > * Adding a regular page that is architecturally allowed to only > > > > > > * be created with RW permissions. > > > > > > * Interface with user space policy to support max permissions > > > > > > * of RWX. > > > > > > */ > > > > > > prot = PROT_READ | PROT_WRITE; > > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); > > > > > > > > > > > > if (user space policy allows RWX on dynamically added > > > pages) > > > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | > > > > > > PROT_WRITE | PROT_EXEC, 0); > > > > > > else > > > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | > > > > > > PROT_WRITE, 0); > > > > > > > > > > > > The work that follows this series aimed to do the integration > > > with user > > > > > > space policy. > > > > > > > > > > What do you mean by "user space policy" anyway exactly? I'm > > > sorry but I > > > > > just don't fully understand this. > > > > > > > > My apologies - I just assumed that you would need no reminder > > > about this > > > > contentious > > > > part of SGX history. Essentially it means that, yes, the kernel could > > > > theoretically > > > > permit any kind of access to any file/page, but some accesses are > > > known > > > > to generally > > > > be a bad idea - like making memory executable as well as writable > > > - and > > > > thus there > > > > are additional checks based on what user space permits before the > > > kernel > > > > allows > > > > such accesses. > > > > > > > > For example, > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() > > > > > > > > User policy and SGX has seen significant discussion. 
Some notable > > > > threads: > > > > https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/ > > > > https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/ > > > > > > > > > It's too big of a risk to accept this series without X taken care > > > > > of. Patch > > > > > series should neither have TODO nor TBD comments IMHO. I don't want > > > > > to ack > > > > > a series based on speculation what might happen in the future. > > > > > > > > ok > > > > > > > > > > > > > > > > I think the best way to move forward would be to do EAUG's > > > > > > > explicitly with > > > > > > > an ioctl that could also include secinfo for permissions. > > > Then you can > > > > > > > easily do the rest with EACCEPTCOPY inside the enclave. > > > > > > > > > > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be > > > used for > > > > > > this purpose. It already includes SECINFO which may also be > > > useful if > > > > > > needing to later support EAUG of PT_SS* pages. > > > > > > > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and call it > > > > > a day. > > > > > > > > I could, yes. > > > > > > > > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is > > > > > this weird > > > > > thing added to the #PF handler? Why is it added at all then? > > > > > > > > I was just speculating in my response, there is no plan to extend > > > > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of). > > > > > > > > > > How this could work is user space calls SGX_IOC_ENCLAVE_ADD_PAGES > > > > > > after enclave initialization on any memory region within the > > > > > > enclave where > > > > > > pages are planned to be added dynamically. This ioctl() calls > > > > > > EAUG to add the > > > > > > new pages with RW permissions and their vm_max_prot_bits can be > > > > > > set to the > > > > > > permissions found in the included SECINFO. 
This will support > > > > > > later EACCEPTCOPY > > > > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS > > > > > > > > > > I don't like this type of re-use of the existing API. > > > > > > > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is > > > consensus > > > > after > > > > considering the user policy question (above) and performance trade-off > > > > (more below). > > > > > > > > > > > > > > > The big question is whether communicating user policy after > > > > > > enclave initialization > > > > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable > > > > > > to all? I would > > > > > > appreciate a confirmation on this direction considering the > > > > > > significant history > > > > > > behind this topic. > > > > > > > > > > I have no idea because I don't know what is user space policy. > > > > > > > > This discussion is about some enclave usages needing RWX permissions > > > > on dynamically added enclave pages. RWX permissions on dynamically > > > added > > > > pages is > > > > not something that should blindly be allowed for all SGX enclaves but > > > > instead the user > > > > needs to explicitly allow specific enclaves to have such ability. This > > > > is equivalent > > > > to (but not the same as) what exists in Linux today with LSM. As > > > seen in > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux > > > is able > > > > to make > > > > files and memory be both writable and executable, but it would only do > > > > so for those > > > > files and memory that the LSM (which is how user policy is > > > communicated, > > > > like SELinux) > > > > indicates it is allowed, not blindly do so for all files and all > > > memory. > > > > > > > > > > > Putting EAUG to the #PF handler and implicitly call it just > > > > > > > too flakky and > > > > > > > hard to make deterministic for e.g. 
JIT compiler in our use > > > > > > > case (not to > > > > > > > mention that JIT is not possible at all because inability to > > > > > > > do RX pages). > > > > > > > > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more > > > deterministic > > > > but from > > > > what I understand it would have a performance impact since it would > > > > require all memory > > > > that may be needed by the enclave be pre-allocated from outside the > > > > enclave and not > > > > just dynamically allocated from within the enclave at the time it is > > > > needed. > > > > > > > > Would such a performance impact be acceptable? > > > > > > > > > > User space won't always have enough info to decide whether the pages > > > to be > > > EAUG'd immediately. In some cases (shared libraries, JVM for > > > example) lots > > > of code/data pages can be mapped but never actually touched. One > > > enclave/process does not know if any other more important > > > enclave/process > > > would need the EPC. > > > > > > It should be for kernel to make the final decision as it has overall > > > picture > > > of the system EPC usage and availability. > > > > EAUG ioctl does not give better capabilities for user space to waste > > EPC given that EADD ioctl already exists, i.e. your argument is logically > > incorrect. > > The point of adding EAUG is to allow more efficient use of EPC pages. > Without EAUG, enclaves have to EADD everything upfront into EPC, consuming > predetermined number of EPC pages, some of which may not be used at all. > With EAUG, enclaves should be able to load minimal pages to get started, > pages added on #PF as they are actually accessed. > > Obviously as you pointed out, some usages make more sense to pre-EAUG (EAUG > before #PF). But your proposal of supporting only pre-EAUG here essentially > makes EAUG behave almost the same as EADD. 
If the current implementation > with EAUG on #PF can also use MAP_POPULATE for pre-EAUG (seems possible > based on Dave's comments), then it is flxible to cover all cases and allow > kernel to optimize allocation of EPC pages.

There is not even a working #PF based implementation in existence, and your argument has too many if's for my taste.

Reinette, can you squash this fixup into your patch set and send v3 so that we get to a working implementation that can be benchmarked against e.g. an ioctl based version:

https://lore.kernel.org/linux-sgx/20220304033918.361495-1-jarkko@kernel.org/T/#u

This also objectively fixes some performance issues, e.g. EMODPE can just be used without any round-trips (v2 requires the relax ioctl).

BR, Jark
Hi Jarkko On Fri, 04 Mar 2022 02:30:22 -0600, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Thu, Mar 03, 2022 at 10:03:30PM -0600, Haitao Huang wrote: >> >> On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen <jarkko@kernel.org> >> wrote: >> >> > On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote: >> > > Hi all, >> > > >> > > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre >> > > <reinette.chatre@intel.com> wrote: >> > > >> > > > Hi Jarkko, >> > > > >> > > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote: >> > > > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote: >> > > > > > Hi Jarkko, >> > > > > > >> > > > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote: >> > > > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of >> > > > > > > > this version of >> > > > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX >> > > pages but >> > > > > > > > obviously new RX pages are now out of the picture: >> > > > > > > > >> > > > > > > > >> > > > > > > > /* >> > > > > > > > * Adding a regular page that is architecturally allowed >> > > to only >> > > > > > > > * be created with RW permissions. >> > > > > > > > * TBD: Interface with user space policy to support max >> > > permissions >> > > > > > > > * of RWX. >> > > > > > > > */ >> > > > > > > > prot = PROT_READ | PROT_WRITE; >> > > > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); >> > > > > > > > encl_page->vm_max_prot_bits = >> encl_page->vm_run_prot_bits; >> > > > > > > > >> > > > > > > > If that TBD is left out to the final version the page >> > > > > > > > augmentation has a >> > > > > > > > risk of a API bottleneck, and that risk can realize then >> > > > > > > > also in the page >> > > > > > > > permission ioctls. >> > > > > > > > >> > > > > > > > I.e. 
now any review comment is based on not fully known >> > > > > > > > territory, we have >> > > > > > > > one known unknown, and some unknown unknowns from >> > > > > > > > unpredictable effect to >> > > > > > > > future API changes. >> > > > > > >> > > > > > The plan to complete the "TBD" in the above snippet was to >> > > > > > follow this work >> > > > > > with user policy integration at this location. On a high level >> > > > > > the plan was >> > > > > > for this to look something like: >> > > > > > >> > > > > > >> > > > > > /* >> > > > > > * Adding a regular page that is architecturally allowed to >> only >> > > > > > * be created with RW permissions. >> > > > > > * Interface with user space policy to support max >> permissions >> > > > > > * of RWX. >> > > > > > */ >> > > > > > prot = PROT_READ | PROT_WRITE; >> > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); >> > > > > > >> > > > > > if (user space policy allows RWX on dynamically added >> > > pages) >> > > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | >> > > > > > PROT_WRITE | PROT_EXEC, 0); >> > > > > > else >> > > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | >> > > > > > PROT_WRITE, 0); >> > > > > > >> > > > > > The work that follows this series aimed to do the integration >> > > with user >> > > > > > space policy. >> > > > > >> > > > > What do you mean by "user space policy" anyway exactly? I'm >> > > sorry but I >> > > > > just don't fully understand this. >> > > > >> > > > My apologies - I just assumed that you would need no reminder >> > > about this >> > > > contentious >> > > > part of SGX history. 
Essentially it means that, yes, the kernel >> could >> > > > theoretically >> > > > permit any kind of access to any file/page, but some accesses are >> > > known >> > > > to generally >> > > > be a bad idea - like making memory executable as well as writable >> > > - and >> > > > thus there >> > > > are additional checks based on what user space permits before the >> > > kernel >> > > > allows >> > > > such accesses. >> > > > >> > > > For example, >> > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() >> > > > >> > > > User policy and SGX has seen significant discussion. Some notable >> > > > threads: >> > > > >> https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/ >> > > > >> https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/ >> > > > >> > > > > It's too big of a risk to accept this series without X taken >> care >> > > > > of. Patch >> > > > > series should neither have TODO nor TBD comments IMHO. I don't >> want >> > > > > to ack >> > > > > a series based on speculation what might happen in the future. >> > > > >> > > > ok >> > > > >> > > > > >> > > > > > > I think the best way to move forward would be to do EAUG's >> > > > > > > explicitly with >> > > > > > > an ioctl that could also include secinfo for permissions. >> > > Then you can >> > > > > > > easily do the rest with EACCEPTCOPY inside the enclave. >> > > > > > >> > > > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be >> > > used for >> > > > > > this purpose. It already includes SECINFO which may also be >> > > useful if >> > > > > > needing to later support EAUG of PT_SS* pages. >> > > > > >> > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and >> call it >> > > > > a day. >> > > > >> > > > I could, yes. 
>> > > > >> > > > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is >> > > > > this weird >> > > > > thing added to the #PF handler? Why is it added at all then? >> > > > >> > > > I was just speculating in my response, there is no plan to extend >> > > > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of). >> > > > >> > > > > > How this could work is user space calls >> SGX_IOC_ENCLAVE_ADD_PAGES >> > > > > > after enclave initialization on any memory region within the >> > > > > > enclave where >> > > > > > pages are planned to be added dynamically. This ioctl() calls >> > > > > > EAUG to add the >> > > > > > new pages with RW permissions and their vm_max_prot_bits can >> be >> > > > > > set to the >> > > > > > permissions found in the included SECINFO. This will support >> > > > > > later EACCEPTCOPY >> > > > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS >> > > > > >> > > > > I don't like this type of re-use of the existing API. >> > > > >> > > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is >> > > consensus >> > > > after >> > > > considering the user policy question (above) and performance >> trade-off >> > > > (more below). >> > > > >> > > > > >> > > > > > The big question is whether communicating user policy after >> > > > > > enclave initialization >> > > > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable >> > > > > > to all? I would >> > > > > > appreciate a confirmation on this direction considering the >> > > > > > significant history >> > > > > > behind this topic. >> > > > > >> > > > > I have no idea because I don't know what is user space policy. >> > > > >> > > > This discussion is about some enclave usages needing RWX >> permissions >> > > > on dynamically added enclave pages. 
RWX permissions on dynamically >> > > added >> > > > pages is >> > > > not something that should blindly be allowed for all SGX enclaves >> but >> > > > instead the user >> > > > needs to explicitly allow specific enclaves to have such ability. >> This >> > > > is equivalent >> > > > to (but not the same as) what exists in Linux today with LSM. As >> > > seen in >> > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux >> > > is able >> > > > to make >> > > > files and memory be both writable and executable, but it would >> only do >> > > > so for those >> > > > files and memory that the LSM (which is how user policy is >> > > communicated, >> > > > like SELinux) >> > > > indicates it is allowed, not blindly do so for all files and all >> > > memory. >> > > > >> > > > > > > Putting EAUG to the #PF handler and implicitly call it just >> > > > > > > too flakky and >> > > > > > > hard to make deterministic for e.g. JIT compiler in our use >> > > > > > > case (not to >> > > > > > > mention that JIT is not possible at all because inability to >> > > > > > > do RX pages). >> > > > >> > > > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more >> > > deterministic >> > > > but from >> > > > what I understand it would have a performance impact since it >> would >> > > > require all memory >> > > > that may be needed by the enclave be pre-allocated from outside >> the >> > > > enclave and not >> > > > just dynamically allocated from within the enclave at the time it >> is >> > > > needed. >> > > > >> > > > Would such a performance impact be acceptable? >> > > > >> > > >> > > User space won't always have enough info to decide whether the pages >> > > to be >> > > EAUG'd immediately. In some cases (shared libraries, JVM for >> > > example) lots >> > > of code/data pages can be mapped but never actually touched. One >> > > enclave/process does not know if any other more important >> > > enclave/process >> > > would need the EPC. 
>> > > >> > > It should be for kernel to make the final decision as it has overall >> > > picture >> > > of the system EPC usage and availability. >> > >> > EAUG ioctl does not give better capabilities for user space to waste >> > EPC given that EADD ioctl already exists, i.e. your argument is >> logically >> > incorrect. >> >> The point of adding EAUG is to allow more efficient use of EPC pages. >> Without EAUG, enclaves have to EADD everything upfront into EPC, >> consuming >> predetermined number of EPC pages, some of which may not be used at all. >> With EAUG, enclaves should be able to load minimal pages to get started, >> pages added on #PF as they are actually accessed. >> >> Obviously as you pointed out, some usages make more sense to pre-EAUG >> (EAUG >> before #PF). But your proposal of supporting only pre-EAUG here >> essentially >> makes EAUG behave almost the same as EADD. If the current >> implementation >> with EAUG on #PF can also use MAP_POPULATE for pre-EAUG (seems possible >> based on Dave's comments), then it is flxible to cover all cases and >> allow >> kernel to optimize allocation of EPC pages. > > There is no even a working #PF based implementation in existance, and > your > argument has too many if's for my taste.

1) If you mean that no user space implements this kind of solution, read this section; otherwise, skip to 2) below, which is only a couple of sentences.

If you are willing to look, there is already an implementation in our SDK that does heap and stack expansion on demand on #PF. Enclaves may not know their heap/stack size up front, so we implemented these features to make EPC usage more efficient. I don't know why normal processes can add RAM on #PF, but enclaves adding EPC on #PF is such an unacceptable concept to you. And the kernel already does that for EPC swapping when a #PF happens on a swapped-out EPC page.

Our implementation has gone through several rounds; the latest is here: https://github.com/intel/linux-sgx/tree/edmm_v2/sdk/emm. It was also implemented in the original OOT driver based SDK implementation. Customers are using these features and have found them useful. I think this is a critical feature that many other runtimes will also need.

2) It's OK for you to request additional support for your usage, and I agree it is needed. But IMHO, totally getting rid of EAUG on #PF is bad and unnecessary. The current implementation can be extended to support your usage. What's the reason you think MAP_POPULATE won't work for you?

BR
Haitao
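For reference, the MAP_POPULATE behaviour asked about above can be observed on ordinary anonymous memory. What follows is a minimal sketch and not SGX code: as noted elsewhere in this thread, SGX VMAs do not support MAP_POPULATE because of their VM_IO and VM_PFNMAP flags. The helper name resident_after_mmap() is hypothetical; mmap(), mincore() and munmap() are the standard Linux calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map npages of anonymous memory, optionally with
 * MAP_POPULATE, and report how many pages mincore() sees as resident
 * immediately after mmap() returns. Returns -1 on error.
 * This uses regular RAM, not EPC; it only illustrates pre-faulting.
 */
static long resident_after_mmap(size_t npages, int populate)
{
	unsigned char vec[64];	/* one status byte per page for mincore() */
	long page_size = sysconf(_SC_PAGESIZE);
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	long resident = 0;
	size_t len, i;
	void *p;

	if (npages == 0 || npages > sizeof(vec) || page_size <= 0)
		return -1;
	if (populate)
		flags |= MAP_POPULATE;	/* ask the kernel to pre-fault the range */

	len = npages * (size_t)page_size;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	if (mincore(p, len, vec) != 0) {
		resident = -1;
	} else {
		for (i = 0; i < npages; i++)
			resident += vec[i] & 1;	/* bit 0: page is resident */
		assert(resident <= (long)npages);
	}

	munmap(p, len);
	return resident;
}
```

Calling resident_after_mmap(16, 1) requests pre-faulting of all 16 pages at mmap() time, while resident_after_mmap(16, 0) leaves them to be faulted in on first access, which is the distinction the pre-EAUG versus EAUG-on-#PF discussion turns on.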
On Fri, Mar 04, 2022 at 09:51:22AM -0600, Haitao Huang wrote: > Hi Jarkko > > On Fri, 04 Mar 2022 02:30:22 -0600, Jarkko Sakkinen <jarkko@kernel.org> > wrote: > > > On Thu, Mar 03, 2022 at 10:03:30PM -0600, Haitao Huang wrote: > > > > > > On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen <jarkko@kernel.org> > > > wrote: > > > > > > > On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote: > > > > > Hi all, > > > > > > > > > > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre > > > > > <reinette.chatre@intel.com> wrote: > > > > > > > > > > > Hi Jarkko, > > > > > > > > > > > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote: > > > > > > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre wrote: > > > > > > > > Hi Jarkko, > > > > > > > > > > > > > > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote: > > > > > > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of > > > > > > > > > > this version of > > > > > > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R and RX > > > > > pages but > > > > > > > > > > obviously new RX pages are now out of the picture: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > /* > > > > > > > > > > * Adding a regular page that is architecturally allowed > > > > > to only > > > > > > > > > > * be created with RW permissions. > > > > > > > > > > * TBD: Interface with user space policy to support max > > > > > permissions > > > > > > > > > > * of RWX. > > > > > > > > > > */ > > > > > > > > > > prot = PROT_READ | PROT_WRITE; > > > > > > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); > > > > > > > > > > encl_page->vm_max_prot_bits = > > > encl_page->vm_run_prot_bits; > > > > > > > > > > > > > > > > > > > > If that TBD is left out to the final version the page > > > > > > > > > > augmentation has a > > > > > > > > > > risk of a API bottleneck, and that risk can realize then > > > > > > > > > > also in the page > > > > > > > > > > permission ioctls. 
> > > > > > > > > > > > > > > > > > > > I.e. now any review comment is based on not fully known > > > > > > > > > > territory, we have > > > > > > > > > > one known unknown, and some unknown unknowns from > > > > > > > > > > unpredictable effect to > > > > > > > > > > future API changes. > > > > > > > > > > > > > > > > The plan to complete the "TBD" in the above snippet was to > > > > > > > > follow this work > > > > > > > > with user policy integration at this location. On a high level > > > > > > > > the plan was > > > > > > > > for this to look something like: > > > > > > > > > > > > > > > > > > > > > > > > /* > > > > > > > > * Adding a regular page that is architecturally allowed > > > to only > > > > > > > > * be created with RW permissions. > > > > > > > > * Interface with user space policy to support max > > > permissions > > > > > > > > * of RWX. > > > > > > > > */ > > > > > > > > prot = PROT_READ | PROT_WRITE; > > > > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0); > > > > > > > > > > > > > > > > if (user space policy allows RWX on dynamically added > > > > > pages) > > > > > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | > > > > > > > > PROT_WRITE | PROT_EXEC, 0); > > > > > > > > else > > > > > > > > encl_page->vm_max_prot_bits = calc_vm_prot_bits(PROT_READ | > > > > > > > > PROT_WRITE, 0); > > > > > > > > > > > > > > > > The work that follows this series aimed to do the integration > > > > > with user > > > > > > > > space policy. > > > > > > > > > > > > > > What do you mean by "user space policy" anyway exactly? I'm > > > > > sorry but I > > > > > > > just don't fully understand this. > > > > > > > > > > > > My apologies - I just assumed that you would need no reminder > > > > > about this > > > > > > contentious > > > > > > part of SGX history. 
Essentially it means that, yes, the > > > kernel could > > > > > > theoretically > > > > > > permit any kind of access to any file/page, but some accesses are > > > > > known > > > > > > to generally > > > > > > be a bad idea - like making memory executable as well as writable > > > > > - and > > > > > > thus there > > > > > > are additional checks based on what user space permits before the > > > > > kernel > > > > > > allows > > > > > > such accesses. > > > > > > > > > > > > For example, > > > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() > > > > > > > > > > > > User policy and SGX has seen significant discussion. Some notable > > > > > > threads: > > > > > > https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/ > > > > > > https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/ > > > > > > > > > > > > > It's too big of a risk to accept this series without X taken > > > care > > > > > > > of. Patch > > > > > > > series should neither have TODO nor TBD comments IMHO. I > > > don't want > > > > > > > to ack > > > > > > > a series based on speculation what might happen in the future. > > > > > > > > > > > > ok > > > > > > > > > > > > > > > > > > > > > > I think the best way to move forward would be to do EAUG's > > > > > > > > > explicitly with > > > > > > > > > an ioctl that could also include secinfo for permissions. > > > > > Then you can > > > > > > > > > easily do the rest with EACCEPTCOPY inside the enclave. > > > > > > > > > > > > > > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could possibly be > > > > > used for > > > > > > > > this purpose. It already includes SECINFO which may also be > > > > > useful if > > > > > > > > needing to later support EAUG of PT_SS* pages. > > > > > > > > > > > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and > > > call it > > > > > > > a day. > > > > > > > > > > > > I could, yes. 
> > > > > > > > > > > > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES what is > > > > > > > this weird > > > > > > > thing added to the #PF handler? Why is it added at all then? > > > > > > > > > > > > I was just speculating in my response, there is no plan to extend > > > > > > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of). > > > > > > > > > > > > > > How this could work is user space calls > > > SGX_IOC_ENCLAVE_ADD_PAGES > > > > > > > > after enclave initialization on any memory region within the > > > > > > > > enclave where > > > > > > > > pages are planned to be added dynamically. This ioctl() calls > > > > > > > > EAUG to add the > > > > > > > > new pages with RW permissions and their vm_max_prot_bits > > > can be > > > > > > > > set to the > > > > > > > > permissions found in the included SECINFO. This will support > > > > > > > > later EACCEPTCOPY > > > > > > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS > > > > > > > > > > > > > > I don't like this type of re-use of the existing API. > > > > > > > > > > > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is > > > > > consensus > > > > > > after > > > > > > considering the user policy question (above) and performance > > > trade-off > > > > > > (more below). > > > > > > > > > > > > > > > > > > > > > The big question is whether communicating user policy after > > > > > > > > enclave initialization > > > > > > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is acceptable > > > > > > > > to all? I would > > > > > > > > appreciate a confirmation on this direction considering the > > > > > > > > significant history > > > > > > > > behind this topic. > > > > > > > > > > > > > > I have no idea because I don't know what is user space policy. > > > > > > > > > > > > This discussion is about some enclave usages needing RWX > > > permissions > > > > > > on dynamically added enclave pages. 
RWX permissions on dynamically > > > > > added > > > > > > pages is > > > > > > not something that should blindly be allowed for all SGX > > > enclaves but > > > > > > instead the user > > > > > > needs to explicitly allow specific enclaves to have such > > > ability. This > > > > > > is equivalent > > > > > > to (but not the same as) what exists in Linux today with LSM. As > > > > > seen in > > > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() Linux > > > > > is able > > > > > > to make > > > > > > files and memory be both writable and executable, but it would > > > only do > > > > > > so for those > > > > > > files and memory that the LSM (which is how user policy is > > > > > communicated, > > > > > > like SELinux) > > > > > > indicates it is allowed, not blindly do so for all files and all > > > > > memory. > > > > > > > > > > > > > > > Putting EAUG to the #PF handler and implicitly call it just > > > > > > > > > too flakky and > > > > > > > > > hard to make deterministic for e.g. JIT compiler in our use > > > > > > > > > case (not to > > > > > > > > > mention that JIT is not possible at all because inability to > > > > > > > > > do RX pages). > > > > > > > > > > > > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more > > > > > deterministic > > > > > > but from > > > > > > what I understand it would have a performance impact since it > > > would > > > > > > require all memory > > > > > > that may be needed by the enclave be pre-allocated from > > > outside the > > > > > > enclave and not > > > > > > just dynamically allocated from within the enclave at the time > > > it is > > > > > > needed. > > > > > > > > > > > > Would such a performance impact be acceptable? > > > > > > > > > > > > > > > > User space won't always have enough info to decide whether the pages > > > > > to be > > > > > EAUG'd immediately. 
In some cases (shared libraries, JVM for > > > > > example) lots > > > > > of code/data pages can be mapped but never actually touched. One > > > > > enclave/process does not know if any other more important > > > > > enclave/process > > > > > would need the EPC. > > > > > > > > > > It should be for kernel to make the final decision as it has overall > > > > > picture > > > > > of the system EPC usage and availability. > > > > > > > > EAUG ioctl does not give better capabilities for user space to waste > > > > EPC given that EADD ioctl already exists, i.e. your argument is > > > logically > > > > incorrect. > > > > > > The point of adding EAUG is to allow more efficient use of EPC pages. > > > Without EAUG, enclaves have to EADD everything upfront into EPC, > > > consuming > > > predetermined number of EPC pages, some of which may not be used at all. > > > With EAUG, enclaves should be able to load minimal pages to get started, > > > pages added on #PF as they are actually accessed. > > > > > > Obviously as you pointed out, some usages make more sense to > > > pre-EAUG (EAUG > > > before #PF). But your proposal of supporting only pre-EAUG here > > > essentially > > > makes EAUG behave almost the same as EADD. If the current > > > implementation > > > with EAUG on #PF can also use MAP_POPULATE for pre-EAUG (seems possible > > > based on Dave's comments), then it is flxible to cover all cases and > > > allow > > > kernel to optimize allocation of EPC pages. > > > > There is no even a working #PF based implementation in existance, and > > your > > argument has too many if's for my taste. > > 1) if you mean no user space is implementing this kind of solution, read > this section, otherwise, skip to 2) below which is only couple of sentences. > > If you are willing to look, there is already implementation in our SDK to do > heap and stack expansion on demand on #PF. 
Enclaves may not know heap/stack > size up front, we have implemented these features to make EPC usage more > efficient. I don't know why normal processes can add RAM on #PF, but > enclaves adding EPC on #PF becomes so unacceptable concept to you. And the > kernel does that for EPC swapping already when #PF happens on a swapped out > EPC page.

It adds O(n) round-trips for an mmap() emulation, which can be done in O(1) round-trips with an ioctl.

> Our implementation has gone through several rounds, the latest is > here:https://github.com/intel/linux-sgx/tree/edmm_v2/sdk/emm. It was also > implemented in original OOT driver based SDK implementation. Customers are > using it and found them useful. I think this is a critical feature that many > other runtimes will also need.

I'm not sure what the common sense argument here is.

> 2) > It's OK for you to request additional support for your usage and I agree it > is needed. But IMHO, totally getting rid of EAUG on #PF is bad and > unnecessary. Current implementation can be extended to support your usage. > What's the reason you think MAP_POPULATE won't work for you?

I do not recall taking a stand on MAP_POPULATE.

> BR > Haitao

BR, Jarkko
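The two positions in this exchange can be put side by side in a toy cost model. This is only a sketch under stated assumptions: each first touch of a page costs one kernel round-trip in the fault-driven case, one ioctl covers a whole range in the explicit case, and EACCEPT and other enclave-side costs are ignored. The struct and function names are hypothetical and do not correspond to any kernel or SDK interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one enclave memory region. */
struct edmm_cost {
	size_t kernel_round_trips;	/* enclave exits into the kernel */
	size_t epc_pages_used;		/* EPC pages consumed by EAUG */
};

/*
 * Fault-driven EAUG: every page that is actually touched takes its own
 * #PF, so round-trips grow linearly with the touched pages (the O(n)
 * point), but untouched pages never consume EPC.
 */
static struct edmm_cost augment_on_fault(size_t mapped, size_t touched)
{
	struct edmm_cost c = { 0, 0 };
	size_t i;

	assert(touched <= mapped);
	for (i = 0; i < touched; i++) {
		c.kernel_round_trips++;
		c.epc_pages_used++;
	}
	return c;
}

/*
 * Explicit ioctl: a single call augments the whole mapped range (the
 * O(1) point), but every mapped page consumes EPC whether or not it is
 * ever touched.
 */
static struct edmm_cost augment_with_ioctl(size_t mapped)
{
	struct edmm_cost c = { 0, 0 };

	if (mapped == 0)
		return c;
	c.kernel_round_trips = 1;
	c.epc_pages_used = mapped;
	return c;
}
```

With a region of 1024 mapped pages of which 8 are touched, this model gives 8 round-trips and 8 EPC pages for the fault-driven path against 1 round-trip and 1024 EPC pages for the ioctl path. Which side wins depends on how sparse the access pattern is, and the model says nothing about the real cost of each round-trip.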
Sorry, I missed this.

On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> On 3/3/22 13:23, Reinette Chatre wrote:
> > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > then I believe that SGX would benefit.
>
> Some Intel folks asked for this quite a while ago. I think it's
> entirely doable: add a new vm_ops->populate() function that will allow
> ignoring VM_IO|VM_PFNMAP if present.

I'm sorry, but I don't understand what you mean by ignoring here,
i.e. I cannot fully comprehend the last sentence.

And would the vm_ops->populate() be called right after the existing ones
involved with the VMA creation process?

> Or, if nobody wants to waste all of the vm_ops space, just add an
> arch_vma_populate() or something which can call over into SGX.
>
> I'll happily review the patches if anyone can put such a beast together.

I'll start with vm_ops->populate() and check the feedback first for
that.

BR, Jarkko
On Sat, Mar 05, 2022 at 05:19:24AM +0200, Jarkko Sakkinen wrote:
> Sorry, I missed this.
>
> On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> > On 3/3/22 13:23, Reinette Chatre wrote:
> > > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > > then I believe that SGX would benefit.
> >
> > Some Intel folks asked for this quite a while ago. I think it's
> > entirely doable: add a new vm_ops->populate() function that will allow
> > ignoring VM_IO|VM_PFNMAP if present.
>
> I'm sorry what I don't understand what you mean by ignoring here,
> i.e. cannot fully comprehend the last sentence.
>
> And would the vm_ops->populate() be called right after the existing ones
> involved with the VMA creation process?
>
> > Or, if nobody wants to waste all of the vm_ops space, just add an
> > arch_vma_populate() or something which can call over into SGX.
> >
> > I'll happily review the patches if anyone can put such a beast together.
>
> I'll start with vm_ops->populate() and check the feedback first for
> that.

I would instead extend populate() in file_operations into:

int (*populate)(struct file *, struct vm_area_struct *, bool populate);

This does not add to memory consumption.

BR, Jarkko
On Sun, Mar 06, 2022 at 02:15:32AM +0200, Jarkko Sakkinen wrote:
> On Sat, Mar 05, 2022 at 05:19:24AM +0200, Jarkko Sakkinen wrote:
> > Sorry, I missed this.
> >
> > On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> > > On 3/3/22 13:23, Reinette Chatre wrote:
> > > > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > > > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > > > then I believe that SGX would benefit.
> > >
> > > Some Intel folks asked for this quite a while ago. I think it's
> > > entirely doable: add a new vm_ops->populate() function that will allow
> > > ignoring VM_IO|VM_PFNMAP if present.
> >
> > I'm sorry what I don't understand what you mean by ignoring here,
> > i.e. cannot fully comprehend the last sentence.
> >
> > And would the vm_ops->populate() be called right after the existing ones
> > involved with the VMA creation process?
> >
> > > Or, if nobody wants to waste all of the vm_ops space, just add an
> > > arch_vma_populate() or something which can call over into SGX.
> > >
> > > I'll happily review the patches if anyone can put such a beast together.
> >
> > I'll start with vm_ops->populate() and check the feedback first for
> > that.
>
> I would instead extend populate() in file_operations into:
>
> int (*populate)(struct file *, struct vm_area_struct *, bool populate);
>
> This does not add to memory consumption.

Ugh, mixing up my words, sorry :-) I meant:

int (*mmap)(struct file *, struct vm_area_struct *, bool populate);

BR, Jarkko
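As an illustration of the callback shape proposed in the message above, here is a minimal user-space sketch. It is not kernel code: `struct file` and `struct vm_area_struct` are reduced stand-ins, and `demo_file_operations`, `demo_sgx_mmap`, `demo_pages_added` and `DEMO_PAGE_SIZE` are hypothetical names introduced only for this example. The sketch assumes that a driver would use the extra `populate` argument to decide between adding pages eagerly at mmap() time and leaving them to be added later.

```c
#include <stdbool.h>

/* Reduced stand-ins for the kernel types; NOT the real definitions. */
struct file {
	int unused;
};

struct vm_area_struct {
	unsigned long vm_start;
	unsigned long vm_end;
};

#define DEMO_PAGE_SIZE 4096UL

/* Only the callback under discussion, with the proposed extra argument. */
struct demo_file_operations {
	int (*mmap)(struct file *file, struct vm_area_struct *vma, bool populate);
};

/* Hypothetical counter of pages added eagerly (for SGX this would be EAUG). */
unsigned long demo_pages_added;

int demo_sgx_mmap(struct file *file, struct vm_area_struct *vma, bool populate)
{
	unsigned long addr;

	(void)file;

	/* Reject an empty or inverted range. */
	if (vma->vm_end <= vma->vm_start)
		return -1;

	/* Without populate, nothing is added up front. */
	if (!populate)
		return 0;

	/* With populate, walk the range one page at a time. */
	for (addr = vma->vm_start; addr < vma->vm_end; addr += DEMO_PAGE_SIZE)
		demo_pages_added++;

	return 0;
}

const struct demo_file_operations demo_fops = {
	.mmap = demo_sgx_mmap,
};
```

Passing the flag through the existing callback, rather than adding a new pointer to the operations table, is what the "does not add to memory consumption" remark refers to.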
On Fri, 04 Mar 2022 19:02:28 -0600, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Fri, Mar 04, 2022 at 09:51:22AM -0600, Haitao Huang wrote: >> Hi Jarkko >> >> On Fri, 04 Mar 2022 02:30:22 -0600, Jarkko Sakkinen <jarkko@kernel.org> >> wrote: >> >> > On Thu, Mar 03, 2022 at 10:03:30PM -0600, Haitao Huang wrote: >> > > >> > > On Thu, 03 Mar 2022 17:18:33 -0600, Jarkko Sakkinen >> <jarkko@kernel.org> >> > > wrote: >> > > >> > > > On Thu, Mar 03, 2022 at 10:08:14AM -0600, Haitao Huang wrote: >> > > > > Hi all, >> > > > > >> > > > > On Wed, 02 Mar 2022 16:57:45 -0600, Reinette Chatre >> > > > > <reinette.chatre@intel.com> wrote: >> > > > > >> > > > > > Hi Jarkko, >> > > > > > >> > > > > > On 3/1/2022 6:05 PM, Jarkko Sakkinen wrote: >> > > > > > > On Tue, Mar 01, 2022 at 09:48:48AM -0800, Reinette Chatre >> wrote: >> > > > > > > > Hi Jarkko, >> > > > > > > > >> > > > > > > > On 3/1/2022 5:42 AM, Jarkko Sakkinen wrote: >> > > > > > > > > > With EACCEPTCOPY (kudos to Mark S. for reminding me of >> > > > > > > > > > this version of >> > > > > > > > > > EACCEPT @ chat.enarx.dev) it is possible to make R >> and RX >> > > > > pages but >> > > > > > > > > > obviously new RX pages are now out of the picture: >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > /* >> > > > > > > > > > * Adding a regular page that is architecturally >> allowed >> > > > > to only >> > > > > > > > > > * be created with RW permissions. >> > > > > > > > > > * TBD: Interface with user space policy to support >> max >> > > > > permissions >> > > > > > > > > > * of RWX. 
>> > > > > > > > > > */ >> > > > > > > > > > prot = PROT_READ | PROT_WRITE; >> > > > > > > > > > encl_page->vm_run_prot_bits = >> calc_vm_prot_bits(prot, 0); >> > > > > > > > > > encl_page->vm_max_prot_bits = >> > > encl_page->vm_run_prot_bits; >> > > > > > > > > > >> > > > > > > > > > If that TBD is left out to the final version the page >> > > > > > > > > > augmentation has a >> > > > > > > > > > risk of a API bottleneck, and that risk can realize >> then >> > > > > > > > > > also in the page >> > > > > > > > > > permission ioctls. >> > > > > > > > > > >> > > > > > > > > > I.e. now any review comment is based on not fully >> known >> > > > > > > > > > territory, we have >> > > > > > > > > > one known unknown, and some unknown unknowns from >> > > > > > > > > > unpredictable effect to >> > > > > > > > > > future API changes. >> > > > > > > > >> > > > > > > > The plan to complete the "TBD" in the above snippet was to >> > > > > > > > follow this work >> > > > > > > > with user policy integration at this location. On a high >> level >> > > > > > > > the plan was >> > > > > > > > for this to look something like: >> > > > > > > > >> > > > > > > > >> > > > > > > > /* >> > > > > > > > * Adding a regular page that is architecturally allowed >> > > to only >> > > > > > > > * be created with RW permissions. >> > > > > > > > * Interface with user space policy to support max >> > > permissions >> > > > > > > > * of RWX. 
>> > > > > > > > */ >> > > > > > > > prot = PROT_READ | PROT_WRITE; >> > > > > > > > encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, >> 0); >> > > > > > > > >> > > > > > > > if (user space policy allows RWX on dynamically >> added >> > > > > pages) >> > > > > > > > encl_page->vm_max_prot_bits = >> calc_vm_prot_bits(PROT_READ | >> > > > > > > > PROT_WRITE | PROT_EXEC, 0); >> > > > > > > > else >> > > > > > > > encl_page->vm_max_prot_bits = >> calc_vm_prot_bits(PROT_READ | >> > > > > > > > PROT_WRITE, 0); >> > > > > > > > >> > > > > > > > The work that follows this series aimed to do the >> integration >> > > > > with user >> > > > > > > > space policy. >> > > > > > > >> > > > > > > What do you mean by "user space policy" anyway exactly? I'm >> > > > > sorry but I >> > > > > > > just don't fully understand this. >> > > > > > >> > > > > > My apologies - I just assumed that you would need no reminder >> > > > > about this >> > > > > > contentious >> > > > > > part of SGX history. Essentially it means that, yes, the >> > > kernel could >> > > > > > theoretically >> > > > > > permit any kind of access to any file/page, but some accesses >> are >> > > > > known >> > > > > > to generally >> > > > > > be a bad idea - like making memory executable as well as >> writable >> > > > > - and >> > > > > > thus there >> > > > > > are additional checks based on what user space permits before >> the >> > > > > kernel >> > > > > > allows >> > > > > > such accesses. >> > > > > > >> > > > > > For example, >> > > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() >> > > > > > >> > > > > > User policy and SGX has seen significant discussion. 
Some >> notable >> > > > > > threads: >> > > > > > >> https://lore.kernel.org/linux-security-module/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com/ >> > > > > > >> https://lore.kernel.org/linux-security-module/20190619222401.14942-1-sean.j.christopherson@intel.com/ >> > > > > > >> > > > > > > It's too big of a risk to accept this series without X taken >> > > care >> > > > > > > of. Patch >> > > > > > > series should neither have TODO nor TBD comments IMHO. I >> > > don't want >> > > > > > > to ack >> > > > > > > a series based on speculation what might happen in the >> future. >> > > > > > >> > > > > > ok >> > > > > > >> > > > > > > >> > > > > > > > > I think the best way to move forward would be to do >> EAUG's >> > > > > > > > > explicitly with >> > > > > > > > > an ioctl that could also include secinfo for >> permissions. >> > > > > Then you can >> > > > > > > > > easily do the rest with EACCEPTCOPY inside the enclave. >> > > > > > > > >> > > > > > > > SGX_IOC_ENCLAVE_ADD_PAGES already exists and could >> possibly be >> > > > > used for >> > > > > > > > this purpose. It already includes SECINFO which may also >> be >> > > > > useful if >> > > > > > > > needing to later support EAUG of PT_SS* pages. >> > > > > > > >> > > > > > > You could also simply add SGX_IOC_ENCLAVE_AUGMENT_PAGES and >> > > call it >> > > > > > > a day. >> > > > > > >> > > > > > I could, yes. >> > > > > > >> > > > > > > And if there is plan to extend SGX_IOC_ENCLAVE_ADD_PAGES >> what is >> > > > > > > this weird >> > > > > > > thing added to the #PF handler? Why is it added at all then? >> > > > > > >> > > > > > I was just speculating in my response, there is no plan to >> extend >> > > > > > SGX_IOC_ENCLAVE_ADD_PAGES (that I am aware of). 
>> > > > > > >> > > > > > > > How this could work is user space calls >> > > SGX_IOC_ENCLAVE_ADD_PAGES >> > > > > > > > after enclave initialization on any memory region within >> the >> > > > > > > > enclave where >> > > > > > > > pages are planned to be added dynamically. This ioctl() >> calls >> > > > > > > > EAUG to add the >> > > > > > > > new pages with RW permissions and their vm_max_prot_bits >> > > can be >> > > > > > > > set to the >> > > > > > > > permissions found in the included SECINFO. This will >> support >> > > > > > > > later EACCEPTCOPY >> > > > > > > > as well as SGX_IOC_ENCLAVE_RELAX_PERMISSIONS >> > > > > > > >> > > > > > > I don't like this type of re-use of the existing API. >> > > > > > >> > > > > > I could proceed with SGX_IOC_ENCLAVE_AUGMENT_PAGES if there is >> > > > > consensus >> > > > > > after >> > > > > > considering the user policy question (above) and performance >> > > trade-off >> > > > > > (more below). >> > > > > > >> > > > > > > >> > > > > > > > The big question is whether communicating user policy >> after >> > > > > > > > enclave initialization >> > > > > > > > via the SECINFO within SGX_IOC_ENCLAVE_ADD_PAGES is >> acceptable >> > > > > > > > to all? I would >> > > > > > > > appreciate a confirmation on this direction considering >> the >> > > > > > > > significant history >> > > > > > > > behind this topic. >> > > > > > > >> > > > > > > I have no idea because I don't know what is user space >> policy. >> > > > > > >> > > > > > This discussion is about some enclave usages needing RWX >> > > permissions >> > > > > > on dynamically added enclave pages. RWX permissions on >> dynamically >> > > > > added >> > > > > > pages is >> > > > > > not something that should blindly be allowed for all SGX >> > > enclaves but >> > > > > > instead the user >> > > > > > needs to explicitly allow specific enclaves to have such >> > > ability. 
This >> > > > > > is equivalent >> > > > > > to (but not the same as) what exists in Linux today with LSM. >> As >> > > > > seen in >> > > > > > mm/mprotect.c:do_mprotect_pkey()->security_file_mprotect() >> Linux >> > > > > is able >> > > > > > to make >> > > > > > files and memory be both writable and executable, but it would >> > > only do >> > > > > > so for those >> > > > > > files and memory that the LSM (which is how user policy is >> > > > > communicated, >> > > > > > like SELinux) >> > > > > > indicates it is allowed, not blindly do so for all files and >> all >> > > > > memory. >> > > > > > >> > > > > > > > > Putting EAUG to the #PF handler and implicitly call it >> just >> > > > > > > > > too flakky and >> > > > > > > > > hard to make deterministic for e.g. JIT compiler in our >> use >> > > > > > > > > case (not to >> > > > > > > > > mention that JIT is not possible at all because >> inability to >> > > > > > > > > do RX pages). >> > > > > > >> > > > > > I understand how SGX_IOC_ENCLAVE_AUGMENT_PAGES can be more >> > > > > deterministic >> > > > > > but from >> > > > > > what I understand it would have a performance impact since it >> > > would >> > > > > > require all memory >> > > > > > that may be needed by the enclave be pre-allocated from >> > > outside the >> > > > > > enclave and not >> > > > > > just dynamically allocated from within the enclave at the time >> > > it is >> > > > > > needed. >> > > > > > >> > > > > > Would such a performance impact be acceptable? >> > > > > > >> > > > > >> > > > > User space won't always have enough info to decide whether the >> pages >> > > > > to be >> > > > > EAUG'd immediately. In some cases (shared libraries, JVM for >> > > > > example) lots >> > > > > of code/data pages can be mapped but never actually touched. One >> > > > > enclave/process does not know if any other more important >> > > > > enclave/process >> > > > > would need the EPC. 
>> > > > > >> > > > > It should be for kernel to make the final decision as it has >> overall >> > > > > picture >> > > > > of the system EPC usage and availability. >> > > > >> > > > EAUG ioctl does not give better capabilities for user space to >> waste >> > > > EPC given that EADD ioctl already exists, i.e. your argument is >> > > logically >> > > > incorrect. >> > > >> > > The point of adding EAUG is to allow more efficient use of EPC >> pages. >> > > Without EAUG, enclaves have to EADD everything upfront into EPC, >> > > consuming >> > > predetermined number of EPC pages, some of which may not be used at >> all. >> > > With EAUG, enclaves should be able to load minimal pages to get >> started, >> > > pages added on #PF as they are actually accessed. >> > > >> > > Obviously as you pointed out, some usages make more sense to >> > > pre-EAUG (EAUG >> > > before #PF). But your proposal of supporting only pre-EAUG here >> > > essentially >> > > makes EAUG behave almost the same as EADD. If the current >> > > implementation >> > > with EAUG on #PF can also use MAP_POPULATE for pre-EAUG (seems >> possible >> > > based on Dave's comments), then it is flxible to cover all cases and >> > > allow >> > > kernel to optimize allocation of EPC pages. >> > >> > There is no even a working #PF based implementation in existance, and >> > your >> > argument has too many if's for my taste. >> >> 1) if you mean no user space is implementing this kind of solution, read >> this section, otherwise, skip to 2) below which is only couple of >> sentences. >> >> If you are willing to look, there is already implementation in our SDK >> to do >> heap and stack expansion on demand on #PF. Enclaves may not know >> heap/stack >> size up front, we have implemented these features to make EPC usage more >> efficient. I don't know why normal processes can add RAM on #PF, but >> enclaves adding EPC on #PF becomes so unacceptable concept to you. 
>> And the kernel does that for EPC swapping already when #PF happens on a
>> swapped out EPC page.
>
> It adds O(n) round-trips for an mmap() emulation, which can be done in O(1)
> round-trips with an ioctl.
>
>> Our implementation has gone through several rounds, the latest is
>> here:https://github.com/intel/linux-sgx/tree/edmm_v2/sdk/emm. It was also
>> implemented in original OOT driver based SDK implementation. Customers are
>> using it and found them useful. I think this is a critical feature that
>> many other runtimes will also need.
>
> I'm not sure what the common sense argument here is.
>

My (wrong) assumption was that you were disabling EAUG on #PF totally, and
all I was saying is that EAUG on #PF is critical for many usages and that
disabling it requires good justification. But you are expecting an ioctl
call for each #PF for those usages:
https://lore.kernel.org/linux-sgx/YiK8NEnvgPerEdFB@iki.fi/#t

IIUC, that's better than disabling it totally but less than optimal. (I have
not checked all call sequences in detail to be sure it would work for all
our cases.)

>> 2)
>> It's OK for you to request additional support for your usage and I agree
>> it is needed. But IMHO, totally getting rid of EAUG on #PF is bad and
>> unnecessary. Current implementation can be extended to support your usage.
>> What's the reason you think MAP_POPULATE won't work for you?
>
> I do not recall taking stand on MAP_POPULATE.

Thanks for looking into that. Like I said, that should cover all usages.
On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> On 3/3/22 13:23, Reinette Chatre wrote:
> > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > then I believe that SGX would benefit.
>
> Some Intel folks asked for this quite a while ago. I think it's
> entirely doable: add a new vm_ops->populate() function that will allow
> ignoring VM_IO|VM_PFNMAP if present.
>
> Or, if nobody wants to waste all of the vm_ops space, just add an
> arch_vma_populate() or something which can call over into SGX.
>
> I'll happily review the patches if anyone can put such a beast together.

Everyone would be better off if EAUGs were done unconditionally for
mmap() after initialization. A nice property is that this needs no core mm
changes.

The resource saving argument is at least a bit weak because you might use
EMODPR for the address range anyway, so you just end up doing things
slower. And to have good confidentiality, you probably also want to
clear dynamically added pages with EACCEPTCOPY (and a zero page) when
you take them into use.

I also find it a bit worrying that the enclave has direct access to allocate
kernel resources and trigger a ring-0 opcode. I don't like that part at
all. A syscall/ioctl sets the correct barrier, as the host side should be
and is the resource manager, not the enclave.

BR, Jarkko
On Thu, Mar 10, 2022 at 07:43:42AM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 03, 2022 at 01:44:14PM -0800, Dave Hansen wrote:
> > On 3/3/22 13:23, Reinette Chatre wrote:
> > > Unfortunately MAP_POPULATE is not supported by SGX VMAs because of their
> > > VM_IO and VM_PFNMAP flags. When VMAs with such flags obtain this capability
> > > then I believe that SGX would benefit.
> >
> > Some Intel folks asked for this quite a while ago. I think it's
> > entirely doable: add a new vm_ops->populate() function that will allow
> > ignoring VM_IO|VM_PFNMAP if present.
> >
> > Or, if nobody wants to waste all of the vm_ops space, just add an
> > arch_vma_populate() or something which can call over into SGX.
> >
> > I'll happily review the patches if anyone can put such a beast together.
>
> Everyone would be better off, if EAUG's were done unconditionally for
> mmap() after initialization. Nice property is that this needs no core mm
> changes.
>
> The resource saving argument is at least a bit weak because you might use
> EMODPR for the address range anyway. So you end up doing things just
> slower. And to have good confidentiality, you actually probably want to
> clear also dynamically added pages with EACCEPTCOPY (and zero page) when
> you take them into use.
>
> I find it also a bit worrying that enclave has direct access to allocate
> kernel resources and trigger ring-0 opcode. I don't like that part at
> all. syscall/ioctl sets the correct barrier, as the host side should be
> and is the resource manager, not the enclave.

Actually, this should be ABI compatible too. I'd expect all kselftests to
continue to work as they are.

BR, Jarkko
On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> Hi All,
>
> Regarding the recent update of splitting the page permissions change
> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> one? That is, revert to how it was done in the v1 version?
>
> Why? Currently in Gramine (a library OS for unmodified applications,
> https://gramineproject.io/) with the new proposed change, one needs to
> store the page permission for each page or range of pages. And for every
> request of `mmap` or `mprotect`, Gramine would have to do a lookup of the
> page permissions for the request range and then call the respective IOCTL
> either RESTRICT or RELAX. This seems a little overwhelming.
>
> Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do
> an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
> With this approach, we can avoid storing page permissions and simplify
> the implementation.
>
> I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK` flows
> to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
> not sure what will be the performance impact. Is there any data point to
> see the performance impact?
>
> Thanks,
> -Vijay

This should get better in the next version. "relax" is gone. And for
dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
internal vm_max_prot_bits is set to RWX.

I patched the existing series eno

For Enarx I'm using the following patterns.

Shim mmap() handler:
1. Ask host for mmap() syscall.
2. Construct secinfo matching the protection bits.
3. For each page in the address range: EACCEPTCOPY with a zero page.

Shim mprotect() handler:
1. Ask host for mprotect() syscall.
2. For each page in the address range: EACCEPT with PROT_NONE secinfo
   and EMODPE with the secinfo having the prot bits.

Backend mprotect() handler:
1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address range
   with PROT_NONE.
2. Invoke real mprotect() syscall.

Not super-complicated.

That is the safest way to change permissions, i.e. use EMODPR only to
reset the permissions, and EMODPE as EMODP. Then the page is always either
completely inaccessible or has the correct permissions.

Any other ways to use EMODPR are a bit questionable. That's why I tend to
think that it would be better for the kernel to provide only a limited
version of it to reset the permissions. Most other uses will most likely
be misuse. IMHO there is only one legit pattern to use it, i.e. the
"least racy" pattern.

I would replace SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS with
SGX_IOC_ENCLAVE_RESET_PERMISSIONS that resets pages to PROT_NONE, or embed
this straight into mprotect().

BR, Jarkko
Hi Jarkko

I have some trouble understanding the sequences below.

On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <jarkko@kernel.org> wrote:

> On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
>> Hi All,
>>
>> Regarding the recent update of splitting the page permissions change
>> request into two IOCTLS (RELAX and RESTRICT), can we combine them into
>> one? That is, revert to how it was done in the v1 version?
>>
>> Why? Currently in Gramine (a library OS for unmodified applications,
>> https://gramineproject.io/) with the new proposed change, one needs to
>> store the page permission for each page or range of pages. And for every
>> request of `mmap` or `mprotect`, Gramine would have to do a lookup of the
>> page permissions for the request range and then call the respective IOCTL
>> either RESTRICT or RELAX. This seems a little overwhelming.
>>
>> Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do
>> an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
>> With this approach, we can avoid storing page permissions and simplify
>> the implementation.
>>
>> I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK` flows
>> to do TLB shootdowns which might not be needed for RELAX IOCTL but I am
>> not sure what will be the performance impact. Is there any data point to
>> see the performance impact?
>>
>> Thanks,
>> -Vijay
>
> This should get better in the next version. "relax" is gone. And for
> dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
> internal vm_max_prot_bits is set to RWX.
>
> I patched the existing series eno
>
> For Enarx I'm using the following patterns.
>
> Shim mmap() handler:
> 1. Ask host for mmap() syscall.
> 2. Construct secinfo matching the protection bits.
> 3. For each page in the address range: EACCEPTCOPY with a
>    zero page.

For EACCEPTCOPY to work, I believe PTE.RW is required for the target page.
So this only works for mmap(..., RW) or mmap(..., RWX). So that gives you
pages with RW/RWX.

To change permissions of any of those pages from RW/RWX to R/RX, you need to
call the ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. You
can't just do EMODPE.

So for RW->R, you either:

1) EMODPR(EPCM.NONE)
2) EACCEPT(EPCM.NONE)
3) EMODPE(R) -- not sure this would work as the spec says EMODPE requires
   "Read access permitted by enclave"

or:

1) EMODPR(EPCM.PROT_R)
2) EACCEPT(EPCM.PROT_R)

> Shim mprotect() handler:
> 1. Ask host for mprotect() syscall.
> 2. For each page in the address range: EACCEPT with PROT_NONE
>    secinfo and EMODPE with the secinfo having the prot bits.

EACCEPT requires PTE.R. And EAUG'd pages will always be initialized with
EPCM.RW, so EACCEPT(EPCM.PROT_NONE) will fail with
SGX_PAGE_ATTRIBUTES_MISMATCH.

> Backend mprotect() handler:
> 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
>    range with PROT_NONE.
> 2. Invoke real mprotect() syscall.
>

Note #1 can only be done after EACCEPT. MODPR is not allowed for pending
pages.

> Not super-complicated.
>
> That is the safest way to changes permissions i.e. use EMODPR only to
> reset the permissions, and EMODPE as EMODP. Then the page is always either
> inaccessible completely or with the correct permissions.
>
> Any other ways to use EMODPR are a bit questionable. That's why I tend to
> think that it would be better to kernel provide only limited version of
> it to reset the permissions. Most of the other use will be most likely
> mis-use. IMHO there is only one legit pattern to use it, i.e. "least
> racy" pattern.
>

I don't see it as "racy" if you copy some data into an RW page and reduce it
to R. From the kernel's point of view the only difference is EMODPR(NONE) vs
EMODPR(R). It's more efficient to do just EMODPR(R) than EMODPR(NONE) +
EMODPE(R).

Thanks
Haitao
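The ordering constraints raised in the message above (EAUG'd pages start as RW and pending, EMODPR is refused for pending pages, and EACCEPT fails when the enclave's expected permissions do not match the page) can be sketched as a toy state model. This is a deliberately simplified illustration, not a description of the hardware: `struct toy_epcm`, the `toy_*` functions and the `TOY_*` constants are all hypothetical names, and many real checks are left out.

```c
#include <stdbool.h>

/* Toy permission bits and result codes; names and values are illustrative. */
enum { TOY_R = 1, TOY_W = 2, TOY_X = 4 };
enum { TOY_OK = 0, TOY_NOT_MODIFIABLE = 1, TOY_ATTR_MISMATCH = 2 };

struct toy_epcm {
	unsigned perms; /* current permissions of the page */
	bool pending;   /* set by EAUG until the enclave accepts the page */
	bool pr;        /* set by EMODPR until the enclave accepts the change */
};

/* EAUG: a dynamically added page starts out RW and pending. */
void toy_eaug(struct toy_epcm *e)
{
	e->perms = TOY_R | TOY_W;
	e->pending = true;
	e->pr = false;
}

/* EMODPR (host side): refused while pending, otherwise only drops bits. */
int toy_emodpr(struct toy_epcm *e, unsigned perms)
{
	if (e->pending)
		return TOY_NOT_MODIFIABLE;
	e->perms &= perms;
	e->pr = true;
	return TOY_OK;
}

/* EACCEPT (enclave side): the expected permissions must match the page. */
int toy_eaccept(struct toy_epcm *e, unsigned perms)
{
	if ((!e->pending && !e->pr) || perms != e->perms)
		return TOY_ATTR_MISMATCH;
	e->pending = false;
	e->pr = false;
	return TOY_OK;
}
```

In this model the two-step RW->R sequence is `toy_emodpr(&page, TOY_R)` followed by `toy_eaccept(&page, TOY_R)`, and both calls are only valid once the initial `toy_eaccept(&page, TOY_R | TOY_W)` has cleared the pending state.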
On Thu, Mar 10, 2022 at 12:33:20PM -0600, Haitao Huang wrote:
> Hi Jarkko
>
> I have some trouble understanding the sequences below.
>
> On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
>
> > On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote:
> > > Hi All,
> > >
> > > Regarding the recent update of splitting the page permissions change
> > > request into two IOCTLS (RELAX and RESTRICT), can we combine them into
> > > one? That is, revert to how it was done in the v1 version?
> > >
> > > Why? Currently in Gramine (a library OS for unmodified applications,
> > > https://gramineproject.io/) with the new proposed change, one needs to
> > > store the page permission for each page or range of pages. And for every
> > > request of `mmap` or `mprotect`, Gramine would have to do a lookup
> > > of the page permissions for the request range and then call the
> > > respective IOCTL either RESTRICT or RELAX. This seems a little
> > > overwhelming.
> > >
> > > Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do
> > > an `EACCEPT` irrespective of RELAX or RESTRICT page permission request?
> > > With this approach, we can avoid storing page permissions and simplify
> > > the implementation.
> > >
> > > I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK`
> > > flows to do TLB shootdowns which might not be needed for RELAX IOCTL
> > > but I am not sure what will be the performance impact. Is there any
> > > data point to see the performance impact?
> > >
> > > Thanks,
> > > -Vijay
> >
> > This should get better in the next version. "relax" is gone. And for
> > dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e.
> > internal vm_max_prot_bits is set to RWX.
> >
> > I patched the existing series eno
> >
> > For Enarx I'm using the following patterns.
> >
> > Shim mmap() handler:
> > 1. Ask host for mmap() syscall.
> > 2. Construct secinfo matching the protection bits.
> > 3. For each page in the address range: EACCEPTCOPY with a
> >    zero page.
>
> For EACCEPTCOPY to work, I believe PTE.RW is required for the target page.
> So this only works for mmap(..., RW) or mmap(...,RWX).

I use it only with EAUG.

> So that gives you pages with RW/RWX.
>
> To change permissions of any of those pages from RW/RWX to R/RX , you need
> call ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. you can't
> just do EMODPE.
>
> so for RW->R, you either:
>
> 1)EMODPR(EPCM.NONE)
> 2)EACCEPT(EPCM.NONE)
> 3)EMODPE(R) -- not sure this would work as spec says EMODPE requires "Read
> access permitted by enclave"
>
> or:
>
> 1)EMODPR(EPCM.PROT_R)
> 2)EACCEPT(EPCM.PROT_R)

I checked the SDM and you're correct.

Then the appropriate thing is to reset to R.

> > Shim mprotect() handler:
> > 1. Ask host for mprotect() syscall.
> > 2. For each page in the address range: EACCEPT with PROT_NONE
> >    secinfo and EMODPE with the secinfo having the prot bits.
>
> EACCEPT requires PTE.R. And EAUG'd pages will always initialized with
> EPCM.RW,
> so EACCEPT(EPCM.PROT_NONE) will fail with SGX_PAGE_ATTRIBUTES_MISMATCH.

Ditto.

> > Backend mprotect() handler:
> > 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address
> >    range with PROT_NONE.
> > 2. Invoke real mprotect() syscall.
> >
> Note #1 can only be done after EACCEPT. MODPR is not allowed for pending
> pages.

Yes, and that's what I'm doing. After that the shim does EACCEPTs in a loop.

Reinette, the ioctl should already check that either R or W is set in
secinfo and return -EACCES.

I.e.

(* Check for misconfigured SECINFO flags*)
IF ( (SCRATCH_SECINFO reserved fields are not zero ) or
     (SCRATCH_SECINFO.FLAGS.R is 0 and SCRATCH_SECINFO.FLAGS.W is not 0) )
    THEN #GP(0); FI;

I was testing this and wondering why my enclave got a #GP, and then I checked
the SDM after reading Haitao's response. So clearly a check on the kernel
side is needed.

BR, Jarkko
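The SDM condition quoted in the message above translates into a small validation helper. The following is a user-space sketch only, not the code from the patch series: `demo_check_restrict_perms` and the `DEMO_SECINFO_*` macros are hypothetical names, the bit positions assume R, W and X occupy bits 0 to 2 of SECINFO.FLAGS, the -EACCES return value follows the suggestion in the message, and returning -EINVAL for any other bit is an assumption made for this example.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed layout: R, W and X are bits 0, 1 and 2 of SECINFO.FLAGS. */
#define DEMO_SECINFO_R (UINT64_C(1) << 0)
#define DEMO_SECINFO_W (UINT64_C(1) << 1)
#define DEMO_SECINFO_X (UINT64_C(1) << 2)
#define DEMO_SECINFO_PERM_MASK \
	(DEMO_SECINFO_R | DEMO_SECINFO_W | DEMO_SECINFO_X)

/*
 * Validate the requested permissions before they would reach EMODPR.
 * Returns 0 if the request is acceptable, a negative errno otherwise.
 */
int demo_check_restrict_perms(uint64_t perms)
{
	/* Assumption of this sketch: only permission bits may be passed. */
	if (perms & ~DEMO_SECINFO_PERM_MASK)
		return -EINVAL;

	/* Mirrors the quoted condition: W set while R is clear is rejected. */
	if ((perms & DEMO_SECINFO_W) && !(perms & DEMO_SECINFO_R))
		return -EACCES;

	return 0;
}
```

Doing the check before issuing the instruction lets the caller see an error code instead of the fault described in the message.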
On Fri, Mar 11, 2022 at 02:10:24PM +0200, Jarkko Sakkinen wrote: > On Thu, Mar 10, 2022 at 12:33:20PM -0600, Haitao Huang wrote: > > Hi Jarkko > > > > I have some trouble understanding the sequences below. > > > > On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <jarkko@kernel.org> > > wrote: > > > > > On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote: > > > > Hi All, > > > > > > > > Regarding the recent update of splitting the page permissions change > > > > request into two IOCTLS (RELAX and RESTRICT), can we combine them into > > > > one? That is, revert to how it was done in the v1 version? > > > > > > > > Why? Currently in Gramine (a library OS for unmodified applications, > > > > https://gramineproject.io/) with the new proposed change, one needs to > > > > store the page permission for each page or range of pages. And for every > > > > request of `mmap` or `mprotect`, Gramine would have to do a lookup > > > > of the > > > > page permissions for the request range and then call the respective > > > > IOCTL > > > > either RESTRICT or RELAX. This seems a little overwhelming. > > > > > > > > Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do > > > > an `EACCEPT` irrespective of RELAX or RESTRICT page permission request? > > > > With this approach, we can avoid storing page permissions and simplify > > > > the implementation. > > > > > > > > I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK` > > > > flows > > > > to do TLB shootdowns which might not be needed for RELAX IOCTL but I am > > > > not sure what will be the performance impact. Is there any data point to > > > > see the performance impact? > > > > > > > > Thanks, > > > > -Vijay > > > > > > This should get better in the next versuin. "relax" is gone. And for > > > dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e. > > > internal vm_max_prot_bits is set to RWX. 
> > > > > > I patched the existing series eno > > > > > > For Enarx I'm using the following patterns. > > > > > > Shim mmap() handler: > > > 1. Ask host for mmap() syscall. > > > 2. Construct secinfo matching the protection bits. > > > 3. For each page in the address range: EACCEPTCOPY with a > > > zero page. > > > > For EACCEPTCOPY to work, I believe PTE.RW is required for the target page. > > So this only works for mmap(..., RW) or mmap(...,RWX). > > I use it only with EAUG. > > > So that gives you pages with RW/RWX. > > > > To change permissions of any of those pages from RW/RWX to R/RX , you need > > call ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. you can't > > just do EMODPE. > > > > so for RW->R, you either: > > > > 1)EMODPR(EPCM.NONE) > > 2)EACCEPT(EPCM.NONE) > > 3)EMODPE(R) -- not sure this would work as spec says EMODPE requires "Read > > access permitted by enclave" > > > > or: > > > > 1)EMODPR(EPCM.PROT_R) > > 2)EACCEPT(EPCM.PROT_R) > > I checked from SDM and you're correct. > > Then the appropriate thing is to reset to R. > > > > Shim mprotect() handler: > > > 1. Ask host for mprotect() syscall. > > > 2. For each page in the address range: EACCEPT with PROT_NONE > > > secinfo and EMODPE with the secinfo having the prot bits. > > > > EACCEPT requires PTE.R. And EAUG'd pages will always initialized with > > EPCM.RW, > > so EACCEPT(EPCM.PROT_NONE) will fail with SGX_PAGE_ATTRIBUTES_MISMATCH. > > Ditto. > > > > Backend mprotect() handler: > > > 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address > > > range with PROT_NONE. > > > 2. Invoke real mprotect() syscall. > > > > > Note #1 can only be done after EACCEPT. MODPR is not allowed for pending > > pages. > > Yes, and that's what I'm doing. After that shim does EACCEPT's in a loop. > > Reinette, the ioctl should already check that either R or W is set in > secinfo and return -EACCES. > > I.e. 
>
> (* Check for misconfigured SECINFO flags*)
> IF ( (SCRATCH_SECINFO reserved fields are not zero ) or
>      (SCRATCH_SECINFO.FLAGS.R is 0 and SCRATCH_SECINFO.FLAGS.W is not 0) )
>     THEN #GP(0); FI;
>
> I was testing this and wondering why my enclave #GP's, and then I checked
> SDM after reading Haitao's response. So clearly check in kernel side is
> needed.

I would also consider adding such a check to "add pages". It's our least
common denominator.

If we can assume that at least R is there for every enclave page, then it
gives an invariant that enables EMODPR with R all the time.

BR, Jarkko
On Fri, Mar 11, 2022 at 02:16:47PM +0200, Jarkko Sakkinen wrote: > On Fri, Mar 11, 2022 at 02:10:24PM +0200, Jarkko Sakkinen wrote: > > On Thu, Mar 10, 2022 at 12:33:20PM -0600, Haitao Huang wrote: > > > Hi Jarkko > > > > > > I have some trouble understanding the sequences below. > > > > > > On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <jarkko@kernel.org> > > > wrote: > > > > > > > On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote: > > > > > Hi All, > > > > > > > > > > Regarding the recent update of splitting the page permissions change > > > > > request into two IOCTLS (RELAX and RESTRICT), can we combine them into > > > > > one? That is, revert to how it was done in the v1 version? > > > > > > > > > > Why? Currently in Gramine (a library OS for unmodified applications, > > > > > https://gramineproject.io/) with the new proposed change, one needs to > > > > > store the page permission for each page or range of pages. And for every > > > > > request of `mmap` or `mprotect`, Gramine would have to do a lookup > > > > > of the > > > > > page permissions for the request range and then call the respective > > > > > IOCTL > > > > > either RESTRICT or RELAX. This seems a little overwhelming. > > > > > > > > > > Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do > > > > > an `EACCEPT` irrespective of RELAX or RESTRICT page permission request? > > > > > With this approach, we can avoid storing page permissions and simplify > > > > > the implementation. > > > > > > > > > > I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK` > > > > > flows > > > > > to do TLB shootdowns which might not be needed for RELAX IOCTL but I am > > > > > not sure what will be the performance impact. Is there any data point to > > > > > see the performance impact? > > > > > > > > > > Thanks, > > > > > -Vijay > > > > > > > > This should get better in the next versuin. "relax" is gone. 
And for > > > > dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e. > > > > internal vm_max_prot_bits is set to RWX. > > > > > > > > I patched the existing series eno > > > > > > > > For Enarx I'm using the following patterns. > > > > > > > > Shim mmap() handler: > > > > 1. Ask host for mmap() syscall. > > > > 2. Construct secinfo matching the protection bits. > > > > 3. For each page in the address range: EACCEPTCOPY with a > > > > zero page. > > > > > > For EACCEPTCOPY to work, I believe PTE.RW is required for the target page. > > > So this only works for mmap(..., RW) or mmap(...,RWX). > > > > I use it only with EAUG. > > > > > So that gives you pages with RW/RWX. > > > > > > To change permissions of any of those pages from RW/RWX to R/RX , you need > > > call ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. you can't > > > just do EMODPE. > > > > > > so for RW->R, you either: > > > > > > 1)EMODPR(EPCM.NONE) > > > 2)EACCEPT(EPCM.NONE) > > > 3)EMODPE(R) -- not sure this would work as spec says EMODPE requires "Read > > > access permitted by enclave" > > > > > > or: > > > > > > 1)EMODPR(EPCM.PROT_R) > > > 2)EACCEPT(EPCM.PROT_R) > > > > I checked from SDM and you're correct. > > > > Then the appropriate thing is to reset to R. > > > > > > Shim mprotect() handler: > > > > 1. Ask host for mprotect() syscall. > > > > 2. For each page in the address range: EACCEPT with PROT_NONE > > > > secinfo and EMODPE with the secinfo having the prot bits. > > > > > > EACCEPT requires PTE.R. And EAUG'd pages will always initialized with > > > EPCM.RW, > > > so EACCEPT(EPCM.PROT_NONE) will fail with SGX_PAGE_ATTRIBUTES_MISMATCH. > > > > Ditto. > > > > > > Backend mprotect() handler: > > > > 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address > > > > range with PROT_NONE. > > > > 2. Invoke real mprotect() syscall. > > > > > > > Note #1 can only be done after EACCEPT. MODPR is not allowed for pending > > > pages. > > > > Yes, and that's what I'm doing. 
After that shim does EACCEPT's in a loop.
> >
> > Reinette, the ioctl should already check that either R or W is set in
> > secinfo and return -EACCES.
> >
> > I.e.
> >
> > (* Check for misconfigured SECINFO flags*)
> > IF ( (SCRATCH_SECINFO reserved fields are not zero ) or
> >      (SCRATCH_SECINFO.FLAGS.R is 0 and SCRATCH_SECINFO.FLAGS.W is not 0) )
> >     THEN #GP(0); FI;
> >
> > I was testing this and wondering why my enclave #GP's, and then I checked
> > SDM after reading Haitao's response. So clearly check in kernel side is
> > needed.
>
> I would consider also adding such check "add pages". It's our least common
> denominator.
>
> If we can assume that at least R is there for every enclave page, then it
> gives invariant that enables EMODPR with R all the time.

Since EAUG is already done in the #PF handler, EMODPR must be too.
Otherwise we do things inconsistently [*]. Having one in the #PF handler
and the other in an ioctl is unacceptable.

Moving EMODPR to the #PF handler would be trivial:

1. In the mprotect() callback, unmap the PTEs for the range.
2. In the #PF handler, EMODPR with read permissions.

This is something that would be understandable for user space. The only
API ever required for permission changes would be EMODPE. You could
basically implement the whole thing for EPCM inside the enclave with no
ioctls required.

That would leave only these ioctls in the series:

1. SGX_IOC_ENCLAVE_MODIFY_TYPE
2. SGX_IOC_ENCLAVE_REMOVE_PAGES

[*] For me, sticking to the #PF handler for EAUG is fine for the first
    mainline version. The API side is far more critical.

BR, Jarkko
Hi Jarkko, On 3/11/2022 4:16 AM, Jarkko Sakkinen wrote: > On Fri, Mar 11, 2022 at 02:10:24PM +0200, Jarkko Sakkinen wrote: >> On Thu, Mar 10, 2022 at 12:33:20PM -0600, Haitao Huang wrote: >>> Hi Jarkko >>> >>> I have some trouble understanding the sequences below. >>> >>> On Thu, 10 Mar 2022 00:10:48 -0600, Jarkko Sakkinen <jarkko@kernel.org> >>> wrote: >>> >>>> On Wed, Feb 23, 2022 at 07:21:50PM +0000, Dhanraj, Vijay wrote: >>>>> Hi All, >>>>> >>>>> Regarding the recent update of splitting the page permissions change >>>>> request into two IOCTLS (RELAX and RESTRICT), can we combine them into >>>>> one? That is, revert to how it was done in the v1 version? >>>>> >>>>> Why? Currently in Gramine (a library OS for unmodified applications, >>>>> https://gramineproject.io/) with the new proposed change, one needs to >>>>> store the page permission for each page or range of pages. And for every >>>>> request of `mmap` or `mprotect`, Gramine would have to do a lookup >>>>> of the >>>>> page permissions for the request range and then call the respective >>>>> IOCTL >>>>> either RESTRICT or RELAX. This seems a little overwhelming. >>>>> >>>>> Request: Instead, can we do `MODPE`, call `RESTRICT` IOCTL, and then do >>>>> an `EACCEPT` irrespective of RELAX or RESTRICT page permission request? >>>>> With this approach, we can avoid storing page permissions and simplify >>>>> the implementation. >>>>> >>>>> I understand RESTRICT IOCTL would do a `MODPR` and trigger `ETRACK` >>>>> flows >>>>> to do TLB shootdowns which might not be needed for RELAX IOCTL but I am >>>>> not sure what will be the performance impact. Is there any data point to >>>>> see the performance impact? >>>>> >>>>> Thanks, >>>>> -Vijay >>>> >>>> This should get better in the next versuin. "relax" is gone. And for >>>> dynamic EAUG'd pages only VMA and EPCM permissions matter, i.e. >>>> internal vm_max_prot_bits is set to RWX. 
>>>> >>>> I patched the existing series eno >>>> >>>> For Enarx I'm using the following patterns. >>>> >>>> Shim mmap() handler: >>>> 1. Ask host for mmap() syscall. >>>> 2. Construct secinfo matching the protection bits. >>>> 3. For each page in the address range: EACCEPTCOPY with a >>>> zero page. >>> >>> For EACCEPTCOPY to work, I believe PTE.RW is required for the target page. >>> So this only works for mmap(..., RW) or mmap(...,RWX). >> >> I use it only with EAUG. >> >>> So that gives you pages with RW/RWX. >>> >>> To change permissions of any of those pages from RW/RWX to R/RX , you need >>> call ENCLAVE_RESTRICT_PERMISSIONS ioctl with R or with PROT_NONE. you can't >>> just do EMODPE. >>> >>> so for RW->R, you either: >>> >>> 1)EMODPR(EPCM.NONE) >>> 2)EACCEPT(EPCM.NONE) >>> 3)EMODPE(R) -- not sure this would work as spec says EMODPE requires "Read >>> access permitted by enclave" >>> >>> or: >>> >>> 1)EMODPR(EPCM.PROT_R) >>> 2)EACCEPT(EPCM.PROT_R) >> >> I checked from SDM and you're correct. >> >> Then the appropriate thing is to reset to R. >> >>>> Shim mprotect() handler: >>>> 1. Ask host for mprotect() syscall. >>>> 2. For each page in the address range: EACCEPT with PROT_NONE >>>> secinfo and EMODPE with the secinfo having the prot bits. >>> >>> EACCEPT requires PTE.R. And EAUG'd pages will always initialized with >>> EPCM.RW, >>> so EACCEPT(EPCM.PROT_NONE) will fail with SGX_PAGE_ATTRIBUTES_MISMATCH. >> >> Ditto. >> >>>> Backend mprotect() handler: >>>> 1. Invoke ENCLAVE_RESTRICT_PERMISSIONS ioctl for the address >>>> range with PROT_NONE. >>>> 2. Invoke real mprotect() syscall. >>>> >>> Note #1 can only be done after EACCEPT. MODPR is not allowed for pending >>> pages. >> >> Yes, and that's what I'm doing. After that shim does EACCEPT's in a loop. >> >> Reinette, the ioctl should already check that either R or W is set in >> secinfo and return -EACCES. >> >> I.e. 
>>
>> (* Check for misconfigured SECINFO flags*)
>> IF ( (SCRATCH_SECINFO reserved fields are not zero ) or
>>      (SCRATCH_SECINFO.FLAGS.R is 0 and SCRATCH_SECINFO.FLAGS.W is not 0) )
>>     THEN #GP(0); FI;
>>
>> I was testing this and wondering why my enclave #GP's, and then I checked
>> SDM after reading Haitao's response. So clearly check in kernel side is
>> needed.

I do not believe that you encountered the #GP documented above because that
check is already present in the current implementation of
SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:

sgx_ioc_enclave_restrict_permissions()->sgx_perm_from_user_secinfo():
	if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R))
		return -EINVAL;

It does return EINVAL which is the catch-all error code used to represent
invalid input from user space. I am not convinced that EACCES should be used
instead though, EACCES means "Permission denied", which is not the case here.
The case here is just an invalid request.

It currently does not prevent the user from setting PROT_NONE though, which
EMODPR does seem to allow.

I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
This motivates that EMODPR->PROT_NONE should not be allowed since it would
not be possible to relax permissions (run EMODPE) after that. Even so, I
also found in the SDM that EACCEPT has the note "Read access permitted
by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
from that perspective either since the enclave will not be able to
EACCEPT the change. Does that match your understanding?

I will add the check for R in SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS at least.

> I would consider also adding such check "add pages". It's our least common
> denominator.
>
> If we can assume that at least R is there for every enclave page, then it
> gives invariant that enables EMODPR with R all the time.

Adding pages without permissions to an enclave does not seem practical. I do
not know if there are such usages. I can add this as a separate change for
consideration.

Reinette
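[Editorial sketch] The validation discussed above (the existing "W without R" rejection plus the proposed "R must be set" rule) could look roughly like the following. This is a standalone illustration, not the kernel's code: the helper name `check_restrict_perm()` and the `SECINFO_*` macros with their bit values are assumptions made for this example.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed permission bit values, for illustration only. */
#define SECINFO_R (UINT64_C(1) << 0)
#define SECINFO_W (UINT64_C(1) << 1)
#define SECINFO_X (UINT64_C(1) << 2)
#define SECINFO_PERM_MASK (SECINFO_R | SECINFO_W | SECINFO_X)

/*
 * Hypothetical helper: validate permissions requested for a restrict
 * operation. Returns 0 if acceptable, -EINVAL for an invalid request.
 */
static int check_restrict_perm(uint64_t perm)
{
	/* Reject anything outside the R/W/X bits. */
	if (perm & ~SECINFO_PERM_MASK)
		return -EINVAL;

	/*
	 * Check quoted in the thread: W without R is a misconfigured
	 * SECINFO. It is subsumed by the next check and is kept here
	 * only to mirror the quoted code.
	 */
	if ((perm & SECINFO_W) && !(perm & SECINFO_R))
		return -EINVAL;

	/* Proposed extra rule: R must always be present (no PROT_NONE). */
	if (!(perm & SECINFO_R))
		return -EINVAL;

	return 0;
}
```

With this rule a request for R, RW, or RX passes, while a request for 0 (PROT_NONE) or W alone is rejected as invalid input before any hardware instruction would be attempted.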
On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> I do not believe that you encountered the #GP documented above because that
> check is already present in the current implementation of
> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
>
> sgx_ioc_enclave_restrict_permissions()->sgx_perm_from_user_secinfo():
> 	if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R))
> 		return -EINVAL;
>
> It does return EINVAL which is the catch-all error code used to represent
> invalid input from user space. I am not convinced that EACCES should be used
> instead though, EACCES means "Permission denied", which is not the case here.
> The case here is just an invalid request.
>
> It currently does not prevent the user from setting PROT_NONE though, which
> EMODPR does seem to allow.
>
> I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
> This motivates that EMODPR->PROT_NONE should not be allowed since it would
> not be possible to relax permissions (run EMODPE) after that. Even so, I
> also found in the SDM that EACCEPT has the note "Read access permitted
> by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
> from that perspective either since the enclave will not be able to
> EACCEPT the change. Does that match your understanding?
>
> I will add the check for R in SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS at least.

Yes, I think we are on the same page with this.

But there is another thing.

As EAUG is taken care of by the page fault handler, so should EMODPR be. It
makes the developer experience a whole lot easier when you don't have to
call back to the host and ask it to execute EMODPR for the range.

It's also a huge inconsistency in this patch set that they are handled
differently.

And it creates a concurrency case for user space that is complicated to say
the least, i.e. dividing the work of executing EMODPR between the host and
the enclave implementation is a nightmare scenario. On the other hand this
is trivial to sort out in the kernel.

So what it means is that, in one way or another, mprotect() needs to be the
melting point for both. This can be called a mandatory requirement, however
this patch set is done, not least because of managing concurrency between
kernel and user space.

You can get that done by these steps:

1. Unmap the PTEs in the mprotect() flow.
2. In the #PF handler, EMODPR with R set.

This is a clear API for the enclave developer because you know what state
the pages are in after mprotect(), and what you still need to do to them.
Only the syscall then needs to be performed by the host side.

BR, Jarkko
Hi Jarkko, On 3/11/2022 10:11 AM, Jarkko Sakkinen wrote: > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote: > >> I do not believe that you encountered the #GP documented above because that >> check is already present in the current implementation of >> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS: >> >> sgx_ioc_enclave_restrict_permissions()->sgx_perm_from_user_secinfo(): >> if ((perm & SGX_SECINFO_W) && !(perm & SGX_SECINFO_R)) >> return -EINVAL; >> >> It does return EINVAL which is the catch-all error code used to represent >> invalid input from user space. I am not convinced that EACCES should be used >> instead though, EACCES means "Permission denied", which is not the case here. >> The case here is just an invalid request. >> >> It currently does not prevent the user from setting PROT_NONE though, which >> EMODPR does seem to allow. >> >> I saw Haitao's note that EMODPE requires "Read access permitted by enclave". >> This motivates that EMODPR->PROT_NONE should not be allowed since it would >> not be possible to relax permissions (run EMODPE) after that. Even so, I >> also found in the SDM that EACCEPT has the note "Read access permitted >> by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical >> from that perspective either since the enclave will not be able to >> EACCEPT the change. Does that match your understanding? >> >> I will add the check for R in SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS at least. > > Yes, I think we are in the same line with this. > > But there is another thing. > > As EAUG is taken care by the page handler so should EMODPR. It makes the > developer experience whole a lot easier when you don't have to back call > to host and ask it to execute EMODPR for the range. > > It's also a huge incosistency in this patch set that they are handled > differently. > > And it creates a concurrency case for user space that is complicated to say > the least, i.e. 
divided work between host and enclave implementation to
> execute EMODPR is a nightmare scenario. On the other hand this is trivial
> to sort out in kernel.

EMODPR has possible failures due to state that is managed by the user space
runtime. Being able to communicate accurate EMODPR error codes to user
space runtime is helpful to the runtime in supporting its management of
the enclave memory. Accurate EMODPR error codes can be communicated when
using an ioctl(), not when run from within a page fault handler.

> So what it means that, in one way or antoher, mprotect() needs to be the
> melting point for both.

mprotect() is the syscall to modify VMA permissions. EPCM permissions are
different from VMA permissions and they are currently treated differently
by the kernel. Moving EPCM permission changes to mprotect() forces
EPCM permissions to be the same as VMA permissions. That is a significant
change. It is also inconsistent since EPCM permission changes cannot be
managed completely from the kernel since the kernel can only ever restrict
permissions.

> This can be called mandatory requirement, however
> this patch set it done, not least because of managing concurrency between
> kernel and user space.
>
> You can get that done by these steps:
>
> 1. Unmap PTE's in mprotect() flow.
> 2. In #PF handler, EMODPR with R set.

There is also the very significant ETRACK flow that needs to be run after
EMODPR. The implications of sending IPIs to all CPUs that may be running
in an enclave while in a page fault handler need to be considered. Page
faults should be as fast as possible.

If this is considered then this tremendous impact on the page fault handler
should be managed and avoided as much as possible - but how will the page
fault handler even know when it should run EMODPR? The enclave can run EMODPE
from within the enclave at any time without any insight from the kernel so
the only way to have accurate permissions would then be to run EMODPR
on _every_ page fault, which is obviously a non-starter due to the
significant impact (EMODPR and ETRACK) and blast radius (IPIs).

Trying to move running of EMODPR earlier, during the mprotect() call itself
is also full of obstacles since the mprotect() call may result in VMAs being
split, which is an operation that can fail, and followed by the EMODPR-ETRACK
flows that can also fail (and not be able to undo the VMA splits). With the
EMODPR-ETRACK flows that can fail it is here also not possible to
communicate accurately to user space since now there is the whole page range
to consider, for example, mprotect() cannot communicate (a) which pages
caused the failure, and (b) what failure was encountered. This is possible
when using the ioctl().

> This clear API for enclave developer because you know in what state pages
> are after mprotect(), and what you need to still do to them. Only the
> syscall needs to be them performed by the host side.

Supporting permission restriction in an ioctl() enables the runtime to manage
the enclave memory without needing to map it.

I have considered the idea of supporting the permission restriction with
mprotect() but as you can see in this response I did not find it to be
practical.

Reinette
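[Editorial sketch] The partial-success reporting argued for above (an error code plus the number of pages changed and the hardware status of the failing page) can be modelled in plain C as below. This is a toy model under stated assumptions: `struct restrict_result`, `restrict_range()`, the `page_op_t` callback, the `fail_at()` demo stub, and the choice of -EFAULT are all hypothetical and are not taken from the patch set.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical result record, loosely following the ioctl() description. */
struct restrict_result {
	size_t count;	/* pages changed successfully before stopping */
	int hw_status;	/* stand-in for the SGX return code, 0 on success */
};

/* Hypothetical per-page operation: 0 on success, non-zero status on failure. */
typedef int (*page_op_t)(size_t page_index, void *ctx);

/*
 * Apply op to each page in order and stop at the first failure, so the
 * caller learns both which page failed (res->count) and why
 * (res->hw_status). The -EFAULT return value is an arbitrary choice.
 */
static int restrict_range(size_t nr_pages, page_op_t op, void *ctx,
			  struct restrict_result *res)
{
	res->count = 0;
	res->hw_status = 0;

	for (size_t i = 0; i < nr_pages; i++) {
		int status = op(i, ctx);

		if (status) {
			res->hw_status = status;
			return -EFAULT;
		}
		res->count++;
	}
	return 0;
}

/* Demo stub: fails with status 7 at the page index stored in *ctx. */
static int fail_at(size_t page_index, void *ctx)
{
	const size_t *bad = ctx;

	return page_index == *bad ? 7 : 0;
}
```

A caller that sees a failure can resume from page `res.count` after handling the reported status. A plain mprotect() return value has no room for that information, which is the point made in the email above.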
On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
> I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
> This motivates that EMODPR->PROT_NONE should not be allowed since it would
> not be possible to relax permissions (run EMODPE) after that. Even so, I
> also found in the SDM that EACCEPT has the note "Read access permitted
> by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
> from that perspective either since the enclave will not be able to
> EACCEPT the change. Does that match your understanding?

Yes, PROT_NONE should not be allowed.

This is however the real problem.

The current kernel patch set has an inconsistent API and the EMODPR ioctl is
simply unacceptable. It also requires more concurrency management from the
user space run-time, which would be a heck of a lot easier to do in the
kernel.

If you really want EMODPR as an ioctl, then for consistency's sake EAUG
should be one too. When things go in opposite directions like this, this
patch set plain and simply will not work out.

I would pick EAUG's strategy from these two as it requires half the back
calls to host from an enclave. I.e. please combine mprotect() and EMODPR,
either in the #PF handler or as part of mprotect(), whichever suits you
best.

I'll try to demonstrate this with two examples.

mmap() could go something like this (simplified):
1. Execution #UD's to SYSCALL.
2. Host calls enclave's mmap() handler with mmap() parameters.
3. Enclave up-calls host's mmap().
4. Loops the range with EACCEPTCOPY.

mprotect() has to be done like this:
1. Execution #UD's to SYSCALL.
2. Host calls enclave's mprotect() handler.
3. Enclave up-calls host's mprotect().
4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
3. Loops the range with EACCEPT.

This is just terrible IMHO. I hope these examples bring some insight.

BR, Jarkko
On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote:
>
> > I saw Haitao's note that EMODPE requires "Read access permitted by enclave".
> > This motivates that EMODPR->PROT_NONE should not be allowed since it would
> > not be possible to relax permissions (run EMODPE) after that. Even so, I
> > also found in the SDM that EACCEPT has the note "Read access permitted
> > by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical
> > from that perspective either since the enclave will not be able to
> > EACCEPT the change. Does that match your understanding?
>
> Yes, PROT_NONE should not be allowed.
>
> This is however the real problem.
>
> The current kernel patch set has inconsistent API and EMODPR ioctl is
> simply unacceptable. It also requires more concurrency management from
> user space run-time, which would be heck a lot easier to do in the kernel.
>
> If you really want EMODPR as ioctl, then for consistencys sake, then EAUG
> should be too. Like this when things go opposite directions, this patch set
> plain and simply will not work out.
>
> I would pick EAUG's strategy from these two as it requires half the back
> calls to host from an enclave. I.e. please combine mprotect() and EMODPR,
> either in the #PF handler or as part of mprotect(), which ever suits you
> best.
>
> I'll try demonstrate this with two examples.
>
> mmap() could go something like this() (simplified):
> 1. Execution #UD's to SYSCALL.
> 2. Host calls enclave's mmap() handler with mmap() parameters.
> 3. Enclave up-calls host's mmap().
> 4. Loops the range with EACCEPTCOPY.
>
> mprotect() has to be done like this:
> 1. Execution #UD's to SYSCALL.
> 2. Host calls enclave's mprotect() handler.
> 3. Enclave up-calls host's mprotect().
> 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
> 3. Loops the range with EACCEPT.
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  5. Loops the range with EACCEPT + EMODPE.

> This is just terrible IMHO. I hope these examples bring some insight.

BR, Jarkko
On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote: > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote: > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote: > > > > > I saw Haitao's note that EMODPE requires "Read access permitted by enclave". > > > This motivates that EMODPR->PROT_NONE should not be allowed since it would > > > not be possible to relax permissions (run EMODPE) after that. Even so, I > > > also found in the SDM that EACCEPT has the note "Read access permitted > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is not practical > > > from that perspective either since the enclave will not be able to > > > EACCEPT the change. Does that match your understanding? > > > > Yes, PROT_NONE should not be allowed. > > > > This is however the real problem. > > > > The current kernel patch set has inconsistent API and EMODPR ioctl is > > simply unacceptable. It also requires more concurrency management from > > user space run-time, which would be heck a lot easier to do in the kernel. > > > > If you really want EMODPR as ioctl, then for consistencys sake, then EAUG > > should be too. Like this when things go opposite directions, this patch set > > plain and simply will not work out. > > > > I would pick EAUG's strategy from these two as it requires half the back > > calls to host from an enclave. I.e. please combine mprotect() and EMODPR, > > either in the #PF handler or as part of mprotect(), which ever suits you > > best. > > > > I'll try demonstrate this with two examples. > > > > mmap() could go something like this() (simplified): > > 1. Execution #UD's to SYSCALL. > > 2. Host calls enclave's mmap() handler with mmap() parameters. > > 3. Enclave up-calls host's mmap(). > > 4. Loops the range with EACCEPTCOPY. > > > > mprotect() has to be done like this: > > 1. Execution #UD's to SYSCALL. > > 2. Host calls enclave's mprotect() handler. > > 3. Enclave up-calls host's mprotect(). > > 4. 
Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS.
> > 3. Loops the range with EACCEPT.
>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   5. Loops the range with EACCEPT + EMODPE.
>
> > This is just terrible IMHO. I hope these examples bring some insight.

E.g. in Enarx we have to add a special up-call (a so-called enarxcall in the
intermediate layer that we call sallyport, which provides a shared buffer to
communicate with the enclave) just for resetting the range with PROT_READ.
It feels very redundant, adds ugly cruft and is the complete opposite of the
strategy you've chosen for EAUG, which I think is the correct choice as far
as the API is concerned.

BR, Jarkko
On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> Supporting permission restriction in an ioctl() enables the runtime to manage
> the enclave memory without needing to map it.

Which is the opposite of what you do with EAUG. You can also augment pages
without needing to map them. Sure, you get that capability, but it is quite
useless in practice.

> I have considered the idea of supporting the permission restriction with
> mprotect() but as you can see in this response I did not find it to be
> practical.

Where is it practical? What is your application? How is it practical to
delegate the concurrency management of a split mprotect() to user space?
How do we get rid of a useless up-call to the host?

> Reinette

BR, Jarkko
On Mon, Mar 14, 2022 at 05:42:43AM +0200, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> > Supporting permission restriction in an ioctl() enables the runtime to manage
> > the enclave memory without needing to map it.
>
> Which is opposite what you do in EAUG. You can also augment pages without
> needing the map them. Sure you get that capability, but it is quite useless
> in practice.

Essentially you are tuning for a niche, artificial use case over the common
case that most people end up doing. It makes no sense.

BR, Jarkko
On Mon, Mar 14, 2022 at 05:45:48AM +0200, Jarkko Sakkinen wrote:
> On Mon, Mar 14, 2022 at 05:42:43AM +0200, Jarkko Sakkinen wrote:
> > On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> > > Supporting permission restriction in an ioctl() enables the runtime to manage
> > > the enclave memory without needing to map it.
> >
> > Which is opposite what you do in EAUG. You can also augment pages without
> > needing the map them. Sure you get that capability, but it is quite useless
> > in practice.
>
> Essentially you are tuning for a niche artifical use case over the common
> case that most people end up doing. It makes no sense.

Also it is important to remember why EMODPR is there: it is not there to
bring a useful control mechanism or interesting applications to SGX. It's
there because of hardware constraints. Therefore it should be used
accordingly, and its interface should certainly not be fully exposed to
user space.

Without hardware constraints, we would have only an in-enclave EMODP. EMODPR
is essentially a reset mechanism for EPCM, no more and no less. Therefore,
it should be used as such, with a *fixed* value picked for resetting the
EPCM permissions of the mapped range. I think PROT_READ is the sanest choice
of the available options. Then, EMODPE can be used for the most part just
like "EMODP".

Please do not fully expose EMODPR to user space. It's a Pandora's box of
misbehaviour and of shooting yourself in the foot.

BR, Jarkko
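[Editorial sketch] The "reset to R, then extend with EMODPE" pattern proposed above can be illustrated with a toy model of EPCM permission bits. It is a simplification under stated assumptions: EMODPR is modelled as clearing bits only, EMODPE as setting bits only and refusing pages without R (following the "Read access permitted by enclave" note cited in the thread), and the EACCEPT and ETRACK steps are left out entirely. All names (`model_emodpr()`, `model_emodpe()`, `change_perms()`, `PERM_*`) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERM_R 0x1u
#define PERM_W 0x2u
#define PERM_X 0x4u

/* Toy EMODPR: the result can only have fewer bits than before. */
static uint8_t model_emodpr(uint8_t epcm, uint8_t secinfo)
{
	return epcm & secinfo;
}

/*
 * Toy EMODPE: can only add bits. Modelled as failing when the page is
 * not readable, per the SDM note quoted in the thread.
 */
static bool model_emodpe(uint8_t *epcm, uint8_t secinfo)
{
	if (!(*epcm & PERM_R))
		return false;
	*epcm |= secinfo;
	return true;
}

/*
 * The proposed pattern: always reset to the fixed value R, then let the
 * enclave extend to the permissions it actually wants.
 */
static bool change_perms(uint8_t *epcm, uint8_t target)
{
	*epcm = model_emodpr(*epcm, PERM_R);
	return model_emodpe(epcm, target);
}
```

In this model an RW page can be turned into RX by one reset followed by one extend. A page that was ever restricted to no permissions cannot be extended again, which matches the reason given in the thread for not allowing PROT_NONE.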
Hi Jarkko,

On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
>> Supporting permission restriction in an ioctl() enables the runtime to manage
>> the enclave memory without needing to map it.
>
> Which is opposite what you do in EAUG. You can also augment pages without
> needing the map them. Sure you get that capability, but it is quite useless
> in practice.
>
>> I have considered the idea of supporting the permission restriction with
>> mprotect() but as you can see in this response I did not find it to be
>> practical.
>
> Where is it practical? What is your application? How is it practical to
> delegate the concurrency management of a split mprotect() to user space?
> How do we get rid off a useless up-call to the host?
>

The email you responded to contained many obstacles against using mprotect()
but you chose to ignore them and snipped them all from your response. Could
you please address the issues instead of dismissing them?

Reinette
Hi Jarkko On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote: >> On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote: >> > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote: >> > >> > > I saw Haitao's note that EMODPE requires "Read access permitted by >> enclave". >> > > This motivates that EMODPR->PROT_NONE should not be allowed since >> it would >> > > not be possible to relax permissions (run EMODPE) after that. Even >> so, I >> > > also found in the SDM that EACCEPT has the note "Read access >> permitted >> > > by enclave". That seems to indicate that EMODPR->PROT_NONE is not >> practical >> > > from that perspective either since the enclave will not be able to >> > > EACCEPT the change. Does that match your understanding? >> > >> > Yes, PROT_NONE should not be allowed. >> > >> > This is however the real problem. >> > >> > The current kernel patch set has inconsistent API and EMODPR ioctl is >> > simply unacceptable. It also requires more concurrency management >> from >> > user space run-time, which would be heck a lot easier to do in the >> kernel. >> > >> > If you really want EMODPR as ioctl, then for consistencys sake, then >> EAUG >> > should be too. Like this when things go opposite directions, this >> patch set >> > plain and simply will not work out. >> > >> > I would pick EAUG's strategy from these two as it requires half the >> back >> > calls to host from an enclave. I.e. please combine mprotect() and >> EMODPR, >> > either in the #PF handler or as part of mprotect(), which ever suits >> you >> > best. >> > >> > I'll try demonstrate this with two examples. >> > >> > mmap() could go something like this() (simplified): >> > 1. Execution #UD's to SYSCALL. >> > 2. Host calls enclave's mmap() handler with mmap() parameters. >> > 3. Enclave up-calls host's mmap(). >> > 4. Loops the range with EACCEPTCOPY. 
>> > >> > mprotect() has to be done like this: >> > 1. Execution #UD's to SYSCALL. >> > 2. Host calls enclave's mprotect() handler. >> > 3. Enclave up-calls host's mprotect(). >> > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS. I assume up-calls here are ocalls as we call them in our implementation, which are the calls enclave make to untrusted side via EEXIT. If so, can your implementation combine this two up-calls into one, then host side just do ioctl() and mprotect to kernel? If so, would that address your concern about extra up-calls? >> > 3. Loops the range with EACCEPT. >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> 5. Loops the range with EACCEPT + EMODPE. >> >> > This is just terrible IMHO. I hope these examples bring some insight. > > E.g. in Enarx we have to add a special up-call (so called enarxcall in > intermediate that we call sallyport, which provides shared buffer to > communicate with the enclave) just for reseting the range with PROT_READ. > Feel very redundant, adds ugly cruft and is completely opposite strategy > to > what you've chosen to do with EAUG, which is I think correct choice as > far > as API is concerned. The problem with EMODPR on #PF is that kernel needs to know what permissions requested from enclave at the time of #PF. So enclave has to make at least one call to kernel (again via ocall in our case, I assume up-call in your case) to make the change. Enclave runtime may not know the permissions until upper layer application code (JIT or some kind of code loader) make the decision to change it. And the ocalls/up-calls can only be done at that time, not upfront, like mmap that is only used to reserve ranges. I also see this model as consistent to what kernel does for regular memory mappings: adding physical pages on #PF or pre-fault and changing PTE permissions only after mprotect is called. 
I would agree with, and prefer, combining mprotect() and the ioctl() for EMODPR, but Reinette pointed out some issues above with managing VMAs and handling errors in that approach. BR Haitao
On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote: > Hi Jarkko, > > On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote: > > On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote: > >> Supporting permission restriction in an ioctl() enables the runtime to manage > >> the enclave memory without needing to map it. > > > > Which is opposite what you do in EAUG. You can also augment pages without > > needing the map them. Sure you get that capability, but it is quite useless > > in practice. > > > >> I have considered the idea of supporting the permission restriction with > >> mprotect() but as you can see in this response I did not find it to be > >> practical. > > > > Where is it practical? What is your application? How is it practical to > > delegate the concurrency management of a split mprotect() to user space? > > How do we get rid off a useless up-call to the host? > > > > The email you responded to contained many obstacles against using mprotect() > but you chose to ignore them and snipped them all from your response. Could > you please address the issues instead of dismissing them? I did read the whole email but did not see anything that would make a case for fully exposed EMODPR, or having asymmetrical towards how EAUG works. I had the same discussion with Haitao about PROT_NONE earlier, and am fully aware that PROT_READ is required. BR, Jarkko
On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote: > Hi Jarkko > > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <jarkko@kernel.org> > wrote: > > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote: > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote: > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote: > > > > > > > > > I saw Haitao's note that EMODPE requires "Read access permitted > > > by enclave". > > > > > This motivates that EMODPR->PROT_NONE should not be allowed > > > since it would > > > > > not be possible to relax permissions (run EMODPE) after that. > > > Even so, I > > > > > also found in the SDM that EACCEPT has the note "Read access > > > permitted > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is > > > not practical > > > > > from that perspective either since the enclave will not be able to > > > > > EACCEPT the change. Does that match your understanding? > > > > > > > > Yes, PROT_NONE should not be allowed. > > > > > > > > This is however the real problem. > > > > > > > > The current kernel patch set has inconsistent API and EMODPR ioctl is > > > > simply unacceptable. It also requires more concurrency management > > > from > > > > user space run-time, which would be heck a lot easier to do in the > > > kernel. > > > > > > > > If you really want EMODPR as ioctl, then for consistencys sake, > > > then EAUG > > > > should be too. Like this when things go opposite directions, this > > > patch set > > > > plain and simply will not work out. > > > > > > > > I would pick EAUG's strategy from these two as it requires half > > > the back > > > > calls to host from an enclave. I.e. please combine mprotect() and > > > EMODPR, > > > > either in the #PF handler or as part of mprotect(), which ever > > > suits you > > > > best. > > > > > > > > I'll try demonstrate this with two examples. > > > > > > > > mmap() could go something like this() (simplified): > > > > 1. 
Execution #UD's to SYSCALL. > > > > 2. Host calls enclave's mmap() handler with mmap() parameters. > > > > 3. Enclave up-calls host's mmap(). > > > > 4. Loops the range with EACCEPTCOPY. > > > > > > > > mprotect() has to be done like this: > > > > 1. Execution #UD's to SYSCALL. > > > > 2. Host calls enclave's mprotect() handler. > > > > 3. Enclave up-calls host's mprotect(). > > > > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS. > > I assume up-calls here are ocalls as we call them in our implementation, > which are the calls enclave make to untrusted side via EEXIT. > > If so, can your implementation combine this two up-calls into one, then host > side just do ioctl() and mprotect to kernel? If so, would that address your > concern about extra up-calls? > > > > > > 3. Loops the range with EACCEPT. > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > 5. Loops the range with EACCEPT + EMODPE. > > > > > > > This is just terrible IMHO. I hope these examples bring some insight. > > > > E.g. in Enarx we have to add a special up-call (so called enarxcall in > > intermediate that we call sallyport, which provides shared buffer to > > communicate with the enclave) just for reseting the range with PROT_READ. > > Feel very redundant, adds ugly cruft and is completely opposite strategy > > to > > what you've chosen to do with EAUG, which is I think correct choice as > > far > > as API is concerned. > > The problem with EMODPR on #PF is that kernel needs to know what permissions > requested from enclave at the time of #PF. So enclave has to make at least > one call to kernel (again via ocall in our case, I assume up-call in your > case) to make the change. Your security scheme is broken if permissions are requested outside the enclave, i.e. the hostile environment controls the permissions. That should always come from the enclave and enclave uses EACCEPT* to validate that what was given to EMODPR, EAUG and EMODT matches its expections. 
Upper layer application should never be in charge, and a broken security scheme should never be supported. If EMODPR unconditionally sets the permissions to PROT_READ, the enclave is able to validate this fact and can then use EMODPE to set the appropriate permissions. BR, Jarkko
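The flow described above (reset to PROT_READ outside the enclave, validate with EACCEPT, then extend with EMODPE inside the enclave) can be sketched with a small model. This is only an illustration under simplifying assumptions: the EPCM entry is reduced to a permission mask plus one pending flag, and `epcm_page`, `emodpr`, `eaccept`, `emodpe`, and `reset_then_extend` are hypothetical names that model the instructions' effects. They are not kernel or SDK APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Permission bits of the simplified model (values are illustrative). */
enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

/* Hypothetical stand-in for one EPCM entry. */
struct epcm_page {
    uint8_t perms;    /* current EPCM permissions */
    bool    pending;  /* a restriction is waiting for EACCEPT */
};

/* Model of ENCLS[EMODPR]: run by the kernel, can only clear bits. */
static void emodpr(struct epcm_page *p, uint8_t perms)
{
    p->perms &= perms;
    p->pending = true;
}

/* Model of ENCLU[EACCEPT]: the enclave states what it expects to find. */
static bool eaccept(struct epcm_page *p, uint8_t expected)
{
    if (!p->pending || p->perms != expected)
        return false;
    p->pending = false;
    return true;
}

/* Model of ENCLU[EMODPE]: run by the enclave, can only set bits. */
static bool emodpe(struct epcm_page *p, uint8_t perms)
{
    if (!(p->perms & PERM_R))
        return false;   /* the page must be readable by the enclave */
    p->perms |= perms;
    return true;
}

/* Reset to read-only outside the enclave, then validate and extend inside. */
static bool reset_then_extend(struct epcm_page *p, uint8_t target)
{
    emodpr(p, PERM_R);            /* kernel side, unconditional */
    if (!eaccept(p, PERM_R))      /* enclave validates the reset */
        return false;
    return emodpe(p, target);     /* enclave picks the final permissions */
}
```

In this model a page that starts as RW and is passed to `reset_then_extend()` with a target of R|X ends up as R|X, and the only value that crosses the enclave boundary is the fixed read-only mask.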
On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote: > I also see this model as consistent to what kernel does for regular memory > mappings: adding physical pages on #PF or pre-fault and changing PTE > permissions only after mprotect is called. And you were against this in EAUG's case. As in EAUG's case, EMODPR could be done as part of the mprotect() flow. BR, Jarkko
On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote: > Hi Jarkko > > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <jarkko@kernel.org> > wrote: > > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote: > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote: > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote: > > > > > > > > > I saw Haitao's note that EMODPE requires "Read access permitted > > > by enclave". > > > > > This motivates that EMODPR->PROT_NONE should not be allowed > > > since it would > > > > > not be possible to relax permissions (run EMODPE) after that. > > > Even so, I > > > > > also found in the SDM that EACCEPT has the note "Read access > > > permitted > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is > > > not practical > > > > > from that perspective either since the enclave will not be able to > > > > > EACCEPT the change. Does that match your understanding? > > > > > > > > Yes, PROT_NONE should not be allowed. > > > > > > > > This is however the real problem. > > > > > > > > The current kernel patch set has inconsistent API and EMODPR ioctl is > > > > simply unacceptable. It also requires more concurrency management > > > from > > > > user space run-time, which would be heck a lot easier to do in the > > > kernel. > > > > > > > > If you really want EMODPR as ioctl, then for consistencys sake, > > > then EAUG > > > > should be too. Like this when things go opposite directions, this > > > patch set > > > > plain and simply will not work out. > > > > > > > > I would pick EAUG's strategy from these two as it requires half > > > the back > > > > calls to host from an enclave. I.e. please combine mprotect() and > > > EMODPR, > > > > either in the #PF handler or as part of mprotect(), which ever > > > suits you > > > > best. > > > > > > > > I'll try demonstrate this with two examples. > > > > > > > > mmap() could go something like this() (simplified): > > > > 1. 
Execution #UD's to SYSCALL. > > > > 2. Host calls enclave's mmap() handler with mmap() parameters. > > > > 3. Enclave up-calls host's mmap(). > > > > 4. Loops the range with EACCEPTCOPY. > > > > > > > > mprotect() has to be done like this: > > > > 1. Execution #UD's to SYSCALL. > > > > 2. Host calls enclave's mprotect() handler. > > > > 3. Enclave up-calls host's mprotect(). > > > > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS. > > I assume up-calls here are ocalls as we call them in our implementation, > which are the calls enclave make to untrusted side via EEXIT. > > If so, can your implementation combine this two up-calls into one, then host > side just do ioctl() and mprotect to kernel? If so, would that address your > concern about extra up-calls? > > > > > > 3. Loops the range with EACCEPT. > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > 5. Loops the range with EACCEPT + EMODPE. > > > > > > > This is just terrible IMHO. I hope these examples bring some insight. > > > > E.g. in Enarx we have to add a special up-call (so called enarxcall in > > intermediate that we call sallyport, which provides shared buffer to > > communicate with the enclave) just for reseting the range with PROT_READ. > > Feel very redundant, adds ugly cruft and is completely opposite strategy > > to > > what you've chosen to do with EAUG, which is I think correct choice as > > far > > as API is concerned. > > The problem with EMODPR on #PF is that kernel needs to know what permissions > requested from enclave at the time of #PF. So enclave has to make at least > one call to kernel (again via ocall in our case, I assume up-call in your > case) to make the change. The #PF handler should do unconditionally EMODPR with PROT_READ. BR, Jarkko
On Thu, Mar 17, 2022 at 09:01:07AM +0200, Jarkko Sakkinen wrote: > On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote: > > Hi Jarkko > > > > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <jarkko@kernel.org> > > wrote: > > > > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote: > > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote: > > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote: > > > > > > > > > > > I saw Haitao's note that EMODPE requires "Read access permitted > > > > by enclave". > > > > > > This motivates that EMODPR->PROT_NONE should not be allowed > > > > since it would > > > > > > not be possible to relax permissions (run EMODPE) after that. > > > > Even so, I > > > > > > also found in the SDM that EACCEPT has the note "Read access > > > > permitted > > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is > > > > not practical > > > > > > from that perspective either since the enclave will not be able to > > > > > > EACCEPT the change. Does that match your understanding? > > > > > > > > > > Yes, PROT_NONE should not be allowed. > > > > > > > > > > This is however the real problem. > > > > > > > > > > The current kernel patch set has inconsistent API and EMODPR ioctl is > > > > > simply unacceptable. It also requires more concurrency management > > > > from > > > > > user space run-time, which would be heck a lot easier to do in the > > > > kernel. > > > > > > > > > > If you really want EMODPR as ioctl, then for consistencys sake, > > > > then EAUG > > > > > should be too. Like this when things go opposite directions, this > > > > patch set > > > > > plain and simply will not work out. > > > > > > > > > > I would pick EAUG's strategy from these two as it requires half > > > > the back > > > > > calls to host from an enclave. I.e. 
please combine mprotect() and > > > > EMODPR, > > > > > either in the #PF handler or as part of mprotect(), which ever > > > > suits you > > > > > best. > > > > > > > > > > I'll try demonstrate this with two examples. > > > > > > > > > > mmap() could go something like this() (simplified): > > > > > 1. Execution #UD's to SYSCALL. > > > > > 2. Host calls enclave's mmap() handler with mmap() parameters. > > > > > 3. Enclave up-calls host's mmap(). > > > > > 4. Loops the range with EACCEPTCOPY. > > > > > > > > > > mprotect() has to be done like this: > > > > > 1. Execution #UD's to SYSCALL. > > > > > 2. Host calls enclave's mprotect() handler. > > > > > 3. Enclave up-calls host's mprotect(). > > > > > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS. > > > > I assume up-calls here are ocalls as we call them in our implementation, > > which are the calls enclave make to untrusted side via EEXIT. > > > > If so, can your implementation combine this two up-calls into one, then host > > side just do ioctl() and mprotect to kernel? If so, would that address your > > concern about extra up-calls? > > > > > > > > > 3. Loops the range with EACCEPT. > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > 5. Loops the range with EACCEPT + EMODPE. > > > > > > > > > This is just terrible IMHO. I hope these examples bring some insight. > > > > > > E.g. in Enarx we have to add a special up-call (so called enarxcall in > > > intermediate that we call sallyport, which provides shared buffer to > > > communicate with the enclave) just for reseting the range with PROT_READ. > > > Feel very redundant, adds ugly cruft and is completely opposite strategy > > > to > > > what you've chosen to do with EAUG, which is I think correct choice as > > > far > > > as API is concerned. > > > > The problem with EMODPR on #PF is that kernel needs to know what permissions > > requested from enclave at the time of #PF. 
So enclave has to make at least > > one call to kernel (again via ocall in our case, I assume up-call in your > > case) to make the change. > > The #PF handler should do unconditionally EMODPR with PROT_READ. Or mprotect(), as long as secinfo contains PROT_READ. I don't care about this detail hugely anymore because it does not affect uapi. Using EMODPR as a permission control mechanism is a ridiculous idea, and I cannot commit to maintain a broken uapi. BR, Jarkko
Hi On Thu, 17 Mar 2022 02:11:28 -0500, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Thu, Mar 17, 2022 at 09:01:07AM +0200, Jarkko Sakkinen wrote: >> On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote: >> > Hi Jarkko >> > >> > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen >> <jarkko@kernel.org> >> > wrote: >> > >> > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote: >> > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote: >> > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote: >> > > > > >> > > > > > I saw Haitao's note that EMODPE requires "Read access >> permitted >> > > > by enclave". >> > > > > > This motivates that EMODPR->PROT_NONE should not be allowed >> > > > since it would >> > > > > > not be possible to relax permissions (run EMODPE) after that. >> > > > Even so, I >> > > > > > also found in the SDM that EACCEPT has the note "Read access >> > > > permitted >> > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is >> > > > not practical >> > > > > > from that perspective either since the enclave will not be >> able to >> > > > > > EACCEPT the change. Does that match your understanding? >> > > > > >> > > > > Yes, PROT_NONE should not be allowed. >> > > > > >> > > > > This is however the real problem. >> > > > > >> > > > > The current kernel patch set has inconsistent API and EMODPR >> ioctl is >> > > > > simply unacceptable. It also requires more concurrency >> management >> > > > from >> > > > > user space run-time, which would be heck a lot easier to do in >> the >> > > > kernel. >> > > > > >> > > > > If you really want EMODPR as ioctl, then for consistencys sake, >> > > > then EAUG >> > > > > should be too. Like this when things go opposite directions, >> this >> > > > patch set >> > > > > plain and simply will not work out. 
>> > > > > >> > > > > I would pick EAUG's strategy from these two as it requires half >> > > > the back >> > > > > calls to host from an enclave. I.e. please combine mprotect() >> and >> > > > EMODPR, >> > > > > either in the #PF handler or as part of mprotect(), which ever >> > > > suits you >> > > > > best. >> > > > > >> > > > > I'll try demonstrate this with two examples. >> > > > > >> > > > > mmap() could go something like this() (simplified): >> > > > > 1. Execution #UD's to SYSCALL. >> > > > > 2. Host calls enclave's mmap() handler with mmap() parameters. >> > > > > 3. Enclave up-calls host's mmap(). >> > > > > 4. Loops the range with EACCEPTCOPY. >> > > > > >> > > > > mprotect() has to be done like this: >> > > > > 1. Execution #UD's to SYSCALL. >> > > > > 2. Host calls enclave's mprotect() handler. >> > > > > 3. Enclave up-calls host's mprotect(). >> > > > > 4. Enclave up-calls host's ioctl() to >> SGX_IOC_ENCLAVE_PERMISSIONS. >> > >> > I assume up-calls here are ocalls as we call them in our >> implementation, >> > which are the calls enclave make to untrusted side via EEXIT. >> >ar >> > If so, can your implementation combine this two up-calls into one, >> then host >> > side just do ioctl() and mprotect to kernel? If so, would that >> address your >> > concern about extra up-calls? >> > >> > >> > > > > 3. Loops the range with EACCEPT. >> > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> > > > 5. Loops the range with EACCEPT + EMODPE. >> > > > >> > > > > This is just terrible IMHO. I hope these examples bring some >> insight. >> > > >> > > E.g. in Enarx we have to add a special up-call (so called enarxcall >> in >> > > intermediate that we call sallyport, which provides shared buffer to >> > > communicate with the enclave) just for reseting the range with >> PROT_READ. 
>> > > Feel very redundant, adds ugly cruft and is completely opposite >> strategy >> > > to >> > > what you've chosen to do with EAUG, which is I think correct choice >> as >> > > far >> > > as API is concerned. >> > >> > The problem with EMODPR on #PF is that kernel needs to know what >> permissions >> > requested from enclave at the time of #PF. So enclave has to make at >> least >> > one call to kernel (again via ocall in our case, I assume up-call in >> your >> > case) to make the change. >> >> The #PF handler should do unconditionally EMODPR with PROT_READ. > > Or mprotect(), as long as secinfo contains PROT_READ. I don't care about > this detail hugely anymore because it does not affect uapi. > > Using EMODPR as a permission control mechanism is a ridiculous idea, and > I cannot commit to maintain a broken uapi. > Jarkko, how would automatically forcing PROT_READ on #PF work for this sequence? 1) EAUG a page (has to be RW) 2) EACCEPT(RW) 3) enclave copies some data to page 4) enclave wants to change permission to R If you are proposing mprotect, then as I indicated earlier, please address concerns raised by Reinette: https://lore.kernel.org/linux-sgx/e1c04077-0165-c5ec-53be-7fd732965e80@intel.com/ Thanks Haitao
On Wed, 16 Mar 2022 23:34:39 -0500, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote: >> Hi Jarkko >> >> On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen <jarkko@kernel.org> >> wrote: >> >> > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote: >> > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote: >> > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote: >> > > > >> > > > > I saw Haitao's note that EMODPE requires "Read access permitted >> > > by enclave". >> > > > > This motivates that EMODPR->PROT_NONE should not be allowed >> > > since it would >> > > > > not be possible to relax permissions (run EMODPE) after that. >> > > Even so, I >> > > > > also found in the SDM that EACCEPT has the note "Read access >> > > permitted >> > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is >> > > not practical >> > > > > from that perspective either since the enclave will not be able >> to >> > > > > EACCEPT the change. Does that match your understanding? >> > > > >> > > > Yes, PROT_NONE should not be allowed. >> > > > >> > > > This is however the real problem. >> > > > >> > > > The current kernel patch set has inconsistent API and EMODPR >> ioctl is >> > > > simply unacceptable. It also requires more concurrency management >> > > from >> > > > user space run-time, which would be heck a lot easier to do in the >> > > kernel. >> > > > >> > > > If you really want EMODPR as ioctl, then for consistencys sake, >> > > then EAUG >> > > > should be too. Like this when things go opposite directions, this >> > > patch set >> > > > plain and simply will not work out. >> > > > >> > > > I would pick EAUG's strategy from these two as it requires half >> > > the back >> > > > calls to host from an enclave. I.e. please combine mprotect() and >> > > EMODPR, >> > > > either in the #PF handler or as part of mprotect(), which ever >> > > suits you >> > > > best. 
>> > > > >> > > > I'll try demonstrate this with two examples. >> > > > >> > > > mmap() could go something like this() (simplified): >> > > > 1. Execution #UD's to SYSCALL. >> > > > 2. Host calls enclave's mmap() handler with mmap() parameters. >> > > > 3. Enclave up-calls host's mmap(). >> > > > 4. Loops the range with EACCEPTCOPY. >> > > > >> > > > mprotect() has to be done like this: >> > > > 1. Execution #UD's to SYSCALL. >> > > > 2. Host calls enclave's mprotect() handler. >> > > > 3. Enclave up-calls host's mprotect(). >> > > > 4. Enclave up-calls host's ioctl() to SGX_IOC_ENCLAVE_PERMISSIONS. >> >> I assume up-calls here are ocalls as we call them in our implementation, >> which are the calls enclave make to untrusted side via EEXIT. >> >> If so, can your implementation combine this two up-calls into one, then >> host >> side just do ioctl() and mprotect to kernel? If so, would that address >> your >> concern about extra up-calls? >> >> >> > > > 3. Loops the range with EACCEPT. >> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> > > 5. Loops the range with EACCEPT + EMODPE. >> > > >> > > > This is just terrible IMHO. I hope these examples bring some >> insight. >> > >> > E.g. in Enarx we have to add a special up-call (so called enarxcall in >> > intermediate that we call sallyport, which provides shared buffer to >> > communicate with the enclave) just for reseting the range with >> PROT_READ. >> > Feel very redundant, adds ugly cruft and is completely opposite >> strategy >> > to >> > what you've chosen to do with EAUG, which is I think correct choice as >> > far >> > as API is concerned. >> >> The problem with EMODPR on #PF is that kernel needs to know what >> permissions >> requested from enclave at the time of #PF. So enclave has to make at >> least >> one call to kernel (again via ocall in our case, I assume up-call in >> your >> case) to make the change. > > Your security scheme is broken if permissions are requested outside the > enclave, i.e. 
the hostile environment controls the permissions. That > should > always come from the enclave and enclave uses EACCEPT* to validate that > what was given to EMODPR, EAUG and EMODT matches its expections. > > Upper layer application should not never be in charge, and a broken > security scheme should never be supported. > By upper layer I mean code inside the enclave. The enclave can always use EACCEPT to verify permissions and is in full control of EPCM permissions. The kernel (or code outside the enclave invoking the kernel) would only be able to reduce EPCM permissions, and as you know the enclave can always run EMODPE. So this is not related to enclave security. Haitao
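The argument above, that code outside the enclave can only reduce EPCM permissions while the enclave checks the outcome with EACCEPT, can be illustrated with a tiny exhaustive check. This is a sketch under the assumption that a host-initiated EMODPR behaves as a bitwise AND of the current and requested permissions; `host_restrict`, `enclave_accepts`, and `host_cannot_extend` are hypothetical helper names, not part of any SGX interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Permission bits of the simplified model (values are illustrative). */
enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2, PERM_ALL = 7u };

/* Model of what a host-initiated EMODPR leaves in the EPCM. */
static uint8_t host_restrict(uint8_t epcm, uint8_t requested)
{
    return epcm & requested;
}

/* Model of the enclave's EACCEPT check against its own expectation. */
static bool enclave_accepts(uint8_t epcm, uint8_t expected)
{
    return epcm == expected;
}

/* Exhaustively check that no host request can add a permission bit. */
static bool host_cannot_extend(void)
{
    for (unsigned epcm = 0; epcm <= PERM_ALL; epcm++)
        for (unsigned req = 0; req <= PERM_ALL; req++)
            if (host_restrict((uint8_t)epcm, (uint8_t)req) & ~epcm)
                return false;
    return true;
}
```

Under this model a host request of R|X against an RW page yields R only, and an enclave that expected RW would see the mismatch when it runs its acceptance check.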
On Wed, 16 Mar 2022 23:37:26 -0500, Jarkko Sakkinen <jarkko@kernel.org> wrote: > On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote: >> I also see this model as consistent to what kernel does for regular >> memory >> mappings: adding physical pages on #PF or pre-fault and changing PTE >> permissions only after mprotect is called. > > And you were against this in EAUG's case. As in the EAUG's case > EMODPR could be done as part of the mprotect() flow. > I preferred not to have an automatic/unconditional EAUG during mmap(). Here I think an automatic/unconditional EMODPR(PROT_READ) on #PF would not work for all cases. See my reply to your other email. Thanks Haitao
On Thu, Mar 17, 2022 at 09:28:45AM -0500, Haitao Huang wrote: > Hi > > On Thu, 17 Mar 2022 02:11:28 -0500, Jarkko Sakkinen <jarkko@kernel.org> > wrote: > > > On Thu, Mar 17, 2022 at 09:01:07AM +0200, Jarkko Sakkinen wrote: > > > On Mon, Mar 14, 2022 at 10:39:36AM -0500, Haitao Huang wrote: > > > > Hi Jarkko > > > > > > > > On Sun, 13 Mar 2022 21:58:51 -0500, Jarkko Sakkinen > > > <jarkko@kernel.org> > > > > wrote: > > > > > > > > > On Mon, Mar 14, 2022 at 04:50:56AM +0200, Jarkko Sakkinen wrote: > > > > > > On Mon, Mar 14, 2022 at 04:49:37AM +0200, Jarkko Sakkinen wrote: > > > > > > > On Fri, Mar 11, 2022 at 09:53:29AM -0800, Reinette Chatre wrote: > > > > > > > > > > > > > > > I saw Haitao's note that EMODPE requires "Read access > > > permitted > > > > > > by enclave". > > > > > > > > This motivates that EMODPR->PROT_NONE should not be allowed > > > > > > since it would > > > > > > > > not be possible to relax permissions (run EMODPE) after that. > > > > > > Even so, I > > > > > > > > also found in the SDM that EACCEPT has the note "Read access > > > > > > permitted > > > > > > > > by enclave". That seems to indicate that EMODPR->PROT_NONE is > > > > > > not practical > > > > > > > > from that perspective either since the enclave will not be > > > able to > > > > > > > > EACCEPT the change. Does that match your understanding? > > > > > > > > > > > > > > Yes, PROT_NONE should not be allowed. > > > > > > > > > > > > > > This is however the real problem. > > > > > > > > > > > > > > The current kernel patch set has inconsistent API and EMODPR > > > ioctl is > > > > > > > simply unacceptable. It also requires more concurrency > > > management > > > > > > from > > > > > > > user space run-time, which would be heck a lot easier to do > > > in the > > > > > > kernel. > > > > > > > > > > > > > > If you really want EMODPR as ioctl, then for consistencys sake, > > > > > > then EAUG > > > > > > > should be too. 
Like this when things go opposite directions, > > > this > > > > > > patch set > > > > > > > plain and simply will not work out. > > > > > > > > > > > > > > I would pick EAUG's strategy from these two as it requires half > > > > > > the back > > > > > > > calls to host from an enclave. I.e. please combine > > > mprotect() and > > > > > > EMODPR, > > > > > > > either in the #PF handler or as part of mprotect(), which ever > > > > > > suits you > > > > > > > best. > > > > > > > > > > > > > > I'll try demonstrate this with two examples. > > > > > > > > > > > > > > mmap() could go something like this() (simplified): > > > > > > > 1. Execution #UD's to SYSCALL. > > > > > > > 2. Host calls enclave's mmap() handler with mmap() parameters. > > > > > > > 3. Enclave up-calls host's mmap(). > > > > > > > 4. Loops the range with EACCEPTCOPY. > > > > > > > > > > > > > > mprotect() has to be done like this: > > > > > > > 1. Execution #UD's to SYSCALL. > > > > > > > 2. Host calls enclave's mprotect() handler. > > > > > > > 3. Enclave up-calls host's mprotect(). > > > > > > > 4. Enclave up-calls host's ioctl() to > > > SGX_IOC_ENCLAVE_PERMISSIONS. > > > > > > > > I assume up-calls here are ocalls as we call them in our > > > implementation, > > > > which are the calls enclave make to untrusted side via EEXIT. > > > >ar > > > > If so, can your implementation combine this two up-calls into one, > > > then host > > > > side just do ioctl() and mprotect to kernel? If so, would that > > > address your > > > > concern about extra up-calls? > > > > > > > > > > > > > > > 3. Loops the range with EACCEPT. > > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > > > 5. Loops the range with EACCEPT + EMODPE. > > > > > > > > > > > > > This is just terrible IMHO. I hope these examples bring some > > > insight. > > > > > > > > > > E.g. 
in Enarx we have to add a special up-call (so called > > > enarxcall in > > > > > intermediate that we call sallyport, which provides shared buffer to > > > > > communicate with the enclave) just for reseting the range with > > > PROT_READ. > > > > > Feel very redundant, adds ugly cruft and is completely opposite > > > strategy > > > > > to > > > > > what you've chosen to do with EAUG, which is I think correct > > > choice as > > > > > far > > > > > as API is concerned. > > > > > > > > The problem with EMODPR on #PF is that kernel needs to know what > > > permissions > > > > requested from enclave at the time of #PF. So enclave has to make > > > at least > > > > one call to kernel (again via ocall in our case, I assume up-call > > > in your > > > > case) to make the change. > > > > > > The #PF handler should do unconditionally EMODPR with PROT_READ. > > > > Or mprotect(), as long as secinfo contains PROT_READ. I don't care about > > this detail hugely anymore because it does not affect uapi. > > > > Using EMODPR as a permission control mechanism is a ridiculous idea, and > > I cannot commit to maintain a broken uapi. > > > > Jarkko, how would automatically forcing PROT_READ on #PF work for this > sequence? > > 1) EAUG a page (has to be RW) > 2) EACCEPT(RW) > 3) enclave copies some data to page > 4) enclave wants to change permission to R > > If you are proposing mprotect, then as I indicated earlier, please address > concerns raised by Reinette: > https://lore.kernel.org/linux-sgx/e1c04077-0165-c5ec-53be-7fd732965e80@intel.com/ For EAUG you can choose between #PF handler and having it as part of mmap() with the same uapi. For EMODPR clearly #PF handler would be tricky but nothing prevents resetting the permissions as part of mprotect() flow, which is trivial. One good reason to have a fixed EMODPR is that e.g. emulating properly mprotect() is almost undoable if you don't do it otherwise. 
Specifically, consider the scenario where the address range spans multiple adjacent VMAs. Even without EMODPR that scenario is complex enough that you really do not want to ask for more trouble than using EMODPR in a super conservative manner. Having EMODPR fully exposed will only make the API more difficult to use, with extra round-trips. If you want ring-0 instructions fully exposed, please don't use a kernel. There are a bunch of hardware features in Intel CPUs for which Linux does not provide 1:1, wide-open interfaces. BR, Jarkko
On Thu, Mar 17, 2022 at 11:50:41PM +0200, Jarkko Sakkinen wrote:
> For EAUG you can choose between #PF handler and having it as part of
> mmap() with the same uapi.
>
> For EMODPR clearly #PF handler would be tricky but nothing prevents
> resetting the permissions as part of mprotect() flow, which is trivial.
>
> One good reason to have a fixed EMODPR is that e.g. emulating properly
> mprotect() is almost undoable if you don't do it otherwise. Specifically

s/don't//g

> the scenario where your address range spans through multiple adjacent
> VMAs. It's even without EMODPR complex enough scenario that you really
> don't want to ask yourself for more trouble than use EMODPR in a super
> conservative manner.
>
> Having EMODPR fully exposed will only make more difficult API to do with
> extra round-trips. If you want to use ring-0 instructions fully exposed,
> please don't use a kernel. There's a bunch of hardware features in Intel
> CPUs for which Linux does not provide 1:1 all wide open interfaces.
>
> BR, Jarkko
Hi Jarkko,

On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
>>>> the enclave memory without needing to map it.
>>>
>>> Which is opposite what you do in EAUG. You can also augment pages without
>>> needing the map them. Sure you get that capability, but it is quite useless
>>> in practice.
>>>
>>>> I have considered the idea of supporting the permission restriction with
>>>> mprotect() but as you can see in this response I did not find it to be
>>>> practical.
>>>
>>> Where is it practical? What is your application? How is it practical to
>>> delegate the concurrency management of a split mprotect() to user space?
>>> How do we get rid off a useless up-call to the host?
>>>
>>
>> The email you responded to contained many obstacles against using mprotect()
>> but you chose to ignore them and snipped them all from your response. Could
>> you please address the issues instead of dismissing them?
>
> I did read the whole email but did not see anything that would make a case
> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.

I believe that on its own each obstacle I shared with you is significant enough
to not follow that approach. You simply respond that I am just not making a
case without acknowledging any obstacle or providing a reason why the obstacles
are not valid.

To help me understand your view, could you please respond to each of the
obstacles I list below and how it is not an issue?


1) ABI change:
   mprotect() is currently supported to modify VMA permissions
   irrespective of EPCM permissions. Supporting EPCM permission
   changes with mprotect() would change this behavior.
   For example, currently it is possible to have RW enclave
   memory and support multiple tasks accessing the memory. Two
   tasks can map the memory RW and later one can run mprotect()
   to reduce the VMA permissions to read-only without impacting
   the access of the other task.
   By moving EPCM permission changes to mprotect() this usage
   will no longer be supported and current behavior will change.

2) Only half EPCM permission management:
   Moving to mprotect() as a way to set EPCM permissions is
   not a clear interface for EPCM permission management because
   the kernel can only restrict permissions. Even so, the kernel
   has no insight into the current EPCM permissions and thus whether they
   actually need to be restricted so every mprotect() call,
   all except RWX, will need to be treated as a permission
   restriction with all the implementation obstacles
   that accompany it (more below).

   There are two possible ways to implement permission restriction
   as triggered by mprotect(), (a) during the mprotect() call or
   (b) during a subsequent #PF (as suggested by you), each has
   its own obstacles.

3) mprotect() implementation

   When the user calls mprotect() the expectation is that the
   call will either succeed or fail. If the call fails the user
   expects the system to be unchanged. This is not possible if
   permission restriction is done as part of mprotect().

   (a) mprotect() may span multiple VMAs and involves VMA splits
       that (from what I understand) cannot be undone. SGX memory
       does not support VMA merges. If any SGX function
       (EMODPR or ETRACK on any page) done after a VMA split fails
       then the user will be left with fragmented memory.

   (b) The EMODPR/ETRACK pair can fail on any of the pages provided
       by the mprotect() call. If there is a failure then the
       kernel cannot undo previously executed EMODPR since the kernel
       cannot run EMODPE. The EPCM permissions are thus left in inconsistent
       state since some of the pages would have changed EPCM permissions
       and mprotect() does not have mechanism to communicate
       partial success.
       The partial success is needed to communicate to user space
       (i) which pages need EACCEPT, (ii) which pages need to be
       in new request (although user space does not have information
       to help the new request succeed - see below).

   (c) User space runtime has control over management of EPC memory
       and accurate failure information would help it to do so.
       Knowing the error code of the EMODPR failure would help
       user space to take appropriate action. For example, EMODPR
       can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime
       to learn that it needs to run EACCEPT on that page before
       the EMODPR can succeed. Alternatively, if it learns that the
       return is "SGX_EPC_PAGE_CONFLICT" then it could determine
       that some other part of the runtime attempted an ENCLU
       function on that page.
       It is not possible to provide such detailed errors to user
       space with mprotect().


4) #PF implementation

   (a) There is more to restricting permissions than just running
       ENCLS[EMODPR]. After running ENCLS[EMODPR] the kernel should
       also initiate the ETRACK flow to ensure that any thread within
       the enclave is interrupted by sending an IPI to the CPU,
       this includes the thread that just triggered the #PF.

   (b) Second consideration of the EMODPR and ETRACK flow is that
       this has a large "blast radius" in that any thread in the
       enclave needs to be interrupted. #PFs may arrive at any time
       so setting up a page range where a fault into any page in the
       page range will trigger enclave exits for all threads is
       a significant yet random impact. I believe it would be better
       to update all pages in the range at the same time and in this
       way contain the impact of this significant EMODPR/ETRACK/IPIs
       flow.

   (c) How will the page fault handler know when EMODPR/ETRACK should
       be run? Consider that the page fault handler can be called
       significantly later than the mprotect() call and that
       user space can call EMODPE any time to extend permissions.
       This implies that EMODPR/ETRACK/IPIs should be run during
       *every* page fault, irrespective of mprotect().

   (d) If a page is in pending or modified state then EMODPR will
       always fail. This is something that needs to be fixed by
       user space runtime but the page fault will not be able
       to communicate this.

Considering the above, could you please provide clear guidance on
how you envision permission restriction to be supported by mprotect()?

Reinette
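The partial-success argument in points 3(b) and 3(c) of the message above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: `restrict_permissions`, `RestrictResult`, `make_emodpr`, the numeric values of the SGX codes and the choice of `-EFAULT` are all assumptions made for illustration. It shows a range request that stops at the first failing page and reports how many pages were changed together with the SGX return code, which is the information a plain mprotect() return value cannot carry.

```python
import errno
from dataclasses import dataclass

# Placeholder values for this sketch only; not the hardware encodings.
SGX_SUCCESS = 0
SGX_PAGE_NOT_MODIFIABLE = 1
SGX_EPC_PAGE_CONFLICT = 2


@dataclass
class RestrictResult:
    ret: int     # 0 on success, negative errno on failure (errno choice is assumed)
    count: int   # number of pages successfully changed before stopping
    result: int  # SGX return code of the last EMODPR attempt


def restrict_permissions(emodpr, first_page, nr_pages):
    """Walk the range and stop at the first page whose EMODPR fails."""
    for i in range(nr_pages):
        code = emodpr(first_page + i)
        if code != SGX_SUCCESS:
            return RestrictResult(-errno.EFAULT, i, code)
    return RestrictResult(0, nr_pages, SGX_SUCCESS)


def make_emodpr(failures):
    """Build a fake EMODPR: pages listed in `failures` return that code."""
    def emodpr(page):
        return failures.get(page, SGX_SUCCESS)
    return emodpr


def demo():
    # Page 3 is (hypothetically) still waiting for an EACCEPT.
    emodpr = make_emodpr({3: SGX_PAGE_NOT_MODIFIABLE})
    return restrict_permissions(emodpr, first_page=0, nr_pages=8)
```

With the result of `demo()` a runtime would know that pages 0 to 2 were changed and need EACCEPT, that page 3 failed with a code suggesting it needs EACCEPT before a retry, and that pages 3 to 7 belong in a new request.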
On Fri, Mar 18, 2022 at 12:00:17AM +0200, Jarkko Sakkinen wrote:
> On Thu, Mar 17, 2022 at 11:50:41PM +0200, Jarkko Sakkinen wrote:
> > For EAUG you can choose between #PF handler and having it as part of
> > mmap() with the same uapi.
> >
> > For EMODPR clearly #PF handler would be tricky but nothing prevents
> > resetting the permissions as part of mprotect() flow, which is trivial.
> >
> > One good reason to have a fixed EMODPR is that e.g. emulating properly
> > mprotect() is almost undoable if you don't do it otherwise. Specifically
>
> s/don't//g
>
> > the scenario where your address range spans through multiple adjacent
> > VMAs. It's even without EMODPR complex enough scenario that you really
> > don't want to ask yourself for more trouble than use EMODPR in a super
> > conservative manner.
> >
> > Having EMODPR fully exposed will only make more difficult API to do with
> > extra round-trips. If you want to use ring-0 instructions fully exposed,
> > please don't use a kernel. There's a bunch of hardware features in Intel
> > CPUs for which Linux does not provide 1:1 all wide open interfaces.

I've now run a tweaked SGX2 v2 patch set [*] over 1.5 weeks and I'm really
confident about the stability. My laptop has not crashed a single time.
For the EAUG portion I'm probably sooner rather than later ready to give
reviewed-by's because the API works just great.

Just want to put a note that it is not the internals that I'm too concerned
about.

For v3 I'd suggest that it is sent as you see fit, without getting stuck on
EMODPR. What I'll do, once I get it, is construct a small well-defined
patch or perhaps patch set, which shows how I would change the EMODPR part.

[*] I run it on my 2020 XPS13 laptop, which is SGX2 capable, and created this
    CI thing that periodically produces automated kernel package builds of
    it for Arch Linux: https://github.com/jarkkojs/aur-linux-sgx/actions.
    It's the distro kernel with the same config, Reinette's patches on top,
    and my tweaks on top of them. When v3 comes out, I'll update the kernel
    version and replace the v2+ patches with them.

BR, Jarkko
On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
>
> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> > On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> >>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> >>>> Supporting permission restriction in an ioctl() enables the runtime to manage
> >>>> the enclave memory without needing to map it.
> >>>
> >>> Which is opposite what you do in EAUG. You can also augment pages without
> >>> needing the map them. Sure you get that capability, but it is quite useless
> >>> in practice.
> >>>
> >>>> I have considered the idea of supporting the permission restriction with
> >>>> mprotect() but as you can see in this response I did not find it to be
> >>>> practical.
> >>>
> >>> Where is it practical? What is your application? How is it practical to
> >>> delegate the concurrency management of a split mprotect() to user space?
> >>> How do we get rid off a useless up-call to the host?
> >>>
> >>
> >> The email you responded to contained many obstacles against using mprotect()
> >> but you chose to ignore them and snipped them all from your response. Could
> >> you please address the issues instead of dismissing them?
> >
> > I did read the whole email but did not see anything that would make a case
> > for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
>
> I believe that on its own each obstacle I shared with you is significant enough
> to not follow that approach. You simply respond that I am just not making a
> case without acknowledging any obstacle or providing a reason why the obstacles
> are not valid.
>
> To help me understand your view, could you please respond to each of the
> obstacles I list below and how it is not an issue?
>
>
> 1) ABI change:
>    mprotect() is currently supported to modify VMA permissions
>    irrespective of EPCM permissions. Supporting EPCM permission
>    changes with mprotect() would change this behavior.
>    For example, currently it is possible to have RW enclave
>    memory and support multiple tasks accessing the memory. Two
>    tasks can map the memory RW and later one can run mprotect()
>    to reduce the VMA permissions to read-only without impacting
>    the access of the other task.
>    By moving EPCM permission changes to mprotect() this usage
>    will no longer be supported and current behavior will change.

Your concurrency scenario is somewhat artificial. Obviously you need to
synchronize somehow, and breaking something that could be done with one
system call into two separate ones is not going to help with that. On the
contrary, it will add yet one more layer of difficulty.

mprotect() controls PTE permissions, not EPCM permissions. It is the
cornerstone to do any sort of confidential computing to have this division.
That's why EACCEPT and EACCEPTCOPY exist.

There is no "current behaviour" yet because there is no mainline code, i.e.
that is easy one to address.

> 2) Only half EPCM permission management:
>    Moving to mprotect() as a way to set EPCM permissions is
>    not a clear interface for EPCM permission management because
>    the kernel can only restrict permissions. Even so, the kernel
>    has no insight into the current EPCM permissions and thus whether they
>    actually need to be restricted so every mprotect() call,
>    all except RWX, will need to be treated as a permission
>    restriction with all the implementation obstacles
>    that accompany it (more below).
>
>    There are two possible ways to implement permission restriction
>    as triggered by mprotect(), (a) during the mprotect() call or
>    (b) during a subsequent #PF (as suggested by you), each has
>    its own obstacles.

I would have preferred also for EAUG to bundle it unconditionally to mmap()
flow. I've merely said that I don't care whether it is a part of mprotect()
flow or in the #PF handler, as long as the feature is not uncontrolled
chaos. Probably at least in mprotect() case it is easier flow to implement
it directly as part of mprotect().

Kernel is not the most trusted party in the confidential computing
scenarios. It is one of the adversaries. And SGX is designed in the way
that enclave controls EPCMD database and kernel PTEs. By trying to
artificially limit this you don't bring security, other than trying to
block implementing applications based on SGX2.

We can ditch the whole SGX, if the point is that kernel controls what
happens inside enclave. Normal VMAs are much more capable for that purpose,
and kernel has full control over them with e.g. PTEs.

>
> 3) mprotect() implementation
>
>    When the user calls mprotect() the expectation is that the
>    call will either succeed or fail. If the call fails the user
>    expects the system to be unchanged. This is not possible if
>    permission restriction is done as part of mprotect().
>
>    (a) mprotect() may span multiple VMAs and involves VMA splits
>        that (from what I understand) cannot be undone. SGX memory
>        does not support VMA merges. If any SGX function
>        (EMODPR or ETRACK on any page) done after a VMA split fails
>        then the user will be left with fragmented memory.

Oh well, SGX does not even support syscalls, if we go to this level of
argument. And you are trying to sort this out with an even more flaky
interface, rather than stable EPCM reset to read state.

I've been implementing this exact feature lately and only realistic way to
do it without many corner cases is first use the current ioctl to reset the
range to READ in EPCM, and with EMODPE set the appropriate permissions.

>    (b) The EMODPR/ETRACK pair can fail on any of the pages provided
>        by the mprotect() call. If there is a failure then the
>        kernel cannot undo previously executed EMODPR since the kernel
>        cannot run EMODPE. The EPCM permissions are thus left in inconsistent
>        state since some of the pages would have changed EPCM permissions
>        and mprotect() does not have mechanism to communicate
>        partial success.
>        The partial success is needed to communicate to user space
>        (i) which pages need EACCEPT, (ii) which pages need to be
>        in new request (although user space does not have information
>        to help the new request succeed - see below).

It's true but how common is that? Return e.g. -EIO, and run-time will
re-build the enclave. That anyway happens all the time with SGX for
various reasons (e.g. VM migration, S3 and whatnot). It's only important
that you know when this happens.

>
>    (c) User space runtime has control over management of EPC memory
>        and accurate failure information would help it to do so.
>        Knowing the error code of the EMODPR failure would help
>        user space to take appropriate action. For example, EMODPR
>        can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime
>        to learn that it needs to run EACCEPT on that page before
>        the EMODPR can succeed. Alternatively, if it learns that the
>        return is "SGX_EPC_PAGE_CONFLICT" then it could determine
>        that some other part of the runtime attempted an ENCLU
>        function on that page.
>        It is not possible to provide such detailed errors to user
>        space with mprotect().

Actually user space run-time is also an adversary. Kernel and user space
can e.g. kill the enclave or limit it with PTEs but EPCM is beyond them
*after* initialization. The whole point is to be able to put e.g.
containers to untrusted cloud.

>
>
> 4) #PF implementation
>
>    (a) There is more to restricting permissions than just running
>        ENCLS[EMODPR]. After running ENCLS[EMODPR] the kernel should
>        also initiate the ETRACK flow to ensure that any thread within
>        the enclave is interrupted by sending an IPI to the CPU,
>        this includes the thread that just triggered the #PF.
>
>    (b) Second consideration of the EMODPR and ETRACK flow is that
>        this has a large "blast radius" in that any thread in the
>        enclave needs to be interrupted. #PFs may arrive at any time
>        so setting up a page range where a fault into any page in the
>        page range will trigger enclave exits for all threads is
>        a significant yet random impact. I believe it would be better
>        to update all pages in the range at the same time and in this
>        way contain the impact of this significant EMODPR/ETRACK/IPIs
>        flow.
>
>    (c) How will the page fault handler know when EMODPR/ETRACK should
>        be run? Consider that the page fault handler can be called
>        significantly later than the mprotect() call and that
>        user space can call EMODPE any time to extend permissions.
>        This implies that EMODPR/ETRACK/IPIs should be run during
>        *every* page fault, irrespective of mprotect().
>
>    (d) If a page is in pending or modified state then EMODPR will
>        always fail. This is something that needs to be fixed by
>        user space runtime but the page fault will not be able
>        to communicate this.
>
> Considering the above, could you please provide clear guidance on
> how you envision permission restriction to be supported by mprotect()?

I'm not specifically driving #PF implementation but because it was so
important for EAUG, I said that I'm fine with #PF based implementation.

Personally, I would do both EAUG and EMODPR as part of mmap() and
mprotect() (e.g. to catch that partial success and return that -EIO) flow
but either works for me. The API is more of a concern than the internals.

>
> Reinette

BR, Jarkko
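The flow described in the message above (first reset the range to READ in EPCM with the ioctl, then set the wanted permissions with EMODPE) can be modelled roughly as follows. This is a toy sketch in Python under simplifying assumptions, not SGX code: `EpcmPage`, `emulate_mprotect` and the `needs_accept` flag are hypothetical names, the instructions are reduced to bit operations (EMODPR can only remove permissions, EMODPE can only add them), and ETRACK, IPIs and every hardware error path are left out.

```python
# Permission bits of the toy model.
R, W, X = 1, 2, 4


class EpcmPage:
    def __init__(self, perms):
        self.perms = perms          # current EPCM permissions of the page
        self.needs_accept = False   # set by emodpr(), cleared by eaccept()


def emodpr(page, perms):
    """Host/kernel side: restrict only, modelled as a bitwise AND."""
    if not perms & R:
        # The thread argues that restricting to PROT_NONE should not be allowed.
        raise ValueError("restricting below READ is not allowed in this model")
    page.perms &= perms
    page.needs_accept = True


def eaccept(page, perms):
    """Enclave side: acknowledge a restriction made by the host."""
    if not page.needs_accept or page.perms != perms:
        raise ValueError("nothing to accept or permissions do not match")
    page.needs_accept = False


def emodpe(page, perms):
    """Enclave side: extend only, modelled as a bitwise OR."""
    page.perms |= perms


def emulate_mprotect(pages, target):
    """Reset every page to READ, then accept and extend to the target."""
    for page in pages:
        emodpr(page, R)           # host side, one request for the range
    for page in pages:
        eaccept(page, R)          # enclave side
        emodpe(page, target)      # enclave side
```

In this model the outcome does not depend on the permissions the pages started with, as long as they included READ: after `emulate_mprotect(pages, R | X)` every page ends up with exactly `R | X`. That is the property that makes the conservative "reset to READ" use of EMODPR attractive for emulating mprotect().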
Hi Jarkko,

On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote:
> On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
>>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
>>>> Hi Jarkko,
>>>>
>>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
>>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
>>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
>>>>>> the enclave memory without needing to map it.
>>>>>
>>>>> Which is opposite what you do in EAUG. You can also augment pages without
>>>>> needing the map them. Sure you get that capability, but it is quite useless
>>>>> in practice.
>>>>>
>>>>>> I have considered the idea of supporting the permission restriction with
>>>>>> mprotect() but as you can see in this response I did not find it to be
>>>>>> practical.
>>>>>
>>>>> Where is it practical? What is your application? How is it practical to
>>>>> delegate the concurrency management of a split mprotect() to user space?
>>>>> How do we get rid off a useless up-call to the host?
>>>>>
>>>>
>>>> The email you responded to contained many obstacles against using mprotect()
>>>> but you chose to ignore them and snipped them all from your response. Could
>>>> you please address the issues instead of dismissing them?
>>>
>>> I did read the whole email but did not see anything that would make a case
>>> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
>>
>> I believe that on its own each obstacle I shared with you is significant enough
>> to not follow that approach. You simply respond that I am just not making a
>> case without acknowledging any obstacle or providing a reason why the obstacles
>> are not valid.
>>
>> To help me understand your view, could you please respond to each of the
>> obstacles I list below and how it is not an issue?
>>
>>
>> 1) ABI change:
>>    mprotect() is currently supported to modify VMA permissions
>>    irrespective of EPCM permissions. Supporting EPCM permission
>>    changes with mprotect() would change this behavior.
>>    For example, currently it is possible to have RW enclave
>>    memory and support multiple tasks accessing the memory. Two
>>    tasks can map the memory RW and later one can run mprotect()
>>    to reduce the VMA permissions to read-only without impacting
>>    the access of the other task.
>>    By moving EPCM permission changes to mprotect() this usage
>>    will no longer be supported and current behavior will change.
>
> Your concurrency scenario is somewhat artificial. Obviously you need to
> synchronize somehow, and breaking something that could be done with one
> system call into two separates is not going to help with that. On the
> contrary, it will add a yet one more difficulty layer.

This is about supporting multiple threads in a single enclave, they can
all have their own memory mappings based on the needs. This is currently
supported in mainline as part of SGX1.

>
> mprotect() controls PTE permissions, not EPCM permissions. It is the corner
> stone to do any sort of confidential computing to have this division.
> That's why EACCEPT and EACCEPTCOPY exist.

Right, mprotect() controls PTE permissions but now you are requesting it to
control EPCM permissions also. There is only one permission field in the
mprotect() API so this implies that you request VMA and EPCM permissions to
be in sync. This is new behavior - different from the current mainline
behavior.

>
> There is no "current behaviour" yet because there is no mainline code, i.e.
> that is easy one to address.

What I described is the current behavior in mainline code. It is the current
SGX1 behavior. Running an environment as I described on a SGX2 system with
the mprotect() behavior you propose will see new behavior with some threads
encountering page faults with SGX error code when it could run without issue
on SGX1 system.

I do consider this an ABI change. It should be addressed before using
mprotect() for EPCM permissions can be considered. Please do provide your
opinion about the ABI change.

>> 2) Only half EPCM permission management:
>>    Moving to mprotect() as a way to set EPCM permissions is
>>    not a clear interface for EPCM permission management because
>>    the kernel can only restrict permissions. Even so, the kernel
>>    has no insight into the current EPCM permissions and thus whether they
>>    actually need to be restricted so every mprotect() call,
>>    all except RWX, will need to be treated as a permission
>>    restriction with all the implementation obstacles
>>    that accompany it (more below).
>>
>>    There are two possible ways to implement permission restriction
>>    as triggered by mprotect(), (a) during the mprotect() call or
>>    (b) during a subsequent #PF (as suggested by you), each has
>>    its own obstacles.
>
> I would have prefered also for EAUG to bundle it unconditionally to mmap()
> flow. I've merely said that I don't care whether it is a part of mprotect()
> flow or in the #PF handler, as long as the feature is not uncontrolled
> chaos. Probably at least in mprotect() case it is easier flow to implement
> it directly as part of mprotect().
>
> Kernel is not the most trusted party in the confidential computing
> scenarios. It is one of the adversaries. And SGX is designed in the way
> that enclave controls EPCMD database and kernel PTEs. By trying to
> artificially limit this you don't bring security, other than trying to
> block implementing applications based on SGX2.

I do not follow your argument. How is implementing EPCM permission restriction
with an ioctl() limiting anything?
> > We can ditch the whole SGX, if the point is that kernel controls what > happens inside enclave. Normal VMAs are much more capable for that purpose, > and kernel has full control over them with e.g. PTEs. > >> >> 3) mprotect() implementation >> >> When the user calls mprotect() the expectation is that the >> call will either succeed or fail. If the call fails the user >> expects the system to be unchanged. This is not possible if >> permission restriction is done as part of mprotect(). >> >> (a) mprotect() may span multiple VMAs and involves VMA splits >> that (from what I understand) cannot be undone. SGX memory >> does not support VMA merges. If any SGX function >> (EMODPR or ETRACK on any page) done after a VMA split fails >> then the user will be left with fragmented memory. > > Oh well, SGX does not even support syscalls, if we go this level of > arguments. And you are trying to sort this out with even more flakky > interface, rather than stable EPCM reset to read state. I did not find your answer on how to handle this obstacle. Are you saying that leaving the user with fragmented memory and inconsistent state is acceptable? Could you please elaborate? I am trying to understand how to support this permission restriction with mprotect() and I get stuck on the scenario where VMAs need to be split - this has to be handled if we go this route. If it is possible to integrate with mprotect() then I can do so but I do not see how to do so yet and here I mention one issue and you again just dismiss it. If we are not able to handle this then it is indeed mprotect() that will be the "flakky interface" and we should stick with the ioctl(). > I've been implementing this exact feature lately and only realistic way to > do it without many corner cases is first use the current ioctl to reset the > range to READ in EPCM, and with EMODPE set the appropriate permissions. This is supported in the current implementation with the SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl(). 
> > >> (b) The EMODPR/ETRACK pair can fail on any of the pages provided >> by the mprotect() call. If there is a failure then the >> kernel cannot undo previously executed EMODPR since the kernel >> cannot run EMODPE. The EPCM permissions are thus left in inconsistent >> state since some of the pages would have changed EPCM permissions >> and mprotect() does not have mechanism to communicate >> partial success. >> The partial success is needed to communicate to user space >> (i) which pages need EACCEPT, (ii) which pages need to be >> in new request (although user space does not have information >> to help the new request succeed - see below). > > It's true but how common is that? The kernel needs to handle all scenarios, whether it is common or not. > Return e.g. -EIO, and run-time will > re-build the enclave. That anyway happens all the time with SGX for > various reasons (e.g. VM migration, S3 and whatnot). It's only important > that you know when this happens. Please confirm: you support a user space implementation using mprotect() that can leave the system in inconsistent state? >> (c) User space runtime has control over management of EPC memory >> and accurate failure information would help it to do so. >> Knowing the error code of the EMODPR failure would help >> user space to take appropriate action. For example, EMODPR >> can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime >> to learn that it needs to run EACCEPT on that page before >> the EMODPR can succeed. Alternatively, if it learns that the >> return is "SGX_EPC_PAGE_CONFLICT" then it could determine >> that some other part of the runtime attempted an ENCLU >> function on that page. >> It is not possible to provide such detailed errors to user >> space with mprotect(). > > Actually user space run-time is also an adversary. Kernel and user > space can e.g. kill the enclave or limit it with PTEs but EPCM is > beyond them *after* initialization. The whole point is to be able > to put e.g. 
containers to untrusted cloud. You seem to be saying that while the kernel could help the runtime to manage the enclave it should not. Is this correct? There may be scenarios where an enclave could repair itself during runtime, for example by running EACCEPT on a page that had a PENDING bit set. This information is provided to the runtime with the SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl(), but with this mprotect() implementation the kernel cannot provide this information and thus forces the enclave to be torn down and rebuilt to recover. Is this (using mprotect()) the kernel implementation you prefer? >> 4) #PF implementation >> >> (a) There is more to restricting permissions than just running >> ENCLS[EMODPR]. After running ENCLS[EMODPR] the kernel should >> also initiate the ETRACK flow to ensure that any thread within >> the enclave is interrupted by sending an IPI to the CPU, >> this includes the thread that just triggered the #PF. >> >> (b) Second consideration of the EMODPR and ETRACK flow is that >> this has a large "blast radius" in that any thread in the >> enclave needs to be interrupted. #PFs may arrive at any time >> so setting up a page range where a fault into any page in the >> page range will trigger enclave exits for all threads is >> a significant yet random impact. I believe it would be better >> to update all pages in the range at the same time and in this >> way contain the impact of this significant EMODPR/ETRACK/IPIs >> flow. >> >> (c) How will the page fault handler know when EMODPR/ETRACK should >> be run? Consider that the page fault handler can be called >> significantly later than the mprotect() call and that >> user space can call EMODPE any time to extend permissions. >> This implies that EMODPR/ETRACK/IPIs should be run during >> *every* page fault, irrespective of mprotect(). >> >> (d) If a page is in pending or modified state then EMODPR will >> always fail. 
This is something that needs to be fixed by >> user space runtime but the page fault will not be able >> to communicate this. >> >> Considering the above, could you please provide clear guidance on >> how you envision permission restriction to be supported by mprotect()? > > I'm not specifically driving #PF implementation but because it was so > important for EAUG, I said that I'm fine with #PF based implementation. > > Personally, I would do both EAUG and EMODPR as part of mmap() and > mprotect() (e.g. to catch that partial success and return that -EIO) > flow but either works for me. The API is more of a concern than the > internals. Are you now requesting EMODPR as part of mmap() also? Could you please elaborate how mmap() and mprotect() can handle partial success? Reinette
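As a rough illustration of the partial-success reporting described in 3(b) and 3(c) above, the following is a minimal user-space simulation, not the code from the patch. All names (`demo_page`, `demo_restrict_req`, `demo_restrict_range()`, the `DEMO_*` result codes) are hypothetical stand-ins, the numeric values are arbitrary, the errno choice is an assumption, and the per-page EMODPR outcome is simulated rather than produced by hardware. The sketch only shows the reporting contract: stop at the first failing page, report how many pages were changed, and report why the next one failed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical stand-ins for ENCLS[EMODPR] results; values are arbitrary. */
enum demo_sgx_result {
	DEMO_SUCCESS = 0,
	DEMO_PAGE_NOT_MODIFIABLE,	/* e.g. page is pending, needs EACCEPT */
	DEMO_EPC_PAGE_CONFLICT,		/* e.g. concurrent ENCLU on the page */
};

/* Simulated page: the result EMODPR would return, and its permissions. */
struct demo_page {
	enum demo_sgx_result emodpr_result;
	int perms;
};

/* Hypothetical request layout: inputs plus the two result outputs. */
struct demo_restrict_req {
	uint64_t offset;	/* in: start of range in bytes, page aligned */
	uint64_t length;	/* in: length of range in bytes, page aligned */
	int perms;		/* in: permissions to restrict to */
	uint64_t count;		/* out: number of pages successfully changed */
	int result;		/* out: simulated SGX result of the failing page */
};

/*
 * Walk the range page by page. On the first simulated EMODPR failure,
 * stop and report partial success: pages before the failure stay
 * restricted, since the kernel has no way to undo them.
 */
int demo_restrict_range(struct demo_page *pages, size_t npages,
			struct demo_restrict_req *req)
{
	uint64_t first, n, i;

	assert(pages && req);
	req->count = 0;
	req->result = DEMO_SUCCESS;

	if (!req->length || req->offset % DEMO_PAGE_SIZE ||
	    req->length % DEMO_PAGE_SIZE)
		return -EINVAL;

	first = req->offset / DEMO_PAGE_SIZE;
	n = req->length / DEMO_PAGE_SIZE;
	if (first >= npages || n > npages - first)
		return -EINVAL;

	for (i = 0; i < n; i++) {
		struct demo_page *page = &pages[first + i];

		if (page->emodpr_result != DEMO_SUCCESS) {
			req->result = page->emodpr_result;
			req->count = i;
			return -EFAULT;	/* errno choice is an assumption */
		}
		page->perms &= req->perms;	/* restriction only clears bits */
	}

	req->count = n;
	return 0;
}
```

With such a contract, a runtime that sees the "not modifiable" result could run EACCEPT on the failing page, advance `offset` by `count` pages, and resubmit the remainder. A plain mprotect() return value has no room for `count` and `result`.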
On Thu, Mar 17, 2022 at 05:11:40PM -0700, Reinette Chatre wrote: > Hi Jarkko, > > On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote: > > On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote: > >> Hi Jarkko, > >> > >> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote: > >>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote: > >>>> Hi Jarkko, > >>>> > >>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote: > >>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote: > >>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage > >>>>>> the enclave memory without needing to map it. > >>>>> > >>>>> Which is opposite what you do in EAUG. You can also augment pages without > >>>>> needing the map them. Sure you get that capability, but it is quite useless > >>>>> in practice. > >>>>> > >>>>>> I have considered the idea of supporting the permission restriction with > >>>>>> mprotect() but as you can see in this response I did not find it to be > >>>>>> practical. > >>>>> > >>>>> Where is it practical? What is your application? How is it practical to > >>>>> delegate the concurrency management of a split mprotect() to user space? > >>>>> How do we get rid off a useless up-call to the host? > >>>>> > >>>> > >>>> The email you responded to contained many obstacles against using mprotect() > >>>> but you chose to ignore them and snipped them all from your response. Could > >>>> you please address the issues instead of dismissing them? > >>> > >>> I did read the whole email but did not see anything that would make a case > >>> for fully exposed EMODPR, or having asymmetrical towards how EAUG works. > >> > >> I believe that on its own each obstacle I shared with you is significant enough > >> to not follow that approach. You simply respond that I am just not making a > >> case without acknowledging any obstacle or providing a reason why the obstacles > >> are not valid. 
> >> > >> To help me understand your view, could you please respond to each of the > >> obstacles I list below and how it is not an issue? > >> > >> > >> 1) ABI change: > >> mprotect() is currently supported to modify VMA permissions > >> irrespective of EPCM permissions. Supporting EPCM permission > >> changes with mprotect() would change this behavior. > >> For example, currently it is possible to have RW enclave > >> memory and support multiple tasks accessing the memory. Two > >> tasks can map the memory RW and later one can run mprotect() > >> to reduce the VMA permissions to read-only without impacting > >> the access of the other task. > >> By moving EPCM permission changes to mprotect() this usage > >> will no longer be supported and current behavior will change. > > > > Your concurrency scenario is somewhat artificial. Obviously you need to > > synchronize somehow, and breaking something that could be done with one > > system call into two separates is not going to help with that. On the > > contrary, it will add a yet one more difficulty layer. > > This is about supporting multiple threads in a single enclave, they can > all have their own memory mappings based on the needs. This is currently > supported in mainline as part of SGX1. > > > > > mprotect() controls PTE permissions, not EPCM permissions. It is the corner > > stone to do any sort of confidential computing to have this division. > > That's why EACCEPT and EACCEPTCOPY exist. > > Right, mprotect() controls PTE permissions but now you are requesting it > to control EPCM permissions also. > > There is only one permission field in the mprotect() API so this implies > that you request VMA and EPCM permissions to be in sync. This is new > behavior - different from the current mainline behavior. Not true. mprotect() should do EPCM reset by fixed PROT_READ for EMODPR. Then enclave can use EMODPE to set the permissions. > > > > > There is no "current behaviour" yet because there is no mainline code, i.e. 
> > that is easy one to address. > > What I described is the current behavior in mainline code. It is the > current SGX1 behavior. Running an environment as I described on a SGX2 > system with the mprotect() behavior you propose will see new behavior > with some threads encountering page faults with SGX error > code when it could run without issue on SGX1 system. > > I do consider this an ABI change. It should be addressed > before using mprotect() for EPCM permissions can be considered. > > Please do provide your opinion about the ABI change. With SGX1 there's no meaningful use for mprotect() after EINIT. This would be of course applicable after EINIT, not before. We have a flag to check whether enclave has been initialized. > > >> 2) Only half EPCM permission management: > >> Moving to mprotect() as a way to set EPCM permissions is > >> not a clear interface for EPCM permission management because > >> the kernel can only restrict permissions. Even so, the kernel > >> has no insight into the current EPCM permissions and thus whether they > >> actually need to be restricted so every mprotect() call, > >> all except RWX, will need to be treated as a permission > >> restriction with all the implementation obstacles > >> that accompany it (more below). > >> > >> There are two possible ways to implement permission restriction > >> as triggered by mprotect(), (a) during the mprotect() call or > >> (b) during a subsequent #PF (as suggested by you), each has > >> its own obstacles. > > > > I would have prefered also for EAUG to bundle it unconditionally to mmap() > > flow. I've merely said that I don't care whether it is a part of mprotect() > > flow or in the #PF handler, as long as the feature is not uncontrolled > > chaos. Probably at least in mprotect() case it is easier flow to implement > > it directly as part of mprotect(). > > > > Kernel is not the most trusted party in the confidential computing > > scenarios. It is one of the adversaries. 
And SGX is designed in the way > > that enclave controls EPCMD database and kernel PTEs. By trying to > > artificially limit this you don't bring security, other than trying to > > block implementing applications based on SGX2. > > I do not follow your argument. How is implementing EPCM permission restriction > with an ioctl() limiting anything? If you use minimal permissions with EMODPR, it gives freedom for EMODPE to use like it was EMODP, which is great. > > > > > We can ditch the whole SGX, if the point is that kernel controls what > > happens inside enclave. Normal VMAs are much more capable for that purpose, > > and kernel has full control over them with e.g. PTEs. > > > >> > >> 3) mprotect() implementation > >> > >> When the user calls mprotect() the expectation is that the > >> call will either succeed or fail. If the call fails the user > >> expects the system to be unchanged. This is not possible if > >> permission restriction is done as part of mprotect(). > >> > >> (a) mprotect() may span multiple VMAs and involves VMA splits > >> that (from what I understand) cannot be undone. SGX memory > >> does not support VMA merges. If any SGX function > >> (EMODPR or ETRACK on any page) done after a VMA split fails > >> then the user will be left with fragmented memory. > > > > Oh well, SGX does not even support syscalls, if we go this level of > > arguments. And you are trying to sort this out with even more flakky > > interface, rather than stable EPCM reset to read state. > > I did not find your answer on how to handle this obstacle. Are you > saying that leaving the user with fragmented memory and inconsistent > state is acceptable? > > Could you please elaborate? I am trying to understand how to support > this permission restriction with mprotect() and I get stuck on the scenario > where VMAs need to be split - this has to be handled if we go this route. 
> > If it is possible to integrate with mprotect() then I can do so but I > do not see how to do so yet and here I mention one issue and you > again just dismiss it. If we are not able to handle this then it is > indeed mprotect() that will be the "flakky interface" and we should > stick with the ioctl(). It's flaky because you have to pair every single mprotect() with an ioctl() that is unconditionally set to PROT_READ. It is also worse concurrency-wise because mprotect() can do both with mmap_sem held. It adds an extra useless round trip to the kernel. > > > > I've been implementing this exact feature lately and only realistic way to > > do it without many corner cases is first use the current ioctl to reset the > > range to READ in EPCM, and with EMODPE set the appropriate permissions. > > This is supported in the current implementation with the > SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl(). > > > > > > >> (b) The EMODPR/ETRACK pair can fail on any of the pages provided > >> by the mprotect() call. If there is a failure then the > >> kernel cannot undo previously executed EMODPR since the kernel > >> cannot run EMODPE. The EPCM permissions are thus left in inconsistent > >> state since some of the pages would have changed EPCM permissions > >> and mprotect() does not have mechanism to communicate > >> partial success. > >> The partial success is needed to communicate to user space > >> (i) which pages need EACCEPT, (ii) which pages need to be > >> in new request (although user space does not have information > >> to help the new request succeed - see below). > > > > It's true but how common is that? > > The kernel needs to handle all scenarios, whether it is common or not. This is not true. The kernel needs to provide a meaningful interface to the hardware that does not allow user space to do stupid things. We do not provide a 1:1 interface to every single hardware interface. Allowing the use of EMODPE actually does provide full control of the permissions. That should be enough.
> > > Return e.g. -EIO, and run-time will > > re-build the enclave. That anyway happens all the time with SGX for > > various reasons (e.g. VM migration, S3 and whatnot). It's only important > > that you know when this happens. > > Please confirm: you support a user space implementation using mprotect() > that can leave the system in inconsistent state? It actually does not leave kernel structures in an inconsistent state, so it's all fine. Partial success is almost nonexistent unless there is an actual bug in the run-time. It's the same as with files, sockets etc. If partial success happens, user space is probably already in an inconsistent state. I'm not sure how "system" is defined here so I cannot give a definitive yes/no answer. User space shooting itself in the foot is not something that the kernel usually has to take extra measures for. > > > >> (c) User space runtime has control over management of EPC memory > >> and accurate failure information would help it to do so. > >> Knowing the error code of the EMODPR failure would help > >> user space to take appropriate action. For example, EMODPR > >> can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime > >> to learn that it needs to run EACCEPT on that page before > >> the EMODPR can succeed. Alternatively, if it learns that the > >> return is "SGX_EPC_PAGE_CONFLICT" then it could determine > >> that some other part of the runtime attempted an ENCLU > >> function on that page. > >> It is not possible to provide such detailed errors to user > >> space with mprotect(). > > > > Actually user space run-time is also an adversary. Kernel and user > > space can e.g. kill the enclave or limit it with PTEs but EPCM is > > beyond them *after* initialization. The whole point is to be able > > to put e.g. containers to untrusted cloud. > > You seem to be saying that while the kernel could help the > runtime to manage the enclave it should not. Is this correct?
> > There may be scenarios where an enclave could repair itself during runtime, > for example by running EACCEPT on a page that had a PENDING bit set. > This information is provided to the runtime with the > SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl(), but with this mprotect() > implementation the kernel cannot provide this information and thus > forces the enclave to be torn down and rebuilt to recover. > > Is this (using mprotect()) the kernel implementation you prefer? If there is partial success it's a bug, not a legit scenario for well behaving run-time. > > >> 4) #PF implementation > >> > >> (a) There is more to restricting permissions than just running > >> ENCLS[EMODPR]. After running ENCLS[EMODPR] the kernel should > >> also initiate the ETRACK flow to ensure that any thread within > >> the enclave is interrupted by sending an IPI to the CPU, > >> this includes the thread that just triggered the #PF. > >> > >> (b) Second consideration of the EMODPR and ETRACK flow is that > >> this has a large "blast radius" in that any thread in the > >> enclave needs to be interrupted. #PFs may arrive at any time > >> so setting up a page range where a fault into any page in the > >> page range will trigger enclave exits for all threads is > >> a significant yet random impact. I believe it would be better > >> to update all pages in the range at the same time and in this > >> way contain the impact of this significant EMODPR/ETRACK/IPIs > >> flow. > >> > >> (c) How will the page fault handler know when EMODPR/ETRACK should > >> be run? Consider that the page fault handler can be called > >> significantly later than the mprotect() call and that > >> user space can call EMODPE any time to extend permissions. > >> This implies that EMODPR/ETRACK/IPIs should be run during > >> *every* page fault, irrespective of mprotect(). > >> > >> (d) If a page is in pending or modified state then EMODPR will > >> always fail. 
This is something that needs to be fixed by > >> user space runtime but the page fault will not be able > >> to communicate this. > >> > >> Considering the above, could you please provide clear guidance on > >> how you envision permission restriction to be supported by mprotect()? > > > > I'm not specifically driving #PF implementation but because it was so > > important for EAUG, I said that I'm fine with #PF based implementation. > > > > Personally, I would do both EAUG and EMODPR as part of mmap() and > > mprotect() (e.g. to catch that partial success and return that -EIO) > > flow but either works for me. The API is more of a concern than the > > internals. > > Are you now requesting EMODPR as part of mmap() also? Could you > please elaborate how mmap() and mprotect() can handle partial success? Nope, I was just referring that EAUG is #PF based but could have been also been implemented as part of mmap() flow. API wise it is symmetrical. BR, Jarkko
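The permission model both sides are arguing about can be written down as a small sketch. It is a deliberately simplified model under stated assumptions: the `demo_*` names and `DEMO_*` bits are hypothetical, EMODPR is modelled as only clearing EPCM bits and EMODPE as only setting them, and the EACCEPT and ETRACK steps discussed in the thread are left out entirely. An access is treated as allowed only when both the mapping (VMA/PTE) permissions and the EPCM permissions permit it.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_R 0x1
#define DEMO_W 0x2
#define DEMO_X 0x4

/* Simulated EPCM entry for a single enclave page. */
struct demo_epcm_page {
	int perms;
};

/* Kernel side, EMODPR analogue: can only clear permission bits. */
void demo_emodpr(struct demo_epcm_page *p, int perms)
{
	assert(p);
	p->perms &= perms;
}

/* Enclave side, EMODPE analogue: can only set permission bits. */
void demo_emodpe(struct demo_epcm_page *p, int perms)
{
	assert(p);
	p->perms |= perms;
}

/* An access needs both the mapping and the EPCM to allow it. */
bool demo_access_ok(const struct demo_epcm_page *p, int vma_perms, int access)
{
	assert(p);
	return (vma_perms & p->perms & access) == access;
}

/*
 * The two-step flow proposed in the thread: the kernel resets the page
 * to read-only, then the enclave extends to the permissions it wants.
 */
void demo_change_perms(struct demo_epcm_page *p, int target)
{
	demo_emodpr(p, DEMO_R);
	demo_emodpe(p, target);
}
```

In this model, two mappings of the same RW page can hold different VMA permissions without affecting each other, which is the SGX1 behavior raised under 1). Once EPCM restriction is tied to one mapping's mprotect(), a write through the other mapping fails as well, until the enclave extends the permissions again.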
Hi Jarkko, On 3/19/2022 5:24 PM, Jarkko Sakkinen wrote: > On Thu, Mar 17, 2022 at 05:11:40PM -0700, Reinette Chatre wrote: >> Hi Jarkko, >> >> On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote: >>> On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote: >>>> Hi Jarkko, >>>> >>>> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote: >>>>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote: >>>>>> Hi Jarkko, >>>>>> >>>>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote: >>>>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote: >>>>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage >>>>>>>> the enclave memory without needing to map it. >>>>>>> >>>>>>> Which is opposite what you do in EAUG. You can also augment pages without >>>>>>> needing the map them. Sure you get that capability, but it is quite useless >>>>>>> in practice. >>>>>>> >>>>>>>> I have considered the idea of supporting the permission restriction with >>>>>>>> mprotect() but as you can see in this response I did not find it to be >>>>>>>> practical. >>>>>>> >>>>>>> Where is it practical? What is your application? How is it practical to >>>>>>> delegate the concurrency management of a split mprotect() to user space? >>>>>>> How do we get rid off a useless up-call to the host? >>>>>>> >>>>>> >>>>>> The email you responded to contained many obstacles against using mprotect() >>>>>> but you chose to ignore them and snipped them all from your response. Could >>>>>> you please address the issues instead of dismissing them? >>>>> >>>>> I did read the whole email but did not see anything that would make a case >>>>> for fully exposed EMODPR, or having asymmetrical towards how EAUG works. >>>> >>>> I believe that on its own each obstacle I shared with you is significant enough >>>> to not follow that approach. You simply respond that I am just not making a >>>> case without acknowledging any obstacle or providing a reason why the obstacles >>>> are not valid. 
>>>> >>>> To help me understand your view, could you please respond to each of the >>>> obstacles I list below and how it is not an issue? >>>> >>>> >>>> 1) ABI change: >>>> mprotect() is currently supported to modify VMA permissions >>>> irrespective of EPCM permissions. Supporting EPCM permission >>>> changes with mprotect() would change this behavior. >>>> For example, currently it is possible to have RW enclave >>>> memory and support multiple tasks accessing the memory. Two >>>> tasks can map the memory RW and later one can run mprotect() >>>> to reduce the VMA permissions to read-only without impacting >>>> the access of the other task. >>>> By moving EPCM permission changes to mprotect() this usage >>>> will no longer be supported and current behavior will change. >>> >>> Your concurrency scenario is somewhat artificial. Obviously you need to >>> synchronize somehow, and breaking something that could be done with one >>> system call into two separates is not going to help with that. On the >>> contrary, it will add a yet one more difficulty layer. >> >> This is about supporting multiple threads in a single enclave, they can >> all have their own memory mappings based on the needs. This is currently >> supported in mainline as part of SGX1. Could you please comment on the above? >> >>> >>> mprotect() controls PTE permissions, not EPCM permissions. It is the corner >>> stone to do any sort of confidential computing to have this division. >>> That's why EACCEPT and EACCEPTCOPY exist. >> >> Right, mprotect() controls PTE permissions but now you are requesting it >> to control EPCM permissions also. >> >> There is only one permission field in the mprotect() API so this implies >> that you request VMA and EPCM permissions to be in sync. This is new >> behavior - different from the current mainline behavior. > > Not true. mprotect() should do EPCM reset by fixed PROT_READ for EMODPR. > Then enclave can use EMODPE to set the permissions. 
I think that I am starting to decipher what your vision is. If I understand
correctly mprotect() would serve a double purpose:
a) modify VMA permissions exactly as is done in SGX1 (no consideration of
   EPCM permissions and only limitation is that VMA permissions are not
   allowed to exceed vm_max_prot_bits)
b) EPCM permissions are _always_ restricted to PROT_READ irrespective of
   VMA permissions requested (new)

Is this correct? With mprotect() always resetting EPCM to be PROT_READ
there is no new sync between VMA and EPCM permissions.

>>> There is no "current behaviour" yet because there is no mainline code, i.e.
>>> that is easy one to address.
>>
>> What I described is the current behavior in mainline code. It is the
>> current SGX1 behavior. Running an environment as I described on a SGX2
>> system with the mprotect() behavior you propose will see new behavior
>> with some threads encountering page faults with SGX error
>> code when it could run without issue on SGX1 system.
>>
>> I do consider this an ABI change. It should be addressed
>> before using mprotect() for EPCM permissions can be considered.
>>
>> Please do provide your opinion about the ABI change.
>
> With SGX1 there's no meaningful use for mprotect() after EINIT. This
> would be of course applicable after EINIT, not before. We have a flag
> to check whether enclave has been initialized.

I interpret your comment to mean that the ABI change is acceptable
since existing usages of mprotect() after EINIT are not meaningful.

>>>> 2) Only half EPCM permission management:
>>>> Moving to mprotect() as a way to set EPCM permissions is
>>>> not a clear interface for EPCM permission management because
>>>> the kernel can only restrict permissions. Even so, the kernel
>>>> has no insight into the current EPCM permissions and thus whether they
>>>> actually need to be restricted so every mprotect() call,
>>>> all except RWX, will need to be treated as a permission
>>>> restriction with all the implementation obstacles
>>>> that accompany it (more below).
>>>>
>>>> There are two possible ways to implement permission restriction
>>>> as triggered by mprotect(), (a) during the mprotect() call or
>>>> (b) during a subsequent #PF (as suggested by you), each has
>>>> its own obstacles.
>>>
>>> I would have preferred also for EAUG to bundle it unconditionally to mmap()
>>> flow. I've merely said that I don't care whether it is a part of mprotect()
>>> flow or in the #PF handler, as long as the feature is not uncontrolled
>>> chaos. Probably at least in mprotect() case it is easier flow to implement
>>> it directly as part of mprotect().
>>>
>>> Kernel is not the most trusted party in the confidential computing
>>> scenarios. It is one of the adversaries. And SGX is designed in the way
>>> that enclave controls EPCMD database and kernel PTEs. By trying to
>>> artificially limit this you don't bring security, other than trying to
>>> block implementing applications based on SGX2.
>>
>> I do not follow your argument. How is implementing EPCM permission restriction
>> with an ioctl() limiting anything?
>
> If you use minimal permissions with EMODPR, it gives freedom for EMODPE
> to use like it was EMODP, which is great.

Understood.

>
>>
>>>
>>> We can ditch the whole SGX, if the point is that kernel controls what
>>> happens inside enclave. Normal VMAs are much more capable for that purpose,
>>> and kernel has full control over them with e.g. PTEs.
>>>
>>>>
>>>> 3) mprotect() implementation
>>>>
>>>> When the user calls mprotect() the expectation is that the
>>>> call will either succeed or fail. If the call fails the user
>>>> expects the system to be unchanged. This is not possible if
>>>> permission restriction is done as part of mprotect().
>>>>
>>>> (a) mprotect() may span multiple VMAs and involves VMA splits
>>>> that (from what I understand) cannot be undone. SGX memory
>>>> does not support VMA merges. If any SGX function
>>>> (EMODPR or ETRACK on any page) done after a VMA split fails
>>>> then the user will be left with fragmented memory.
>>>
>>> Oh well, SGX does not even support syscalls, if we go this level of
>>> arguments. And you are trying to sort this out with even more flaky
>>> interface, rather than stable EPCM reset to read state.
>>
>> I did not find your answer on how to handle this obstacle. Are you
>> saying that leaving the user with fragmented memory and inconsistent
>> state is acceptable?
>>
>> Could you please elaborate? I am trying to understand how to support
>> this permission restriction with mprotect() and I get stuck on the scenario
>> where VMAs need to be split - this has to be handled if we go this route.
>>
>> If it is possible to integrate with mprotect() then I can do so but I
>> do not see how to do so yet and here I mention one issue and you
>> again just dismiss it. If we are not able to handle this then it is
>> indeed mprotect() that will be the "flaky interface" and we should
>> stick with the ioctl().
>
> It's flaky because you have to pair every single mprotect() with
> ioctl() that is unconditionally set to PROT_READ. Also it is concurrency
> wise worse because mprotect() can do both with mmap_sem held. It adds
> an extra useless round trip to the kernel.

This still does not address my concern regarding possible fragmented memory.
Are you considering fragmented memory to be in the same category as
the inconsistent state mentioned below? (That it is a consequence of a bug
in the run-time?)

>>> I've been implementing this exact feature lately and only realistic way to
>>> do it without many corner cases is first use the current ioctl to reset the
>>> range to READ in EPCM, and with EMODPE set the appropriate permissions.
>>
>> This is supported in the current implementation with the
>> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl().
>>
>>>
>>>
>>>> (b) The EMODPR/ETRACK pair can fail on any of the pages provided
>>>> by the mprotect() call. If there is a failure then the
>>>> kernel cannot undo previously executed EMODPR since the kernel
>>>> cannot run EMODPE. The EPCM permissions are thus left in inconsistent
>>>> state since some of the pages would have changed EPCM permissions
>>>> and mprotect() does not have mechanism to communicate
>>>> partial success.
>>>> The partial success is needed to communicate to user space
>>>> (i) which pages need EACCEPT, (ii) which pages need to be
>>>> in new request (although user space does not have information
>>>> to help the new request succeed - see below).
>>>
>>> It's true but how common is that?
>>
>> The kernel needs to handle all scenarios, whether it is common or not.
>
> This is not true. Kernel needs to provide meaningful interface to the
> hardware that does not allow user space to do stupid things. We do not provide
> 1:1 interface to every single hardware interface. Allowing to use EMODPE
> actually does provide full control of the permissions. That should be
> enough.

I was not proposing that the kernel "provides a 1:1 interface for every
single hardware interface". My comment was that the kernel needs to handle
all user space scenarios. It is possible that an enclave page is in a state
where EMODPR can fail because of something that needs to be fixed from
within the enclave or run-time, for example, clearing an EPCM.PENDING bit.
The kernel needs to handle such scenarios.

I understand from your explanations that run-time handling of such
scenarios is not a goal or requirement but instead should always require
enclave re-build.

>>> Return e.g. -EIO, and run-time will
>>> re-build the enclave. That anyway happens all the time with SGX for
>>> various reasons (e.g. VM migration, S3 and whatnot). It's only important
>>> that you know when this happens.
>>
>> Please confirm: you support a user space implementation using mprotect()
>> that can leave the system in inconsistent state?
>
> It actually does not leave kernel structures to inconsistent state so it's
> all fine. Partial success is almost inexistent unless there is actual bug
> in the run-time. It's same as with files, sockets etc. If partial success
> happens, user space is probably already in inconsistent state.
>
> I'm not sure how "system" is defined here so I cannot give a definitive
> yes/no answer.
>
> User space kicking itself to foot is not something that kernel usually
> has to take extra measures for.

I am not against allowing user space kicking itself. I was of the opinion
that it would be helpful if the kernel can provide information to user
space to salvage itself instead of always forcing it to re-build. You make
it clear here and below that this is not a goal or requirement.

>>>> (c) User space runtime has control over management of EPC memory
>>>> and accurate failure information would help it to do so.
>>>> Knowing the error code of the EMODPR failure would help
>>>> user space to take appropriate action. For example, EMODPR
>>>> can return "SGX_PAGE_NOT_MODIFIABLE" that helps the runtime
>>>> to learn that it needs to run EACCEPT on that page before
>>>> the EMODPR can succeed. Alternatively, if it learns that the
>>>> return is "SGX_EPC_PAGE_CONFLICT" then it could determine
>>>> that some other part of the runtime attempted an ENCLU
>>>> function on that page.
>>>> It is not possible to provide such detailed errors to user
>>>> space with mprotect().
>>>
>>> Actually user space run-time is also an adversary. Kernel and user
>>> space can e.g. kill the enclave or limit it with PTEs but EPCM is
>>> beyond them *after* initialization. The whole point is to be able
>>> to put e.g. containers to untrusted cloud.
>>
>> You seem to be saying that while the kernel could help the
>> runtime to manage the enclave it should not. Is this correct?
>>
>> There may be scenarios where an enclave could repair itself during runtime,
>> for example by running EACCEPT on a page that had a PENDING bit set.
>> This information is provided to the runtime with the
>> SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl(), but with this mprotect()
>> implementation the kernel cannot provide this information and thus
>> forces the enclave to be torn down and rebuilt to recover.
>>
>> Is this (using mprotect()) the kernel implementation you prefer?
>
> If there is partial success it's a bug, not a legit scenario for well
> behaving run-time.

ok

Reinette
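
The run-time side of obstacle (c) above might look roughly like the sketch below. It is only an illustration of how the SGX result code returned by the ioctl() could drive recovery: emodpr_next_action() and enum emodpr_action are hypothetical names, and the numeric values of the two SGX result codes are assumed to match the kernel's enum sgx_return_code and should be checked against the tree in use.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed values, believed to match enum sgx_return_code in
 * arch/x86/include/asm/sgx.h; verify against your kernel tree.
 */
#define SGX_EPC_PAGE_CONFLICT	7
#define SGX_PAGE_NOT_MODIFIABLE	20

/* Hypothetical run-time decisions; not part of any kernel or SDK API. */
enum emodpr_action {
	EMODPR_DONE,		/* request succeeded */
	EMODPR_EACCEPT_RETRY,	/* run EACCEPT on the page, then retry */
	EMODPR_SYNC_RETRY,	/* serialize with other ENCLU users, then retry */
	EMODPR_REBUILD,		/* unknown failure: tear down and rebuild */
};

/*
 * ret:    0 or a negative errno reported for the ioctl()
 * result: SGX result code copied from the "result" output field
 */
enum emodpr_action emodpr_next_action(int ret, uint64_t result)
{
	assert(ret <= 0);

	if (ret == 0)
		return EMODPR_DONE;

	switch (result) {
	case SGX_PAGE_NOT_MODIFIABLE:
		return EMODPR_EACCEPT_RETRY;
	case SGX_EPC_PAGE_CONFLICT:
		return EMODPR_SYNC_RETRY;
	default:
		return EMODPR_REBUILD;
	}
}
```

With mprotect() only the errno would be available, so every failure would fall into the rebuild case.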
On Mon, Mar 28, 2022 at 04:22:35PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
>
> On 3/19/2022 5:24 PM, Jarkko Sakkinen wrote:
> > On Thu, Mar 17, 2022 at 05:11:40PM -0700, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote:
> >>> On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
> >>>> Hi Jarkko,
> >>>>
> >>>> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> >>>>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
> >>>>>> Hi Jarkko,
> >>>>>>
> >>>>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> >>>>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> >>>>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
> >>>>>>>> the enclave memory without needing to map it.
> >>>>>>>
> >>>>>>> Which is opposite what you do in EAUG. You can also augment pages without
> >>>>>>> needing to map them. Sure you get that capability, but it is quite useless
> >>>>>>> in practice.
> >>>>>>>
> >>>>>>>> I have considered the idea of supporting the permission restriction with
> >>>>>>>> mprotect() but as you can see in this response I did not find it to be
> >>>>>>>> practical.
> >>>>>>>
> >>>>>>> Where is it practical? What is your application? How is it practical to
> >>>>>>> delegate the concurrency management of a split mprotect() to user space?
> >>>>>>> How do we get rid of a useless up-call to the host?
> >>>>>>>
> >>>>>>
> >>>>>> The email you responded to contained many obstacles against using mprotect()
> >>>>>> but you chose to ignore them and snipped them all from your response. Could
> >>>>>> you please address the issues instead of dismissing them?
> >>>>>
> >>>>> I did read the whole email but did not see anything that would make a case
> >>>>> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
> >>>>
> >>>> I believe that on its own each obstacle I shared with you is significant enough
> >>>> to not follow that approach. You simply respond that I am just not making a
> >>>> case without acknowledging any obstacle or providing a reason why the obstacles
> >>>> are not valid.
> >>>>
> >>>> To help me understand your view, could you please respond to each of the
> >>>> obstacles I list below and how it is not an issue?
> >>>>
> >>>>
> >>>> 1) ABI change:
> >>>> mprotect() is currently supported to modify VMA permissions
> >>>> irrespective of EPCM permissions. Supporting EPCM permission
> >>>> changes with mprotect() would change this behavior.
> >>>> For example, currently it is possible to have RW enclave
> >>>> memory and support multiple tasks accessing the memory. Two
> >>>> tasks can map the memory RW and later one can run mprotect()
> >>>> to reduce the VMA permissions to read-only without impacting
> >>>> the access of the other task.
> >>>> By moving EPCM permission changes to mprotect() this usage
> >>>> will no longer be supported and current behavior will change.
> >>>
> >>> Your concurrency scenario is somewhat artificial. Obviously you need to
> >>> synchronize somehow, and breaking something that could be done with one
> >>> system call into two separates is not going to help with that. On the
> >>> contrary, it will add a yet one more difficulty layer.
> >>
> >> This is about supporting multiple threads in a single enclave, they can
> >> all have their own memory mappings based on the needs. This is currently
> >> supported in mainline as part of SGX1.
>
>
> Could you please comment on the above?

I've probably spent over two weeks of my life addressing concerns
to the point that I feel as I was implementing this feature (that could be
faster way to get it done).

So I'll just wait the next version and see how it is like and give my
feedback based on that. It's not really my problem to address every
possible concern.

BR, Jarkko
On Wed, Mar 30, 2022 at 06:00:30PM +0300, Jarkko Sakkinen wrote:
> On Mon, Mar 28, 2022 at 04:22:35PM -0700, Reinette Chatre wrote:
> > Hi Jarkko,
> >
> > On 3/19/2022 5:24 PM, Jarkko Sakkinen wrote:
> > > On Thu, Mar 17, 2022 at 05:11:40PM -0700, Reinette Chatre wrote:
> > >> Hi Jarkko,
> > >>
> > >> On 3/17/2022 3:51 PM, Jarkko Sakkinen wrote:
> > >>> On Thu, Mar 17, 2022 at 03:08:04PM -0700, Reinette Chatre wrote:
> > >>>> Hi Jarkko,
> > >>>>
> > >>>> On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> > >>>>> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
> > >>>>>> Hi Jarkko,
> > >>>>>>
> > >>>>>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
> > >>>>>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
> > >>>>>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
> > >>>>>>>> the enclave memory without needing to map it.
> > >>>>>>>
> > >>>>>>> Which is opposite what you do in EAUG. You can also augment pages without
> > >>>>>>> needing to map them. Sure you get that capability, but it is quite useless
> > >>>>>>> in practice.
> > >>>>>>>
> > >>>>>>>> I have considered the idea of supporting the permission restriction with
> > >>>>>>>> mprotect() but as you can see in this response I did not find it to be
> > >>>>>>>> practical.
> > >>>>>>>
> > >>>>>>> Where is it practical? What is your application? How is it practical to
> > >>>>>>> delegate the concurrency management of a split mprotect() to user space?
> > >>>>>>> How do we get rid of a useless up-call to the host?
> > >>>>>>>
> > >>>>>>
> > >>>>>> The email you responded to contained many obstacles against using mprotect()
> > >>>>>> but you chose to ignore them and snipped them all from your response. Could
> > >>>>>> you please address the issues instead of dismissing them?
> > >>>>>
> > >>>>> I did read the whole email but did not see anything that would make a case
> > >>>>> for fully exposed EMODPR, or having asymmetrical towards how EAUG works.
> > >>>>
> > >>>> I believe that on its own each obstacle I shared with you is significant enough
> > >>>> to not follow that approach. You simply respond that I am just not making a
> > >>>> case without acknowledging any obstacle or providing a reason why the obstacles
> > >>>> are not valid.
> > >>>>
> > >>>> To help me understand your view, could you please respond to each of the
> > >>>> obstacles I list below and how it is not an issue?
> > >>>>
> > >>>>
> > >>>> 1) ABI change:
> > >>>> mprotect() is currently supported to modify VMA permissions
> > >>>> irrespective of EPCM permissions. Supporting EPCM permission
> > >>>> changes with mprotect() would change this behavior.
> > >>>> For example, currently it is possible to have RW enclave
> > >>>> memory and support multiple tasks accessing the memory. Two
> > >>>> tasks can map the memory RW and later one can run mprotect()
> > >>>> to reduce the VMA permissions to read-only without impacting
> > >>>> the access of the other task.
> > >>>> By moving EPCM permission changes to mprotect() this usage
> > >>>> will no longer be supported and current behavior will change.
> > >>>
> > >>> Your concurrency scenario is somewhat artificial. Obviously you need to
> > >>> synchronize somehow, and breaking something that could be done with one
> > >>> system call into two separates is not going to help with that. On the
> > >>> contrary, it will add a yet one more difficulty layer.
> > >>
> > >> This is about supporting multiple threads in a single enclave, they can
> > >> all have their own memory mappings based on the needs. This is currently
> > >> supported in mainline as part of SGX1.
> >
> >
> > Could you please comment on the above?
>
>
> I've probably spent over two weeks of my life addressing concerns
> to the point that I feel as I was implementing this feature (that could be
> faster way to get it done).
>
> So I'll just wait the next version and see how it is like and give my
> feedback based on that. It's not really my problem to address every
> possible concern.

Once v3 is out, I'll check what I think is right, and what is wrong and
might send some fixups and see where that leads to. I think it is more
constructive way to move forward. Repeating same arguments leads to nowhere.

BR, Jarkko
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 5c678b27bb72..b0ffb80bc67f 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -31,6 +31,8 @@ enum sgx_page_flags {
 	_IO(SGX_MAGIC, 0x04)
 #define SGX_IOC_ENCLAVE_RELAX_PERMISSIONS \
 	_IOWR(SGX_MAGIC, 0x05, struct sgx_enclave_relax_perm)
+#define SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS \
+	_IOWR(SGX_MAGIC, 0x06, struct sgx_enclave_restrict_perm)
 
 /**
  * struct sgx_enclave_create - parameter structure for the
@@ -95,6 +97,25 @@ struct sgx_enclave_relax_perm {
 	__u64 count;
 };
 
+/**
+ * struct sgx_enclave_restrict_perm - parameters for ioctl
+ *                                    %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
+ * @offset:	starting page offset (page aligned relative to enclave base
+ *		address defined in SECS)
+ * @length:	length of memory (multiple of the page size)
+ * @secinfo:	address for the SECINFO data containing the new permission bits
+ *		for pages in range described by @offset and @length
+ * @result:	(output) SGX result code of ENCLS[EMODPR] function
+ * @count:	(output) bytes successfully changed (multiple of page size)
+ */
+struct sgx_enclave_restrict_perm {
+	__u64 offset;
+	__u64 length;
+	__u64 secinfo;
+	__u64 result;
+	__u64 count;
+};
+
 struct sgx_enclave_run;
 
 /**
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 8da813504249..a5d4a7efb986 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -90,8 +90,8 @@ static struct sgx_epc_page *sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	return epc_page;
 }
 
-static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
-						unsigned long addr)
+struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
+					 unsigned long addr)
 {
 	struct sgx_epc_page *epc_page;
 	struct sgx_encl_page *entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index cb9f16d457ac..848a28d28d3d 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -120,4 +120,7 @@ void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
 void sgx_encl_free_epc_page(struct sgx_epc_page *page);
 
+struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
+					 unsigned long addr);
+
 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 9cc6af404bf6..23bdf558b231 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -894,6 +894,232 @@ static long sgx_ioc_enclave_relax_perm(struct sgx_encl *encl, void __user *arg)
 	return ret;
 }
 
+/*
+ * Some SGX functions require that no cached linear-to-physical address
+ * mappings are present before they can succeed. Collaborate with
+ * hardware via ENCLS[ETRACK] to ensure that all cached
+ * linear-to-physical address mappings belonging to all threads of
+ * the enclave are cleared. See sgx_encl_cpumask() for details.
+ */
+static int sgx_enclave_etrack(struct sgx_encl *encl)
+{
+	void *epc_virt;
+	int ret;
+
+	epc_virt = sgx_get_epc_virt_addr(encl->secs.epc_page);
+	ret = __etrack(epc_virt);
+	if (ret) {
+		/*
+		 * ETRACK only fails when there is an OS issue. For
+		 * example, two consecutive ETRACK was sent without
+		 * completed IPI between.
+		 */
+		pr_err_once("ETRACK returned %d (0x%x)", ret, ret);
+		/*
+		 * Send IPIs to kick CPUs out of the enclave and
+		 * try ETRACK again.
+		 */
+		on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
+		ret = __etrack(epc_virt);
+		if (ret) {
+			pr_err_once("ETRACK repeat returned %d (0x%x)",
+				    ret, ret);
+			return -EFAULT;
+		}
+	}
+	on_each_cpu_mask(sgx_encl_cpumask(encl), sgx_ipi_cb, NULL, 1);
+
+	return 0;
+}
+
+/**
+ * sgx_enclave_restrict_perm() - Restrict EPCM permissions and align OS view
+ * @encl:	Enclave to which the pages belong.
+ * @modp:	Checked parameters from user on which pages need modifying.
+ * @secinfo_perm: New (validated) permission bits.
+ *
+ * Return:
+ * - 0:		Success.
+ * - -errno:	Otherwise.
+ */
+static long sgx_enclave_restrict_perm(struct sgx_encl *encl,
+				      struct sgx_enclave_restrict_perm *modp,
+				      u64 secinfo_perm)
+{
+	unsigned long vm_prot, run_prot_restore;
+	struct sgx_encl_page *entry;
+	struct sgx_secinfo secinfo;
+	unsigned long addr;
+	unsigned long c;
+	void *epc_virt;
+	int ret;
+
+	memset(&secinfo, 0, sizeof(secinfo));
+	secinfo.flags = secinfo_perm;
+
+	vm_prot = vm_prot_from_secinfo(secinfo_perm);
+
+	for (c = 0 ; c < modp->length; c += PAGE_SIZE) {
+		addr = encl->base + modp->offset + c;
+
+		mutex_lock(&encl->lock);
+
+		entry = sgx_encl_load_page(encl, addr);
+		if (IS_ERR(entry)) {
+			ret = PTR_ERR(entry) == -EBUSY ? -EAGAIN : -EFAULT;
+			goto out_unlock;
+		}
+
+		/*
+		 * Changing EPCM permissions is only supported on regular
+		 * SGX pages. Attempting this change on other pages will
+		 * result in #PF.
+		 */
+		if (entry->type != SGX_PAGE_TYPE_REG) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		/*
+		 * Do not verify if current runtime protection bits are what
+		 * is being requested. The enclave may have relaxed EPCM
+		 * permissions calls without letting the kernel know and
+		 * thus permission restriction may still be needed even if
+		 * from the kernel's perspective the permissions are unchanged.
+		 */
+
+		/* New permissions should never exceed vetted permissions. */
+		if ((entry->vm_max_prot_bits & vm_prot) != vm_prot) {
+			ret = -EPERM;
+			goto out_unlock;
+		}
+
+		/* Make sure page stays around while releasing mutex. */
+		if (sgx_unmark_page_reclaimable(entry->epc_page)) {
+			ret = -EAGAIN;
+			goto out_unlock;
+		}
+
+		/*
+		 * Change runtime protection before zapping PTEs to ensure
+		 * any new #PF uses new permissions. EPCM permissions (if
+		 * needed) not changed yet.
+		 */
+		run_prot_restore = entry->vm_run_prot_bits;
+		entry->vm_run_prot_bits = vm_prot;
+
+		mutex_unlock(&encl->lock);
+		/*
+		 * Do not keep encl->lock because of dependency on
+		 * mmap_lock acquired in sgx_zap_enclave_ptes().
+		 */
+		sgx_zap_enclave_ptes(encl, addr);
+
+		mutex_lock(&encl->lock);
+
+		/* Change EPCM permissions. */
+		epc_virt = sgx_get_epc_virt_addr(entry->epc_page);
+		ret = __emodpr(&secinfo, epc_virt);
+		if (encls_faulted(ret)) {
+			/*
+			 * All possible faults should be avoidable:
+			 * parameters have been checked, will only change
+			 * permissions of a regular page, and no concurrent
+			 * SGX1/SGX2 ENCLS instructions since these
+			 * are protected with mutex.
+			 */
+			pr_err_once("EMODPR encountered exception %d\n",
+				    ENCLS_TRAPNR(ret));
+			ret = -EFAULT;
+			goto out_prot_restore;
+		}
+		if (encls_failed(ret)) {
+			modp->result = ret;
+			ret = -EFAULT;
+			goto out_prot_restore;
+		}
+
+		ret = sgx_enclave_etrack(encl);
+		if (ret) {
+			ret = -EFAULT;
+			goto out_reclaim;
+		}
+
+		sgx_mark_page_reclaimable(entry->epc_page);
+		mutex_unlock(&encl->lock);
+	}
+
+	ret = 0;
+	goto out;
+
+out_prot_restore:
+	entry->vm_run_prot_bits = run_prot_restore;
+out_reclaim:
+	sgx_mark_page_reclaimable(entry->epc_page);
+out_unlock:
+	mutex_unlock(&encl->lock);
+out:
+	modp->count = c;
+
+	return ret;
+}
+
+/**
+ * sgx_ioc_enclave_restrict_perm() - handler for
+ *                                   %SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
+ * @encl:	an enclave pointer
+ * @arg:	userspace pointer to a &struct sgx_enclave_restrict_perm
+ *		instance
+ *
+ * SGX2 distinguishes between relaxing and restricting the enclave page
+ * permissions maintained by the hardware (EPCM permissions) of pages
+ * belonging to an initialized enclave (after SGX_IOC_ENCLAVE_INIT).
+ *
+ * EPCM permissions cannot be restricted from within the enclave, the enclave
+ * requires the kernel to run the privileged level 0 instructions ENCLS[EMODPR]
+ * and ENCLS[ETRACK]. An attempt to relax EPCM permissions with this call
+ * will be ignored by the hardware.
+ *
+ * Enclave page permissions are not allowed to exceed the maximum vetted
+ * permissions maintained in &struct sgx_encl_page->vm_max_prot_bits.
+ *
+ * Return:
+ * - 0:		Success
+ * - -errno:	Otherwise
+ */
+static long sgx_ioc_enclave_restrict_perm(struct sgx_encl *encl,
+					  void __user *arg)
+{
+	struct sgx_enclave_restrict_perm params;
+	u64 secinfo_perm;
+	long ret;
+
+	ret = sgx_ioc_sgx2_ready(encl);
+	if (ret)
+		return ret;
+
+	if (copy_from_user(&params, arg, sizeof(params)))
+		return -EFAULT;
+
+	if (sgx_validate_offset_length(encl, params.offset, params.length))
+		return -EINVAL;
+
+	ret = sgx_perm_from_user_secinfo((void __user *)params.secinfo,
+					 &secinfo_perm);
+	if (ret)
+		return ret;
+
+	if (params.result || params.count)
+		return -EINVAL;
+
+	ret = sgx_enclave_restrict_perm(encl, &params, secinfo_perm);
+
+	if (copy_to_user(arg, &params, sizeof(params)))
+		return -EFAULT;
+
+	return ret;
+}
+
 long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 {
 	struct sgx_encl *encl = filep->private_data;
@@ -918,6 +1144,9 @@ long sgx_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 	case SGX_IOC_ENCLAVE_RELAX_PERMISSIONS:
 		ret = sgx_ioc_enclave_relax_perm(encl, (void __user *)arg);
 		break;
+	case SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS:
+		ret = sgx_ioc_enclave_restrict_perm(encl, (void __user *)arg);
+		break;
 	default:
 		ret = -ENOIOCTLCMD;
 		break;
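
From user space, the partial-success reporting of the handler in this patch could be consumed along the lines of the sketch below. It is an illustration under stated assumptions, not a reference implementation: the struct is a local copy of the layout proposed here, restrict_fn is a hypothetical stand-in for ioctl(fd, SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS, p) that returns 0 or a negative errno, and restrict_range() and fake_restrict_one_page() are made-up helper names. The fake backend only exists so the loop can be tried without SGX hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PAGE_SZ 4096ULL	/* assumed page size */

/* Local copy of the layout proposed in this patch. */
struct sgx_enclave_restrict_perm {
	uint64_t offset;
	uint64_t length;
	uint64_t secinfo;
	uint64_t result;	/* output */
	uint64_t count;		/* output */
};

/* Hypothetical ioctl() stand-in: returns 0 or a negative errno. */
typedef int (*restrict_fn)(struct sgx_enclave_restrict_perm *p, void *ctx);

/*
 * Restrict [offset, offset + length), resuming after the bytes that a
 * previous call already changed. -EAGAIN is retried up to max_retries
 * times, any other error stops the loop. Returns the last return value;
 * *done holds the bytes changed and *sgx_result the last SGX result code.
 */
int restrict_range(restrict_fn do_ioctl, void *ctx, uint64_t offset,
		   uint64_t length, uint64_t secinfo, int max_retries,
		   uint64_t *done, uint64_t *sgx_result)
{
	uint64_t changed = 0;
	int retries = 0;
	int ret = 0;

	*sgx_result = 0;
	while (changed < length) {
		/* result and count are left zero, as the handler requires. */
		struct sgx_enclave_restrict_perm p = {
			.offset = offset + changed,
			.length = length - changed,
			.secinfo = secinfo,
		};

		ret = do_ioctl(&p, ctx);
		assert(p.count <= length - changed);
		changed += p.count;
		*sgx_result = p.result;
		if (ret == 0)
			break;
		if (ret != -EAGAIN || ++retries > max_retries)
			break;
	}
	*done = changed;
	return ret;
}

/* Fake backend for experiments: changes one page per call. */
int fake_restrict_one_page(struct sgx_enclave_restrict_perm *p, void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	if (p->result || p->count)
		return -EINVAL;
	p->count = PAGE_SZ;
	return p->length > PAGE_SZ ? -EAGAIN : 0;
}
```

A real caller would wrap ioctl() so that a -1 return is translated to -errno before it reaches restrict_range(), and would run EACCEPT inside the enclave for the bytes reported in *done.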
In the initial (SGX1) version of SGX, pages in an enclave need to be
created with permissions that support all usages of the pages, from the
time the enclave is initialized until it is unloaded. For example,
pages used by a JIT compiler or when code needs to otherwise be
relocated need to always have RWX permissions.

SGX2 includes a new function ENCLS[EMODPR] that is run from the kernel
and can be used to restrict the EPCM permissions of regular enclave
pages within an initialized enclave.

Introduce ioctl() SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS to support
restricting EPCM permissions. With this ioctl() the user specifies
a page range and the permissions to be applied to all pages in
the provided range. After checking the new permissions (more detail
below) the page table entries are reset and any new page
table entries will contain the new, restricted, permissions.
ENCLS[EMODPR] is run to restrict the EPCM permissions followed by
the ENCLS[ETRACK] flow that will ensure no cached
linear-to-physical address mappings to the changed pages remain.

It is possible for the permission change request to fail on any
page within the provided range, either with an error encountered
by the kernel or by the SGX hardware while running
ENCLS[EMODPR]. To support partial success the ioctl() returns an
error code based on failures encountered by the kernel as well
as two result output parameters: one for the number of bytes
that were successfully changed (a multiple of the page size) and
one for the SGX return code.

Checking user provided new permissions
======================================

Enclave page permission changes need to be approached with care and
for this reason permission changes are only allowed if the new
permissions are the same or more restrictive than the vetted
permissions. No additional checking is done to ensure that the
permissions are actually being restricted. This is because the
enclave may have relaxed the EPCM permissions from within
the enclave without letting the kernel know. An attempt to relax
permissions using this call will be ignored by the hardware.

For example, together with the support for relaxing of EPCM permissions,
enclave pages added with the vetted permissions in brackets below
are allowed to have permissions as follows:
* (RWX) => RW => R => RX => RWX
* (RW) => R => RW
* (RX) => R => RX

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Change terminology to use "relax" instead of "extend" to refer to
  the case when enclave page permissions are added (Dave).
- Use ioctl() in commit message (Dave).
- Add examples on what permissions would be allowed (Dave).
- Split enclave page permission changes into two ioctl()s, one for
  permission restricting (SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS)
  and one for permission relaxing (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS)
  (Jarkko).
- In support of the ioctl() name change the following names have been
  changed:
  struct sgx_page_modp -> struct sgx_enclave_restrict_perm
  sgx_ioc_page_modp() -> sgx_ioc_enclave_restrict_perm()
  sgx_page_modp() -> sgx_enclave_restrict_perm()
- ioctl() takes entire secinfo as input instead of
  page permissions only (Jarkko).
- Fix kernel-doc to include () in function name.
- Create and use utility for the ETRACK flow.
- Fixups in comments
- Move kernel-doc to function that provides documentation for
  Documentation/x86/sgx.rst.
- Remove redundant comment.
- Make explicit which members of struct sgx_enclave_restrict_perm
  are for output (Dave).

 arch/x86/include/uapi/asm/sgx.h |  21 +++
 arch/x86/kernel/cpu/sgx/encl.c  |   4 +-
 arch/x86/kernel/cpu/sgx/encl.h  |   3 +
 arch/x86/kernel/cpu/sgx/ioctl.c | 229 ++++++++++++++++++++++++++++++++
 4 files changed, 255 insertions(+), 2 deletions(-)
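
The vetted-permission rule described under "Checking user provided new permissions" comes down to a subset test on the permission bits. The sketch below restates it in user space terms, assuming the usual PROT_* bits stand in for the kernel's vm_prot values; perm_within_vetted() and PROT_RWX are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

#define PROT_RWX (PROT_READ | PROT_WRITE | PROT_EXEC)

/*
 * Same test as (vm_max_prot_bits & vm_prot) == vm_prot in the patch:
 * every requested bit must also be set in the vetted permissions.
 */
bool perm_within_vetted(unsigned long vetted, unsigned long requested)
{
	assert((vetted & ~(unsigned long)PROT_RWX) == 0);
	assert((requested & ~(unsigned long)PROT_RWX) == 0);

	return (vetted & requested) == requested;
}
```

This matches the examples in the commit message: a page vetted as RW may move to R and back to RW, while a request for RX on that page would be refused because the execute bit was never vetted.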