diff mbox series

[V2,19/32] x86/sgx: Support adding of pages to an initialized enclave

Message ID fcbde9c3e67289eaff9cd8b34989919629fe823c.1644274683.git.reinette.chatre@intel.com (mailing list archive)
State New, archived
Headers show
Series x86/sgx and selftests/sgx: Support SGX2 | expand

Commit Message

Reinette Chatre Feb. 8, 2022, 12:45 a.m. UTC
With SGX1 an enclave needs to be created with its maximum memory demands
allocated. Pages cannot be added to an enclave after it is initialized.
SGX2 introduces a new function, ENCLS[EAUG], that can be used to add
pages to an initialized enclave. With SGX2 the enclave still needs to
set aside address space for its maximum memory demands during enclave
creation, but all pages need not be added before enclave initialization.
Pages can be added during enclave runtime.

Add support for dynamically adding pages to an initialized enclave,
architecturally limited to RW permission. Add pages via the page fault
handler at the time an enclave address without a backing enclave page
is accessed, potentially directly reclaiming pages if no free pages
are available.

The enclave is still required to run ENCLU[EACCEPT] on the page before
it can be used. A useful flow is for the enclave to run ENCLU[EACCEPT]
on an uninitialized address. This will trigger the page fault handler
that will add the enclave page and return execution to the enclave to
repeat the ENCLU[EACCEPT] instruction, this time successful.

If the enclave accesses an uninitialized address in another way, for
example by expanding the enclave stack to a page that has not yet been
added, then the page fault handler would add the page on the first
write but upon returning to the enclave the instruction that triggered
the page fault would be repeated and since ENCLU[EACCEPT] was not run
yet it would trigger a second page fault, this time with the SGX flag
set in the page fault error code. This can only be recovered by entering
the enclave again and directly running the ENCLU[EACCEPT] instruction on
the now initialized address.

Accessing an uninitialized address from outside the enclave also
triggers this flow but the page will remain inaccessible (access will
result in #PF) until accepted from within the enclave via
ENCLU[EACCEPT].

The page is added with the architecturally constrained RW permissions
as runtime as well as maximum allowed permissions. It is understood that
there are some use cases, for example code relocation, that requires RWX
maximum permissions. Supporting these use cases require guidance from
user space policy before such maximum permissions can be allowed.
Integration with user policy is deferred.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- Fix subject line "to initialized" -> "to an initialized" (Jarkko).
- Move text about hardware's PENDING state to the patch that introduces
  the ENCLS[EAUG] wrapper (Jarkko).
- Ensure kernel-doc uses brackets when referring to function.

 arch/x86/kernel/cpu/sgx/encl.c  | 133 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.h  |   2 +
 arch/x86/kernel/cpu/sgx/ioctl.c |   4 +-
 3 files changed, 137 insertions(+), 2 deletions(-)

Comments

Jarkko Sakkinen Feb. 19, 2022, 11:57 a.m. UTC | #1
On Mon, Feb 07, 2022 at 04:45:41PM -0800, Reinette Chatre wrote:
> With SGX1 an enclave needs to be created with its maximum memory demands
> allocated. Pages cannot be added to an enclave after it is initialized.
> SGX2 introduces a new function, ENCLS[EAUG], that can be used to add
> pages to an initialized enclave. With SGX2 the enclave still needs to
> set aside address space for its maximum memory demands during enclave
> creation, but all pages need not be added before enclave initialization.
> Pages can be added during enclave runtime.
> 
> Add support for dynamically adding pages to an initialized enclave,
> architecturally limited to RW permission. Add pages via the page fault
> handler at the time an enclave address without a backing enclave page
> is accessed, potentially directly reclaiming pages if no free pages
> are available.
> 
> The enclave is still required to run ENCLU[EACCEPT] on the page before
> it can be used. A useful flow is for the enclave to run ENCLU[EACCEPT]
> on an uninitialized address. This will trigger the page fault handler
> that will add the enclave page and return execution to the enclave to
> repeat the ENCLU[EACCEPT] instruction, this time successful.
> 
> If the enclave accesses an uninitialized address in another way, for
> example by expanding the enclave stack to a page that has not yet been
> added, then the page fault handler would add the page on the first
> write but upon returning to the enclave the instruction that triggered
> the page fault would be repeated and since ENCLU[EACCEPT] was not run
> yet it would trigger a second page fault, this time with the SGX flag
> set in the page fault error code. This can only be recovered by entering
> the enclave again and directly running the ENCLU[EACCEPT] instruction on
> the now initialized address.
> 
> Accessing an uninitialized address from outside the enclave also
> triggers this flow but the page will remain inaccessible (access will
> result in #PF) until accepted from within the enclave via
> ENCLU[EACCEPT].
> 
> The page is added with the architecturally constrained RW permissions
> as runtime as well as maximum allowed permissions. It is understood that
> there are some use cases, for example code relocation, that requires RWX
> maximum permissions. Supporting these use cases require guidance from
> user space policy before such maximum permissions can be allowed.
> Integration with user policy is deferred.
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V1:
> - Fix subject line "to initialized" -> "to an initialized" (Jarkko).
> - Move text about hardware's PENDING state to the patch that introduces
>   the ENCLS[EAUG] wrapper (Jarkko).
> - Ensure kernel-doc uses brackets when referring to function.
> 
>  arch/x86/kernel/cpu/sgx/encl.c  | 133 ++++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/encl.h  |   2 +
>  arch/x86/kernel/cpu/sgx/ioctl.c |   4 +-
>  3 files changed, 137 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index a5d4a7efb986..d1e3ea86b902 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -124,6 +124,128 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>  	return entry;
>  }
>  
> +/**
> + * sgx_encl_eaug_page() - Dynamically add page to initialized enclave
> + * @vma:	VMA obtained from fault info from where page is accessed
> + * @encl:	enclave accessing the page
> + * @addr:	address that triggered the page fault
> + *
> + * When an initialized enclave accesses a page with no backing EPC page
> + * on a SGX2 system then the EPC can be added dynamically via the SGX2
> + * ENCLS[EAUG] instruction.
> + *
> + * Returns: Appropriate vm_fault_t: VM_FAULT_NOPAGE when PTE was installed
> + * successfully, VM_FAULT_SIGBUS or VM_FAULT_OOM as error otherwise.
> + */
> +static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
> +				     struct sgx_encl *encl, unsigned long addr)
> +{
> +	struct sgx_pageinfo pginfo = {0};
> +	struct sgx_encl_page *encl_page;
> +	struct sgx_epc_page *epc_page;
> +	struct sgx_va_page *va_page;
> +	unsigned long phys_addr;
> +	unsigned long prot;
> +	vm_fault_t vmret;
> +	int ret;
> +
> +	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
> +		return VM_FAULT_SIGBUS;
> +
> +	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
> +	if (!encl_page)
> +		return VM_FAULT_OOM;
> +
> +	encl_page->desc = addr;
> +	encl_page->encl = encl;
> +
> +	/*
> +	 * Adding a regular page that is architecturally allowed to only
> +	 * be created with RW permissions.
> +	 * TBD: Interface with user space policy to support max permissions
> +	 * of RWX.
> +	 */
> +	prot = PROT_READ | PROT_WRITE;
> +	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> +	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
> +
> +	epc_page = sgx_alloc_epc_page(encl_page, true);
> +	if (IS_ERR(epc_page)) {
> +		kfree(encl_page);
> +		return VM_FAULT_SIGBUS;
> +	}
> +
> +	va_page = sgx_encl_grow(encl);
> +	if (IS_ERR(va_page)) {
> +		ret = PTR_ERR(va_page);
> +		goto err_out_free;
> +	}
> +
> +	mutex_lock(&encl->lock);
> +
> +	/*
> +	 * Copy comment from sgx_encl_add_page() to maintain guidance in
> +	 * this similar flow:
> +	 * Adding to encl->va_pages must be done under encl->lock.  Ditto for
> +	 * deleting (via sgx_encl_shrink()) in the error path.
> +	 */
> +	if (va_page)
> +		list_add(&va_page->list, &encl->va_pages);
> +
> +	ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
> +			encl_page, GFP_KERNEL);
> +	/*
> +	 * If ret == -EBUSY then page was created in another flow while
> +	 * running without encl->lock
> +	 */
> +	if (ret)
> +		goto err_out_unlock;
> +
> +	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
> +	pginfo.addr = encl_page->desc & PAGE_MASK;
> +	pginfo.metadata = 0;
> +
> +	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
> +	if (ret)
> +		goto err_out;
> +
> +	encl_page->encl = encl;
> +	encl_page->epc_page = epc_page;
> +	encl_page->type = SGX_PAGE_TYPE_REG;
> +	encl->secs_child_cnt++;
> +
> +	sgx_mark_page_reclaimable(encl_page->epc_page);
> +
> +	phys_addr = sgx_get_epc_phys_addr(epc_page);
> +	/*
> +	 * Do not undo everything when creating PTE entry fails - next #PF
> +	 * would find page ready for a PTE.
> +	 * PAGE_SHARED because protection is forced to be RW above and COW
> +	 * is not supported.
> +	 */
> +	vmret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
> +				    PAGE_SHARED);
> +	if (vmret != VM_FAULT_NOPAGE) {
> +		mutex_unlock(&encl->lock);
> +		return VM_FAULT_SIGBUS;
> +	}
> +	mutex_unlock(&encl->lock);
> +	return VM_FAULT_NOPAGE;
> +
> +err_out:
> +	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
> +
> +err_out_unlock:
> +	sgx_encl_shrink(encl, va_page);
> +	mutex_unlock(&encl->lock);
> +
> +err_out_free:
> +	sgx_encl_free_epc_page(epc_page);
> +	kfree(encl_page);
> +
> +	return VM_FAULT_SIGBUS;
> +}
> +
>  static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>  {
>  	unsigned long addr = (unsigned long)vmf->address;
> @@ -145,6 +267,17 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
>  	if (unlikely(!encl))
>  		return VM_FAULT_SIGBUS;
>  
> +	/*
> +	 * The page_array keeps track of all enclave pages, whether they
> +	 * are swapped out or not. If there is no entry for this page and
> +	 * the system supports SGX2 then it is possible to dynamically add
> +	 * a new enclave page. This is only possible for an initialized
> +	 * enclave that will be checked for right away.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_SGX2) &&
> +	    (!xa_load(&encl->page_array, PFN_DOWN(addr))))
> +		return sgx_encl_eaug_page(vma, encl, addr);
> +
>  	mutex_lock(&encl->lock);
>  
>  	entry = sgx_encl_load_page(encl, addr);
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index 848a28d28d3d..1b6ce1da7c92 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -123,4 +123,6 @@ void sgx_encl_free_epc_page(struct sgx_epc_page *page);
>  struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>  					 unsigned long addr);
>  
> +struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl);
> +void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page);
>  #endif /* _X86_ENCL_H */
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 23bdf558b231..58ff62a1fb00 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -17,7 +17,7 @@
>  #include "encl.h"
>  #include "encls.h"
>  
> -static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
> +struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
>  {
>  	struct sgx_va_page *va_page = NULL;
>  	void *err;
> @@ -43,7 +43,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
>  	return va_page;
>  }
>  
> -static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
> +void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
>  {
>  	encl->page_cnt--;
>  
> -- 
> 2.25.1
> 

Quickly looking through also this sequence is possible:

1. Enclave's run-time flow ignores the whole EACCEPT but instead a memory
   dereference will initialize the sequence.
2. This causes #PF handler to do EAUG and after the enclave is re-entered
   the vDSO exists because the page is not EACCEPT'd.
2. Enclave host enter in-enclave exception handler, which does EACCEPT.

Can you confirm this? I'm planning to test this patch by implementing EAUG
support in Rust for Enarx. At this point I'm not yet sure whether I choose
EACCEPT initiated or memory deference initiated code path but I think it is
good if the kernel implementation is good enough to support both.

Other than that, this looks super solid!

BR, Jarkko
Jarkko Sakkinen Feb. 19, 2022, 12:01 p.m. UTC | #2
On Sat, Feb 19, 2022 at 12:57:21PM +0100, Jarkko Sakkinen wrote:
> On Mon, Feb 07, 2022 at 04:45:41PM -0800, Reinette Chatre wrote:
> > With SGX1 an enclave needs to be created with its maximum memory demands
> > allocated. Pages cannot be added to an enclave after it is initialized.
> > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add
> > pages to an initialized enclave. With SGX2 the enclave still needs to
> > set aside address space for its maximum memory demands during enclave
> > creation, but all pages need not be added before enclave initialization.
> > Pages can be added during enclave runtime.
> > 
> > Add support for dynamically adding pages to an initialized enclave,
> > architecturally limited to RW permission. Add pages via the page fault
> > handler at the time an enclave address without a backing enclave page
> > is accessed, potentially directly reclaiming pages if no free pages
> > are available.
> > 
> > The enclave is still required to run ENCLU[EACCEPT] on the page before
> > it can be used. A useful flow is for the enclave to run ENCLU[EACCEPT]
> > on an uninitialized address. This will trigger the page fault handler
> > that will add the enclave page and return execution to the enclave to
> > repeat the ENCLU[EACCEPT] instruction, this time successful.
> > 
> > If the enclave accesses an uninitialized address in another way, for
> > example by expanding the enclave stack to a page that has not yet been
> > added, then the page fault handler would add the page on the first
> > write but upon returning to the enclave the instruction that triggered
> > the page fault would be repeated and since ENCLU[EACCEPT] was not run
> > yet it would trigger a second page fault, this time with the SGX flag
> > set in the page fault error code. This can only be recovered by entering
> > the enclave again and directly running the ENCLU[EACCEPT] instruction on
> > the now initialized address.
> > 
> > Accessing an uninitialized address from outside the enclave also
> > triggers this flow but the page will remain inaccessible (access will
> > result in #PF) until accepted from within the enclave via
> > ENCLU[EACCEPT].
> > 
> > The page is added with the architecturally constrained RW permissions
> > as runtime as well as maximum allowed permissions. It is understood that
> > there are some use cases, for example code relocation, that requires RWX
> > maximum permissions. Supporting these use cases require guidance from
> > user space policy before such maximum permissions can be allowed.
> > Integration with user policy is deferred.
> > 
> > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > ---
> > Changes since V1:
> > - Fix subject line "to initialized" -> "to an initialized" (Jarkko).
> > - Move text about hardware's PENDING state to the patch that introduces
> >   the ENCLS[EAUG] wrapper (Jarkko).
> > - Ensure kernel-doc uses brackets when referring to function.
> > 
> >  arch/x86/kernel/cpu/sgx/encl.c  | 133 ++++++++++++++++++++++++++++++++
> >  arch/x86/kernel/cpu/sgx/encl.h  |   2 +
> >  arch/x86/kernel/cpu/sgx/ioctl.c |   4 +-
> >  3 files changed, 137 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > index a5d4a7efb986..d1e3ea86b902 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -124,6 +124,128 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> >  	return entry;
> >  }
> >  
> > +/**
> > + * sgx_encl_eaug_page() - Dynamically add page to initialized enclave
> > + * @vma:	VMA obtained from fault info from where page is accessed
> > + * @encl:	enclave accessing the page
> > + * @addr:	address that triggered the page fault
> > + *
> > + * When an initialized enclave accesses a page with no backing EPC page
> > + * on a SGX2 system then the EPC can be added dynamically via the SGX2
> > + * ENCLS[EAUG] instruction.
> > + *
> > + * Returns: Appropriate vm_fault_t: VM_FAULT_NOPAGE when PTE was installed
> > + * successfully, VM_FAULT_SIGBUS or VM_FAULT_OOM as error otherwise.
> > + */
> > +static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
> > +				     struct sgx_encl *encl, unsigned long addr)
> > +{
> > +	struct sgx_pageinfo pginfo = {0};
> > +	struct sgx_encl_page *encl_page;
> > +	struct sgx_epc_page *epc_page;
> > +	struct sgx_va_page *va_page;
> > +	unsigned long phys_addr;
> > +	unsigned long prot;
> > +	vm_fault_t vmret;
> > +	int ret;
> > +
> > +	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
> > +		return VM_FAULT_SIGBUS;
> > +
> > +	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
> > +	if (!encl_page)
> > +		return VM_FAULT_OOM;
> > +
> > +	encl_page->desc = addr;
> > +	encl_page->encl = encl;
> > +
> > +	/*
> > +	 * Adding a regular page that is architecturally allowed to only
> > +	 * be created with RW permissions.
> > +	 * TBD: Interface with user space policy to support max permissions
> > +	 * of RWX.
> > +	 */
> > +	prot = PROT_READ | PROT_WRITE;
> > +	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > +	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
> > +
> > +	epc_page = sgx_alloc_epc_page(encl_page, true);
> > +	if (IS_ERR(epc_page)) {
> > +		kfree(encl_page);
> > +		return VM_FAULT_SIGBUS;
> > +	}
> > +
> > +	va_page = sgx_encl_grow(encl);
> > +	if (IS_ERR(va_page)) {
> > +		ret = PTR_ERR(va_page);
> > +		goto err_out_free;
> > +	}
> > +
> > +	mutex_lock(&encl->lock);
> > +
> > +	/*
> > +	 * Copy comment from sgx_encl_add_page() to maintain guidance in
> > +	 * this similar flow:
> > +	 * Adding to encl->va_pages must be done under encl->lock.  Ditto for
> > +	 * deleting (via sgx_encl_shrink()) in the error path.
> > +	 */
> > +	if (va_page)
> > +		list_add(&va_page->list, &encl->va_pages);
> > +
> > +	ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
> > +			encl_page, GFP_KERNEL);
> > +	/*
> > +	 * If ret == -EBUSY then page was created in another flow while
> > +	 * running without encl->lock
> > +	 */
> > +	if (ret)
> > +		goto err_out_unlock;
> > +
> > +	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
> > +	pginfo.addr = encl_page->desc & PAGE_MASK;
> > +	pginfo.metadata = 0;
> > +
> > +	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
> > +	if (ret)
> > +		goto err_out;
> > +
> > +	encl_page->encl = encl;
> > +	encl_page->epc_page = epc_page;
> > +	encl_page->type = SGX_PAGE_TYPE_REG;
> > +	encl->secs_child_cnt++;
> > +
> > +	sgx_mark_page_reclaimable(encl_page->epc_page);
> > +
> > +	phys_addr = sgx_get_epc_phys_addr(epc_page);
> > +	/*
> > +	 * Do not undo everything when creating PTE entry fails - next #PF
> > +	 * would find page ready for a PTE.
> > +	 * PAGE_SHARED because protection is forced to be RW above and COW
> > +	 * is not supported.
> > +	 */
> > +	vmret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
> > +				    PAGE_SHARED);
> > +	if (vmret != VM_FAULT_NOPAGE) {
> > +		mutex_unlock(&encl->lock);
> > +		return VM_FAULT_SIGBUS;
> > +	}
> > +	mutex_unlock(&encl->lock);
> > +	return VM_FAULT_NOPAGE;
> > +
> > +err_out:
> > +	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
> > +
> > +err_out_unlock:
> > +	sgx_encl_shrink(encl, va_page);
> > +	mutex_unlock(&encl->lock);
> > +
> > +err_out_free:
> > +	sgx_encl_free_epc_page(epc_page);
> > +	kfree(encl_page);
> > +
> > +	return VM_FAULT_SIGBUS;
> > +}
> > +
> >  static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> >  {
> >  	unsigned long addr = (unsigned long)vmf->address;
> > @@ -145,6 +267,17 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> >  	if (unlikely(!encl))
> >  		return VM_FAULT_SIGBUS;
> >  
> > +	/*
> > +	 * The page_array keeps track of all enclave pages, whether they
> > +	 * are swapped out or not. If there is no entry for this page and
> > +	 * the system supports SGX2 then it is possible to dynamically add
> > +	 * a new enclave page. This is only possible for an initialized
> > +	 * enclave that will be checked for right away.
> > +	 */
> > +	if (cpu_feature_enabled(X86_FEATURE_SGX2) &&
> > +	    (!xa_load(&encl->page_array, PFN_DOWN(addr))))
> > +		return sgx_encl_eaug_page(vma, encl, addr);
> > +
> >  	mutex_lock(&encl->lock);
> >  
> >  	entry = sgx_encl_load_page(encl, addr);
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> > index 848a28d28d3d..1b6ce1da7c92 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.h
> > +++ b/arch/x86/kernel/cpu/sgx/encl.h
> > @@ -123,4 +123,6 @@ void sgx_encl_free_epc_page(struct sgx_epc_page *page);
> >  struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> >  					 unsigned long addr);
> >  
> > +struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl);
> > +void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page);
> >  #endif /* _X86_ENCL_H */
> > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> > index 23bdf558b231..58ff62a1fb00 100644
> > --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> > @@ -17,7 +17,7 @@
> >  #include "encl.h"
> >  #include "encls.h"
> >  
> > -static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
> > +struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
> >  {
> >  	struct sgx_va_page *va_page = NULL;
> >  	void *err;
> > @@ -43,7 +43,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
> >  	return va_page;
> >  }
> >  
> > -static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
> > +void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
> >  {
> >  	encl->page_cnt--;
> >  
> > -- 
> > 2.25.1
> > 
> 
> Quickly looking through also this sequence is possible:
> 
> 1. Enclave's run-time flow ignores the whole EACCEPT but instead a memory
>    dereference will initialize the sequence.
> 2. This causes #PF handler to do EAUG and after the enclave is re-entered
>    the vDSO exists because the page is not EACCEPT'd.
> 2. Enclave host enter in-enclave exception handler, which does EACCEPT.
> 
> Can you confirm this? I'm planning to test this patch by implementing EAUG
> support in Rust for Enarx. At this point I'm not yet sure whether I choose
> EACCEPT initiated or memory deference initiated code path but I think it is
> good if the kernel implementation is good enough to support both.
> 
> Other than that, this looks super solid!

I got my answer:

https://lore.kernel.org/linux-sgx/32c1116934a588bd3e6c174684e3e36a05c0a4d4.1644274683.git.reinette.chatre@intel.com/

I could almost give reviewed-by but I need to write the user space
implementation first to check that this works for Enarx.

BR, Jarkko
Jarkko Sakkinen Feb. 20, 2022, 6:40 p.m. UTC | #3
On Sat, Feb 19, 2022 at 01:01:08PM +0100, Jarkko Sakkinen wrote:
> On Sat, Feb 19, 2022 at 12:57:21PM +0100, Jarkko Sakkinen wrote:
> > On Mon, Feb 07, 2022 at 04:45:41PM -0800, Reinette Chatre wrote:
> > > With SGX1 an enclave needs to be created with its maximum memory demands
> > > allocated. Pages cannot be added to an enclave after it is initialized.
> > > SGX2 introduces a new function, ENCLS[EAUG], that can be used to add
> > > pages to an initialized enclave. With SGX2 the enclave still needs to
> > > set aside address space for its maximum memory demands during enclave
> > > creation, but all pages need not be added before enclave initialization.
> > > Pages can be added during enclave runtime.
> > > 
> > > Add support for dynamically adding pages to an initialized enclave,
> > > architecturally limited to RW permission. Add pages via the page fault
> > > handler at the time an enclave address without a backing enclave page
> > > is accessed, potentially directly reclaiming pages if no free pages
> > > are available.
> > > 
> > > The enclave is still required to run ENCLU[EACCEPT] on the page before
> > > it can be used. A useful flow is for the enclave to run ENCLU[EACCEPT]
> > > on an uninitialized address. This will trigger the page fault handler
> > > that will add the enclave page and return execution to the enclave to
> > > repeat the ENCLU[EACCEPT] instruction, this time successful.
> > > 
> > > If the enclave accesses an uninitialized address in another way, for
> > > example by expanding the enclave stack to a page that has not yet been
> > > added, then the page fault handler would add the page on the first
> > > write but upon returning to the enclave the instruction that triggered
> > > the page fault would be repeated and since ENCLU[EACCEPT] was not run
> > > yet it would trigger a second page fault, this time with the SGX flag
> > > set in the page fault error code. This can only be recovered by entering
> > > the enclave again and directly running the ENCLU[EACCEPT] instruction on
> > > the now initialized address.
> > > 
> > > Accessing an uninitialized address from outside the enclave also
> > > triggers this flow but the page will remain inaccessible (access will
> > > result in #PF) until accepted from within the enclave via
> > > ENCLU[EACCEPT].
> > > 
> > > The page is added with the architecturally constrained RW permissions
> > > as runtime as well as maximum allowed permissions. It is understood that
> > > there are some use cases, for example code relocation, that requires RWX
> > > maximum permissions. Supporting these use cases require guidance from
> > > user space policy before such maximum permissions can be allowed.
> > > Integration with user policy is deferred.
> > > 
> > > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > > ---
> > > Changes since V1:
> > > - Fix subject line "to initialized" -> "to an initialized" (Jarkko).
> > > - Move text about hardware's PENDING state to the patch that introduces
> > >   the ENCLS[EAUG] wrapper (Jarkko).
> > > - Ensure kernel-doc uses brackets when referring to function.
> > > 
> > >  arch/x86/kernel/cpu/sgx/encl.c  | 133 ++++++++++++++++++++++++++++++++
> > >  arch/x86/kernel/cpu/sgx/encl.h  |   2 +
> > >  arch/x86/kernel/cpu/sgx/ioctl.c |   4 +-
> > >  3 files changed, 137 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > > index a5d4a7efb986..d1e3ea86b902 100644
> > > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > > @@ -124,6 +124,128 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > >  	return entry;
> > >  }
> > >  
> > > +/**
> > > + * sgx_encl_eaug_page() - Dynamically add page to initialized enclave
> > > + * @vma:	VMA obtained from fault info from where page is accessed
> > > + * @encl:	enclave accessing the page
> > > + * @addr:	address that triggered the page fault
> > > + *
> > > + * When an initialized enclave accesses a page with no backing EPC page
> > > + * on a SGX2 system then the EPC can be added dynamically via the SGX2
> > > + * ENCLS[EAUG] instruction.
> > > + *
> > > + * Returns: Appropriate vm_fault_t: VM_FAULT_NOPAGE when PTE was installed
> > > + * successfully, VM_FAULT_SIGBUS or VM_FAULT_OOM as error otherwise.
> > > + */
> > > +static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
> > > +				     struct sgx_encl *encl, unsigned long addr)
> > > +{
> > > +	struct sgx_pageinfo pginfo = {0};
> > > +	struct sgx_encl_page *encl_page;
> > > +	struct sgx_epc_page *epc_page;
> > > +	struct sgx_va_page *va_page;
> > > +	unsigned long phys_addr;
> > > +	unsigned long prot;
> > > +	vm_fault_t vmret;
> > > +	int ret;
> > > +
> > > +	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
> > > +		return VM_FAULT_SIGBUS;
> > > +
> > > +	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
> > > +	if (!encl_page)
> > > +		return VM_FAULT_OOM;
> > > +
> > > +	encl_page->desc = addr;
> > > +	encl_page->encl = encl;
> > > +
> > > +	/*
> > > +	 * Adding a regular page that is architecturally allowed to only
> > > +	 * be created with RW permissions.
> > > +	 * TBD: Interface with user space policy to support max permissions
> > > +	 * of RWX.
> > > +	 */
> > > +	prot = PROT_READ | PROT_WRITE;
> > > +	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> > > +	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
> > > +
> > > +	epc_page = sgx_alloc_epc_page(encl_page, true);
> > > +	if (IS_ERR(epc_page)) {
> > > +		kfree(encl_page);
> > > +		return VM_FAULT_SIGBUS;
> > > +	}
> > > +
> > > +	va_page = sgx_encl_grow(encl);
> > > +	if (IS_ERR(va_page)) {
> > > +		ret = PTR_ERR(va_page);
> > > +		goto err_out_free;
> > > +	}
> > > +
> > > +	mutex_lock(&encl->lock);
> > > +
> > > +	/*
> > > +	 * Copy comment from sgx_encl_add_page() to maintain guidance in
> > > +	 * this similar flow:
> > > +	 * Adding to encl->va_pages must be done under encl->lock.  Ditto for
> > > +	 * deleting (via sgx_encl_shrink()) in the error path.
> > > +	 */
> > > +	if (va_page)
> > > +		list_add(&va_page->list, &encl->va_pages);
> > > +
> > > +	ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
> > > +			encl_page, GFP_KERNEL);
> > > +	/*
> > > +	 * If ret == -EBUSY then page was created in another flow while
> > > +	 * running without encl->lock
> > > +	 */
> > > +	if (ret)
> > > +		goto err_out_unlock;
> > > +
> > > +	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
> > > +	pginfo.addr = encl_page->desc & PAGE_MASK;
> > > +	pginfo.metadata = 0;
> > > +
> > > +	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
> > > +	if (ret)
> > > +		goto err_out;
> > > +
> > > +	encl_page->encl = encl;
> > > +	encl_page->epc_page = epc_page;
> > > +	encl_page->type = SGX_PAGE_TYPE_REG;
> > > +	encl->secs_child_cnt++;
> > > +
> > > +	sgx_mark_page_reclaimable(encl_page->epc_page);
> > > +
> > > +	phys_addr = sgx_get_epc_phys_addr(epc_page);
> > > +	/*
> > > +	 * Do not undo everything when creating PTE entry fails - next #PF
> > > +	 * would find page ready for a PTE.
> > > +	 * PAGE_SHARED because protection is forced to be RW above and COW
> > > +	 * is not supported.
> > > +	 */
> > > +	vmret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
> > > +				    PAGE_SHARED);
> > > +	if (vmret != VM_FAULT_NOPAGE) {
> > > +		mutex_unlock(&encl->lock);
> > > +		return VM_FAULT_SIGBUS;
> > > +	}
> > > +	mutex_unlock(&encl->lock);
> > > +	return VM_FAULT_NOPAGE;
> > > +
> > > +err_out:
> > > +	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
> > > +
> > > +err_out_unlock:
> > > +	sgx_encl_shrink(encl, va_page);
> > > +	mutex_unlock(&encl->lock);
> > > +
> > > +err_out_free:
> > > +	sgx_encl_free_epc_page(epc_page);
> > > +	kfree(encl_page);
> > > +
> > > +	return VM_FAULT_SIGBUS;
> > > +}
> > > +
> > >  static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> > >  {
> > >  	unsigned long addr = (unsigned long)vmf->address;
> > > @@ -145,6 +267,17 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
> > >  	if (unlikely(!encl))
> > >  		return VM_FAULT_SIGBUS;
> > >  
> > > +	/*
> > > +	 * The page_array keeps track of all enclave pages, whether they
> > > +	 * are swapped out or not. If there is no entry for this page and
> > > +	 * the system supports SGX2 then it is possible to dynamically add
> > > +	 * a new enclave page. This is only possible for an initialized
> > > +	 * enclave that will be checked for right away.
> > > +	 */
> > > +	if (cpu_feature_enabled(X86_FEATURE_SGX2) &&
> > > +	    (!xa_load(&encl->page_array, PFN_DOWN(addr))))
> > > +		return sgx_encl_eaug_page(vma, encl, addr);
> > > +
> > >  	mutex_lock(&encl->lock);
> > >  
> > >  	entry = sgx_encl_load_page(encl, addr);
> > > diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> > > index 848a28d28d3d..1b6ce1da7c92 100644
> > > --- a/arch/x86/kernel/cpu/sgx/encl.h
> > > +++ b/arch/x86/kernel/cpu/sgx/encl.h
> > > @@ -123,4 +123,6 @@ void sgx_encl_free_epc_page(struct sgx_epc_page *page);
> > >  struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > >  					 unsigned long addr);
> > >  
> > > +struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl);
> > > +void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page);
> > >  #endif /* _X86_ENCL_H */
> > > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> > > index 23bdf558b231..58ff62a1fb00 100644
> > > --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> > > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> > > @@ -17,7 +17,7 @@
> > >  #include "encl.h"
> > >  #include "encls.h"
> > >  
> > > -static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
> > > +struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
> > >  {
> > >  	struct sgx_va_page *va_page = NULL;
> > >  	void *err;
> > > @@ -43,7 +43,7 @@ static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
> > >  	return va_page;
> > >  }
> > >  
> > > -static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
> > > +void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
> > >  {
> > >  	encl->page_cnt--;
> > >  
> > > -- 
> > > 2.25.1
> > > 
> > 
> > Quickly looking through also this sequence is possible:
> > 
> > 1. Enclave's run-time flow ignores the whole EACCEPT but instead a memory
> >    dereference will initialize the sequence.
> > 2. This causes #PF handler to do EAUG and after the enclave is re-entered
> >    the vDSO exists because the page is not EACCEPT'd.
> > 2. Enclave host enter in-enclave exception handler, which does EACCEPT.
> > 
> > Can you confirm this? I'm planning to test this patch by implementing EAUG
> > support in Rust for Enarx. At this point I'm not yet sure whether I choose
> > EACCEPT initiated or memory deference initiated code path but I think it is
> > good if the kernel implementation is good enough to support both.
> > 
> > Other than that, this looks super solid!
> 
> I got my answer:
> 
> https://lore.kernel.org/linux-sgx/32c1116934a588bd3e6c174684e3e36a05c0a4d4.1644274683.git.reinette.chatre@intel.com/
> 
> I could almost give reviewed-by but I need to write the user space
> implementation first to check that this works for Enarx.

Do you know if it is possible to do EAUG, EMODPR and the do a single
EACCEPT for both? Just looking at pseudo-code, it looked doable but
I need to check this.

I.e. EAUG has this

EPCM(DS:RCX).BLOCKED := 0;
EPCM(DS:RCX).PENDING := 1;
EPCM(DS:RCX).MODIFIED := 0;
EPCM(DS:RCX).PR := 0;
(* associate the EPCPAGE with the SECS by storing the SECS identifier of DS:TMP_SECS *)
Update EPCM(DS:RCX) SECS identifier to reference DS:TMP_SECS identifier;
(* Set EPCM valid fields *)
EPCM(DS:RCX).VALID := 1;

And EMODPR only checks .VALID.

Doing two EACCEPT rounds is a bit rough as you have the page available in a
kind of "stalled' state.

/Jarkko
Reinette Chatre Feb. 22, 2022, 7:19 p.m. UTC | #4
Hi Jarkko,

On 2/20/2022 10:40 AM, Jarkko Sakkinen wrote:
...
 
> Do you know if it is possible to do EAUG, EMODPR and the do a single
> EACCEPT for both? Just looking at pseudo-code, it looked doable but
> I need to check this.
> 
> I.e. EAUG has this
> 
> EPCM(DS:RCX).BLOCKED := 0;
> EPCM(DS:RCX).PENDING := 1;
> EPCM(DS:RCX).MODIFIED := 0;
> EPCM(DS:RCX).PR := 0;
> (* associate the EPCPAGE with the SECS by storing the SECS identifier of DS:TMP_SECS *)
> Update EPCM(DS:RCX) SECS identifier to reference DS:TMP_SECS identifier;
> (* Set EPCM valid fields *)
> EPCM(DS:RCX).VALID := 1;
> 
> And EMODPR only checks .VALID.

After that check there is also:
IF (EPCM(DS:RCX).PENDING is not 0 or (EPCM(DS:RCX).MODIFIED is not 0) )
    THEN
        RFLAGS.ZF := 1;
        RAX := SGX_PAGE_NOT_MODIFIABLE;
        GOTO DONE;
FI;

Attempting the SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl() on a recently
added page (EAUG) that has not yet been EACCEPTed is thus expected to fail
with errno of EFAULT (indicating ENCLS[EMODPR] failure) and the returned
structure's result field set to 20 (SGX_PAGE_NOT_MODIFIABLE).

I confirmed this behavior by modifying the "augment" kselftest test by adding
a SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS call between the new memory access and
the EACCEPT.

Reinette
Jarkko Sakkinen Feb. 23, 2022, 3:46 p.m. UTC | #5
On Tue, Feb 22, 2022 at 11:19:11AM -0800, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 2/20/2022 10:40 AM, Jarkko Sakkinen wrote:
> ...
>  
> > Do you know if it is possible to do EAUG, EMODPR and the do a single
> > EACCEPT for both? Just looking at pseudo-code, it looked doable but
> > I need to check this.
> > 
> > I.e. EAUG has this
> > 
> > EPCM(DS:RCX).BLOCKED := 0;
> > EPCM(DS:RCX).PENDING := 1;
> > EPCM(DS:RCX).MODIFIED := 0;
> > EPCM(DS:RCX).PR := 0;
> > (* associate the EPCPAGE with the SECS by storing the SECS identifier of DS:TMP_SECS *)
> > Update EPCM(DS:RCX) SECS identifier to reference DS:TMP_SECS identifier;
> > (* Set EPCM valid fields *)
> > EPCM(DS:RCX).VALID := 1;
> > 
> > And EMODPR only checks .VALID.
> 
> After that check there is also:
> IF (EPCM(DS:RCX).PENDING is not 0 or (EPCM(DS:RCX).MODIFIED is not 0) )
>     THEN
>         RFLAGS.ZF := 1;
>         RAX := SGX_PAGE_NOT_MODIFIABLE;
>         GOTO DONE;
> FI;
> 
> Attempting the SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl() on a recently
> added page (EAUG) that has not yet been EACCEPTed is thus expected to fail
> with errno of EFAULT (indicating ENCLS[EMODPR] failure) and the returned
> structure's result field set to 20 (SGX_PAGE_NOT_MODIFIABLE).
> 
> I confirmed this behavior by modifying the "augment" kselftest test by adding
> a SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS call between the new memory access and
> the EACCEPT.

Thank you, also Mark confirmed this.

BR, Jarkko
Jarkko Sakkinen March 7, 2022, 4:16 p.m. UTC | #6
On Mon, Feb 07, 2022 at 04:45:41PM -0800, Reinette Chatre wrote:
> +	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
> +	if (!encl_page)
> +		return VM_FAULT_OOM;
> +
> +	encl_page->desc = addr;
> +	encl_page->encl = encl;
> +
> +	/*
> +	 * Adding a regular page that is architecturally allowed to only
> +	 * be created with RW permissions.
> +	 * TBD: Interface with user space policy to support max permissions
> +	 * of RWX.
> +	 */
> +	prot = PROT_READ | PROT_WRITE;
> +	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
> +	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;

You should use sgx_encl_page_alloc() here and not reinvent wheel.

I wrote a patch that exports it:

https://lore.kernel.org/linux-sgx/20220306053211.135762-3-jarkko@kernel.org/T/#u

BR, Jarkko
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index a5d4a7efb986..d1e3ea86b902 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -124,6 +124,128 @@  struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 	return entry;
 }
 
+/**
+ * sgx_encl_eaug_page() - Dynamically add page to initialized enclave
+ * @vma:	VMA obtained from fault info from where page is accessed
+ * @encl:	enclave accessing the page
+ * @addr:	address that triggered the page fault
+ *
+ * When an initialized enclave accesses a page with no backing EPC page
+ * on a SGX2 system then the EPC can be added dynamically via the SGX2
+ * ENCLS[EAUG] instruction.
+ *
+ * Returns: Appropriate vm_fault_t: VM_FAULT_NOPAGE when PTE was installed
+ * successfully, VM_FAULT_SIGBUS or VM_FAULT_OOM as error otherwise.
+ */
+static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
+				     struct sgx_encl *encl, unsigned long addr)
+{
+	struct sgx_pageinfo pginfo = {0};
+	struct sgx_encl_page *encl_page;
+	struct sgx_epc_page *epc_page;
+	struct sgx_va_page *va_page;
+	unsigned long phys_addr;
+	unsigned long prot;
+	vm_fault_t vmret;
+	int ret;
+
+	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
+		return VM_FAULT_SIGBUS;
+
+	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
+	if (!encl_page)
+		return VM_FAULT_OOM;
+
+	encl_page->desc = addr;
+	encl_page->encl = encl;
+
+	/*
+	 * Adding a regular page that is architecturally allowed to only
+	 * be created with RW permissions.
+	 * TBD: Interface with user space policy to support max permissions
+	 * of RWX.
+	 */
+	prot = PROT_READ | PROT_WRITE;
+	encl_page->vm_run_prot_bits = calc_vm_prot_bits(prot, 0);
+	encl_page->vm_max_prot_bits = encl_page->vm_run_prot_bits;
+
+	epc_page = sgx_alloc_epc_page(encl_page, true);
+	if (IS_ERR(epc_page)) {
+		kfree(encl_page);
+		return VM_FAULT_SIGBUS;
+	}
+
+	va_page = sgx_encl_grow(encl);
+	if (IS_ERR(va_page)) {
+		ret = PTR_ERR(va_page);
+		goto err_out_free;
+	}
+
+	mutex_lock(&encl->lock);
+
+	/*
+	 * Copy comment from sgx_encl_add_page() to maintain guidance in
+	 * this similar flow:
+	 * Adding to encl->va_pages must be done under encl->lock.  Ditto for
+	 * deleting (via sgx_encl_shrink()) in the error path.
+	 */
+	if (va_page)
+		list_add(&va_page->list, &encl->va_pages);
+
+	ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
+			encl_page, GFP_KERNEL);
+	/*
+	 * If ret == -EBUSY then page was created in another flow while
+	 * running without encl->lock
+	 */
+	if (ret)
+		goto err_out_unlock;
+
+	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
+	pginfo.addr = encl_page->desc & PAGE_MASK;
+	pginfo.metadata = 0;
+
+	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
+	if (ret)
+		goto err_out;
+
+	encl_page->encl = encl;
+	encl_page->epc_page = epc_page;
+	encl_page->type = SGX_PAGE_TYPE_REG;
+	encl->secs_child_cnt++;
+
+	sgx_mark_page_reclaimable(encl_page->epc_page);
+
+	phys_addr = sgx_get_epc_phys_addr(epc_page);
+	/*
+	 * Do not undo everything when creating PTE entry fails - next #PF
+	 * would find page ready for a PTE.
+	 * PAGE_SHARED because protection is forced to be RW above and COW
+	 * is not supported.
+	 */
+	vmret = vmf_insert_pfn_prot(vma, addr, PFN_DOWN(phys_addr),
+				    PAGE_SHARED);
+	if (vmret != VM_FAULT_NOPAGE) {
+		mutex_unlock(&encl->lock);
+		return VM_FAULT_SIGBUS;
+	}
+	mutex_unlock(&encl->lock);
+	return VM_FAULT_NOPAGE;
+
+err_out:
+	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
+
+err_out_unlock:
+	sgx_encl_shrink(encl, va_page);
+	mutex_unlock(&encl->lock);
+
+err_out_free:
+	sgx_encl_free_epc_page(epc_page);
+	kfree(encl_page);
+
+	return VM_FAULT_SIGBUS;
+}
+
 static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 {
 	unsigned long addr = (unsigned long)vmf->address;
@@ -145,6 +267,17 @@  static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 	if (unlikely(!encl))
 		return VM_FAULT_SIGBUS;
 
+	/*
+	 * The page_array keeps track of all enclave pages, whether they
+	 * are swapped out or not. If there is no entry for this page and
+	 * the system supports SGX2 then it is possible to dynamically add
+	 * a new enclave page. This is only possible for an initialized
+	 * enclave that will be checked for right away.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SGX2) &&
+	    (!xa_load(&encl->page_array, PFN_DOWN(addr))))
+		return sgx_encl_eaug_page(vma, encl, addr);
+
 	mutex_lock(&encl->lock);
 
 	entry = sgx_encl_load_page(encl, addr);
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 848a28d28d3d..1b6ce1da7c92 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -123,4 +123,6 @@  void sgx_encl_free_epc_page(struct sgx_epc_page *page);
 struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 					 unsigned long addr);
 
+struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl);
+void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page);
 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 23bdf558b231..58ff62a1fb00 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -17,7 +17,7 @@ 
 #include "encl.h"
 #include "encls.h"
 
-static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
+struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
 {
 	struct sgx_va_page *va_page = NULL;
 	void *err;
@@ -43,7 +43,7 @@  static struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl)
 	return va_page;
 }
 
-static void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
+void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page)
 {
 	encl->page_cnt--;