diff mbox series

[v4,3/3] x86/sgx: Resolve EREMOVE page vs EAUG page data race

Message ID 20240705074524.443713-4-dmitrii.kuvaiskii@intel.com (mailing list archive)
State New, archived
Headers show
Series x86/sgx: Fix two data races in EAUG/EREMOVE flows | expand

Commit Message

Dmitrii Kuvaiskii July 5, 2024, 7:45 a.m. UTC
Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and MADV_DONTNEED semantics). Consider some enclave page added to the
enclave. User space decides to temporarily remove this page (e.g.,
emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
space performs a memory access on the same page on CPU2, which results
in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
follows:

/*
 * CPU1: User space performs
 * ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
 * on enclave page X
 */
sgx_encl_remove_pages() {

  mutex_lock(&encl->lock);

  entry = sgx_encl_load_page(encl);
  /*
   * verify that page is
   * trimmed and accepted
   */

  mutex_unlock(&encl->lock);

  /*
   * remove PTE entry; cannot
   * be performed under lock
   */
  sgx_zap_enclave_ptes(encl);
                                 /*
                                  * Fault on CPU2 on same page X
                                  */
                                 sgx_vma_fault() {
                                   /*
                                    * PTE entry was removed, but the
                                    * page is still in enclave's xarray
                                    */
                                   xa_load(&encl->page_array) != NULL ->
                                   /*
                                    * SGX driver thinks that this page
                                    * was swapped out and loads it
                                    */
                                   mutex_lock(&encl->lock);
                                   /*
                                    * this is effectively a no-op
                                    */
                                   entry = sgx_encl_load_page_in_vma();
                                   /*
                                    * add PTE entry
                                    *
                                    * *BUG*: a PTE is installed for a
                                    * page in process of being removed
                                    */
                                   vmf_insert_pfn(...);

                                   mutex_unlock(&encl->lock);
                                   return VM_FAULT_NOPAGE;
                                 }
  /*
   * continue with page removal
   */
  mutex_lock(&encl->lock);

  sgx_encl_free_epc_page(epc_page) {
    /*
     * remove page via EREMOVE
     */
    /*
     * free EPC page
     */
    sgx_free_epc_page(epc_page);
  }

  xa_erase(&encl->page_array);

  mutex_unlock(&encl->lock);
}

Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
same page. This enclave page becomes perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page.

Fix this race by forcing the fault handler on CPU2 to back off if the
page is currently being removed (on CPU1). This is achieved by
setting SGX_ENCL_PAGE_BUSY flag right-before the first mutex_unlock() in
sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
page is busy, and if yes then CPU2 backs off and waits until the page is
completely removed. After that, any memory access to this page results
in a normal "allocate and EAUG a page on #PF" flow.

Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
---
 arch/x86/kernel/cpu/sgx/ioctl.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Haitao Huang July 10, 2024, 3:16 p.m. UTC | #1
On Fri, 05 Jul 2024 02:45:24 -0500, Dmitrii Kuvaiskii  
<dmitrii.kuvaiskii@intel.com> wrote:

> Two enclave threads may try to add and remove the same enclave page
> simultaneously (e.g., if the SGX runtime supports both lazy allocation
> and MADV_DONTNEED semantics). Consider some enclave page added to the
> enclave. User space decides to temporarily remove this page (e.g.,
> emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
> space performs a memory access on the same page on CPU2, which results
> in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
> follows:
>
> /*
>  * CPU1: User space performs
>  * ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
>  * on enclave page X
>  */
> sgx_encl_remove_pages() {
>
>   mutex_lock(&encl->lock);
>
>   entry = sgx_encl_load_page(encl);
>   /*
>    * verify that page is
>    * trimmed and accepted
>    */
>
>   mutex_unlock(&encl->lock);
>
>   /*
>    * remove PTE entry; cannot
>    * be performed under lock
>    */
>   sgx_zap_enclave_ptes(encl);
>                                  /*
>                                   * Fault on CPU2 on same page X
>                                   */
>                                  sgx_vma_fault() {
>                                    /*
>                                     * PTE entry was removed, but the
>                                     * page is still in enclave's xarray
>                                     */
>                                    xa_load(&encl->page_array) != NULL ->
>                                    /*
>                                     * SGX driver thinks that this page
>                                     * was swapped out and loads it
>                                     */
>                                    mutex_lock(&encl->lock);
>                                    /*
>                                     * this is effectively a no-op
>                                     */
>                                    entry = sgx_encl_load_page_in_vma();
>                                    /*
>                                     * add PTE entry
>                                     *
>                                     * *BUG*: a PTE is installed for a
>                                     * page in process of being removed
>                                     */
>                                    vmf_insert_pfn(...);
>
>                                    mutex_unlock(&encl->lock);
>                                    return VM_FAULT_NOPAGE;
>                                  }
>   /*
>    * continue with page removal
>    */
>   mutex_lock(&encl->lock);
>
>   sgx_encl_free_epc_page(epc_page) {
>     /*
>      * remove page via EREMOVE
>      */
>     /*
>      * free EPC page
>      */
>     sgx_free_epc_page(epc_page);
>   }
>
>   xa_erase(&encl->page_array);
>
>   mutex_unlock(&encl->lock);
> }
>
> Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
> same page. This enclave page becomes perpetually inaccessible (until
> another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
> marked accessible in the PTE entry but is not EAUGed, and any subsequent
> access to this page raises a fault: with the kernel believing there to
> be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
> path do_user_addr_fault() -> access_error() causes the SGX driver's
> sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
> The userspace SIGSEGV handler cannot perform EACCEPT because the page
> was not EAUGed. Thus, the user space is stuck with the inaccessible
> page.
>
> Fix this race by forcing the fault handler on CPU2 to back off if the
> page is currently being removed (on CPU1). This is achieved by
> setting SGX_ENCL_PAGE_BUSY flag right-before the first mutex_unlock() in
> sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
> page is busy, and if yes then CPU2 backs off and waits until the page is
> completely removed. After that, any memory access to this page results
> in a normal "allocate and EAUG a page on #PF" flow.
>
> Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/ioctl.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c  
> b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 5d390df21440..02441883401d 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -1141,7 +1141,14 @@ static long sgx_encl_remove_pages(struct sgx_encl  
> *encl,
>  		/*
>  		 * Do not keep encl->lock because of dependency on
>  		 * mmap_lock acquired in sgx_zap_enclave_ptes().
> +		 *
> +		 * Releasing encl->lock leads to a data race: while CPU1
> +		 * performs sgx_zap_enclave_ptes() and removes the PTE entry
> +		 * for the enclave page, CPU2 may attempt to load this page
> +		 * (because the page is still in enclave's xarray). To prevent
> +		 * CPU2 from loading the page, mark the page as busy.
>  		 */
> +		entry->desc |= SGX_ENCL_PAGE_BUSY;
>  		mutex_unlock(&encl->lock);
> 		sgx_zap_enclave_ptes(encl, addr);

Reviewed-by: Haitao Huang <haitao.huang@linux.intel.com>
Thanks
Haitao
Jarkko Sakkinen July 17, 2024, 10:38 a.m. UTC | #2
On Fri Jul 5, 2024 at 10:45 AM EEST, Dmitrii Kuvaiskii wrote:
> Two enclave threads may try to add and remove the same enclave page
> simultaneously (e.g., if the SGX runtime supports both lazy allocation
> and MADV_DONTNEED semantics). Consider some enclave page added to the
> enclave. User space decides to temporarily remove this page (e.g.,
> emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
> space performs a memory access on the same page on CPU2, which results
> in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
> follows:
>
> /*
>  * CPU1: User space performs
>  * ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
>  * on enclave page X
>  */
> sgx_encl_remove_pages() {
>
>   mutex_lock(&encl->lock);
>
>   entry = sgx_encl_load_page(encl);
>   /*
>    * verify that page is
>    * trimmed and accepted
>    */
>
>   mutex_unlock(&encl->lock);
>
>   /*
>    * remove PTE entry; cannot
>    * be performed under lock
>    */
>   sgx_zap_enclave_ptes(encl);
>                                  /*
>                                   * Fault on CPU2 on same page X
>                                   */
>                                  sgx_vma_fault() {
>                                    /*
>                                     * PTE entry was removed, but the
>                                     * page is still in enclave's xarray
>                                     */
>                                    xa_load(&encl->page_array) != NULL ->
>                                    /*
>                                     * SGX driver thinks that this page
>                                     * was swapped out and loads it
>                                     */
>                                    mutex_lock(&encl->lock);
>                                    /*
>                                     * this is effectively a no-op
>                                     */
>                                    entry = sgx_encl_load_page_in_vma();
>                                    /*
>                                     * add PTE entry
>                                     *
>                                     * *BUG*: a PTE is installed for a
>                                     * page in process of being removed
>                                     */
>                                    vmf_insert_pfn(...);
>
>                                    mutex_unlock(&encl->lock);
>                                    return VM_FAULT_NOPAGE;
>                                  }
>   /*
>    * continue with page removal
>    */
>   mutex_lock(&encl->lock);
>
>   sgx_encl_free_epc_page(epc_page) {
>     /*
>      * remove page via EREMOVE
>      */
>     /*
>      * free EPC page
>      */
>     sgx_free_epc_page(epc_page);
>   }
>
>   xa_erase(&encl->page_array);
>
>   mutex_unlock(&encl->lock);
> }
>
> Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
> same page. This enclave page becomes perpetually inaccessible (until
> another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
> marked accessible in the PTE entry but is not EAUGed, and any subsequent
> access to this page raises a fault: with the kernel believing there to
> be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
> path do_user_addr_fault() -> access_error() causes the SGX driver's
> sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
> The userspace SIGSEGV handler cannot perform EACCEPT because the page
> was not EAUGed. Thus, the user space is stuck with the inaccessible
> page.
>
> Fix this race by forcing the fault handler on CPU2 to back off if the
> page is currently being removed (on CPU1). This is achieved by
> setting SGX_ENCL_PAGE_BUSY flag right-before the first mutex_unlock() in
> sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
> page is busy, and if yes then CPU2 backs off and waits until the page is
> completely removed. After that, any memory access to this page results
> in a normal "allocate and EAUG a page on #PF" flow.
>
> Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/ioctl.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 5d390df21440..02441883401d 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -1141,7 +1141,14 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
>  		/*
>  		 * Do not keep encl->lock because of dependency on
>  		 * mmap_lock acquired in sgx_zap_enclave_ptes().
> +		 *
> +		 * Releasing encl->lock leads to a data race: while CPU1
> +		 * performs sgx_zap_enclave_ptes() and removes the PTE entry
> +		 * for the enclave page, CPU2 may attempt to load this page
> +		 * (because the page is still in enclave's xarray). To prevent
> +		 * CPU2 from loading the page, mark the page as busy.
>  		 */
> +		entry->desc |= SGX_ENCL_PAGE_BUSY;
>  		mutex_unlock(&encl->lock);
>  
>  		sgx_zap_enclave_ptes(encl, addr);

Ditto.

BR, Jarkko
Huang, Kai July 25, 2024, 1:21 a.m. UTC | #3
On 5/07/2024 7:45 pm, Dmitrii Kuvaiskii wrote:
> Two enclave threads may try to add and remove the same enclave page
> simultaneously (e.g., if the SGX runtime supports both lazy allocation
> and MADV_DONTNEED semantics). Consider some enclave page added to the
> enclave. User space decides to temporarily remove this page (e.g.,
> emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
> space performs a memory access on the same page on CPU2, which results
> in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
> follows:
> 
> /*
>   * CPU1: User space performs
>   * ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
>   * on enclave page X
>   */
> sgx_encl_remove_pages() {
> 
>    mutex_lock(&encl->lock);
> 
>    entry = sgx_encl_load_page(encl);
>    /*
>     * verify that page is
>     * trimmed and accepted
>     */
> 
>    mutex_unlock(&encl->lock);
> 
>    /*
>     * remove PTE entry; cannot
>     * be performed under lock
>     */
>    sgx_zap_enclave_ptes(encl);
>                                   /*
>                                    * Fault on CPU2 on same page X
>                                    */
>                                   sgx_vma_fault() {
>                                     /*
>                                      * PTE entry was removed, but the
>                                      * page is still in enclave's xarray
>                                      */
>                                     xa_load(&encl->page_array) != NULL ->
>                                     /*
>                                      * SGX driver thinks that this page
>                                      * was swapped out and loads it
>                                      */
>                                     mutex_lock(&encl->lock);
>                                     /*
>                                      * this is effectively a no-op
>                                      */
>                                     entry = sgx_encl_load_page_in_vma();
>                                     /*
>                                      * add PTE entry
>                                      *
>                                      * *BUG*: a PTE is installed for a
>                                      * page in process of being removed
>                                      */
>                                     vmf_insert_pfn(...);
> 
>                                     mutex_unlock(&encl->lock);
>                                     return VM_FAULT_NOPAGE;
>                                   }
>    /*
>     * continue with page removal
>     */
>    mutex_lock(&encl->lock);
> 
>    sgx_encl_free_epc_page(epc_page) {
>      /*
>       * remove page via EREMOVE
>       */
>      /*
>       * free EPC page
>       */
>      sgx_free_epc_page(epc_page);
>    }
> 
>    xa_erase(&encl->page_array);
> 
>    mutex_unlock(&encl->lock);
> }
> 
> Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
> same page. This enclave page becomes perpetually inaccessible (until
> another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
> marked accessible in the PTE entry but is not EAUGed, and any subsequent
> access to this page raises a fault: with the kernel believing there to
> be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
> path do_user_addr_fault() -> access_error() causes the SGX driver's
> sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
> The userspace SIGSEGV handler cannot perform EACCEPT because the page
> was not EAUGed. Thus, the user space is stuck with the inaccessible
> page.

Reading the code, it seems the ioctl(sgx_ioc_enclave_modify_types) also 
zaps EPC mapping when converting a normal page to TSC.  Thus IIUC it 
should also suffer this issue?

> 
> Fix this race by forcing the fault handler on CPU2 to back off if the
> page is currently being removed (on CPU1). This is achieved by
> setting SGX_ENCL_PAGE_BUSY flag right-before the first mutex_unlock() in
> sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
> page is busy, and if yes then CPU2 backs off and waits until the page is
> completely removed. After that, any memory access to this page results
> in a normal "allocate and EAUG a page on #PF" flow.
> 
> Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
> ---
>   arch/x86/kernel/cpu/sgx/ioctl.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 5d390df21440..02441883401d 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -1141,7 +1141,14 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
>   		/*
>   		 * Do not keep encl->lock because of dependency on
>   		 * mmap_lock acquired in sgx_zap_enclave_ptes().
> +		 *
> +		 * Releasing encl->lock leads to a data race: while CPU1
> +		 * performs sgx_zap_enclave_ptes() and removes the PTE entry
> +		 * for the enclave page, CPU2 may attempt to load this page
> +		 * (because the page is still in enclave's xarray). To prevent
> +		 * CPU2 from loading the page, mark the page as busy.
>   		 */
> +		entry->desc |= SGX_ENCL_PAGE_BUSY;
>   		mutex_unlock(&encl->lock);
>   
>   		sgx_zap_enclave_ptes(encl, addr);

The fix seems reasonable to me for the REMOVE case.  But IIUC the BUSY 
flag should be applied to the above case (PT change) too?
Dmitrii Kuvaiskii Aug. 9, 2024, 9:35 a.m. UTC | #4
On Thu, Jul 25, 2024 at 01:21:56PM +1200, Huang, Kai wrote:
>
> > Two enclave threads may try to add and remove the same enclave page
> > simultaneously (e.g., if the SGX runtime supports both lazy allocation
> > and MADV_DONTNEED semantics). Consider some enclave page added to the
> > enclave. User space decides to temporarily remove this page (e.g.,
> > emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
> > space performs a memory access on the same page on CPU2, which results
> > in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
> > follows:
> >
> >   [ ... skipped ... ]
> >
> > Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
> > same page. This enclave page becomes perpetually inaccessible (until
> > another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
> > marked accessible in the PTE entry but is not EAUGed, and any subsequent
> > access to this page raises a fault: with the kernel believing there to
> > be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
> > path do_user_addr_fault() -> access_error() causes the SGX driver's
> > sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
> > The userspace SIGSEGV handler cannot perform EACCEPT because the page
> > was not EAUGed. Thus, the user space is stuck with the inaccessible
> > page.
>
> Reading the code, it seems the ioctl(sgx_ioc_enclave_modify_types) also zaps
> EPC mapping when converting a normal page to TSC.  Thus IIUC it should also
> suffer this issue?

Technically yes, sgx_enclave_modify_types() has a similar code path and
can be patched in a similar way.

Practically though, I can't imagine an SGX program or framework to allow a
scenario when CPU1 modifies the type of the enclave page from REG to TCS
and at the same time CPU2 performs a memory access on the same page. This
would be clearly a bug in the SGX program/framework. For example, Gramine
always follows the path of: create a new REG enclave page, modify it to
TCS, only then start using it; i.e., there is never a point in time at
which the REG page is allocated and ready to be converted to a TCS page,
and some other thread/CPU accesses it in-between these steps.

TLDR: I can add similar handling to sgx_enclave_modify_types() if
reviewers insist, but I don't see how this data race can ever be
triggered by benign real-world SGX applications.

--
Dmitrii Kuvaiskii
Huang, Kai Aug. 9, 2024, 11:19 a.m. UTC | #5
On Fri, 2024-08-09 at 02:35 -0700, Dmitrii Kuvaiskii wrote:
> On Thu, Jul 25, 2024 at 01:21:56PM +1200, Huang, Kai wrote:
> > 
> > > Two enclave threads may try to add and remove the same enclave page
> > > simultaneously (e.g., if the SGX runtime supports both lazy allocation
> > > and MADV_DONTNEED semantics). Consider some enclave page added to the
> > > enclave. User space decides to temporarily remove this page (e.g.,
> > > emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
> > > space performs a memory access on the same page on CPU2, which results
> > > in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
> > > follows:
> > > 
> > >   [ ... skipped ... ]
> > > 
> > > Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
> > > same page. This enclave page becomes perpetually inaccessible (until
> > > another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
> > > marked accessible in the PTE entry but is not EAUGed, and any subsequent
> > > access to this page raises a fault: with the kernel believing there to
> > > be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
> > > path do_user_addr_fault() -> access_error() causes the SGX driver's
> > > sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
> > > The userspace SIGSEGV handler cannot perform EACCEPT because the page
> > > was not EAUGed. Thus, the user space is stuck with the inaccessible
> > > page.
> > 
> > Reading the code, it seems the ioctl(sgx_ioc_enclave_modify_types) also zaps
> > EPC mapping when converting a normal page to TSC.  Thus IIUC it should also
> > suffer this issue?
> 
> Technically yes, sgx_enclave_modify_types() has a similar code path and
> can be patched in a similar way.
> 
> Practically though, I can't imagine an SGX program or framework to allow a
> scenario when CPU1 modifies the type of the enclave page from REG to TCS
> and at the same time CPU2 performs a memory access on the same page. This
> would be clearly a bug in the SGX program/framework. For example, Gramine
> always follows the path of: create a new REG enclave page, modify it to
> TCS, only then start using it; i.e., there is never a point in time at
> which the REG page is allocated and ready to be converted to a TCS page,
> and some other thread/CPU accesses it in-between these steps.

I think we need to understand the consequence of such bug (assuming such
behaviour is 100% a bug) both to kernel and to enclave.

To the kernel I don't see any big issue: for the sgx_vma_fault() path it will
find the EPC page is already loaded thus just setup the mapping again; for the
sgx_enclave_modify_types() path the worst case is __emodt() could fail,
resulting in enclave being killed probably.

So if this race is 100% a bug, and will also certainly kill the enclave, then
I guess it is fine not to handle.  But there's an "assuming" here.

On the other hand, there's no risk if we apply BUSY flag here too.  If it is a
bug in enclave, then it can die anyway; otherwise it may survive.

> 
> TLDR: I can add similar handling to sgx_enclave_modify_types() if
> reviewers insist, but I don't see how this data race can ever be
> triggered by benign real-world SGX applications.
> 

So as mentioned above, I intend to suggest to also apply the BUSY flag here.  
And we can have a consist rule in the kernel:

If an enclave page is under certainly operation by the kernel with the mapping
removed, other threads trying to access that page are temporarily blocked and
should retry.

But this is only my 2cents, and I'll leave to maintainers.
Dmitrii Kuvaiskii Aug. 12, 2024, 8:25 a.m. UTC | #6
On Wed, Jul 17, 2024 at 01:38:59PM +0300, Jarkko Sakkinen wrote:

> Ditto.

Just to be sure: I assume this means "Fixes should be in the head of the
series so please reorder"? If yes, please see my reply in the other email
[1].

[1] https://lore.kernel.org/all/20240812082128.3084051-1-dmitrii.kuvaiskii@intel.com/

--
Dmitrii Kuvaiskii
Dmitrii Kuvaiskii Aug. 12, 2024, 8:32 a.m. UTC | #7
On Fri, Aug 09, 2024 at 11:19:22AM +0000, Huang, Kai wrote:

> > TLDR: I can add similar handling to sgx_enclave_modify_types() if
> > reviewers insist, but I don't see how this data race can ever be
> > triggered by benign real-world SGX applications.
>
> So as mentioned above, I intend to suggest to also apply the BUSY flag here.  
> And we can have a consist rule in the kernel:
>
> If an enclave page is under certainly operation by the kernel with the mapping
> removed, other threads trying to access that page are temporarily blocked and
> should retry.

I agree with your assessment on the consequences of such bug in
sgx_enclave_modify_types(). To my understanding, this bug can only affect
the SGX enclave (i.e. the userspace) -- either the SGX enclave will hang
or will be terminated.

Anyway, I will apply the BUSY flag also in sgx_enclave_modify_types() in
the next iteration of this patch series.

--
Dmitrii Kuvaiskii
Huang, Kai Aug. 12, 2024, 10:34 a.m. UTC | #8
On Mon, 2024-08-12 at 01:32 -0700, Kuvaiskii, Dmitrii wrote:
> On Fri, Aug 09, 2024 at 11:19:22AM +0000, Huang, Kai wrote:
> 
> > > TLDR: I can add similar handling to sgx_enclave_modify_types() if
> > > reviewers insist, but I don't see how this data race can ever be
> > > triggered by benign real-world SGX applications.
> > 
> > So as mentioned above, I intend to suggest to also apply the BUSY flag here.  
> > And we can have a consist rule in the kernel:
> > 
> > If an enclave page is under certainly operation by the kernel with the mapping
> > removed, other threads trying to access that page are temporarily blocked and
> > should retry.
> 
> I agree with your assessment on the consequences of such bug in
> sgx_enclave_modify_types(). To my understanding, this bug can only affect
> the SGX enclave (i.e. the userspace) -- either the SGX enclave will hang
> or will be terminated.
> 
> Anyway, I will apply the BUSY flag also in sgx_enclave_modify_types() in
> the next iteration of this patch series.
> 

Thanks.
Jarkko Sakkinen Aug. 15, 2024, 6:34 p.m. UTC | #9
On Mon Aug 12, 2024 at 11:25 AM EEST, Dmitrii Kuvaiskii wrote:
> On Wed, Jul 17, 2024 at 01:38:59PM +0300, Jarkko Sakkinen wrote:
>
> > Ditto.
>
> Just to be sure: I assume this means "Fixes should be in the head of the
> series so please reorder"? If yes, please see my reply in the other email
> [1].

OK, based on your earlier remarks and references I agree with you.

>
> [1] https://lore.kernel.org/all/20240812082128.3084051-1-dmitrii.kuvaiskii@intel.com/
>
> --
> Dmitrii Kuvaiskii

I think for future and since we have bunch of state flags, removing
that "e.g." is worth of doing. Often you need to go through all of
the flags to remind you how they interact, and at that point "one
vs many" does help navigating the complexity.

BR, Jarkko
Jarkko Sakkinen Aug. 15, 2024, 6:37 p.m. UTC | #10
On Thu Aug 15, 2024 at 9:34 PM EEST, Jarkko Sakkinen wrote:
> On Mon Aug 12, 2024 at 11:25 AM EEST, Dmitrii Kuvaiskii wrote:
> > On Wed, Jul 17, 2024 at 01:38:59PM +0300, Jarkko Sakkinen wrote:
> >
> > > Ditto.
> >
> > Just to be sure: I assume this means "Fixes should be in the head of the
> > series so please reorder"? If yes, please see my reply in the other email
> > [1].
>
> OK, based on your earlier remarks and references I agree with you.
>
> >
> > [1] https://lore.kernel.org/all/20240812082128.3084051-1-dmitrii.kuvaiskii@intel.com/
> >
> > --
> > Dmitrii Kuvaiskii
>
> I think for future and since we have bunch of state flags, removing
> that "e.g." is worth of doing. Often you need to go through all of
> the flags to remind you how they interact, and at that point "one
> vs many" does help navigating the complexity.

Actually every time there's a patch that has anything to do with
the state flags I go through all  of em as a reminder. Might seem
like irrelevant detail but really is not (and neither unnecessarry
nitpicking). All small clues speed up that process or can mislead.

BR, Jarkko
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 5d390df21440..02441883401d 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -1141,7 +1141,14 @@  static long sgx_encl_remove_pages(struct sgx_encl *encl,
 		/*
 		 * Do not keep encl->lock because of dependency on
 		 * mmap_lock acquired in sgx_zap_enclave_ptes().
+		 *
+		 * Releasing encl->lock leads to a data race: while CPU1
+		 * performs sgx_zap_enclave_ptes() and removes the PTE entry
+		 * for the enclave page, CPU2 may attempt to load this page
+		 * (because the page is still in enclave's xarray). To prevent
+		 * CPU2 from loading the page, mark the page as busy.
 		 */
+		entry->desc |= SGX_ENCL_PAGE_BUSY;
 		mutex_unlock(&encl->lock);
 
 		sgx_zap_enclave_ptes(encl, addr);