[RFC,3/4] x86/sgx: Obtain backing storage page with enclave mutex held

Message ID 24fd9203331d11918b785c6a67f85d799d100be8.1651171455.git.reinette.chatre@intel.com (mailing list archive)
State New, archived
Series [RFC,1/4] x86/sgx: Do not free backing memory on ENCLS[ELDU] failure

Commit Message

Reinette Chatre April 28, 2022, 8:11 p.m. UTC
The SGX backing storage is accessed on two paths: when there
are insufficient enclave pages in the EPC the reclaimer moves
enclave pages out to the backing storage, and when enclaves
access pages that have been moved to the backing storage the
page fault handler retrieves them from there.

An oversubscribed SGX system will often run the reclaimer and
the page fault handler concurrently and needs to ensure that
the backing store is accessed safely by both. The scenarios to
consider here are:
(a) faulting a page right after it was reclaimed,
(b) faulting a page while reclaiming another page that shares
    a PCMD page with it.

The reclaimer obtains pages from the backing storage without
holding the enclave mutex and thus risks accessing the backing
storage concurrently with the page fault handler, which does
hold the enclave mutex while accessing the backing storage.

In the scenario below a page is written to the backing store
by the reclaimer and then immediately faulted back in, before
the reclaimer is able to set the dirty bit of the page:

sgx_reclaim_pages() {                    sgx_vma_fault() {
...                                      ...
/* write data to backing store */
sgx_reclaimer_write();
                                         mutex_lock(&encl->lock);
                                         __sgx_encl_eldu() {
                                         ...
                                         /* page not dirty -
                                          * contents may not be
                                          * up to date
                                         */
                                         sgx_encl_get_backing();
                                         ...
                                         }
                                         ...
/* set page dirty */
sgx_encl_put_backing();
...
                                         mutex_unlock(&encl->lock);
}                                        }
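
For reference, the "set page dirty" step above is performed by
sgx_encl_put_backing() when called with the write flag set. A rough
sketch of that helper, assuming its body at this point in the series
(shown here for context only, not part of this patch):

void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write)
{
	if (do_write) {
		/* Mark the shmem pages dirty so the EWB output is kept. */
		set_page_dirty(backing->pcmd);
		set_page_dirty(backing->contents);
	}

	/* Drop the references taken by sgx_encl_get_backing(). */
	put_page(backing->pcmd);
	put_page(backing->contents);
}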

While it is not possible to concurrently reclaim and fault the
same enclave page, the PCMD pages are shared between enclave
pages resident in the enclave and enclave pages in the backing
store. In the scenario below a PCMD page is truncated from the
backing store after all of its enclave pages have been loaded
into the enclave, at the same time that the PCMD page is loaded
from the backing store because one of its pages is being
reclaimed:

sgx_reclaim_pages() {              sgx_vma_fault() {
                                     ...
                                     mutex_lock(&encl->lock);
                                     ...
                                     __sgx_encl_eldu() {
                                       ...
                                       if (pcmd_page_empty) {
/*
 * EPC page being reclaimed              /*
 * shares a PCMD page with an             * PCMD page truncated
 * enclave page that is being             * while requested from
 * faulted in.                            * reclaimer.
 */                                       */
sgx_encl_get_backing()  <---------->      sgx_encl_truncate_backing_page()
                                        }
}                                    }
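
The sharing happens because a PCMD entry is 128 bytes, so the PCMD
entries of 32 enclave pages fit in a single 4K backing page. The
sketch below is illustrative only (the helper name and layout are
assumptions, not taken from this patch): it shows how a page index
could map to the backing page holding its PCMD entry, assuming the
PCMD area follows the enclave pages and the SECS page in the
backing store.

#define SGX_PCMD_SIZE	128	/* sizeof(struct sgx_pcmd) */

/* Illustrative: which backing page holds this page's PCMD entry? */
static pgoff_t pcmd_backing_index(unsigned long encl_size,
				  unsigned long page_index)
{
	/*
	 * PCMD entries are assumed to start after the enclave pages
	 * and the SECS page in the backing store.
	 */
	pgoff_t pcmd_off = encl_size + PAGE_SIZE +
			   page_index * SGX_PCMD_SIZE;

	return PFN_DOWN(pcmd_off);
}

Two enclave pages whose indices fall in the same 32-entry window get
the same value here, which is why reclaiming one page and faulting
another can touch the same PCMD backing page.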

Protect the reclaimer's backing store access with the enclave's mutex
to ensure that it can safely run concurrently with the page fault handler.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Dave Hansen April 28, 2022, 9:58 p.m. UTC | #1
On 4/28/22 13:11, Reinette Chatre wrote:
> @@ -252,6 +252,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>  	sgx_encl_ewb(epc_page, backing);
>  	encl_page->epc_page = NULL;
>  	encl->secs_child_cnt--;
> +	sgx_encl_put_backing(backing, true);

Could you also talk a bit about why this needed to move?  It's a bit
harder to make sense of the refcounting when the get and put are done in
different functions.
Reinette Chatre April 28, 2022, 10:44 p.m. UTC | #2
Hi Dave,

On 4/28/2022 2:58 PM, Dave Hansen wrote:
> On 4/28/22 13:11, Reinette Chatre wrote:
>> @@ -252,6 +252,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>>  	sgx_encl_ewb(epc_page, backing);
>>  	encl_page->epc_page = NULL;
>>  	encl->secs_child_cnt--;
>> +	sgx_encl_put_backing(backing, true);
> 
> Could you also talk a bit about why this needed to move?  It's a bit
> harder to make sense of the refcounting when the get and put are done in
> different functions.

This needed to move to address the following scenario described in the
changelog:

sgx_reclaim_pages() {                    sgx_vma_fault() {
...                                      ...
/* write data to backing store */
sgx_reclaimer_write();
                                         mutex_lock(&encl->lock);
                                         __sgx_encl_eldu() {
                                         ...
                                         /* page not dirty -
                                          * contents may not be
                                          * up to date
                                         */
                                         sgx_encl_get_backing();
                                         ...
                                         }
                                         ...
/* set page dirty */
sgx_encl_put_backing();
...
                                         mutex_unlock(&encl->lock);
}                                        }

Before this change the backing store page was modified within
sgx_reclaimer_write(), which essentially does:

sgx_reclaimer_write() {
    mutex_lock(&encl->lock);
    /* write encrypted data to backing store */
    mutex_unlock(&encl->lock);
}

The reclaimer followed the sgx_reclaimer_write() call with a
call to sgx_encl_put_backing(), where the pages have their
dirty bits set. sgx_encl_put_backing() was thus called without
the enclave mutex held. If the page is faulted in at that
time the page fault handler may attempt to load the page
from the backing store between its contents being changed
and it being marked as dirty.

After the change the page fault handler would not be able to
load the page from the backing store before it is marked as
dirty.
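
A simplified view of sgx_reclaimer_write() with this patch applied
(VA slot handling, SECS reclaim and error paths omitted), based on
the hunk above: the backing pages are now marked dirty and released
while encl->lock is still held.

static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
				struct sgx_backing *backing)
{
	struct sgx_encl_page *encl_page = epc_page->owner;
	struct sgx_encl *encl = encl_page->encl;

	mutex_lock(&encl->lock);

	sgx_encl_ewb(epc_page, backing);
	encl_page->epc_page = NULL;
	encl->secs_child_cnt--;

	/*
	 * Set the dirty bits before a concurrent fault can observe the
	 * page, i.e. before encl->lock is released.
	 */
	sgx_encl_put_backing(backing, true);

	mutex_unlock(&encl->lock);
}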

I can improve the flow in the changelog to be clear on the
reclaimer's mutex usage. Perhaps something like:

sgx_reclaim_pages() {                    sgx_vma_fault() {
...                                      ...

  sgx_reclaimer_write() {
     mutex_lock(&encl->lock);
     /* write to backing store */
     mutex_unlock(&encl->lock);
  }
                                         mutex_lock(&encl->lock);
                                         __sgx_encl_eldu() {
                                         ...
                                         /* page not dirty -
                                          * contents may not be
                                          * up to date
                                         */
                                         sgx_encl_get_backing();
                                         ...
                                         }
                                         ...
/* set page dirty */
sgx_encl_put_backing();
...
                                         mutex_unlock(&encl->lock);
}                                        }



Reinette
Jarkko Sakkinen May 6, 2022, 10:43 p.m. UTC | #3
On Thu, Apr 28, 2022 at 01:11:26PM -0700, Reinette Chatre wrote:
> The SGX backing storage is accessed on two paths: when there
> are insufficient enclave pages in the EPC the reclaimer works
> to move enclave pages to the backing storage and as enclaves
> access pages that have been moved to the backing storage
> they are retrieved from there as part of page fault handling.
> 
> An oversubscribed SGX system will often run the reclaimer and
> page fault handler concurrently and needs to ensure that the
> backing store is accessed safely between the reclaimer and
> the page fault handler. The scenarios to consider here are:
> (a) faulting a page right after it was reclaimed,
> (b) faulting a page and reclaiming another page that are
>     sharing a PCMD page.
> 
> The reclaimer obtains pages from the backing storage without
> holding the enclave mutex and runs the risk of concurrently
> accessing the backing storage with the page fault handler that
> does access the backing storage with the enclave mutex held.
> 
> In the scenario below a page is written to the backing store
> by the reclaimer and then immediately faulted back, before
> the reclaimer is able to set the dirty bit of the page:
> 
> sgx_reclaim_pages() {                    sgx_vma_fault() {
> ...                                      ...
> /* write data to backing store */
> sgx_reclaimer_write();
>                                          mutex_lock(&encl->lock);
>                                          __sgx_encl_eldu() {
>                                          ...
>                                          /* page not dirty -
>                                           * contents may not be
>                                           * up to date
>                                          */
>                                          sgx_encl_get_backing();
>                                          ...
>                                          }
>                                          ...
> /* set page dirty */
> sgx_encl_put_backing();
> ...
>                                          mutex_unlock(&encl->lock);
> }                                        }
> 
> While it is not possible to concurrently reclaim and fault the same
> enclave page the PCMD pages are shared between enclave pages
> in the enclave and enclave pages in the backing store.
> In the below scenario a PCMD page is truncated from the backing
> store after all its pages have been loaded in to the enclave
> at the same time the PCMD page is loaded from the backing store
> when one of its pages are reclaimed:
> 
> sgx_reclaim_pages() {              sgx_vma_fault() {
>                                      ...
>                                      mutex_lock(&encl->lock);
>                                      ...
>                                      __sgx_encl_eldu() {
>                                        ...
>                                        if (pcmd_page_empty) {
> /*
>  * EPC page being reclaimed              /*
>  * shares a PCMD page with an             * PCMD page truncated
>  * enclave page that is being             * while requested from
>  * faulted in.                            * reclaimer.
>  */                                       */
> sgx_encl_get_backing()  <---------->      sgx_encl_truncate_backing_page()
>                                         }
> }                                    }
> 
> Protect the reclaimer's backing store access with the enclave's mutex
> to ensure that it can safely run concurrently with the page fault handler.
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/main.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 0e8741a80cf3..ae79b8d6f645 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -252,6 +252,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
>  	sgx_encl_ewb(epc_page, backing);
>  	encl_page->epc_page = NULL;
>  	encl->secs_child_cnt--;
> +	sgx_encl_put_backing(backing, true);
>  
>  	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
>  		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
> @@ -323,11 +324,14 @@ static void sgx_reclaim_pages(void)
>  			goto skip;
>  
>  		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
> +
> +		mutex_lock(&encl_page->encl->lock);
>  		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
> -		if (ret)
> +		if (ret) {
> +			mutex_unlock(&encl_page->encl->lock);
>  			goto skip;
> +		}
>  
> -		mutex_lock(&encl_page->encl->lock);
>  		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
>  		mutex_unlock(&encl_page->encl->lock);
>  		continue;
> @@ -355,7 +359,6 @@ static void sgx_reclaim_pages(void)
>  
>  		encl_page = epc_page->owner;
>  		sgx_reclaimer_write(epc_page, &backing[i]);
> -		sgx_encl_put_backing(&backing[i], true);
>  
>  		kref_put(&encl_page->encl->refcount, sgx_encl_release);
>  		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> -- 
> 2.25.1
> 

I fully agree with your fix, but there I would also open code the
required statements from sgx_encl_put_backing(). Let's render that
helper out over time. It masks more than it helps.

BR, Jarkko
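
For illustration of the suggestion above only (not part of the posted
patch, and assuming sgx_encl_put_backing() amounts to setting the
backing pages dirty and dropping their references), the open-coded
call site in sgx_reclaimer_write() might look like:

	sgx_encl_ewb(epc_page, backing);
	encl_page->epc_page = NULL;
	encl->secs_child_cnt--;

	/*
	 * Open-coded equivalent of sgx_encl_put_backing(backing, true):
	 * mark the backing pages dirty and drop the references taken by
	 * sgx_encl_get_backing(), all while encl->lock is held.
	 */
	set_page_dirty(backing->pcmd);
	set_page_dirty(backing->contents);
	put_page(backing->pcmd);
	put_page(backing->contents);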

Patch

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 0e8741a80cf3..ae79b8d6f645 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -252,6 +252,7 @@  static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	sgx_encl_ewb(epc_page, backing);
 	encl_page->epc_page = NULL;
 	encl->secs_child_cnt--;
+	sgx_encl_put_backing(backing, true);
 
 	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
 		ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
@@ -323,11 +324,14 @@  static void sgx_reclaim_pages(void)
 			goto skip;
 
 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
+
+		mutex_lock(&encl_page->encl->lock);
 		ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
-		if (ret)
+		if (ret) {
+			mutex_unlock(&encl_page->encl->lock);
 			goto skip;
+		}
 
-		mutex_lock(&encl_page->encl->lock);
 		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
 		mutex_unlock(&encl_page->encl->lock);
 		continue;
@@ -355,7 +359,6 @@  static void sgx_reclaim_pages(void)
 
 		encl_page = epc_page->owner;
 		sgx_reclaimer_write(epc_page, &backing[i]);
-		sgx_encl_put_backing(&backing[i], true);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;