Message ID | b98b7b0d60778845eceb4e279b4354632b7111f2.1652131695.git.reinette.chatre@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [V2,1/5] x86/sgx: Disconnect backing page references from dirty status | expand |
On Mon, May 09, 2022 at 02:48:01PM -0700, Reinette Chatre wrote: > Haitao reported encountering a WARN triggered by the ENCLS[ELDU] > instruction faulting with a #GP. > > The WARN is encountered when the reclaimer evicts a range of > pages from the enclave when the same pages are faulted back > right away. > > The SGX backing storage is accessed on two paths: when there > are insufficient free pages in the EPC the reclaimer works > to move enclave pages to the backing storage and as enclaves > access pages that have been moved to the backing storage > they are retrieved from there as part of page fault handling. > > An oversubscribed SGX system will often run the reclaimer and > page fault handler concurrently and needs to ensure that the > backing store is accessed safely between the reclaimer and > the page fault handler. This is not the case because the > reclaimer accesses the backing store without the enclave mutex > while the page fault handler accesses the backing store with > the enclave mutex. > > Two scenarios are considered to describe the consequences of > the unsafe access: > (a) Scenario: Fault a page right after it was reclaimed. > Consequence: The page is faulted by loading outdated data > into the enclave using ENCLS[ELDU] that faults when it checks > the MAC and PCMD data. > (b) Scenario: Fault a page while reclaiming another page that > share a PCMD page. > Consequence: A race between the reclaimer and page fault > handler, the reclaimer attempting to access a PCMD at the > same time it is truncated by the page fault handler. This > could result in lost PCMD data. Data may still be > lost if the reclaimer wins the race, this is addressed in > the following patch. > > The reclaimer accesses pages from the backing storage without > holding the enclave mutex and runs the risk of concurrently > accessing the backing storage with the page fault handler that > does access the backing storage with the enclave mutex held. > > The two scenarios ((a) and (b)) are shown below. > > In scenario (a), a page is written to the backing store > by the reclaimer and then immediately faulted back, before > the reclaimer is able to set the dirty bit of the page: > > sgx_reclaim_pages() { sgx_vma_fault() { > ... ... > sgx_reclaimer_write() { > mutex_lock(&encl->lock); > /* Write data to backing store */ > mutex_unlock(&encl->lock); > } > mutex_lock(&encl->lock); > __sgx_encl_eldu() { > ... > /* Enclave backing store > * page not released > * nor marked dirty - > * contents may not be > * up to date. > */ > sgx_encl_get_backing(); > ... > /* > * Enclave data restored > * from backing store > * and PCMD pages that > * are not up to date. > * ENCLS[ELDU] faults > * because of MAC or PCMD > * checking failure. > */ > sgx_encl_put_backing(); > } > ... > /* set page dirty */ > sgx_encl_put_backing(); > ... > mutex_unlock(&encl->lock); > } } > > In scenario (b) below a PCMD page is truncated from the backing > store after all its pages have been loaded in to the enclave > at the same time the PCMD page is loaded from the backing store > when one of its pages are reclaimed: > > sgx_reclaim_pages() { sgx_vma_fault() { > ... > mutex_lock(&encl->lock); > ... > __sgx_encl_eldu() { > ... > if (pcmd_page_empty) { > /* > * EPC page being reclaimed /* > * shares a PCMD page with an * PCMD page truncated > * enclave page that is being * while requested from > * faulted in. * reclaimer. > */ */ > sgx_encl_get_backing() <----------> sgx_encl_truncate_backing_page() > } > mutex_unlock(&encl->lock); > } } > > In scenario (b) there is a race between the reclaimer and the page fault > handler when the reclaimer attempts to get access to the same PCMD page > that is being truncated. This could result in the reclaimer writing to > the PCMD page that is then truncated, causing the PCMD data to be lost, > or in a new PCMD page being allocated. The lost PCMD data may still occur > after protecting the backing store access with the mutex - this is fixed > in the next patch. By ensuring the backing store is accessed with the mutex > held the enclave page state can be made accurate with the > SGX_ENCL_PAGE_BEING_RECLAIMED flag accurately reflecting that a page > is in the process of being reclaimed. > > Consistently protect the reclaimer's backing store access with the > enclave's mutex to ensure that it can safely run concurrently with the > page fault handler. > > Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") > Reported-by: Haitao Huang <haitao.huang@intel.com> > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> > --- > arch/x86/kernel/cpu/sgx/main.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > index fad3d6c4756e..a60f8b2780fb 100644 > --- a/arch/x86/kernel/cpu/sgx/main.c > +++ b/arch/x86/kernel/cpu/sgx/main.c > @@ -310,6 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, > sgx_encl_ewb(epc_page, backing); > encl_page->epc_page = NULL; > encl->secs_child_cnt--; > + sgx_encl_put_backing(backing); > > if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) { > ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size), > @@ -381,11 +382,14 @@ static void sgx_reclaim_pages(void) > goto skip; > > page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); > + > + mutex_lock(&encl_page->encl->lock); > ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]); > - if (ret) > + if (ret) { > + mutex_unlock(&encl_page->encl->lock); > goto skip; > + } > > - mutex_lock(&encl_page->encl->lock); > encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; > mutex_unlock(&encl_page->encl->lock); > continue; > @@ -413,7 +417,6 @@ static void sgx_reclaim_pages(void) > > encl_page = epc_page->owner; > sgx_reclaimer_write(epc_page, &backing[i]); > - sgx_encl_put_backing(&backing[i]); > > kref_put(&encl_page->encl->refcount, sgx_encl_release); > epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; > -- > 2.25.1 > I get the locking part but why is the move of sgx_encl_put_backing relevant? BR, Jarkko
Hi Jarkko, On 5/11/2022 4:13 AM, Jarkko Sakkinen wrote: > On Mon, May 09, 2022 at 02:48:01PM -0700, Reinette Chatre wrote: ... >> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c >> index fad3d6c4756e..a60f8b2780fb 100644 >> --- a/arch/x86/kernel/cpu/sgx/main.c >> +++ b/arch/x86/kernel/cpu/sgx/main.c >> @@ -310,6 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, >> sgx_encl_ewb(epc_page, backing); >> encl_page->epc_page = NULL; >> encl->secs_child_cnt--; >> + sgx_encl_put_backing(backing); >> >> if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) { >> ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size), >> @@ -381,11 +382,14 @@ static void sgx_reclaim_pages(void) >> goto skip; >> >> page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); >> + >> + mutex_lock(&encl_page->encl->lock); >> ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]); >> - if (ret) >> + if (ret) { >> + mutex_unlock(&encl_page->encl->lock); >> goto skip; >> + } >> >> - mutex_lock(&encl_page->encl->lock); >> encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; >> mutex_unlock(&encl_page->encl->lock); >> continue; >> @@ -413,7 +417,6 @@ static void sgx_reclaim_pages(void) >> >> encl_page = epc_page->owner; >> sgx_reclaimer_write(epc_page, &backing[i]); >> - sgx_encl_put_backing(&backing[i]); >> >> kref_put(&encl_page->encl->refcount, sgx_encl_release); >> epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; >> -- >> 2.25.1 >> > > I get the locking part but why is the move of sgx_encl_put_backing > relevant? Moving sgx_encl_put_backing() accomplishes the locking goal. Before the patch: sgx_reclaim_pages() { ... sgx_reclaimer_write() { mutex_lock(&encl->lock); ... mutex_unlock(&encl->lock); } sgx_encl_put_backing(); /* Not protected by enclave mutex */ } After the patch: sgx_reclaim_pages() { ... sgx_reclaimer_write() { mutex_lock(&encl->lock); ... sgx_encl_put_backing(); /* Protected by enclave mutex */ ... mutex_unlock(&encl->lock); } } Even so, because of patch 1/1 the first scenario described in the changelog is no longer valid since the page is marked as dirty with the enclave mutex held. It may thus not be required to call sgx_encl_put_backing() with enclave mutex held but it remains important for sgx_encl_get_backing() to be called with enclave mutex held since it ensures that SGX_ENCL_PAGE_BEING_RECLAIMED can be used (in patch 4/5) to reliably reflect references to the backing storage. Considering that I would like to continue to consistently protect sgx_encl_get_backing()/sgx_encl_put_backing() with the enclave mutex. Reinette
On Wed, May 11, 2022 at 11:02:47AM -0700, Reinette Chatre wrote: > Hi Jarkko, > > On 5/11/2022 4:13 AM, Jarkko Sakkinen wrote: > > On Mon, May 09, 2022 at 02:48:01PM -0700, Reinette Chatre wrote: > > ... > > >> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > >> index fad3d6c4756e..a60f8b2780fb 100644 > >> --- a/arch/x86/kernel/cpu/sgx/main.c > >> +++ b/arch/x86/kernel/cpu/sgx/main.c > >> @@ -310,6 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, > >> sgx_encl_ewb(epc_page, backing); > >> encl_page->epc_page = NULL; > >> encl->secs_child_cnt--; > >> + sgx_encl_put_backing(backing); > >> > >> if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) { > >> ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size), > >> @@ -381,11 +382,14 @@ static void sgx_reclaim_pages(void) > >> goto skip; > >> > >> page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); > >> + > >> + mutex_lock(&encl_page->encl->lock); > >> ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]); > >> - if (ret) > >> + if (ret) { > >> + mutex_unlock(&encl_page->encl->lock); > >> goto skip; > >> + } > >> > >> - mutex_lock(&encl_page->encl->lock); > >> encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; > >> mutex_unlock(&encl_page->encl->lock); > >> continue; > >> @@ -413,7 +417,6 @@ static void sgx_reclaim_pages(void) > >> > >> encl_page = epc_page->owner; > >> sgx_reclaimer_write(epc_page, &backing[i]); > >> - sgx_encl_put_backing(&backing[i]); > >> > >> kref_put(&encl_page->encl->refcount, sgx_encl_release); > >> epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; > >> -- > >> 2.25.1 > >> > > > > I get the locking part but why is the move of sgx_encl_put_backing > > relevant? > > Moving sgx_encl_put_backing() accomplishes the locking goal. > > Before the patch: > > sgx_reclaim_pages() { > ... > sgx_reclaimer_write() { > mutex_lock(&encl->lock); > ... > mutex_unlock(&encl->lock); > } > sgx_encl_put_backing(); /* Not protected by enclave mutex */ > } > > After the patch: > > sgx_reclaim_pages() { > ... > sgx_reclaimer_write() { > mutex_lock(&encl->lock); > ... > sgx_encl_put_backing(); /* Protected by enclave mutex */ > ... > mutex_unlock(&encl->lock); > } > > } Right. > Even so, because of patch 1/1 the first scenario described in the > changelog is no longer valid since the page is marked as dirty > with the enclave mutex held. It may thus not be required > to call sgx_encl_put_backing() with enclave mutex held but it > remains important for sgx_encl_get_backing() to be called with > enclave mutex held since it ensures that SGX_ENCL_PAGE_BEING_RECLAIMED > can be used (in patch 4/5) to reliably reflect references to the > backing storage. > Considering that I would like to continue to consistently protect > sgx_encl_get_backing()/sgx_encl_put_backing() with the enclave mutex. I fully agree with your conclusion. > Reinette Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Also Tested-by: Jarkko Sakkinen <jarkko@kernel.org> BR, Jarkko
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index fad3d6c4756e..a60f8b2780fb 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -310,6 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, sgx_encl_ewb(epc_page, backing); encl_page->epc_page = NULL; encl->secs_child_cnt--; + sgx_encl_put_backing(backing); if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) { ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size), @@ -381,11 +382,14 @@ static void sgx_reclaim_pages(void) goto skip; page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); + + mutex_lock(&encl_page->encl->lock); ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]); - if (ret) + if (ret) { + mutex_unlock(&encl_page->encl->lock); goto skip; + } - mutex_lock(&encl_page->encl->lock); encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; mutex_unlock(&encl_page->encl->lock); continue; @@ -413,7 +417,6 @@ static void sgx_reclaim_pages(void) encl_page = epc_page->owner; sgx_reclaimer_write(epc_page, &backing[i]); - sgx_encl_put_backing(&backing[i]); kref_put(&encl_page->encl->refcount, sgx_encl_release); epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
Haitao reported encountering a WARN triggered by the ENCLS[ELDU] instruction faulting with a #GP. The WARN is encountered when the reclaimer evicts a range of pages from the enclave when the same pages are faulted back right away. The SGX backing storage is accessed on two paths: when there are insufficient free pages in the EPC the reclaimer works to move enclave pages to the backing storage and as enclaves access pages that have been moved to the backing storage they are retrieved from there as part of page fault handling. An oversubscribed SGX system will often run the reclaimer and page fault handler concurrently and needs to ensure that the backing store is accessed safely between the reclaimer and the page fault handler. This is not the case because the reclaimer accesses the backing store without the enclave mutex while the page fault handler accesses the backing store with the enclave mutex. Two scenarios are considered to describe the consequences of the unsafe access: (a) Scenario: Fault a page right after it was reclaimed. Consequence: The page is faulted by loading outdated data into the enclave using ENCLS[ELDU] that faults when it checks the MAC and PCMD data. (b) Scenario: Fault a page while reclaiming another page that share a PCMD page. Consequence: A race between the reclaimer and page fault handler, the reclaimer attempting to access a PCMD at the same time it is truncated by the page fault handler. This could result in lost PCMD data. Data may still be lost if the reclaimer wins the race, this is addressed in the following patch. The reclaimer accesses pages from the backing storage without holding the enclave mutex and runs the risk of concurrently accessing the backing storage with the page fault handler that does access the backing storage with the enclave mutex held. The two scenarios ((a) and (b)) are shown below. In scenario (a), a page is written to the backing store by the reclaimer and then immediately faulted back, before the reclaimer is able to set the dirty bit of the page: sgx_reclaim_pages() { sgx_vma_fault() { ... ... sgx_reclaimer_write() { mutex_lock(&encl->lock); /* Write data to backing store */ mutex_unlock(&encl->lock); } mutex_lock(&encl->lock); __sgx_encl_eldu() { ... /* Enclave backing store * page not released * nor marked dirty - * contents may not be * up to date. */ sgx_encl_get_backing(); ... /* * Enclave data restored * from backing store * and PCMD pages that * are not up to date. * ENCLS[ELDU] faults * because of MAC or PCMD * checking failure. */ sgx_encl_put_backing(); } ... /* set page dirty */ sgx_encl_put_backing(); ... mutex_unlock(&encl->lock); } } In scenario (b) below a PCMD page is truncated from the backing store after all its pages have been loaded in to the enclave at the same time the PCMD page is loaded from the backing store when one of its pages are reclaimed: sgx_reclaim_pages() { sgx_vma_fault() { ... mutex_lock(&encl->lock); ... __sgx_encl_eldu() { ... if (pcmd_page_empty) { /* * EPC page being reclaimed /* * shares a PCMD page with an * PCMD page truncated * enclave page that is being * while requested from * faulted in. * reclaimer. */ */ sgx_encl_get_backing() <----------> sgx_encl_truncate_backing_page() } mutex_unlock(&encl->lock); } } In scenario (b) there is a race between the reclaimer and the page fault handler when the reclaimer attempts to get access to the same PCMD page that is being truncated. This could result in the reclaimer writing to the PCMD page that is then truncated, causing the PCMD data to be lost, or in a new PCMD page being allocated. The lost PCMD data may still occur after protecting the backing store access with the mutex - this is fixed in the next patch. By ensuring the backing store is accessed with the mutex held the enclave page state can be made accurate with the SGX_ENCL_PAGE_BEING_RECLAIMED flag accurately reflecting that a page is in the process of being reclaimed. Consistently protect the reclaimer's backing store access with the enclave's mutex to ensure that it can safely run concurrently with the page fault handler. Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Reported-by: Haitao Huang <haitao.huang@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> --- arch/x86/kernel/cpu/sgx/main.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)