Message ID | ed20a5db516aa813873268e125680041ae11dfcf.1652389823.git.reinette.chatre@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [V3,1/5] x86/sgx: Disconnect backing page references from dirty status | expand |
On Thu, May 12, 2022 at 02:51:00PM -0700, Reinette Chatre wrote: > Haitao reported encountering a WARN triggered by the ENCLS[ELDU] > instruction faulting with a #GP. > > The WARN is encountered when the reclaimer evicts a range of > pages from the enclave when the same pages are faulted back right away. > > Consider two enclave pages (ENCLAVE_A and ENCLAVE_B) > sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the > enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains > just one entry, that of ENCLAVE_B. > > Scenario proceeds where ENCLAVE_A is being evicted from the enclave > while ENCLAVE_B is faulted in. > > sgx_reclaim_pages() { > > ... > > /* > * Reclaim ENCLAVE_A > */ > mutex_lock(&encl->lock); > /* > * Get a reference to ENCLAVE_A's > * shmem page where enclave page > * encrypted data will be stored > * as well as a reference to the > * enclave page's PCMD data page, > * PCMD_AB. > * Release mutex before writing > * any data to the shmem pages. > */ > sgx_encl_get_backing(...); > encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; > mutex_unlock(&encl->lock); > > /* > * Fault ENCLAVE_B > */ > > sgx_vma_fault() { > > mutex_lock(&encl->lock); > /* > * Get reference to > * ENCLAVE_B's shmem page > * as well as PCMD_AB. > */ > sgx_encl_get_backing(...) > /* > * Load page back into > * enclave via ELDU. > */ > /* > * Release reference to > * ENCLAVE_B' shmem page and > * PCMD_AB. > */ > sgx_encl_put_backing(...); > /* > * PCMD_AB is found empty so > * it and ENCLAVE_B's shmem page > * are truncated. > */ > /* Truncate ENCLAVE_B backing page */ > sgx_encl_truncate_backing_page(); > /* Truncate PCMD_AB */ > sgx_encl_truncate_backing_page(); > > mutex_unlock(&encl->lock); > > ... > } > mutex_lock(&encl->lock); > encl_page->desc &= > ~SGX_ENCL_PAGE_BEING_RECLAIMED; > /* > * Write encrypted contents of > * ENCLAVE_A to ENCLAVE_A shmem > * page and its PCMD data to > * PCMD_AB. > */ > sgx_encl_put_backing(...) > > /* > * Reference to PCMD_AB is > * dropped and it is truncated. > * ENCLAVE_A's PCMD data is lost. > */ > mutex_unlock(&encl->lock); > } > > What happens next depends on whether it is ENCLAVE_A being faulted > in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting > with a #GP. > > If ENCLAVE_A is faulted then at the time sgx_encl_get_backing() is called > a new PCMD page is allocated and providing the empty PCMD data for > ENCLAVE_A would cause ENCLS[ELDU] to #GP > > If ENCLAVE_B is evicted first then a new PCMD_AB would be allocated by the > reclaimer but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction > would #GP during its checks of the PCMD value and the WARN would be > encountered. > > Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time > it obtains a reference to the backing store pages of an enclave page it > is in the process of reclaiming, fix the race by only truncating the PCMD > page after ensuring that no page sharing the PCMD page is in the process > of being reclaimed. > > Cc: stable@vger.kernel.org > Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") > Reported-by: Haitao Huang <haitao.huang@intel.com> > Tested-by: Haitao Huang <haitao.huang@intel.com> > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> > --- > Changes since V2: > - Declare "addr" and "entry" within loop. (Dave) > - Fix incorrect error return when enclave page not found to belong > to enclave. Continue search instead of returning failure. (Dave) > - Add Haitao's "Tested-by" tag. > - Rename pcmd_page_in_use() to reclaimer_writing_to_pcmd() to be less > generic. (Dave) > - Improve function comments to be clear about it testing for PCMD > page soon becoming non-empty, also add context info to kernel-doc > to indicate that enclave mutex must be held. (Dave) > > Changes since RFC v1: > - New patch. > > arch/x86/kernel/cpu/sgx/encl.c | 94 +++++++++++++++++++++++++++++++++- > 1 file changed, 93 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c > index 5104a428b72c..243f3bd78145 100644 > --- a/arch/x86/kernel/cpu/sgx/encl.c > +++ b/arch/x86/kernel/cpu/sgx/encl.c > @@ -12,6 +12,92 @@ > #include "encls.h" > #include "sgx.h" > > +#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd)) > +/* > + * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to > + * determine the page index associated with the first PCMD entry > + * within a PCMD page. > + */ > +#define PCMD_FIRST_MASK GENMASK(4, 0) > + > +/** > + * reclaimer_writing_to_pcmd() - Query if any enclave page associated with > + * a PCMD page is in process of being reclaimed. > + * @encl: Enclave to which PCMD page belongs > + * @start_addr: Address of enclave page using first entry within the PCMD page > + * > + * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is > + * stored. The PCMD data of a reclaimed enclave page contains enough > + * information for the processor to verify the page at the time > + * it is loaded back into the Enclave Page Cache (EPC). > + * > + * The backing storage to which enclave pages are reclaimed is laid out as > + * follows: > + * Encrypted enclave pages:SECS page:PCMD pages > + * > + * Each PCMD page contains the PCMD metadata of > + * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages. > + * > + * A PCMD page can only be truncated if it is (a) empty, and (b) not in the > + * process of getting data (and thus soon being non-empty). (b) is tested with > + * a check if an enclave page sharing the PCMD page is in the process of being > + * reclaimed. > + * > + * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it > + * intends to reclaim that enclave page - it means that the PCMD page > + * associated with that enclave page is about to get some data and thus > + * even if the PCMD page is empty, it should not be truncated. > + * > + * Context: Enclave mutex (&sgx_encl->lock) must be held. > + * Return: 1 if the reclaimer is about to write to the PCMD page > + * 0 if the reclaimer has no intention to write to the PCMD page > + */ > +static int reclaimer_writing_to_pcmd(struct sgx_encl *encl, > + unsigned long start_addr) > +{ > + int reclaimed = 0; > + int i; > + > + /* > + * PCMD_FIRST_MASK is based on number of PCMD entries within > + * PCMD page being 32. > + */ > + BUILD_BUG_ON(PCMDS_PER_PAGE != 32); > + > + for (i = 0; i < PCMDS_PER_PAGE; i++) { > + struct sgx_encl_page *entry; > + unsigned long addr; > + > + addr = start_addr + i * PAGE_SIZE; > + > + /* > + * Stop when reaching the SECS page - it does not > + * have a page_array entry and its reclaim is > + * started and completed with enclave mutex held so > + * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED > + * flag. > + */ > + if (addr == encl->base + encl->size) > + break; > + > + entry = xa_load(&encl->page_array, PFN_DOWN(addr)); > + if (!entry) > + continue; > + > + /* > + * VA page slot ID uses same bit as the flag so it is important > + * to ensure that the page is not already in backing store. > + */ > + if (entry->epc_page && > + (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) { > + reclaimed = 1; > + break; > + } > + } > + > + return reclaimed; > +} > + > /* > * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's > * follow right after the EPC data in the backing storage. In addition to the > @@ -47,6 +133,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page, > unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK; > struct sgx_encl *encl = encl_page->encl; > pgoff_t page_index, page_pcmd_off; > + unsigned long pcmd_first_page; > struct sgx_pageinfo pginfo; > struct sgx_backing b; > bool pcmd_page_empty; > @@ -58,6 +145,11 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page, > else > page_index = PFN_DOWN(encl->size); > > + /* > + * Address of enclave page using the first entry within the PCMD page. > + */ > + pcmd_first_page = PFN_PHYS(page_index & ~PCMD_FIRST_MASK) + encl->base; > + > page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index); > > ret = sgx_encl_get_backing(encl, page_index, &b); > @@ -99,7 +191,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page, > > sgx_encl_truncate_backing_page(encl, page_index); > > - if (pcmd_page_empty) > + if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page)) > sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off)); > > return ret; > -- > 2.25.1 > Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> BR, Jarkko
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 5104a428b72c..243f3bd78145 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -12,6 +12,92 @@ #include "encls.h" #include "sgx.h" +#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd)) +/* + * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to + * determine the page index associated with the first PCMD entry + * within a PCMD page. + */ +#define PCMD_FIRST_MASK GENMASK(4, 0) + +/** + * reclaimer_writing_to_pcmd() - Query if any enclave page associated with + * a PCMD page is in process of being reclaimed. + * @encl: Enclave to which PCMD page belongs + * @start_addr: Address of enclave page using first entry within the PCMD page + * + * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is + * stored. The PCMD data of a reclaimed enclave page contains enough + * information for the processor to verify the page at the time + * it is loaded back into the Enclave Page Cache (EPC). + * + * The backing storage to which enclave pages are reclaimed is laid out as + * follows: + * Encrypted enclave pages:SECS page:PCMD pages + * + * Each PCMD page contains the PCMD metadata of + * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages. + * + * A PCMD page can only be truncated if it is (a) empty, and (b) not in the + * process of getting data (and thus soon being non-empty). (b) is tested with + * a check if an enclave page sharing the PCMD page is in the process of being + * reclaimed. + * + * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it + * intends to reclaim that enclave page - it means that the PCMD page + * associated with that enclave page is about to get some data and thus + * even if the PCMD page is empty, it should not be truncated. + * + * Context: Enclave mutex (&sgx_encl->lock) must be held. + * Return: 1 if the reclaimer is about to write to the PCMD page + * 0 if the reclaimer has no intention to write to the PCMD page + */ +static int reclaimer_writing_to_pcmd(struct sgx_encl *encl, + unsigned long start_addr) +{ + int reclaimed = 0; + int i; + + /* + * PCMD_FIRST_MASK is based on number of PCMD entries within + * PCMD page being 32. + */ + BUILD_BUG_ON(PCMDS_PER_PAGE != 32); + + for (i = 0; i < PCMDS_PER_PAGE; i++) { + struct sgx_encl_page *entry; + unsigned long addr; + + addr = start_addr + i * PAGE_SIZE; + + /* + * Stop when reaching the SECS page - it does not + * have a page_array entry and its reclaim is + * started and completed with enclave mutex held so + * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED + * flag. + */ + if (addr == encl->base + encl->size) + break; + + entry = xa_load(&encl->page_array, PFN_DOWN(addr)); + if (!entry) + continue; + + /* + * VA page slot ID uses same bit as the flag so it is important + * to ensure that the page is not already in backing store. + */ + if (entry->epc_page && + (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) { + reclaimed = 1; + break; + } + } + + return reclaimed; +} + /* * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's * follow right after the EPC data in the backing storage. In addition to the @@ -47,6 +133,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page, unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK; struct sgx_encl *encl = encl_page->encl; pgoff_t page_index, page_pcmd_off; + unsigned long pcmd_first_page; struct sgx_pageinfo pginfo; struct sgx_backing b; bool pcmd_page_empty; @@ -58,6 +145,11 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page, else page_index = PFN_DOWN(encl->size); + /* + * Address of enclave page using the first entry within the PCMD page. + */ + pcmd_first_page = PFN_PHYS(page_index & ~PCMD_FIRST_MASK) + encl->base; + page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index); ret = sgx_encl_get_backing(encl, page_index, &b); @@ -99,7 +191,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page, sgx_encl_truncate_backing_page(encl, page_index); - if (pcmd_page_empty) + if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page)) sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off)); return ret;