diff mbox series

[V2,4/5] x86/sgx: Fix race between reclaimer and page fault handler

Message ID d0ace4a1770ab8f4196bfeae06d0970ddb14ef01.1652131695.git.reinette.chatre@intel.com (mailing list archive)
State New, archived
Headers show
Series [V2,1/5] x86/sgx: Disconnect backing page references from dirty status | expand

Commit Message

Reinette Chatre May 9, 2022, 9:48 p.m. UTC
Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
instruction faulting with a #GP.

The WARN is encountered when the reclaimer evicts a range of
pages from the enclave when the same pages are faulted back right away.

Consider two enclave pages (ENCLAVE_A and ENCLAVE_B)
sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the
enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains
just one entry, that of ENCLAVE_B.

Scenario proceeds where ENCLAVE_A is being evicted from the enclave
while ENCLAVE_B is faulted in.

sgx_reclaim_pages() {

  ...

  /*
   * Reclaim ENCLAVE_A
   */
  mutex_lock(&encl->lock);
  /*
   * Get a reference to ENCLAVE_A's
   * shmem page where enclave page
   * encrypted data will be stored
   * as well as a reference to the
   * enclave page's PCMD data page,
   * PCMD_AB.
   * Release mutex before writing
   * any data to the shmem pages.
   */
  sgx_encl_get_backing(...);
  encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
  mutex_unlock(&encl->lock);

                                    /*
                                     * Fault ENCLAVE_B
                                     */

                                    sgx_vma_fault() {

                                      mutex_lock(&encl->lock);
                                      /*
                                       * Get reference to
                                       * ENCLAVE_B's shmem page
                                       * as well as PCMD_AB.
                                       */
                                      sgx_encl_get_backing(...)
                                     /*
                                      * Load page back into
                                      * enclave via ELDU.
                                      */
                                     /*
                                      * Release reference to
                                      * ENCLAVE_B' shmem page and
                                      * PCMD_AB.
                                      */
                                     sgx_encl_put_backing(...);
                                     /*
                                      * PCMD_AB is found empty so
                                      * it and ENCLAVE_B's shmem page
                                      * are truncated.
                                      */
                                     /* Truncate ENCLAVE_B backing page */
                                     sgx_encl_truncate_backing_page();
                                     /* Truncate PCMD_AB */
                                     sgx_encl_truncate_backing_page();

                                     mutex_unlock(&encl->lock);

                                     ...
                                     }
  mutex_lock(&encl->lock);
  encl_page->desc &=
       ~SGX_ENCL_PAGE_BEING_RECLAIMED;
  /*
  * Write encrypted contents of
  * ENCLAVE_A to ENCLAVE_A shmem
  * page and its PCMD data to
  * PCMD_AB.
  */
  sgx_encl_put_backing(...)

  /*
   * Reference to PCMD_AB is
   * dropped and it is truncated.
   * ENCLAVE_A's PCMD data is lost.
   */
  mutex_unlock(&encl->lock);
}

What happens next depends on whether it is ENCLAVE_A being faulted
in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting
with a #GP.

If ENCLAVE_A is faulted then at the time sgx_encl_get_backing() is called
a new PCMD page is allocated and providing the empty PCMD data for
ENCLAVE_A would cause ENCLS[ELDU] to #GP

If ENCLAVE_B is evicted first then a new PCMD_AB would be allocated by the
reclaimer but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction
would #GP during its checks of the PCMD value and the WARN would be
encountered.

Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time
it obtains a reference to the backing store pages of an enclave page it
is in the process of reclaiming, fix the race by only truncating the PCMD
page after ensuring that no page sharing the PCMD page is in the process
of being reclaimed.

Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
Reported-by: Haitao Huang <haitao.huang@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/sgx/encl.c | 90 +++++++++++++++++++++++++++++++++-
 1 file changed, 89 insertions(+), 1 deletion(-)

Comments

Dave Hansen May 10, 2022, 9:55 p.m. UTC | #1
On 5/9/22 14:48, Reinette Chatre wrote:
...
> +#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd))
> +/*
> + * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to
> + * determine the page index associated with the first PCMD entry
> + * within a PCMD page.
> + */
> +#define PCMD_FIRST_MASK GENMASK(4, 0)
> +
> +/**
> + * pcmd_page_in_use() - Query if any enclave page associated with a PCMD
> + *                      page is in process of being reclaimed.

The name is a bit too generic for what it does.

> + * @encl:        Enclave to which PCMD page belongs
> + * @start_addr:  Address of enclave page using first entry within the PCMD page
> + *
> + * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is
> + * stored. The PCMD data of a reclaimed enclave page contains enough
> + * information for the processor to verify the page at the time
> + * it is loaded back into the Enclave Page Cache (EPC).
> + *
> + * The backing storage to which enclave pages are reclaimed is laid out as
> + * follows:
> + * All enclave pages:SECS page:PCMD pages
> + *
> + * Each PCMD page contains the PCMD metadata of
> + * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages.
> + *
> + * A PCMD page can only be truncated if it is (a) empty, and (b) no enclave
> + * page sharing the PCMD page is in the process of being reclaimed.

Let's also point out that (b) is _because_ the page is about to become
non-empty.

> + * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
> + * intends to reclaim that enclave page - it means that the PCMD data
> + * associated with that enclave page is about to get some data and thus
> + * even if the PCMD page is empty, it should not be truncated.
> + *
> + * Return: 1 the PCMD page is in use, 0 if is not in use, -EFAULT on error
> + */
> +static int pcmd_page_in_use(struct sgx_encl *encl,
> +			    unsigned long start_addr)
> +{
> +	struct sgx_encl_page *entry;
> +	unsigned long addr;
> +	int reclaimed = 0;
> +	int i;

Can 'addr' and 'entry' be declared within the loop instead?

> +
> +	/*
> +	 * PCMD_FIRST_MASK is based on number of PCMD entries within
> +	 * PCMD page being 32.
> +	 */
> +	BUILD_BUG_ON(PCMDS_PER_PAGE != 32);
> +
> +	for (i = 0; i < PCMDS_PER_PAGE; i++) {
> +		addr = start_addr + i * PAGE_SIZE;
> +
> +		/*
> +		 * Stop when reaching the SECS page - it does not
> +		 * have a page_array entry and its reclaim is
> +		 * started and completed with enclave mutex held so
> +		 * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
> +		 * flag.
> +		 */
> +		if (addr == encl->base + encl->size)
> +			break;
> +
> +		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
> +		if (!entry)
> +			return -EFAULT;

Can xa_load() return NULL if it simply can't find an 'encl_page' at
'addr'?  In that case, there would never be a PCMD entry for the page
since it doesn't exist.  Returning -EFAULT would be a
pcmd_page_in_use()==true condition, which doesn't seem accurate.

> +		/*
> +		 * VA page slot ID uses same bit as the flag so it is important
> +		 * to ensure that the page is not already in backing store.
> +		 */

Probably a patch for another day, but isn't this a bit silly?  Are we
short of bits in ->desc or something?

> +		if (entry->epc_page &&
> +		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
> +			reclaimed = 1;
> +			break;
> +		}
> +	}
> +
> +	return reclaimed;
> +}

Could we also please add a comment about encl->lock needing to be held
over this?  Without that, there would be all kinds of trouble.
Reinette Chatre May 10, 2022, 11:01 p.m. UTC | #2
Hi Dave,

On 5/10/2022 2:55 PM, Dave Hansen wrote:
> On 5/9/22 14:48, Reinette Chatre wrote:
> ...
>> +#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd))
>> +/*
>> + * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to
>> + * determine the page index associated with the first PCMD entry
>> + * within a PCMD page.
>> + */
>> +#define PCMD_FIRST_MASK GENMASK(4, 0)
>> +
>> +/**
>> + * pcmd_page_in_use() - Query if any enclave page associated with a PCMD
>> + *                      page is in process of being reclaimed.
> 
> The name is a bit too generic for what it does.

This function is called after a PCMD page is determined to be empty and is
about to be truncated. The question this function needs to answer is, "is this
PCMD page in use?" - that is, even though it is empty, it cannot be truncated
since there is a reference to this page (specifically from the reclaimer) and
a reference is obtained with the intent to write data to the page.

For some other name options, how about:
reclaimer_has_pcmd_ref()
reclaimer_using_pcmd()

Do any of these look better to you?

> 
>> + * @encl:        Enclave to which PCMD page belongs
>> + * @start_addr:  Address of enclave page using first entry within the PCMD page
>> + *
>> + * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is
>> + * stored. The PCMD data of a reclaimed enclave page contains enough
>> + * information for the processor to verify the page at the time
>> + * it is loaded back into the Enclave Page Cache (EPC).
>> + *
>> + * The backing storage to which enclave pages are reclaimed is laid out as
>> + * follows:
>> + * All enclave pages:SECS page:PCMD pages
>> + *
>> + * Each PCMD page contains the PCMD metadata of
>> + * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages.
>> + *
>> + * A PCMD page can only be truncated if it is (a) empty, and (b) no enclave
>> + * page sharing the PCMD page is in the process of being reclaimed.
> 
> Let's also point out that (b) is _because_ the page is about to become
> non-empty.

Thank you. I will emphasize that.

> 
>> + * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
>> + * intends to reclaim that enclave page - it means that the PCMD data
>> + * associated with that enclave page is about to get some data and thus
>> + * even if the PCMD page is empty, it should not be truncated.
>> + *
>> + * Return: 1 the PCMD page is in use, 0 if is not in use, -EFAULT on error
>> + */
>> +static int pcmd_page_in_use(struct sgx_encl *encl,
>> +			    unsigned long start_addr)
>> +{
>> +	struct sgx_encl_page *entry;
>> +	unsigned long addr;
>> +	int reclaimed = 0;
>> +	int i;
> 
> Can 'addr' and 'entry' be declared within the loop instead?

Yes, will do.

> 
>> +
>> +	/*
>> +	 * PCMD_FIRST_MASK is based on number of PCMD entries within
>> +	 * PCMD page being 32.
>> +	 */
>> +	BUILD_BUG_ON(PCMDS_PER_PAGE != 32);
>> +
>> +	for (i = 0; i < PCMDS_PER_PAGE; i++) {
>> +		addr = start_addr + i * PAGE_SIZE;
>> +
>> +		/*
>> +		 * Stop when reaching the SECS page - it does not
>> +		 * have a page_array entry and its reclaim is
>> +		 * started and completed with enclave mutex held so
>> +		 * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
>> +		 * flag.
>> +		 */
>> +		if (addr == encl->base + encl->size)
>> +			break;
>> +
>> +		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
>> +		if (!entry)
>> +			return -EFAULT;
> 
> Can xa_load() return NULL if it simply can't find an 'encl_page' at
> 'addr'?  In that case, there would never be a PCMD entry for the page
> since it doesn't exist.  Returning -EFAULT would be a
> pcmd_page_in_use()==true condition, which doesn't seem accurate.

Thank you very much for catching this. This should be:

entry = xa_load(&encl->page_array, PFN_DOWN(addr));
if (!entry)
	continue;

> 
>> +		/*
>> +		 * VA page slot ID uses same bit as the flag so it is important
>> +		 * to ensure that the page is not already in backing store.
>> +		 */
> 
> Probably a patch for another day, but isn't this a bit silly?  Are we
> short of bits in ->desc or something?

I am not familiar with the history behind this. The VA slot information
do indeed take up quite a few bits though, leaving just three bits
available.

> 
>> +		if (entry->epc_page &&
>> +		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
>> +			reclaimed = 1;
>> +			break;
>> +		}
>> +	}
>> +
>> +	return reclaimed;
>> +}
> 
> Could we also please add a comment about encl->lock needing to be held
> over this?  Without that, there would be all kinds of trouble.

Will do. I will add a "Context:" portion to the function's kernel-doc.

Reinette
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 2521d64e8bcf..d1d4e8572702 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -12,6 +12,87 @@ 
 #include "encls.h"
 #include "sgx.h"
 
+#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd))
+/*
+ * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to
+ * determine the page index associated with the first PCMD entry
+ * within a PCMD page.
+ */
+#define PCMD_FIRST_MASK GENMASK(4, 0)
+
+/**
+ * pcmd_page_in_use() - Query if any enclave page associated with a PCMD
+ *                      page is in process of being reclaimed.
+ * @encl:        Enclave to which PCMD page belongs
+ * @start_addr:  Address of enclave page using first entry within the PCMD page
+ *
+ * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is
+ * stored. The PCMD data of a reclaimed enclave page contains enough
+ * information for the processor to verify the page at the time
+ * it is loaded back into the Enclave Page Cache (EPC).
+ *
+ * The backing storage to which enclave pages are reclaimed is laid out as
+ * follows:
+ * All enclave pages:SECS page:PCMD pages
+ *
+ * Each PCMD page contains the PCMD metadata of
+ * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages.
+ *
+ * A PCMD page can only be truncated if it is (a) empty, and (b) no enclave
+ * page sharing the PCMD page is in the process of being reclaimed.
+ *
+ * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
+ * intends to reclaim that enclave page - it means that the PCMD data
+ * associated with that enclave page is about to get some data and thus
+ * even if the PCMD page is empty, it should not be truncated.
+ *
+ * Return: 1 the PCMD page is in use, 0 if is not in use, -EFAULT on error
+ */
+static int pcmd_page_in_use(struct sgx_encl *encl,
+			    unsigned long start_addr)
+{
+	struct sgx_encl_page *entry;
+	unsigned long addr;
+	int reclaimed = 0;
+	int i;
+
+	/*
+	 * PCMD_FIRST_MASK is based on number of PCMD entries within
+	 * PCMD page being 32.
+	 */
+	BUILD_BUG_ON(PCMDS_PER_PAGE != 32);
+
+	for (i = 0; i < PCMDS_PER_PAGE; i++) {
+		addr = start_addr + i * PAGE_SIZE;
+
+		/*
+		 * Stop when reaching the SECS page - it does not
+		 * have a page_array entry and its reclaim is
+		 * started and completed with enclave mutex held so
+		 * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
+		 * flag.
+		 */
+		if (addr == encl->base + encl->size)
+			break;
+
+		entry = xa_load(&encl->page_array, PFN_DOWN(addr));
+		if (!entry)
+			return -EFAULT;
+
+		/*
+		 * VA page slot ID uses same bit as the flag so it is important
+		 * to ensure that the page is not already in backing store.
+		 */
+		if (entry->epc_page &&
+		    (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
+			reclaimed = 1;
+			break;
+		}
+	}
+
+	return reclaimed;
+}
+
 /*
  * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's
  * follow right after the EPC data in the backing storage. In addition to the
@@ -47,6 +128,7 @@  static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK;
 	struct sgx_encl *encl = encl_page->encl;
 	pgoff_t page_index, page_pcmd_off;
+	unsigned long pcmd_first_page;
 	struct sgx_pageinfo pginfo;
 	struct sgx_backing b;
 	bool pcmd_page_empty;
@@ -58,6 +140,12 @@  static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 	else
 		page_index = PFN_DOWN(encl->size);
 
+	/*
+	 * Address of enclave page using the first entry within the
+	 * PCMD page.
+	 */
+	pcmd_first_page = PFN_PHYS(page_index & ~PCMD_FIRST_MASK) + encl->base;
+
 	page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
 
 	ret = sgx_encl_get_backing(encl, page_index, &b);
@@ -99,7 +187,7 @@  static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
 
 	sgx_encl_truncate_backing_page(encl, page_index);
 
-	if (pcmd_page_empty)
+	if (pcmd_page_empty && !pcmd_page_in_use(encl, pcmd_first_page))
 		sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
 
 	return ret;