diff mbox series

[RFC,v6,03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()

Message ID 308bd5a53199d1bf520d488f748e11ce76156a33.1614338774.git.kai.huang@intel.com (mailing list archive)
State New, archived
Headers show
Series KVM SGX virtualization support | expand

Commit Message

Kai Huang Feb. 26, 2021, 12:14 p.m. UTC
From: Jarkko Sakkinen <jarkko@kernel.org>

EREMOVE takes a pages and removes any association between that page and
an enclave.  It must be run on a page before it can be added into
another enclave.  Currently, EREMOVE is run as part of pages being freed
into the SGX page allocator.  It is not expected to fail.

KVM does not track how guest pages are used, which means that SGX
virtualization use of EREMOVE might fail.

Break out the EREMOVE call from the SGX page allocator.  This will allow
the SGX virtualization code to use the allocator directly.  (SGX/KVM
will also introduce a more permissive EREMOVE helper).

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v5->v6:

 - Make sgx_reset_epc_page() to return int from void, and only call
   sgx_free_epc_page() when ret is successful. Because original behavior
   was: if EREMOVE failed, the page won't be put back into free EPC page pool.
   This patch doesn't intend to add functional change.

v4->v5:

 - Refined the comment of sgx_reset_epc_page(), per Dave.
 - Refined the commit msg (which I missed in v4), per Dave.
 - Refined the grammar of the comment of sgx_free_epc_page() (which I missed
   in v4), per Dave.

v3->v4:

 - Moved WARN() on SGX_EPC_PAGE_RECLAIMER_TRACKED flag to sgx_reset_epc_page(),
   since the patch to remove the WARN() in v3 was removed. Dave and Sean were
   not convinced, and Sean "tripped more than once in the past during one of
   the many rebases of the virtual EPC and EPC cgroup branches".
 - Added a comment in sgx_reset_epc_page() to explain sgx_free_epc_page() now
   won't do EREMOVE and is expecting EPC page already in clean slate, per Dave.

---
 arch/x86/kernel/cpu/sgx/encl.c | 26 +++++++++++++++++++++++---
 arch/x86/kernel/cpu/sgx/main.c | 12 ++++--------
 2 files changed, 27 insertions(+), 11 deletions(-)

Comments

Dave Hansen Feb. 26, 2021, 5:08 p.m. UTC | #1
On 2/26/21 4:14 AM, Kai Huang wrote:
> +/*
> + * Place the page in uninitialized state.  Only usable by callers that
> + * know the page is in a clean state in which EREMOVE will succeed.
> + */
> +static int sgx_reset_epc_page(struct sgx_epc_page *epc_page)
> +{
> +	int ret;
> +
> +	WARN_ON_ONCE(epc_page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
> +
> +	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
> +	WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret);
> +
> +	return ret;
> +}
> +
>  /**
>   * sgx_encl_release - Destroy an enclave instance
>   * @kref:	address of a kref inside &sgx_encl
> @@ -404,7 +421,8 @@ void sgx_encl_release(struct kref *ref)
>  			if (sgx_unmark_page_reclaimable(entry->epc_page))
>  				continue;
>  
> -			sgx_free_epc_page(entry->epc_page);
> +			if (!sgx_reset_epc_page(entry->epc_page))
> +				sgx_free_epc_page(entry->epc_page);

Won't this leak the page?

I think that's fine; the page *IS* unusable if this happens.  But, the
error message that will show up isn't super informative.  If this
happened to a bunch of EPC pages, we'd be out of EPC with nothing to
show for it.

We must give a more informative message saying that the page is leaked.
 Ideally, we'd also make this debuggable by dumping out how many of
these pages there have been somewhere.  That can wait, though, until we
have some kind of stats coming out of the code (there's nothing now).  A
comment to remind us to do this would be nice.

Anyway, these are in decent shape and only getting better.  It's time to
get some more eyeballs on them and get the RFC tag off, so assuming that
a better error message gets stuck in here:

Acked-by: Dave Hansen <dave.hansen@intel.com>
Sean Christopherson Feb. 26, 2021, 7:52 p.m. UTC | #2
On Fri, Feb 26, 2021, Dave Hansen wrote:
> On 2/26/21 4:14 AM, Kai Huang wrote:
> > @@ -404,7 +421,8 @@ void sgx_encl_release(struct kref *ref)
> >  			if (sgx_unmark_page_reclaimable(entry->epc_page))
> >  				continue;
> >  
> > -			sgx_free_epc_page(entry->epc_page);
> > +			if (!sgx_reset_epc_page(entry->epc_page))
> > +				sgx_free_epc_page(entry->epc_page);
> 
> Won't this leak the page?

Yep.

> I think that's fine; the page *IS* unusable if this happens.  But, the
> error message that will show up isn't super informative.  If this
> happened to a bunch of EPC pages, we'd be out of EPC with nothing to
> show for it.
> 
> We must give a more informative message saying that the page is leaked.
>  Ideally, we'd also make this debuggable by dumping out how many of
> these pages there have been somewhere.  That can wait, though, until we
> have some kind of stats coming out of the code (there's nothing now).  A
> comment to remind us to do this would be nice.

Eh, having debugged these several times, the WARN_ONCE in sgx_reset_epc_page()
is probably sufficient.  IIRC, when I hit this, things were either laughably
broken and every page was failing, or there was another ENCLS failure somewhere
else that provided additional info.  Not saying don't add more debug info,
rather that it's probably not a priority.

> Anyway, these are in decent shape and only getting better.  It's time to
> get some more eyeballs on them and get the RFC tag off, so assuming that
> a better error message gets stuck in here:
> 
> Acked-by: Dave Hansen <dave.hansen@intel.com>
Dave Hansen Feb. 26, 2021, 8:12 p.m. UTC | #3
On 2/26/21 11:52 AM, Sean Christopherson wrote:
>> We must give a more informative message saying that the page is leaked.
>>  Ideally, we'd also make this debuggable by dumping out how many of
>> these pages there have been somewhere.  That can wait, though, until we
>> have some kind of stats coming out of the code (there's nothing now).  A
>> comment to remind us to do this would be nice.
> Eh, having debugged these several times, the WARN_ONCE in sgx_reset_epc_page()
> is probably sufficient.  IIRC, when I hit this, things were either laughably
> broken and every page was failing, or there was another ENCLS failure somewhere
> else that provided additional info.  Not saying don't add more debug info,
> rather that it's probably not a priority.

Minimally, I just want a warning that says, "Whoops, I leaked a page".
Or EREMOVE could even say, "whoops, this *MIGHT* leak a page".

My beef is mostly that "EREMOVE failed" doesn't tell and end user squat
about what this means for their system.  At least if we say "leaked",
they have some inclination that they've got to reboot to get the page back.
Sean Christopherson Feb. 26, 2021, 10:34 p.m. UTC | #4
On Fri, Feb 26, 2021, Dave Hansen wrote:
> On 2/26/21 11:52 AM, Sean Christopherson wrote:
> >> We must give a more informative message saying that the page is leaked.
> >>  Ideally, we'd also make this debuggable by dumping out how many of
> >> these pages there have been somewhere.  That can wait, though, until we
> >> have some kind of stats coming out of the code (there's nothing now).  A
> >> comment to remind us to do this would be nice.
> > Eh, having debugged these several times, the WARN_ONCE in sgx_reset_epc_page()
> > is probably sufficient.  IIRC, when I hit this, things were either laughably
> > broken and every page was failing, or there was another ENCLS failure somewhere
> > else that provided additional info.  Not saying don't add more debug info,
> > rather that it's probably not a priority.
> 
> Minimally, I just want a warning that says, "Whoops, I leaked a page".
> Or EREMOVE could even say, "whoops, this *MIGHT* leak a page".
> 
> My beef is mostly that "EREMOVE failed" doesn't tell and end user squat
> about what this means for their system.  At least if we say "leaked",
> they have some inclination that they've got to reboot to get the page back.

Oh yeah, no argument there.
Kai Huang March 1, 2021, 6:13 a.m. UTC | #5
On Fri, 2021-02-26 at 12:12 -0800, Dave Hansen wrote:
> On 2/26/21 11:52 AM, Sean Christopherson wrote:
> > > We must give a more informative message saying that the page is leaked.
> > >  Ideally, we'd also make this debuggable by dumping out how many of
> > > these pages there have been somewhere.  That can wait, though, until we
> > > have some kind of stats coming out of the code (there's nothing now).  A
> > > comment to remind us to do this would be nice.
> > Eh, having debugged these several times, the WARN_ONCE in sgx_reset_epc_page()
> > is probably sufficient.  IIRC, when I hit this, things were either laughably
> > broken and every page was failing, or there was another ENCLS failure somewhere
> > else that provided additional info.  Not saying don't add more debug info,
> > rather that it's probably not a priority.
> 
> Minimally, I just want a warning that says, "Whoops, I leaked a page".
> Or EREMOVE could even say, "whoops, this *MIGHT* leak a page".
> 
> My beef is mostly that "EREMOVE failed" doesn't tell and end user squat
> about what this means for their system.  At least if we say "leaked",
> they have some inclination that they've got to reboot to get the page back.

Agreed that a msg to say EPC page is leaked is useful. However I found with current
sgx_reset_epc_page() I cannot find a suitable place to add:

Theoretically, it's not that right to add "EPC page is leaked", or even *might* (btw,
I don't think we should use *might* since it is vague), in to sgx_reset_epc_page(),
since whether leak or not is controlled by whether to call sgx_free_epc_page() upon
error, which is not in sgx_reset_epc_page(). And

	if (!sgx_reset_epc_page())
		sgx_free_epc_page();

is called 3 times so I don't want to add a msg for each of them.

I ended up with this solution: 

1) Rename existing sgx_free_epc_page() to sgx_encl_free_epc_page() to make it more
specific that it is used to free EPC page that is assigned to an enclave. 2) Wrap
non-EREMOVE part (putting back to free EPC pool) to sgx_free_epc_page() so it can be
used by virtual EPC.

In this way we can just put the error msg in sgx_encl_free_epc_page().

And as you said it's time to get RFC tag off, so I'll send out formal patch with
above solution, but w/o your Acked-by on this particular patch. Thanks :)
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 20a2dd5ba2b4..7a09a98fe68d 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -381,6 +381,23 @@  const struct vm_operations_struct sgx_vm_ops = {
 	.access = sgx_vma_access,
 };
 
+
+/*
+ * Place the page in uninitialized state.  Only usable by callers that
+ * know the page is in a clean state in which EREMOVE will succeed.
+ */
+static int sgx_reset_epc_page(struct sgx_epc_page *epc_page)
+{
+	int ret;
+
+	WARN_ON_ONCE(epc_page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+
+	ret = __eremove(sgx_get_epc_virt_addr(epc_page));
+	WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret);
+
+	return ret;
+}
+
 /**
  * sgx_encl_release - Destroy an enclave instance
  * @kref:	address of a kref inside &sgx_encl
@@ -404,7 +421,8 @@  void sgx_encl_release(struct kref *ref)
 			if (sgx_unmark_page_reclaimable(entry->epc_page))
 				continue;
 
-			sgx_free_epc_page(entry->epc_page);
+			if (!sgx_reset_epc_page(entry->epc_page))
+				sgx_free_epc_page(entry->epc_page);
 			encl->secs_child_cnt--;
 			entry->epc_page = NULL;
 		}
@@ -415,7 +433,8 @@  void sgx_encl_release(struct kref *ref)
 	xa_destroy(&encl->page_array);
 
 	if (!encl->secs_child_cnt && encl->secs.epc_page) {
-		sgx_free_epc_page(encl->secs.epc_page);
+		if (!sgx_reset_epc_page(encl->secs.epc_page))
+			sgx_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 	}
 
@@ -423,7 +442,8 @@  void sgx_encl_release(struct kref *ref)
 		va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
 					   list);
 		list_del(&va_page->list);
-		sgx_free_epc_page(va_page->epc_page);
+		if (!sgx_reset_epc_page(va_page->epc_page))
+			sgx_free_epc_page(va_page->epc_page);
 		kfree(va_page);
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8df81a3ed945..44fe91a5bfb3 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -598,18 +598,14 @@  struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
  * sgx_free_epc_page() - Free an EPC page
  * @page:	an EPC page
  *
- * Call EREMOVE for an EPC page and insert it back to the list of free pages.
+ * Put the EPC page back to the list of free pages. It's the caller's
+ * responsibility to make sure that the page is in uninitialized state. In other
+ * words, do EREMOVE, EWB or whatever operation is necessary before calling
+ * this function.
  */
 void sgx_free_epc_page(struct sgx_epc_page *page)
 {
 	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
-	int ret;
-
-	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
-
-	ret = __eremove(sgx_get_epc_virt_addr(page));
-	if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret))
-		return;
 
 	spin_lock(&section->lock);
 	list_add_tail(&page->list, &section->page_list);