diff mbox series

[RFC,v4,04/26] x86/sgx: Add SGX_CHILD_PRESENT hardware error code

Message ID 3c1edb38e95843eb9bf3fcbbec6cf9bdd9b3e7b1.1612777752.git.kai.huang@intel.com (mailing list archive)
State New, archived
Headers show
Series KVM SGX virtualization support | expand

Commit Message

Huang, Kai Feb. 8, 2021, 10:54 a.m. UTC
From: Sean Christopherson <sean.j.christopherson@intel.com>

SGX driver can accurately track how enclave pages are used.  This
enables SECS to be specifically targeted and EREMOVE'd only after all
child pages have been EREMOVE'd.  This ensures that SGX driver will
never encounter SGX_CHILD_PRESENT in normal operation.

Virtual EPC is different.  The host does not track how EPC pages are
used by the guest, so it cannot guarantee EREMOVE success.  It might,
for instance, encounter a SECS with a non-zero child count.

Add SGX_CHILD_PRESENT for use by SGX virtualization to assert EREMOVE
failures are expected, but only due to SGX_CHILD_PRESENT.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v3->v4:

 - Refined the commit msg, per Dave.

v2->v3:

 - Changed from 'Enclave has child' to 'SECS has child', per Jarkko.

---
 arch/x86/kernel/cpu/sgx/arch.h | 2 ++
 1 file changed, 2 insertions(+)

Comments

Dave Hansen Feb. 9, 2021, 4:24 p.m. UTC | #1
On 2/8/21 2:54 AM, Kai Huang wrote:
...
> Add SGX_CHILD_PRESENT for use by SGX virtualization to assert EREMOVE
> failures are expected, but only due to SGX_CHILD_PRESENT.

This paragraph broke my brain when I read it.  How about:

	Add a definition of SGX_CHILD_PRESENT.  It will be used
	exclusively by the SGX virtualization driver to suppress EREMOVE
	warnings.
Sean Christopherson Feb. 9, 2021, 4:48 p.m. UTC | #2
On Tue, Feb 09, 2021, Dave Hansen wrote:
> On 2/8/21 2:54 AM, Kai Huang wrote:
> ...
> > Add SGX_CHILD_PRESENT for use by SGX virtualization to assert EREMOVE
> > failures are expected, but only due to SGX_CHILD_PRESENT.
> 
> This paragraph broke my brain when I read it.  How about:
> 
> 	Add a definition of SGX_CHILD_PRESENT.  It will be used
> 	exclusively by the SGX virtualization driver to suppress EREMOVE
> 	warnings.

Maybe worth clarifying that the driver isn't suppressing warnings willy-nilly?
And the error code isn't about suppressing warnings, it's about identifying the
expected EREMOVE failure scenario.  The patch that creates the separate helper
for doing EREMOVE without the WARN is what provides the suppression mechanism.

Something like this?

  Add a definition of SGX_CHILD_PRESENT.  It will be used exclusively by
  the SGX virtualization driver to handle recoverable EREMOVE errors when
  saniziting EPC pages after they are reclaimed from a guest.
Dave Hansen Feb. 9, 2021, 4:52 p.m. UTC | #3
On 2/9/21 8:48 AM, Sean Christopherson wrote:
> On Tue, Feb 09, 2021, Dave Hansen wrote:
>> On 2/8/21 2:54 AM, Kai Huang wrote:
>> ...
>>> Add SGX_CHILD_PRESENT for use by SGX virtualization to assert EREMOVE
>>> failures are expected, but only due to SGX_CHILD_PRESENT.
>> This paragraph broke my brain when I read it.  How about:
>>
>> 	Add a definition of SGX_CHILD_PRESENT.  It will be used
>> 	exclusively by the SGX virtualization driver to suppress EREMOVE
>> 	warnings.
> Maybe worth clarifying that the driver isn't suppressing warnings willy-nilly?
> And the error code isn't about suppressing warnings, it's about identifying the
> expected EREMOVE failure scenario.  The patch that creates the separate helper
> for doing EREMOVE without the WARN is what provides the suppression mechanism.
> 
> Something like this?
> 
>   Add a definition of SGX_CHILD_PRESENT.  It will be used exclusively by
>   the SGX virtualization driver to handle recoverable EREMOVE errors when
>   saniziting EPC pages after they are reclaimed from a guest.

Looks great to me.  One nit: to a me, "reclaim" is different than
"free".  Reclaim is a specific operation where a page is taken from one
user and reclaimed for other use.  "Free" is the more general case
(which includes reclaim) when a physical page is no longer being used
(because the user is done *or* had the page reclaimed) and may be either
used by someone else or put in a free pool.

I *think* this is actually a "free" operation, rather than a "reclaim".
 IIRC, this code gets used at munmap().
Sean Christopherson Feb. 9, 2021, 5:07 p.m. UTC | #4
On Tue, Feb 09, 2021, Dave Hansen wrote:
> On 2/9/21 8:48 AM, Sean Christopherson wrote:
> > On Tue, Feb 09, 2021, Dave Hansen wrote:
> >> On 2/8/21 2:54 AM, Kai Huang wrote:
> >> ...
> >>> Add SGX_CHILD_PRESENT for use by SGX virtualization to assert EREMOVE
> >>> failures are expected, but only due to SGX_CHILD_PRESENT.
> >> This paragraph broke my brain when I read it.  How about:
> >>
> >> 	Add a definition of SGX_CHILD_PRESENT.  It will be used
> >> 	exclusively by the SGX virtualization driver to suppress EREMOVE
> >> 	warnings.
> > Maybe worth clarifying that the driver isn't suppressing warnings willy-nilly?
> > And the error code isn't about suppressing warnings, it's about identifying the
> > expected EREMOVE failure scenario.  The patch that creates the separate helper
> > for doing EREMOVE without the WARN is what provides the suppression mechanism.
> > 
> > Something like this?
> > 
> >   Add a definition of SGX_CHILD_PRESENT.  It will be used exclusively by
> >   the SGX virtualization driver to handle recoverable EREMOVE errors when
> >   saniziting EPC pages after they are reclaimed from a guest.
> 
> Looks great to me.  One nit: to a me, "reclaim" is different than
> "free".  Reclaim is a specific operation where a page is taken from one
> user and reclaimed for other use.  "Free" is the more general case
> (which includes reclaim) when a physical page is no longer being used
> (because the user is done *or* had the page reclaimed) and may be either
> used by someone else or put in a free pool.
> 
> I *think* this is actually a "free" operation, rather than a "reclaim".
>  IIRC, this code gets used at munmap().

It does.  I used reclaim because userspace, which does the freeing from this
code's perspective, never touches the EPC pages.  The SGX_CHILD_PRESENT case is
handling the scenario where userspace has for all intents and purposed reclaimed
the EPC from a guest.  If the guest cleanly tears down its enclaves, EREMOVE
will not fail.

"free" is probably better though, the above is far from obvious and still not
guaranteed to be a true reclaim scenario.  If using "freed", drop the "from a
guest" part.
Huang, Kai Feb. 9, 2021, 9:03 p.m. UTC | #5
On Tue, 2021-02-09 at 17:07 +0000, Sean Christopherson wrote:
> On Tue, Feb 09, 2021, Dave Hansen wrote:
> > On 2/9/21 8:48 AM, Sean Christopherson wrote:
> > > On Tue, Feb 09, 2021, Dave Hansen wrote:
> > > > On 2/8/21 2:54 AM, Kai Huang wrote:
> > > > ...
> > > > > Add SGX_CHILD_PRESENT for use by SGX virtualization to assert EREMOVE
> > > > > failures are expected, but only due to SGX_CHILD_PRESENT.
> > > > This paragraph broke my brain when I read it.  How about:
> > > > 
> > > > 	Add a definition of SGX_CHILD_PRESENT.  It will be used
> > > > 	exclusively by the SGX virtualization driver to suppress EREMOVE
> > > > 	warnings.
> > > Maybe worth clarifying that the driver isn't suppressing warnings willy-nilly?
> > > And the error code isn't about suppressing warnings, it's about identifying the
> > > expected EREMOVE failure scenario.  The patch that creates the separate helper
> > > for doing EREMOVE without the WARN is what provides the suppression mechanism.
> > > 
> > > Something like this?
> > > 
> > >   Add a definition of SGX_CHILD_PRESENT.  It will be used exclusively by
> > >   the SGX virtualization driver to handle recoverable EREMOVE errors when
> > >   saniziting EPC pages after they are reclaimed from a guest.
> > 
> > Looks great to me.  One nit: to a me, "reclaim" is different than
> > "free".  Reclaim is a specific operation where a page is taken from one
> > user and reclaimed for other use.  "Free" is the more general case
> > (which includes reclaim) when a physical page is no longer being used
> > (because the user is done *or* had the page reclaimed) and may be either
> > used by someone else or put in a free pool.
> > 
> > I *think* this is actually a "free" operation, rather than a "reclaim".
> >  IIRC, this code gets used at munmap().
> 
> It does.  I used reclaim because userspace, which does the freeing from this
> code's perspective, never touches the EPC pages.  The SGX_CHILD_PRESENT case is
> handling the scenario where userspace has for all intents and purposed reclaimed
> the EPC from a guest.  If the guest cleanly tears down its enclaves, EREMOVE
> will not fail.
> 
> "free" is probably better though, the above is far from obvious and still not
> guaranteed to be a true reclaim scenario.  If using "freed", drop the "from a
> guest" part.

Thanks for feedback. I'll use below:

  Add a definition of SGX_CHILD_PRESENT.  It will be used exclusively by
  the SGX virtualization driver to handle recoverable EREMOVE errors when
  saniziting EPC pages after they are freed.
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/sgx/arch.h b/arch/x86/kernel/cpu/sgx/arch.h
index dd7602c44c72..abf99bb71fdc 100644
--- a/arch/x86/kernel/cpu/sgx/arch.h
+++ b/arch/x86/kernel/cpu/sgx/arch.h
@@ -26,12 +26,14 @@ 
  * enum sgx_return_code - The return code type for ENCLS, ENCLU and ENCLV
  * %SGX_NOT_TRACKED:		Previous ETRACK's shootdown sequence has not
  *				been completed yet.
+ * %SGX_CHILD_PRESENT		SECS has child pages present in the EPC.
  * %SGX_INVALID_EINITTOKEN:	EINITTOKEN is invalid and enclave signer's
  *				public key does not match IA32_SGXLEPUBKEYHASH.
  * %SGX_UNMASKED_EVENT:		An unmasked event, e.g. INTR, was received
  */
 enum sgx_return_code {
 	SGX_NOT_TRACKED			= 11,
+	SGX_CHILD_PRESENT		= 13,
 	SGX_INVALID_EINITTOKEN		= 16,
 	SGX_UNMASKED_EVENT		= 128,
 };