diff mbox series

[v39,21/24] x86/vdso: Implement a vDSO for Intel SGX enclave call

Message ID 20201003045059.665934-22-jarkko.sakkinen@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series Intel SGX foundations | expand

Commit Message

Jarkko Sakkinen Oct. 3, 2020, 4:50 a.m. UTC
From: Sean Christopherson <sean.j.christopherson@intel.com>

An SGX runtime must be aware of the exceptions, which happen inside an
enclave. Introduce a vDSO call that wraps EENTER/ERESUME cycle and returns
the CPU exception back to the caller exactly when it happens.

Kernel fixups the exception information to RDI, RSI and RDX. The SGX call
vDSO handler fills this information to the user provided buffer or
alternatively trigger user provided callback at the time of the exception.

The calling convention supports providing the parameters in standard RDI
RSI, RDX, RCX, R8 and R9 registers, i.e. it is possible to declare the vDSO
as a C prototype, but other than that there is no specific support for
SystemV ABI. Storing XSAVE etc. is all responsibility of the enclave and
the associated run-time.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Cedric Xing <cedric.xing@intel.com>
Signed-off-by: Cedric Xing <cedric.xing@intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/entry/vdso/Makefile    |   2 +
 arch/x86/entry/vdso/vdso.lds.S  |   1 +
 arch/x86/entry/vdso/vsgx.S      | 157 ++++++++++++++++++++++++++++++++
 arch/x86/include/asm/enclu.h    |   9 ++
 arch/x86/include/uapi/asm/sgx.h |  98 ++++++++++++++++++++
 5 files changed, 267 insertions(+)
 create mode 100644 arch/x86/entry/vdso/vsgx.S
 create mode 100644 arch/x86/include/asm/enclu.h

Comments

Sean Christopherson Oct. 6, 2020, 2:57 a.m. UTC | #1
On Sat, Oct 03, 2020 at 07:50:56AM +0300, Jarkko Sakkinen wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> +	/* Validate that the reserved area contains only zeros. */
> +	push	%rax
> +	push	%rbx
> +	mov	$SGX_ENCLAVE_RUN_RESERVED_START, %rbx
> +1:
> +	mov	(%rcx, %rbx), %rax
> +	cmpq	$0, %rax
> +	jne	.Linvalid_input
> +
> +	add	$8, %rbx
> +	cmpq	$SGX_ENCLAVE_RUN_RESERVED_END, %rbx
> +	jne	1b
> +	pop	%rbx
> +	pop	%rax

This can more succinctly be (untested):

	movq	SGX_ENCLAVE_RUN_RESERVED_1(%rbp), %rbx	
	orq	SGX_ENCLAVE_RUN_RESERVED_2(%rbp), %rbx	
	orq	SGX_ENCLAVE_RUN_RESERVED_3(%rbp), %rbx	
	jnz	.Linvalid_input

Note, %rbx is getting clobbered anyways, no need to save/restore it.

> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> index b6ba036a9b82..3dd2df44d569 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -74,4 +74,102 @@ struct sgx_enclave_provision {
>  	__u64 attribute_fd;
>  };
>  
> +struct sgx_enclave_run;
> +
> +/**
> + * typedef sgx_enclave_user_handler_t - Exit handler function accepted by
> + *					__vdso_sgx_enter_enclave()
> + * @run:	Pointer to the caller provided struct sgx_enclave_run
> + *
> + * The register parameters contain the snapshot of their values at enclave
> + * exit
> + *
> + * Return:
> + *  0 or negative to exit vDSO
> + *  positive to re-enter enclave (must be EENTER or ERESUME leaf)
> + */
> +typedef int (*sgx_enclave_user_handler_t)(long rdi, long rsi, long rdx,
> +					  long rsp, long r8, long r9,
> +					  struct sgx_enclave_run *run);
> +
> +/**
> + * struct sgx_enclave_run - the execution context of __vdso_sgx_enter_enclave()
> + * @tcs:			TCS used to enter the enclave
> + * @user_handler:		User provided callback run on exception
> + * @user_data:			Data passed to the user handler
> + * @leaf:			The ENCLU leaf we were at (EENTER/ERESUME/EEXIT)
> + * @exception_vector:		The interrupt vector of the exception
> + * @exception_error_code:	The exception error code pulled out of the stack
> + * @exception_addr:		The address that triggered the exception
> + * @reserved			Reserved for possible future use
> + */
> +struct sgx_enclave_run {
> +	__u64 tcs;
> +	__u64 user_handler;
> +	__u64 user_data;
> +	__u32 leaf;

I am still very strongly opposed to omitting exit_reason.  It is not at all
difficult to imagine scenarios where 'leaf' alone is insufficient for the
caller or its handler to deduce why the CPU exited the enclave.  E.g. see
Jethro's request for intercepting interrupts.

I don't buy the argument that the N bytes needed for the exit_reason are at
all expensive.

> +	__u16 exception_vector;
> +	__u16 exception_error_code;
> +	__u64 exception_addr;
> +	__u8  reserved[24];

I also think it's a waste of space to bother with multiple reserved fields.
24 bytes isn't so much that it guarantees we'll never run into problems in
the future.  But I care far less about this than I do about exit_reason.

> +};
Jethro Beekman Oct. 6, 2020, 8:30 a.m. UTC | #2
On 2020-10-06 04:57, Sean Christopherson wrote:
> On Sat, Oct 03, 2020 at 07:50:56AM +0300, Jarkko Sakkinen wrote:
>> From: Sean Christopherson <sean.j.christopherson@intel.com>
>> +	/* Validate that the reserved area contains only zeros. */
>> +	push	%rax
>> +	push	%rbx
>> +	mov	$SGX_ENCLAVE_RUN_RESERVED_START, %rbx
>> +1:
>> +	mov	(%rcx, %rbx), %rax
>> +	cmpq	$0, %rax
>> +	jne	.Linvalid_input
>> +
>> +	add	$8, %rbx
>> +	cmpq	$SGX_ENCLAVE_RUN_RESERVED_END, %rbx
>> +	jne	1b
>> +	pop	%rbx
>> +	pop	%rax
> 
> This can more succinctly be (untested):
> 
> 	movq	SGX_ENCLAVE_RUN_RESERVED_1(%rbp), %rbx	
> 	orq	SGX_ENCLAVE_RUN_RESERVED_2(%rbp), %rbx	
> 	orq	SGX_ENCLAVE_RUN_RESERVED_3(%rbp), %rbx	
> 	jnz	.Linvalid_input
> 
> Note, %rbx is getting clobbered anyways, no need to save/restore it.
> 
>> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
>> index b6ba036a9b82..3dd2df44d569 100644
>> --- a/arch/x86/include/uapi/asm/sgx.h
>> +++ b/arch/x86/include/uapi/asm/sgx.h
>> @@ -74,4 +74,102 @@ struct sgx_enclave_provision {
>>  	__u64 attribute_fd;
>>  };
>>  
>> +struct sgx_enclave_run;
>> +
>> +/**
>> + * typedef sgx_enclave_user_handler_t - Exit handler function accepted by
>> + *					__vdso_sgx_enter_enclave()
>> + * @run:	Pointer to the caller provided struct sgx_enclave_run
>> + *
>> + * The register parameters contain the snapshot of their values at enclave
>> + * exit
>> + *
>> + * Return:
>> + *  0 or negative to exit vDSO
>> + *  positive to re-enter enclave (must be EENTER or ERESUME leaf)
>> + */
>> +typedef int (*sgx_enclave_user_handler_t)(long rdi, long rsi, long rdx,
>> +					  long rsp, long r8, long r9,
>> +					  struct sgx_enclave_run *run);
>> +
>> +/**
>> + * struct sgx_enclave_run - the execution context of __vdso_sgx_enter_enclave()
>> + * @tcs:			TCS used to enter the enclave
>> + * @user_handler:		User provided callback run on exception
>> + * @user_data:			Data passed to the user handler
>> + * @leaf:			The ENCLU leaf we were at (EENTER/ERESUME/EEXIT)
>> + * @exception_vector:		The interrupt vector of the exception
>> + * @exception_error_code:	The exception error code pulled out of the stack
>> + * @exception_addr:		The address that triggered the exception
>> + * @reserved			Reserved for possible future use
>> + */
>> +struct sgx_enclave_run {
>> +	__u64 tcs;
>> +	__u64 user_handler;
>> +	__u64 user_data;
>> +	__u32 leaf;
> 
> I am still very strongly opposed to omitting exit_reason.  It is not at all
> difficult to imagine scenarios where 'leaf' alone is insufficient for the
> caller or its handler to deduce why the CPU exited the enclave.  E.g. see
> Jethro's request for intercepting interrupts.

Not entirely sure what this has to do with my request, I just expect to see leaf=ERESUME in this case, I think? E.g. as you would see in EAX when calling ENCLU.

--
Jethro Beekman | Fortanix
Sean Christopherson Oct. 6, 2020, 3:15 p.m. UTC | #3
On Tue, Oct 06, 2020 at 10:30:16AM +0200, Jethro Beekman wrote:
> On 2020-10-06 04:57, Sean Christopherson wrote:
> > On Sat, Oct 03, 2020 at 07:50:56AM +0300, Jarkko Sakkinen wrote:
> >> +struct sgx_enclave_run {
> >> +  __u64 tcs;
> >> +  __u64 user_handler;
> >> +  __u64 user_data;
> >> +  __u32 leaf;
> >
> > I am still very strongly opposed to omitting exit_reason.  It is not at all
> > difficult to imagine scenarios where 'leaf' alone is insufficient for the
> > caller or its handler to deduce why the CPU exited the enclave.  E.g. see
> > Jethro's request for intercepting interrupts.
>
> Not entirely sure what this has to do with my request, I just expect to see
> leaf=ERESUME in this case, I think? E.g. as you would see in EAX when calling
> ENCLU.

But how would you differentiate from the case that an exception occured in
the enclave?  That will also transfer control with leaf=ERESUME.  If there
was a prior exception and userspace didn't zero out the struct, there would
be "valid" data in the exception fields.

An exit_reason also would allow retrofitting the exception fields into a
union, i.e. the fields are valid if and only if exit_reason is exception.
Jarkko Sakkinen Oct. 6, 2020, 3:36 p.m. UTC | #4
On Mon, Oct 05, 2020 at 07:57:05PM -0700, Sean Christopherson wrote:
> On Sat, Oct 03, 2020 at 07:50:56AM +0300, Jarkko Sakkinen wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > +	/* Validate that the reserved area contains only zeros. */
> > +	push	%rax
> > +	push	%rbx
> > +	mov	$SGX_ENCLAVE_RUN_RESERVED_START, %rbx
> > +1:
> > +	mov	(%rcx, %rbx), %rax
> > +	cmpq	$0, %rax
> > +	jne	.Linvalid_input
> > +
> > +	add	$8, %rbx
> > +	cmpq	$SGX_ENCLAVE_RUN_RESERVED_END, %rbx
> > +	jne	1b
> > +	pop	%rbx
> > +	pop	%rax
> 
> This can more succinctly be (untested):
> 
> 	movq	SGX_ENCLAVE_RUN_RESERVED_1(%rbp), %rbx	
> 	orq	SGX_ENCLAVE_RUN_RESERVED_2(%rbp), %rbx	
> 	orq	SGX_ENCLAVE_RUN_RESERVED_3(%rbp), %rbx	
> 	jnz	.Linvalid_input
> 
> Note, %rbx is getting clobbered anyways, no need to save/restore it.

Right of course, because TCS comes through the run-struct. I've created
a backlog entry for this. Thank you.

> > diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> > index b6ba036a9b82..3dd2df44d569 100644
> > --- a/arch/x86/include/uapi/asm/sgx.h
> > +++ b/arch/x86/include/uapi/asm/sgx.h
> > @@ -74,4 +74,102 @@ struct sgx_enclave_provision {
> >  	__u64 attribute_fd;
> >  };
> >  
> > +struct sgx_enclave_run;
> > +
> > +/**
> > + * typedef sgx_enclave_user_handler_t - Exit handler function accepted by
> > + *					__vdso_sgx_enter_enclave()
> > + * @run:	Pointer to the caller provided struct sgx_enclave_run
> > + *
> > + * The register parameters contain the snapshot of their values at enclave
> > + * exit
> > + *
> > + * Return:
> > + *  0 or negative to exit vDSO
> > + *  positive to re-enter enclave (must be EENTER or ERESUME leaf)
> > + */
> > +typedef int (*sgx_enclave_user_handler_t)(long rdi, long rsi, long rdx,
> > +					  long rsp, long r8, long r9,
> > +					  struct sgx_enclave_run *run);
> > +
> > +/**
> > + * struct sgx_enclave_run - the execution context of __vdso_sgx_enter_enclave()
> > + * @tcs:			TCS used to enter the enclave
> > + * @user_handler:		User provided callback run on exception
> > + * @user_data:			Data passed to the user handler
> > + * @leaf:			The ENCLU leaf we were at (EENTER/ERESUME/EEXIT)
> > + * @exception_vector:		The interrupt vector of the exception
> > + * @exception_error_code:	The exception error code pulled out of the stack
> > + * @exception_addr:		The address that triggered the exception
> > + * @reserved			Reserved for possible future use
> > + */
> > +struct sgx_enclave_run {
> > +	__u64 tcs;
> > +	__u64 user_handler;
> > +	__u64 user_data;
> > +	__u32 leaf;
> 
> I am still very strongly opposed to omitting exit_reason.  It is not at all
> difficult to imagine scenarios where 'leaf' alone is insufficient for the
> caller or its handler to deduce why the CPU exited the enclave.  E.g. see
> Jethro's request for intercepting interrupts.
> 
> I don't buy the argument that the N bytes needed for the exit_reason are at
> all expensive.

It's not used for anything.

> > +	__u16 exception_vector;
> > +	__u16 exception_error_code;
> > +	__u64 exception_addr;
> > +	__u8  reserved[24];
> 
> I also think it's a waste of space to bother with multiple reserved fields.
> 24 bytes isn't so much that it guarantees we'll never run into problems in
> the future.  But I care far less about this than I do about exit_reason.

/Jarkko
Jarkko Sakkinen Oct. 6, 2020, 3:49 p.m. UTC | #5
On Tue, Oct 06, 2020 at 10:30:16AM +0200, Jethro Beekman wrote:
> On 2020-10-06 04:57, Sean Christopherson wrote:
> > On Sat, Oct 03, 2020 at 07:50:56AM +0300, Jarkko Sakkinen wrote:
> >> From: Sean Christopherson <sean.j.christopherson@intel.com>
> >> +	/* Validate that the reserved area contains only zeros. */
> >> +	push	%rax
> >> +	push	%rbx
> >> +	mov	$SGX_ENCLAVE_RUN_RESERVED_START, %rbx
> >> +1:
> >> +	mov	(%rcx, %rbx), %rax
> >> +	cmpq	$0, %rax
> >> +	jne	.Linvalid_input
> >> +
> >> +	add	$8, %rbx
> >> +	cmpq	$SGX_ENCLAVE_RUN_RESERVED_END, %rbx
> >> +	jne	1b
> >> +	pop	%rbx
> >> +	pop	%rax
> > 
> > This can more succinctly be (untested):
> > 
> > 	movq	SGX_ENCLAVE_RUN_RESERVED_1(%rbp), %rbx	
> > 	orq	SGX_ENCLAVE_RUN_RESERVED_2(%rbp), %rbx	
> > 	orq	SGX_ENCLAVE_RUN_RESERVED_3(%rbp), %rbx	
> > 	jnz	.Linvalid_input
> > 
> > Note, %rbx is getting clobbered anyways, no need to save/restore it.
> > 
> >> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> >> index b6ba036a9b82..3dd2df44d569 100644
> >> --- a/arch/x86/include/uapi/asm/sgx.h
> >> +++ b/arch/x86/include/uapi/asm/sgx.h
> >> @@ -74,4 +74,102 @@ struct sgx_enclave_provision {
> >>  	__u64 attribute_fd;
> >>  };
> >>  
> >> +struct sgx_enclave_run;
> >> +
> >> +/**
> >> + * typedef sgx_enclave_user_handler_t - Exit handler function accepted by
> >> + *					__vdso_sgx_enter_enclave()
> >> + * @run:	Pointer to the caller provided struct sgx_enclave_run
> >> + *
> >> + * The register parameters contain the snapshot of their values at enclave
> >> + * exit
> >> + *
> >> + * Return:
> >> + *  0 or negative to exit vDSO
> >> + *  positive to re-enter enclave (must be EENTER or ERESUME leaf)
> >> + */
> >> +typedef int (*sgx_enclave_user_handler_t)(long rdi, long rsi, long rdx,
> >> +					  long rsp, long r8, long r9,
> >> +					  struct sgx_enclave_run *run);
> >> +
> >> +/**
> >> + * struct sgx_enclave_run - the execution context of __vdso_sgx_enter_enclave()
> >> + * @tcs:			TCS used to enter the enclave
> >> + * @user_handler:		User provided callback run on exception
> >> + * @user_data:			Data passed to the user handler
> >> + * @leaf:			The ENCLU leaf we were at (EENTER/ERESUME/EEXIT)
> >> + * @exception_vector:		The interrupt vector of the exception
> >> + * @exception_error_code:	The exception error code pulled out of the stack
> >> + * @exception_addr:		The address that triggered the exception
> >> + * @reserved			Reserved for possible future use
> >> + */
> >> +struct sgx_enclave_run {
> >> +	__u64 tcs;
> >> +	__u64 user_handler;
> >> +	__u64 user_data;
> >> +	__u32 leaf;
> > 
> > I am still very strongly opposed to omitting exit_reason.  It is not at all
> > difficult to imagine scenarios where 'leaf' alone is insufficient for the
> > caller or its handler to deduce why the CPU exited the enclave.  E.g. see
> > Jethro's request for intercepting interrupts.
> 
> Not entirely sure what this has to do with my request, I just expect
> to see leaf=ERESUME in this case, I think? E.g. as you would see in
> EAX when calling ENCLU.

The documentation needs to be fixed but the answer is yes.

I.e.

- Leaf will contain ERESUME on interrupt.
- Leaf will contain EEXIT on normal exit.

Maybe I should rename it as exit_leaf and rewrite the description to
improve clarity?

/Jarkko
Jarkko Sakkinen Oct. 6, 2020, 5:28 p.m. UTC | #6
On Tue, Oct 06, 2020 at 08:15:32AM -0700, Sean Christopherson wrote:
> On Tue, Oct 06, 2020 at 10:30:16AM +0200, Jethro Beekman wrote:
> > On 2020-10-06 04:57, Sean Christopherson wrote:
> > > On Sat, Oct 03, 2020 at 07:50:56AM +0300, Jarkko Sakkinen wrote:
> > >> +struct sgx_enclave_run {
> > >> +  __u64 tcs;
> > >> +  __u64 user_handler;
> > >> +  __u64 user_data;
> > >> +  __u32 leaf;
> > >
> > > I am still very strongly opposed to omitting exit_reason.  It is not at all
> > > difficult to imagine scenarios where 'leaf' alone is insufficient for the
> > > caller or its handler to deduce why the CPU exited the enclave.  E.g. see
> > > Jethro's request for intercepting interrupts.
> >
> > Not entirely sure what this has to do with my request, I just expect to see
> > leaf=ERESUME in this case, I think? E.g. as you would see in EAX when calling
> > ENCLU.
> 
> But how would you differentiate from the case that an exception occured in
> the enclave?  That will also transfer control with leaf=ERESUME.  If there
> was a prior exception and userspace didn't zero out the struct, there would
> be "valid" data in the exception fields.
> 
> An exit_reason also would allow retrofitting the exception fields into a
> union, i.e. the fields are valid if and only if exit_reason is exception.

Let's purge this a bit. Please remark where my logic goes wrong. I'm
just explaining how I've deduced the whole thing.

The information was encoded in v38 version of the vDSO was exactly this:

- On normal EEXIT, it got the value 0.
- Otherwise, it got the value 1.

The leaf, then embdded to struct sgx_exception but essentially the same
field got the value from EAX, and the value that EAX had was only
written on exception path.

Thus, I deduced that if you write $EEXIT to leaf on synchrous exit you
get the same information content, nothing gets overwritten. I.e. you
can make same conclusions as you would with those two struct fields.

/Jarkko
Jarkko Sakkinen Oct. 6, 2020, 9:39 p.m. UTC | #7
On Mon, Oct 05, 2020 at 07:57:05PM -0700, Sean Christopherson wrote:
> On Sat, Oct 03, 2020 at 07:50:56AM +0300, Jarkko Sakkinen wrote:
> > +	__u16 exception_vector;
> > +	__u16 exception_error_code;
> > +	__u64 exception_addr;
> > +	__u8  reserved[24];
> 
> I also think it's a waste of space to bother with multiple reserved fields.
> 24 bytes isn't so much that it guarantees we'll never run into problems in
> the future.  But I care far less about this than I do about exit_reason.

For me the real problem is that there has not been "no brainer" basis
for any size, so a one cache line worth of data is just something that
makes sense, because would neither make much sense to have less.

I'll throw an argument to have it a bit bigger amount of reserved space
for future use.

First, there is always some amount of unknown unknowns when it comes to
run-time structures, given the evolution of microarchitectures. So yes,
some more "state" might be needed in the future.

Secondly, this is a bigger problem for the vDSO than it is for ioctl's
because we can have only one. With ioctl's, in the absolute worst case,
we can have a second version of the same ioctl.

At least 256 bytes would be probably a good number, if we want to
increase it. The reserved space zero validation that I implemented to
this version probably does not add much to the overhead anyway.

I'm not sure why care about one struct field more than making sure that
the run-time structure can stand time.

/Jarkko
Sean Christopherson Oct. 6, 2020, 11:21 p.m. UTC | #8
On Tue, Oct 06, 2020 at 08:28:19PM +0300, Jarkko Sakkinen wrote:
> On Tue, Oct 06, 2020 at 08:15:32AM -0700, Sean Christopherson wrote:
> > On Tue, Oct 06, 2020 at 10:30:16AM +0200, Jethro Beekman wrote:
> > > On 2020-10-06 04:57, Sean Christopherson wrote:
> > > > On Sat, Oct 03, 2020 at 07:50:56AM +0300, Jarkko Sakkinen wrote:
> > > >> +struct sgx_enclave_run {
> > > >> +  __u64 tcs;
> > > >> +  __u64 user_handler;
> > > >> +  __u64 user_data;
> > > >> +  __u32 leaf;
> > > >
> > > > I am still very strongly opposed to omitting exit_reason.  It is not at all
> > > > difficult to imagine scenarios where 'leaf' alone is insufficient for the
> > > > caller or its handler to deduce why the CPU exited the enclave.  E.g. see
> > > > Jethro's request for intercepting interrupts.
> > >
> > > Not entirely sure what this has to do with my request, I just expect to see
> > > leaf=ERESUME in this case, I think? E.g. as you would see in EAX when calling
> > > ENCLU.
> > 
> > But how would you differentiate from the case that an exception occured in
> > the enclave?  That will also transfer control with leaf=ERESUME.  If there
> > was a prior exception and userspace didn't zero out the struct, there would
> > be "valid" data in the exception fields.
> > 
> > An exit_reason also would allow retrofitting the exception fields into a
> > union, i.e. the fields are valid if and only if exit_reason is exception.
> 
> Let's purge this a bit. Please remark where my logic goes wrong. I'm
> just explaining how I've deduced the whole thing.
> 
> The information was encoded in v38 version of the vDSO was exactly this:
> 
> - On normal EEXIT, it got the value 0.
> - Otherwise, it got the value 1.
> 
> The leaf, then embdded to struct sgx_exception but essentially the same
> field got the value from EAX, and the value that EAX had was only
> written on exception path.
> 
> Thus, I deduced that if you write $EEXIT to leaf on synchrous exit you
> get the same information content, nothing gets overwritten. I.e. you
> can make same conclusions as you would with those two struct fields.

And then a third flavor comes along, e.g. Jethro's request interrupt case,
and exit_reason can also return '2'.  How do you handle that with only the
leaf?
Jarkko Sakkinen Oct. 7, 2020, 12:22 a.m. UTC | #9
On Tue, Oct 06, 2020 at 04:21:29PM -0700, Sean Christopherson wrote:
> On Tue, Oct 06, 2020 at 08:28:19PM +0300, Jarkko Sakkinen wrote:
> > On Tue, Oct 06, 2020 at 08:15:32AM -0700, Sean Christopherson wrote:
> > > On Tue, Oct 06, 2020 at 10:30:16AM +0200, Jethro Beekman wrote:
> > > > On 2020-10-06 04:57, Sean Christopherson wrote:
> > > > > On Sat, Oct 03, 2020 at 07:50:56AM +0300, Jarkko Sakkinen wrote:
> > > > >> +struct sgx_enclave_run {
> > > > >> +  __u64 tcs;
> > > > >> +  __u64 user_handler;
> > > > >> +  __u64 user_data;
> > > > >> +  __u32 leaf;
> > > > >
> > > > > I am still very strongly opposed to omitting exit_reason.  It is not at all
> > > > > difficult to imagine scenarios where 'leaf' alone is insufficient for the
> > > > > caller or its handler to deduce why the CPU exited the enclave.  E.g. see
> > > > > Jethro's request for intercepting interrupts.
> > > >
> > > > Not entirely sure what this has to do with my request, I just expect to see
> > > > leaf=ERESUME in this case, I think? E.g. as you would see in EAX when calling
> > > > ENCLU.
> > > 
> > > But how would you differentiate from the case that an exception occured in
> > > the enclave?  That will also transfer control with leaf=ERESUME.  If there
> > > was a prior exception and userspace didn't zero out the struct, there would
> > > be "valid" data in the exception fields.
> > > 
> > > An exit_reason also would allow retrofitting the exception fields into a
> > > union, i.e. the fields are valid if and only if exit_reason is exception.
> > 
> > Let's purge this a bit. Please remark where my logic goes wrong. I'm
> > just explaining how I've deduced the whole thing.
> > 
> > The information was encoded in v38 version of the vDSO was exactly this:
> > 
> > - On normal EEXIT, it got the value 0.
> > - Otherwise, it got the value 1.
> > 
> > The leaf, then embdded to struct sgx_exception but essentially the same
> > field got the value from EAX, and the value that EAX had was only
> > written on exception path.
> > 
> > Thus, I deduced that if you write $EEXIT to leaf on synchrous exit you
> > get the same information content, nothing gets overwritten. I.e. you
> > can make same conclusions as you would with those two struct fields.
> 
> And then a third flavor comes along, e.g. Jethro's request interrupt case,
> and exit_reason can also return '2'.  How do you handle that with only the
> leaf?

I'm listening. How was that handled before? I saw only '0' and '1'.  Can
you bring some context on that? I did read the emails that were swapped
when the run structure was added but I'm not sure what is the exact
differentiator. Maybe I'm missing something.

/Jarkko
Jarkko Sakkinen Oct. 7, 2020, 12:23 a.m. UTC | #10
On Wed, Oct 07, 2020 at 12:39:27AM +0300, Jarkko Sakkinen wrote:
> On Mon, Oct 05, 2020 at 07:57:05PM -0700, Sean Christopherson wrote:
> > On Sat, Oct 03, 2020 at 07:50:56AM +0300, Jarkko Sakkinen wrote:
> > > +	__u16 exception_vector;
> > > +	__u16 exception_error_code;
> > > +	__u64 exception_addr;
> > > +	__u8  reserved[24];
> > 
> > I also think it's a waste of space to bother with multiple reserved fields.
> > 24 bytes isn't so much that it guarantees we'll never run into problems in
> > the future.  But I care far less about this than I do about exit_reason.
> 
> For me the real problem is that there has not been "no brainer" basis
> for any size, so a one cache line worth of data is just something that
> makes sense, because would neither make much sense to have less.
> 
> I'll throw an argument to have it a bit bigger amount of reserved space
> for future use.
> 
> First, there is always some amount of unknown unknowns when it comes to
> run-time structures, given the evolution of microarchitectures. So yes,
> some more "state" might be needed in the future.
> 
> Secondly, this is a bigger problem for the vDSO than it is for ioctl's
> because we can have only one. With ioctl's, in the absolute worst case,
> we can have a second version of the same ioctl.
> 
> At least 256 bytes would be probably a good number, if we want to
> increase it. The reserved space zero validation that I implemented to
> this version probably does not add much to the overhead anyway.
> 
> I'm not sure why care about one struct field more than making sure that
> the run-time structure can stand time.

So what I could do is to grow the reserved area and based on my response
explain this in the changelog message but I need to make sure that I got
the reasoning right behind the size.

/Jarkko
Sean Christopherson Oct. 7, 2020, 1:17 a.m. UTC | #11
On Wed, Oct 07, 2020 at 03:22:36AM +0300, Jarkko Sakkinen wrote:
> On Tue, Oct 06, 2020 at 04:21:29PM -0700, Sean Christopherson wrote:
> > On Tue, Oct 06, 2020 at 08:28:19PM +0300, Jarkko Sakkinen wrote:
> > > On Tue, Oct 06, 2020 at 08:15:32AM -0700, Sean Christopherson wrote:
> > > > On Tue, Oct 06, 2020 at 10:30:16AM +0200, Jethro Beekman wrote:
> > > > > On 2020-10-06 04:57, Sean Christopherson wrote:
> > > > > > On Sat, Oct 03, 2020 at 07:50:56AM +0300, Jarkko Sakkinen wrote:
> > > > > >> +struct sgx_enclave_run {
> > > > > >> +  __u64 tcs;
> > > > > >> +  __u64 user_handler;
> > > > > >> +  __u64 user_data;
> > > > > >> +  __u32 leaf;
> > > > > >
> > > > > > I am still very strongly opposed to omitting exit_reason.  It is not at all
> > > > > > difficult to imagine scenarios where 'leaf' alone is insufficient for the
> > > > > > caller or its handler to deduce why the CPU exited the enclave.  E.g. see
> > > > > > Jethro's request for intercepting interrupts.
> > > > >
> > > > > Not entirely sure what this has to do with my request, I just expect to see
> > > > > leaf=ERESUME in this case, I think? E.g. as you would see in EAX when calling
> > > > > ENCLU.
> > > > 
> > > > But how would you differentiate from the case that an exception occured in
> > > > the enclave?  That will also transfer control with leaf=ERESUME.  If there
> > > > was a prior exception and userspace didn't zero out the struct, there would
> > > > be "valid" data in the exception fields.
> > > > 
> > > > An exit_reason also would allow retrofitting the exception fields into a
> > > > union, i.e. the fields are valid if and only if exit_reason is exception.
> > > 
> > > Let's purge this a bit. Please remark where my logic goes wrong. I'm
> > > just explaining how I've deduced the whole thing.
> > > 
> > > The information was encoded in v38 version of the vDSO was exactly this:
> > > 
> > > - On normal EEXIT, it got the value 0.
> > > - Otherwise, it got the value 1.
> > > 
> > > The leaf, then embdded to struct sgx_exception but essentially the same
> > > field got the value from EAX, and the value that EAX had was only
> > > written on exception path.
> > > 
> > > Thus, I deduced that if you write $EEXIT to leaf on synchrous exit you
> > > get the same information content, nothing gets overwritten. I.e. you
> > > can make same conclusions as you would with those two struct fields.
> > 
> > And then a third flavor comes along, e.g. Jethro's request interrupt case,
> > and exit_reason can also return '2'.  How do you handle that with only the
> > leaf?
> 
> I'm listening. How was that handled before? I saw only '0' and '1'.  Can
> you bring some context on that? I did read the emails that were swapped
> when the run structure was added but I'm not sure what is the exact
> differentiator. Maybe I'm missing something.

https://patchwork.kernel.org/patch/11719889/
Jarkko Sakkinen Oct. 7, 2020, 3:14 a.m. UTC | #12
On Tue, Oct 06, 2020 at 06:17:38PM -0700, Sean Christopherson wrote:
> On Wed, Oct 07, 2020 at 03:22:36AM +0300, Jarkko Sakkinen wrote:
> > On Tue, Oct 06, 2020 at 04:21:29PM -0700, Sean Christopherson wrote:
> > > On Tue, Oct 06, 2020 at 08:28:19PM +0300, Jarkko Sakkinen wrote:
> > > > On Tue, Oct 06, 2020 at 08:15:32AM -0700, Sean Christopherson wrote:
> > > > > On Tue, Oct 06, 2020 at 10:30:16AM +0200, Jethro Beekman wrote:
> > > > > > On 2020-10-06 04:57, Sean Christopherson wrote:
> > > > > > > On Sat, Oct 03, 2020 at 07:50:56AM +0300, Jarkko Sakkinen wrote:
> > > > > > >> +struct sgx_enclave_run {
> > > > > > >> +  __u64 tcs;
> > > > > > >> +  __u64 user_handler;
> > > > > > >> +  __u64 user_data;
> > > > > > >> +  __u32 leaf;
> > > > > > >
> > > > > > > I am still very strongly opposed to omitting exit_reason.  It is not at all
> > > > > > > difficult to imagine scenarios where 'leaf' alone is insufficient for the
> > > > > > > caller or its handler to deduce why the CPU exited the enclave.  E.g. see
> > > > > > > Jethro's request for intercepting interrupts.
> > > > > >
> > > > > > Not entirely sure what this has to do with my request, I just expect to see
> > > > > > leaf=ERESUME in this case, I think? E.g. as you would see in EAX when calling
> > > > > > ENCLU.
> > > > > 
> > > > > But how would you differentiate from the case that an exception occured in
> > > > > the enclave?  That will also transfer control with leaf=ERESUME.  If there
> > > > > was a prior exception and userspace didn't zero out the struct, there would
> > > > > be "valid" data in the exception fields.
> > > > > 
> > > > > An exit_reason also would allow retrofitting the exception fields into a
> > > > > union, i.e. the fields are valid if and only if exit_reason is exception.
> > > > 
> > > > Let's purge this a bit. Please remark where my logic goes wrong. I'm
> > > > just explaining how I've deduced the whole thing.
> > > > 
> > > > The information was encoded in v38 version of the vDSO was exactly this:
> > > > 
> > > > - On normal EEXIT, it got the value 0.
> > > > - Otherwise, it got the value 1.
> > > > 
> > > > The leaf, then embdded to struct sgx_exception but essentially the same
> > > > field got the value from EAX, and the value that EAX had was only
> > > > written on exception path.
> > > > 
> > > > Thus, I deduced that if you write $EEXIT to leaf on synchrous exit you
> > > > get the same information content, nothing gets overwritten. I.e. you
> > > > can make same conclusions as you would with those two struct fields.
> > > 
> > > And then a third flavor comes along, e.g. Jethro's request interrupt case,
> > > and exit_reason can also return '2'.  How do you handle that with only the
> > > leaf?
> > 
> > I'm listening. How was that handled before? I saw only '0' and '1'.  Can
> > you bring some context on that? I did read the emails that were swapped
> > when the run structure was added but I'm not sure what is the exact
> > differentiator. Maybe I'm missing something.
> 
> https://patchwork.kernel.org/patch/11719889/

Thank you.

There's aboslutely nothing that is blocking adding such support for such
AEP handling in the current implementation. SGX_SYNCHRONOUS_EXIT is just
another name for EEXIT. Even if that was in place, you'd need to
separate normal and interrupt. Tristate is useless here. As far as I'm
concerned, no bottlenecks have been created.

/Jarkko
Sean Christopherson Oct. 7, 2020, 4:34 a.m. UTC | #13
On Wed, Oct 07, 2020 at 06:14:02AM +0300, Jarkko Sakkinen wrote:
> On Tue, Oct 06, 2020 at 06:17:38PM -0700, Sean Christopherson wrote:
> > On Wed, Oct 07, 2020 at 03:22:36AM +0300, Jarkko Sakkinen wrote:
> > > > And then a third flavor comes along, e.g. Jethro's request interrupt case,
> > > > and exit_reason can also return '2'.  How do you handle that with only the
> > > > leaf?
> > > 
> > > I'm listening. How was that handled before? I saw only '0' and '1'.  Can
> > > you bring some context on that? I did read the emails that were swapped
> > > when the run structure was added but I'm not sure what is the exact
> > > differentiator. Maybe I'm missing something.
> > 
> > https://patchwork.kernel.org/patch/11719889/
> 
> Thank you.
> 
> There's aboslutely nothing that is blocking adding such support for such
> AEP handling in the current implementation. SGX_SYNCHRONOUS_EXIT is just
> another name for EEXIT.

Sure.  And SGX_EXCEPTION_EXIT is just another name for EENTER|ERESUME.

> Even if that was in place, you'd need to separate normal and interrupt.
> Tristate is useless here. 

Huh?  You mean like adding SGX_INTERRUPT_EXIT and SGX_EXCEPTION_EXIT?

> As far as I'm concerned, no bottlenecks have been created.

There's no bottleneck, just an inflexible and kludgy API for userspace.

	if (run->leaf == EEXIT)
		return handle_eexit();

	if (run->leaf == EENTER || run->leaf == ERESUME)
	        return handle_exception(run->leaf);

	return -EIO;

Let's say we come up with a clever opt-in scheme that allows exception fixup
to inform the vDSO that the enclave was invalid, even on SGX1.  Now we're in
a scenario where we want to tell userspace that the enclave is lost, but
userspace assumes any exit EENTER or ERESUME is an exception.

	if (run->leaf == EEXIT)
		return handle_eexit();

	if (run->leaf == EENTER || run->leaf == ERESUME)
		return handle_invalid_enclave_or_maybe_exception();

	return -EIO;

We could add a new exit reason, but we'd still need to ensure EENTER|ERESUME
means "exception" for old userspace.  Or we could add exit_reason now and end
up with (IMO) a sane and extensible interface.

	if (run->exit_reason == SGX_ENCLAVE_INVALID)
		return handle_invalid_enclave();

	if (run->exit_reason == SGX_SYNCHRONOUS_EXIT)
		return handle_eexit();

	if (run->exit_reason == SGX_EXCEPTION)
		return handle_exception();

	return -EIO;

And maybe we get really clever and figure out a way to (deterministically)
redirect SIGALRM to the vDSO.  Then we'd want:

	if (run->exit_reason == SGX_ENCLAVE_INVALID)
		return handle_invalid_enclave();

	if (run->exit_reason == SGX_SYNCHRONOUS_EXIT)
		return handle_eexit();

	if (run->exit_reason == SGX_ALARM)
		return handle_reschedule();

	if (run->exit_reason == SGX_EXCEPTION)
		return handle_exception();

	return -EIO;

Even more hypothetical would be if Andy gets one of his wishes, and EENTER2
comes along that doesn't allow the enclave to dictate the exit point,
"returns" an error code on enclave failure, and allows the kernel to
auto-restart the enclave on IRQs/NMIs.  That (very hypothetical) scenario
fits nicely into the exit_reason handling.

I'm not arguing that any of the above is even remotely likely.  I just don't
understand why we'd want an API that at best requires heuristics in userspace
to determine why the enclave stopped running, and at worst will saddle us with
an ugly mess in the future.  All to save 4 bytes that no one cares about (they
literally cost nothing), and a single MOV in a flow that is hundreds, if not
thousands, of cycles.
Jarkko Sakkinen Oct. 7, 2020, 7:39 a.m. UTC | #14
On Tue, Oct 06, 2020 at 09:34:19PM -0700, Sean Christopherson wrote:
> On Wed, Oct 07, 2020 at 06:14:02AM +0300, Jarkko Sakkinen wrote:
> > On Tue, Oct 06, 2020 at 06:17:38PM -0700, Sean Christopherson wrote:
> > > On Wed, Oct 07, 2020 at 03:22:36AM +0300, Jarkko Sakkinen wrote:
> > > > > And then a third flavor comes along, e.g. Jethro's request interrupt case,
> > > > > and exit_reason can also return '2'.  How do you handle that with only the
> > > > > leaf?
> > > > 
> > > > I'm listening. How was that handled before? I saw only '0' and '1'.  Can
> > > > you bring some context on that? I did read the emails that were swapped
> > > > when the run structure was added but I'm not sure what is the exact
> > > > differentiator. Maybe I'm missing something.
> > > 
> > > https://patchwork.kernel.org/patch/11719889/
> > 
> > Thank you.
> > 
> > There's aboslutely nothing that is blocking adding such support for such
> > AEP handling in the current implementation. SGX_SYNCHRONOUS_EXIT is just
> > another name for EEXIT.
> 
> Sure.  And SGX_EXCEPTION_EXIT is just another name for EENTER|ERESUME.

Kind of yes.

> > Even if that was in place, you'd need to separate normal and interrupt.
> > Tristate is useless here. 
> 
> Huh?  You mean like adding SGX_INTERRUPT_EXIT and SGX_EXCEPTION_EXIT?

OK, so I'll throw something.

1. "normal" is either exception from either EENTER or ERESUME,
   or just EEXIT.
2. "interrupt" is something where you want to tailor AEP path.

> > As far as I'm concerned, no bottlenecks have been created.
> 
> There's no bottleneck, just an inflexible and kludgy API for userspace.
> 
> 	if (run->leaf == EEXIT)
> 		return handle_eexit();
> 
> 	if (run->leaf == EENTER || run->leaf == ERESUME)
> 	        return handle_exception(run->leaf);
> 
> 	return -EIO;

I think that's quite intuitive to have just one state variable.

> Let's say we come up with a clever opt-in scheme that allows exception fixup
> to inform the vDSO that the enclave was invalid, even on SGX1.  Now we're in
> a scenario where we want to tell userspace that the enclave is lost, but
> userspace assumes any exit EENTER or ERESUME is an exception.
> 
> 	if (run->leaf == EEXIT)
> 		return handle_eexit();
> 
> 	if (run->leaf == EENTER || run->leaf == ERESUME)
> 		return handle_invalid_enclave_or_maybe_exception();
> 
> 	return -EIO;

What I'd do would be to add a 'flags' field.

It could have a bit for interrupt, let's call it for the sake of an
example as SGX_ENCLAVE_RUN_FLAG_INT.

Then you'd do this if you want to exit from the vDSO instead of doing
ERESUME:

	run->flags |= SGX_ENCLAVE_RUN_FLAG_INT

The vDSO would check this bit on AEP and:

1. If it's cleared, it would ERESUME.
2. If it's set, it would clear it and exit from vDSO.

> We could add a new exit reason, but we'd still need to ensure EENTER|ERESUME
> means "exception" for old userspace.  Or we could add exit_reason now and end
> up with (IMO) a sane and extensible interface.
> 
> 	if (run->exit_reason == SGX_ENCLAVE_INVALID)
> 		return handle_invalid_enclave();
> 
> 	if (run->exit_reason == SGX_SYNCHRONOUS_EXIT)
> 		return handle_eexit();
> 
> 	if (run->exit_reason == SGX_EXCEPTION)
> 		return handle_exception();
> 
> 	return -EIO;
> 
> And maybe we get really clever and figure out a way to (deterministically)
> redirect SIGALRM to the vDSO.  Then we'd want:
> 
> 	if (run->exit_reason == SGX_ENCLAVE_INVALID)
> 		return handle_invalid_enclave();
> 
> 	if (run->exit_reason == SGX_SYNCHRONOUS_EXIT)
> 		return handle_eexit();
> 
> 	if (run->exit_reason == SGX_ALARM)
> 		return handle_reschedule();
> 
> 	if (run->exit_reason == SGX_EXCEPTION)
> 		return handle_exception();
> 
> 	return -EIO;
> 
> Even more hypothetical would be if Andy gets one of his wishes, and EENTER2
> comes along that doesn't allow the enclave to dictate the exit point,
> "returns" an error code on enclave failure, and allows the kernel to
> auto-restart the enclave on IRQs/NMIs.  That (very hypothetical) scenario
> fits nicely into the exit_reason handling.
> 
> I'm not arguing that any of the above is even remotely likely.  I just don't
> understand why we'd want an API that at best requires heuristics in userspace
> to determine why the enclave stopped running, and at worst will saddle us with
> an ugly mess in the future.  All to save 4 bytes that no one cares about (they
> literally cost nothing), and a single MOV in a flow that is hundreds, if not
> thousands, of cycles.

I don't care as much as saving bytes as defining API, which has zero
ambiguous state variables.

And since the field 'leaf' is there, and was before too, no degrees of
freedom are lost. Removing one variable does not make more of a mess.

/Jarkko
Jarkko Sakkinen Oct. 7, 2020, 8:04 a.m. UTC | #15
On Wed, Oct 07, 2020 at 10:39:23AM +0300, Jarkko Sakkinen wrote:
> On Tue, Oct 06, 2020 at 09:34:19PM -0700, Sean Christopherson wrote:
> > Even more hypothetical would be if Andy gets one of his wishes, and EENTER2
> > comes along that doesn't allow the enclave to dictate the exit point,
> > "returns" an error code on enclave failure, and allows the kernel to
> > auto-restart the enclave on IRQs/NMIs.  That (very hypothetical) scenario
> > fits nicely into the exit_reason handling.
> > 
> > I'm not arguing that any of the above is even remotely likely.  I just don't
> > understand why we'd want an API that at best requires heuristics in userspace
> > to determine why the enclave stopped running, and at worst will saddle us with
> > an ugly mess in the future.  All to save 4 bytes that no one cares about (they
> > literally cost nothing), and a single MOV in a flow that is hundreds, if not
> > thousands, of cycles.
> 
> I don't care as much as saving bytes as defining API, which has zero
> ambiguous state variables.
> 
> And since the field 'leaf' is there, and was before too, no degrees of
> freedom are lost. Removing one variable does not make more of a mess.

I think the reserved space should be expanded though.

I'd go with that 256 bytes as it was before. It's still fast to validate
and the loop construct for that is already in place. I complement that
with addition to the changelog with the reasoning that I gave earlier.
That was lacking for that detail in earlier patch set versions.

/Jarkko
Sean Christopherson Oct. 7, 2020, 3:25 p.m. UTC | #16
On Wed, Oct 07, 2020 at 10:39:23AM +0300, Jarkko Sakkinen wrote:
> On Tue, Oct 06, 2020 at 09:34:19PM -0700, Sean Christopherson wrote:
> > > Even if that was in place, you'd need to separate normal and interrupt.
> > > Tristate is useless here. 
> > 
> > Huh?  You mean like adding SGX_INTERRUPT_EXIT and SGX_EXCEPTION_EXIT?
> 
> OK, so I'll throw something.
> 
> 1. "normal" is either exception from either EENTER or ERESUME,
>    or just EEXIT.
> 2. "interrupt" is something where you want to tailor AEP path.

Manipulating the behavior of the vDSO, as in #2, would be done via an input
flag.  It may be related to the exit reason, e.g. the flag may also opt-in to
a new exit reason, but that has no bearing on whether or not a dedicated exit
reason is valuable.

> > I'm not arguing that any of the above is even remotely likely.  I just don't
> > understand why we'd want an API that at best requires heuristics in userspace
> > to determine why the enclave stopped running, and at worst will saddle us with
> > an ugly mess in the future.  All to save 4 bytes that no one cares about (they
> > literally cost nothing), and a single MOV in a flow that is hundreds, if not
> > thousands, of cycles.
> 
> I don't care as much as saving bytes as defining API, which has zero
> ambiguous state variables.

How is exit_reason ambiguous?
Jarkko Sakkinen Oct. 7, 2020, 5:08 p.m. UTC | #17
On Wed, Oct 07, 2020 at 08:25:45AM -0700, Sean Christopherson wrote:
> On Wed, Oct 07, 2020 at 10:39:23AM +0300, Jarkko Sakkinen wrote:
> > On Tue, Oct 06, 2020 at 09:34:19PM -0700, Sean Christopherson wrote:
> > > > Even if that was in place, you'd need to separate normal and interrupt.
> > > > Tristate is useless here. 
> > > 
> > > Huh?  You mean like adding SGX_INTERRUPT_EXIT and SGX_EXCEPTION_EXIT?
> > 
> > OK, so I'll throw something.
> > 
> > 1. "normal" is either exception from either EENTER or ERESUME,
> >    or just EEXIT.
> > 2. "interrupt" is something where you want to tailor AEP path.
> 
> Manipulating the behavior of the vDSO, as in #2, would be done via an input
> flag.  It may be related to the exit reason, e.g. the flag may also opt-in to
> a new exit reason, but that has no bearing on whether or not a dedicated exit
> reason is valuable.

The fact is that AEP path is not actual right now.

I'm not even interested to go further on discussing about feature that
does not exist. Perhaps if/when it exist it turns out that we want a
variable lets say 'exit_reason' to present something in that context.

I'm neither against that or for it because it is all speculative.

> > > I'm not arguing that any of the above is even remotely likely.  I just don't
> > > understand why we'd want an API that at best requires heuristics in userspace
> > > to determine why the enclave stopped running, and at worst will saddle us with
> > > an ugly mess in the future.  All to save 4 bytes that no one cares about (they
> > > literally cost nothing), and a single MOV in a flow that is hundreds, if not
> > > thousands, of cycles.
> > 
> > I don't care as much as saving bytes as defining API, which has zero
> > ambiguous state variables.
> 
> How is exit_reason ambiguous?

I rather pick the word redundant:

1. 'leaf' exist anyway.
2. It can represent all the state we need right now.
3. It does not block anything.,

I care deeply about wasting 4 bytes in a fixed size struct for
absolutely nothing.

/Jarkko
Jarkko Sakkinen Oct. 7, 2020, 5:13 p.m. UTC | #18
On Wed, Oct 07, 2020 at 08:08:32PM +0300, Jarkko Sakkinen wrote:
> On Wed, Oct 07, 2020 at 08:25:45AM -0700, Sean Christopherson wrote:
> > On Wed, Oct 07, 2020 at 10:39:23AM +0300, Jarkko Sakkinen wrote:
> > > On Tue, Oct 06, 2020 at 09:34:19PM -0700, Sean Christopherson wrote:
> > > > > Even if that was in place, you'd need to separate normal and interrupt.
> > > > > Tristate is useless here. 
> > > > 
> > > > Huh?  You mean like adding SGX_INTERRUPT_EXIT and SGX_EXCEPTION_EXIT?
> > > 
> > > OK, so I'll throw something.
> > > 
> > > 1. "normal" is either exception from either EENTER or ERESUME,
> > >    or just EEXIT.
> > > 2. "interrupt" is something where you want to tailor AEP path.
> > 
> > Manipulating the behavior of the vDSO, as in #2, would be done via an input
> > flag.  It may be related to the exit reason, e.g. the flag may also opt-in to
> > a new exit reason, but that has no bearing on whether or not a dedicated exit
> > reason is valuable.
> 
> The fact is that AEP path is not actual right now.
> 
> I'm not even interested to go further on discussing about feature that
> does not exist. Perhaps if/when it exist it turns out that we want a
> variable lets say 'exit_reason' to present something in that context.
> 
> I'm neither against that or for it because it is all speculative.
> 
> > > > I'm not arguing that any of the above is even remotely likely.  I just don't
> > > > understand why we'd want an API that at best requires heuristics in userspace
> > > > to determine why the enclave stopped running, and at worst will saddle us with
> > > > an ugly mess in the future.  All to save 4 bytes that no one cares about (they
> > > > literally cost nothing), and a single MOV in a flow that is hundreds, if not
> > > > thousands, of cycles.
> > > 
> > > I don't care as much as saving bytes as defining API, which has zero
> > > ambiguous state variables.
> > 
> > How is exit_reason ambiguous?
> 
> I rather pick the word redundant:
> 
> 1. 'leaf' exist anyway.
> 2. It can represent all the state we need right now.
> 3. It does not block anything.,
> 
> I care deeply about wasting 4 bytes in a fixed size struct for
> absolutely nothing.

And I do care about what to pick for the struct size. My remarks on
that are lost somewhere in this thread. I absoutely do not have any
interest whether 'exit_reason' in some future situation is useful
or not.

/Jarkko
Andy Lutomirski Oct. 17, 2020, 1:48 a.m. UTC | #19
On Fri, Oct 2, 2020 at 9:51 PM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> From: Sean Christopherson <sean.j.christopherson@intel.com>
>
> An SGX runtime must be aware of the exceptions, which happen inside an
> enclave. Introduce a vDSO call that wraps EENTER/ERESUME cycle and returns
> the CPU exception back to the caller exactly when it happens.
>
> Kernel fixups the exception information to RDI, RSI and RDX. The SGX call
> vDSO handler fills this information to the user provided buffer or
> alternatively trigger user provided callback at the time of the exception.
>
> The calling convention supports providing the parameters in standard RDI
> RSI, RDX, RCX, R8 and R9 registers, i.e. it is possible to declare the vDSO
> as a C prototype, but other than that there is no specific support for
> SystemV ABI. Storing XSAVE etc. is all responsibility of the enclave and
> the associated run-time.
>
> Suggested-by: Andy Lutomirski <luto@amacapital.net>
> Acked-by: Jethro Beekman <jethro@fortanix.com>
> Tested-by: Jethro Beekman <jethro@fortanix.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Co-developed-by: Cedric Xing <cedric.xing@intel.com>
> Signed-off-by: Cedric Xing <cedric.xing@intel.com>
> Co-developed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

> +SYM_FUNC_START(__vdso_sgx_enter_enclave)
> +       /* Prolog */
> +       .cfi_startproc
> +       push    %rbp
> +       .cfi_adjust_cfa_offset  8
> +       .cfi_rel_offset         %rbp, 0
> +       mov     %rsp, %rbp
> +       .cfi_def_cfa_register   %rbp
> +       push    %rbx
> +       .cfi_rel_offset         %rbx, -8

This *looks* right, but I'm not really an expert.

> +
> +       mov     %ecx, %eax
> +.Lenter_enclave:
> +       /* EENTER <= leaf <= ERESUME */
> +       cmp     $EENTER, %eax
> +       jb      .Linvalid_input
> +       cmp     $ERESUME, %eax
> +       ja      .Linvalid_input
> +
> +       mov     SGX_ENCLAVE_OFFSET_OF_RUN(%rbp), %rcx
> +
> +       /* Validate that the reserved area contains only zeros. */
> +       push    %rax
> +       push    %rbx

This could use a .cfi_register_something_or_other for rbx

> +       mov     $SGX_ENCLAVE_RUN_RESERVED_START, %rbx
> +1:
> +       mov     (%rcx, %rbx), %rax
> +       cmpq    $0, %rax
> +       jne     .Linvalid_input
> +
> +       add     $8, %rbx
> +       cmpq    $SGX_ENCLAVE_RUN_RESERVED_END, %rbx
> +       jne     1b
> +       pop     %rbx

This should undo it.

> +       pop     %rax
> +
> +       /* Load TCS and AEP */
> +       mov     SGX_ENCLAVE_RUN_TCS(%rcx), %rbx
> +       lea     .Lasync_exit_pointer(%rip), %rcx
> +
> +       /* Single ENCLU serving as both EENTER and AEP (ERESUME) */
> +.Lasync_exit_pointer:
> +.Lenclu_eenter_eresume:
> +       enclu
> +
> +       /* EEXIT jumps here unless the enclave is doing something fancy. */
> +       mov     SGX_ENCLAVE_OFFSET_OF_RUN(%rbp), %rbx
> +
> +       /* Set exit_reason. */
> +       movl    $EEXIT, SGX_ENCLAVE_RUN_LEAF(%rbx)
> +
> +       /* Invoke userspace's exit handler if one was provided. */
> +.Lhandle_exit:
> +       cmpq    $0, SGX_ENCLAVE_RUN_USER_HANDLER(%rbx)
> +       jne     .Linvoke_userspace_handler
> +
> +       /* Success, in the sense that ENCLU was attempted. */
> +       xor     %eax, %eax
> +
> +.Lout:
> +       pop     %rbx

and this should undo the .cfi_register.

> +       leave
> +       .cfi_def_cfa            %rsp, 8
> +       ret
> +
> +       /* The out-of-line code runs with the pre-leave stack frame. */
> +       .cfi_def_cfa            %rbp, 16
> +
> +.Linvalid_input:

Here rbx and rax are pushed, and I guess pop rbx and leave fixes that
up, so okay.

> +       mov     $(-EINVAL), %eax
> +       jmp     .Lout
> +
> +.Lhandle_exception:
> +       mov     SGX_ENCLAVE_OFFSET_OF_RUN(%rbp), %rbx
> +
> +       /* Set the exception info. */
> +       mov     %eax, (SGX_ENCLAVE_RUN_LEAF)(%rbx)
> +       mov     %di,  (SGX_ENCLAVE_RUN_EXCEPTION_VECTOR)(%rbx)
> +       mov     %si,  (SGX_ENCLAVE_RUN_EXCEPTION_ERROR_CODE)(%rbx)
> +       mov     %rdx, (SGX_ENCLAVE_RUN_EXCEPTION_ADDR)(%rbx)
> +       jmp     .Lhandle_exit
> +
> +.Linvoke_userspace_handler:
> +       /* Pass the untrusted RSP (at exit) to the callback via %rcx. */
> +       mov     %rsp, %rcx
> +
> +       /* Save struct sgx_enclave_exception %rbx is about to be clobbered. */
> +       mov     %rbx, %rax
> +
> +       /* Save the untrusted RSP offset in %rbx (non-volatile register). */
> +       mov     %rsp, %rbx
> +       and     $0xf, %rbx
> +
> +       /*
> +        * Align stack per x86_64 ABI. Note, %rsp needs to be 16-byte aligned
> +        * _after_ pushing the parameters on the stack, hence the bonus push.
> +        */
> +       and     $-0x10, %rsp
> +       push    %rax
> +
> +       /* Push struct sgx_enclave_exception as a param to the callback. */
> +       push    %rax
> +
> +       /* Clear RFLAGS.DF per x86_64 ABI */
> +       cld
> +
> +       /*
> +        * Load the callback pointer to %rax and lfence for LVI (load value
> +        * injection) protection before making the call.
> +        */
> +       mov     SGX_ENCLAVE_RUN_USER_HANDLER(%rax), %rax
> +       lfence
> +       call    *%rax
> +
> +       /* Undo the post-exit %rsp adjustment. */
> +       lea     0x10(%rsp, %rbx), %rsp
> +
> +       /*
> +        * If the return from callback is zero or negative, return immediately,
> +        * else re-execute ENCLU with the postive return value interpreted as
> +        * the requested ENCLU leaf.
> +        */
> +       cmp     $0, %eax
> +       jle     .Lout
> +       jmp     .Lenter_enclave
> +
> +       .cfi_endproc
> +
> +_ASM_VDSO_EXTABLE_HANDLE(.Lenclu_eenter_eresume, .Lhandle_exception)
Jarkko Sakkinen Oct. 17, 2020, 9:02 p.m. UTC | #20
On Fri, Oct 16, 2020 at 06:48:53PM -0700, Andy Lutomirski wrote:
> On Fri, Oct 2, 2020 at 9:51 PM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> >
> > An SGX runtime must be aware of the exceptions, which happen inside an
> > enclave. Introduce a vDSO call that wraps EENTER/ERESUME cycle and returns
> > the CPU exception back to the caller exactly when it happens.
> >
> > Kernel fixups the exception information to RDI, RSI and RDX. The SGX call
> > vDSO handler fills this information to the user provided buffer or
> > alternatively trigger user provided callback at the time of the exception.
> >
> > The calling convention supports providing the parameters in standard RDI
> > RSI, RDX, RCX, R8 and R9 registers, i.e. it is possible to declare the vDSO
> > as a C prototype, but other than that there is no specific support for
> > SystemV ABI. Storing XSAVE etc. is all responsibility of the enclave and
> > the associated run-time.
> >
> > Suggested-by: Andy Lutomirski <luto@amacapital.net>
> > Acked-by: Jethro Beekman <jethro@fortanix.com>
> > Tested-by: Jethro Beekman <jethro@fortanix.com>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > Co-developed-by: Cedric Xing <cedric.xing@intel.com>
> > Signed-off-by: Cedric Xing <cedric.xing@intel.com>
> > Co-developed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> 
> > +SYM_FUNC_START(__vdso_sgx_enter_enclave)
> > +       /* Prolog */
> > +       .cfi_startproc
> > +       push    %rbp
> > +       .cfi_adjust_cfa_offset  8
> > +       .cfi_rel_offset         %rbp, 0
> > +       mov     %rsp, %rbp
> > +       .cfi_def_cfa_register   %rbp
> > +       push    %rbx
> > +       .cfi_rel_offset         %rbx, -8
> 
> This *looks* right, but I'm not really an expert.

I did not change this from earlier versions.

> > +
> > +       mov     %ecx, %eax
> > +.Lenter_enclave:
> > +       /* EENTER <= leaf <= ERESUME */
> > +       cmp     $EENTER, %eax
> > +       jb      .Linvalid_input
> > +       cmp     $ERESUME, %eax
> > +       ja      .Linvalid_input
> > +
> > +       mov     SGX_ENCLAVE_OFFSET_OF_RUN(%rbp), %rcx
> > +
> > +       /* Validate that the reserved area contains only zeros. */
> > +       push    %rax
> > +       push    %rbx
> 
> This could use a .cfi_register_something_or_other for rbx

Sean pointed out that saving %rbx is not necessary here:

https://lore.kernel.org/linux-sgx/20201006025703.GG15803@linux.intel.com/

> > +       mov     $SGX_ENCLAVE_RUN_RESERVED_START, %rbx
> > +1:
> > +       mov     (%rcx, %rbx), %rax
> > +       cmpq    $0, %rax
> > +       jne     .Linvalid_input
> > +
> > +       add     $8, %rbx
> > +       cmpq    $SGX_ENCLAVE_RUN_RESERVED_END, %rbx
> > +       jne     1b
> > +       pop     %rbx
> 
> This should undo it.

Given private feedback from Sean, I'm replacing this with:

        mov     $SGX_ENCLAVE_RUN_RESERVED_START, %rbx
 1:
        cmpq    $0, (%rcx, %rbx)
        jne     .Linvalid_input

        add     $8, %rbx
        cmpq    $SGX_ENCLAVE_RUN_RESERVED_END, %rbx
        jne     1b

There was bug in the error path, %rax was not popped. I did negative
testing (testing both branches) for this but it went clean.

I guess if I fix this, that will deal with all of your comments?

> > +       pop     %rax
> > +
> > +       /* Load TCS and AEP */
> > +       mov     SGX_ENCLAVE_RUN_TCS(%rcx), %rbx
> > +       lea     .Lasync_exit_pointer(%rip), %rcx
> > +
> > +       /* Single ENCLU serving as both EENTER and AEP (ERESUME) */
> > +.Lasync_exit_pointer:
> > +.Lenclu_eenter_eresume:
> > +       enclu
> > +
> > +       /* EEXIT jumps here unless the enclave is doing something fancy. */
> > +       mov     SGX_ENCLAVE_OFFSET_OF_RUN(%rbp), %rbx
> > +
> > +       /* Set exit_reason. */
> > +       movl    $EEXIT, SGX_ENCLAVE_RUN_LEAF(%rbx)
> > +
> > +       /* Invoke userspace's exit handler if one was provided. */
> > +.Lhandle_exit:
> > +       cmpq    $0, SGX_ENCLAVE_RUN_USER_HANDLER(%rbx)
> > +       jne     .Linvoke_userspace_handler
> > +
> > +       /* Success, in the sense that ENCLU was attempted. */
> > +       xor     %eax, %eax
> > +
> > +.Lout:
> > +       pop     %rbx
> 
> and this should undo the .cfi_register.
> 
> > +       leave
> > +       .cfi_def_cfa            %rsp, 8
> > +       ret
> > +
> > +       /* The out-of-line code runs with the pre-leave stack frame. */
> > +       .cfi_def_cfa            %rbp, 16
> > +
> > +.Linvalid_input:
> 
> Here rbx and rax are pushed, and I guess pop rbx and leave fixes that
> up, so okay.
> 
> > +       mov     $(-EINVAL), %eax
> > +       jmp     .Lout
> > +
> > +.Lhandle_exception:
> > +       mov     SGX_ENCLAVE_OFFSET_OF_RUN(%rbp), %rbx
> > +
> > +       /* Set the exception info. */
> > +       mov     %eax, (SGX_ENCLAVE_RUN_LEAF)(%rbx)
> > +       mov     %di,  (SGX_ENCLAVE_RUN_EXCEPTION_VECTOR)(%rbx)
> > +       mov     %si,  (SGX_ENCLAVE_RUN_EXCEPTION_ERROR_CODE)(%rbx)
> > +       mov     %rdx, (SGX_ENCLAVE_RUN_EXCEPTION_ADDR)(%rbx)
> > +       jmp     .Lhandle_exit
> > +
> > +.Linvoke_userspace_handler:
> > +       /* Pass the untrusted RSP (at exit) to the callback via %rcx. */
> > +       mov     %rsp, %rcx
> > +
> > +       /* Save struct sgx_enclave_exception %rbx is about to be clobbered. */
> > +       mov     %rbx, %rax
> > +
> > +       /* Save the untrusted RSP offset in %rbx (non-volatile register). */
> > +       mov     %rsp, %rbx
> > +       and     $0xf, %rbx
> > +
> > +       /*
> > +        * Align stack per x86_64 ABI. Note, %rsp needs to be 16-byte aligned
> > +        * _after_ pushing the parameters on the stack, hence the bonus push.
> > +        */
> > +       and     $-0x10, %rsp
> > +       push    %rax
> > +
> > +       /* Push struct sgx_enclave_exception as a param to the callback. */
> > +       push    %rax
> > +
> > +       /* Clear RFLAGS.DF per x86_64 ABI */
> > +       cld
> > +
> > +       /*
> > +        * Load the callback pointer to %rax and lfence for LVI (load value
> > +        * injection) protection before making the call.
> > +        */
> > +       mov     SGX_ENCLAVE_RUN_USER_HANDLER(%rax), %rax
> > +       lfence
> > +       call    *%rax
> > +
> > +       /* Undo the post-exit %rsp adjustment. */
> > +       lea     0x10(%rsp, %rbx), %rsp
> > +
> > +       /*
> > +        * If the return from callback is zero or negative, return immediately,
> > +        * else re-execute ENCLU with the postive return value interpreted as
> > +        * the requested ENCLU leaf.
> > +        */
> > +       cmp     $0, %eax
> > +       jle     .Lout
> > +       jmp     .Lenter_enclave
> > +
> > +       .cfi_endproc
> > +
> > +_ASM_VDSO_EXTABLE_HANDLE(.Lenclu_eenter_eresume, .Lhandle_exception)

/Jarkko
diff mbox series

Patch

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 3f183d0b8826..27e7635e31d3 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -29,6 +29,7 @@  VDSO32-$(CONFIG_IA32_EMULATION)	:= y
 vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
 vobjs32-y := vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
 vobjs32-y += vdso32/vclock_gettime.o
+vobjs-$(VDSO64-y)		+= vsgx.o
 
 # files to link into kernel
 obj-y				+= vma.o extable.o
@@ -100,6 +101,7 @@  $(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS) $(RETPOLINE_CFLAGS
 CFLAGS_REMOVE_vclock_gettime.o = -pg
 CFLAGS_REMOVE_vdso32/vclock_gettime.o = -pg
 CFLAGS_REMOVE_vgetcpu.o = -pg
+CFLAGS_REMOVE_vsgx.o = -pg
 
 #
 # X32 processes use x32 vDSO to access 64bit kernel data.
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index 36b644e16272..4bf48462fca7 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -27,6 +27,7 @@  VERSION {
 		__vdso_time;
 		clock_getres;
 		__vdso_clock_getres;
+		__vdso_sgx_enter_enclave;
 	local: *;
 	};
 }
diff --git a/arch/x86/entry/vdso/vsgx.S b/arch/x86/entry/vdso/vsgx.S
new file mode 100644
index 000000000000..5c047e588f16
--- /dev/null
+++ b/arch/x86/entry/vdso/vsgx.S
@@ -0,0 +1,157 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/linkage.h>
+#include <asm/export.h>
+#include <asm/errno.h>
+#include <asm/enclu.h>
+
+#include "extable.h"
+
+/* Relative to %rbp. */
+#define SGX_ENCLAVE_OFFSET_OF_RUN		16
+
+/* The offsets relative to struct sgx_enclave_run. */
+#define SGX_ENCLAVE_RUN_TCS			0
+#define SGX_ENCLAVE_RUN_USER_HANDLER		8
+#define SGX_ENCLAVE_RUN_USER_DATA		16 /* unused */
+#define SGX_ENCLAVE_RUN_LEAF			24
+#define SGX_ENCLAVE_RUN_EXCEPTION_VECTOR	28
+#define SGX_ENCLAVE_RUN_EXCEPTION_ERROR_CODE	30
+#define SGX_ENCLAVE_RUN_EXCEPTION_ADDR		32
+#define SGX_ENCLAVE_RUN_RESERVED_START		40
+#define SGX_ENCLAVE_RUN_RESERVED_END		64
+
+.code64
+.section .text, "ax"
+
+SYM_FUNC_START(__vdso_sgx_enter_enclave)
+	/* Prolog */
+	.cfi_startproc
+	push	%rbp
+	.cfi_adjust_cfa_offset	8
+	.cfi_rel_offset		%rbp, 0
+	mov	%rsp, %rbp
+	.cfi_def_cfa_register	%rbp
+	push	%rbx
+	.cfi_rel_offset		%rbx, -8
+
+	mov	%ecx, %eax
+.Lenter_enclave:
+	/* EENTER <= leaf <= ERESUME */
+	cmp	$EENTER, %eax
+	jb	.Linvalid_input
+	cmp	$ERESUME, %eax
+	ja	.Linvalid_input
+
+	mov	SGX_ENCLAVE_OFFSET_OF_RUN(%rbp), %rcx
+
+	/* Validate that the reserved area contains only zeros. */
+	push	%rax
+	push	%rbx
+	mov	$SGX_ENCLAVE_RUN_RESERVED_START, %rbx
+1:
+	mov	(%rcx, %rbx), %rax
+	cmpq	$0, %rax
+	jne	.Linvalid_input
+
+	add	$8, %rbx
+	cmpq	$SGX_ENCLAVE_RUN_RESERVED_END, %rbx
+	jne	1b
+	pop	%rbx
+	pop	%rax
+
+	/* Load TCS and AEP */
+	mov	SGX_ENCLAVE_RUN_TCS(%rcx), %rbx
+	lea	.Lasync_exit_pointer(%rip), %rcx
+
+	/* Single ENCLU serving as both EENTER and AEP (ERESUME) */
+.Lasync_exit_pointer:
+.Lenclu_eenter_eresume:
+	enclu
+
+	/* EEXIT jumps here unless the enclave is doing something fancy. */
+	mov	SGX_ENCLAVE_OFFSET_OF_RUN(%rbp), %rbx
+
+	/* Set exit_reason. */
+	movl	$EEXIT, SGX_ENCLAVE_RUN_LEAF(%rbx)
+
+	/* Invoke userspace's exit handler if one was provided. */
+.Lhandle_exit:
+	cmpq	$0, SGX_ENCLAVE_RUN_USER_HANDLER(%rbx)
+	jne	.Linvoke_userspace_handler
+
+	/* Success, in the sense that ENCLU was attempted. */
+	xor	%eax, %eax
+
+.Lout:
+	pop	%rbx
+	leave
+	.cfi_def_cfa		%rsp, 8
+	ret
+
+	/* The out-of-line code runs with the pre-leave stack frame. */
+	.cfi_def_cfa		%rbp, 16
+
+.Linvalid_input:
+	mov	$(-EINVAL), %eax
+	jmp	.Lout
+
+.Lhandle_exception:
+	mov	SGX_ENCLAVE_OFFSET_OF_RUN(%rbp), %rbx
+
+	/* Set the exception info. */
+	mov	%eax, (SGX_ENCLAVE_RUN_LEAF)(%rbx)
+	mov	%di,  (SGX_ENCLAVE_RUN_EXCEPTION_VECTOR)(%rbx)
+	mov	%si,  (SGX_ENCLAVE_RUN_EXCEPTION_ERROR_CODE)(%rbx)
+	mov	%rdx, (SGX_ENCLAVE_RUN_EXCEPTION_ADDR)(%rbx)
+	jmp	.Lhandle_exit
+
+.Linvoke_userspace_handler:
+	/* Pass the untrusted RSP (at exit) to the callback via %rcx. */
+	mov	%rsp, %rcx
+
+	/* Save struct sgx_enclave_exception %rbx is about to be clobbered. */
+	mov	%rbx, %rax
+
+	/* Save the untrusted RSP offset in %rbx (non-volatile register). */
+	mov	%rsp, %rbx
+	and	$0xf, %rbx
+
+	/*
+	 * Align stack per x86_64 ABI. Note, %rsp needs to be 16-byte aligned
+	 * _after_ pushing the parameters on the stack, hence the bonus push.
+	 */
+	and	$-0x10, %rsp
+	push	%rax
+
+	/* Push struct sgx_enclave_exception as a param to the callback. */
+	push	%rax
+
+	/* Clear RFLAGS.DF per x86_64 ABI */
+	cld
+
+	/*
+	 * Load the callback pointer to %rax and lfence for LVI (load value
+	 * injection) protection before making the call.
+	 */
+	mov	SGX_ENCLAVE_RUN_USER_HANDLER(%rax), %rax
+	lfence
+	call	*%rax
+
+	/* Undo the post-exit %rsp adjustment. */
+	lea	0x10(%rsp, %rbx), %rsp
+
+	/*
+	 * If the return from callback is zero or negative, return immediately,
+	 * else re-execute ENCLU with the postive return value interpreted as
+	 * the requested ENCLU leaf.
+	 */
+	cmp	$0, %eax
+	jle	.Lout
+	jmp	.Lenter_enclave
+
+	.cfi_endproc
+
+_ASM_VDSO_EXTABLE_HANDLE(.Lenclu_eenter_eresume, .Lhandle_exception)
+
+SYM_FUNC_END(__vdso_sgx_enter_enclave)
diff --git a/arch/x86/include/asm/enclu.h b/arch/x86/include/asm/enclu.h
new file mode 100644
index 000000000000..b1314e41a744
--- /dev/null
+++ b/arch/x86/include/asm/enclu.h
@@ -0,0 +1,9 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_ENCLU_H
+#define _ASM_X86_ENCLU_H
+
+#define EENTER	0x02
+#define ERESUME	0x03
+#define EEXIT	0x04
+
+#endif /* _ASM_X86_ENCLU_H */
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index b6ba036a9b82..3dd2df44d569 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -74,4 +74,102 @@  struct sgx_enclave_provision {
 	__u64 attribute_fd;
 };
 
+struct sgx_enclave_run;
+
+/**
+ * typedef sgx_enclave_user_handler_t - Exit handler function accepted by
+ *					__vdso_sgx_enter_enclave()
+ * @run:	Pointer to the caller provided struct sgx_enclave_run
+ *
+ * The register parameters contain the snapshot of their values at enclave
+ * exit
+ *
+ * Return:
+ *  0 or negative to exit vDSO
+ *  positive to re-enter enclave (must be EENTER or ERESUME leaf)
+ */
+typedef int (*sgx_enclave_user_handler_t)(long rdi, long rsi, long rdx,
+					  long rsp, long r8, long r9,
+					  struct sgx_enclave_run *run);
+
+/**
+ * struct sgx_enclave_run - the execution context of __vdso_sgx_enter_enclave()
+ * @tcs:			TCS used to enter the enclave
+ * @user_handler:		User provided callback run on exception
+ * @user_data:			Data passed to the user handler
+ * @leaf:			The ENCLU leaf we were at (EENTER/ERESUME/EEXIT)
+ * @exception_vector:		The interrupt vector of the exception
+ * @exception_error_code:	The exception error code pulled out of the stack
+ * @exception_addr:		The address that triggered the exception
+ * @reserved			Reserved for possible future use
+ */
+struct sgx_enclave_run {
+	__u64 tcs;
+	__u64 user_handler;
+	__u64 user_data;
+	__u32 leaf;
+	__u16 exception_vector;
+	__u16 exception_error_code;
+	__u64 exception_addr;
+	__u8  reserved[24];
+};
+
+/**
+ * typedef vdso_sgx_enter_enclave_t - Prototype for __vdso_sgx_enter_enclave(),
+ *				      a vDSO function to enter an SGX enclave.
+ * @rdi:	Pass-through value for RDI
+ * @rsi:	Pass-through value for RSI
+ * @rdx:	Pass-through value for RDX
+ * @leaf:	ENCLU leaf, must be EENTER or ERESUME
+ * @r8:		Pass-through value for R8
+ * @r9:		Pass-through value for R9
+ * @run:	struct sgx_enclave_run, must be non-NULL
+ *
+ * NOTE: __vdso_sgx_enter_enclave() does not ensure full compliance with the
+ * x86-64 ABI, e.g. doesn't handle XSAVE state.  Except for non-volatile
+ * general purpose registers, EFLAGS.DF, and RSP alignment, preserving/setting
+ * state in accordance with the x86-64 ABI is the responsibility of the enclave
+ * and its runtime, i.e. __vdso_sgx_enter_enclave() cannot be called from C
+ * code without careful consideration by both the enclave and its runtime.
+ *
+ * All general purpose registers except RAX, RBX and RCX are passed as-is to
+ * the enclave.  RAX, RBX and RCX are consumed by EENTER and ERESUME and are
+ * loaded with @leaf, asynchronous exit pointer, and @run.tcs respectively.
+ *
+ * RBP and the stack are used to anchor __vdso_sgx_enter_enclave() to the
+ * pre-enclave state, e.g. to retrieve @run.exception and @run.user_handler
+ * after an enclave exit.  All other registers are available for use by the
+ * enclave and its runtime, e.g. an enclave can push additional data onto the
+ * stack (and modify RSP) to pass information to the optional user handler (see
+ * below).
+ *
+ * Most exceptions reported on ENCLU, including those that occur within the
+ * enclave, are fixed up and reported synchronously instead of being delivered
+ * via a standard signal. Debug Exceptions (#DB) and Breakpoints (#BP) are
+ * never fixed up and are always delivered via standard signals. On synchrously
+ * reported exceptions, -EFAULT is returned and details about the exception are
+ * recorded in @run.exception, the optional sgx_enclave_exception struct.
+ *
+ * If a user handler is provided, the handler will be invoked on synchronous
+ * exits from the enclave and for all synchronously reported exceptions. In
+ * latter case, @run.exception is filled prior to invoking the handler.
+ *
+ * The user handler's return value is interpreted as follows:
+ *  >0:		continue, restart __vdso_sgx_enter_enclave() with @ret as @leaf
+ *   0:		success, return @ret to the caller
+ *  <0:		error, return @ret to the caller
+ *
+ * The user handler may transfer control, e.g. via longjmp() or C++ exception,
+ * without returning to __vdso_sgx_enter_enclave().
+ *
+ * Return:
+ *  0 on success (ENCLU reached),
+ *  -EINVAL if ENCLU leaf is not allowed,
+ *  -errno for all other negative values returned by the userspace user handler
+ */
+typedef int (*vdso_sgx_enter_enclave_t)(unsigned long rdi, unsigned long rsi,
+					unsigned long rdx, unsigned int leaf,
+					unsigned long r8,  unsigned long r9,
+					struct sgx_enclave_run *run);
+
 #endif /* _UAPI_ASM_X86_SGX_H */