diff mbox series

[v36,21/24] x86/vdso: Implement a vDSO for Intel SGX enclave call

Message ID 20200716135303.276442-22-jarkko.sakkinen@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series Intel SGX foundations | expand

Commit Message

Jarkko Sakkinen July 16, 2020, 1:53 p.m. UTC
From: Sean Christopherson <sean.j.christopherson@intel.com>

An SGX runtime must be aware of the exceptions, which happen inside an
enclave. Introduce a vDSO call that wraps EENTER/ERESUME cycle and returns
the CPU exception back to the caller exactly when it happens.

Kernel fixups the exception information to RDI, RSI and RDX. The SGX call
vDSO handler fills this information to the user provided buffer or
alternatively trigger user provided callback at the time of the exception.

The calling convention is custom and does not follow System V x86-64 ABI.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Jethro Beekman <jethro@fortanix.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Cedric Xing <cedric.xing@intel.com>
Signed-off-by: Cedric Xing <cedric.xing@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/entry/vdso/Makefile             |   2 +
 arch/x86/entry/vdso/vdso.lds.S           |   1 +
 arch/x86/entry/vdso/vsgx_enter_enclave.S | 131 +++++++++++++++++++++++
 arch/x86/include/asm/enclu.h             |   8 ++
 arch/x86/include/uapi/asm/sgx.h          |  98 +++++++++++++++++
 5 files changed, 240 insertions(+)
 create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S
 create mode 100644 arch/x86/include/asm/enclu.h

Comments

Nathaniel McCallum Aug. 6, 2020, 2:55 p.m. UTC | #1
In a past revision of this patch, I had requested a void *misc
parameter that could be passed through vdso_sgx_enter_enclave_t into
sgx_enclave_exit_handler_t. This request encountered some push back
and I dropped the issue. However, I'd like to revisit it or something
similar.

One way to create a generic interface to SGX is to pass a structure
that captures the relevant CPU state from the handler so that it can
be evaluated in C code before reentry. Precedent for this approach can
be found in struct kvm_run[0]. Currently, however, there is no way to
pass a pointer to such a structure directly into the handler.

This can be done implicitly by wrapping the struct
sgx_enclave_exception in another structure and then using techniques
like container_of() to find another field. However, this is made more
difficult by the fact that the sgx_enclave_exit_handler_t is not
really using the x86_64 sysv calling convention. Therefore, the
sgx_enclave_exit_handler_t MUST be written in assembly. This also
implies that we can't use techniques like container_of() and must
calculate all the offsets manually, which is tedious, error prone and
fragile.

This is further complicated by the fact that I'm using Rust (as are a
number of other consumers), which has no native offsetof support (yes,
there is a crate for it, but it requires a number of complex
strategies to defeat the compiler which aren't great) and therefore no
container_of() support.

We could design a standard struct for this (similar to struct
kvm_run). But in order to keep performance in check we'd have to
define a limited ABI surface (to avoid things like xsave) which
wouldn't have the full flexibility of the current approach. This would
allow for a kernel provided vDSO function with a normal calling
convention, however (which does have some non-trivial value). I think
this is a trade-off we should consider (perhaps making it optional?).

But at the least, allowing a pass-through void *misc would reduce the
complexity of the assembly calculations.

[0]: https://github.com/torvalds/linux/blob/master/include/uapi/linux/kvm.h#L263

On Thu, Jul 16, 2020 at 9:58 AM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> From: Sean Christopherson <sean.j.christopherson@intel.com>
>
> An SGX runtime must be aware of the exceptions, which happen inside an
> enclave. Introduce a vDSO call that wraps EENTER/ERESUME cycle and returns
> the CPU exception back to the caller exactly when it happens.
>
> Kernel fixups the exception information to RDI, RSI and RDX. The SGX call
> vDSO handler fills this information to the user provided buffer or
> alternatively trigger user provided callback at the time of the exception.
>
> The calling convention is custom and does not follow System V x86-64 ABI.
>
> Suggested-by: Andy Lutomirski <luto@amacapital.net>
> Acked-by: Jethro Beekman <jethro@fortanix.com>
> Tested-by: Jethro Beekman <jethro@fortanix.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Co-developed-by: Cedric Xing <cedric.xing@intel.com>
> Signed-off-by: Cedric Xing <cedric.xing@intel.com>
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> ---
>  arch/x86/entry/vdso/Makefile             |   2 +
>  arch/x86/entry/vdso/vdso.lds.S           |   1 +
>  arch/x86/entry/vdso/vsgx_enter_enclave.S | 131 +++++++++++++++++++++++
>  arch/x86/include/asm/enclu.h             |   8 ++
>  arch/x86/include/uapi/asm/sgx.h          |  98 +++++++++++++++++
>  5 files changed, 240 insertions(+)
>  create mode 100644 arch/x86/entry/vdso/vsgx_enter_enclave.S
>  create mode 100644 arch/x86/include/asm/enclu.h
>
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index ebe82b7aecda..f71ad5ebd0c4 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -29,6 +29,7 @@ VDSO32-$(CONFIG_IA32_EMULATION)       := y
>  vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
>  vobjs32-y := vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
>  vobjs32-y += vdso32/vclock_gettime.o
> +vobjs-$(VDSO64-y)              += vsgx_enter_enclave.o
>
>  # files to link into kernel
>  obj-y                          += vma.o extable.o
> @@ -100,6 +101,7 @@ $(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS) $(RETPOLINE_CFLAGS
>  CFLAGS_REMOVE_vclock_gettime.o = -pg
>  CFLAGS_REMOVE_vdso32/vclock_gettime.o = -pg
>  CFLAGS_REMOVE_vgetcpu.o = -pg
> +CFLAGS_REMOVE_vsgx_enter_enclave.o = -pg
>
>  #
>  # X32 processes use x32 vDSO to access 64bit kernel data.
> diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
> index 36b644e16272..4bf48462fca7 100644
> --- a/arch/x86/entry/vdso/vdso.lds.S
> +++ b/arch/x86/entry/vdso/vdso.lds.S
> @@ -27,6 +27,7 @@ VERSION {
>                 __vdso_time;
>                 clock_getres;
>                 __vdso_clock_getres;
> +               __vdso_sgx_enter_enclave;
>         local: *;
>         };
>  }
> diff --git a/arch/x86/entry/vdso/vsgx_enter_enclave.S b/arch/x86/entry/vdso/vsgx_enter_enclave.S
> new file mode 100644
> index 000000000000..be7e467e1efb
> --- /dev/null
> +++ b/arch/x86/entry/vdso/vsgx_enter_enclave.S
> @@ -0,0 +1,131 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#include <linux/linkage.h>
> +#include <asm/export.h>
> +#include <asm/errno.h>
> +#include <asm/enclu.h>
> +
> +#include "extable.h"
> +
> +#define EX_LEAF                0*8
> +#define EX_TRAPNR      0*8+4
> +#define EX_ERROR_CODE  0*8+6
> +#define EX_ADDRESS     1*8
> +
> +.code64
> +.section .text, "ax"
> +
> +SYM_FUNC_START(__vdso_sgx_enter_enclave)
> +       /* Prolog */
> +       .cfi_startproc
> +       push    %rbp
> +       .cfi_adjust_cfa_offset  8
> +       .cfi_rel_offset         %rbp, 0
> +       mov     %rsp, %rbp
> +       .cfi_def_cfa_register   %rbp
> +       push    %rbx
> +       .cfi_rel_offset         %rbx, -8
> +
> +       mov     %ecx, %eax
> +.Lenter_enclave:
> +       /* EENTER <= leaf <= ERESUME */
> +       cmp     $EENTER, %eax
> +       jb      .Linvalid_leaf
> +       cmp     $ERESUME, %eax
> +       ja      .Linvalid_leaf
> +
> +       /* Load TCS and AEP */
> +       mov     0x10(%rbp), %rbx
> +       lea     .Lasync_exit_pointer(%rip), %rcx
> +
> +       /* Single ENCLU serving as both EENTER and AEP (ERESUME) */
> +.Lasync_exit_pointer:
> +.Lenclu_eenter_eresume:
> +       enclu
> +
> +       /* EEXIT jumps here unless the enclave is doing something fancy. */
> +       xor     %eax, %eax
> +
> +       /* Invoke userspace's exit handler if one was provided. */
> +.Lhandle_exit:
> +       cmp     $0, 0x20(%rbp)
> +       jne     .Linvoke_userspace_handler
> +
> +.Lout:
> +       pop     %rbx
> +       leave
> +       .cfi_def_cfa            %rsp, 8
> +       ret
> +
> +       /* The out-of-line code runs with the pre-leave stack frame. */
> +       .cfi_def_cfa            %rbp, 16
> +
> +.Linvalid_leaf:
> +       mov     $(-EINVAL), %eax
> +       jmp     .Lout
> +
> +.Lhandle_exception:
> +       mov     0x18(%rbp), %rcx
> +       test    %rcx, %rcx
> +       je      .Lskip_exception_info
> +
> +       /* Fill optional exception info. */
> +       mov     %eax, EX_LEAF(%rcx)
> +       mov     %di,  EX_TRAPNR(%rcx)
> +       mov     %si,  EX_ERROR_CODE(%rcx)
> +       mov     %rdx, EX_ADDRESS(%rcx)
> +.Lskip_exception_info:
> +       mov     $(-EFAULT), %eax
> +       jmp     .Lhandle_exit
> +
> +.Linvoke_userspace_handler:
> +       /* Pass the untrusted RSP (at exit) to the callback via %rcx. */
> +       mov     %rsp, %rcx
> +
> +       /* Save the untrusted RSP offset in %rbx (non-volatile register). */
> +       mov     %rsp, %rbx
> +       and     $0xf, %rbx
> +
> +       /*
> +        * Align stack per x86_64 ABI. Note, %rsp needs to be 16-byte aligned
> +        * _after_ pushing the parameters on the stack, hence the bonus push.
> +        */
> +       and     $-0x10, %rsp
> +       push    %rax
> +
> +       /* Push @e, the "return" value and @tcs as params to the callback. */
> +       push    0x18(%rbp)
> +       push    %rax
> +       push    0x10(%rbp)
> +
> +       /* Clear RFLAGS.DF per x86_64 ABI */
> +       cld
> +
> +       /* Load the callback pointer to %rax and invoke it via retpoline. */
> +       mov     0x20(%rbp), %rax
> +       call    .Lretpoline
> +
> +       /* Undo the post-exit %rsp adjustment. */
> +       lea     0x20(%rsp, %rbx), %rsp
> +
> +       /*
> +        * If the return from callback is zero or negative, return immediately,
> +        * else re-execute ENCLU with the postive return value interpreted as
> +        * the requested ENCLU leaf.
> +        */
> +       cmp     $0, %eax
> +       jle     .Lout
> +       jmp     .Lenter_enclave
> +
> +.Lretpoline:
> +       call    2f
> +1:     pause
> +       lfence
> +       jmp     1b
> +2:     mov     %rax, (%rsp)
> +       ret
> +       .cfi_endproc
> +
> +_ASM_VDSO_EXTABLE_HANDLE(.Lenclu_eenter_eresume, .Lhandle_exception)
> +
> +SYM_FUNC_END(__vdso_sgx_enter_enclave)
> diff --git a/arch/x86/include/asm/enclu.h b/arch/x86/include/asm/enclu.h
> new file mode 100644
> index 000000000000..06157b3e9ede
> --- /dev/null
> +++ b/arch/x86/include/asm/enclu.h
> @@ -0,0 +1,8 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_ENCLU_H
> +#define _ASM_X86_ENCLU_H
> +
> +#define EENTER 0x02
> +#define ERESUME        0x03
> +
> +#endif /* _ASM_X86_ENCLU_H */
> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> index 57d0d30c79b3..3760e5d5dc0c 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -74,4 +74,102 @@ struct sgx_enclave_set_attribute {
>         __u64 attribute_fd;
>  };
>
> +/**
> + * struct sgx_enclave_exception - structure to report exceptions encountered in
> + *                               __vdso_sgx_enter_enclave()
> + *
> + * @leaf:      ENCLU leaf from \%eax at time of exception
> + * @trapnr:    exception trap number, a.k.a. fault vector
> + * @error_code:        exception error code
> + * @address:   exception address, e.g. CR2 on a #PF
> + * @reserved:  reserved for future use
> + */
> +struct sgx_enclave_exception {
> +       __u32 leaf;
> +       __u16 trapnr;
> +       __u16 error_code;
> +       __u64 address;
> +       __u64 reserved[2];
> +};
> +
> +/**
> + * typedef sgx_enclave_exit_handler_t - Exit handler function accepted by
> + *                                     __vdso_sgx_enter_enclave()
> + *
> + * @rdi:       RDI at the time of enclave exit
> + * @rsi:       RSI at the time of enclave exit
> + * @rdx:       RDX at the time of enclave exit
> + * @ursp:      RSP at the time of enclave exit (untrusted stack)
> + * @r8:                R8 at the time of enclave exit
> + * @r9:                R9 at the time of enclave exit
> + * @tcs:       Thread Control Structure used to enter enclave
> + * @ret:       0 on success (EEXIT), -EFAULT on an exception
> + * @e:         Pointer to struct sgx_enclave_exception (as provided by caller)
> + */
> +typedef int (*sgx_enclave_exit_handler_t)(long rdi, long rsi, long rdx,
> +                                         long ursp, long r8, long r9,
> +                                         void *tcs, int ret,
> +                                         struct sgx_enclave_exception *e);
> +
> +/**
> + * __vdso_sgx_enter_enclave() - Enter an SGX enclave
> + * @rdi:       Pass-through value for RDI
> + * @rsi:       Pass-through value for RSI
> + * @rdx:       Pass-through value for RDX
> + * @leaf:      ENCLU leaf, must be EENTER or ERESUME
> + * @r8:                Pass-through value for R8
> + * @r9:                Pass-through value for R9
> + * @tcs:       TCS, must be non-NULL
> + * @e:         Optional struct sgx_enclave_exception instance
> + * @handler:   Optional enclave exit handler
> + *
> + * NOTE: __vdso_sgx_enter_enclave() does not ensure full compliance with the
> + * x86-64 ABI, e.g. doesn't explicitly clear EFLAGS.DF after EEXIT.  Except for
> + * non-volatile general purpose registers, preserving/setting state in
> + * accordance with the x86-64 ABI is the responsibility of the enclave and its
> + * runtime, i.e. __vdso_sgx_enter_enclave() cannot be called from C code
> + * without careful consideration by both the enclave and its runtime.
> + *
> + * All general purpose registers except RAX, RBX and RCX are passed as-is to
> + * the enclave.  RAX, RBX and RCX are consumed by EENTER and ERESUME and are
> + * loaded with @leaf, asynchronous exit pointer, and @tcs respectively.
> + *
> + * RBP and the stack are used to anchor __vdso_sgx_enter_enclave() to the
> + * pre-enclave state, e.g. to retrieve @e and @handler after an enclave exit.
> + * All other registers are available for use by the enclave and its runtime,
> + * e.g. an enclave can push additional data onto the stack (and modify RSP) to
> + * pass information to the optional exit handler (see below).
> + *
> + * Most exceptions reported on ENCLU, including those that occur within the
> + * enclave, are fixed up and reported synchronously instead of being delivered
> + * via a standard signal. Debug Exceptions (#DB) and Breakpoints (#BP) are
> + * never fixed up and are always delivered via standard signals. On synchrously
> + * reported exceptions, -EFAULT is returned and details about the exception are
> + * recorded in @e, the optional sgx_enclave_exception struct.
> +
> + * If an exit handler is provided, the handler will be invoked on synchronous
> + * exits from the enclave and for all synchronously reported exceptions. In
> + * latter case, @e is filled prior to invoking the handler.
> + *
> + * The exit handler's return value is interpreted as follows:
> + *  >0:                continue, restart __vdso_sgx_enter_enclave() with @ret as @leaf
> + *   0:                success, return @ret to the caller
> + *  <0:                error, return @ret to the caller
> + *
> + * The exit handler may transfer control, e.g. via longjmp() or C++ exception,
> + * without returning to __vdso_sgx_enter_enclave().
> + *
> + * Return:
> + *  0 on success,
> + *  -EINVAL if ENCLU leaf is not allowed,
> + *  -EFAULT if an exception occurs on ENCLU or within the enclave
> + *  -errno for all other negative values returned by the userspace exit handler
> + */
> +typedef int (*vdso_sgx_enter_enclave_t)(unsigned long rdi, unsigned long rsi,
> +                                       unsigned long rdx, unsigned int leaf,
> +                                       unsigned long r8,  unsigned long r9,
> +                                       void *tcs,
> +                                       struct sgx_enclave_exception *e,
> +                                       sgx_enclave_exit_handler_t handler);
> +
>  #endif /* _UAPI_ASM_X86_SGX_H */
> --
> 2.25.1
>
Sean Christopherson Aug. 10, 2020, 10:23 p.m. UTC | #2
On Thu, Aug 06, 2020 at 10:55:43AM -0400, Nathaniel McCallum wrote:
> In a past revision of this patch, I had requested a void *misc
> parameter that could be passed through vdso_sgx_enter_enclave_t into
> sgx_enclave_exit_handler_t. This request encountered some push back
> and I dropped the issue. However, I'd like to revisit it or something
> similar.
> 
> One way to create a generic interface to SGX is to pass a structure
> that captures the relevant CPU state from the handler so that it can
> be evaluated in C code before reentry. Precedent for this approach can
> be found in struct kvm_run[0]. Currently, however, there is no way to
> pass a pointer to such a structure directly into the handler.

The context switching aspect of kvm_run isn't a great template.  kvm_run
allows the VMM to get/set a limited amount of vCPU state without having to
invoke separate ioctls(), i.e. it's it's purely an optimization.  KVM also
needs to context switch guests state regardless of the ability to get/set
state via kvm_run, whereas this vDSO case doesn't _need_ to insert itself
between the runtime and its enclave.

The flow control and exit reporting aspect of kvm_run are relevant though.
More thoughts on that part at the end.

> This can be done implicitly by wrapping the struct
> sgx_enclave_exception in another structure and then using techniques
> like container_of() to find another field. However, this is made more
> difficult by the fact that the sgx_enclave_exit_handler_t is not
> really using the x86_64 sysv calling convention. Therefore, the
> sgx_enclave_exit_handler_t MUST be written in assembly.

What bits of the x86-64 ABI require writing the handler in assembly?  There
are certainly restrictions on what the handler can do without needing an
assembly trampoline, but I was under the impression that vanilla C code is
compatible with the exit handler patch.  Is Rust more picky about calling
convention?

Side topic, the documentation for vdso_sgx_enter_enclave_t is wrong, it
states the EFLAGS.DF is not cleared before invoking the handler, but that's
a lie.

> This also implies that we can't use techniques like container_of() and must
> calculate all the offsets manually, which is tedious, error prone and
> fragile.
> 
> This is further complicated by the fact that I'm using Rust (as are a
> number of other consumers), which has no native offsetof support (yes,
> there is a crate for it, but it requires a number of complex
> strategies to defeat the compiler which aren't great) and therefore no
> container_of() support.
> 
> We could design a standard struct for this (similar to struct
> kvm_run). But in order to keep performance in check we'd have to
> define a limited ABI surface (to avoid things like xsave) which
> wouldn't have the full flexibility of the current approach. This would
> allow for a kernel provided vDSO function with a normal calling
> convention, however (which does have some non-trivial value). I think
> this is a trade-off we should consider (perhaps making it optional?).
> 
> But at the least, allowing a pass-through void *misc would reduce the
> complexity of the assembly calculations.

I'm not opposed to adding a pass-through param, it's literally one line
and an extra PUSH <mem> in the exit handler path. 

Another thought would be to wrap sgx_enclave_exception in a struct to give
room for supporting additional exit information (if such a thing ever pops
up) and to allow the caller to opt in to select behavior, e.g. Jethro's
request to invoke the exit handler on IRQ exits.  This is basically the
equivalent of "struct kvm_run", minus the vCPU/enclave state.

Such a struct could also be used to avoid using -EFAULT for the "fault in
enclave" exit path, which I believe Andy isn't a fan of, by having an
explicit "exit_reason" field with arbitrary, dedicated exit codes, and
defining "success" as making it to ENCLU, i.e. returning '0' when there is
no exit handler if ENCLU is attempted.

E.g.:

struct sgx_enter_enclave {
        __u64 tcs;
        __u64 flags;

	__u32 exit_leaf;	/* output only */
	__u32 exit_reason;

	__u64 user_handler;
	__u64 user_data;

        union {
                struct sgx_enclave_exception {
                        __u16 trapnr;
                        __u16 error_code;
                        __u32 reserved32;
                        __u64 address;
                };

                __u8 pad[256];  /* 100% arbitrary */
        };
}

typedef int (*vdso_sgx_enter_enclave_t)(unsigned long rdi, unsigned long rsi,
                                        unsigned long rdx, unsigned int leaf,
                                        unsigned long r8,  unsigned long r9,
                                        struct sgx_enter_enclave *e);


The exit handler could then be:

typedef int (*sgx_enclave_exit_handler_t)(long rdi, long rsi, long rdx,
                                          long ursp, long r8, long r9,
                                          struct sgx_enter_enclave *e);

or if Rust doesn't like casting user_data:

typedef int (*sgx_enclave_exit_handler_t)(long rdi, long rsi, long rdx,
                                          long ursp, long r8, long r9,
                                          struct sgx_enter_enclave *e
					  void *user_data);
Andy Lutomirski Aug. 10, 2020, 11:08 p.m. UTC | #3
On Thu, Aug 6, 2020 at 7:55 AM Nathaniel McCallum <npmccallum@redhat.com> wrote:
>
> In a past revision of this patch, I had requested a void *misc
> parameter that could be passed through vdso_sgx_enter_enclave_t into
> sgx_enclave_exit_handler_t. This request encountered some push back
> and I dropped the issue. However, I'd like to revisit it or something
> similar.

Why do you need an exit handler at all?  IIRC way back when I
suggested that we simply not support it at all.  If you want to
call__vdso_sgx_enter_enclave() in a loop, call it in a loop.  If you
want to wrap it intelligently in Rust, you don't want a callback
anyway -- that forces you have an FFI (or non-Rust, anyway) frame on
the stack, which interacts poorly with panic handling and prevents you
from using await in your Rust callback handler.  If, on the other
hand, you just call __vdso_sg_enter_enclave() in a loop, all these
problems go away and, if you really want, you can pass in a callback
in Rust and call the callback from Rust.

What am I missing?  I still don't really understand why we are
supporting this mechanism at all.  Just the asm code to invoke the
callback seems to be about half of the entire function.
Sean Christopherson Aug. 10, 2020, 11:48 p.m. UTC | #4
On Mon, Aug 10, 2020 at 04:08:46PM -0700, Andy Lutomirski wrote:
> What am I missing?  I still don't really understand why we are
> supporting this mechanism at all.  Just the asm code to invoke the
> callback seems to be about half of the entire function.

Because the Intel SDK (and other SDKs?) wants to use the host stack to pass
parameters out of the enclave.
Andy Lutomirski Aug. 11, 2020, 12:52 a.m. UTC | #5
> On Aug 10, 2020, at 4:48 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Mon, Aug 10, 2020 at 04:08:46PM -0700, Andy Lutomirski wrote:
>> What am I missing?  I still don't really understand why we are
>> supporting this mechanism at all.  Just the asm code to invoke the
>> callback seems to be about half of the entire function.
> 
> Because the Intel SDK (and other SDKs?) wants to use the host stack to pass
> parameters out of the enclave.

Ugh, right.  I forgot about that particular abomination.

I suppose that passing a context pointer would be reasonable.
Jethro Beekman Aug. 11, 2020, 7:16 a.m. UTC | #6
On 2020-08-11 00:23, Sean Christopherson wrote:
> Another thought would be to wrap sgx_enclave_exception in a struct to give
> room for supporting additional exit information (if such a thing ever pops
> up) and to allow the caller to opt in to select behavior, e.g. Jethro's
> request to invoke the exit handler on IRQ exits.  This is basically the
> equivalent of "struct kvm_run", minus the vCPU/enclave state.

Actually, the flag I need is “return from the vdso on IRQ exits” (See Andy's email about preferring returns over callbacks). But maybe the flag should be “interpret IRQ exit as a normal exit” and let it be handled the same as any other exit based on whether an exit handler fnptr was passed or not.

--
Jethro Beekman | Fortanix
Sean Christopherson Aug. 11, 2020, 2:54 p.m. UTC | #7
On Tue, Aug 11, 2020 at 09:16:28AM +0200, Jethro Beekman wrote:
> On 2020-08-11 00:23, Sean Christopherson wrote:
> > Another thought would be to wrap sgx_enclave_exception in a struct to give
> > room for supporting additional exit information (if such a thing ever pops
> > up) and to allow the caller to opt in to select behavior, e.g. Jethro's
> > request to invoke the exit handler on IRQ exits.  This is basically the
> > equivalent of "struct kvm_run", minus the vCPU/enclave state.
> 
> Actually, the flag I need is “return from the vdso on IRQ exits” (See Andy's
> email about preferring returns over callbacks). But maybe the flag should be
> “interpret IRQ exit as a normal exit” and let it be handled the same as any
> other exit based on whether an exit handler fnptr was passed or not.

Ya, slip of the tongue, the behavior would apply to both paths.
Andy Lutomirski Aug. 11, 2020, 3:16 p.m. UTC | #8
> On Aug 10, 2020, at 5:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> 
> 
>>> On Aug 10, 2020, at 4:48 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>>> 
>>> On Mon, Aug 10, 2020 at 04:08:46PM -0700, Andy Lutomirski wrote:
>>> What am I missing?  I still don't really understand why we are
>>> supporting this mechanism at all.  Just the asm code to invoke the
>>> callback seems to be about half of the entire function.
>> 
>> Because the Intel SDK (and other SDKs?) wants to use the host stack to pass
>> parameters out of the enclave.
> 
> Ugh, right.  I forgot about that particular abomination.
> 
> I suppose that passing a context pointer would be reasonable.

The alternative would be to pass in a parameter that gets put in RSP before entering the enclave. The idea is that the untrusted runtime would allocate a couple pages with guard pages at either end, and enclaves using the regrettable arguments-on-the-stack scheme would end up using the alternative stack.

At the end of the day, none of this really matters too much. Languages that can do inline asm but can’t do container_of() can get fixed or use workarounds.
Sean Christopherson Aug. 13, 2020, 7:38 p.m. UTC | #9
On Tue, Aug 11, 2020 at 08:16:54AM -0700, Andy Lutomirski wrote:
> 
> > On Aug 10, 2020, at 5:52 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > 
> > 
> >>> On Aug 10, 2020, at 4:48 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> >>> 
> >>> On Mon, Aug 10, 2020 at 04:08:46PM -0700, Andy Lutomirski wrote:
> >>> What am I missing?  I still don't really understand why we are
> >>> supporting this mechanism at all.  Just the asm code to invoke the
> >>> callback seems to be about half of the entire function.
> >> 
> >> Because the Intel SDK (and other SDKs?) wants to use the host stack to pass
> >> parameters out of the enclave.
> > 
> > Ugh, right.  I forgot about that particular abomination.
> > 
> > I suppose that passing a context pointer would be reasonable.
> 
> The alternative would be to pass in a parameter that gets put in RSP before
> entering the enclave. The idea is that the untrusted runtime would allocate a
> couple pages with guard pages at either end, and enclaves using the
> regrettable arguments-on-the-stack scheme would end up using the alternative
> stack.
> 
> At the end of the day, none of this really matters too much. Languages that
> can do inline asm but can’t do container_of() can get fixed or use
> workarounds.

So, is your "official" opinion

   Go update the vDSO to allow passing an arbitrary pointer.

or

   Eh, don't bother.
Nathaniel McCallum Aug. 17, 2020, 1:12 p.m. UTC | #10
On Mon, Aug 10, 2020 at 7:09 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Thu, Aug 6, 2020 at 7:55 AM Nathaniel McCallum <npmccallum@redhat.com> wrote:
> >
> > In a past revision of this patch, I had requested a void *misc
> > parameter that could be passed through vdso_sgx_enter_enclave_t into
> > sgx_enclave_exit_handler_t. This request encountered some push back
> > and I dropped the issue. However, I'd like to revisit it or something
> > similar.
>
> Why do you need an exit handler at all?  IIRC way back when I
> suggested that we simply not support it at all.  If you want to
> call__vdso_sgx_enter_enclave() in a loop, call it in a loop.  If you
> want to wrap it intelligently in Rust, you don't want a callback
> anyway -- that forces you have an FFI (or non-Rust, anyway) frame on
> the stack, which interacts poorly with panic handling and prevents you
> from using await in your Rust callback handler.  If, on the other
> hand, you just call __vdso_sg_enter_enclave() in a loop, all these
> problems go away and, if you really want, you can pass in a callback
> in Rust and call the callback from Rust.
>
> What am I missing?  I still don't really understand why we are
> supporting this mechanism at all.  Just the asm code to invoke the
> callback seems to be about half of the entire function.

There are three ways to pass state between the enclave and the outside world:
1. A pre-allocated memory block at enclave creation time.
2. A contract for pushing values onto the stack during entry/exit.
3. All registers and flags besides rax, rbx, and rcx.

Under the current vDSO function:

#1 is completely possible without a handler. The challenge is how to
communicate the address of this memory to the enclave. This can be
accomplished by a parameter in a measured block or by convention.
Otherwise, it needs to use #2 or #3 to communicate the address of the
block.

#2 requires a handler written in assembly. The semantics are well known.

#3 is possible without a handler, but only for the subset of the
registers allowed by the calling convention. However, with a handler
written in assembly you can pass both in and out the full range of
registers. To do this, the assembly handler needs a pointer to a
buffer to save the registers into. How does it get said pointer?
Currently: offsetof, which Rust doesn't support.

So when I want to write a general purpose library to expose this,
which method should I expose? In particular, I want to give a single
"best practice" workflow to library consumers so that the startup
process is easy. Since #1 requires #3 to avoid a bad developer
experience, I'm inclined to provide #3.

I can provide the calling convention registers easily. But this limits
the output registers to two (practically, one for a variety of
reasons). I might just accept that. But with a misc pointer to the
handler, I could expand the one or two return registers to 11 (rdi,
rsi, rdx, r8-r15).

Put simply: I think the one extra line of assembly is worth the
flexibility it gives to consumers of this interface. In particular, it
improves the FFI experience greatly since one doesn't have to resort
to offsetof tricks to get some additional context into the handler.
Andy Lutomirski Aug. 17, 2020, 3:01 p.m. UTC | #11
> On Aug 17, 2020, at 6:12 AM, Nathaniel McCallum <npmccallum@redhat.com> wrote:
> 
> On Mon, Aug 10, 2020 at 7:09 PM Andy Lutomirski <luto@kernel.org> wrote:
>> 
>>> On Thu, Aug 6, 2020 at 7:55 AM Nathaniel McCallum <npmccallum@redhat.com> wrote:
>>> 
>>> In a past revision of this patch, I had requested a void *misc
>>> parameter that could be passed through vdso_sgx_enter_enclave_t into
>>> sgx_enclave_exit_handler_t. This request encountered some push back
>>> and I dropped the issue. However, I'd like to revisit it or something
>>> similar.
>> 
>> Why do you need an exit handler at all?  IIRC way back when I
>> suggested that we simply not support it at all.  If you want to
>> call__vdso_sgx_enter_enclave() in a loop, call it in a loop.  If you
>> want to wrap it intelligently in Rust, you don't want a callback
>> anyway -- that forces you have an FFI (or non-Rust, anyway) frame on
>> the stack, which interacts poorly with panic handling and prevents you
>> from using await in your Rust callback handler.  If, on the other
>> hand, you just call __vdso_sg_enter_enclave() in a loop, all these
>> problems go away and, if you really want, you can pass in a callback
>> in Rust and call the callback from Rust.
>> 
>> What am I missing?  I still don't really understand why we are
>> supporting this mechanism at all.  Just the asm code to invoke the
>> callback seems to be about half of the entire function.
> 
> There are three ways to pass state between the enclave and the outside world:
> 1. A pre-allocated memory block at enclave creation time.
> 2. A contract for pushing values onto the stack during entry/exit.
> 3. All registers and flags besides rax, rbx, and rcx.
> 
> Under the current vDSO function:
> 
> #1 is completely possible without a handler. The challenge is how to
> communicate the address of this memory to the enclave. This can be
> accomplished by a parameter in a measured block or by convention.
> Otherwise, it needs to use #2 or #3 to communicate the address of the
> block.
> 
> #2 requires a handler written in assembly. The semantics are well known.

No one seems particularly interested in my suggestion that the RSP exposed to the enclave be different from the actual untrusted stack. Oh well.

That being said, if I were writing a Rust wrapper for SGX (or Python or any language with a reasonable form of async/await), I would want the vDSO code to support swapping RSP before ENCLU because I would want to have first-class support for invoking an enclave from an async function and suspending inside the handler. Allocating a real stack (and shadow stack once CET shows up) for this is disgusting.  If the vDSO supported this natively, it could (in principle) interact with signal delivery such that signals would not see the alternative stack.

I do admit that the implementation would not be pretty. Honestly, I think Intel messed up by exposing USER_RSP to enclaves in the first place.

> 
> #3 is possible without a handler, but only for the subset of the
> registers allowed by the calling convention. However, with a handler
> written in assembly you can pass both in and out the full range of
> registers. To do this, the assembly handler needs a pointer to a
> buffer to save the registers into. How does it get said pointer?
> Currently: offsetof, which Rust doesn't support.

I find this justification a bit silly. Binutils asm (gas) doesn’t support offsetof for C structs either, and Linux works just fine. We’re talking about hardcoding one number along with an assertion somewhere that the number is correct, right?  Couldn’t sizeof be used, too?

To be clear, I think that passing around a misc pointer seems entirely reasonable, but I see it as a nice feature, not as a requirement for correct usage of the function.
Jarkko Sakkinen Aug. 18, 2020, 2:26 p.m. UTC | #12
On Thu, Aug 06, 2020 at 10:55:43AM -0400, Nathaniel McCallum wrote:
> In a past revision of this patch, I had requested a void *misc
> parameter that could be passed through vdso_sgx_enter_enclave_t into
> sgx_enclave_exit_handler_t. This request encountered some push back
> and I dropped the issue. However, I'd like to revisit it or something
> similar.
> 
> One way to create a generic interface to SGX is to pass a structure
> that captures the relevant CPU state from the handler so that it can
> be evaluated in C code before reentry. Precedent for this approach can
> be found in struct kvm_run[0]. Currently, however, there is no way to
> pass a pointer to such a structure directly into the handler.
> 
> This can be done implicitly by wrapping the struct
> sgx_enclave_exception in another structure and then using techniques
> like container_of() to find another field. However, this is made more
> difficult by the fact that the sgx_enclave_exit_handler_t is not
> really using the x86_64 sysv calling convention. Therefore, the
> sgx_enclave_exit_handler_t MUST be written in assembly. This also
> implies that we can't use techniques like container_of() and must
> calculate all the offsets manually, which is tedious, error prone and
> fragile.

If instead of having a callback, kernel would run an optional bpf
program, then a state structure could passed by the kernel to the
execution context of the bpf program.

That would also sort out difficulty of writing exit handlers.

/Jarkko
Jarkko Sakkinen Aug. 18, 2020, 2:52 p.m. UTC | #13
On Mon, Aug 10, 2020 at 03:23:17PM -0700, Sean Christopherson wrote:
> > This can be done implicitly by wrapping the struct
> > sgx_enclave_exception in another structure and then using techniques
> > like container_of() to find another field. However, this is made more
> > difficult by the fact that the sgx_enclave_exit_handler_t is not
> > really using the x86_64 sysv calling convention. Therefore, the
> > sgx_enclave_exit_handler_t MUST be written in assembly.
> 
> What bits of the x86-64 ABI require writing the handler in assembly?  There
> are certainly restrictions on what the handler can do without needing an
> assembly trampoline, but I was under the impression that vanilla C code is
> compatible with the exit handler patch.  Is Rust more picky about calling
> convention?
> 
> Side topic, the documentation for vdso_sgx_enter_enclave_t is wrong, it
> states the EFLAGS.DF is not cleared before invoking the handler, but that's
> a lie.

If handler requires the use of setjmp/longjmp API for sudden exits, that
is considered bad even with C++, as it is not compatible with stack
unwinding. The handler has a lot of constraints for its environment, and
is somewhat unappealing to use.

That's why I started today thinking a possibility of using a bpf program
as a middle-man. BPF programs can be used to execute code by the kernel
in behalf of user in a domain defined sandbox. The execution context is
just a buffer passed in R1 to the BPF interpreter. It can be defined by
application.

/Jarkko
Jarkko Sakkinen Aug. 18, 2020, 3:06 p.m. UTC | #14
On Tue, Aug 18, 2020 at 05:52:41PM +0300, Jarkko Sakkinen wrote:
> On Mon, Aug 10, 2020 at 03:23:17PM -0700, Sean Christopherson wrote:
> > > This can be done implicitly by wrapping the struct
> > > sgx_enclave_exception in another structure and then using techniques
> > > like container_of() to find another field. However, this is made more
> > > difficult by the fact that the sgx_enclave_exit_handler_t is not
> > > really using the x86_64 sysv calling convention. Therefore, the
> > > sgx_enclave_exit_handler_t MUST be written in assembly.
> > 
> > What bits of the x86-64 ABI require writing the handler in assembly?  There
> > are certainly restrictions on what the handler can do without needing an
> > assembly trampoline, but I was under the impression that vanilla C code is
> > compatible with the exit handler patch.  Is Rust more picky about calling
> > convention?
> > 
> > Side topic, the documentation for vdso_sgx_enter_enclave_t is wrong, it
> > states the EFLAGS.DF is not cleared before invoking the handler, but that's
> > a lie.
> 
> If handler requires the use of setjmp/longjmp API for sudden exits, that
> is considered bad even with C++, as it is not compatible with stack
> unwinding. The handler has a lot of constraints for its environment, and
> is somewhat unappealing to use.
> 
> That's why I started today thinking a possibility of using a bpf program
> as a middle-man. BPF programs can be used to execute code by the kernel
> in behalf of user in a domain defined sandbox. The execution context is
> just a buffer passed in R1 to the BPF interpreter. It can be defined by
> application.

Something like

1. An exception is triggered.
2. Kernel executes an eBPF program behalf of the caller, if one was
   given.
3. vDSO calls a fixed exit handler that based on the outcome calls
   ERESUME/EENTER.

Possibly an ioctl could be used to attach an eBPF program to an
enclave and vDSO would only get a context struct.

/Jarkko
Jarkko Sakkinen Aug. 18, 2020, 3:15 p.m. UTC | #15
On Mon, Aug 10, 2020 at 04:08:46PM -0700, Andy Lutomirski wrote:
> On Thu, Aug 6, 2020 at 7:55 AM Nathaniel McCallum <npmccallum@redhat.com> wrote:
> >
> > In a past revision of this patch, I had requested a void *misc
> > parameter that could be passed through vdso_sgx_enter_enclave_t into
> > sgx_enclave_exit_handler_t. This request encountered some push back
> > and I dropped the issue. However, I'd like to revisit it or something
> > similar.
> 
> Why do you need an exit handler at all?  IIRC way back when I
> suggested that we simply not support it at all.  If you want to
> call__vdso_sgx_enter_enclave() in a loop, call it in a loop.  If you
> want to wrap it intelligently in Rust, you don't want a callback
> anyway -- that forces you have an FFI (or non-Rust, anyway) frame on
> the stack, which interacts poorly with panic handling and prevents you
> from using await in your Rust callback handler.  If, on the other
> hand, you just call __vdso_sg_enter_enclave() in a loop, all these
> problems go away and, if you really want, you can pass in a callback
> in Rust and call the callback from Rust.

How would Intel SDK be able to do its stack manipulation?

> What am I missing?  I still don't really understand why we are
> supporting this mechanism at all.  Just the asm code to invoke the
> callback seems to be about half of the entire function.

I'm most worried maintaining all of this given all the innovative ways
that users can exploit an uapi.

/Jarkko
Nathaniel McCallum Aug. 18, 2020, 3:15 p.m. UTC | #16
That seems like overkill to me. I'm just asking for one additional mov
instruction. :)

On Tue, Aug 18, 2020 at 11:06 AM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> On Tue, Aug 18, 2020 at 05:52:41PM +0300, Jarkko Sakkinen wrote:
> > On Mon, Aug 10, 2020 at 03:23:17PM -0700, Sean Christopherson wrote:
> > > > This can be done implicitly by wrapping the struct
> > > > sgx_enclave_exception in another structure and then using techniques
> > > > like container_of() to find another field. However, this is made more
> > > > difficult by the fact that the sgx_enclave_exit_handler_t is not
> > > > really using the x86_64 sysv calling convention. Therefore, the
> > > > sgx_enclave_exit_handler_t MUST be written in assembly.
> > >
> > > What bits of the x86-64 ABI require writing the handler in assembly?  There
> > > are certainly restrictions on what the handler can do without needing an
> > > assembly trampoline, but I was under the impression that vanilla C code is
> > > compatible with the exit handler patch.  Is Rust more picky about calling
> > > convention?
> > >
> > > Side topic, the documentation for vdso_sgx_enter_enclave_t is wrong, it
> > > states the EFLAGS.DF is not cleared before invoking the handler, but that's
> > > a lie.
> >
> > If handler requires the use of setjmp/longjmp API for sudden exits, that
> > is considered bad even with C++, as it is not compatible with stack
> > unwinding. The handler has a lot of constraints for its environment, and
> > is somewhat unappealing to use.
> >
> > That's why I started today thinking a possibility of using a bpf program
> > as a middle-man. BPF programs can be used to execute code by the kernel
> > in behalf of user in a domain defined sandbox. The execution context is
> > just a buffer passed in R1 to the BPF interpreter. It can be defined by
> > application.
>
> Something like
>
> 1. An exception is triggered.
> 2. Kernel executes an eBPF program behalf of the caller, if one was
>    given.
> 3. vDSO calls a fixed exit handler that based on the outcome calls
>    ERESUME/EENTER.
>
> Possibly an ioctl could be used to attach an eBPF program to an
> enclave and vDSO would only get a context struct.
>
> /Jarkko
>
Jarkko Sakkinen Aug. 18, 2020, 4:43 p.m. UTC | #17
On Tue, Aug 18, 2020 at 11:15:32AM -0400, Nathaniel McCallum wrote:
> That seems like overkill to me. I'm just asking for one additional mov
> instruction. :)

I started to consider eBPF since the complexity and constraints of the
callback look like an overkill and without doubt will be a burden to
maintain.

/Jarkko
Nathaniel McCallum Aug. 19, 2020, 1:33 p.m. UTC | #18
On Tue, Aug 18, 2020 at 12:44 PM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> On Tue, Aug 18, 2020 at 11:15:32AM -0400, Nathaniel McCallum wrote:
> > That seems like overkill to me. I'm just asking for one additional mov
> > instruction. :)
>
> I started to consider eBPF since the complexity and constraints of the
> callback look like an overkill and without doubt will be a burden to
> maintain.

That feels to me like more complexity just to move the existing
complexity from one place to another.
Jethro Beekman Aug. 19, 2020, 2 p.m. UTC | #19
On 2020-08-19 15:33, Nathaniel McCallum wrote:
> On Tue, Aug 18, 2020 at 12:44 PM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
>>
>> On Tue, Aug 18, 2020 at 11:15:32AM -0400, Nathaniel McCallum wrote:
>>> That seems like overkill to me. I'm just asking for one additional mov
>>> instruction. :)
>>
>> I started to consider eBPF since the complexity and constraints of the
>> callback look like an overkill and without doubt will be a burden to
>> maintain.
> 
> That feels to me like more complexity just to move the existing
> complexity from one place to another.
> 

Agreed that BPF doesn't make things better.

--
Jethro Beekman | Fortanix
Jarkko Sakkinen Aug. 19, 2020, 9:23 p.m. UTC | #20
On Wed, Aug 19, 2020 at 09:33:45AM -0400, Nathaniel McCallum wrote:
> On Tue, Aug 18, 2020 at 12:44 PM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > On Tue, Aug 18, 2020 at 11:15:32AM -0400, Nathaniel McCallum wrote:
> > > That seems like overkill to me. I'm just asking for one additional mov
> > > instruction. :)
> >
> > I started to consider eBPF since the complexity and constraints of the
> > callback look like an overkill and without doubt will be a burden to
> > maintain.
> 
> That feels to me like more complexity just to move the existing
> complexity from one place to another.

My thinking was that there is like two parts on AEX handler:

A. You have a set of data and some application dependent logic.
   Right now this part is written in assembly most of the time.
B. You have code that decide how to continue based on that logic.

I was thinking that you could have the logic manipulating data (the
existing context structure we have can be passed directly to the BPF
state machine) patchable with a BPF program.

The logic would be executed in ring-0. vDSO would contain part B.

It is better to at least make some rational conclusions about this
because it seems to be the trend [1].

On the other hand the history with callbacks has not been a history
of victory as far as it comes to all the possible issues with them.
Like think about signals for example...

[1] https://lwn.net/Articles/813261/

/Jarkko
Andy Lutomirski Aug. 20, 2020, 12:19 a.m. UTC | #21
On Tue, Aug 18, 2020 at 8:15 AM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> On Mon, Aug 10, 2020 at 04:08:46PM -0700, Andy Lutomirski wrote:
> > On Thu, Aug 6, 2020 at 7:55 AM Nathaniel McCallum <npmccallum@redhat.com> wrote:
> > >
> > > In a past revision of this patch, I had requested a void *misc
> > > parameter that could be passed through vdso_sgx_enter_enclave_t into
> > > sgx_enclave_exit_handler_t. This request encountered some push back
> > > and I dropped the issue. However, I'd like to revisit it or something
> > > similar.
> >
> > Why do you need an exit handler at all?  IIRC way back when I
> > suggested that we simply not support it at all.  If you want to
> > call__vdso_sgx_enter_enclave() in a loop, call it in a loop.  If you
> > want to wrap it intelligently in Rust, you don't want a callback
> > anyway -- that forces you have an FFI (or non-Rust, anyway) frame on
> > the stack, which interacts poorly with panic handling and prevents you
> > from using await in your Rust callback handler.  If, on the other
> > hand, you just call __vdso_sg_enter_enclave() in a loop, all these
> > problems go away and, if you really want, you can pass in a callback
> > in Rust and call the callback from Rust.
>
> How would Intel SDK be able to do its stack manipulation?

The same as now.  The enclave would see a pointer to a stack-like
writable area in USER_RSP, but it just wouldn't be the actual stack.

I suppose that the caller of the vdso can play these games just as
well as the vdso itself, though, so maybe this is not helpful.

--Andy
diff mbox series

Patch

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index ebe82b7aecda..f71ad5ebd0c4 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -29,6 +29,7 @@  VDSO32-$(CONFIG_IA32_EMULATION)	:= y
 vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
 vobjs32-y := vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
 vobjs32-y += vdso32/vclock_gettime.o
+vobjs-$(VDSO64-y)		+= vsgx_enter_enclave.o
 
 # files to link into kernel
 obj-y				+= vma.o extable.o
@@ -100,6 +101,7 @@  $(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS) $(RETPOLINE_CFLAGS
 CFLAGS_REMOVE_vclock_gettime.o = -pg
 CFLAGS_REMOVE_vdso32/vclock_gettime.o = -pg
 CFLAGS_REMOVE_vgetcpu.o = -pg
+CFLAGS_REMOVE_vsgx_enter_enclave.o = -pg
 
 #
 # X32 processes use x32 vDSO to access 64bit kernel data.
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index 36b644e16272..4bf48462fca7 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -27,6 +27,7 @@  VERSION {
 		__vdso_time;
 		clock_getres;
 		__vdso_clock_getres;
+		__vdso_sgx_enter_enclave;
 	local: *;
 	};
 }
diff --git a/arch/x86/entry/vdso/vsgx_enter_enclave.S b/arch/x86/entry/vdso/vsgx_enter_enclave.S
new file mode 100644
index 000000000000..be7e467e1efb
--- /dev/null
+++ b/arch/x86/entry/vdso/vsgx_enter_enclave.S
@@ -0,0 +1,131 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/linkage.h>
+#include <asm/export.h>
+#include <asm/errno.h>
+#include <asm/enclu.h>
+
+#include "extable.h"
+
+#define EX_LEAF		0*8
+#define EX_TRAPNR	0*8+4
+#define EX_ERROR_CODE	0*8+6
+#define EX_ADDRESS	1*8
+
+.code64
+.section .text, "ax"
+
+SYM_FUNC_START(__vdso_sgx_enter_enclave)
+	/* Prolog */
+	.cfi_startproc
+	push	%rbp
+	.cfi_adjust_cfa_offset	8
+	.cfi_rel_offset		%rbp, 0
+	mov	%rsp, %rbp
+	.cfi_def_cfa_register	%rbp
+	push	%rbx
+	.cfi_rel_offset		%rbx, -8
+
+	mov	%ecx, %eax
+.Lenter_enclave:
+	/* EENTER <= leaf <= ERESUME */
+	cmp	$EENTER, %eax
+	jb	.Linvalid_leaf
+	cmp	$ERESUME, %eax
+	ja	.Linvalid_leaf
+
+	/* Load TCS and AEP */
+	mov	0x10(%rbp), %rbx
+	lea	.Lasync_exit_pointer(%rip), %rcx
+
+	/* Single ENCLU serving as both EENTER and AEP (ERESUME) */
+.Lasync_exit_pointer:
+.Lenclu_eenter_eresume:
+	enclu
+
+	/* EEXIT jumps here unless the enclave is doing something fancy. */
+	xor	%eax, %eax
+
+	/* Invoke userspace's exit handler if one was provided. */
+.Lhandle_exit:
+	cmp	$0, 0x20(%rbp)
+	jne	.Linvoke_userspace_handler
+
+.Lout:
+	pop	%rbx
+	leave
+	.cfi_def_cfa		%rsp, 8
+	ret
+
+	/* The out-of-line code runs with the pre-leave stack frame. */
+	.cfi_def_cfa		%rbp, 16
+
+.Linvalid_leaf:
+	mov	$(-EINVAL), %eax
+	jmp	.Lout
+
+.Lhandle_exception:
+	mov	0x18(%rbp), %rcx
+	test    %rcx, %rcx
+	je	.Lskip_exception_info
+
+	/* Fill optional exception info. */
+	mov	%eax, EX_LEAF(%rcx)
+	mov	%di,  EX_TRAPNR(%rcx)
+	mov	%si,  EX_ERROR_CODE(%rcx)
+	mov	%rdx, EX_ADDRESS(%rcx)
+.Lskip_exception_info:
+	mov	$(-EFAULT), %eax
+	jmp	.Lhandle_exit
+
+.Linvoke_userspace_handler:
+	/* Pass the untrusted RSP (at exit) to the callback via %rcx. */
+	mov	%rsp, %rcx
+
+	/* Save the untrusted RSP offset in %rbx (non-volatile register). */
+	mov	%rsp, %rbx
+	and	$0xf, %rbx
+
+	/*
+	 * Align stack per x86_64 ABI. Note, %rsp needs to be 16-byte aligned
+	 * _after_ pushing the parameters on the stack, hence the bonus push.
+	 */
+	and	$-0x10, %rsp
+	push	%rax
+
+	/* Push @e, the "return" value and @tcs as params to the callback. */
+	push	0x18(%rbp)
+	push	%rax
+	push	0x10(%rbp)
+
+	/* Clear RFLAGS.DF per x86_64 ABI */
+	cld
+
+	/* Load the callback pointer to %rax and invoke it via retpoline. */
+	mov	0x20(%rbp), %rax
+	call	.Lretpoline
+
+	/* Undo the post-exit %rsp adjustment. */
+	lea	0x20(%rsp, %rbx), %rsp
+
+	/*
+	 * If the return from callback is zero or negative, return immediately,
+	 * else re-execute ENCLU with the postive return value interpreted as
+	 * the requested ENCLU leaf.
+	 */
+	cmp	$0, %eax
+	jle	.Lout
+	jmp	.Lenter_enclave
+
+.Lretpoline:
+	call	2f
+1:	pause
+	lfence
+	jmp	1b
+2:	mov	%rax, (%rsp)
+	ret
+	.cfi_endproc
+
+_ASM_VDSO_EXTABLE_HANDLE(.Lenclu_eenter_eresume, .Lhandle_exception)
+
+SYM_FUNC_END(__vdso_sgx_enter_enclave)
diff --git a/arch/x86/include/asm/enclu.h b/arch/x86/include/asm/enclu.h
new file mode 100644
index 000000000000..06157b3e9ede
--- /dev/null
+++ b/arch/x86/include/asm/enclu.h
@@ -0,0 +1,8 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_ENCLU_H
+#define _ASM_X86_ENCLU_H
+
+#define EENTER	0x02
+#define ERESUME	0x03
+
+#endif /* _ASM_X86_ENCLU_H */
diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 57d0d30c79b3..3760e5d5dc0c 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -74,4 +74,102 @@  struct sgx_enclave_set_attribute {
 	__u64 attribute_fd;
 };
 
+/**
+ * struct sgx_enclave_exception - structure to report exceptions encountered in
+ *				  __vdso_sgx_enter_enclave()
+ *
+ * @leaf:	ENCLU leaf from \%eax at time of exception
+ * @trapnr:	exception trap number, a.k.a. fault vector
+ * @error_code:	exception error code
+ * @address:	exception address, e.g. CR2 on a #PF
+ * @reserved:	reserved for future use
+ */
+struct sgx_enclave_exception {
+	__u32 leaf;
+	__u16 trapnr;
+	__u16 error_code;
+	__u64 address;
+	__u64 reserved[2];
+};
+
+/**
+ * typedef sgx_enclave_exit_handler_t - Exit handler function accepted by
+ *					__vdso_sgx_enter_enclave()
+ *
+ * @rdi:	RDI at the time of enclave exit
+ * @rsi:	RSI at the time of enclave exit
+ * @rdx:	RDX at the time of enclave exit
+ * @ursp:	RSP at the time of enclave exit (untrusted stack)
+ * @r8:		R8 at the time of enclave exit
+ * @r9:		R9 at the time of enclave exit
+ * @tcs:	Thread Control Structure used to enter enclave
+ * @ret:	0 on success (EEXIT), -EFAULT on an exception
+ * @e:		Pointer to struct sgx_enclave_exception (as provided by caller)
+ */
+typedef int (*sgx_enclave_exit_handler_t)(long rdi, long rsi, long rdx,
+					  long ursp, long r8, long r9,
+					  void *tcs, int ret,
+					  struct sgx_enclave_exception *e);
+
+/**
+ * __vdso_sgx_enter_enclave() - Enter an SGX enclave
+ * @rdi:	Pass-through value for RDI
+ * @rsi:	Pass-through value for RSI
+ * @rdx:	Pass-through value for RDX
+ * @leaf:	ENCLU leaf, must be EENTER or ERESUME
+ * @r8:		Pass-through value for R8
+ * @r9:		Pass-through value for R9
+ * @tcs:	TCS, must be non-NULL
+ * @e:		Optional struct sgx_enclave_exception instance
+ * @handler:	Optional enclave exit handler
+ *
+ * NOTE: __vdso_sgx_enter_enclave() does not ensure full compliance with the
+ * x86-64 ABI, e.g. doesn't explicitly clear EFLAGS.DF after EEXIT.  Except for
+ * non-volatile general purpose registers, preserving/setting state in
+ * accordance with the x86-64 ABI is the responsibility of the enclave and its
+ * runtime, i.e. __vdso_sgx_enter_enclave() cannot be called from C code
+ * without careful consideration by both the enclave and its runtime.
+ *
+ * All general purpose registers except RAX, RBX and RCX are passed as-is to
+ * the enclave.  RAX, RBX and RCX are consumed by EENTER and ERESUME and are
+ * loaded with @leaf, asynchronous exit pointer, and @tcs respectively.
+ *
+ * RBP and the stack are used to anchor __vdso_sgx_enter_enclave() to the
+ * pre-enclave state, e.g. to retrieve @e and @handler after an enclave exit.
+ * All other registers are available for use by the enclave and its runtime,
+ * e.g. an enclave can push additional data onto the stack (and modify RSP) to
+ * pass information to the optional exit handler (see below).
+ *
+ * Most exceptions reported on ENCLU, including those that occur within the
+ * enclave, are fixed up and reported synchronously instead of being delivered
+ * via a standard signal. Debug Exceptions (#DB) and Breakpoints (#BP) are
+ * never fixed up and are always delivered via standard signals. On synchrously
+ * reported exceptions, -EFAULT is returned and details about the exception are
+ * recorded in @e, the optional sgx_enclave_exception struct.
+
+ * If an exit handler is provided, the handler will be invoked on synchronous
+ * exits from the enclave and for all synchronously reported exceptions. In
+ * latter case, @e is filled prior to invoking the handler.
+ *
+ * The exit handler's return value is interpreted as follows:
+ *  >0:		continue, restart __vdso_sgx_enter_enclave() with @ret as @leaf
+ *   0:		success, return @ret to the caller
+ *  <0:		error, return @ret to the caller
+ *
+ * The exit handler may transfer control, e.g. via longjmp() or C++ exception,
+ * without returning to __vdso_sgx_enter_enclave().
+ *
+ * Return:
+ *  0 on success,
+ *  -EINVAL if ENCLU leaf is not allowed,
+ *  -EFAULT if an exception occurs on ENCLU or within the enclave
+ *  -errno for all other negative values returned by the userspace exit handler
+ */
+typedef int (*vdso_sgx_enter_enclave_t)(unsigned long rdi, unsigned long rsi,
+					unsigned long rdx, unsigned int leaf,
+					unsigned long r8,  unsigned long r9,
+					void *tcs,
+					struct sgx_enclave_exception *e,
+					sgx_enclave_exit_handler_t handler);
+
 #endif /* _UAPI_ASM_X86_SGX_H */