
[5/7] x86/sgx: Add flag to zero added region instead of copying from source

Message ID 20190605194845.926-6-sean.j.christopherson@intel.com (mailing list archive)
State New, archived
Series x86/sgx: Clean up and enhance add pages ioctl

Commit Message

Sean Christopherson June 5, 2019, 7:48 p.m. UTC
For some enclaves, e.g. an enclave with a small code footprint and a
large working set, the vast majority of pages added to the enclave are
zero pages.  Introduce a flag to denote such zero pages.  The major
benefit of the flag will be realized in a future patch to use Linux's
actual zero page as the source, as opposed to explicitly zeroing the
enclave's backing memory.

Use bit 8 for the SGX_ZERO_REGION flag to avoid an anticipated conflict
with passing PROT_{READ,WRITE,EXEC} in bits 2:0, and to leave room in
case there is a need for additional protection related bits.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/uapi/asm/sgx.h        |  5 +++++
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 15 ++++++++++++---
 2 files changed, 17 insertions(+), 3 deletions(-)
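The bit layout described in the commit message can be sanity-checked directly. A minimal sketch (SGX_ZERO_REGION = 0x100 comes from the patch itself; the helper name is illustrative):

```c
#include <assert.h>
#include <sys/mman.h>   /* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Zero an added region instead of copying data from a source page. */
#define SGX_ZERO_REGION 0x100   /* bit 8 */

/* PROT_{READ,WRITE,EXEC} occupy bits 2:0, and bits 7:3 are left free
 * for future protection-related flags, so bit 8 cannot collide. */
static int sgx_flags_sane(void)
{
    unsigned int prot_mask = PROT_READ | PROT_WRITE | PROT_EXEC; /* 0x7 */

    return (SGX_ZERO_REGION & prot_mask) == 0 &&   /* no PROT_* overlap */
           (SGX_ZERO_REGION & 0xff) == 0;          /* bits 7:0 untouched */
}
```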

Comments

Andy Lutomirski June 6, 2019, 5:20 p.m. UTC | #1
On Wed, Jun 5, 2019 at 12:49 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> For some enclaves, e.g. an enclave with a small code footprint and a
> large working set, the vast majority of pages added to the enclave are
> zero pages.  Introduce a flag to denote such zero pages.  The major
> benefit of the flag will be realized in a future patch to use Linux's
> actual zero page as the source, as opposed to explicitly zeroing the
> enclave's backing memory.
>

I feel like I probably asked this at some point, but why is there a
workqueue here at all?
Sean Christopherson June 6, 2019, 5:32 p.m. UTC | #2
On Thu, Jun 06, 2019 at 10:20:38AM -0700, Andy Lutomirski wrote:
> On Wed, Jun 5, 2019 at 12:49 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > For some enclaves, e.g. an enclave with a small code footprint and a
> > large working set, the vast majority of pages added to the enclave are
> > zero pages.  Introduce a flag to denote such zero pages.  The major
> > benefit of the flag will be realized in a future patch to use Linux's
> > actual zero page as the source, as opposed to explicitly zeroing the
> > enclave's backing memory.
> >
> 
> I feel like I probably asked this at some point, but why is there a
> workqueue here at all?

Performance.  A while back I wrote a patch set to remove the worker queue
and discovered that it tanks enclave build time when the enclave is being
hosted by a Golang application.  Here's a snippet from a mail discussing
the code.

    The bad news is that I don't think we can remove the add page worker
    as applications with userspace schedulers, e.g. Go's M:N scheduler,
    can see a 10x or more throughput improvement when using the worker
    queue.  I did a bit of digging for the Golang case to make sure I
    wasn't doing something horribly stupid/naive and found that it's a
    generic issue in Golang with blocking (or just long-running) system
    calls.  Because Golang multiplexes Goroutines on top of OS threads,
    blocking syscalls introduce latency and context switching overhead,
    e.g. Go's scheduler will spin up a new OS thread to service other
    Goroutines after it realizes the syscall has blocked, and will later
    destroy one of the OS threads so that it doesn't build up too many
    unused.

IIRC, the scenario is spinning up several goroutines, each building an
enclave.  I played around with adding a flag to do a synchronous EADD
but didn't see a meaningful change in performance for the simple case.
Supporting both the worker queue and direct paths was complex enough
that I decided it wasn't worth the trouble for initial upstreaming.
Andy Lutomirski June 7, 2019, 7:32 p.m. UTC | #3
> On Jun 6, 2019, at 10:32 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
>> [...]
>> I feel like I probably asked this at some point, but why is there a
>> workqueue here at all?
> 
> Performance.  A while back I wrote a patch set to remove the worker queue
> and discovered that it tanks enclave build time when the enclave is being
> hosted by a Golang application.  Here's a snippet from a mail discussing
> the code.
> 
>    The bad news is that I don't think we can remove the add page worker
>    as applications with userspace schedulers, e.g. Go's M:N scheduler,
>    can see a 10x or more throughput improvement when using the worker
>    queue.  I did a bit of digging for the Golang case to make sure I
>    wasn't doing something horribly stupid/naive and found that it's a
>    generic issue in Golang with blocking (or just long-running) system
>    calls.  Because Golang multiplexes Goroutines on top of OS threads,
>    blocking syscalls introduce latency and context switching overhead,
>    e.g. Go's scheduler will spin up a new OS thread to service other
>    Goroutines after it realizes the syscall has blocked, and will later
>    destroy one of the OS threads so that it doesn't build up too many
>    unused.
> 
> IIRC, the scenario is spinning up several goroutines, each building an
> enclave.  I played around with adding a flag to do a synchronous EADD
> but didn't see a meaningful change in performance for the simple case.
> Supporting both the worker queue and direct paths was complex enough
> that I decided it wasn't worth the trouble for initial upstreaming.

Sigh.

It seems silly to add a workaround for a language that has trouble calling somewhat-but-not-too-slow syscalls or ioctls.

How about fixing this in Go directly?  Either convince the golang people to add a way to allocate a real thread for a particular region of code or have the Go SGX folks write a bit of C code to do a whole bunch of ioctls and have Go call *that*.  Then the mess stays in Go where it belongs.
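A shim like the one Andy suggests might look roughly like the sketch below. The per-page request layout and the ioctl name SGX_IOC_ENCLAVE_ADD_PAGE mentioned in the comment are assumptions modeled on the driver of this era, not the exact uAPI; the ioctl-issuing call is injected as a function pointer so the loop can be exercised without an SGX device.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page request, loosely modeled on the driver's
 * add-page structure of this era (an assumption, not the exact uAPI). */
struct add_page_req {
    uint64_t addr;      /* enclave linear address */
    uint64_t src;       /* source page in normal memory */
};

/* The real shim would issue ioctl(fd, SGX_IOC_ENCLAVE_ADD_PAGE, ...);
 * taking the call as a function pointer keeps this testable. */
typedef int (*add_page_fn)(void *ctx, const struct add_page_req *req);

/* Issue one blocking call per 4 KiB page of [base, base + size).
 * Returns the number of pages added, or -1 on the first failure. */
static long add_region(void *ctx, add_page_fn add_page,
                       uint64_t base, uint64_t src, uint64_t size)
{
    long pages = 0;

    for (uint64_t off = 0; off < size; off += 4096) {
        struct add_page_req req = { base + off, src + off };

        if (add_page(ctx, &req))
            return -1;
        pages++;
    }
    return pages;
}

/* Demo backend: counts calls instead of touching hardware. */
static int demo_calls;
static int demo_add_page(void *ctx, const struct add_page_req *req)
{
    (void)ctx; (void)req;
    demo_calls++;
    return 0;
}
```

The real version would pass a thin wrapper around ioctl(2) as add_page and expose add_region to Go via cgo, so the Go scheduler pays the blocking-syscall cost once per region rather than once per page.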
Jarkko Sakkinen June 10, 2019, 5:06 p.m. UTC | #4
On Fri, Jun 07, 2019 at 12:32:23PM -0700, Andy Lutomirski wrote:
> Sigh.
> 
> It seems silly to add a workaround for a language that has trouble
> calling somewhat-but-not-too-slow syscalls or ioctls.
> 
> How about fixing this in Go directly?  Either convince the golang
> people to add a way to allocate a real thread for a particular region
> of code or have the Go SGX folks write a bit of C code to do a whole
> bunch of ioctls and have Go call *that*.  Then the mess stays in Go
> where it belongs.

A worker thread would only be appropriate if the existing SGX code were
already in the mainline, because then removing it would break user space.

Right now, doing it either way breaks nothing, so there is no case for
keeping it.

/Jarkko
Xing, Cedric June 10, 2019, 6:09 p.m. UTC | #5
> From: Andy Lutomirski [mailto:luto@amacapital.net]
> Sent: Friday, June 07, 2019 12:32 PM
> 
> > [...]
> 
> Sigh.
> 
> It seems silly to add a workaround for a language that has trouble
> calling somewhat-but-not-too-slow syscalls or ioctls.
> 
> How about fixing this in Go directly?  Either convince the golang people
> to add a way to allocate a real thread for a particular region of code
> or have the Go SGX folks write a bit of C code to do  a whole bunch of
> ioctls and have Go call *that*.  Then the mess stays in Go where it
> belongs.

The add page worker is less about performance and more about the concurrency restrictions in the SGX ISA. EADD/EEXTEND are slow instructions, and both take a lock on the SECS, so if another EADD/EEXTEND is already in progress the instruction will #GP.

In practice, no sane application will EADD from multiple threads, because that would make MRENCLAVE unpredictable. But an adversary could do exactly that just to trigger a #GP in the kernel. Therefore the SGX module would have to lock around EADD/EEXTEND anyway. Given that, we figured it would be better to have the add page worker so that no lock would be necessary, plus a (small) improvement in performance.
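Cedric's point can be illustrated with a toy model of the ISA restriction. This is a sketch only: the struct, the function names, and the -1-for-#GP convention are all illustrative, not real hardware or driver semantics.

```c
#include <assert.h>

/* Toy model: EADD/EEXTEND take a hardware lock on the enclave's SECS,
 * and a second EADD that finds the lock held takes a #GP. */
struct enclave {
    int secs_locked;    /* models the hardware SECS lock */
    long pages_added;
};

static int eadd(struct enclave *encl)
{
    if (encl->secs_locked)
        return -1;          /* concurrent EADD/EEXTEND: #GP */
    encl->secs_locked = 1;  /* instruction holds the SECS lock ... */
    encl->pages_added++;
    encl->secs_locked = 0;  /* ... and drops it on completion */
    return 0;
}
```

Because a hostile caller could hit the #GP path deliberately, the kernel must serialize eadd() per enclave one way or another; a single worker thread achieves that without an explicit mutex in the ioctl path.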
Sean Christopherson June 10, 2019, 6:41 p.m. UTC | #6
On Mon, Jun 10, 2019 at 11:09:50AM -0700, Xing, Cedric wrote:
> > From: Andy Lutomirski [mailto:luto@amacapital.net]
> > Sent: Friday, June 07, 2019 12:32 PM
> > 
> > > [...]
> 
> The add page worker is less about performance but more about concurrency
> restrictions in SGX ISA. EADD/EEXTEND are slow instructions and both take
> lock on the SECS. So if there's another EADD/EEXTEND in progress then it will
> #GP.
> 
> In practice, no sane applications will EADD in multiple threads because that
> would make MRENCLAVE unpredictable. But adversary may use that just to
> trigger #GP in kernel. Therefore, the SGX module would have to lock around
> EADD/EEXTEND anyway. And then we figured it would be better to have the add
> page worker so that no lock would be necessary, plus (small) improvement in
> performance. 

That may have been true years ago, but it doesn't reflect the current
reality.  The ADD_PAGE ioctl *and* the worker function both take the
enclave's lock.  The worker function actually takes it twice, once to pull
the request off the queue, and once to do EADD/EEXTEND.  The lock is
temporarily released by the worker function to avoid deadlock in case EPC
page allocation triggers reclaim.
Sean Christopherson June 10, 2019, 6:53 p.m. UTC | #7
On Fri, Jun 07, 2019 at 12:32:23PM -0700, Andy Lutomirski wrote:
> 
> > [...]
> 
> Sigh.
> 
> It seems silly to add a workaround for a language that has trouble calling
> somewhat-but-not-too-slow syscalls or ioctls.
> 
> How about fixing this in Go directly?  Either convince the golang people to
> add a way to allocate a real thread for a particular region of code or have
> the Go SGX folks write a bit of C code to do  a whole bunch of ioctls and
> have Go call *that*.  Then the mess stays in Go where it belongs.

Actually, I'm pretty sure changing the ioctl() from ADD_PAGE to ADD_REGION
would eliminate the worst of the golang slowdown without requiring
userspace to get super fancy.  I'm in favor of eliminating the work queue,
especially if the UAPI is changed to allow adding multiple pages in a
single syscall.
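The arithmetic behind that suggestion is simple. Each ioctl is a potentially blocking syscall that can trigger a Go scheduler thread handoff, so batching collapses the count from one call per page to one call per region (ADD_REGION here is the hypothetical batched ioctl under discussion, not an existing uAPI):

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Syscalls needed to load a region with a per-page ADD_PAGE ioctl. */
static unsigned long add_page_calls(unsigned long region_size)
{
    return region_size / PAGE_SIZE;     /* one ioctl per 4 KiB page */
}

/* Syscalls needed with a hypothetical batched ADD_REGION ioctl. */
static unsigned long add_region_calls(unsigned long region_size)
{
    (void)region_size;
    return 1;                           /* one ioctl for the whole region */
}
```

For a 64 MiB enclave that is 16384 blocking syscalls versus one, which is why batching alone should eliminate the worst of the goroutine-scheduler overhead.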
Jethro Beekman June 13, 2019, 12:38 a.m. UTC | #8
On 2019-06-10 11:53, Sean Christopherson wrote:
> [...]
> 
> Actually, I'm pretty sure changing the ioctl() from ADD_PAGE to ADD_REGION
> would eliminate the worst of the golang slowdown without requiring
> userspace to get super fancy.  I'm in favor of eliminating the work queue,
> especially if the UAPI is changed to allow adding multiple pages in a
> single syscall.
> 

I don't know if this is going to matter a whole lot, but have you 
considered the performance impact of needing to do EPC paging while 
doing the EADD ioctl, and how this interacts with having a workqueue?

--
Jethro Beekman | Fortanix
Sean Christopherson June 13, 2019, 1:46 p.m. UTC | #9
On Thu, Jun 13, 2019 at 12:38:02AM +0000, Jethro Beekman wrote:
> On 2019-06-10 11:53, Sean Christopherson wrote:
> >[...]
> >
> >Actually, I'm pretty sure changing the ioctl() from ADD_PAGE to ADD_REGION
> >would eliminate the worst of the golang slowdown without requiring
> >userspace to get super fancy.  I'm in favor of eliminating the work queue,
> >especially if the UAPI is changed to allow adding multiple pages in a
> >single syscall.
> >
> 
> I don't know if this is going to matter a whole lot, but have you considered
> the performance impact of needing to do EPC paging while doing the EADD
> ioctl and how this interacts with having a workqueue?

Yep, other than the goroutine case, eliminating the workqueue doesn't
substantially affect performance in either direction, regardless of the
pressure on the EPC.
Andy Lutomirski June 13, 2019, 4:16 p.m. UTC | #10
On Thu, Jun 13, 2019 at 6:46 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Thu, Jun 13, 2019 at 12:38:02AM +0000, Jethro Beekman wrote:
> > [...]
> > I don't know if this is going to matter a whole lot, but have you considered
> > the performance impact of needing to do EPC paging while doing the EADD
> > ioctl and how this interacts with having a workqueue?
>
> Yep, other than the goroutine case, eliminating the workqueue doesn't
> substantially affect performance in either direction, regardless of the
> pressure on the EPC.


It should get rid of some extra copies and allocations, no?
Sean Christopherson June 13, 2019, 4:54 p.m. UTC | #11
On Thu, Jun 13, 2019 at 09:16:46AM -0700, Andy Lutomirski wrote:
> On Thu, Jun 13, 2019 at 6:46 AM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Thu, Jun 13, 2019 at 12:38:02AM +0000, Jethro Beekman wrote:
> > > On 2019-06-10 11:53, Sean Christopherson wrote:
> > > >On Fri, Jun 07, 2019 at 12:32:23PM -0700, Andy Lutomirski wrote:
> > > >>
> > > >>>On Jun 6, 2019, at 10:32 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > > >>>
> > > >>>>On Thu, Jun 06, 2019 at 10:20:38AM -0700, Andy Lutomirski wrote:
> > > >>>>On Wed, Jun 5, 2019 at 12:49 PM Sean Christopherson
> > > >>>><sean.j.christopherson@intel.com> wrote:
> > > >>>>>
> > > >>>>>For some enclaves, e.g. an enclave with a small code footprint and a
> > > >>>>>large working set, the vast majority of pages added to the enclave are
> > > >>>>>zero pages.  Introduce a flag to denote such zero pages.  The major
> > > >>>>>benefit of the flag will be realized in a future patch to use Linux's
> > > >>>>>actual zero page as the source, as opposed to explicitly zeroing the
> > > >>>>>enclave's backing memory.
> > > >>>>>
> > > >>>>
> > > >>>>I feel like I probably asked this at some point, but why is there a
> > > >>>>workqueue here at all?
> > > >>>
> > > >>>Performance.  A while back I wrote a patch set to remove the worker queue
> > > >>>and discovered that it tanks enclave build time when the enclave is being
> > > >>>hosted by a Golang application.  Here's a snippet from a mail discussing
> > > >>>the code.
> > > >>>
> > > >>>    The bad news is that I don't think we can remove the add page worker
> > > >>>    as applications with userspace schedulers, e.g. Go's M:N scheduler,
> > > >>>    can see a 10x or more throughput improvement when using the worker
> > > >>>    queue.  I did a bit of digging for the Golang case to make sure I
> > > >>>    wasn't doing something horribly stupid/naive and found that it's a
> > > >>>    generic issue in Golang with blocking (or just long-running) system
> > > >>>    calls.  Because Golang multiplexes Goroutines on top of OS threads,
> > > >>>    blocking syscalls introduce latency and context switching overhead,
> > > >>>    e.g. Go's scheduler will spin up a new OS thread to service other
> > > >>>    Goroutines after it realizes the syscall has blocked, and will later
> > > >>>    destroy one of the OS threads so that it doesn't build up too many
> > > >>>    unused.
> > > >>>
> > > >>>IIRC, the scenario is spinning up several goroutines, each building an
> > > >>>enclave.  I played around with adding a flag to do a synchronous EADD
> > > >>>but didn't see a meaningful change in performance for the simple case.
> > > >>>Supporting both the worker queue and direct paths was complex enough
> > > >>>that I decided it wasn't worth the trouble for initial upstreaming.
> > > >>
> > > >>Sigh.
> > > >>
> > > >>It seems silly to add a workaround for a language that has trouble calling
> > > >>somewhat-but-not-too-slow syscalls or ioctls.
> > > >>
> > > >>How about fixing this in Go directly?  Either convince the golang people to
> > > >>add a way to allocate a real thread for a particular region of code or have
> > > >>the Go SGX folks write a bit of C code to do a whole bunch of ioctls and
> > > >>have Go call *that*.  Then the mess stays in Go where it belongs.
> > > >
> > > >Actually, I'm pretty sure changing the ioctl() from ADD_PAGE to ADD_REGION
> > > >would eliminate the worst of the golang slowdown without requiring
> > > >userspace to get super fancy.  I'm in favor of eliminating the work queue,
> > > >especially if the UAPI is changed to allow adding multiple pages in a
> > > >single syscall.
> > > >
> > >
> > > I don't know if this is going to matter a whole lot, but have you considered
> > > the performance impact of needing to do EPC paging while doing the EADD
> > > ioctl, and how this interacts with having a workqueue?
> >
> > Yep, other than the goroutine case, eliminating the workqueue doesn't
> > substantially affect performance in either direction, regardless of the
> > pressure on the EPC.
> 
> It should get rid of some extra copies and allocations, no?

My experiments were fairly dumb, e.g. removed the workqueue without doing
extra enhancements like avoiding copy_from_user.  IIRC, the type of
allocation changed, but the number of copies/allocations stayed the same.

Patch

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 30d114f6b3bd..18204722f238 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -31,6 +31,9 @@  struct sgx_enclave_create  {
 	__u64	src;
 };
 
+/* Zero an added region instead of copying data from a source page. */
+#define SGX_ZERO_REGION	0x100
+
 /**
  * struct sgx_enclave_add_region - parameter structure for the
  *                                 %SGX_IOC_ENCLAVE_ADD_REGION ioctl
@@ -38,6 +41,7 @@  struct sgx_enclave_create  {
  * @src:	start address for the pages' data
  * @size:	size of region, in bytes
  * @secinfo:	address of the SECINFO data (common to the entire region)
+ * @flags:	miscellaneous flags
  * @mrmask:	bitmask of 256 byte chunks to measure (applied per 4k page)
  */
 struct sgx_enclave_add_region {
@@ -45,6 +49,7 @@  struct sgx_enclave_add_region {
 	__u64	src;
 	__u64	size;
 	__u64	secinfo;
+	__u32	flags;
 	__u16	mrmask;
 } __attribute__((__packed__));
 
diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index b69350696b87..c35264ea0c93 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -459,6 +459,9 @@  static int sgx_validate_tcs(struct sgx_encl *encl, struct sgx_tcs *tcs)
 {
 	int i;
 
+	if (!tcs)
+		return -EINVAL;
+
 	if (tcs->flags & SGX_TCS_RESERVED_MASK)
 		return -EINVAL;
 
@@ -510,7 +513,10 @@  static int sgx_encl_queue_page(struct sgx_encl *encl,
 	}
 
 	backing_ptr = kmap(backing);
-	memcpy(backing_ptr, data, PAGE_SIZE);
+	if (data)
+		memcpy(backing_ptr, data, PAGE_SIZE);
+	else
+		memset(backing_ptr, 0, PAGE_SIZE);
 	kunmap(backing);
 	if (page_type == SGX_SECINFO_TCS)
 		encl_page->desc |= SGX_ENCL_PAGE_TCS;
@@ -576,12 +582,15 @@  static int __sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
 
 static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
 			     unsigned long src, struct sgx_secinfo *secinfo,
-			     unsigned int mrmask)
+			     unsigned int mrmask, unsigned long flags)
 {
 	struct page *data_page;
 	void *data;
 	int ret;
 
+	if (flags & SGX_ZERO_REGION)
+		return __sgx_encl_add_page(encl, addr, NULL, secinfo, mrmask);
+
 	data_page = alloc_page(GFP_HIGHUSER);
 	if (!data_page)
 		return -ENOMEM;
@@ -658,7 +667,7 @@  static long sgx_ioc_enclave_add_region(struct file *filep, void __user *arg)
 			cond_resched();
 
 		ret = sgx_encl_add_page(encl, region.addr + i, region.src + i,
-					&secinfo, region.mrmask);
+					&secinfo, region.mrmask, region.flags);
 		if (ret)
 			break;
 	}