[RFC,0/9] security: x86/sgx: SGX vs. LSM

From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>, Cedric Xing <cedric.xing@intel.com>,
    Stephen Smalley <sds@tycho.nsa.gov>, James Morris <jmorris@namei.org>,
    "Serge E . Hallyn" <serge@hallyn.com>,
    LSM List <linux-security-module@vger.kernel.org>,
    Paul Moore <paul@paul-moore.com>, Eric Paris <eparis@parisplace.org>,
    selinux@vger.kernel.org, Jethro Beekman <jethro@fortanix.com>,
    Dave Hansen <dave.hansen@intel.com>, Thomas Gleixner <tglx@linutronix.de>,
    Linus Torvalds <torvalds@linux-foundation.org>,
    LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
    linux-sgx@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>,
    nhorman@redhat.com, npmccallum@redhat.com,
    Serge Ayoun <serge.ayoun@intel.com>,
    Shay Katz-zamir <shay.katz-zamir@intel.com>,
    Haitao Huang <haitao.huang@intel.com>,
    Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
    Kai Svahn <kai.svahn@intel.com>, Borislav Petkov <bp@alien8.de>,
    Josh Triplett <josh@joshtriplett.org>, Kai Huang <kai.huang@intel.com>,
    David Rientjes <rientjes@google.com>,
    William Roberts <william.c.roberts@intel.com>,
    Philip Tricca <philip.b.tricca@intel.com>
Subject: [RFC PATCH 0/9] security: x86/sgx: SGX vs. LSM
Date: Fri, 31 May 2019 16:31:50 -0700
Message-Id: <20190531233159.30992-1-sean.j.christopherson@intel.com>

Sean Christopherson May 31, 2019, 11:31 p.m. UTC

This series is the result of a rather absurd amount of discussion over
how to get SGX to play nice with LSM policies, without having to resort
to evil shenanigans or put undue burden on userspace.  The discussion
definitely wandered into completely insane territory at times, but I
think/hope we ended up with something reasonable.

The basic gist of the approach is to require userspace to declare what
protections are maximally allowed for any given page, e.g. add a flags
field for loading enclave pages that takes ALLOW_{READ,WRITE,EXEC}.  LSMs
can then adjust the allowed protections, e.g. clear ALLOW_EXEC to prevent
ever mapping the page with PROT_EXEC.  SGX enforces the allowed perms
via a new mprotect() vm_ops hook, e.g. like regular mprotect() uses
MAY_{READ,WRITE,EXEC}.
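
[Editorial sketch, not part of the series.]  The enforcement described
above can be modelled in plain userspace C.  The struct and function
names below are hypothetical and the real driver code is in the patches
themselves; the model only shows the rule that an LSM may remove bits
from the declared maximum and that mprotect() is refused for any bit
outside it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>   /* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical per-page record; the real driver's layout is not shown here. */
struct encl_page_model {
	unsigned long allowed_prot;   /* maximal PROT_* bits for this page */
};

/* Userspace declares the maximum; an LSM may only remove bits, never add. */
static void lsm_restrict(struct encl_page_model *page, unsigned long denied)
{
	assert(page);
	page->allowed_prot &= ~denied;
}

/* Model of the mprotect() hook: reject any bit outside the allowed set. */
static int encl_mprotect_check(const struct encl_page_model *page,
			       unsigned long prot)
{
	assert(page);
	if (prot & ~page->allowed_prot)
		return -EACCES;
	return 0;
}
```

With ALLOW_EXEC cleared by lsm_restrict(), a later request for
PROT_READ | PROT_EXEC is refused while PROT_READ | PROT_WRITE still
succeeds.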

ALLOW_EXEC is used to deny things like loading an enclave from a noexec
file system or from a file without EXECUTE permissions, e.g. without
the ALLOW_EXEC concept, on SGX2 hardware (regardless of kernel support)
userspace could EADD from a noexec file using read-only permissions,
and later use mprotect() and ENCLU[EMODPE] to gain execute permissions.

ALLOW_WRITE is used in conjunction with ALLOW_EXEC to enforce SELinux's
EXECMOD (or EXECMEM).

This is very much an RFC series.  It's only compile tested, likely has
obvious bugs, the SELinux patch could be completely harebrained, etc...
My goal at this point is to get feedback at a macro level, e.g. is the
core concept viable/acceptable, are there objections to hooking
mprotect(), etc...

Andy and Cedric, hopefully this aligns with your general expectations
based on our last discussion.

Lastly, I added a patch to allow userspace to add multiple pages in a
single ioctl().  It's obviously not directly related to the security
stuff, but the idea tangentially came up during earlier discussions and
it's something I think the UAPI should provide (it's a tiny change).
Since I was modifying the UAPI anyways, I threw it in.

Sean Christopherson (9):
  x86/sgx: Remove unused local variable in sgx_encl_release()
  x86/sgx: Do not naturally align MAP_FIXED address
  x86/sgx: Allow userspace to add multiple pages in single ioctl()
  mm: Introduce vm_ops->mprotect()
  x86/sgx: Restrict mapping without an enclave page to PROT_NONE
  x86/sgx: Require userspace to provide allowed prots to ADD_PAGES
  x86/sgx: Enforce noexec filesystem restriction for enclaves
  LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX
  security/selinux: Add enclave_load() implementation

 arch/x86/include/uapi/asm/sgx.h        |  30 ++++--
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 143 +++++++++++++++++--------
 arch/x86/kernel/cpu/sgx/driver/main.c  |  13 ++-
 arch/x86/kernel/cpu/sgx/encl.c         |  31 +++++-
 arch/x86/kernel/cpu/sgx/encl.h         |   4 +
 include/linux/lsm_hooks.h              |  16 +++
 include/linux/mm.h                     |   2 +
 include/linux/security.h               |   2 +
 mm/mprotect.c                          |  15 ++-
 security/security.c                    |   8 ++
 security/selinux/hooks.c               |  85 +++++++++++++++
 11 files changed, 290 insertions(+), 59 deletions(-)

Xing, Cedric June 2, 2019, 7:29 a.m. UTC | #1

Hi Sean,

> From: Christopherson, Sean J
> Sent: Friday, May 31, 2019 4:32 PM
> 
> This series is the result of a rather absurd amount of discussion over how to get SGX to play
> nice with LSM policies, without having to resort to evil shenanigans or put undue burden on
> userspace.  The discussion definitely wandered into completely insane territory at times, but
> I think/hope we ended up with something reasonable.
> 
> The basic gist of the approach is to require userspace to declare what protections are
> maximally allowed for any given page, e.g. add a flags field for loading enclave pages that
> takes ALLOW_{READ,WRITE,EXEC}.  LSMs can then adjust the allowed protections, e.g. clear
> ALLOW_EXEC to prevent ever mapping the page with PROT_EXEC.  SGX enforces the allowed perms
> via a new mprotect() vm_ops hook, e.g. like regular mprotect() uses MAY_{READ,WRITE,EXEC}.
> 
> ALLOW_EXEC is used to deny things like loading an enclave from a noexec file system or from a
> file without EXECUTE permissions, e.g. without the ALLOW_EXEC concept, on SGX2 hardware
> (regardless of kernel support) userspace could EADD from a noexec file using read-only
> permissions, and later use mprotect() and ENCLU[EMODPE] to gain execute permissions.
> 
> ALLOW_WRITE is used in conjunction with ALLOW_EXEC to enforce SELinux's EXECMOD (or EXECMEM).
> 
> This is very much an RFC series.  It's only compile tested, likely has obvious bugs, the
> SELinux patch could be completely harebrained, etc...
> My goal at this point is to get feedback at a macro level, e.g. is the core concept
> viable/acceptable, are there objections to hooking mprotect(), etc...
> 
> Andy and Cedric, hopefully this aligns with your general expectations based on our last
> discussion.

I couldn't understand the real intentions of ALLOW_* flags until I saw them in code. I have to say C is more expressive than English in that regard :)

Generally I agree with your direction but think ALLOW_* flags are completely internal to LSM because they can be both produced and consumed inside an LSM module. So spilling them into SGX driver and also user mode code makes the solution ugly and in some cases impractical because not every enclave host process has a priori knowledge on whether or not an enclave page would be EMODPE'd at runtime.

Theoretically speaking, what you really need is a per page flag (let's name it WRITTEN?) indicating whether a page has ever been written to (or more precisely, granted PROT_WRITE), which will be used to decide whether to grant PROT_EXEC when requested in future. Given the fact that all mprotect() goes through LSM and mmap() is limited to PROT_NONE, it's easy for LSM to capture that flag by itself instead of asking user mode code to provide it.

That said, here is the summary of what I think is a better approach.
* In hook security_file_alloc(), if @file is an enclave, allocate a data structure that stores, for every page, the WRITTEN flag as described above. WRITTEN is cleared initially for all pages.
  Open: Given a file of type struct file *, how to tell if it is an enclave (i.e. /dev/sgx/enclave)?
* In hook security_mmap_file(), if @file is an enclave, make sure @prot can only be PROT_NONE. This is to force all protection changes to go through security_file_mprotect().
* In the newly introduced hook security_enclave_load(), set WRITTEN for pages that are requested PROT_WRITE.
* In hook security_file_mprotect(), if @vma->vm_file is an enclave, look up and use WRITTEN flags for all pages within @vma, along with other global flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux) to decide on allowing/rejecting @prot.
* In hook security_file_free(), if @file is an enclave, free storage allocated for WRITTEN flags. 
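
[Editorial sketch, not part of the original mail.]  The bookkeeping in
the list above can be modelled in userspace C roughly as follows.  All
names are hypothetical, execmod_ok stands in for whatever policy
decision (e.g. FILE__EXECMOD) the LSM would consult, and treating a
simultaneous WRITE|EXEC request like an already WRITTEN page is an
assumption of this sketch rather than something the proposal states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/mman.h>   /* PROT_* */

/* Hypothetical per-enclave LSM state, one WRITTEN flag per page. */
struct encl_lsm_state {
	size_t nr_pages;
	bool *written;
	bool execmod_ok;   /* stand-in for a policy decision */
};

/* security_file_alloc() step: all WRITTEN flags start cleared. */
static struct encl_lsm_state *encl_state_alloc(size_t nr_pages, bool execmod_ok)
{
	struct encl_lsm_state *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	s->written = calloc(nr_pages, sizeof(*s->written));
	if (!s->written) {
		free(s);
		return NULL;
	}
	s->nr_pages = nr_pages;
	s->execmod_ok = execmod_ok;
	return s;
}

/* security_enclave_load() step: a page loaded writable is WRITTEN. */
static void encl_state_load(struct encl_lsm_state *s, size_t page,
			    unsigned long prot)
{
	assert(page < s->nr_pages);
	if (prot & PROT_WRITE)
		s->written[page] = true;
}

/* security_file_mprotect() step for pages [first, first + count). */
static int encl_state_mprotect(struct encl_lsm_state *s, size_t first,
			       size_t count, unsigned long prot)
{
	size_t i;

	assert(first + count <= s->nr_pages);
	for (i = first; i < first + count; i++) {
		bool dirty = s->written[i] || (prot & PROT_WRITE);

		if ((prot & PROT_EXEC) && dirty && !s->execmod_ok)
			return -EACCES;
	}
	/* Granting PROT_WRITE marks the pages WRITTEN from now on. */
	if (prot & PROT_WRITE)
		for (i = first; i < first + count; i++)
			s->written[i] = true;
	return 0;
}

/* security_file_free() step. */
static void encl_state_free(struct encl_lsm_state *s)
{
	free(s->written);
	free(s);
}
```

In this model a page loaded RX may stay RX, but once it has been granted
PROT_WRITE a later PROT_EXEC request fails unless execmod_ok is set.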

I'll try to make more detailed comments in my replies to individual patches sometime tomorrow.


-Cedric

Sean Christopherson June 3, 2019, 5:15 p.m. UTC | #2

On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
> I couldn't understand the real intentions of ALLOW_* flags until I saw them
> in code. I have to say C is more expressive than English in that regard :)
> 
> Generally I agree with your direction but think ALLOW_* flags are completely
> internal to LSM because they can be both produced and consumed inside an LSM
> module. So spilling them into SGX driver and also user mode code makes the
> solution ugly and in some cases impractical because not every enclave host
> process has a priori knowledge on whether or not an enclave page would be
> EMODPE'd at runtime.

In this case, the host process should tag *all* pages it *might* convert
to executable as ALLOW_EXEC.  LSMs can (and should/will) be written in
such a way that denying ALLOW_EXEC is fatal to the enclave if and only if
the enclave actually attempts mprotect(PROT_EXEC).

Take the SELinux path for example.  The only scenario in which PROT_WRITE
is cleared from @allowed_prot is if the page *starts* with PROT_EXEC.
If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
then PROT_EXEC will be cleared from @allowed_prot.
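
[Editorial sketch.]  One way to read the paragraph above, as a userspace
model: when policy refuses a page that is both writable and executable,
the bit that gets dropped from @allowed_prot depends on how the page
starts.  The function name and the wx_permitted parameter are
hypothetical; this is not the logic from the SELinux patch, only an
illustration of the rule as stated.

```c
#include <stdbool.h>
#include <sys/mman.h>   /* PROT_* */

/*
 * initial_prot:  protections the page starts with
 * allowed_prot:  maximal protections requested by userspace
 * wx_permitted:  stand-in for the policy check (e.g. EXECMOD/EXECMEM)
 */
static unsigned long adjust_allowed_prot(unsigned long initial_prot,
					 unsigned long allowed_prot,
					 bool wx_permitted)
{
	const unsigned long wx = PROT_WRITE | PROT_EXEC;

	/* Nothing to restrict unless both WRITE and EXEC were requested. */
	if ((allowed_prot & wx) != wx || wx_permitted)
		return allowed_prot;

	/* Page starts executable: keep EXEC, drop WRITE. */
	if (initial_prot & PROT_EXEC)
		return allowed_prot & ~(unsigned long)PROT_WRITE;

	/* Page starts RW (e.g. an EAUG'd page): keep WRITE, drop EXEC. */
	return allowed_prot & ~(unsigned long)PROT_EXEC;
}
```

Either way the enclave keeps running; it only fails later if it actually
asks for the protection that was dropped.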

As Stephen pointed out, auditing the denials on @allowed_prot means the
log will contain false positives of a sort.  But this is more of a noise
issue than true false positives.  E.g. there are three possible outcomes
for the enclave.

  - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
    Requesting ALLOW_EXEC is either a straightforward userspace bug or
    a poorly written generic enclave loader.

  - The enclave conditionally performs EMODPE[PROT_EXEC].  In this case
    the denial is a true false positive.
  
  - The enclave does EMODPE[PROT_EXEC] and its host userspace then fails
    on mprotect(PROT_EXEC), i.e. the LSM denial is working as intended.
    The audit log will be noisy, but viewed as a whole the denials aren't
    false positives.

The potential for noisy audit logs and/or false positives is unfortunate,
but it's (by far) the lesser of many evils.

> Theoretically speaking, what you really need is a per page flag (let's name
> it WRITTEN?) indicating whether a page has ever been written to (or more
> precisely, granted PROT_WRITE), which will be used to decide whether to grant
> PROT_EXEC when requested in future. Given the fact that all mprotect() goes
> through LSM and mmap() is limited to PROT_NONE, it's easy for LSM to capture
> that flag by itself instead of asking user mode code to provide it.
>
> That said, here is the summary of what I think is a better approach.
> * In hook security_file_alloc(), if @file is an enclave, allocate some data
>   structure to store for every page, the WRITTEN flag as described above.
>   WRITTEN is cleared initially for all pages.

This would effectively require *every* LSM to duplicate the SGX driver's
functionality, e.g. track per-page metadata, implement locking to prevent
races between multiple mm structs, etc...

>   Open: Given a file of type struct file *, how to tell if it is an enclave (i.e. /dev/sgx/enclave)?
> * In hook security_mmap_file(), if @file is an enclave, make sure @prot can
>   only be PROT_NONE. This is to force all protection changes to go through
>   security_file_mprotect().
> * In the newly introduced hook security_enclave_load(), set WRITTEN for pages
>   that are requested PROT_WRITE.

How would an LSM associate a page with a specific enclave?  vma->vm_file
will always point at /dev/sgx/enclave.  vma->vm_mm is useless because
we're allowing multiple processes to map a single enclave, not to mention
that tracking by mm would require holding a reference to the mm.

> * In hook security_file_mprotect(), if @vma->vm_file is an enclave, look up
>   and use WRITTEN flags for all pages within @vma, along with other global
>   flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux) to decide
>   on allowing/rejecting @prot.

vma->vm_file will always be /dev/sgx/enclave at this point, which means
LSMs don't have the necessary anchor back to the source file, e.g. to
enforce FILE__EXECUTE.  The noexec file system case is also unaddressed.

> * In hook security_file_free(), if @file is an  enclave, free storage
>   allocated for WRITTEN flags.

Stephen Smalley June 3, 2019, 5:47 p.m. UTC | #3

On 6/2/19 3:29 AM, Xing, Cedric wrote:
> I couldn't understand the real intentions of ALLOW_* flags until I saw them in code. I have to say C is more expressive than English in that regard :)
> 
> Generally I agree with your direction but think ALLOW_* flags are completely internal to LSM because they can be both produced and consumed inside an LSM module. So spilling them into SGX driver and also user mode code makes the solution ugly and in some cases impractical because not every enclave host process has a priori knowledge on whether or not an enclave page would be EMODPE'd at runtime.
> 
> Theoretically speaking, what you really need is a per page flag (let's name it WRITTEN?) indicating whether a page has ever been written to (or more precisely, granted PROT_WRITE), which will be used to decide whether to grant PROT_EXEC when requested in future. Given the fact that all mprotect() goes through LSM and mmap() is limited to PROT_NONE, it's easy for LSM to capture that flag by itself instead of asking user mode code to provide it.
> 
> That said, here is the summary of what I think is a better approach.
> * In hook security_file_alloc(), if @file is an enclave, allocate some data structure to store for every page, the WRITTEN flag as described above. WRITTEN is cleared initially for all pages.
>    Open: Given a file of type struct file *, how to tell if it is an enclave (i.e. /dev/sgx/enclave)?
> * In hook security_mmap_file(), if @file is an enclave, make sure @prot can only be PROT_NONE. This is to force all protection changes to go through security_file_mprotect().
> * In the newly introduced hook security_enclave_load(), set WRITTEN for pages that are requested PROT_WRITE.
> * In hook security_file_mprotect(), if @vma->vm_file is an enclave, look up and use WRITTEN flags for all pages within @vma, along with other global flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux) to decide on allowing/rejecting @prot.

At this point we have no knowledge of the source vma/file, right?  So 
what do we check FILE__EXECUTE and/or FILE__EXECMOD against? 
vma->vm_file at this point is /dev/sgx/enclave, right?

> * In hook security_file_free(), if @file is an enclave, free storage allocated for WRITTEN flags.

Xing, Cedric June 3, 2019, 6:02 p.m. UTC | #4

> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-owner@vger.kernel.org] On Behalf Of Stephen Smalley
> Sent: Monday, June 03, 2019 10:47 AM
> 
> On 6/2/19 3:29 AM, Xing, Cedric wrote:
> > * In hook security_file_mprotect(), if @vma->vm_file is an enclave,
> > look up and use WRITTEN flags for all pages within @vma, along with
> > other global flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of
> > SELinux) to decide on allowing/rejecting @prot.
> 
> At this point we have no knowledge of the source vma/file, right?  So
> what do we check FILE__EXECUTE and/or FILE__EXECMOD against?
> vma->vm_file at this point is /dev/sgx/enclave, right?

My apologies for the confusion here.

Yes, vma->vm_file is always /dev/sgx/enclave, but each open("/dev/sgx/enclave") returns a *new* file struct (let's denote it as @enclave_fd) that uniquely identifies one enclave instance, and the expectation is that @enclave_fd->f_security would be used by LSM to store enclave specific information, including ALLOW_* flags and whatever else an LSM module deems appropriate.

In the case of SELinux, and if the choice is to use FILE__EXECMOD of .sigstruct file to authorize RW->RX at runtime, then SELinux could cache that flag in @enclave_fd->f_security upon security_enclave_init().

Xing, Cedric June 3, 2019, 6:30 p.m. UTC | #5

> From: Christopherson, Sean J
> Sent: Monday, June 03, 2019 10:16 AM
> 
> On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
> > Hi Sean,
> >
> > Generally I agree with your direction but think ALLOW_* flags are
> > completely internal to LSM because they can be both produced and
> > consumed inside an LSM module. So spilling them into SGX driver and
> > also user mode code makes the solution ugly and in some cases
> > impractical because not every enclave host process has a priori
> > knowledge on whether or not an enclave page would be EMODPE'd at
> > runtime.
> 
> In this case, the host process should tag *all* pages it *might* convert
> to executable as ALLOW_EXEC.  LSMs can (and should/will) be written in
> such a way that denying ALLOW_EXEC is fatal to the enclave if and only
> if the enclave actually attempts mprotect(PROT_EXEC).

What if those pages contain self-modifying code but the host doesn't know ahead of time? Would it require ALLOW_WRITE|ALLOW_EXEC at EADD? Would that then prevent those pages from starting with PROT_EXEC?

Anyway, my point is that it is unnecessary even if it works.

> 
> Take the SELinux path for example.  The only scenario in which
> PROT_WRITE is cleared from @allowed_prot is if the page *starts* with
> PROT_EXEC.
> If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
> then PROT_EXEC will be cleared from @allowed_prot.
> 
> As Stephen pointed out, auditing the denials on @allowed_prot means the
> log will contain false positives of a sort.  But this is more of a noise
> issue than true false positives.  E.g. there are three possible outcomes
> for the enclave.
> 
>   - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
>     Requesting ALLOW_EXEC is either a straightforward userspace bug or
>     a poorly written generic enclave loader.
> 
>   - The enclave conditionally performs EMODPE[PROT_EXEC].  In this case
>     the denial is a true false positive.
> 
>   - The enclave does EMODPE[PROT_EXEC] and its host userspace then fails
>     on mprotect(PROT_EXEC), i.e. the LSM denial is working as intended.
>     The audit log will be noisy, but viewed as a whole the denials
>     aren't false positives.

What I was talking about was EMODPE[PROT_WRITE] on an RX page.

> 
> The potential for noisy audit logs and/or false positives is unfortunate,
> but it's (by far) the lesser of many evils.
> 
> > Theoretically speaking, what you really need is a per page flag (let's
> > name it WRITTEN?) indicating whether a page has ever been written to
> > (or more precisely, granted PROT_WRITE), which will be used to decide
> > whether to grant PROT_EXEC when requested in future. Given the fact
> > that all mprotect() goes through LSM and mmap() is limited to
> > PROT_NONE, it's easy for LSM to capture that flag by itself instead of
> > asking user mode code to provide it.
> >
> > That said, here is the summary of what I think is a better approach.
> > * In hook security_file_alloc(), if @file is an enclave, allocate some
> >   data structure to store for every page, the WRITTEN flag as described
> >   above. WRITTEN is cleared initially for all pages.
> 
> This would effectively require *every* LSM to duplicate the SGX driver's
> functionality, e.g. track per-page metadata, implement locking to
> prevent races between multiple mm structs, etc...

Architecturally we shouldn't dictate how an LSM makes decisions. ALLOW_* flags are no different from PROCESS__* or FILE__* flags, which are just artifacts to assist particular LSMs in decision making. They are never considered part of the LSM interface, even if LSMs other than SELinux may adopt the same or a similar approach.

If code duplication is what you are worrying about, you can put them in a library, or implement/export them in some new file (maybe security/enclave.c?) as utility functions. But spilling them into user mode is what I think is unacceptable.

> 
> >   Open: Given a file of type struct file *, how to tell if it is an
> enclave (i.e. /dev/sgx/enclave)?
> > * In hook security_mmap_file(), if @file is an enclave, make sure
> @prot can
> >   only be PROT_NONE. This is to force all protection changes to go
> through
> >   security_file_mprotect().
> > * In the newly introduced hook security_enclave_load(), set WRITTEN
> for pages
> >   that are requested PROT_WRITE.
> 
> How would an LSM associate a page with a specific enclave?  vma->vm_file
> will point always point at /dev/sgx/enclave.  vma->vm_mm is useless
> because we're allowing multiple processes to map a single enclave, not
> to mention that by mm would require holding a reference to the mm.

Each open("/dev/sgx/enclave") syscall creates a *new* instance of struct file to uniquely identify one enclave instance. What I mean is @vma->vm_file, not @vma->vm_file->f_path or @vma->vm_file->f_inode.

> 
> > * In hook security_file_mprotect(), if @vma->vm_file is an enclave,
> look up
> >   and use WRITTEN flags for all pages within @vma, along with other
> global
> >   flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux)
> to decide
> >   on allowing/rejecting @prot.
> 
> vma->vm_file will always be /dev/sgx/enclave at this point, which means
> LSMs don't have the necessary anchor back to the source file, e.g. to
> enforce FILE__EXECUTE.  The noexec file system case is also unaddressed.

vma->vm_file identifies an enclave instance uniquely. FILE__EXECUTE is checked by security_enclave_load() using @source_vma->vm_file. Once a page has been EADD'ed, whether to allow RW->RX depends on the .sigstruct file (more precisely, the file backing SIGSTRUCT), whose FILE__* attributes could be cached in vma->vm_file->f_security by security_enclave_init().
 
The noexec case should be addressed in IOC_ADD_PAGES by testing @source_vma->vm_flags & VM_MAYEXEC.

> 
> > * In hook security_file_free(), if @file is an  enclave, free storage
> >   allocated for WRITTEN flags.
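
The per-page WRITTEN bookkeeping proposed above can be sketched in userspace C roughly as follows. This is only an illustration under stated assumptions: struct encl_state, encl_check_prot() and the execmod_allowed field are hypothetical names, the policy bit stands in for whatever an LSM would actually consult (e.g. FILE__EXECMOD), and the locking that Sean points out would be required is omitted entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical per-enclave state, e.g. something hung off file->f_security. */
struct encl_state {
	bool *written;		/* one WRITTEN flag per enclave page */
	size_t nr_pages;
	bool execmod_allowed;	/* stand-in for an LSM policy decision */
};

/*
 * Hypothetical check shared by the enclave_load and file_mprotect paths.
 * Returns 0 if @prot may be applied to pages [first, first + n), -1 if not.
 * On success, pages that are granted PROT_WRITE are marked WRITTEN.
 */
static int encl_check_prot(struct encl_state *s, size_t first, size_t n,
			   int prot)
{
	size_t i;

	if (first > s->nr_pages || n > s->nr_pages - first)
		return -1;

	/* Executing a page that is, or ever was, writable needs policy approval. */
	if ((prot & PROT_EXEC) && !s->execmod_allowed) {
		for (i = first; i < first + n; i++)
			if (s->written[i] || (prot & PROT_WRITE))
				return -1;
	}

	/* Only record WRITTEN once the whole request has been accepted. */
	if (prot & PROT_WRITE)
		for (i = first; i < first + n; i++)
			s->written[i] = true;

	return 0;
}
```

In this sketch a page that was granted PROT_WRITE once can no longer become executable unless the policy bit is set, which is the RW->RX decision the WRITTEN flag is meant to support.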

Sean Christopherson June 4, 2019, 1:36 a.m. UTC | #6

On Mon, Jun 03, 2019 at 11:30:54AM -0700, Xing, Cedric wrote:
> > From: Christopherson, Sean J
> > Sent: Monday, June 03, 2019 10:16 AM
> > 
> > On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
> > > Hi Sean,
> > >
> > > Generally I agree with your direction but think ALLOW_* flags are
> > > completely internal to LSM because they can be both produced and
> > > consumed inside an LSM module. So spilling them into SGX driver and
> > > also user mode code makes the solution ugly and in some cases
> > > impractical because not every enclave host process has a priori
> > > knowledge on whether or not an enclave page would be EMODPE'd at
> > runtime.
> > 
> > In this case, the host process should tag *all* pages it *might* convert
> > to executable as ALLOW_EXEC.  LSMs can (and should/will) be written in
> > such a way that denying ALLOW_EXEC is fatal to the enclave if and only
> > if the enclave actually attempts mprotect(PROT_EXEC).
> 
> What if those pages contain self-modifying code but the host doesn't know
> ahead of time? Would it require ALLOW_WRITE|ALLOW_EXEC at EADD? Then would it
> prevent those pages to start with PROT_EXEC?

Without ALLOW_WRITE+ALLOW_EXEC, the enclave would build and launch, but
fail at mprotect(..., PROT_WRITE), e.g. when it attempted to gain write
access to do self-modifying code.  And it would fail irrespective of
LSM restrictions.

> Anyway, my point is that it is unnecessary even if it works.

Unnecessary in an ideal world, yes.  Realistically, it's the least bad
option.

> > Take the SELinux path for example.  The only scenario in which
> > PROT_WRITE is cleared from @allowed_prot is if the page *starts* with
> > PROT_EXEC.
> > If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
> > then PROT_EXEC will be cleared from @allowed_prot.
> > 
> > As Stephen pointed out, auditing the denials on @allowed_prot means the
> > log will contain false positives of a sort.  But this is more of a noise
> > issue than true false positives.  E.g. there are three possible outcomes
> > for the enclave.
> > 
> >   - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
> >     Requesting ALLOW_EXEC is either a straightforward a userspace bug or
> >     a poorly written generic enclave loader.
> > 
> >   - The enclave conditionally performs EMODPE[PROT_EXEC].  In this case
> >     the denial is a true false positive.
> > 
> >   - The enclave does EMODPE[PROT_EXEC] and its host userspace then fails
> >     on mprotect(PROT_EXEC), i.e. the LSM denial is working as intended.
> >     The audit log will be noisy, but viewed as a whole the denials
> > aren't
> >     false positives.
> 
> What I was talking about was EMODPE[PROT_WRITE] on an RX page.

As above, mprotect(..., PROT_WRITE) would fail without ALLOW_WRITE.

> > The potential for noisy audit logs and/or false positives is unfortunate,
> > but it's (by far) the lesser of many evils.
> > 
> > > Theoretically speaking, what you really need is a per page flag (let's
> > > name it WRITTEN?) indicating whether a page has ever been written to
> > > (or more precisely, granted PROT_WRITE), which will be used to decide
> > > whether to grant PROT_EXEC when requested in future. Given the fact
> > > that all mprotect() goes through LSM and mmap() is limited to
> > > PROT_NONE, it's easy for LSM to capture that flag by itself instead of
> > asking user mode code to provide it.
> > >
> > > That said, here is the summary of what I think is a better approach.
> > > * In hook security_file_alloc(), if @file is an enclave, allocate some
> > data
> > >   structure to store for every page, the WRITTEN flag as described
> > above.
> > >   WRITTEN is cleared initially for all pages.
> > 
> > This would effectively require *every* LSM to duplicate the SGX driver's
> > functionality, e.g. track per-page metadata, implement locking to
> > prevent races between multiple mm structs, etc...
> 
> Architecturally we shouldn't dictate how LSM makes decisions. ALLOW_* are no
> difference than PROCESS__* or FILE__* flags, which are just artifacts to
> assist particular LSMs in decision making. They are never considered part of
> the LSM interface, even if other LSMs than SELinux may adopt the same/similar
> approach.

No, the flags are tracked and managed by SGX.  We are not dictating LSM
behavior in any way, e.g. an LSM could completely ignore @allowed_prot and
nothing would break.

> If code duplication is what you are worrying about, you can put them in a
> library, or implement/export them in some new file (maybe
> security/enclave.c?) as utility functions.

Code duplication is the least of my concerns.  Tracking file pointers
would require a global list/tree of some form, along with a locking and/or
RCU scheme to protect accesses to that container.  Another lock would be
needed to prevent races between mprotect() calls from different processes.

> But spilling them into user mode is what I think is unacceptable.

Why is it unacceptable?  There's effectively no cost to userspace for SGX1.
The ALLOW_* flags only come into play in the event of a noexec or LSM
restriction, i.e. worst case scenario an enclave that wants to do arbitrary
self-modifying code can declare RWX on everything.
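
As a rough illustration of the ALLOW_* model described above, the following userspace C sketch checks a requested protection against per-page allowed protections. It is not the driver's code: struct encl_page and encl_may_mprotect() are hypothetical names, and the assumption is simply that each page carries an allowed-protection mask that was declared at add time and possibly narrowed by an LSM.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical per-page record; the real driver layout is not shown here. */
struct encl_page {
	int allowed_prot;	/* maximal protections, as PROT_* bits */
};

/*
 * Hypothetical mprotect()-time check: the request is refused if it asks
 * for any protection bit that is not allowed on some page in the range.
 */
static int encl_may_mprotect(const struct encl_page *pages, size_t nr_pages,
			     int prot)
{
	size_t i;

	for (i = 0; i < nr_pages; i++)
		if (prot & ~pages[i].allowed_prot)
			return -EACCES;

	return 0;
}
```

With this shape, an LSM that clears the exec bit from a page's mask does not break the enclave build; the failure only surfaces if mprotect(PROT_EXEC) is later attempted on that page.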

Jarkko Sakkinen June 4, 2019, 11:15 a.m. UTC | #7

On Fri, May 31, 2019 at 04:31:50PM -0700, Sean Christopherson wrote:
> This series is the result of a rather absurd amount of discussion over
> how to get SGX to play nice with LSM policies, without having to resort
> to evil shenanigans or put undue burden on userspace.  The discussion
> definitely wandered into completely insane territory at times, but I
> think/hope we ended up with something reasonable.

By definition this is a broken series because it does not apply to
mainline. Even an RFC series should at least apply. It would be a better
idea to discuss design ideas and use snippets instead. As it stands, you
have to take the original v20 and apply these patches on top of it to
evaluate anything.

> The basic gist of the approach is to require userspace to declare what
> protections are maximally allowed for any given page, e.g. add a flags
> field for loading enclave pages that takes ALLOW_{READ,WRITE,EXEC}.  LSMs
> can then adjust the allowed protections, e.g. clear ALLOW_EXEC to prevent
> ever mapping the page with PROT_EXEC.  SGX enforces the allowed perms
> via a new mprotect() vm_ops hook, e.g. like regular mprotect() uses
> MAY_{READ,WRITE,EXEC}.

mprotect() does not use MAY_{READ,WRITE,EXEC} constants. It uses
VM_MAY{READ,WRITE,EXEC,SHARE} constants.
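
For reference, the VM_MAY* test that mprotect() applies can be sketched in userspace C as below. The flag values mirror include/linux/mm.h; the helper name mprotect_allowed() is hypothetical and exists only for this illustration.

```c
#include <stdbool.h>

/* Flag values as defined in include/linux/mm.h. */
#define VM_READ		0x01UL
#define VM_WRITE	0x02UL
#define VM_EXEC		0x04UL
#define VM_MAYREAD	0x10UL
#define VM_MAYWRITE	0x20UL
#define VM_MAYEXEC	0x40UL

/*
 * Hypothetical helper: true if every VM_{READ,WRITE,EXEC} bit requested in
 * @newflags has its VM_MAY* counterpart set.  Shifting right by 4 moves
 * each VM_MAY* bit onto the matching VM_* bit.
 */
static bool mprotect_allowed(unsigned long newflags)
{
	return !((newflags & ~(newflags >> 4)) &
		 (VM_READ | VM_WRITE | VM_EXEC));
}
```

A request for VM_EXEC without VM_MAYEXEC fails this test, which is the case where mprotect() returns -EACCES.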

What are ALLOW_{READ,WRITE,EXEC} and how are they used? What does the
hook do and why is it in vm_ops and not in file_operations? Are they
arguments to the ioctl or internal variables that are set based on
SECINFO?

/Jarkko

Stephen Smalley June 4, 2019, 3:33 p.m. UTC | #8

On 6/3/19 2:30 PM, Xing, Cedric wrote:
>> From: Christopherson, Sean J
>> Sent: Monday, June 03, 2019 10:16 AM
>>
>> On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
>>> Hi Sean,
>>>
>>> Generally I agree with your direction but think ALLOW_* flags are
>>> completely internal to LSM because they can be both produced and
>>> consumed inside an LSM module. So spilling them into SGX driver and
>>> also user mode code makes the solution ugly and in some cases
>>> impractical because not every enclave host process has a priori
>>> knowledge on whether or not an enclave page would be EMODPE'd at
>> runtime.
>>
>> In this case, the host process should tag *all* pages it *might* convert
>> to executable as ALLOW_EXEC.  LSMs can (and should/will) be written in
>> such a way that denying ALLOW_EXEC is fatal to the enclave if and only
>> if the enclave actually attempts mprotect(PROT_EXEC).
> 
> What if those pages contain self-modifying code but the host doesn't know ahead of time? Would it require ALLOW_WRITE|ALLOW_EXEC at EADD? Then would it prevent those pages to start with PROT_EXEC?
> 
> Anyway, my point is that it is unnecessary even if it works.
> 
>>
>> Take the SELinux path for example.  The only scenario in which
>> PROT_WRITE is cleared from @allowed_prot is if the page *starts* with
>> PROT_EXEC.
>> If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
>> then PROT_EXEC will be cleared from @allowed_prot.
>>
>> As Stephen pointed out, auditing the denials on @allowed_prot means the
>> log will contain false positives of a sort.  But this is more of a noise
>> issue than true false positives.  E.g. there are three possible outcomes
>> for the enclave.
>>
>>    - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
>>      Requesting ALLOW_EXEC is either a straightforward a userspace bug or
>>      a poorly written generic enclave loader.
>>
>>    - The enclave conditionally performs EMODPE[PROT_EXEC].  In this case
>>      the denial is a true false positive.
>>
>>    - The enclave does EMODPE[PROT_EXEC] and its host userspace then fails
>>      on mprotect(PROT_EXEC), i.e. the LSM denial is working as intended.
>>      The audit log will be noisy, but viewed as a whole the denials
>> aren't
>>      false positives.
> 
> What I was talking about was EMODPE[PROT_WRITE] on an RX page.
> 
>>
>> The potential for noisy audit logs and/or false positives is unfortunate,
>> but it's (by far) the lesser of many evils.
>>
>>> Theoretically speaking, what you really need is a per page flag (let's
>>> name it WRITTEN?) indicating whether a page has ever been written to
>>> (or more precisely, granted PROT_WRITE), which will be used to decide
>>> whether to grant PROT_EXEC when requested in future. Given the fact
>>> that all mprotect() goes through LSM and mmap() is limited to
>>> PROT_NONE, it's easy for LSM to capture that flag by itself instead of
>> asking user mode code to provide it.
>>>
>>> That said, here is the summary of what I think is a better approach.
>>> * In hook security_file_alloc(), if @file is an enclave, allocate some
>> data
>>>    structure to store for every page, the WRITTEN flag as described
>> above.
>>>    WRITTEN is cleared initially for all pages.
>>
>> This would effectively require *every* LSM to duplicate the SGX driver's
>> functionality, e.g. track per-page metadata, implement locking to
>> prevent races between multiple mm structs, etc...
> 
> Architecturally we shouldn't dictate how LSM makes decisions. ALLOW_* are no difference than PROCESS__* or FILE__* flags, which are just artifacts to assist particular LSMs in decision making. They are never considered part of the LSM interface, even if other LSMs than SELinux may adopt the same/similar approach.
> 
> If code duplication is what you are worrying about, you can put them in a library, or implement/export them in some new file (maybe security/enclave.c?) as utility functions. But spilling them into user mode is what I think is unacceptable.
> 
>>
>>>    Open: Given a file of type struct file *, how to tell if it is an
>> enclave (i.e. /dev/sgx/enclave)?
>>> * In hook security_mmap_file(), if @file is an enclave, make sure
>> @prot can
>>>    only be PROT_NONE. This is to force all protection changes to go
>> through
>>>    security_file_mprotect().
>>> * In the newly introduced hook security_enclave_load(), set WRITTEN
>> for pages
>>>    that are requested PROT_WRITE.
>>
>> How would an LSM associate a page with a specific enclave?  vma->vm_file
>> will point always point at /dev/sgx/enclave.  vma->vm_mm is useless
>> because we're allowing multiple processes to map a single enclave, not
>> to mention that by mm would require holding a reference to the mm.
> 
> Each open("/dev/sgx/enclave") syscall creates a *new* instance of struct file to uniquely identify one enclave instance. What I mean is @vma->vm_file, not @vma->vm_file->f_path or @vma->vm_file->f_inode.
> 
>>
>>> * In hook security_file_mprotect(), if @vma->vm_file is an enclave,
>> look up
>>>    and use WRITTEN flags for all pages within @vma, along with other
>> global
>>>    flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux)
>> to decide
>>>    on allowing/rejecting @prot.
>>
>> vma->vm_file will always be /dev/sgx/enclave at this point, which means
>> LSMs don't have the necessary anchor back to the source file, e.g. to
>> enforce FILE__EXECUTE.  The noexec file system case is also unaddressed.
> 
> vma->vm_file identifies an enclave instance uniquely. FILE__EXECUTE is checked by security_enclave_load() using @source_vma->vm_file. Once a page has been EADD'ed, whether to allow RW->RX depends on .sigstruct file (more precisely, the file backing SIGSTRUCT), whose FILE__* attributes could be cached in vma->vm_file->f_security by security_enclave_init().

The RFC series seemed to dispense with the use of the sigstruct file and 
just used the source file throughout IIUC.  That allowed for reuse of 
FILE__* permissions without ambiguity rather than introducing separate 
ENCLAVE__* permissions or using /dev/sgx/enclave inode as the target of 
all checks.

Regardless, IIUC, your approach requires that we always check 
FILE__EXECMOD, and FILE__EXECUTE up front during security_enclave_load() 
irrespective of prot so that we can save the result in the f_security 
for later use by the mprotect hook.  This may generate many spurious 
audit messages for cases where PROT_EXEC will never be requested, and 
users will be prone to just always allowing it since they cannot tell 
when it was actually needed.

>   
> The noexec case should be addressed in IOC_ADD_PAGES by testing @source_vma->vm_flags & VM_MAYEXEC.
> 
>>
>>> * In hook security_file_free(), if @file is an  enclave, free storage
>>>    allocated for WRITTEN flags.

Sean Christopherson June 4, 2019, 4:30 p.m. UTC | #9

On Tue, Jun 04, 2019 at 11:33:44AM -0400, Stephen Smalley wrote:
> The RFC series seemed to dispense with the use of the sigstruct file and
> just used the source file throughout IIUC.  That allowed for reuse of
> FILE__* permissions without ambiguity rather than introducing separate
> ENCLAVE__* permissions or using /dev/sgx/enclave inode as the target of all
> checks.

Drat, I meant to explicitly call that out in the cover letter.  Yes, the
concept of using sigstruct as a proxy was dropped for this RFC.  The
primary motivation was to avoid having to take and hold a reference to the
sigstruct file for the lifetime of the enclave, and in general so that
userspace isn't forced to put sigstruct into a file.

> Regardless, IIUC, your approach requires that we always check FILE__EXECMOD,
> and FILE__EXECUTE up front during security_enclave_load() irrespective of
> prot so that we can save the result in the f_security for later use by the
> mprotect hook.

Correct, this approach requires up front checks.

> This may generate many spurious audit messages for cases
> where PROT_EXEC will never be requested, and users will be prone to just
> always allowing it since they cannot tell when it was actually needed.

Userspace will be able to understand when PROT_EXEC is actually needed
as mprotect() will (eventually) fail.  Of course that assumes userspace
is being intelligent and isn't blindly declaring permissions they don't
need, e.g. declaring RWX on all pages even though the enclave never
actually maps a RWX or RW->RX page.

One thought for handling this in a more user friendly fashion would be
to immediately return -EACCES instead of modifying @allowed_prot.  An
enclave that truly needs the permission would fail immediately.

An enclave loader that wants/needs to speculatively declare PROT_EXEC,
e.g. because the exact requirements of the enclave are unknown, could
handle -EACCES gracefully by retrying the SGX ioctl() with different
@allowed_prot, e.g.:

  region.flags = SGX_ALLOW_READ | SGX_ALLOW_WRITE | SGX_ALLOW_EXEC;

  ret = ioctl(fd, SGX_IOC_ENCLAVE_ADD_REGION, &region);
  if (ret && errno == EACCES && !(prot & PROT_EXEC)) {
      region.flags &= ~SGX_ALLOW_EXEC;
      ret = ioctl(fd, SGX_IOC_ENCLAVE_ADD_REGION, &region);
  }

This type of enclave loader would still generate spurious audit messages,
but the spurious messages would be limited to enclave loaders that are
deliberately probing the allowed permissions.
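
The retry pattern can be exercised without SGX hardware using a stand-in for the ioctl(). Everything named below is hypothetical: the ALLOW_* values, the add_region_fn callback type, add_region_with_fallback() and the deny_exec() mock are made up for illustration and are not part of the proposed uapi.

```c
#include <errno.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define ALLOW_READ	0x1u
#define ALLOW_WRITE	0x2u
#define ALLOW_EXEC	0x4u

/* Stand-in for the add-region ioctl(): returns 0, or -1 with errno set. */
typedef int (*add_region_fn)(unsigned int flags);

/*
 * Try with the declared flags first.  If that is refused with EACCES and
 * the enclave does not strictly need PROT_EXEC, drop ALLOW_EXEC and retry.
 */
static int add_region_with_fallback(add_region_fn add_region,
				    unsigned int *flags, int need_exec)
{
	int ret = add_region(*flags);

	if (ret && errno == EACCES && !need_exec && (*flags & ALLOW_EXEC)) {
		*flags &= ~ALLOW_EXEC;
		ret = add_region(*flags);
	}
	return ret;
}

/* Mock policy for demonstration: refuse any request that includes exec. */
static int deny_exec(unsigned int flags)
{
	if (flags & ALLOW_EXEC) {
		errno = EACCES;
		return -1;
	}
	return 0;
}
```

Calling add_region_with_fallback(deny_exec, &flags, 0) with all three flags set succeeds on the second attempt and leaves ALLOW_EXEC cleared in flags; with need_exec set, the first failure is returned unchanged.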

> >The noexec case should be addressed in IOC_ADD_PAGES by testing
> >@source_vma->vm_flags & VM_MAYEXEC.
> >
> >>
> >>>* In hook security_file_free(), if @file is an  enclave, free storage
> >>>   allocated for WRITTEN flags.
>

Xing, Cedric June 4, 2019, 9:38 p.m. UTC | #10

Hi Stephen,

> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> owner@vger.kernel.org] On Behalf Of Stephen Smalley
> Sent: Tuesday, June 04, 2019 8:34 AM
> 
> On 6/3/19 2:30 PM, Xing, Cedric wrote:
> >> From: Christopherson, Sean J
> >> Sent: Monday, June 03, 2019 10:16 AM
> >>
> >> On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
> >>> Hi Sean,
> >>>
> >>> Generally I agree with your direction but think ALLOW_* flags are
> >>> completely internal to LSM because they can be both produced and
> >>> consumed inside an LSM module. So spilling them into SGX driver and
> >>> also user mode code makes the solution ugly and in some cases
> >>> impractical because not every enclave host process has a priori
> >>> knowledge on whether or not an enclave page would be EMODPE'd at
> >> runtime.
> >>
> >> In this case, the host process should tag *all* pages it *might*
> convert
> >> to executable as ALLOW_EXEC.  LSMs can (and should/will) be written
> in
> >> such a way that denying ALLOW_EXEC is fatal to the enclave if and
> only
> >> if the enclave actually attempts mprotect(PROT_EXEC).
> >
> > What if those pages contain self-modifying code but the host doesn't
> know ahead of time? Would it require ALLOW_WRITE|ALLOW_EXEC at EADD?
> Then would it prevent those pages to start with PROT_EXEC?
> >
> > Anyway, my point is that it is unnecessary even if it works.
> >
> >>
> >> Take the SELinux path for example.  The only scenario in which
> >> PROT_WRITE is cleared from @allowed_prot is if the page *starts* with
> >> PROT_EXEC.
> >> If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
> >> then PROT_EXEC will be cleared from @allowed_prot.
> >>
> >> As Stephen pointed out, auditing the denials on @allowed_prot means
> the
> >> log will contain false positives of a sort.  But this is more of a
> noise
> >> issue than true false positives.  E.g. there are three possible
> outcomes
> >> for the enclave.
> >>
> >>    - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
> >>      Requesting ALLOW_EXEC is either a straightforward a userspace
> bug or
> >>      a poorly written generic enclave loader.
> >>
> >>    - The enclave conditionally performs EMODPE[PROT_EXEC].  In this
> case
> >>      the denial is a true false positive.
> >>
> >>    - The enclave does EMODPE[PROT_EXEC] and its host userspace then
> fails
> >>      on mprotect(PROT_EXEC), i.e. the LSM denial is working as
> intended.
> >>      The audit log will be noisy, but viewed as a whole the denials
> >> aren't
> >>      false positives.
> >
> > What I was talking about was EMODPE[PROT_WRITE] on an RX page.
> >
> >>
> >> The potential for noisy audit logs and/or false positives is
> unfortunate,
> >> but it's (by far) the lesser of many evils.
> >>
> >>> Theoretically speaking, what you really need is a per page flag
> (let's
> >>> name it WRITTEN?) indicating whether a page has ever been written to
> >>> (or more precisely, granted PROT_WRITE), which will be used to
> decide
> >>> whether to grant PROT_EXEC when requested in future. Given the fact
> >>> that all mprotect() goes through LSM and mmap() is limited to
> >>> PROT_NONE, it's easy for LSM to capture that flag by itself instead
> of
> >> asking user mode code to provide it.
> >>>
> >>> That said, here is the summary of what I think is a better approach.
> >>> * In hook security_file_alloc(), if @file is an enclave, allocate
> some
> >> data
> >>>    structure to store for every page, the WRITTEN flag as described
> >> above.
> >>>    WRITTEN is cleared initially for all pages.
> >>
> >> This would effectively require *every* LSM to duplicate the SGX
> driver's
> >> functionality, e.g. track per-page metadata, implement locking to
> >> prevent races between multiple mm structs, etc...
> >
> > Architecturally we shouldn't dictate how LSM makes decisions. ALLOW_*
> are no difference than PROCESS__* or FILE__* flags, which are just
> artifacts to assist particular LSMs in decision making. They are never
> considered part of the LSM interface, even if other LSMs than SELinux
> may adopt the same/similar approach.
> >
> > If code duplication is what you are worrying about, you can put them
> in a library, or implement/export them in some new file (maybe
> security/enclave.c?) as utility functions. But spilling them into user
> mode is what I think is unacceptable.
> >
> >>
> >>>    Open: Given a file of type struct file *, how to tell if it is an
> >> enclave (i.e. /dev/sgx/enclave)?
> >>> * In hook security_mmap_file(), if @file is an enclave, make sure
> >> @prot can
> >>>    only be PROT_NONE. This is to force all protection changes to go
> >> through
> >>>    security_file_mprotect().
> >>> * In the newly introduced hook security_enclave_load(), set WRITTEN
> >> for pages
> >>>    that are requested PROT_WRITE.
> >>
> >> How would an LSM associate a page with a specific enclave?  vma-
> >vm_file
> >> will point always point at /dev/sgx/enclave.  vma->vm_mm is useless
> >> because we're allowing multiple processes to map a single enclave,
> not
> >> to mention that by mm would require holding a reference to the mm.
> >
> > Each open("/dev/sgx/enclave") syscall creates a *new* instance of
> struct file to uniquely identify one enclave instance. What I mean is
> @vma->vm_file, not @vma->vm_file->f_path or @vma->vm_file->f_inode.
> >
> >>
> >>> * In hook security_file_mprotect(), if @vma->vm_file is an enclave,
> >> look up
> >>>    and use WRITTEN flags for all pages within @vma, along with other
> >> global
> >>>    flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux)
> >> to decide
> >>>    on allowing/rejecting @prot.
> >>
> >> vma->vm_file will always be /dev/sgx/enclave at this point, which
> means
> >> LSMs don't have the necessary anchor back to the source file, e.g. to
> >> enforce FILE__EXECUTE.  The noexec file system case is also
> unaddressed.
> >
> > vma->vm_file identifies an enclave instance uniquely. FILE__EXECUTE is
> checked by security_enclave_load() using @source_vma->vm_file. Once a
> page has been EADD'ed, whether to allow RW->RX depends on .sigstruct
> file (more precisely, the file backing SIGSTRUCT), whose FILE__*
> attributes could be cached in vma->vm_file->f_security by
> security_enclave_init().
> 
> The RFC series seemed to dispense with the use of the sigstruct file and
> just used the source file throughout IIUC.  That allowed for reuse of
> FILE__* permissions without ambiguity rather than introducing separate
> ENCLAVE__* permissions or using /dev/sgx/enclave inode as the target of
> all checks.

That's right. But that's not the distinction between Sean's patch and my proposal.

My point here is, from the perspective of LSM architecture, LSM hooks shall be defined to pass adequate information to allow *all* possible implementations to make reasonable decisions. In particular, all parameters to EADD (i.e. target linear address, SECINFO, source page) could affect (current and future) decisions but Sean's definition of security_enclave_load() passes only the source, which limits the possibility of other implementations. Another point I'm trying to make is, different LSM implementations may need different information at any given decision point, therefore it's *not* possible to always pass "right" information at "right" time. And that's why I think .f_security was added to struct file to allow stateful LSMs. Sean's patch however is trying the opposite, as ALLOW_* flags should otherwise be part of internal state of LSMs, but they are spilled into SGX driver and also userspace. 

> 
> Regardless, IIUC, your approach requires that we always check
> FILE__EXECMOD, and FILE__EXECUTE up front during security_enclave_load()
> irrespective of prot so that we can save the result in the f_security
> for later use by the mprotect hook.  This may generate many spurious
> audit messages for cases where PROT_EXEC will never be requested, and
> users will be prone to just always allowing it since they cannot tell
> when it was actually needed.

Yes and no. 

For those not following this discussion closely, here's the prototype of security_enclave_load() that I proposed in one of my earlier emails.

  int security_enclave_load(struct file *enclave_fd,
                            unsigned long linear_address,
                            unsigned long nr_pages, int prot,
                            struct vm_area_struct *source_vma);

@enclave_fd identifies the enclave to which new pages are being added.
@linear_address/@nr_pages specifies the linear range of pages being added.
@prot specifies the initial protection of those newly added pages. It is taken from the vma covering the target range.
@source_vma covers the source pages in the case of EADD. An LSM is expected to make sure security_file_mprotect(source_vma, prot, prot) would succeed before checking anything else, unless @source_vma is NULL, indicating pages are being EAUG'ed. In all cases, LSM is expected to "remember" @prot for all those pages to be checked in future security_file_mprotect() invocations.

Architecture wise, the idea here is for SGX driver to pass in all information relevant for an LSM's decision. 

Implementation wise, LSM may allow PROT_EXEC depending on FILE__EXECUTE of the source file (@source_vma->vm_file), or the sigstruct file (will be provided to LSM at security_enclave_init), or /dev/sgx/enclave. It makes most sense to me to use the source file, hence the check would most likely be done here. For future security_file_mprotect(), the source file's FILE__EXECMOD could also be cached here, or it could use sigstruct file's FILE__EXECMOD instead. Given the fact that no EPC pages could be accessed before EINIT, the major purpose of security_enclave_load() is for LSM to cache whatever information deemed appropriate for the pages being EADD'ed (i.e. @source_vma != NULL) so that it has necessary information to make decisions in security_file_mprotect() in future. And in that regard the parameter @prot is unnecessary but I decided to have it here for 2 reasons: 1) Pages may be EADD'ed within an existing VMA (so no upcoming mprotect) so LSM's decision is needed right away and 2) @source_vma may not be able to mprotect(@prot) and in that case it'd be better to fail EADD instead of failing at mprotect() later.

> 
> >
> > The noexec case should be addressed in IOC_ADD_PAGES by testing
> @source_vma->vm_flags & VM_MAYEXEC.
> >
> >>
> >>> * In hook security_file_free(), if @file is an  enclave, free
> storage
> >>>    allocated for WRITTEN flags.
