[RFC,2/9] x86/sgx: Do not naturally align MAP_FIXED address
diff mbox series

Message ID 20190531233159.30992-3-sean.j.christopherson@intel.com
State New
Headers show
Series
  • security: x86/sgx: SGX vs. LSM
Related show

Commit Message

Sean Christopherson May 31, 2019, 11:31 p.m. UTC
SGX enclaves have an associated Enclave Linear Range (ELRANGE) that is
tracked and enforced by the CPU using a base+mask approach, similar to
how hardware range registers such as the variable MTRRs.  As a result,
the ELRANGE must be naturally sized and aligned.

To reduce boilerplate code that would be needed in every userspace
enclave loader, the SGX driver naturally aligns the mmap() address and
also requires the range to be naturally sized.  Unfortunately, SGX fails
to grant a waiver to the MAP_FIXED case, e.g. incorrectly rejects mmap()
if userspace is attempting to map a small slice of an existing enclave.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/driver/main.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Jarkko Sakkinen June 4, 2019, 11:49 a.m. UTC | #1
On Fri, May 31, 2019 at 04:31:52PM -0700, Sean Christopherson wrote:
> SGX enclaves have an associated Enclave Linear Range (ELRANGE) that is
> tracked and enforced by the CPU using a base+mask approach, similar to
> how hardware range registers such as the variable MTRRs.  As a result,
> the ELRANGE must be naturally sized and aligned.
> 
> To reduce boilerplate code that would be needed in every userspace
> enclave loader, the SGX driver naturally aligns the mmap() address and
> also requires the range to be naturally sized.  Unfortunately, SGX fails
> to grant a waiver to the MAP_FIXED case, e.g. incorrectly rejects mmap()
> if userspace is attempting to map a small slice of an existing enclave.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Why you want to allow mmap() to be called multiple times? mmap() could
be allowed only once with PROT_NONE and denied afterwards. Is this for
sending fd to another process that would map already existing enclave?

I don't see any checks for whether the is enclave underneath. Also, I
think that in all cases mmap() callback should allow only PROT_NONE
as permissions for clarity even if it could called multiple times.

/Jarkko
Andy Lutomirski June 4, 2019, 8:16 p.m. UTC | #2
On Tue, Jun 4, 2019 at 4:50 AM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> On Fri, May 31, 2019 at 04:31:52PM -0700, Sean Christopherson wrote:
> > SGX enclaves have an associated Enclave Linear Range (ELRANGE) that is
> > tracked and enforced by the CPU using a base+mask approach, similar to
> > how hardware range registers such as the variable MTRRs.  As a result,
> > the ELRANGE must be naturally sized and aligned.
> >
> > To reduce boilerplate code that would be needed in every userspace
> > enclave loader, the SGX driver naturally aligns the mmap() address and
> > also requires the range to be naturally sized.  Unfortunately, SGX fails
> > to grant a waiver to the MAP_FIXED case, e.g. incorrectly rejects mmap()
> > if userspace is attempting to map a small slice of an existing enclave.
> >
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>
> Why you want to allow mmap() to be called multiple times? mmap() could
> be allowed only once with PROT_NONE and denied afterwards. Is this for
> sending fd to another process that would map already existing enclave?
>
> I don't see any checks for whether the is enclave underneath. Also, I
> think that in all cases mmap() callback should allow only PROT_NONE
> as permissions for clarity even if it could called multiple times.
>

What's the advantage to only allowing PROT_NONE?  The idea here is to
allow a PROT_NONE map followed by some replacemets that overlay it for
the individual segments.  Admittedly, mprotect() can do the same
thing, but disallowing mmap() seems at least a bit surprising.
Xing, Cedric June 4, 2019, 10:10 p.m. UTC | #3
> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> owner@vger.kernel.org] On Behalf Of Andy Lutomirski
> Sent: Tuesday, June 04, 2019 1:16 PM
> 
> On Tue, Jun 4, 2019 at 4:50 AM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > On Fri, May 31, 2019 at 04:31:52PM -0700, Sean Christopherson wrote:
> > > SGX enclaves have an associated Enclave Linear Range (ELRANGE) that
> > > is tracked and enforced by the CPU using a base+mask approach,
> > > similar to how hardware range registers such as the variable MTRRs.
> > > As a result, the ELRANGE must be naturally sized and aligned.
> > >
> > > To reduce boilerplate code that would be needed in every userspace
> > > enclave loader, the SGX driver naturally aligns the mmap() address
> > > and also requires the range to be naturally sized.  Unfortunately,
> > > SGX fails to grant a waiver to the MAP_FIXED case, e.g. incorrectly
> > > rejects mmap() if userspace is attempting to map a small slice of an
> existing enclave.
> > >
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> >
> > Why you want to allow mmap() to be called multiple times? mmap() could
> > be allowed only once with PROT_NONE and denied afterwards. Is this for
> > sending fd to another process that would map already existing enclave?
> >
> > I don't see any checks for whether the is enclave underneath. Also, I
> > think that in all cases mmap() callback should allow only PROT_NONE as
> > permissions for clarity even if it could called multiple times.
> >
> 
> What's the advantage to only allowing PROT_NONE?  The idea here is to
> allow a PROT_NONE map followed by some replacemets that overlay it for
> the individual segments.  Admittedly, mprotect() can do the same thing,
> but disallowing mmap() seems at least a bit surprising.

Disallowing mmap() is not only surprising but also unnecessary.

A bit off topic here. This mmap()/mprotect() discussion reminds me a question (guess for Jarkko): Now that vma->vm_file->private_data keeps a pointer to the enclave, why do we store it again in vma->vm_private? It isn't a big deal but non-NULL vm_private does prevent mprotect() from merging adjacent VMAs.
Sean Christopherson June 5, 2019, 2:08 p.m. UTC | #4
On Tue, Jun 04, 2019 at 03:10:22PM -0700, Xing, Cedric wrote:
> A bit off topic here. This mmap()/mprotect() discussion reminds me a question
> (guess for Jarkko): Now that vma->vm_file->private_data keeps a pointer to
> the enclave, why do we store it again in vma->vm_private? It isn't a big deal
> but non-NULL vm_private does prevent mprotect() from merging adjacent VMAs. 

vma->vm_ops->close also prevents merging, and we need that to refcount the
enclave and mm.

We also rely on nullifying vma->vm_private_data in the unlikely event that
adding a new mm to the enclave fails on kzalloc().
Jarkko Sakkinen June 5, 2019, 3:15 p.m. UTC | #5
On Tue, Jun 04, 2019 at 01:16:04PM -0700, Andy Lutomirski wrote:
> On Tue, Jun 4, 2019 at 4:50 AM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > On Fri, May 31, 2019 at 04:31:52PM -0700, Sean Christopherson wrote:
> > > SGX enclaves have an associated Enclave Linear Range (ELRANGE) that is
> > > tracked and enforced by the CPU using a base+mask approach, similar to
> > > how hardware range registers such as the variable MTRRs.  As a result,
> > > the ELRANGE must be naturally sized and aligned.
> > >
> > > To reduce boilerplate code that would be needed in every userspace
> > > enclave loader, the SGX driver naturally aligns the mmap() address and
> > > also requires the range to be naturally sized.  Unfortunately, SGX fails
> > > to grant a waiver to the MAP_FIXED case, e.g. incorrectly rejects mmap()
> > > if userspace is attempting to map a small slice of an existing enclave.
> > >
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> >
> > Why you want to allow mmap() to be called multiple times? mmap() could
> > be allowed only once with PROT_NONE and denied afterwards. Is this for
> > sending fd to another process that would map already existing enclave?
> >
> > I don't see any checks for whether the is enclave underneath. Also, I
> > think that in all cases mmap() callback should allow only PROT_NONE
> > as permissions for clarity even if it could called multiple times.
> >
> 
> What's the advantage to only allowing PROT_NONE?  The idea here is to
> allow a PROT_NONE map followed by some replacemets that overlay it for
> the individual segments.  Admittedly, mprotect() can do the same
> thing, but disallowing mmap() seems at least a bit surprising.

I was merely wondering if it is specifically for the application where a
client process would mmap(MAP_FIXED) an enclave created by a server
process.

/Jarkko
Jarkko Sakkinen June 5, 2019, 3:17 p.m. UTC | #6
On Tue, Jun 04, 2019 at 10:10:22PM +0000, Xing, Cedric wrote:
> A bit off topic here. This mmap()/mprotect() discussion reminds me a
> question (guess for Jarkko): Now that vma->vm_file->private_data keeps
> a pointer to the enclave, why do we store it again in vma->vm_private?
> It isn't a big deal but non-NULL vm_private does prevent mprotect()
> from merging adjacent VMAs. 

Same semantics as with a regular mmap i.e. you can close the file and
still use the mapping.

/Jarkko
Andy Lutomirski June 5, 2019, 8:14 p.m. UTC | #7
> On Jun 5, 2019, at 8:17 AM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> 
>> On Tue, Jun 04, 2019 at 10:10:22PM +0000, Xing, Cedric wrote:
>> A bit off topic here. This mmap()/mprotect() discussion reminds me a
>> question (guess for Jarkko): Now that vma->vm_file->private_data keeps
>> a pointer to the enclave, why do we store it again in vma->vm_private?
>> It isn't a big deal but non-NULL vm_private does prevent mprotect()
>> from merging adjacent VMAs. 
> 
> Same semantics as with a regular mmap i.e. you can close the file and
> still use the mapping.
> 
> 

The file should be properly refcounted — vm_file should not go away while it’s mapped.
Jarkko Sakkinen June 6, 2019, 3:37 p.m. UTC | #8
On Wed, Jun 05, 2019 at 01:14:04PM -0700, Andy Lutomirski wrote:
> 
> 
> > On Jun 5, 2019, at 8:17 AM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> > 
> >> On Tue, Jun 04, 2019 at 10:10:22PM +0000, Xing, Cedric wrote:
> >> A bit off topic here. This mmap()/mprotect() discussion reminds me a
> >> question (guess for Jarkko): Now that vma->vm_file->private_data keeps
> >> a pointer to the enclave, why do we store it again in vma->vm_private?
> >> It isn't a big deal but non-NULL vm_private does prevent mprotect()
> >> from merging adjacent VMAs. 
> > 
> > Same semantics as with a regular mmap i.e. you can close the file and
> > still use the mapping.
> > 
> > 
> 
> The file should be properly refcounted — vm_file should not go away while it’s mapped.

Right, makes sense. It is easy one to change essentially just removing
internal refcount from sgx_encl and using file for the same. I'll update
this to my tree along with the changes to remove LKM/ACPI bits ASAP.

/Jarkko
Jarkko Sakkinen June 13, 2019, 1:48 p.m. UTC | #9
On Thu, Jun 06, 2019 at 06:37:10PM +0300, Jarkko Sakkinen wrote:
> On Wed, Jun 05, 2019 at 01:14:04PM -0700, Andy Lutomirski wrote:
> > 
> > 
> > > On Jun 5, 2019, at 8:17 AM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:
> > > 
> > >> On Tue, Jun 04, 2019 at 10:10:22PM +0000, Xing, Cedric wrote:
> > >> A bit off topic here. This mmap()/mprotect() discussion reminds me a
> > >> question (guess for Jarkko): Now that vma->vm_file->private_data keeps
> > >> a pointer to the enclave, why do we store it again in vma->vm_private?
> > >> It isn't a big deal but non-NULL vm_private does prevent mprotect()
> > >> from merging adjacent VMAs. 
> > > 
> > > Same semantics as with a regular mmap i.e. you can close the file and
> > > still use the mapping.
> > > 
> > > 
> > 
> > The file should be properly refcounted — vm_file should not go away while it’s mapped.

mm already takes care of that so vm_file does not go away. Still
we need an internal refcount for enclaves to synchronize with the
swapper. In summary nothing needs to be done.

> Right, makes sense. It is easy one to change essentially just removing
> internal refcount from sgx_encl and using file for the same. I'll update
> this to my tree along with the changes to remove LKM/ACPI bits ASAP.

/Jarkko
Xing, Cedric June 13, 2019, 4:47 p.m. UTC | #10
> From: Jarkko Sakkinen [mailto:jarkko.sakkinen@linux.intel.com]
> Sent: Thursday, June 13, 2019 6:48 AM
> 
> On Thu, Jun 06, 2019 at 06:37:10PM +0300, Jarkko Sakkinen wrote:
> > On Wed, Jun 05, 2019 at 01:14:04PM -0700, Andy Lutomirski wrote:
> > >
> > >
> > > > On Jun 5, 2019, at 8:17 AM, Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> > > >
> > > >> On Tue, Jun 04, 2019 at 10:10:22PM +0000, Xing, Cedric wrote:
> > > >> A bit off topic here. This mmap()/mprotect() discussion reminds
> > > >> me a question (guess for Jarkko): Now that
> > > >> vma->vm_file->private_data keeps a pointer to the enclave, why do
> we store it again in vma->vm_private?
> > > >> It isn't a big deal but non-NULL vm_private does prevent
> > > >> mprotect() from merging adjacent VMAs.
> > > >
> > > > Same semantics as with a regular mmap i.e. you can close the file
> > > > and still use the mapping.
> > > >
> > > >
> > >
> > > The file should be properly refcounted — vm_file should not go away
> while it’s mapped.
> 
> mm already takes care of that so vm_file does not go away. Still we need
> an internal refcount for enclaves to synchronize with the swapper. In
> summary nothing needs to be done.

I don't get this. The swapper takes a read lock on mm->mmap_sem, which locks the vma, which in turn reference counts vma->vm_file. Why is the internal refcount still needed? 

> 
> > Right, makes sense. It is easy one to change essentially just removing
> > internal refcount from sgx_encl and using file for the same. I'll
> > update this to my tree along with the changes to remove LKM/ACPI bits
> ASAP.
> 
> /Jarkko
Sean Christopherson June 13, 2019, 5:14 p.m. UTC | #11
On Thu, Jun 13, 2019 at 09:47:06AM -0700, Xing, Cedric wrote:
> > From: Jarkko Sakkinen [mailto:jarkko.sakkinen@linux.intel.com]
> > Sent: Thursday, June 13, 2019 6:48 AM
> > 
> > On Thu, Jun 06, 2019 at 06:37:10PM +0300, Jarkko Sakkinen wrote:
> > > On Wed, Jun 05, 2019 at 01:14:04PM -0700, Andy Lutomirski wrote:
> > > >
> > > >
> > > > > On Jun 5, 2019, at 8:17 AM, Jarkko Sakkinen
> > <jarkko.sakkinen@linux.intel.com> wrote:
> > > > >
> > > > >> On Tue, Jun 04, 2019 at 10:10:22PM +0000, Xing, Cedric wrote:
> > > > >> A bit off topic here. This mmap()/mprotect() discussion reminds
> > > > >> me a question (guess for Jarkko): Now that
> > > > >> vma->vm_file->private_data keeps a pointer to the enclave, why do
> > we store it again in vma->vm_private?
> > > > >> It isn't a big deal but non-NULL vm_private does prevent
> > > > >> mprotect() from merging adjacent VMAs.
> > > > >
> > > > > Same semantics as with a regular mmap i.e. you can close the file
> > > > > and still use the mapping.
> > > > >
> > > > >
> > > >
> > > > The file should be properly refcounted — vm_file should not go away
> > while it’s mapped.
> > 
> > mm already takes care of that so vm_file does not go away. Still we need
> > an internal refcount for enclaves to synchronize with the swapper. In
> > summary nothing needs to be done.
> 
> I don't get this. The swapper takes a read lock on mm->mmap_sem, which locks
> the vma, which in turn reference counts vma->vm_file. Why is the internal
> refcount still needed? 

mmap_sem is only held when reclaim is touching PTEs, e.g. to test/clear
its accessed bit and to zap the PTE.  The liveliness of the enclave needs
to be guaranteed for the entire duration of reclaim, e.g. we can't have
the enclave disappearing when we go to do EWB.  It's also worth nothing
that a single reclaim may operate on more than one mmap_sem, as enclaves
can be shared across processes (mm_structs).
Jarkko Sakkinen June 14, 2019, 3:18 p.m. UTC | #12
On Thu, Jun 13, 2019 at 10:14:51AM -0700, Sean Christopherson wrote:
> > I don't get this. The swapper takes a read lock on mm->mmap_sem, which locks
> > the vma, which in turn reference counts vma->vm_file. Why is the internal
> > refcount still needed? 
> 
> mmap_sem is only held when reclaim is touching PTEs, e.g. to test/clear
> its accessed bit and to zap the PTE.  The liveliness of the enclave needs
> to be guaranteed for the entire duration of reclaim, e.g. we can't have
> the enclave disappearing when we go to do EWB.  It's also worth nothing
> that a single reclaim may operate on more than one mmap_sem, as enclaves
> can be shared across processes (mm_structs).

Anyway, the takeaway I got from this is that encl->refcount does not
need to be updated for VMAs (sent a patch to linux-sgx that I plan
merge).

/Jarkko

Patch
diff mbox series

diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
index afe844aa81d6..129d356aff30 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -79,7 +79,13 @@  static unsigned long sgx_get_unmapped_area(struct file *file,
 					   unsigned long pgoff,
 					   unsigned long flags)
 {
-	if (len < 2 * PAGE_SIZE || len & (len - 1) || flags & MAP_PRIVATE)
+	if (flags & MAP_PRIVATE)
+		return -EINVAL;
+
+	if (flags & MAP_FIXED)
+		return addr;
+
+	if (len < 2 * PAGE_SIZE || len & (len - 1))
 		return -EINVAL;
 
 	addr = current->mm->get_unmapped_area(file, addr, 2 * len, pgoff,