[RFC,v3,01/12] x86/sgx: Add mm to enclave at mmap()
diff mbox series

Message ID 20190617222438.2080-2-sean.j.christopherson@intel.com
State New
Headers show
Series
  • security: x86/sgx: SGX vs. LSM, round 3
Related show

Commit Message

Sean Christopherson June 17, 2019, 10:24 p.m. UTC
The enclave mm tracking is currently broken:

  - Adding current->mm during ECREATE is wrong as there is no guarantee
    that the current process has mmap()'d the enclave, i.e. there may
    never be an associated sgx_vma_close() to drop the encl_mm.

  - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called
    only when splitting or duplicating a vma.  If userspace performs a
    single mmap() on the enclave then SGX will fail to track the mm.
    This bug is partially hidden by tracking current->mm at ECREATE.

Rework the tracking to get/add the mm at mmap().  A side effect of the
bug fix is that sgx_vma_{open,close}() should never encounter a vma with
an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm
cannot be found in either condition.

Change the WARN() on a non-empty mm_list to a WARN_ONCE().  The warning
will fire over and over (and over) if the mm tracking is broken, which
hampers debug/triage far more than it helps.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 14 ----------
 arch/x86/kernel/cpu/sgx/driver/main.c  | 26 ++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.c         | 38 +++++---------------------
 arch/x86/kernel/cpu/sgx/encl.h         |  4 +--
 4 files changed, 35 insertions(+), 47 deletions(-)

Comments

Dave Hansen June 17, 2019, 10:32 p.m. UTC | #1
On 6/17/19 3:24 PM, Sean Christopherson wrote:
> +static int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
> +{
> +	struct sgx_encl_mm *encl_mm;
> +
> +	encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL);
> +	if (!encl_mm)
> +		return -ENOMEM;
> +
> +	encl_mm->encl = encl;
> +	encl_mm->mm = mm;
> +	kref_init(&encl_mm->refcount);
> +
> +	spin_lock(&encl->mm_lock);
> +	list_add(&encl_mm->list, &encl->mm_list);
> +	spin_unlock(&encl->mm_lock);
> +
> +	return 0;
> +}

This probably need commenting and/or changelogging about the mm
refcounting or lack thereof.
Andy Lutomirski June 17, 2019, 11:42 p.m. UTC | #2
On Mon, Jun 17, 2019 at 3:24 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> The enclave mm tracking is currently broken:
>
>   - Adding current->mm during ECREATE is wrong as there is no guarantee
>     that the current process has mmap()'d the enclave, i.e. there may
>     never be an associated sgx_vma_close() to drop the encl_mm.
>
>   - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called
>     only when splitting or duplicating a vma.  If userspace performs a
>     single mmap() on the enclave then SGX will fail to track the mm.
>     This bug is partially hidden by tracking current->mm at ECREATE.
>
> Rework the tracking to get/add the mm at mmap().  A side effect of the
> bug fix is that sgx_vma_{open,close}() should never encounter a vma with
> an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm
> cannot be found in either condition.
>

It would be nifty if you could also kill .vm_close, since then VMAs
could be merged properly.  Would this be straightforward?

--Andy
Sean Christopherson June 18, 2019, 2:11 p.m. UTC | #3
On Mon, Jun 17, 2019 at 04:42:59PM -0700, Andy Lutomirski wrote:
> On Mon, Jun 17, 2019 at 3:24 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > The enclave mm tracking is currently broken:
> >
> >   - Adding current->mm during ECREATE is wrong as there is no guarantee
> >     that the current process has mmap()'d the enclave, i.e. there may
> >     never be an associated sgx_vma_close() to drop the encl_mm.
> >
> >   - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called
> >     only when splitting or duplicating a vma.  If userspace performs a
> >     single mmap() on the enclave then SGX will fail to track the mm.
> >     This bug is partially hidden by tracking current->mm at ECREATE.
> >
> > Rework the tracking to get/add the mm at mmap().  A side effect of the
> > bug fix is that sgx_vma_{open,close}() should never encounter a vma with
> > an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm
> > cannot be found in either condition.
> >
> 
> It would be nifty if you could also kill .vm_close, since then VMAs
> could be merged properly.  Would this be straightforward?

Hmm, we probably could move the mm tracking to f_op->{open,release}.  The
downside to that approach is that EPC reclaim would unnecessarily walk the
vmas for processes that have opened the enclave but not mapped any EPC
pages.  In the grand scheme, that's a minor issue and probably worth the
tradeoff of vma merging.

On the plus side, in addition to zapping ->close, I think it would allow
for a simpler vma walking scheme.  Maybe.
Sean Christopherson June 18, 2019, 4:06 p.m. UTC | #4
On Tue, Jun 18, 2019 at 07:11:47AM -0700, Sean Christopherson wrote:
> On Mon, Jun 17, 2019 at 04:42:59PM -0700, Andy Lutomirski wrote:
> > On Mon, Jun 17, 2019 at 3:24 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > The enclave mm tracking is currently broken:
> > >
> > >   - Adding current->mm during ECREATE is wrong as there is no guarantee
> > >     that the current process has mmap()'d the enclave, i.e. there may
> > >     never be an associated sgx_vma_close() to drop the encl_mm.
> > >
> > >   - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called
> > >     only when splitting or duplicating a vma.  If userspace performs a
> > >     single mmap() on the enclave then SGX will fail to track the mm.
> > >     This bug is partially hidden by tracking current->mm at ECREATE.
> > >
> > > Rework the tracking to get/add the mm at mmap().  A side effect of the
> > > bug fix is that sgx_vma_{open,close}() should never encounter a vma with
> > > an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm
> > > cannot be found in either condition.
> > >
> > 
> > It would be nifty if you could also kill .vm_close, since then VMAs
> > could be merged properly.  Would this be straightforward?
> 
> Hmm, we probably could move the mm tracking to f_op->{open,release}.  The

Scratch that thought, pre-coffee brain was thinking there was a magical
file op that was called whenever a new reference to the file was acquired.

The only idea I can think of to avoid .vm_close() would be to not refcount
the mm at all, and instead register a mmu notifier at .mmap() and use
mmu_notifier.release() to remove the mm.  The downside is that the mm
tracking would be sticky, i.e. once a process mmap()'d an enclave it would
forever be tracked by the enclave.  Practically speaking, in the worst
case this would add a small amount of overhead to reclaiming from an
enclave that was temporarily mapped by multiple processes.  So it might
be a viable option?

> downside to that approach is that EPC reclaim would unnecessarily walk the
> vmas for processes that have opened the enclave but not mapped any EPC
> pages.  In the grand scheme, that's a minor issue and probably worth the
> tradeoff of vma merging.
> 
> On the plus side, in addition to zapping ->close, I think it would allow
> for a simpler vma walking scheme.  Maybe.
Jarkko Sakkinen June 19, 2019, 12:56 p.m. UTC | #5
On Mon, 2019-06-17 at 15:24 -0700, Sean Christopherson wrote:
> The enclave mm tracking is currently broken:
> 
>   - Adding current->mm during ECREATE is wrong as there is no guarantee
>     that the current process has mmap()'d the enclave, i.e. there may
>     never be an associated sgx_vma_close() to drop the encl_mm.
> 
>   - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called
>     only when splitting or duplicating a vma.  If userspace performs a
>     single mmap() on the enclave then SGX will fail to track the mm.
>     This bug is partially hidden by tracking current->mm at ECREATE.
> 
> Rework the tracking to get/add the mm at mmap().  A side effect of the
> bug fix is that sgx_vma_{open,close}() should never encounter a vma with
> an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm
> cannot be found in either condition.
> 
> Change the WARN() on a non-empty mm_list to a WARN_ONCE().  The warning
> will fire over and over (and over) if the mm tracking is broken, which
> hampers debug/triage far more than it helps.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/driver/ioctl.c | 14 ----------
>  arch/x86/kernel/cpu/sgx/driver/main.c  | 26 ++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/encl.c         | 38 +++++---------------------
>  arch/x86/kernel/cpu/sgx/encl.h         |  4 +--
>  4 files changed, 35 insertions(+), 47 deletions(-)

BTW, sgx_encl_mm is a bit confusing name (which I made) to begin with. 
It clues as it was a some kind of 1:1 association with the process,
which it isn't.

Could you update the patch and rename it as sgx_encl_mapping? After that
I'm happy to merge.

/Jarkko
Jarkko Sakkinen June 19, 2019, 1 p.m. UTC | #6
On Wed, 2019-06-19 at 15:56 +0300, Jarkko Sakkinen wrote:
> On Mon, 2019-06-17 at 15:24 -0700, Sean Christopherson wrote:
> > The enclave mm tracking is currently broken:
> > 
> >   - Adding current->mm during ECREATE is wrong as there is no guarantee
> >     that the current process has mmap()'d the enclave, i.e. there may
> >     never be an associated sgx_vma_close() to drop the encl_mm.
> > 
> >   - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called
> >     only when splitting or duplicating a vma.  If userspace performs a
> >     single mmap() on the enclave then SGX will fail to track the mm.
> >     This bug is partially hidden by tracking current->mm at ECREATE.
> > 
> > Rework the tracking to get/add the mm at mmap().  A side effect of the
> > bug fix is that sgx_vma_{open,close}() should never encounter a vma with
> > an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm
> > cannot be found in either condition.
> > 
> > Change the WARN() on a non-empty mm_list to a WARN_ONCE().  The warning
> > will fire over and over (and over) if the mm tracking is broken, which
> > hampers debug/triage far more than it helps.
> > 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  arch/x86/kernel/cpu/sgx/driver/ioctl.c | 14 ----------
> >  arch/x86/kernel/cpu/sgx/driver/main.c  | 26 ++++++++++++++++++
> >  arch/x86/kernel/cpu/sgx/encl.c         | 38 +++++---------------------
> >  arch/x86/kernel/cpu/sgx/encl.h         |  4 +--
> >  4 files changed, 35 insertions(+), 47 deletions(-)
> 
> BTW, sgx_encl_mm is a bit confusing name (which I made) to begin with. 
> It clues as it was a some kind of 1:1 association with the process,
> which it isn't.
> 
> Could you update the patch and rename it as sgx_encl_mapping? After that
> I'm happy to merge.

And send it as a separate patch. Can merge it today/tomorrow after I've
finished my reaper change.

/Jarkko
Jarkko Sakkinen June 20, 2019, 8:09 p.m. UTC | #7
On Wed, Jun 19, 2019 at 04:00:14PM +0300, Jarkko Sakkinen wrote:
> On Wed, 2019-06-19 at 15:56 +0300, Jarkko Sakkinen wrote:
> > On Mon, 2019-06-17 at 15:24 -0700, Sean Christopherson wrote:
> > > The enclave mm tracking is currently broken:
> > > 
> > >   - Adding current->mm during ECREATE is wrong as there is no guarantee
> > >     that the current process has mmap()'d the enclave, i.e. there may
> > >     never be an associated sgx_vma_close() to drop the encl_mm.
> > > 
> > >   - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called
> > >     only when splitting or duplicating a vma.  If userspace performs a
> > >     single mmap() on the enclave then SGX will fail to track the mm.
> > >     This bug is partially hidden by tracking current->mm at ECREATE.
> > > 
> > > Rework the tracking to get/add the mm at mmap().  A side effect of the
> > > bug fix is that sgx_vma_{open,close}() should never encounter a vma with
> > > an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm
> > > cannot be found in either condition.
> > > 
> > > Change the WARN() on a non-empty mm_list to a WARN_ONCE().  The warning
> > > will fire over and over (and over) if the mm tracking is broken, which
> > > hampers debug/triage far more than it helps.
> > > 
> > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > > ---
> > >  arch/x86/kernel/cpu/sgx/driver/ioctl.c | 14 ----------
> > >  arch/x86/kernel/cpu/sgx/driver/main.c  | 26 ++++++++++++++++++
> > >  arch/x86/kernel/cpu/sgx/encl.c         | 38 +++++---------------------
> > >  arch/x86/kernel/cpu/sgx/encl.h         |  4 +--
> > >  4 files changed, 35 insertions(+), 47 deletions(-)
> > 
> > BTW, sgx_encl_mm is a bit confusing name (which I made) to begin with. 
> > It clues as it was a some kind of 1:1 association with the process,
> > which it isn't.
> > 
> > Could you update the patch and rename it as sgx_encl_mapping? After that
> > I'm happy to merge.
> 
> And send it as a separate patch. Can merge it today/tomorrow after I've
> finished my reaper change.

Still working/testing reaper stuff but this one is now applied. Lets
not do any renames for the moment because they only slow down. That
was a bad suggestion from my part.

The reason I squashed this change is that it is a pure bug fix.

If there is a bug, lets try to just fix that with absolutely minimal
intrusion. Only after that consider a "better way".

/Jarkko

Patch
diff mbox series

diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index d17c60dca114..3552d642b26f 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -276,7 +276,6 @@  static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 {
 	unsigned long encl_size = secs->size + PAGE_SIZE;
 	struct sgx_epc_page *secs_epc;
-	struct sgx_encl_mm *encl_mm;
 	unsigned long ssaframesize;
 	struct sgx_pageinfo pginfo;
 	struct sgx_secinfo secinfo;
@@ -311,12 +310,6 @@  static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 
 	INIT_WORK(&encl->work, sgx_add_page_worker);
 
-	encl_mm = sgx_encl_mm_add(encl, current->mm);
-	if (IS_ERR(encl_mm)) {
-		ret = PTR_ERR(encl_mm);
-		goto err_out;
-	}
-
 	secs_epc = sgx_alloc_page(&encl->secs, true);
 	if (IS_ERR(secs_epc)) {
 		ret = PTR_ERR(secs_epc);
@@ -369,13 +362,6 @@  static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 		encl->backing = NULL;
 	}
 
-	if (!list_empty(&encl->mm_list)) {
-		encl_mm = list_first_entry(&encl->mm_list, struct sgx_encl_mm,
-					   list);
-		list_del(&encl_mm->list);
-		kfree(encl_mm);
-	}
-
 	mutex_unlock(&encl->lock);
 	return ret;
 }
diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
index 87735ce8b5ba..03eca4bccee1 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -60,9 +60,35 @@  static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
 }
 #endif
 
+static int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm;
+
+	encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL);
+	if (!encl_mm)
+		return -ENOMEM;
+
+	encl_mm->encl = encl;
+	encl_mm->mm = mm;
+	kref_init(&encl_mm->refcount);
+
+	spin_lock(&encl->mm_lock);
+	list_add(&encl_mm->list, &encl->mm_list);
+	spin_unlock(&encl->mm_lock);
+
+	return 0;
+}
+
 static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct sgx_encl *encl = file->private_data;
+	int ret;
+
+	if (!sgx_encl_get_mm(encl, vma->vm_mm)) {
+		ret = sgx_encl_mm_add(encl, vma->vm_mm);
+		if (ret)
+			return ret;
+	}
 
 	vma->vm_ops = &sgx_vm_ops;
 	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 6b190eccd02e..6e9f3a41a40d 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -132,26 +132,6 @@  static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 	return entry;
 }
 
-struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl,
-				    struct mm_struct *mm)
-{
-	struct sgx_encl_mm *encl_mm;
-
-	encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL);
-	if (!encl_mm)
-		return ERR_PTR(-ENOMEM);
-
-	encl_mm->encl = encl;
-	encl_mm->mm = mm;
-	kref_init(&encl_mm->refcount);
-
-	spin_lock(&encl->mm_lock);
-	list_add(&encl_mm->list, &encl->mm_list);
-	spin_unlock(&encl->mm_lock);
-
-	return encl_mm;
-}
-
 void sgx_encl_mm_release(struct kref *ref)
 {
 	struct sgx_encl_mm *encl_mm =
@@ -164,8 +144,8 @@  void sgx_encl_mm_release(struct kref *ref)
 	kfree(encl_mm);
 }
 
-static struct sgx_encl_mm *sgx_encl_get_mm(struct sgx_encl *encl,
-					   struct mm_struct *mm)
+struct sgx_encl_mm *sgx_encl_get_mm(struct sgx_encl *encl,
+				    struct mm_struct *mm)
 {
 	struct sgx_encl_mm *encl_mm = NULL;
 	struct sgx_encl_mm *prev_mm = NULL;
@@ -190,10 +170,10 @@  static struct sgx_encl_mm *sgx_encl_get_mm(struct sgx_encl *encl,
 	return NULL;
 }
 
+
 static void sgx_vma_open(struct vm_area_struct *vma)
 {
 	struct sgx_encl *encl = vma->vm_private_data;
-	struct sgx_encl_mm *encl_mm;
 
 	if (!encl)
 		return;
@@ -201,12 +181,8 @@  static void sgx_vma_open(struct vm_area_struct *vma)
 	if (encl->flags & SGX_ENCL_DEAD)
 		goto error;
 
-	encl_mm = sgx_encl_get_mm(encl, vma->vm_mm);
-	if (!encl_mm) {
-		encl_mm = sgx_encl_mm_add(encl, vma->vm_mm);
-		if (IS_ERR(encl_mm))
-			goto error;
-	}
+	if (WARN_ON_ONCE(!sgx_encl_get_mm(encl, vma->vm_mm)))
+		goto error;
 
 	kref_get(&encl->refcount);
 	return;
@@ -224,7 +200,7 @@  static void sgx_vma_close(struct vm_area_struct *vma)
 		return;
 
 	encl_mm = sgx_encl_get_mm(encl, vma->vm_mm);
-	if (encl_mm) {
+	if (!WARN_ON_ONCE(!encl_mm)) {
 		kref_put(&encl_mm->refcount, sgx_encl_mm_release);
 
 		/* Release kref for the VMA. */
@@ -468,7 +444,7 @@  void sgx_encl_release(struct kref *ref)
 	if (encl->backing)
 		fput(encl->backing);
 
-	WARN(!list_empty(&encl->mm_list), "sgx: mm_list non-empty");
+	WARN_ONCE(!list_empty(&encl->mm_list), "sgx: mm_list non-empty");
 
 	kfree(encl);
 }
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index c557f0374d74..7b339334d875 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -120,9 +120,9 @@  pgoff_t sgx_encl_get_index(struct sgx_encl *encl, struct sgx_encl_page *page);
 struct page *sgx_encl_get_backing_page(struct sgx_encl *encl, pgoff_t index);
 struct sgx_encl_mm *sgx_encl_next_mm(struct sgx_encl *encl,
 				     struct sgx_encl_mm *encl_mm, int *iter);
-struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl,
-				    struct mm_struct *mm);
 void sgx_encl_mm_release(struct kref *ref);
+struct sgx_encl_mm *sgx_encl_get_mm(struct sgx_encl *encl,
+				    struct mm_struct *mm);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
 struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,