Message ID | 20210207221401.29933-1-jarkko@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | [v8] x86/sgx: Maintain encl->refcount for each encl->mm_list entry |

> This has been shown in tests:
>
> [ +0.000008] WARNING: CPU: 3 PID: 7620 at kernel/rcu/srcutree.c:374 cleanup_srcu_struct+0xed/0x100
>
> This is essentially a use-after free, although SRCU notices it as
> an SRCU cleanup in an invalid context.

...

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
On Sun, 07 Feb 2021 16:14:01 -0600, Jarkko Sakkinen <jarkko@kernel.org> wrote:

> This has been shown in tests:
>
> [ +0.000008] WARNING: CPU: 3 PID: 7620 at kernel/rcu/srcutree.c:374
> cleanup_srcu_struct+0xed/0x100
>
> This is essentially a use-after free, although SRCU notices it as
> an SRCU cleanup in an invalid context.
>

The comments in code around this warning indicate a potential memory leak.
Not sure how use-after-free come into play. Anyway, this fix seems to work
for the warning above.

However, I still have doubts on another potential race. See below.

> diff --git a/arch/x86/kernel/cpu/sgx/driver.c
> b/arch/x86/kernel/cpu/sgx/driver.c
> index f2eac41bb4ff..8ce6d8371cfb 100644
> --- a/arch/x86/kernel/cpu/sgx/driver.c
> +++ b/arch/x86/kernel/cpu/sgx/driver.c
> @@ -72,6 +72,9 @@ static int sgx_release(struct inode *inode, struct
> file *file)
>  		synchronize_srcu(&encl->srcu);
>  		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
>  		kfree(encl_mm);

Note here you are freeing the encl_mm, outside protection of encl->refcount.

> +
> +		/* 'encl_mm' is gone, put encl_mm->encl reference: */
> +		kref_put(&encl->refcount, sgx_encl_release);
>  	}
>  	kref_put(&encl->refcount, sgx_encl_release);
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c
> b/arch/x86/kernel/cpu/sgx/encl.c
> index 20a2dd5ba2b4..7449ef33f081 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -473,6 +473,9 @@ static void sgx_mmu_notifier_free(struct
> mmu_notifier *mn)
>  {
>  	struct sgx_encl_mm *encl_mm = container_of(mn, struct sgx_encl_mm,
> mmu_notifier);
> +	/* 'encl_mm' is going away, put encl_mm->encl reference: */
> +	kref_put(&encl_mm->encl->refcount, sgx_encl_release);
> +
>  	kfree(encl_mm);

Could this access to and kfree of encl_mm possibly be after the
kfree(encl_mm) noted above?

Also is there a reason we do kfree(encl_mm) in notifier_free not directly in
notifier_release?

Thanks
Haitao
On Tue, Apr 13, 2021, Haitao Huang wrote:

> On Sun, 07 Feb 2021 16:14:01 -0600, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
>
> > This has been shown in tests:
> >
> > [ +0.000008] WARNING: CPU: 3 PID: 7620 at kernel/rcu/srcutree.c:374
> > cleanup_srcu_struct+0xed/0x100
> >
> > This is essentially a use-after free, although SRCU notices it as
> > an SRCU cleanup in an invalid context.
> >
> The comments in code around this warning indicate a potential memory leak.
> Not sure how use-after-free come into play. Anyway, this fix seems to work
> for the warning above.
>
> However, I still have doubts on another potential race. See below.
>
>
> > diff --git a/arch/x86/kernel/cpu/sgx/driver.c
> > b/arch/x86/kernel/cpu/sgx/driver.c
> > index f2eac41bb4ff..8ce6d8371cfb 100644
> > --- a/arch/x86/kernel/cpu/sgx/driver.c
> > +++ b/arch/x86/kernel/cpu/sgx/driver.c
> > @@ -72,6 +72,9 @@ static int sgx_release(struct inode *inode, struct
> > file *file)
> >  		synchronize_srcu(&encl->srcu);
> >  		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> >  		kfree(encl_mm);
>
> Note here you are freeing the encl_mm, outside protection of encl->refcount.
>
> > +
> > +		/* 'encl_mm' is gone, put encl_mm->encl reference: */
> > +		kref_put(&encl->refcount, sgx_encl_release);
> >  	}
> >  	kref_put(&encl->refcount, sgx_encl_release);
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c
> > b/arch/x86/kernel/cpu/sgx/encl.c
> > index 20a2dd5ba2b4..7449ef33f081 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -473,6 +473,9 @@ static void sgx_mmu_notifier_free(struct
> > mmu_notifier *mn)
> >  {
> >  	struct sgx_encl_mm *encl_mm = container_of(mn, struct sgx_encl_mm,
> > mmu_notifier);
> > +	/* 'encl_mm' is going away, put encl_mm->encl reference: */
> > +	kref_put(&encl_mm->encl->refcount, sgx_encl_release);
> > +
> >  	kfree(encl_mm);
>
> Could this access to and kfree of encl_mm possibly be after the
> kfree(encl_mm) noted above?
No, the mmu_notifier_unregister() ensures that all in-progress notifiers complete
before it returns, i.e. SGX's notifier call back is not reachable after it's
unregistered.

> Also is there a reason we do kfree(encl_mm) in notifier_free not directly in
> notifier_release?

Because encl_mm is the anchor to the enclave reference

	/* 'encl_mm' is going away, put encl_mm->encl reference: */
	kref_put(&encl_mm->encl->refcount, sgx_encl_release);

as well as the mmu notifier reference (the mmu_notifier_put(mn) call chain).
Freeing encl_mm immediately would prevent sgx_mmu_notifier_free() from dropping
the enclave reference. And the mmu notifier reference need to be dropped in
sgx_mmu_notifier_release() because the encl_mm has been taken off encl->mm_list.
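The ownership rule described in this reply can be sketched as a small single-threaded userspace model: whichever teardown path takes the entry off the list is the one that frees it and drops the enclave reference. This is only an illustration under stated assumptions. `struct model_encl_mm`, `model_unlink()`, `model_file_release()`, `model_notifier_release()` and `model_run()` are hypothetical names, the `on_list` flag stands in for membership in encl->mm_list, and the locking and SRCU synchronization that the real driver relies on are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sgx_encl_mm; not the kernel type. */
struct model_encl_mm {
	bool on_list;	/* models membership in encl->mm_list */
	int frees;	/* how many times the entry was "freed" */
	int encl_puts;	/* how many enclave references were dropped */
};

/*
 * Take the entry off the list. Returns true only for the caller that
 * actually removed it. The kernel would do this under a lock; this
 * model is single-threaded, so no lock is shown.
 */
static bool model_unlink(struct model_encl_mm *m)
{
	bool was_on_list = m->on_list;

	m->on_list = false;
	return was_on_list;
}

/* Models the file release path: unlink, free, then put the enclave. */
static void model_file_release(struct model_encl_mm *m)
{
	if (!model_unlink(m))
		return;		/* the notifier path already owns the entry */
	m->frees++;		/* stands in for kfree(encl_mm) */
	m->encl_puts++;		/* stands in for kref_put(&encl->refcount, ...) */
}

/* Models the notifier release path followed by the deferred free. */
static void model_notifier_release(struct model_encl_mm *m)
{
	if (!model_unlink(m))
		return;		/* the file release path already owns the entry */
	m->encl_puts++;		/* put the enclave reference first ... */
	m->frees++;		/* ... then free the anchor */
}

/* Run both paths in the given order; true if exactly one free and one put. */
static bool model_run(bool notifier_first)
{
	struct model_encl_mm m = { .on_list = true, .frees = 0, .encl_puts = 0 };

	if (notifier_first) {
		model_notifier_release(&m);
		model_file_release(&m);
	} else {
		model_file_release(&m);
		model_notifier_release(&m);
	}
	return m.frees == 1 && m.encl_puts == 1;
}
```

Calling `model_run(true)` and `model_run(false)` exercises both orderings. In this model each ordering ends with a single free and a single reference drop, which is the property the question about a possible double kfree(encl_mm) is asking about.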
On 4/14/21 8:51 AM, Sean Christopherson wrote:

>> Could this access to and kfree of encl_mm possibly be after the
>> kfree(encl_mm) noted above?
> No, the mmu_notifier_unregister() ensures that all in-progress notifiers complete
> before it returns, i.e. SGX's notifier call back is not reachable after it's
> unregistered.
>
>> Also is there a reason we do kfree(encl_mm) in notifier_free not directly in
>> notifier_release?
> Because encl_mm is the anchor to the enclave reference
>
> 	/* 'encl_mm' is going away, put encl_mm->encl reference: */
> 	kref_put(&encl_mm->encl->refcount, sgx_encl_release);
>
> as well as the mmu notifier reference (the mmu_notifier_put(mn) call chain).
> Freeing encl_mm immediately would prevent sgx_mmu_notifier_free() from dropping
> the enclave reference. And the mmu notifier reference need to be dropped in
> sgx_mmu_notifier_release() because the encl_mm has been taken off encl->mm_list.

Haitao, I think you've highlighted that this locking scheme is woefully
under-documented. Any patches to beef it up would be very welcome.
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index f2eac41bb4ff..8ce6d8371cfb 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -72,6 +72,9 @@ static int sgx_release(struct inode *inode, struct file *file)
 		synchronize_srcu(&encl->srcu);
 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
 		kfree(encl_mm);
+
+		/* 'encl_mm' is gone, put encl_mm->encl reference: */
+		kref_put(&encl->refcount, sgx_encl_release);
 	}
 
 	kref_put(&encl->refcount, sgx_encl_release);
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 20a2dd5ba2b4..7449ef33f081 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -473,6 +473,9 @@ static void sgx_mmu_notifier_free(struct mmu_notifier *mn)
 {
 	struct sgx_encl_mm *encl_mm = container_of(mn, struct sgx_encl_mm, mmu_notifier);
 
+	/* 'encl_mm' is going away, put encl_mm->encl reference: */
+	kref_put(&encl_mm->encl->refcount, sgx_encl_release);
+
 	kfree(encl_mm);
 }
 
@@ -526,6 +529,8 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
 	if (!encl_mm)
 		return -ENOMEM;
 
+	/* Grab a refcount for the encl_mm->encl reference: */
+	kref_get(&encl->refcount);
 	encl_mm->encl = encl;
 	encl_mm->mm = mm;
 	encl_mm->mmu_notifier.ops = &sgx_mmu_notifier_ops;
This has been shown in tests:

[ +0.000008] WARNING: CPU: 3 PID: 7620 at kernel/rcu/srcutree.c:374 cleanup_srcu_struct+0xed/0x100

This is essentially a use-after-free, although SRCU notices it as
an SRCU cleanup in an invalid context.

== Background ==

SGX has a data structure (struct sgx_encl_mm) which keeps per-mm SGX
metadata. This is separate from 'struct sgx_encl' because, in theory,
an enclave can be mapped from more than one mm. sgx_encl_mm includes
a pointer back to the sgx_encl.

This means that sgx_encl must have a longer lifetime than all of the
sgx_encl_mm's that point to it. That's usually the case: sgx_encl_mm
is freed only after the mmu_notifier is unregistered in sgx_release().

However, there's a race. If the process is exiting,
sgx_mmu_notifier_release() can be called in parallel with sgx_release()
instead of being called *by* it. The mmu_notifier path keeps encl_mm
alive past when sgx_encl can be freed. This inverts the lifetime rules
and means that sgx_mmu_notifier_release() can access a freed sgx_encl.

== Fix ==

Increase encl->refcount when encl_mm->encl is established. Release
this reference when encl_mm is freed. This ensures that 'encl' outlives
'encl_mm'.

Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Reported-by: Haitao Huang <haitao.huang@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
v8:
- Slight adjustments on call sites suggested by Dave, to make things
  more clear and obvious. Otherwise, semantically same as v7:
  https://lore.kernel.org/linux-sgx/b874673d-9d58-0d6f-ce2d-ef4d33ac5115@intel.com/
  Also contains the long description written by Dave.
v7:
- No changes from v6. Resend of
  https://patchwork.kernel.org/project/intel-sgx/patch/20210204143845.39697-1-jarkko@kernel.org/
v6:
- Maintain refcount for each encl->mm_list entry.
v5:
- To make sure that the instance does not get deleted, use kref_get()
  and kref_put(). This also removes the need for additional
  synchronize_srcu().
v4:
- Rewrite the commit message.
- Just change the call order. *_expedited() is out of scope for this
  bug fix.
v3: Fine-tuned tags, and added missing change log for v2.
v2: Switch to synchronize_srcu_expedited().

 arch/x86/kernel/cpu/sgx/driver.c | 3 +++
 arch/x86/kernel/cpu/sgx/encl.c   | 5 +++++
 2 files changed, 8 insertions(+)
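
The lifetime rule in the Background and Fix sections can be modeled in plain userspace C. The following is a minimal sketch, not the kernel code: `struct encl`, `struct encl_mm`, `encl_create()`, `encl_put()`, `encl_mm_add()`, `encl_mm_free()` and `demo_release_before_notifier()` are hypothetical stand-ins, and the plain `int` counter replaces the kernel's kref, so it is only valid single-threaded. It shows the one property the patch is after: while an encl_mm holds a back-pointer, the enclave survives the drop of the file's reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins for illustration; these are not the kernel types. */
struct encl {
	int refcount;	/* models encl->refcount (a kref in the kernel) */
	int *released;	/* set to 1 when the enclave is torn down */
};

struct encl_mm {
	struct encl *encl;	/* back-pointer that must never dangle */
};

/* Drop one reference; tear the enclave down when the last one goes. */
static void encl_put(struct encl *encl)
{
	if (--encl->refcount == 0) {
		*encl->released = 1;
		free(encl);
	}
}

/* Create an enclave holding one reference, owned by the open file. */
static struct encl *encl_create(int *released)
{
	struct encl *encl = malloc(sizeof(*encl));

	if (!encl)
		return NULL;
	encl->refcount = 1;
	encl->released = released;
	*released = 0;
	return encl;
}

/* Establish the back-pointer and take a reference on its behalf. */
static struct encl_mm *encl_mm_add(struct encl *encl)
{
	struct encl_mm *encl_mm = malloc(sizeof(*encl_mm));

	if (!encl_mm)
		return NULL;
	encl->refcount++;
	encl_mm->encl = encl;
	return encl_mm;
}

/* Put the enclave reference through the anchor, then free the anchor. */
static void encl_mm_free(struct encl_mm *encl_mm)
{
	encl_put(encl_mm->encl);
	free(encl_mm);
}

/*
 * The file reference is dropped first while an encl_mm still exists.
 * Returns 1 if the enclave stayed alive until the encl_mm was freed
 * and was released afterwards, -1 on allocation failure.
 */
static int demo_release_before_notifier(void)
{
	int released;
	int alive_after_file_put;
	struct encl *encl = encl_create(&released);
	struct encl_mm *encl_mm;

	if (!encl)
		return -1;
	encl_mm = encl_mm_add(encl);
	if (!encl_mm) {
		encl_put(encl);
		return -1;
	}

	encl_put(encl);			/* file's reference goes away */
	alive_after_file_put = !released;
	encl_mm_free(encl_mm);		/* last reference goes away here */
	return alive_after_file_put && released;
}
```

Without the extra reference taken in `encl_mm_add()`, the first `encl_put()` in the demo would free the enclave while `encl_mm->encl` still pointed at it, which is the inverted lifetime the commit message describes.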