diff mbox series

[RFC] binfmt_elf: Protect mm_struct access with mmap_sem

Message ID 20190612142811.24894-1-mkoutny@suse.com (mailing list archive)
State New, archived
Headers show
Series [RFC] binfmt_elf: Protect mm_struct access with mmap_sem | expand

Commit Message

Michal Koutný June 12, 2019, 2:28 p.m. UTC
find_extend_vma assumes the caller holds mmap_sem as a reader (explained
in expand_downwards()). The path when we are extending the stack VMA to
accomodate argv[] pointers happens without the lock.

I was not able to cause an mm_struct corruption but
BUG_ON(!rwsem_is_locked(&mm->mmap_sem)) in find_extend_vma could be
triggered as

    # <bigfile xargs echo
    xargs: echo: terminated by signal 11

(bigfile needs to have more than RLIMIT_STACK / sizeof(char *) rows)

Other accesses to mm_struct in exec path are protected by mmap_sem, so
conservatively, protect also this one. Besides that, explain why we omit
mm_struct.arg_lock in the exec(2) path.

Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
---

When I was attempting to reduce usage of mmap_sem I came across this
unprotected access and increased number of its holders :-/

I'm not sure whether there is a real concurrent writer at this early
stages (I considered khugepaged especially as setup_arg_pages invokes
khugepaged_enter_vma_merge but we're lucky because khugepaged skips it
because of VM_STACK_INCOMPLETE_SETUP).

A nicer approach would perhaps be to do all this exec setup when the
mm_struct is still not exposed via current->mm (and hence no need to
synchronize via mmap_sem). But I didn't look enough into binfmt specific
whether it is even doable and worth it.

So I'm sending this for a discussion.

 fs/binfmt_elf.c | 10 +++++++++-
 fs/exec.c       |  3 ++-
 2 files changed, 11 insertions(+), 2 deletions(-)

Comments

Matthew Wilcox June 12, 2019, 5 p.m. UTC | #1
On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutný wrote:
> -	/* N.B. passed_fileno might not be initialized? */
> +

Why did you delete this comment?
Michal Koutný June 12, 2019, 5:29 p.m. UTC | #2
On Wed, Jun 12, 2019 at 10:00:34AM -0700, Matthew Wilcox <willy@infradead.org> wrote:
> On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutný wrote:
> > -	/* N.B. passed_fileno might not be initialized? */
> > +
> 
> Why did you delete this comment?
The variable got removed in
    d20894a23708 ("Remove a.out interpreter support in ELF loader")
so it is not relevant anymore.
Matthew Wilcox June 12, 2019, 5:51 p.m. UTC | #3
On Wed, Jun 12, 2019 at 07:29:15PM +0200, Michal Koutný wrote:
> On Wed, Jun 12, 2019 at 10:00:34AM -0700, Matthew Wilcox <willy@infradead.org> wrote:
> > On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutný wrote:
> > > -	/* N.B. passed_fileno might not be initialized? */
> > > +
> > 
> > Why did you delete this comment?
> The variable got removed in
>     d20894a23708 ("Remove a.out interpreter support in ELF loader")
> so it is not relevant anymore.

Better put that in the changelog for v2 then.  or even make it a
separate patch.
Cyrill Gorcunov June 12, 2019, 6:02 p.m. UTC | #4
On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutný wrote:
> find_extend_vma assumes the caller holds mmap_sem as a reader (explained
> in expand_downwards()). The path when we are extending the stack VMA to
> accomodate argv[] pointers happens without the lock.
> 
> I was not able to cause an mm_struct corruption but
> BUG_ON(!rwsem_is_locked(&mm->mmap_sem)) in find_extend_vma could be
> triggered as
> 
>     # <bigfile xargs echo
>     xargs: echo: terminated by signal 11
> 
> (bigfile needs to have more than RLIMIT_STACK / sizeof(char *) rows)
> 
> Other accesses to mm_struct in exec path are protected by mmap_sem, so
> conservatively, protect also this one. Besides that, explain why we omit
> mm_struct.arg_lock in the exec(2) path.
> 
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Signed-off-by: Michal Koutný <mkoutny@suse.com>
> ---
> 
> When I was attempting to reduce usage of mmap_sem I came across this
> unprotected access and increased number of its holders :-/
> 
> I'm not sure whether there is a real concurrent writer at this early
> stages (I considered khugepaged especially as setup_arg_pages invokes
> khugepaged_enter_vma_merge but we're lucky because khugepaged skips it
> because of VM_STACK_INCOMPLETE_SETUP).
> 
> A nicer approach would perhaps be to do all this exec setup when the
> mm_struct is still not exposed via current->mm (and hence no need to
> synchronize via mmap_sem). But I didn't look enough into binfmt specific
> whether it is even doable and worth it.
> 
> So I'm sending this for a discussion.
> 
>  fs/binfmt_elf.c | 10 +++++++++-
>  fs/exec.c       |  3 ++-
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 8264b468f283..48e169760a9c 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -299,7 +299,11 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
>  	 * Grow the stack manually; some architectures have a limit on how
>  	 * far ahead a user-space access may be in order to grow the stack.
>  	 */
> +	if (down_read_killable(&current->mm->mmap_sem))
> +		return -EINTR;
>  	vma = find_extend_vma(current->mm, bprm->p);
> +	up_read(&current->mm->mmap_sem);
> +

Good catch, Michal! Actually the loader code is heavy on its own so
I think having readlock taken here should not cause any perf problems
but worth having for consistency.

Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cyrill Gorcunov June 12, 2019, 6:04 p.m. UTC | #5
On Wed, Jun 12, 2019 at 10:51:59AM -0700, Matthew Wilcox wrote:
> On Wed, Jun 12, 2019 at 07:29:15PM +0200, Michal Koutný wrote:
> > On Wed, Jun 12, 2019 at 10:00:34AM -0700, Matthew Wilcox <willy@infradead.org> wrote:
> > > On Wed, Jun 12, 2019 at 04:28:11PM +0200, Michal Koutný wrote:
> > > > -	/* N.B. passed_fileno might not be initialized? */
> > > > +
> > > 
> > > Why did you delete this comment?
> > The variable got removed in
> >     d20894a23708 ("Remove a.out interpreter support in ELF loader")
> > so it is not relevant anymore.
> 
> Better put that in the changelog for v2 then.  or even make it a
> separate patch.

Just updated changelog should be fine, I guess. A separate commit
just to remove an obsolete comment is too much.
diff mbox series

Patch

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 8264b468f283..48e169760a9c 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -299,7 +299,11 @@  create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
 	 * Grow the stack manually; some architectures have a limit on how
 	 * far ahead a user-space access may be in order to grow the stack.
 	 */
+	if (down_read_killable(&current->mm->mmap_sem))
+		return -EINTR;
 	vma = find_extend_vma(current->mm, bprm->p);
+	up_read(&current->mm->mmap_sem);
+
 	if (!vma)
 		return -EFAULT;
 
@@ -1123,11 +1127,15 @@  static int load_elf_binary(struct linux_binprm *bprm)
 		goto out;
 #endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
 
+	/*
+	 * Don't take mm->arg_lock. The concurrent change might happen only
+	 * from prctl_set_mm but after de_thread we are certainly alone here.
+	 */
 	retval = create_elf_tables(bprm, &loc->elf_ex,
 			  load_addr, interp_load_addr);
 	if (retval < 0)
 		goto out;
-	/* N.B. passed_fileno might not be initialized? */
+
 	current->mm->end_code = end_code;
 	current->mm->start_code = start_code;
 	current->mm->start_data = start_data;
diff --git a/fs/exec.c b/fs/exec.c
index 89a500bb897a..d5b55c92019a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -212,7 +212,8 @@  static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
 
 	/*
 	 * We are doing an exec().  'current' is the process
-	 * doing the exec and bprm->mm is the new process's mm.
+	 * doing the exec and bprm->mm is the new process's mm that is not
+	 * shared yet, so no synchronization on mmap_sem.
 	 */
 	ret = get_user_pages_remote(current, bprm->mm, pos, 1, gup_flags,
 			&page, NULL, NULL);