From patchwork Fri Oct 16 22:57:13 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jann Horn
X-Patchwork-Id: 11842437
From: Jann Horn
To: Andrew Morton, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, "Eric W .
 Biederman", Michel Lespinasse, Mauro Carvalho Chehab, Sakari Ailus,
 Jeff Dike, Richard Weinberger, Anton Ivanov,
 linux-um@lists.infradead.org, Jason Gunthorpe, John Hubbard,
 Johannes Berg
Subject: [PATCH resend v3 2/2] exec: Broadly lock nascent mm until setup_arg_pages()
Date: Sat, 17 Oct 2020 00:57:13 +0200
Message-Id: <20201016225713.1971256-3-jannh@google.com>
X-Mailer: git-send-email 2.29.0.rc1.297.gfa9743e501-goog
In-Reply-To: <20201016225713.1971256-1-jannh@google.com>
References: <20201016225713.1971256-1-jannh@google.com>
MIME-Version: 1.0
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

While AFAIK there currently is nothing that can modify the VMA tree of a
new mm until userspace has started running under the mm, we should properly
lock the mm here anyway, both to keep lockdep happy when adding locking
assertions and to be safe in the future in case someone e.g. decides to
permit VMA-tree-mutating operations in process_madvise_behavior_valid().

The goal of this patch is to broadly lock the nascent mm in the exec path,
from around the time it is created all the way to the end of
setup_arg_pages() (because setup_arg_pages() accesses bprm->vma).
As long as the mm is write-locked, keep it around in bprm->mm, even after
it has been installed on the task (with an extra reference on the mm, to
reduce complexity in free_bprm()).
After setup_arg_pages(), we have to unlock the mm so that APIs such as
copy_to_user() will work in the following binfmt-specific setup code.
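
For illustration only, the intended lifetime can be modelled in plain
userspace C. This is a toy sketch, not kernel code: every toy_* name is
hypothetical, a bool stands in for the mmap write lock, and an int stands
in for the mm_users refcount. It only shows the invariant described above,
namely that bprm->mm is non-NULL exactly while the nascent mm is
write-locked, and that the task holds its own reference once the mm has
been installed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these types or helpers exist in the kernel. */
struct toy_mm {
	int users;		/* models the mm_users refcount */
	bool write_locked;	/* models the mmap write lock */
};

struct toy_bprm {
	struct toy_mm *mm;	/* non-NULL only while the mm is write-locked */
};

struct toy_task {
	struct toy_mm *mm;
};

/* Models bprm_mm_init(): the mm is locked right after it is created. */
void toy_bprm_mm_init(struct toy_bprm *bprm, struct toy_mm *mm)
{
	mm->users = 1;
	mm->write_locked = true;
	bprm->mm = mm;
}

/* Models exec_mmap(): the task takes an extra reference, lock stays held. */
void toy_exec_mmap(struct toy_task *tsk, struct toy_bprm *bprm)
{
	bprm->mm->users++;
	tsk->mm = bprm->mm;
}

/* Models the success path at the end of setup_arg_pages(). */
void toy_setup_arg_pages_done(struct toy_bprm *bprm)
{
	bprm->mm->write_locked = false;
	bprm->mm->users--;
	bprm->mm = NULL;
}

/* Models free_bprm(): only unlocks and drops if bprm still owns the mm. */
void toy_free_bprm(struct toy_bprm *bprm)
{
	if (bprm->mm) {
		bprm->mm->write_locked = false;
		bprm->mm->users--;
		bprm->mm = NULL;	/* simplification made by this sketch */
	}
}
```

In this model, an error before the end of setup_arg_pages() leaves
bprm->mm set, so toy_free_bprm() is what releases the lock; on success,
toy_free_bprm() finds bprm->mm already cleared and does nothing.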

Suggested-by: Jason Gunthorpe
Suggested-by: Michel Lespinasse
Signed-off-by: Jann Horn
---
 fs/exec.c               | 68 ++++++++++++++++++++---------------------
 include/linux/binfmts.h |  2 +-
 2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 229dbc7aa61a..00edf833781f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -254,11 +254,6 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 		return -ENOMEM;
 	vma_set_anonymous(vma);
 
-	if (mmap_write_lock_killable(mm)) {
-		err = -EINTR;
-		goto err_free;
-	}
-
 	/*
 	 * Place the stack at the largest stack address the architecture
 	 * supports. Later, we'll move this to an appropriate place. We don't
@@ -276,12 +271,9 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 		goto err;
 
 	mm->stack_vm = mm->total_vm = 1;
-	mmap_write_unlock(mm);
 	bprm->p = vma->vm_end - sizeof(void *);
 	return 0;
 err:
-	mmap_write_unlock(mm);
-err_free:
 	bprm->vma = NULL;
 	vm_area_free(vma);
 	return err;
@@ -364,9 +356,9 @@ static int bprm_mm_init(struct linux_binprm *bprm)
 	struct mm_struct *mm = NULL;
 
 	bprm->mm = mm = mm_alloc();
-	err = -ENOMEM;
 	if (!mm)
-		goto err;
+		return -ENOMEM;
+	mmap_write_lock_nascent(mm);
 
 	/* Save current stack limit for all calculations made during exec. */
 	task_lock(current->group_leader);
@@ -374,17 +366,12 @@ static int bprm_mm_init(struct linux_binprm *bprm)
 	task_unlock(current->group_leader);
 
 	err = __bprm_mm_init(bprm);
-	if (err)
-		goto err;
-
-	return 0;
-
-err:
-	if (mm) {
-		bprm->mm = NULL;
-		mmdrop(mm);
-	}
+	if (!err)
+		return 0;
 
+	bprm->mm = NULL;
+	mmap_write_unlock(mm);
+	mmdrop(mm);
 	return err;
 }
 
@@ -735,6 +722,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 /*
  * Finalizes the stack vm_area_struct. The flags and permissions are updated,
  * the stack is optionally relocated, and some extra space is added.
+ * At the end of this, the mm_struct will be unlocked on success.
  */
 int setup_arg_pages(struct linux_binprm *bprm,
 		    unsigned long stack_top,
@@ -787,9 +775,6 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	bprm->loader -= stack_shift;
 	bprm->exec -= stack_shift;
 
-	if (mmap_write_lock_killable(mm))
-		return -EINTR;
-
 	vm_flags = VM_STACK_FLAGS;
 
 	/*
@@ -807,7 +792,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
 			vm_flags);
 	if (ret)
-		goto out_unlock;
+		return ret;
 	BUG_ON(prev != vma);
 
 	if (unlikely(vm_flags & VM_EXEC)) {
@@ -819,7 +804,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	if (stack_shift) {
 		ret = shift_arg_pages(vma, stack_shift);
 		if (ret)
-			goto out_unlock;
+			return ret;
 	}
 
 	/* mprotect_fixup is overkill to remove the temporary stack flags */
@@ -846,11 +831,17 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	current->mm->start_stack = bprm->p;
 	ret = expand_stack(vma, stack_base);
 	if (ret)
-		ret = -EFAULT;
+		return -EFAULT;
 
-out_unlock:
+	/*
+	 * From this point on, anything that wants to poke around in the
+	 * mm_struct must lock it by itself.
+	 */
+	bprm->vma = NULL;
 	mmap_write_unlock(mm);
-	return ret;
+	mmput(mm);
+	bprm->mm = NULL;
+	return 0;
 }
 EXPORT_SYMBOL(setup_arg_pages);
 
@@ -1114,8 +1105,6 @@ static int exec_mmap(struct mm_struct *mm)
 	if (ret)
 		return ret;
 
-	mmap_write_lock_nascent(mm);
-
 	if (old_mm) {
 		/*
 		 * Make sure that if there is a core dump in progress
@@ -1127,11 +1116,12 @@ static int exec_mmap(struct mm_struct *mm)
 		if (unlikely(old_mm->core_state)) {
 			mmap_read_unlock(old_mm);
 			mutex_unlock(&tsk->signal->exec_update_mutex);
-			mmap_write_unlock(mm);
 			return -EINTR;
 		}
 	}
 
+	/* bprm->mm stays refcounted, current->mm takes an extra reference */
+	mmget(mm);
 	task_lock(tsk);
 	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
@@ -1141,7 +1131,6 @@ static int exec_mmap(struct mm_struct *mm)
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
-	mmap_write_unlock(mm);
 	if (old_mm) {
 		mmap_read_unlock(old_mm);
 		BUG_ON(active_mm != old_mm);
@@ -1397,8 +1386,6 @@ int begin_new_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
-	bprm->mm = NULL;
-
 #ifdef CONFIG_POSIX_TIMERS
 	exit_itimers(me->signal);
 	flush_itimer_signals();
@@ -1545,6 +1532,18 @@ void setup_new_exec(struct linux_binprm * bprm)
 	me->mm->task_size = TASK_SIZE;
 	mutex_unlock(&me->signal->exec_update_mutex);
 	mutex_unlock(&me->signal->cred_guard_mutex);
+
+	if (!IS_ENABLED(CONFIG_MMU)) {
+		/*
+		 * On MMU, setup_arg_pages() wants to access bprm->vma after
+		 * this point, so we can't drop the mmap lock yet.
+		 * On !MMU, we have neither setup_arg_pages() nor bprm->vma,
+		 * so we should drop the lock here.
+		 */
+		mmap_write_unlock(bprm->mm);
+		mmput(bprm->mm);
+		bprm->mm = NULL;
+	}
 }
 EXPORT_SYMBOL(setup_new_exec);
 
@@ -1581,6 +1580,7 @@ static void free_bprm(struct linux_binprm *bprm)
 {
 	if (bprm->mm) {
 		acct_arg_size(bprm, 0);
+		mmap_write_unlock(bprm->mm);
 		mmput(bprm->mm);
 	}
 	free_arg_pages(bprm);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 0571701ab1c5..3bf06212fbae 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -22,7 +22,7 @@ struct linux_binprm {
 # define MAX_ARG_PAGES	32
 	struct page *page[MAX_ARG_PAGES];
 #endif
-	struct mm_struct *mm;
+	struct mm_struct *mm;	/* nascent mm, write-locked */
 	unsigned long p; /* current top of mem */
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int