Message ID | 87ftcei2si.fsf@x220.int.ebiederm.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/7] binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf | expand |
On Tue, May 05, 2020 at 02:45:33PM -0500, Eric W. Biederman wrote: > > The current idiom for the callers is: > > flush_old_exec(bprm); > set_personality(...); > setup_new_exec(bprm); > > In 2010 Linus split flush_old_exec into flush_old_exec and > setup_new_exec. With the intention that setup_new_exec be what is > called after the processes new personality is set. > > Move the code that doesn't depend upon the personality from > setup_new_exec into flush_old_exec. This is to facilitate future > changes by having as much code together in one function as possible. Er, I *think* this is okay, but I have some questions below which maybe you already investigated (and should perhaps get called out in the changelog). > > Ref: 221af7f87b97 ("Split 'flush_old_exec' into two functions") > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > fs/exec.c | 85 ++++++++++++++++++++++++++++--------------------------- > 1 file changed, 44 insertions(+), 41 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 8c3abafb9bb1..0eff20558735 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1359,39 +1359,7 @@ int flush_old_exec(struct linux_binprm * bprm) > * undergoing exec(2). > */ > do_close_on_exec(me->files); > - return 0; > - > -out_unlock: > - mutex_unlock(&me->signal->exec_update_mutex); > -out: > - return retval; > -} > -EXPORT_SYMBOL(flush_old_exec); > - > -void would_dump(struct linux_binprm *bprm, struct file *file) > -{ > - struct inode *inode = file_inode(file); > - if (inode_permission(inode, MAY_READ) < 0) { > - struct user_namespace *old, *user_ns; > - bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP; > - > - /* Ensure mm->user_ns contains the executable */ > - user_ns = old = bprm->mm->user_ns; > - while ((user_ns != &init_user_ns) && > - !privileged_wrt_inode_uidgid(user_ns, inode)) > - user_ns = user_ns->parent; > > - if (old != user_ns) { > - bprm->mm->user_ns = get_user_ns(user_ns); > - put_user_ns(old); > - } > - } > -} > -EXPORT_SYMBOL(would_dump); > - > -void setup_new_exec(struct linux_binprm * bprm) > -{ > - struct task_struct *me = current; > /* > * Once here, prepare_binrpm() will not be called any more, so > * the final state of setuid/setgid/fscaps can be merged into the > @@ -1414,8 +1382,6 @@ void setup_new_exec(struct linux_binprm * bprm) > bprm->rlim_stack.rlim_cur = _STK_LIM; > } > > - arch_pick_mmap_layout(me->mm, &bprm->rlim_stack); > - > me->sas_ss_sp = me->sas_ss_size = 0; > > /* > @@ -1430,16 +1396,9 @@ void setup_new_exec(struct linux_binprm * bprm) > else > set_dumpable(current->mm, SUID_DUMP_USER); > > - arch_setup_new_exec(); > perf_event_exec(); What is perf expecting to be able to examine at this point? Does it want a view of things after arch_setup_new_exec()? (i.e. "final" TIF flags, mmap layout, etc.) From what I can, the answer is "no, it's just resetting counters", so I think this is fine. Maybe double-check with Steve? > __set_task_comm(me, kbasename(bprm->filename), true); > > - /* Set the new mm task size. We have to do that late because it may > - * depend on TIF_32BIT which is only updated in flush_thread() on > - * some architectures like powerpc > - */ > - me->mm->task_size = TASK_SIZE; > - > /* An exec changes our domain. We are no longer part of the thread > group */ > WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1); > @@ -1467,6 +1426,50 @@ void setup_new_exec(struct linux_binprm * bprm) > * credentials; any time after this it may be unlocked. > */ > security_bprm_committed_creds(bprm); Similarly for the LSM hook: is it expecting a post-arch-setup view? I don't see anything looking at task_size, TIF flags, or anything else; they seem to be just cleaning up from the old process being replaced, so against, I think this is okay. Not visible in this patch, the following things how happen earlier, which I feel should maybe get called out in the changelog, with, perhaps, better justification than what I've got here: bprm->secureexec set/check (looks safe, since it depends on prepare_binprm()'s security_bprm_set_creds(). rlim_stack.rlim_cur setting (safe, just needs to happen before arch_pick_mmap_layout()) dumpable() check (looks safe, BINPRM_FLAGS_ENFORCE_NONDUMP depends on much earlier would_dump(), and uid/gid depend on earlier calls to prepare_binprm()'s bprm_fill_uid()) __set_task_comm (looks safe, just dealing with the task name...) self_exec_id bump (looks safe, but I think -- it's still after uid setting) flush_signal_handlers() (looks safe -- nothing appears to depend on mm nor personality) > + return 0; > + > +out_unlock: > + mutex_unlock(&me->signal->exec_update_mutex); > +out: > + return retval; > +} > +EXPORT_SYMBOL(flush_old_exec); > + > +void would_dump(struct linux_binprm *bprm, struct file *file) > +{ > + struct inode *inode = file_inode(file); > + if (inode_permission(inode, MAY_READ) < 0) { > + struct user_namespace *old, *user_ns; > + bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP; > + > + /* Ensure mm->user_ns contains the executable */ > + user_ns = old = bprm->mm->user_ns; > + while ((user_ns != &init_user_ns) && > + !privileged_wrt_inode_uidgid(user_ns, inode)) > + user_ns = user_ns->parent; > + > + if (old != user_ns) { > + bprm->mm->user_ns = get_user_ns(user_ns); > + put_user_ns(old); > + } > + } > +} > +EXPORT_SYMBOL(would_dump); The diff helpfully decided this moved would_dump(). ;) Is it worth maybe just moviing it explicitly above flush_old_exec() to avoid this churn? I dunno. > + > +void setup_new_exec(struct linux_binprm * bprm) > +{ > + /* Setup things that can depend upon the personality */ Should this comment be above the function instead? > + struct task_struct *me = current; > + > + arch_pick_mmap_layout(me->mm, &bprm->rlim_stack); > + > + arch_setup_new_exec(); > + > + /* Set the new mm task size. We have to do that late because it may > + * depend on TIF_32BIT which is only updated in flush_thread() on > + * some architectures like powerpc > + */ > + me->mm->task_size = TASK_SIZE; > mutex_unlock(&me->signal->exec_update_mutex); > mutex_unlock(&me->signal->cred_guard_mutex); > } > -- > 2.20.1 > So, as I say, I *think* this is okay, but I always get suspicious about reordering things in execve(). ;) So, with a bit larger changelog discussing what's moving "earlier", I think this looks good: Reviewed-by: Kees Cook <keescook@chromium.org>
Kees Cook <keescook@chromium.org> writes: > On Tue, May 05, 2020 at 02:45:33PM -0500, Eric W. Biederman wrote: >> >> The current idiom for the callers is: >> >> flush_old_exec(bprm); >> set_personality(...); >> setup_new_exec(bprm); >> >> In 2010 Linus split flush_old_exec into flush_old_exec and >> setup_new_exec. With the intention that setup_new_exec be what is >> called after the processes new personality is set. >> >> Move the code that doesn't depend upon the personality from >> setup_new_exec into flush_old_exec. This is to facilitate future >> changes by having as much code together in one function as possible. > > Er, I *think* this is okay, but I have some questions below which > maybe you already investigated (and should perhaps get called out in > the changelog). I will see if I can expand more on the review that I have done. I saw this as moving thre lines and the personality setting later in the code, rather than moving a bunch of lines up AKA these lines: >> + arch_pick_mmap_layout(me->mm, &bprm->rlim_stack); >> + >> + arch_setup_new_exec(); >> + >> + /* Set the new mm task size. We have to do that late because it may >> + * depend on TIF_32BIT which is only updated in flush_thread() on >> + * some architectures like powerpc >> + */ >> + me->mm->task_size = TASK_SIZE; I verified carefully that only those three lines can depend upon the personality changes. Your concern if anything depends on those moved lines I haven't looked at so closely so I will go back through and do that. I don't actually expect anything depends upon those three lines because they should only be changing architecture specific state. But that is general handwaving not actually careful review which tends to turn up suprises in exec. Speaking of while I was looking through the lsm hooks again I just realized that 613cc2b6f272 ("fs: exec: apply CLOEXEC before changing dumpable task flags") only fixed half the problem. So I am going to take a quick detour fix that then come back to this. As that directly affects this code motion. Eric
On Wed, May 06, 2020 at 09:57:10AM -0500, Eric W. Biederman wrote: > Kees Cook <keescook@chromium.org> writes: > > > On Tue, May 05, 2020 at 02:45:33PM -0500, Eric W. Biederman wrote: > >> > >> The current idiom for the callers is: > >> > >> flush_old_exec(bprm); > >> set_personality(...); > >> setup_new_exec(bprm); > >> > >> In 2010 Linus split flush_old_exec into flush_old_exec and > >> setup_new_exec. With the intention that setup_new_exec be what is > >> called after the processes new personality is set. > >> > >> Move the code that doesn't depend upon the personality from > >> setup_new_exec into flush_old_exec. This is to facilitate future > >> changes by having as much code together in one function as possible. > > > > Er, I *think* this is okay, but I have some questions below which > > maybe you already investigated (and should perhaps get called out in > > the changelog). > > I will see if I can expand more on the review that I have done. > > I saw this as moving thre lines and the personality setting later in the > code, rather than moving a bunch of lines up > > AKA these lines: > >> + arch_pick_mmap_layout(me->mm, &bprm->rlim_stack); > >> + > >> + arch_setup_new_exec(); > >> + > >> + /* Set the new mm task size. We have to do that late because it may > >> + * depend on TIF_32BIT which is only updated in flush_thread() on > >> + * some architectures like powerpc > >> + */ > >> + me->mm->task_size = TASK_SIZE; > > > I verified carefully that only those three lines can depend upon the > personality changes. > > Your concern if anything depends on those moved lines I haven't looked > at so closely so I will go back through and do that. I don't actually > expect anything depends upon those three lines because they should only > be changing architecture specific state. But that is general handwaving > not actually careful review which tends to turn up suprises in exec. Right -- I looked through all of it (see my last email) and I think it's all okay, but I was curious if you'd looked too. :) > Speaking of while I was looking through the lsm hooks again I just > realized that 613cc2b6f272 ("fs: exec: apply CLOEXEC before changing > dumpable task flags") only fixed half the problem. So I am going to > take a quick detour fix that then come back to this. As that directly > affects this code motion. Oh yay. :) Thanks for catching it!
Kees Cook <keescook@chromium.org> writes: > On Wed, May 06, 2020 at 09:57:10AM -0500, Eric W. Biederman wrote: >> Kees Cook <keescook@chromium.org> writes: >> >> > On Tue, May 05, 2020 at 02:45:33PM -0500, Eric W. Biederman wrote: >> >> >> >> The current idiom for the callers is: >> >> >> >> flush_old_exec(bprm); >> >> set_personality(...); >> >> setup_new_exec(bprm); >> >> >> >> In 2010 Linus split flush_old_exec into flush_old_exec and >> >> setup_new_exec. With the intention that setup_new_exec be what is >> >> called after the processes new personality is set. >> >> >> >> Move the code that doesn't depend upon the personality from >> >> setup_new_exec into flush_old_exec. This is to facilitate future >> >> changes by having as much code together in one function as possible. >> > >> > Er, I *think* this is okay, but I have some questions below which >> > maybe you already investigated (and should perhaps get called out in >> > the changelog). >> >> I will see if I can expand more on the review that I have done. >> >> I saw this as moving thre lines and the personality setting later in the >> code, rather than moving a bunch of lines up >> >> AKA these lines: >> >> + arch_pick_mmap_layout(me->mm, &bprm->rlim_stack); >> >> + >> >> + arch_setup_new_exec(); >> >> + >> >> + /* Set the new mm task size. We have to do that late because it may >> >> + * depend on TIF_32BIT which is only updated in flush_thread() on >> >> + * some architectures like powerpc >> >> + */ >> >> + me->mm->task_size = TASK_SIZE; >> >> >> I verified carefully that only those three lines can depend upon the >> personality changes. >> >> Your concern if anything depends on those moved lines I haven't looked >> at so closely so I will go back through and do that. I don't actually >> expect anything depends upon those three lines because they should only >> be changing architecture specific state. But that is general handwaving >> not actually careful review which tends to turn up suprises in exec. > > Right -- I looked through all of it (see my last email) and I think it's > all okay, but I was curious if you'd looked too. :) I had and I will finish looking in the other direction and see if there is anything else I can see. Thank you for asking and keeping me honest. There are so many moving parts to this code it is easy to overlook something by accident. >> Speaking of while I was looking through the lsm hooks again I just >> realized that 613cc2b6f272 ("fs: exec: apply CLOEXEC before changing >> dumpable task flags") only fixed half the problem. So I am going to >> take a quick detour fix that then come back to this. As that directly >> affects this code motion. > > Oh yay. :) Thanks for catching it! Well that fix is going to be a lot more involved than I anticipated. The more I looked the more bugs I find so I will revisit fixing that after I complete this set of changes. I thought it was going to be a trivial localized fix, and unfortunately not. Eric
Kees Cook <keescook@chromium.org> writes: > On Tue, May 05, 2020 at 02:45:33PM -0500, Eric W. Biederman wrote: >> >> The current idiom for the callers is: >> >> flush_old_exec(bprm); >> set_personality(...); >> setup_new_exec(bprm); >> >> In 2010 Linus split flush_old_exec into flush_old_exec and >> setup_new_exec. With the intention that setup_new_exec be what is >> called after the processes new personality is set. >> >> Move the code that doesn't depend upon the personality from >> setup_new_exec into flush_old_exec. This is to facilitate future >> changes by having as much code together in one function as possible. > > Er, I *think* this is okay, but I have some questions below which > maybe you already investigated (and should perhaps get called out in > the changelog). I intend to the following text to the changelog. At this point I believe I have read through everything and nothing raises any concerns for me: --- text begin --- To see why it is safe to move this code please note that effectively this change moves the personality setting in the binfmt and the following three lines of code after everything except unlocking the mutexes: arch_pick_mmap_layout arch_setup_new_exec mm->task_size = TASK_SIZE The function arch_pick_mmap_layout at most sets: mm->get_unmapped_area mm->mmap_base mm->mmap_legacy_base mm->mmap_compat_base mm->mmap_compat_legacy_base which nothing in flush_old_exec or setup_new_exec depends on. The function arch_setup_new_exec only sets architecture specific state and the rest of the functions only deal in state that applies to all architectures. The last line just sets mm->task_size and again nothing in flush_old_exec or setup_new_exec depend on task_size. --- text end --- >> >> Ref: 221af7f87b97 ("Split 'flush_old_exec' into two functions") >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >> --- >> fs/exec.c | 85 ++++++++++++++++++++++++++++--------------------------- >> 1 file changed, 44 insertions(+), 41 deletions(-) >> >> diff --git a/fs/exec.c b/fs/exec.c >> index 8c3abafb9bb1..0eff20558735 100644 >> --- a/fs/exec.c >> +++ b/fs/exec.c >> @@ -1359,39 +1359,7 @@ int flush_old_exec(struct linux_binprm * bprm) >> * undergoing exec(2). >> */ >> do_close_on_exec(me->files); >> - return 0; >> - >> -out_unlock: >> - mutex_unlock(&me->signal->exec_update_mutex); >> -out: >> - return retval; >> -} >> -EXPORT_SYMBOL(flush_old_exec); >> - >> -void would_dump(struct linux_binprm *bprm, struct file *file) >> -{ >> - struct inode *inode = file_inode(file); >> - if (inode_permission(inode, MAY_READ) < 0) { >> - struct user_namespace *old, *user_ns; >> - bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP; >> - >> - /* Ensure mm->user_ns contains the executable */ >> - user_ns = old = bprm->mm->user_ns; >> - while ((user_ns != &init_user_ns) && >> - !privileged_wrt_inode_uidgid(user_ns, inode)) >> - user_ns = user_ns->parent; >> >> - if (old != user_ns) { >> - bprm->mm->user_ns = get_user_ns(user_ns); >> - put_user_ns(old); >> - } >> - } >> -} >> -EXPORT_SYMBOL(would_dump); >> - >> -void setup_new_exec(struct linux_binprm * bprm) >> -{ >> - struct task_struct *me = current; >> /* >> * Once here, prepare_binrpm() will not be called any more, so >> * the final state of setuid/setgid/fscaps can be merged into the >> @@ -1414,8 +1382,6 @@ void setup_new_exec(struct linux_binprm * bprm) >> bprm->rlim_stack.rlim_cur = _STK_LIM; >> } >> >> - arch_pick_mmap_layout(me->mm, &bprm->rlim_stack); >> - >> me->sas_ss_sp = me->sas_ss_size = 0; >> >> /* >> @@ -1430,16 +1396,9 @@ void setup_new_exec(struct linux_binprm * bprm) >> else >> set_dumpable(current->mm, SUID_DUMP_USER); >> >> - arch_setup_new_exec(); >> perf_event_exec(); > > What is perf expecting to be able to examine at this point? Does it want > a view of things after arch_setup_new_exec()? (i.e. "final" TIF flags, > mmap layout, etc.) From what I can, the answer is "no, it's just > resetting counters", so I think this is fine. Maybe double-check with > Steve? I can't find anything in the perf code that depends on arch_pick_mmap_layout or mm->task_size. So I don't have any concerns. I have grepped through both kernel/events/ and arch/x86/events/ and include/trace to double check and have nothing turned up. I can't see the policy of where things will be allocated in the memory map making any difference to perf. Depending on what events actually are I can imagine then firing and having issues as I can imagine an event to be just about anything but I don't see a way to prevent that. I do see perf disabling events that are based on addresses. I further see perf enabling/disabling events that have already been computed. I see perf treating exec effectively as a process scheduling out and in. Then finally I see perf shutting itself down on suid exec, and generating some final perf events. I have some concerns that is a bit late, and that the test might not be quite right but nothing particular to this change. >> __set_task_comm(me, kbasename(bprm->filename), true); >> >> - /* Set the new mm task size. We have to do that late because it may >> - * depend on TIF_32BIT which is only updated in flush_thread() on >> - * some architectures like powerpc >> - */ >> - me->mm->task_size = TASK_SIZE; >> - >> /* An exec changes our domain. We are no longer part of the thread >> group */ >> WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1); >> @@ -1467,6 +1426,50 @@ void setup_new_exec(struct linux_binprm * bprm) >> * credentials; any time after this it may be unlocked. >> */ >> security_bprm_committed_creds(bprm); > > Similarly for the LSM hook: is it expecting a post-arch-setup view? I > don't see anything looking at task_size, TIF flags, or anything else; > they seem to be just cleaning up from the old process being replaced, so > against, I think this is okay. Nothing at all with the mm. The LSM hooks close files and muck with rlimits and signals, and tidy up their lsm state. There are only 3 implementations apparmor, tomoyo and selinux so it isn't too hard to read through them. > Not visible in this patch, the following things how happen earlier, > which I feel should maybe get called out in the changelog, with, > perhaps, better justification than what I've got here: > > bprm->secureexec set/check (looks safe, since it depends on > prepare_binprm()'s security_bprm_set_creds(). > > rlim_stack.rlim_cur setting (safe, just needs to happen before > arch_pick_mmap_layout()) > > dumpable() check (looks safe, BINPRM_FLAGS_ENFORCE_NONDUMP depends on > much earlier would_dump(), and uid/gid depend on earlier calls to > prepare_binprm()'s bprm_fill_uid()) > > __set_task_comm (looks safe, just dealing with the task name...) > > self_exec_id bump (looks safe, but I think -- it's still after uid > setting) > > flush_signal_handlers() (looks safe -- nothing appears to depend on mm > nor personality) Agreed. >> + return 0; >> + >> +out_unlock: >> + mutex_unlock(&me->signal->exec_update_mutex); >> +out: >> + return retval; >> +} >> +EXPORT_SYMBOL(flush_old_exec); >> + >> +void would_dump(struct linux_binprm *bprm, struct file *file) >> +{ >> + struct inode *inode = file_inode(file); >> + if (inode_permission(inode, MAY_READ) < 0) { >> + struct user_namespace *old, *user_ns; >> + bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP; >> + >> + /* Ensure mm->user_ns contains the executable */ >> + user_ns = old = bprm->mm->user_ns; >> + while ((user_ns != &init_user_ns) && >> + !privileged_wrt_inode_uidgid(user_ns, inode)) >> + user_ns = user_ns->parent; >> + >> + if (old != user_ns) { >> + bprm->mm->user_ns = get_user_ns(user_ns); >> + put_user_ns(old); >> + } >> + } >> +} >> +EXPORT_SYMBOL(would_dump); > > The diff helpfully decided this moved would_dump(). ;) Is it worth > maybe just moviing it explicitly above flush_old_exec() to avoid this > churn? I dunno. Given the amount of review and testing that has been put in at this point I don't think so. >> + >> +void setup_new_exec(struct linux_binprm * bprm) >> +{ >> + /* Setup things that can depend upon the personality */ > > Should this comment be above the function instead? My experience has been that comments above functions unless they are in full linuxdoc tend to be less well maintained than comments within the function itself. So I don't think it is worth moving.x >> + struct task_struct *me = current; >> + >> + arch_pick_mmap_layout(me->mm, &bprm->rlim_stack); >> + >> + arch_setup_new_exec(); >> + >> + /* Set the new mm task size. We have to do that late because it may >> + * depend on TIF_32BIT which is only updated in flush_thread() on >> + * some architectures like powerpc >> + */ >> + me->mm->task_size = TASK_SIZE; >> mutex_unlock(&me->signal->exec_update_mutex); >> mutex_unlock(&me->signal->cred_guard_mutex); >> } >> -- >> 2.20.1 >> > > So, as I say, I *think* this is okay, but I always get suspicious about > reordering things in execve(). ;) > > So, with a bit larger changelog discussing what's moving "earlier", > I think this looks good: Please see above. Eric
On Thu, May 07, 2020 at 04:51:13PM -0500, Eric W. Biederman wrote: > I intend to the following text to the changelog. At this point I > believe I have read through everything and nothing raises any concerns > for me: > > --- text begin --- > > To see why it is safe to move this code please note that effectively > this change moves the personality setting in the binfmt and the following > three lines of code after everything except unlocking the mutexes: > arch_pick_mmap_layout > arch_setup_new_exec > mm->task_size = TASK_SIZE > > The function arch_pick_mmap_layout at most sets: > mm->get_unmapped_area > mm->mmap_base > mm->mmap_legacy_base > mm->mmap_compat_base > mm->mmap_compat_legacy_base > which nothing in flush_old_exec or setup_new_exec depends on. > > The function arch_setup_new_exec only sets architecture specific > state and the rest of the functions only deal in state that applies > to all architectures. > > The last line just sets mm->task_size and again nothing in flush_old_exec > or setup_new_exec depend on task_size. > > --- text end --- > [...] > > So, with a bit larger changelog discussing what's moving "earlier", > > I think this looks good: > > Please see above. Awesome! Thanks for checking my checking of your checking. ;) Acked-by: Kees Cook <keescook@chromium.org>
diff --git a/fs/exec.c b/fs/exec.c index 8c3abafb9bb1..0eff20558735 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1359,39 +1359,7 @@ int flush_old_exec(struct linux_binprm * bprm) * undergoing exec(2). */ do_close_on_exec(me->files); - return 0; - -out_unlock: - mutex_unlock(&me->signal->exec_update_mutex); -out: - return retval; -} -EXPORT_SYMBOL(flush_old_exec); - -void would_dump(struct linux_binprm *bprm, struct file *file) -{ - struct inode *inode = file_inode(file); - if (inode_permission(inode, MAY_READ) < 0) { - struct user_namespace *old, *user_ns; - bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP; - - /* Ensure mm->user_ns contains the executable */ - user_ns = old = bprm->mm->user_ns; - while ((user_ns != &init_user_ns) && - !privileged_wrt_inode_uidgid(user_ns, inode)) - user_ns = user_ns->parent; - if (old != user_ns) { - bprm->mm->user_ns = get_user_ns(user_ns); - put_user_ns(old); - } - } -} -EXPORT_SYMBOL(would_dump); - -void setup_new_exec(struct linux_binprm * bprm) -{ - struct task_struct *me = current; /* * Once here, prepare_binrpm() will not be called any more, so * the final state of setuid/setgid/fscaps can be merged into the @@ -1414,8 +1382,6 @@ void setup_new_exec(struct linux_binprm * bprm) bprm->rlim_stack.rlim_cur = _STK_LIM; } - arch_pick_mmap_layout(me->mm, &bprm->rlim_stack); - me->sas_ss_sp = me->sas_ss_size = 0; /* @@ -1430,16 +1396,9 @@ void setup_new_exec(struct linux_binprm * bprm) else set_dumpable(current->mm, SUID_DUMP_USER); - arch_setup_new_exec(); perf_event_exec(); __set_task_comm(me, kbasename(bprm->filename), true); - /* Set the new mm task size. We have to do that late because it may - * depend on TIF_32BIT which is only updated in flush_thread() on - * some architectures like powerpc - */ - me->mm->task_size = TASK_SIZE; - /* An exec changes our domain. We are no longer part of the thread group */ WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1); @@ -1467,6 +1426,50 @@ void setup_new_exec(struct linux_binprm * bprm) * credentials; any time after this it may be unlocked. */ security_bprm_committed_creds(bprm); + return 0; + +out_unlock: + mutex_unlock(&me->signal->exec_update_mutex); +out: + return retval; +} +EXPORT_SYMBOL(flush_old_exec); + +void would_dump(struct linux_binprm *bprm, struct file *file) +{ + struct inode *inode = file_inode(file); + if (inode_permission(inode, MAY_READ) < 0) { + struct user_namespace *old, *user_ns; + bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP; + + /* Ensure mm->user_ns contains the executable */ + user_ns = old = bprm->mm->user_ns; + while ((user_ns != &init_user_ns) && + !privileged_wrt_inode_uidgid(user_ns, inode)) + user_ns = user_ns->parent; + + if (old != user_ns) { + bprm->mm->user_ns = get_user_ns(user_ns); + put_user_ns(old); + } + } +} +EXPORT_SYMBOL(would_dump); + +void setup_new_exec(struct linux_binprm * bprm) +{ + /* Setup things that can depend upon the personality */ + struct task_struct *me = current; + + arch_pick_mmap_layout(me->mm, &bprm->rlim_stack); + + arch_setup_new_exec(); + + /* Set the new mm task size. We have to do that late because it may + * depend on TIF_32BIT which is only updated in flush_thread() on + * some architectures like powerpc + */ + me->mm->task_size = TASK_SIZE; mutex_unlock(&me->signal->exec_update_mutex); mutex_unlock(&me->signal->cred_guard_mutex); }
The current idiom for the callers is: flush_old_exec(bprm); set_personality(...); setup_new_exec(bprm); In 2010 Linus split flush_old_exec into flush_old_exec and setup_new_exec. With the intention that setup_new_exec be what is called after the processes new personality is set. Move the code that doesn't depend upon the personality from setup_new_exec into flush_old_exec. This is to facilitate future changes by having as much code together in one function as possible. Ref: 221af7f87b97 ("Split 'flush_old_exec' into two functions") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/exec.c | 85 ++++++++++++++++++++++++++++--------------------------- 1 file changed, 44 insertions(+), 41 deletions(-)