Message ID: 20220207121800.5079-1-mkoutny@suse.com
Series: RLIMIT_NPROC in ucounts fixups
Michal Koutný <mkoutny@suse.com> writes:

> This series is a result of looking deeper into breakage of
> tools/testing/selftests/rlimits/rlimits-per-userns.c after
> https://lore.kernel.org/r/20220204181144.24462-1-mkoutny@suse.com/
> is applied.
>
> The description of the original problem that led to the RLIMIT_NPROC
> et al. ucounts rewrite could be ambiguously interpreted as supporting
> either the case of:
> - a never-fork service, or
> - a service that forks (RLIMIT_NPROC-1) times.
>
> The scenario is weird anyway given the existence of the pids
> controller.
>
> The realization of that scenario relies not only on tracking the
> number of processes per user_ns but also newly allows the root to
> override the limit through set*uid. The commit message didn't mention
> that, so it's unclear if that was the intention too.
>
> I also noticed that the RLIMIT_NPROC enforcement in fork seems subject
> to a TOCTOU race (check(nr_tasks), ..., nr_tasks++), so the limit is
> rather advisory (but that's not a new thing related to the ucounts
> rewrite).
>
> This series is an RFC to discuss the relevance of the subtle
> RLIMIT_NPROC changes the ucounts rewrite introduced.

A quick reply (because I don't have a lot of time at the moment).

I agree with the issues your first patch before this series addresses
and the issues the first 3 patches address. I have not looked at the
tests.

I actually disagree with most of your fixes, both because of
intrusiveness and because of awkwardness. My basic problem with your
fixes is that I don't think they leave the code in a more maintainable
state. Hopefully later today I can propose some alternative fixes and
we can continue the discussion.

One thing I think you misunderstood is that the capability checks in
set_user have always been there. There is a very good argument that
they are badly placed and so are not exactly checking the correct
credentials, especially now.

Your patch 4/6 I don't think makes sense. It has always been the case
that root without capabilities is subject to the rlimit. If you are in
a user namespace you are root without capabilities.

Eric
Hello there,

On 07/02/2022 12:17, Michal Koutný wrote:
> This series is a result of looking deeper into breakage of
> tools/testing/selftests/rlimits/rlimits-per-userns.c after
> https://lore.kernel.org/r/20220204181144.24462-1-mkoutny@suse.com/
> is applied.

Pardon the intrusion, but I thought you might be interested to know
that, as a humble user, I noticed actual user-visible breakage from
59ec715 "ucounts: Fix rlimit max values check":
https://bugzilla.kernel.org/show_bug.cgi?id=215596

I'm not sure I understand everything that's going on in this thread,
but it does seem very relevant. You guys might want to double-check
the behavior in the particular scenario described there. I'm mostly
sending this to make sure everything is cross-linked.
On Sat, Feb 12, 2022 at 03:32:30PM +0000, Etienne Dechamps <etienne@edechamps.fr> wrote:
> I'm not sure I understand everything that's going on in this thread but it
> does seem very relevant. You guys might want to double-check the behavior in
> the particular scenario described there. I'm mostly sending this to make
> sure everything is cross-linked.

Thanks for the report with strace.

AFAICT, it's caused by setresuid() after unshare(), i.e. all root's
tasks are (wrongly) compared against the lowered RLIMIT_NPROC.

This is tackled by my RFC patch 2/6 [1], or Eric's variant of it, 3/8
(an equivalent fix for this case, but I haven't run that build).

Michal

[1] I could run your test (LimitNPROC=1 actually) against a kernel with
my patches and the service starts.
Michal Koutný <mkoutny@suse.com> writes:

> AFAICT, it's caused by setresuid() after unshare(), i.e. all root's
> tasks are (wrongly) compared against the lowered RLIMIT_NPROC.
>
> This is tackled by my RFC patch 2/6 [1] or Eric's variant, 3/8
> (equivalent fix for this case but I haven't run that build).
>
> [1] I could run your test (LimitNPROC=1 actually) against kernel with my
> patches and the service starts.

So I looked into this, and our previous patchsets (but not my final
one) did resolve this. What fixed it, and what is needed to fix this,
is not enforcing RLIMIT_NPROC when the user who creates the user
namespace is INIT_USER. AKA something like the patch below.

It is a regression so if at all possible it needs to be fixed, and it
is certainly possible.

The patch below feels right at first glance, but I am not convinced
that testing cred->user or cred->ucounts is the proper test, so I am
going to sleep on this a little bit. I did want everyone to know I
looked into this and I am going to ensure this gets fixed.
diff --git a/kernel/fork.c b/kernel/fork.c
index 17d8a8c85e3b..532ce5cbf851 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2027,7 +2027,7 @@ static __latent_entropy struct task_struct *copy_process(
 	retval = -EAGAIN;
 	if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
-		if (p->real_cred->user != INIT_USER &&
+		if (p->real_cred->ucounts != &init_ucounts &&
 		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
 			goto bad_fork_cleanup_count;
 	}
diff --git a/kernel/sys.c b/kernel/sys.c
index 97dc9e5d6bf9..7b5d74a7845c 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -490,7 +490,7 @@ static void flag_nproc_exceeded(struct cred *new)
 	 * failure to the execve() stage.
 	 */
 	if (is_ucounts_overlimit(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
-	    new->user != INIT_USER)
+	    new->ucounts != &init_ucounts)
 		current->flags |= PF_NPROC_EXCEEDED;
 	else
 		current->flags &= ~PF_NPROC_EXCEEDED;
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 6b2e3ca7ee99..925fb3579ef3 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -123,6 +123,8 @@ int create_user_ns(struct cred *new)
 		ns->ucount_max[i] = INT_MAX;
 	}
 	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC));
+	if (new->ucounts == &init_ucounts)
+		set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, RLIM_INFINITY);
 	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MSGQUEUE, rlimit(RLIMIT_MSGQUEUE));
 	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_SIGPENDING, rlimit(RLIMIT_SIGPENDING));
 	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MEMLOCK, rlimit(RLIMIT_MEMLOCK));
[CC'd the security list because I really don't know who the right people
are to drag into this discussion]

While looking at some issues that have cropped up with making it so
that RLIMIT_NPROC cannot be escaped by creating a user namespace, I
have stumbled upon a very old issue of how rlimits and suid exec
interact poorly.

This specific saga starts with commit 909cc4ae86f3 ("[PATCH] Fix two
bugs with process limits (RLIMIT_NPROC)") from
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
which essentially replaced a capable() check with an open-coded
implementation of suser(), for RLIMIT_NPROC.

The description from Neil Brown was:

   1/ If a setuid process swaps its real and effective uids and then
      forks, the fork fails if the new real uid has more processes than
      the original process was limited to. This is particularly a
      problem if a user with a process limit (e.g. 256) runs a
      setuid-root program which does setuid() + fork() (e.g. lprng)
      while root already has more than 256 processes (which is quite
      possible).

      The root problem here is that a limit which should be a per-user
      limit is being implemented as a per-process limit with
      per-process (e.g. CAP_SYS_RESOURCE) controls. Being a per-user
      limit, it should be that the root user can override it, not just
      some process with CAP_SYS_RESOURCE. This patch adds a test to
      ignore process limits if the real user is root.

The test to see if the real user is root was:

	if (p->real_cred->user != INIT_USER)
		...

which persists to this day in kernel/fork.c:copy_process().

The practical problem with this test is that it works like nothing
else in the kernel, and so does not look like what it is. Saying:

	if (!uid_eq(p->real_cred->uid, GLOBAL_ROOT_UID))
		...

would at least be more recognizable. Really this entire test should be
"if (!capable(CAP_SYS_RESOURCE))", because CAP_SYS_RESOURCE is the
capability that controls whether you are allowed to exceed your
rlimits.

Which brings us to the practical issues of how all of these things are
wired together today.

The per-user rlimits are accounted based upon a process's real user,
not the effective user. All other permission checks are based upon the
effective user. This has the practical effect that when uids are
swapped as above, the processes are charged to root but use the
permissions of an ordinary user.

The problems get worse when you realize that suid exec does not reset
any of the rlimits except for RLIMIT_STACK.

The rlimits that are particularly affected and are per-user are:
RLIMIT_NPROC, RLIMIT_MSGQUEUE, RLIMIT_SIGPENDING, RLIMIT_MEMLOCK. But
I think failing to reset rlimits during exec has the potential to
affect any suid exec.

Does anyone have any historical knowledge or sense of how this should
work?

Right now it feels like we have coded ourselves into a corner and will
have to risk breaking userspace to get out of it. AKA I think we need
a policy of resetting rlimits on suid exec, and I think we need to
store global rlimits based upon the effective user, not the real user.
Those changes should allow making capable() calls where they belong,
and removing the much too magic "user == INIT_USER" test for
RLIMIT_NPROC.

Eric
On Wed, Feb 23, 2022 at 10:00 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> [CC'd the security list because I really don't know who the right people
> are to drag into this discussion]
>
> While looking at some issues that have cropped up with making it so
> that RLIMIT_NPROC cannot be escaped by creating a user namespace I have
> stumbled upon a very old issue of how rlimits and suid exec interact
> poorly.

Once upon a time, these resource limits were effectively the only way
to control memory consumption and consumption of historically limited
resources like processes. (The scheduler used to have serious issues
with too many processes -- this is not so true any more. And without
cgroups, too many processes could use too much CPU collectively.)
This all worked pretty poorly. Now we have cgroups, fancy memory
accounting, etc. So I'm wondering if NPROC is even useful anymore. I
don't have a brilliant idea of how to deprecate it, but I think it
wouldn't be entirely nuts to take it much less seriously and maybe
even eventually get rid of it.

I doubt there is much existing userspace that would break if a
previously failing fork() started succeeding.

--Andy
On Wed, Feb 23, 2022 at 10:00 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Which brings us to the practical issues of how all of these things are
> wired together today.

I honestly think you should treat the limits as "approximate".

We do that for a number of reasons:

 - sometimes we have racy tests because we don't want to do excessive
locking just for a limit: nobody cares if you can go a couple of
entries past a limit because you were lucky, it's important that you
can't go *much* past the limit.

 - sometimes the limits themselves are fuzzy (example: time. it's
incremented by "ticks", but it's simply not that precise, and it
depends a bit on when the ticks happen)

 - sometimes it's ambiguous who we're talking about.

I think suid execs tend to fall in that third category. Be generous.
If the limit doesn't trigger at the suid exec, nobody cares. You want
to make sure it triggers eventually.

For example, let's say that you are the admin, and you made a mistake,
and you had a runaway fork() bomb that was caught by the limits.

Optimally, you still want to be able to log in (one process that was
root when it did the fork(), and did a 'setresuid()' or similar to
drop the privileges, and then one process that does 'sudo' to get
privileges to kill the darn fork bomb).

See how that 'user' technically went over the limit, and that was
A-OK!

Basic rule: it's better to be too lenient than to be too strict.

Linus
Hi Andy,

On Wed, Feb 23, 2022 at 11:44:51AM -0800, Andy Lutomirski wrote:
> On Wed, Feb 23, 2022 at 10:00 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
> >
> > While looking at some issues that have cropped up with making it so
> > that RLIMIT_NPROC cannot be escaped by creating a user namespace I have
> > stumbled upon a very old issue of how rlimits and suid exec interact
> > poorly.
>
> Once upon a time, these resource limits were effectively the only way
> to control memory consumption and consumption of historically limited
> resources like processes. [...] So I'm wondering if NPROC is even
> useful anymore. I don't have a brilliant idea of how to deprecate it,
> but I think it wouldn't be entirely nuts to take it much less
> seriously and maybe even eventually get rid of it.
>
> I doubt there is much existing userspace that would break if a
> previously failing fork() started succeeding.

I strongly disagree. I've been using it for a long time as a security
measure. Setting NPROC to 0 after daemonizing remains a particularly
effective and portable method to mitigate the possible consequences of
an in-process intrusion. While I wouldn't care about approximate
non-zero values, for me it would be a significant security regression
to lose the guarantee that fork() fails when the limit is zero.

Thus at least I do want to keep that feature when NPROC is zero.

Willy
Linus Torvalds <linus@torvalds.org> writes:

> Basic rule: it's better to be too lenient than to be too strict.

Thank you. With that guideline I can explore the space of what is
possible.

Question: Running a suid program today charges the activity of that
program to the user who ran that program, not to the user the program
runs as. Does anyone see a problem with charging the user the program
runs as?

The reason I want to change which user is charged with a process
(besides it making more sense in my head) is so that
"capable(CAP_SYS_RESOURCE)" can be used instead of the magic
incantation "(cred->user == INIT_USER)". Today
"capable(CAP_SYS_RESOURCE)" with respect to RLIMIT_NPROC is
effectively meaningless for suid programs, because of the mismatch of
charging the real user while checking the effective user's
credentials.

An accidental experiment happened in v5.14-rc1 in July when the ucount
rlimit code was merged. It was only this last week, after Michal
Koutný discovered the discrepancy through code inspection, that I
merged a bug fix, because the code was not preserving the existing
behavior as intended. This behavior has existed in some form since
Linux v1.0, when per-user process limits were added.

The original code in v1.0 was:

> static int find_empty_process(void)
> {
> 	int free_task;
> 	int i, tasks_free;
> 	int this_user_tasks;
>
> repeat:
> 	if ((++last_pid) & 0xffff8000)
> 		last_pid=1;
> 	this_user_tasks = 0;
> 	tasks_free = 0;
> 	free_task = -EAGAIN;
> 	i = NR_TASKS;
> 	while (--i > 0) {
> 		if (!task[i]) {
> 			free_task = i;
> 			tasks_free++;
> 			continue;
> 		}
> 		if (task[i]->uid == current->uid)
> 			this_user_tasks++;
> 		if (task[i]->pid == last_pid || task[i]->pgrp == last_pid ||
> 		    task[i]->session == last_pid)
> 			goto repeat;
> 	}
> 	if (tasks_free <= MIN_TASKS_LEFT_FOR_ROOT ||
> 	    this_user_tasks > MAX_TASKS_PER_USER)
> 		if (current->uid)
> 			return -EAGAIN;
> 	return free_task;
> }

Having tracked the use of the real uid in limits back this far, my
guess is that it was an accident of the implementation and real uid
vs. effective uid had not been considered. Does anyone know if
choosing the real uid vs. the effective uid for accounting a user's
processes was a deliberate decision anywhere in the history of Linux?

Linus, you were talking about making it possible to log in as (I
think) a non-root user to be able to use sudo and kill a fork bomb.
The counter case is apache having a dedicated user for running
cgi-scripts and using RLIMIT_NPROC to limit how many of those
processes can exist. Unless I am misunderstanding something, that
looks exactly like your "log in as non-root so you can run sudo to
kill a fork bomb".

A comment from an in-progress cleanup patch explains this as best I
can:

	/*
	 * In general rlimits are only enforced when a new resource
	 * is acquired. That would be during fork for RLIMIT_NPROC.
	 * That is insufficient for RLIMIT_NPROC as many attributes of
	 * a new process must be set between fork and exec.
	 *
	 * A case where this matters is when apache forks a process
	 * and calls setuid to run cgi-scripts as a different user.
	 * Generating those processes through a code sequence like:
	 *
	 *   fork()
	 *   setrlimit(RLIMIT_NPROC, ...)
	 *   execve() -- suid wrapper
	 *   setuid()
	 *   execve() -- cgi script
	 *
	 * The cgi-scripts are unlikely to fork on their own, so unless
	 * RLIMIT_NPROC is checked after the user change and before
	 * the cgi-script starts, RLIMIT_NPROC simply will not be
	 * enforced for the cgi-scripts.
	 *
	 * So the code tracks whether, between fork and exec, an
	 * operation occurs that could cause the RLIMIT_NPROC check to
	 * fail. If such an operation has happened, re-check
	 * RLIMIT_NPROC.
	 */

Answered-Question: I was trying to ask if anyone knows of a reason why
we can't just sanitize the rlimits of the process during suid exec.
Linus, your guideline would appear to allow that behavior.
Unfortunately, that looks like it would break current usage of apache
suexec.

Eric
On Wed, Feb 23, 2022 at 5:24 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Question: Running a suid program today charges the activity of that
> program to the user who ran that program, not to the user the program
> runs as. Does anyone see a problem with charging the user the program
> runs as?

So I think that there's actually two independent issues with limits
when you have situations like this where the actual user might be
ambiguous:

 - the "who to charge" question

 - the "how do we *check* the limit" question

and honestly, I think that when it comes to suid binaries, the first
question is fundamentally ambiguous, because it almost certainly
depends on the user.

Which to me implies that there probably isn't an answer that is always
right, and that what you should look at is that second option.

So I would actually suggest that "execute a suid binary" should charge
the real user, but *because* it is suid, it should then not check the
limit (or, perhaps, should check the hard limit?).

You have to charge somebody, but at that point it's a bit ambiguous
whether it should be allowed.

Exactly so that if you're over a process limit (or something similar -
think "too many files open" or whatever because you screwed up and
opened everything) you could still log in as yourself (ssh/login
charges some admin thing, which probably has high limits or is
unlimited), and hopefully get shell access, and then be able to "exec
sudo" to actually get admin access that should be disabled from the
network.

The above is just one (traditional) example of a fork/open bomb case
where a user is no longer really able to function as himself, but
wants to fix things (maybe the user has another terminal open, but
then he can hopefully use a shell-builtin 'kill' instead). And I'm not
saying it's "the thing that needs to work". I'm more making up an
example.

So I'm only saying that the above actually has two examples of the two
sides of the coin: "login" lowering privileges to a user that may be
over some limit - and succeeding despite that - and 'suid' succeeding
despite the original user perhaps being over-committed.

So it's intended exactly as an example of "picking the new or the old
user would be wrong in either case if you check limits at the
transition point".

Hmm?

Linus
Linus Torvalds <linus@torvalds.org> writes:

> On Wed, Feb 23, 2022 at 5:24 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Question: Running a suid program today charges the activity of that
>> program to the user who ran that program, not to the user the program
>> runs as. Does anyone see a problem with charging the user the program
>> runs as?
>
> [...]
>
> So it's intended exactly as an example of "picking the new or the old
> user would be wrong in either case if you check limits at the
> transition point".
>
> Hmm?

That doesn't really clarify anything for me.

We have two checks, one in fork and one in exec, and you seem to be
talking about the check in exec. The check I have problems with for a
suid executable is the check in fork. If the new process is accounted
to the previous user but we use the permissions of the effective user
for checking, that does not make sense to me.

If we can sort out the check in fork, I think I have clarity about the
other cases.

The check in exec, while clumsy and needing cleaning up, seems to make
sense to me. We have a transition that starts with fork and ends with
exec and has operations like setuid in between. If something like
setuid() is called before exec, we check in exec.

The case the check in exec is aimed at supporting is processes spawned
from a parent that have a different user (than the parent) and will
never call fork again. Those processes would be fundamentally immune
to RLIMIT_NPROC if we don't check somewhere besides fork. There is
existing code in apache to use RLIMIT_NPROC this way.

For your login case I have no problems with it in principle. In
practice I think you have to log in as root to deal with a fork bomb
that hits RLIMIT_NPROC and does not die gracefully.

What I don't see about your login example is how it is practically
different from the apache cgi-script case, which the code has
supported for 20 years, and which would regress if we stopped
supporting it.

If we want to stop supporting that case we can just remove all of the
RLIMIT_NPROC tests everywhere except for fork, a nice cleanup.

That still leaves me with mismatched effective vs. real uid checks in
fork when the effective and real uids don't match. Which means testing
for root with "capable(CAP_SYS_ADMIN)" does not work, and today that
makes the code a bit of a challenge to understand and work with.

Eric
From: Linus Torvalds
> Sent: 24 February 2022 01:42
>
> So I think that there's actually two independent issues with limits
> when you have situations like this where the actual user might be
> ambiguous.
>
> - the "who to charge" question
>
> - the "how do we *check* the limit" question
>
> and honestly, I think that when it comes to suid binaries, the first
> question is fundamentally ambiguous, because it almost certainly
> depends on the user.

Doesn't the rlimit check happen during the fork, at which time you
don't know that a suid exec might follow?

The problem with changing the uid is that when the process exits you
need to "uncharge" the correct uid. So either you need to remember the
original uid, or setuid has to transfer the charge (whichever uid is
used). If you transfer the charge then the setuid system call can't
fail. But a later exec can fail.

Any check will always be done against the process's own rlimit value.
Set that to zero and fork should fail regardless of which uid's
process count is checked.

Now a normal suid program only changes the effective uid, so keeping
the process charged against the real uid makes sense. If a process
changes its real uid you could change the charged uid, but you can't
error if over the rlimit value. OTOH during a later exec you can test
things, and exec can fail.

At least one unix I've used has three uids for each process: the 'real
uid', 'effective uid' and 'saved by exec uid'. I suspect the process
is always "charged" against the latter. I think that exec compares the
'real' and 'saved by exec' uids and, if different, moves the charge to
the real uid (which will check rlimit) and then sets the 'saved by
exec uid' to the real uid.

So an exec after a setuid() can be allowed to fail if the real user
has too many processes. But in all other cases exec just works
regardless of the process count for any user.

> Exactly so that if you're over a process limit (or something similar -
> think "too many files open" or whatever because you screwed up and
> opened everything) you could still log in as yourself (ssh/login
> charges some admin thing, which probably has high limits or is
> unlimited), and hopefully get shell access, and then be able to "exec
> sudo" to actually get admin access that should be disabled from the
> network.

You usually have to use 'rsh machine sh -i' to avoid the shell running
all its startup scripts. But I doubt that will get you past a fork
bomb.

David