From patchwork Mon Feb 7 12:17:55 2022
From: Michal Koutný
To: Eric Biederman, Alexey Gladkov
Cc: Kees Cook, Shuah Khan, Christian Brauner, Solar Designer,
 Ran Xiaokai, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Linux Containers
Subject: [RFC PATCH 1/6] set_user: Perform RLIMIT_NPROC capability check against new user credentials
Date: Mon, 7 Feb 2022 13:17:55 +0100
Message-Id: <20220207121800.5079-2-mkoutny@suse.com>
In-Reply-To: <20220207121800.5079-1-mkoutny@suse.com>
References: <20220207121800.5079-1-mkoutny@suse.com>

The check is currently performed against current->cred. Those
credentials are about to change, and we want to check the RLIMIT_NPROC
condition after the switch, so supply the capability check with the new
cred.

However, we already check whether new_user is INIT_USER. The
capability-based allowance of the new cred may therefore be redundant
when the check fails, and the alternative solution would be a revert of
commit 2863643fb8b9 ("set_user: add capability check when
rlimit(RLIMIT_NPROC) exceeds").

Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
Cc: Solar Designer
Cc: Christian Brauner
Signed-off-by: Michal Koutný
---
 kernel/sys.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 8ea20912103a..48c90dcceff3 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -481,7 +481,8 @@ static int set_user(struct cred *new)
 	 */
 	if (ucounts_limit_cmp(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) >= 0 &&
 			new_user != INIT_USER &&
-			!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
+			!security_capable(new, &init_user_ns, CAP_SYS_RESOURCE, CAP_OPT_NONE) &&
+			!security_capable(new, &init_user_ns, CAP_SYS_ADMIN, CAP_OPT_NONE))
 		current->flags |= PF_NPROC_EXCEEDED;
 	else
 		current->flags &= ~PF_NPROC_EXCEEDED;

From patchwork Mon Feb 7 12:17:56 2022
From: Michal Koutný
To: Eric Biederman, Alexey Gladkov
Cc: Kees Cook, Shuah Khan, Christian Brauner, Solar Designer,
 Ran Xiaokai, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Linux Containers
Subject: [RFC PATCH 2/6] set*uid: Check RLIMIT_NPROC against new credentials
Date: Mon, 7 Feb 2022 13:17:56 +0100
Message-Id: <20220207121800.5079-3-mkoutny@suse.com>
In-Reply-To: <20220207121800.5079-1-mkoutny@suse.com>
References: <20220207121800.5079-1-mkoutny@suse.com>

The general idea is that not even root or a capable user can force a
breach of an unprivileged user's limit. (For historical and security
reasons, this check is postponed from set*uid() to execve().) During the
switch, the resource consumption of the target user has to be checked.

Commits 905ae01c4ae2 ("Add a reference to ucounts for each cred") and
21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") made the
check in set_user() look at the old user's consumption.

This version of the fix simply moves the check to the place where the
actual switch of the accounting structure happens: set_cred_ucounts().
The other callers are kept without the check. With the per-userns
accounting, however, they may newly be subject to it too.

set_cred_ucounts() becomes inconsistent, because the task flags are
passed in by the caller while task_rlimit() is implicitly evaluated for
`current`. This patch is meant to illustrate the issue; a nicer
implementation is possible.
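
This patch records the excess at set*uid() time and fails only at
execve(). That flow can be modelled in userspace as below. This is a
hedged sketch with the following assumptions:

- `MODEL_PF_NPROC_EXCEEDED`, `model_set_cred_ucounts()` and
  `model_execve_check()` are hypothetical names.
- The task count is passed in explicitly.
- `exempt` stands in for the INIT_USER and capability tests.
- Both stages compare `count >= limit`, which is a simplification; the
  kernel's exact comparisons may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PF_NPROC_EXCEEDED bit in task->flags. */
#define MODEL_PF_NPROC_EXCEEDED 0x1u

/*
 * Model of the check appended to set_cred_ucounts().  Callers passing NULL
 * (begin_new_exec(), copy_creds(), ksys_unshare() and userns_install() in
 * this patch) skip it; the set*uid() paths pass their flags word.
 * `exempt` stands in for the INIT_USER and capability tests.
 */
static void model_set_cred_ucounts(unsigned int *nproc_flags, long nr_tasks,
				   long limit, bool exempt)
{
	if (!nproc_flags)
		return;

	if (nr_tasks >= limit && !exempt)
		*nproc_flags |= MODEL_PF_NPROC_EXCEEDED;
	else
		*nproc_flags &= ~MODEL_PF_NPROC_EXCEEDED;
}

/*
 * Model of the deferred enforcement at execve() time: fail with -EAGAIN
 * only if the bit was recorded and the user is still over the limit;
 * otherwise clear the bit so later execve() calls are unaffected.
 */
static int model_execve_check(unsigned int *flags, long nr_tasks, long limit)
{
	assert(flags != NULL);
	if ((*flags & MODEL_PF_NPROC_EXCEEDED) && nr_tasks >= limit)
		return -EAGAIN;
	*flags &= ~MODEL_PF_NPROC_EXCEEDED;
	return 0;
}
```

Recording a flag instead of failing keeps set*uid() from returning an
error. As the comment in the patch explains, poorly written programs
would ignore such an error anyway.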

Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: Michal Koutný
---
 fs/exec.c               |  2 +-
 include/linux/cred.h    |  2 +-
 kernel/cred.c           | 24 +++++++++++++++++++++---
 kernel/fork.c           |  2 +-
 kernel/sys.c            | 21 +++------------------
 kernel/user_namespace.c |  2 +-
 6 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index fc598c2652b2..e759e42c61da 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1363,7 +1363,7 @@ int begin_new_exec(struct linux_binprm * bprm)
 	WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1);
 	flush_signal_handlers(me, 0);
 
-	retval = set_cred_ucounts(bprm->cred);
+	retval = set_cred_ucounts(bprm->cred, NULL);
 	if (retval < 0)
 		goto out_unlock;
 
diff --git a/include/linux/cred.h b/include/linux/cred.h
index fcbc6885cc09..455525ab380d 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -170,7 +170,7 @@ extern int set_security_override_from_ctx(struct cred *, const char *);
 extern int set_create_files_as(struct cred *, struct inode *);
 extern int cred_fscmp(const struct cred *, const struct cred *);
 extern void __init cred_init(void);
-extern int set_cred_ucounts(struct cred *);
+extern int set_cred_ucounts(struct cred *, unsigned int *);
 
 /*
  * check for validity of credentials
diff --git a/kernel/cred.c b/kernel/cred.c
index 473d17c431f3..791cab70b764 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -370,7 +370,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 		ret = create_user_ns(new);
 		if (ret < 0)
 			goto error_put;
-		ret = set_cred_ucounts(new);
+		ret = set_cred_ucounts(new, NULL);
 		if (ret < 0)
 			goto error_put;
 	}
@@ -492,7 +492,7 @@ int commit_creds(struct cred *new)
 
 	/* do it
 	 * RLIMIT_NPROC limits on user->processes have already been checked
-	 * in set_user().
+	 * in set_cred_ucounts().
 	 */
 	alter_cred_subscribers(new, 2);
 	if (new->user != old->user || new->user_ns != old->user_ns)
@@ -663,7 +663,7 @@ int cred_fscmp(const struct cred *a, const struct cred *b)
 }
 EXPORT_SYMBOL(cred_fscmp);
 
-int set_cred_ucounts(struct cred *new)
+int set_cred_ucounts(struct cred *new, unsigned int *nproc_flags)
 {
 	struct task_struct *task = current;
 	const struct cred *old = task->real_cred;
@@ -685,6 +685,24 @@ int set_cred_ucounts(struct cred *new)
 	new->ucounts = new_ucounts;
 	put_ucounts(old_ucounts);
 
+	if (!nproc_flags)
+		return 0;
+
+	/*
+	 * We don't fail in case of NPROC limit excess here because too many
+	 * poorly written programs don't check set*uid() return code, assuming
+	 * it never fails if called by root. We may still enforce NPROC limit
+	 * for programs doing set*uid()+execve() by harmlessly deferring the
+	 * failure to the execve() stage.
+	 */
+	if (ucounts_limit_cmp(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) >= 0 &&
+			new->user != INIT_USER &&
+			!security_capable(new, &init_user_ns, CAP_SYS_RESOURCE, CAP_OPT_NONE) &&
+			!security_capable(new, &init_user_ns, CAP_SYS_ADMIN, CAP_OPT_NONE))
+		*nproc_flags |= PF_NPROC_EXCEEDED;
+	else
+		*nproc_flags &= ~PF_NPROC_EXCEEDED;
+
 	return 0;
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 7cb21a70737d..a4005c679d29 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -3051,7 +3051,7 @@ int ksys_unshare(unsigned long unshare_flags)
 		goto bad_unshare_cleanup_cred;
 
 	if (new_cred) {
-		err = set_cred_ucounts(new_cred);
+		err = set_cred_ucounts(new_cred, NULL);
 		if (err)
 			goto bad_unshare_cleanup_cred;
 	}
diff --git a/kernel/sys.c b/kernel/sys.c
index 48c90dcceff3..4e4eea30e235 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -472,21 +472,6 @@ static int set_user(struct cred *new)
 	if (!new_user)
 		return -EAGAIN;
 
-	/*
-	 * We don't fail in case of NPROC limit excess here because too many
-	 * poorly written programs don't check set*uid() return code, assuming
-	 * it never fails if called by root. We may still enforce NPROC limit
-	 * for programs doing set*uid()+execve() by harmlessly deferring the
-	 * failure to the execve() stage.
-	 */
-	if (ucounts_limit_cmp(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) >= 0 &&
-			new_user != INIT_USER &&
-			!security_capable(new, &init_user_ns, CAP_SYS_RESOURCE, CAP_OPT_NONE) &&
-			!security_capable(new, &init_user_ns, CAP_SYS_ADMIN, CAP_OPT_NONE))
-		current->flags |= PF_NPROC_EXCEEDED;
-	else
-		current->flags &= ~PF_NPROC_EXCEEDED;
-
 	free_uid(new->user);
 	new->user = new_user;
 	return 0;
@@ -560,7 +545,7 @@ long __sys_setreuid(uid_t ruid, uid_t euid)
 	if (retval < 0)
 		goto error;
 
-	retval = set_cred_ucounts(new);
+	retval = set_cred_ucounts(new, &current->flags);
 	if (retval < 0)
 		goto error;
 
@@ -622,7 +607,7 @@ long __sys_setuid(uid_t uid)
 	if (retval < 0)
 		goto error;
 
-	retval = set_cred_ucounts(new);
+	retval = set_cred_ucounts(new, &current->flags);
 	if (retval < 0)
 		goto error;
 
@@ -701,7 +686,7 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
 	if (retval < 0)
 		goto error;
 
-	retval = set_cred_ucounts(new);
+	retval = set_cred_ucounts(new, &current->flags);
 	if (retval < 0)
 		goto error;
 
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 6b2e3ca7ee99..f7eec0b0233b 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -1344,7 +1344,7 @@ static int userns_install(struct nsset *nsset, struct ns_common *ns)
 	put_user_ns(cred->user_ns);
 	set_cred_user_ns(cred, get_user_ns(user_ns));
 
-	if (set_cred_ucounts(cred) < 0)
+	if (set_cred_ucounts(cred, NULL) < 0)
 		return -EINVAL;
 
 	return 0;

From patchwork Mon Feb 7 12:17:57 2022
From: Michal Koutný
To: Eric Biederman, Alexey Gladkov
Cc: Kees Cook, Shuah Khan, Christian Brauner, Solar Designer,
 Ran Xiaokai, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Linux Containers
Subject: [RFC PATCH 3/6] cred: Count tasks by their real uid into RLIMIT_NPROC
Date: Mon, 7 Feb 2022 13:17:57 +0100
Message-Id: <20220207121800.5079-4-mkoutny@suse.com>
In-Reply-To: <20220207121800.5079-1-mkoutny@suse.com>
References: <20220207121800.5079-1-mkoutny@suse.com>

Tasks are associated with multiple users at once. Historically, and as
documented in setrlimit(2), RLIMIT_NPROC is enforced based on the real
user ID. Commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of
ucounts") made the accounting structure "indexed" by euid, so it may
account tasks differently.

The effective user ID may differ, e.g. for setuid programs, but those
are exec'd into an already existing task (i.e. one below the limit), so
the different accounting is moot. Some special setresuid(2) users may
notice the difference, which justifies this fix.

(This is just illustrative: it piggy-backs onto nproc_flags and should
be implemented properly.)

Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: Michal Koutný
---
 kernel/cred.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/cred.c b/kernel/cred.c
index 791cab70b764..ed247daa1f67 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -668,6 +668,7 @@ int set_cred_ucounts(struct cred *new, unsigned int *nproc_flags)
 	struct task_struct *task = current;
 	const struct cred *old = task->real_cred;
 	struct ucounts *new_ucounts, *old_ucounts = new->ucounts;
+	kuid_t new_uid = nproc_flags ? new->uid : new->euid;
 
 	if (new->user == old->user && new->user_ns == old->user_ns)
 		return 0;
@@ -676,10 +677,10 @@ int set_cred_ucounts(struct cred *new, unsigned int *nproc_flags)
 	 * This optimization is needed because alloc_ucounts() uses locks
 	 * for table lookups.
 	 */
-	if (old_ucounts->ns == new->user_ns && uid_eq(old_ucounts->uid, new->euid))
+	if (old_ucounts->ns == new->user_ns && uid_eq(old_ucounts->uid, new_uid))
 		return 0;
 
-	if (!(new_ucounts = alloc_ucounts(new->user_ns, new->euid)))
+	if (!(new_ucounts = alloc_ucounts(new->user_ns, new_uid)))
 		return -EAGAIN;
 
 	new->ucounts = new_ucounts;

From patchwork Mon Feb 7 12:17:58 2022
From: Michal Koutný
To: Eric Biederman, Alexey Gladkov
Cc: Kees Cook, Shuah Khan, Christian Brauner, Solar Designer,
 Ran Xiaokai, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Linux Containers
Subject: [RFC PATCH 4/6] ucounts: Allow root to override RLIMIT_NPROC
Date: Mon, 7 Feb 2022 13:17:58 +0100
Message-Id: <20220207121800.5079-5-mkoutny@suse.com>
In-Reply-To: <20220207121800.5079-1-mkoutny@suse.com>
References: <20220207121800.5079-1-mkoutny@suse.com>

Call sites of ucounts_limit_cmp() allow the global root or a capable
user to bypass RLIMIT_NPROC at the bottom level of the user_ns tree by
not looking at ucounts at all.

As the traversal up the user_ns tree continues, the ucounts to which the
task is charged may switch the owning user (to the creator of the
user_ns).
If the new chargee is root, we don't really care about observing
RLIMIT_NPROC, so lift the limit to the maximum.

The result is that an unprivileged user U can globally run more than
RLIMIT_NPROC (of the user_ns) tasks. Within each user_ns, however, U is
still limited to RLIMIT_NPROC (as passed into task->signal->rlim) iff
the user namespaces are created by the privileged user.

Signed-off-by: Michal Koutný
---
 kernel/ucount.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/ucount.c b/kernel/ucount.c
index 53ccd96387dd..f52b7273a572 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -356,6 +356,9 @@ long ucounts_limit_cmp(struct ucounts *ucounts, enum ucount_type type, unsigned
 		if (excess > 0)
 			return excess;
 		max = READ_ONCE(iter->ns->ucount_max[type]);
+		/* Next ucounts owned by root? RLIMIT_NPROC is moot */
+		if (type == UCOUNT_RLIMIT_NPROC && uid_eq(iter->ns->owner, GLOBAL_ROOT_UID))
+			max = LONG_MAX;
 	}
 	return excess;
 }

From patchwork Mon Feb 7 12:17:59 2022
From: Michal Koutný
To: Eric Biederman, Alexey Gladkov
Cc: Kees Cook, Shuah Khan, Christian Brauner, Solar Designer,
 Ran Xiaokai, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Linux Containers
Subject: [RFC PATCH 5/6] selftests: Challenge RLIMIT_NPROC in user namespaces
Date: Mon, 7 Feb 2022 13:17:59 +0100
Message-Id: <20220207121800.5079-6-mkoutny@suse.com>
In-Reply-To: <20220207121800.5079-1-mkoutny@suse.com>
References: <20220207121800.5079-1-mkoutny@suse.com>

The services are started in descendant user namespaces; each of them
should honor the RLIMIT_NPROC that is passed during user namespace
creation.

	main [user_ns_0]
	` service [user_ns_1]
	  ` worker 1
	  ` worker 2
	  ...
	  ` worker k
	...
	` service [user_ns_n]
	  ` worker 1
	  ` worker 2
	  ...
	  ` worker k

The test uses explicit synchronization to make sure the original
parent's limit does not interfere with the descendants.

Signed-off-by: Michal Koutný
---
 .../selftests/rlimits/rlimits-per-userns.c    | 154 ++++++++++++++----
 1 file changed, 125 insertions(+), 29 deletions(-)

diff --git a/tools/testing/selftests/rlimits/rlimits-per-userns.c b/tools/testing/selftests/rlimits/rlimits-per-userns.c
index 26dc949e93ea..54c1b345e42b 100644
--- a/tools/testing/selftests/rlimits/rlimits-per-userns.c
+++ b/tools/testing/selftests/rlimits/rlimits-per-userns.c
@@ -9,7 +9,9 @@
 #include
 #include
 #include
+#include
 
+#include
 #include
 #include
 #include
@@ -21,38 +23,74 @@
 #include
 #include
 
-#define NR_CHILDS 2
+#define THE_LIMIT 4
+#define NR_CHILDREN 5
+
+static_assert(NR_CHILDREN >= THE_LIMIT-1, "Need slots for limit-1 children.");
 
 static char *service_prog;
 static uid_t user = 60000;
 static uid_t group = 60000;
+static struct rlimit saved_limit;
+
+/* Two uses: main and service */
+static pid_t child[NR_CHILDREN];
+static pid_t pid;
 
 static void setrlimit_nproc(rlim_t n)
 {
-	pid_t pid = getpid();
 	struct rlimit limit = {
 		.rlim_cur = n,
 		.rlim_max = n
 	};
-
-	warnx("(pid=%d): Setting RLIMIT_NPROC=%ld", pid, n);
+	if (getrlimit(RLIMIT_NPROC, &saved_limit) < 0)
+		err(EXIT_FAILURE, "(pid=%d): getrlimit(RLIMIT_NPROC)", pid);
 
 	if (setrlimit(RLIMIT_NPROC, &limit) < 0)
 		err(EXIT_FAILURE, "(pid=%d): setrlimit(RLIMIT_NPROC)", pid);
+
+	warnx("(pid=%d): Set RLIMIT_NPROC=%ld", pid, n);
+}
+
+static void restore_rlimit_nproc(void)
+{
+	if (setrlimit(RLIMIT_NPROC, &saved_limit) < 0)
+		err(EXIT_FAILURE, "(pid=%d): setrlimit(RLIMIT_NPROC, saved)", pid);
+	warnx("(pid=%d) Restored RLIMIT_NPROC", pid);
 }
 
-static pid_t fork_child(void)
+enum msg_sync {
+	UNSHARE,
+	RLIMIT_RESTORE,
+};
+
+static void sync_notify(int fd, enum msg_sync m)
 {
-	pid_t pid = fork();
+	char tmp = m;
+
+	if (write(fd, &tmp, 1) < 0)
+		warnx("(pid=%d): failed sync-write", pid);
+}
 
-	if (pid < 0)
+static void sync_wait(int fd, enum msg_sync m)
+{
+	char tmp;
+
+	if (read(fd, &tmp, 1) < 0)
+		warnx("(pid=%d): failed sync-read", pid);
+}
+
+static pid_t fork_child(int control_fd)
+{
+	pid_t new_pid = fork();
+
+	if (new_pid < 0)
 		err(EXIT_FAILURE, "fork");
 
-	if (pid > 0)
-		return pid;
+	if (new_pid > 0)
+		return new_pid;
 
 	pid = getpid();
-
 	warnx("(pid=%d): New process starting ...", pid);
 
 	if (prctl(PR_SET_PDEATHSIG, SIGKILL) < 0)
@@ -73,6 +111,9 @@ static pid_t fork_child(void)
 	if (unshare(CLONE_NEWUSER) < 0)
 		err(EXIT_FAILURE, "unshare(CLONE_NEWUSER)");
 
+	sync_notify(control_fd, UNSHARE);
+	sync_wait(control_fd, RLIMIT_RESTORE);
+
 	char *const argv[] = { "service", NULL };
 	char *const envp[] = { "I_AM_SERVICE=1", NULL };
 
@@ -82,37 +123,92 @@ static pid_t fork_child(void)
 	err(EXIT_FAILURE, "(pid=%d): execve", pid);
 }
 
+static void run_service(void)
+{
+	size_t i;
+	int ret = EXIT_SUCCESS;
+	struct rlimit limit;
+	char user_ns[PATH_MAX];
+
+	if (getrlimit(RLIMIT_NPROC, &limit) < 0)
+		err(EXIT_FAILURE, "(pid=%d) failed getrlimit", pid);
+	if (readlink("/proc/self/ns/user", user_ns, PATH_MAX) < 0)
+		err(EXIT_FAILURE, "(pid=%d) failed readlink", pid);
+
+	warnx("(pid=%d) Service instance attempts %i children, limit %lu:%lu, ns=%s",
+	      pid, THE_LIMIT, limit.rlim_cur, limit.rlim_max, user_ns);
+
+	/* test rlimit inside the service, effectively THE_LIMIT-1 because of service itself */
+	for (i = 0; i < THE_LIMIT; i++) {
+		child[i] = fork();
+		if (child[i] == 0) {
+			/* service child */
+			pause();
+			exit(EXIT_SUCCESS);
+		}
+		if (child[i] < 0) {
+			warnx("(pid=%d) service fork %lu failed, errno = %i", pid, i+1, errno);
+			if (!(i == THE_LIMIT-1 && errno == EAGAIN))
+				ret = EXIT_FAILURE;
+		} else if (i == THE_LIMIT-1) {
+			warnx("(pid=%d) RLIMIT_NPROC not honored", pid);
+			ret = EXIT_FAILURE;
+		}
+	}
+
+	/* service cleanup */
+	for (i = 0; i < THE_LIMIT; i++)
+		if (child[i] > 0)
+			kill(child[i], SIGUSR1);
+
+	for (i = 0; i < THE_LIMIT; i++)
+		if (child[i] > 0)
+			waitpid(child[i], NULL, WNOHANG);
+
+	if (ret)
+		exit(ret);
+	pause();
+}
+
 int main(int argc, char **argv)
 {
 	size_t i;
-	pid_t child[NR_CHILDS];
-	int wstatus[NR_CHILDS];
-	int childs = NR_CHILDS;
-	pid_t pid;
+	int control_fd[NR_CHILDREN];
+	int wstatus[NR_CHILDREN];
+	int children = NR_CHILDREN;
+	int sockets[2];
+
+	pid = getpid();
 
 	if (getenv("I_AM_SERVICE")) {
-		pause();
-		exit(EXIT_SUCCESS);
+		run_service();
+		exit(EXIT_FAILURE);
 	}
 
 	service_prog = argv[0];
-	pid = getpid();
 
 	warnx("(pid=%d) Starting testcase", pid);
 
-	/*
-	 * This rlimit is not a problem for root because it can be exceeded.
-	 */
-	setrlimit_nproc(1);
-
-	for (i = 0; i < NR_CHILDS; i++) {
-		child[i] = fork_child();
+	setrlimit_nproc(THE_LIMIT);
+	for (i = 0; i < NR_CHILDREN; i++) {
+		if (socketpair(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0, sockets) < 0)
+			err(EXIT_FAILURE, "(pid=%d) socketpair failed", pid);
+		control_fd[i] = sockets[0];
+		child[i] = fork_child(sockets[1]);
 		wstatus[i] = 0;
+	}
+
+	for (i = 0; i < NR_CHILDREN; i++)
+		sync_wait(control_fd[i], UNSHARE);
+	restore_rlimit_nproc();
+
+	for (i = 0; i < NR_CHILDREN; i++) {
+		sync_notify(control_fd[i], RLIMIT_RESTORE);
 		usleep(250000);
 	}
 
 	while (1) {
-		for (i = 0; i < NR_CHILDS; i++) {
+		for (i = 0; i < NR_CHILDREN; i++) {
 			if (child[i] <= 0)
 				continue;
 
@@ -126,22 +222,22 @@ int main(int argc, char **argv)
 				warn("(pid=%d): waitpid(%d)", pid, child[i]);
 
 			child[i] *= -1;
-			childs -= 1;
+			children -= 1;
 		}
 
-		if (!childs)
+		if (!children)
 			break;
 
 		usleep(250000);
 
-		for (i = 0; i < NR_CHILDS; i++) {
+		for (i = 0; i < NR_CHILDREN; i++) {
 			if (child[i] <= 0)
 				continue;
 			kill(child[i], SIGUSR1);
 		}
 	}
 
-	for (i = 0; i < NR_CHILDS; i++) {
+	for (i = 0; i < NR_CHILDREN; i++) {
 		if (WIFEXITED(wstatus[i]))
 			warnx("(pid=%d): pid %d exited, status=%d",
 				pid, -child[i], WEXITSTATUS(wstatus[i]));

From patchwork Mon Feb 7 12:18:00 2022
imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 2581413BBC; Mon, 7 Feb 2022 12:18:13 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id wMqjCAUOAWLMegAAMHmgww (envelope-from ); Mon, 07 Feb 2022 12:18:13 +0000 From: =?utf-8?q?Michal_Koutn=C3=BD?= To: Eric Biederman , Alexey Gladkov Cc: Kees Cook , Shuah Khan , Christian Brauner , Solar Designer , Ran Xiaokai , linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Linux Containers Subject: [RFC PATCH 6/6] selftests: Test RLIMIT_NPROC in clone-created user namespaces Date: Mon, 7 Feb 2022 13:18:00 +0100 Message-Id: <20220207121800.5079-7-mkoutny@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220207121800.5079-1-mkoutny@suse.com> References: <20220207121800.5079-1-mkoutny@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Verify RLIMIT_NPROC observance in user namespaces also in the clone(CLONE_NEWUSER) path. Note the such a user_ns is created by the privileged user. 
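The map-writing step that the privileged parent performs for a clone(CLONE_NEWUSER) child can be sketched with explicit error handling. This is a minimal sketch, not part of the patch; write_id_map() and define_maps_checked() are hypothetical names. It writes the single-line "inside outside count" mapping format and checks fopen(). Because stdio keeps the short line in its buffer until the stream is flushed, the actual write(2) to the map file happens in fclose(), so a write rejected by the kernel is only reported through fclose()'s return value.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: write a one-line "inside outside count" mapping to
 * an ID map file such as /proc/<pid>/uid_map. Returns 0 on success, -1 on
 * error.
 */
int write_id_map(const char *path, unsigned long inside,
		 unsigned long outside, unsigned long count)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%lu %lu %lu\n", inside, outside, count) < 0) {
		fclose(f);
		return -1;
	}
	/*
	 * The line is still in the stdio buffer here; it reaches the file
	 * only on flush, so a write error is reported by fclose().
	 */
	if (fclose(f) != 0)
		return -1;
	return 0;
}

/* Hypothetical wrapper: identity-map one uid and one gid for child_pid. */
int define_maps_checked(pid_t child_pid, uid_t uid, gid_t gid)
{
	char path[64];

	snprintf(path, sizeof(path), "/proc/%d/uid_map", (int)child_pid);
	if (write_id_map(path, uid, uid, 1) < 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/%d/gid_map", (int)child_pid);
	return write_id_map(path, gid, gid, 1);
}
```

Called as define_maps_checked(child[i], user, group), this writes the same identity mapping as define_maps() below, but fails cleanly if the map file cannot be opened or the kernel rejects the mapping.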
Signed-off-by: Michal Koutný
---
 .../selftests/rlimits/rlimits-per-userns.c | 141 +++++++++++++-----
 1 file changed, 101 insertions(+), 40 deletions(-)

diff --git a/tools/testing/selftests/rlimits/rlimits-per-userns.c b/tools/testing/selftests/rlimits/rlimits-per-userns.c
index 54c1b345e42b..46f4cff36b30 100644
--- a/tools/testing/selftests/rlimits/rlimits-per-userns.c
+++ b/tools/testing/selftests/rlimits/rlimits-per-userns.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /*
  * Author: Alexey Gladkov
+ * Author: Michal Koutný
  */
 #define _GNU_SOURCE
 #include
@@ -25,16 +26,25 @@
 #define THE_LIMIT 4
 #define NR_CHILDREN 5
+#define STACK_SIZE (2 * (1<<20))
-static_assert(NR_CHILDREN >= THE_LIMIT-1, "Need slots for limit-1 children.");
+static_assert(NR_CHILDREN >= THE_LIMIT-1, "Need slots for THE_LIMIT-1 children.");
-static char *service_prog;
 static uid_t user = 60000;
 static uid_t group = 60000;
 static struct rlimit saved_limit;
-/* Two uses: main and service */
-static pid_t child[NR_CHILDREN];
+enum userns_mode {
+	UM_UNSHARE,		/* setrlimit,clone(0),setuid,unshare,execve */
+	UM_CLONE_NEWUSER,	/* setrlimit,clone(NEWUSER),setuid,execve */
+};
+static struct {
+	int control_fd;
+	char *pathname;
+	enum userns_mode mode;
+} child_args;
+
+/* Cache current pid */
 static pid_t pid;
 static void setrlimit_nproc(rlim_t n)
@@ -60,6 +70,7 @@ static void restore_rlimit_nproc(void)
 }
 enum msg_sync {
+	MAP_DEFINE,
 	UNSHARE,
 	RLIMIT_RESTORE,
 };
@@ -80,15 +91,32 @@ static void sync_wait(int fd, enum msg_sync m)
 		warnx("(pid=%d): failed sync-read", pid);
 }
-static pid_t fork_child(int control_fd)
+static int define_maps(pid_t child_pid)
 {
-	pid_t new_pid = fork();
+	FILE *f;
+	char filename[PATH_MAX];
-	if (new_pid < 0)
-		err(EXIT_FAILURE, "fork");
+	if (child_args.mode != UM_CLONE_NEWUSER)
+		return 0;
+
+	snprintf(filename, PATH_MAX, "/proc/%i/uid_map", child_pid);
+	f = fopen(filename, "w");
+	if (fprintf(f, "%i %i 1\n", user, user) < 0)
+		return -1;
+	fclose(f);
+
+	snprintf(filename, PATH_MAX, "/proc/%i/gid_map", child_pid);
+	f = fopen(filename, "w");
+	if (fprintf(f, "%i %i 1\n", group, group) < 0)
+		return -1;
+	fclose(f);
+
+	return 0;
+}
-	if (new_pid > 0)
-		return new_pid;
+static int setup_and_exec(void *arg)
+{
+	int control_fd = child_args.control_fd;
 	pid = getpid();
 	warnx("(pid=%d): New process starting ...", pid);
@@ -98,6 +126,7 @@ static pid_t fork_child(int control_fd)
 	signal(SIGUSR1, SIG_DFL);
+	sync_wait(control_fd, RLIMIT_RESTORE);
 	warnx("(pid=%d): Changing to uid=%d, gid=%d", pid, user, group);
 	if (setgid(group) < 0)
@@ -107,9 +136,11 @@ static pid_t fork_child(int control_fd)
 	warnx("(pid=%d): Service running ...", pid);
-	warnx("(pid=%d): Unshare user namespace", pid);
-	if (unshare(CLONE_NEWUSER) < 0)
-		err(EXIT_FAILURE, "unshare(CLONE_NEWUSER)");
+	if (child_args.mode == UM_UNSHARE) {
+		warnx("(pid=%d): Unshare user namespace", pid);
+		if (unshare(CLONE_NEWUSER) < 0)
+			err(EXIT_FAILURE, "unshare(CLONE_NEWUSER)");
+	}
 	sync_notify(control_fd, UNSHARE);
 	sync_wait(control_fd, RLIMIT_RESTORE);
@@ -119,14 +150,30 @@ static pid_t fork_child(int control_fd)
 	warnx("(pid=%d): Executing real service ...", pid);
-	execve(service_prog, argv, envp);
+	execve(child_args.pathname, argv, envp);
 	err(EXIT_FAILURE, "(pid=%d): execve", pid);
 }
-static void run_service(void)
+static pid_t start_child(char *pathname, int control_fd)
+{
+	char *stack = malloc(STACK_SIZE);
+	int flags = child_args.mode == UM_CLONE_NEWUSER ? CLONE_NEWUSER : 0;
+	pid_t new_pid;
+
+	child_args.control_fd = control_fd;
+	child_args.pathname = pathname;
+
+	new_pid = clone(setup_and_exec, stack+STACK_SIZE-1, flags, NULL);
+	if (new_pid < 0)
+		err(EXIT_FAILURE, "clone");
+
+	free(stack);
+	close(control_fd);
+	return new_pid;
+}
+
+static void dump_context(size_t n_workers)
 {
-	size_t i;
-	int ret = EXIT_SUCCESS;
 	struct rlimit limit;
 	char user_ns[PATH_MAX];
@@ -135,44 +182,55 @@ static void run_service(void)
 	if (readlink("/proc/self/ns/user", user_ns, PATH_MAX) < 0)
 		err(EXIT_FAILURE, "(pid=%d) failed readlink", pid);
-	warnx("(pid=%d) Service instance attempts %i children, limit %lu:%lu, ns=%s",
-	      pid, THE_LIMIT, limit.rlim_cur, limit.rlim_max, user_ns);
+	warnx("(pid=%d) Service instance attempts %lu workers, limit %lu:%lu, ns=%s",
+	      pid, n_workers, limit.rlim_cur, limit.rlim_max, user_ns);
+}
+
+static int run_service(void)
+{
+	size_t i, n_workers = THE_LIMIT;
+	pid_t worker[NR_CHILDREN];
+	int ret = EXIT_SUCCESS;
-	/* test rlimit inside the service, effectively THE_LIMIT-1 becaue of service itself */
-	for (i = 0; i < THE_LIMIT; i++) {
-		child[i] = fork();
-		if (child[i] == 0) {
-			/* service child */
+	dump_context(n_workers);
+
+	/* test rlimit inside the service, last worker should fail because of service itself */
+	for (i = 0; i < n_workers; i++) {
+		worker[i] = fork();
+		if (worker[i] == 0) {
+			/* service worker */
 			pause();
 			exit(EXIT_SUCCESS);
 		}
-		if (child[i] < 0) {
+		if (worker[i] < 0) {
 			warnx("(pid=%d) service fork %lu failed, errno = %i", pid, i+1, errno);
-			if (!(i == THE_LIMIT-1 && errno == EAGAIN))
+			if (!(i == n_workers-1 && errno == EAGAIN))
 				ret = EXIT_FAILURE;
-		} else if (i == THE_LIMIT-1) {
+		} else if (i == n_workers-1) {
 			warnx("(pid=%d) RLIMIT_NPROC not honored", pid);
 			ret = EXIT_FAILURE;
 		}
 	}
 	/* service cleanup */
-	for (i = 0; i < THE_LIMIT; i++)
-		if (child[i] > 0)
-			kill(child[i], SIGUSR1);
+	for (i = 0; i < n_workers; i++)
+		if (worker[i] > 0)
+			kill(worker[i], SIGUSR1);
-	for (i = 0; i < THE_LIMIT; i++)
-		if (child[i] > 0)
-			waitpid(child[i], NULL, WNOHANG);
+	for (i = 0; i < n_workers; i++)
+		if (worker[i] > 0)
+			waitpid(worker[i], NULL, WNOHANG);
 	if (ret)
-		exit(ret);
+		return ret;
 	pause();
+	return EXIT_FAILURE;
 }
 int main(int argc, char **argv)
 {
 	size_t i;
+	pid_t child[NR_CHILDREN];
 	int control_fd[NR_CHILDREN];
 	int wstatus[NR_CHILDREN];
 	int children = NR_CHILDREN;
@@ -180,12 +238,11 @@ int main(int argc, char **argv)
 	pid = getpid();
-	if (getenv("I_AM_SERVICE")) {
-		run_service();
-		exit(EXIT_FAILURE);
-	}
+	if (getenv("I_AM_SERVICE"))
+		return run_service();
-	service_prog = argv[0];
+	if (argc > 1 && *argv[1] == 'c')
+		child_args.mode = UM_CLONE_NEWUSER;
 	warnx("(pid=%d) Starting testcase", pid);
@@ -194,8 +251,12 @@ int main(int argc, char **argv)
 		if (socketpair(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0, sockets) < 0)
 			err(EXIT_FAILURE, "(pid=%d) socketpair failed", pid);
 		control_fd[i] = sockets[0];
-		child[i] = fork_child(sockets[1]);
+		child[i] = start_child(argv[0], sockets[1]);
 		wstatus[i] = 0;
+
+		if (define_maps(child[i]) < 0)
+			err(EXIT_FAILURE, "(pid=%d) user_ns maps definition failed", pid);
+		sync_notify(control_fd[i], MAP_DEFINE);
 	}
 	for (i = 0; i < NR_CHILDREN; i++)