From patchwork Sun Jun 9 10:43:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jonathan Calmels X-Patchwork-Id: 13691207 Received: from flow3-smtp.messagingengine.com (flow3-smtp.messagingengine.com [103.168.172.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B1E2B1CD0C; Sun, 9 Jun 2024 10:40:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.138 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717929615; cv=none; b=n1yOOiQsSEBvKtvenSKjgvwieNURoHUCQ2n/fN/9r7xkrEkuDKDUrV5cQZlz8EfZ5KVTkzO4GE7J/l//RtoNatahGMueU1XovDsKZGe3U2H+7UcD2ieyMk9OypMN48uoT0uQ9skizbQdsp5xJySTlhsFAim8aVvtyau+yRmnFU4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717929615; c=relaxed/simple; bh=GkI1fHZ1CYIUcEw/eSRklea8KLu/IoQulLyqyPHHfpQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=cKSL8v7lovJJAPqvh5+h2pZ8qR1+qxGxxfpoOigKaXjfYLbBxeTZ5nRjl2fYqq9Lyydn+/4uvk4auqaTKY5MbnkX81D+fBtsOQ4kkwuoBMMkHRVzeKmPDgxjBW8J/xg+k0wIeoepuM9Y88akUuT6rMTDS6TPK1+Wpg4kIJ3eXjc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=3xx0.net; spf=pass smtp.mailfrom=3xx0.net; dkim=pass (2048-bit key) header.d=3xx0.net header.i=@3xx0.net header.b=X+AX03KK; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=IoCD/xkD; arc=none smtp.client-ip=103.168.172.138 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=3xx0.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=3xx0.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=3xx0.net header.i=@3xx0.net header.b="X+AX03KK"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="IoCD/xkD" Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailflow.nyi.internal (Postfix) with ESMTP id BFDFE2001C9; Sun, 9 Jun 2024 06:40:12 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute3.internal (MEProxy); Sun, 09 Jun 2024 06:40:12 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=3xx0.net; h=cc :cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm1; t=1717929612; x=1717933212; bh=sG0Lp5pEKKnarvUVPQF/2JQX1lLkIJR9+zF2+Y+y4Sw=; b= X+AX03KKvp65JJTQTpN3EEHeX5m9iWfer/7WftacITbIArxwDk5/mc1pMiHjx4RU sGzoP6dNVJWXOmVAg9HpyozimZDxP897xGNYZfAyb3mSqaEOkS6TmeBHMwBDExUs 8cPmu9KwFNYP8U7JrxnWYvJsiMVdMxT0LPL0C5DfiSsM4ziDfIafiyGYvku9timh DPoU9ppJ5jP0g+5p7boaHtypqcGGg7bwqNdS1UG+bQZQDnvZAkHVnfKAPfwk3L0U J4vcb9LAF9lo5FgvK6ToOdgoyhRTg8sWxRPEUCNR5rccEy2RSZZRzy85dPPyPVI3 V/r0FNKigJxq6ZEU0Atyxg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1717929612; x= 1717933212; bh=sG0Lp5pEKKnarvUVPQF/2JQX1lLkIJR9+zF2+Y+y4Sw=; b=I oCD/xkD25aVQt+THn8j8TucNqbL0ivSlU2ZYbJWB2+3znVh9aUWjmbLhUhde3ku2 Ps+fM/wj8U00+uGVkmk6DzZJfSmRY9miNuL7gN8KlhWElKPE2NHnQF+RQX3ibX66 BeBB2iqmeAYy0WV/9rQ1boHo33MSYrMUXQRe3F121qyY4GWR1ggdeHkpNWD7DCMm AXpYQXpWsykS65cg1OTi/SMB1Dqo3wRlVZoMsehnkgUgg8Yrtw7wHaA/aIiza+A3 GZLn9j8YAjxjwvxkbiuCX96llTFzZqe+7+UWW/jzkatsDdCvXAiLiW2SD4Z5kuHV g1ev2O1nfZ3whM1OX+eFQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrfedtjedgfeduucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvvefufffkofgjfhggtgfgsehtkeertdertdejnecuhfhrohhmpeflohhn rghthhgrnhcuvegrlhhmvghlshcuoehjtggrlhhmvghlshesfeiggidtrdhnvghtqeenuc ggtffrrghtthgvrhhnpeeftdeutdejieffledtteeikeetkeehuefgtefghfevjefftdff jeegtedvkeethfenucffohhmrghinhepghhoohhglhgvsghlohhgrdgtohhmpdhkvghrnh gvlhdrohhrghenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhr ohhmpehjtggrlhhmvghlshesfeiggidtrdhnvght X-ME-Proxy: Feedback-ID: i76614979:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sun, 9 Jun 2024 06:40:08 -0400 (EDT) From: Jonathan Calmels To: brauner@kernel.org, ebiederm@xmission.com, Jonathan Corbet , Paul Moore , James Morris , "Serge E. Hallyn" , KP Singh , Matt Bobrowski , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , John Fastabend , Stanislav Fomichev , Hao Luo , Jiri Olsa , Luis Chamberlain , Kees Cook , Joel Granados , John Johansen , David Howells , Jarkko Sakkinen , Stephen Smalley , Ondrej Mosnacek , Mykola Lysenko , Shuah Khan Cc: containers@lists.linux.dev, Jonathan Calmels , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-security-module@vger.kernel.org, bpf@vger.kernel.org, apparmor@lists.ubuntu.com, keyrings@vger.kernel.org, selinux@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH v2 1/4] capabilities: Add user namespace capabilities Date: Sun, 9 Jun 2024 03:43:34 -0700 Message-ID: <20240609104355.442002-2-jcalmels@3xx0.net> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240609104355.442002-1-jcalmels@3xx0.net> References: <20240609104355.442002-1-jcalmels@3xx0.net> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Attackers often rely on user namespaces to get elevated (yet confined) privileges in order to target specific subsystems (e.g. [1]). Distributions have been pretty adamant that they need a way to configure these, most of them carry out-of-tree patches to do so, or plainly refuse to enable them. As a result, there have been multiple efforts over the years to introduce various knobs to control and/or disable user namespaces (e.g. [2][3][4]). While we acknowledge that there are already ways to control the creation of such namespaces (the most recent being a LSM hook), there are inherent issues with these approaches. Preventing the user namespace creation is not fine-grained enough, and in some cases, incompatible with various userspace expectations (e.g. container runtimes, browser sandboxing, service isolation) This patch addresses these limitations by introducing an additional capability set used to restrict the permissions granted when creating user namespaces. This way, processes can apply the principle of least privilege by configuring only the capabilities they need for their namespaces. For compatibility reasons, processes always start with a full userns capability set. On namespace creation, the userns capability set (pU) is assigned to the new effective (pE), permitted (pP) and bounding set (X) of the task: pU = pE = pP = X The userns capability set obeys the invariant that no bit can ever be set if it is not already part of the task’s bounding set. This ensures that no namespace can ever gain more privileges than its predecessors. Additionally, if a task is not privileged over CAP_SETPCAP, setting any bit in the userns set requires its corresponding bit to be set in the permitted set. This effectively mimics the inheritable set rules and means that, by default, only root in the user namespace can regain userns capabilities previously dropped: p’U = (pE & CAP_SETPCAP) ? X : (X & pP) Note that since userns capabilities are strictly hierarchical, policies can be enforced at various levels (e.g. init, pam_cap) and inherited by every child namespace. Here is a sample program that can be used to verify the functionality: /* * Test program that drops CAP_SYS_RAWIO from subsequent user namespaces. * * ./cap_userns_test unshare -r grep Cap /proc/self/status * CapInh: 0000000000000000 * CapPrm: 000001fffffdffff * CapEff: 000001fffffdffff * CapBnd: 000001fffffdffff * CapAmb: 0000000000000000 * CapUNs: 000001fffffdffff */ int main(int argc, char *argv[]) { if (prctl(PR_CAP_USERNS, PR_CAP_USERNS_LOWER, CAP_SYS_RAWIO, 0, 0) < 0) err(1, "cannot drop userns cap"); execvp(argv[1], argv + 1); err(1, "cannot exec"); } [1] https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html [2] https://lore.kernel.org/lkml/1453502345-30416-1-git-send-email-keescook@chromium.org [3] https://lore.kernel.org/lkml/20220815162028.926858-1-fred@cloudflare.com [4] https://lore.kernel.org/containers/168547265011.24337.4306067683997517082-0@git.sr.ht Signed-off-by: Jonathan Calmels Reviewed-by: Serge Hallyn --- Documentation/filesystems/proc.rst | 1 + Documentation/security/credentials.rst | 6 +++ fs/proc/array.c | 9 ++++ include/linux/cred.h | 3 ++ include/uapi/linux/prctl.h | 7 +++ kernel/cred.c | 3 ++ kernel/umh.c | 15 +++++++ kernel/user_namespace.c | 12 +++-- security/commoncap.c | 62 ++++++++++++++++++++++++-- security/keys/process_keys.c | 3 ++ 10 files changed, 111 insertions(+), 10 deletions(-) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 7c3a565ffbef..b5de4eaf1b7b 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -294,6 +294,7 @@ It's slow but very precise. CapEff bitmap of effective capabilities CapBnd bitmap of capabilities bounding set CapAmb bitmap of ambient capabilities + CapUns bitmap of user namespace capabilities NoNewPrivs no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...) Seccomp seccomp mode, like prctl(PR_GET_SECCOMP, ...) Speculation_Store_Bypass speculative store bypass mitigation status diff --git a/Documentation/security/credentials.rst b/Documentation/security/credentials.rst index 357328d566c8..7ee904237023 100644 --- a/Documentation/security/credentials.rst +++ b/Documentation/security/credentials.rst @@ -148,6 +148,7 @@ The Linux kernel supports the following types of credentials: - Set of permitted capabilities - Set of inheritable capabilities - Set of effective capabilities + - Set of user namespace capabilities - Capability bounding set These are only carried by tasks. They indicate superior capabilities @@ -170,6 +171,11 @@ The Linux kernel supports the following types of credentials: ``execve()``, especially when a binary is executed that will execute as UID 0. + The user namespace set limits the capabilities granted to user namespaces. + It defines what capabilities will be available in the other sets after + creating a new user namespace, such as when calling ``clone()`` or + ``unshare()`` with ``CLONE_NEWUSER``. + 3. Secure management flags (securebits). These are only carried by tasks. These govern the way the above diff --git a/fs/proc/array.c b/fs/proc/array.c index 34a47fb0c57f..364e8bb19f9d 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -313,6 +313,9 @@ static inline void task_cap(struct seq_file *m, struct task_struct *p) const struct cred *cred; kernel_cap_t cap_inheritable, cap_permitted, cap_effective, cap_bset, cap_ambient; +#ifdef CONFIG_USER_NS + kernel_cap_t cap_userns; +#endif rcu_read_lock(); cred = __task_cred(p); @@ -321,6 +324,9 @@ static inline void task_cap(struct seq_file *m, struct task_struct *p) cap_effective = cred->cap_effective; cap_bset = cred->cap_bset; cap_ambient = cred->cap_ambient; +#ifdef CONFIG_USER_NS + cap_userns = cred->cap_userns; +#endif rcu_read_unlock(); render_cap_t(m, "CapInh:\t", &cap_inheritable); @@ -328,6 +334,9 @@ static inline void task_cap(struct seq_file *m, struct task_struct *p) render_cap_t(m, "CapEff:\t", &cap_effective); render_cap_t(m, "CapBnd:\t", &cap_bset); render_cap_t(m, "CapAmb:\t", &cap_ambient); +#ifdef CONFIG_USER_NS + render_cap_t(m, "CapUNs:\t", &cap_userns); +#endif } static inline void task_seccomp(struct seq_file *m, struct task_struct *p) diff --git a/include/linux/cred.h b/include/linux/cred.h index 2976f534a7a3..adab0031443e 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -124,6 +124,9 @@ struct cred { kernel_cap_t cap_effective; /* caps we can actually use */ kernel_cap_t cap_bset; /* capability bounding set */ kernel_cap_t cap_ambient; /* Ambient capability set */ +#ifdef CONFIG_USER_NS + kernel_cap_t cap_userns; /* User namespace capability set */ +#endif #ifdef CONFIG_KEYS unsigned char jit_keyring; /* default keyring to attach requested * keys to */ diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 35791791a879..b58325ebdc9e 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -198,6 +198,13 @@ struct prctl_mm_map { # define PR_CAP_AMBIENT_LOWER 3 # define PR_CAP_AMBIENT_CLEAR_ALL 4 +/* Control the userns capability set */ +#define PR_CAP_USERNS 48 +# define PR_CAP_USERNS_IS_SET 1 +# define PR_CAP_USERNS_RAISE 2 +# define PR_CAP_USERNS_LOWER 3 +# define PR_CAP_USERNS_CLEAR_ALL 4 + /* arm64 Scalable Vector Extension controls */ /* Flag values must be kept in sync with ptrace NT_ARM_SVE interface */ #define PR_SVE_SET_VL 50 /* set task vector length */ diff --git a/kernel/cred.c b/kernel/cred.c index 075cfa7c896f..9912c6f3bc6b 100644 --- a/kernel/cred.c +++ b/kernel/cred.c @@ -56,6 +56,9 @@ struct cred init_cred = { .cap_permitted = CAP_FULL_SET, .cap_effective = CAP_FULL_SET, .cap_bset = CAP_FULL_SET, +#ifdef CONFIG_USER_NS + .cap_userns = CAP_FULL_SET, +#endif .user = INIT_USER, .user_ns = &init_user_ns, .group_info = &init_groups, diff --git a/kernel/umh.c b/kernel/umh.c index 598b3ffe1522..0a5a9cf10d83 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -32,6 +32,9 @@ #include +#ifdef CONFIG_USER_NS +static kernel_cap_t usermodehelper_userns = CAP_FULL_SET; +#endif static kernel_cap_t usermodehelper_bset = CAP_FULL_SET; static kernel_cap_t usermodehelper_inheritable = CAP_FULL_SET; static DEFINE_SPINLOCK(umh_sysctl_lock); @@ -94,6 +97,9 @@ static int call_usermodehelper_exec_async(void *data) new->cap_bset = cap_intersect(usermodehelper_bset, new->cap_bset); new->cap_inheritable = cap_intersect(usermodehelper_inheritable, new->cap_inheritable); +#ifdef CONFIG_USER_NS + new->cap_userns = cap_intersect(usermodehelper_userns, new->cap_userns); +#endif spin_unlock(&umh_sysctl_lock); if (sub_info->init) { @@ -560,6 +566,15 @@ static struct ctl_table usermodehelper_table[] = { .mode = 0600, .proc_handler = proc_cap_handler, }, +#ifdef CONFIG_USER_NS + { + .procname = "userns", + .data = &usermodehelper_userns, + .maxlen = 2 * sizeof(unsigned long), + .mode = 0600, + .proc_handler = proc_cap_handler, + }, +#endif }; static int __init init_umh_sysctls(void) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 0b0b95418b16..7e624607330b 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -42,15 +42,13 @@ static void dec_user_namespaces(struct ucounts *ucounts) static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns) { - /* Start with the same capabilities as init but useless for doing - * anything as the capabilities are bound to the new user namespace. - */ - cred->securebits = SECUREBITS_DEFAULT; + /* Start with the capabilities defined in the userns set. */ + cred->cap_bset = cred->cap_userns; + cred->cap_permitted = cred->cap_userns; + cred->cap_effective = cred->cap_userns; cred->cap_inheritable = CAP_EMPTY_SET; - cred->cap_permitted = CAP_FULL_SET; - cred->cap_effective = CAP_FULL_SET; cred->cap_ambient = CAP_EMPTY_SET; - cred->cap_bset = CAP_FULL_SET; + cred->securebits = SECUREBITS_DEFAULT; #ifdef CONFIG_KEYS key_put(cred->request_key_auth); cred->request_key_auth = NULL; diff --git a/security/commoncap.c b/security/commoncap.c index 162d96b3a676..59fafbfcfc5e 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -214,10 +214,10 @@ int cap_capget(const struct task_struct *target, kernel_cap_t *effective, } /* - * Determine whether the inheritable capabilities are limited to the old + * Determine whether the capabilities are limited to the old * permitted set. Returns 1 if they are limited, 0 if they are not. */ -static inline int cap_inh_is_capped(void) +static inline int cap_is_capped(void) { /* they are so limited unless the current task has the CAP_SETPCAP * capability @@ -228,6 +228,29 @@ static inline int cap_inh_is_capped(void) return 1; } +/* + * Determine whether a userns capability can be raised. + * Returns 1 if it can, 0 otherwise. + */ +#ifdef CONFIG_USER_NS +static inline int cap_uns_is_raiseable(unsigned long cap) +{ + if (!!cap_raised(current_cred()->cap_userns, cap)) + return 1; + + /* + * A capability cannot be raised unless the current task has it in + * its bounding set and, without CAP_SETPCAP, its permitted set. + */ + if (!cap_raised(current_cred()->cap_bset, cap)) + return 0; + if (cap_is_capped() && !cap_raised(current_cred()->cap_permitted, cap)) + return 0; + + return 1; +} +#endif + /** * cap_capset - Validate and apply proposed changes to current's capabilities * @new: The proposed new credentials; alterations should be made here @@ -246,7 +269,7 @@ int cap_capset(struct cred *new, const kernel_cap_t *inheritable, const kernel_cap_t *permitted) { - if (cap_inh_is_capped() && + if (cap_is_capped() && !cap_issubset(*inheritable, cap_combine(old->cap_inheritable, old->cap_permitted))) @@ -1382,6 +1405,39 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3, return commit_creds(new); } +#ifdef CONFIG_USER_NS + case PR_CAP_USERNS: + if (arg2 == PR_CAP_USERNS_CLEAR_ALL) { + if (arg3 | arg4 | arg5) + return -EINVAL; + + new = prepare_creds(); + if (!new) + return -ENOMEM; + cap_clear(new->cap_userns); + return commit_creds(new); + } + + if (((!cap_valid(arg3)) | arg4 | arg5)) + return -EINVAL; + + if (arg2 == PR_CAP_USERNS_IS_SET) + return !!cap_raised(current_cred()->cap_userns, arg3); + if (arg2 != PR_CAP_USERNS_RAISE && arg2 != PR_CAP_USERNS_LOWER) + return -EINVAL; + if (arg2 == PR_CAP_USERNS_RAISE && !cap_uns_is_raiseable(arg3)) + return -EPERM; + + new = prepare_creds(); + if (!new) + return -ENOMEM; + if (arg2 == PR_CAP_USERNS_RAISE) + cap_raise(new->cap_userns, arg3); + else + cap_lower(new->cap_userns, arg3); + return commit_creds(new); +#endif + default: /* No functionality available - continue with default */ return -ENOSYS; diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c index b5d5333ab330..e3670d815435 100644 --- a/security/keys/process_keys.c +++ b/security/keys/process_keys.c @@ -944,6 +944,9 @@ void key_change_session_keyring(struct callback_head *twork) new->cap_effective = old->cap_effective; new->cap_ambient = old->cap_ambient; new->cap_bset = old->cap_bset; +#ifdef CONFIG_USER_NS + new->cap_userns = old->cap_userns; +#endif new->jit_keyring = old->jit_keyring; new->thread_keyring = key_get(old->thread_keyring); From patchwork Sun Jun 9 10:43:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jonathan Calmels X-Patchwork-Id: 13691208 Received: from flow3-smtp.messagingengine.com (flow3-smtp.messagingengine.com [103.168.172.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 92DFE24B5B; Sun, 9 Jun 2024 10:40:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.138 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717929626; cv=none; b=MlH8pasWbdjNKoMEAH0ByMjrYkFd1ywp0dhKoKwBZuQ+RVAgRAt2kp9kKfU8kcqQPqMFvJknuRfHFCdLONFf6hUM+55Wn/LooWVqUmukpc94WwLrDT3QxWvrplV1XY5Ne7DV/KN0rhWZACWSEh1Fv4vIW//j1/KQ3erTjbG12Ds= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717929626; c=relaxed/simple; bh=TRflIsPbASjR72yTWpx34CeYRaOCTbvPLNUnQ9bjQ6k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=cyHRtGU0UTWsa3c3klDSM7osXZp0bK5gpF8w9zmjdCDrC8VlDAtkCjbFejhVlyGV8gmJ08ujvoVmoGc2x4Xm6Fp+tUzYe90KfVqJkIf0fbWSVqIYf5gzDBRzGcE1vl8rWLCF5Mp96OqQz2rC7o6YTGWovaAE/B5P9ddr+ZP8zuc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=3xx0.net; spf=pass smtp.mailfrom=3xx0.net; dkim=pass (2048-bit key) header.d=3xx0.net header.i=@3xx0.net header.b=gsfwPULs; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=UQOz4ItZ; arc=none smtp.client-ip=103.168.172.138 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=3xx0.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=3xx0.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=3xx0.net header.i=@3xx0.net header.b="gsfwPULs"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="UQOz4ItZ" Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailflow.nyi.internal (Postfix) with ESMTP id C56CA2001CA; Sun, 9 Jun 2024 06:40:23 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Sun, 09 Jun 2024 06:40:23 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=3xx0.net; h=cc :cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm1; t=1717929623; x=1717933223; bh=givYjoPOL2k44qEG2AAhLIKWw+SaQbnMN2v1JiAaofI=; b= gsfwPULsB5vGdh2divKYeX2pT9m7EhL7B8rHXxOxGrnjOluHppy8OC6V/mJFZ6qG RGZcSmXNOCxb86qNZyXSSrkCYSSNJ9fnL2PcxZexz4wNzkoDLbnKrmufOi8NZbxh Xa98Juv3S2ziLBs3ZYQD2XnvkNKcsZhPrQUJbcXDnTANkMHzxI/6MfevpAQntzGf Egvm14viEkR/hpYGwq/KECjvzN29akriG+vYyD0Br1E+t63mN4IVRmEv1tM2WvvO P7jQi1XAUOdoJL54p8/6s6eCb8Ob+1d1gFaOITzeWyhTtj8EB1Ed5dUJgUyZHsh7 z7FP+gbbFde7LylQsszfzw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1717929623; x= 1717933223; bh=givYjoPOL2k44qEG2AAhLIKWw+SaQbnMN2v1JiAaofI=; b=U QOz4ItZeUl4PBUxODr3Q1h/fZXwI+Q1LDY2gPrf9FL0OOyk0CzxxbTIYafM5jirH 3aqh0gd+G3OxbxmS5+pVaTFKGlbuY3eDMUfWnIu6I5w2hY24b2PTQOiZuVmNhkuK buF6oUecCbOTJvDq1E2f/kWHwXJTxc99KNz/GzlZ1n8jggHlgeJF7bmb5Cc4vtxh 4vzpRpoH9t5M6rATh/++1ojrpJscOK2seRo2HShpcDLGqIpAOx/b+emrqla23Ope yPTI0OchbCaPiW9X1EfFPq2auSr8tfeQ8q2teol3k5kFk0wM016PJ/Yc6FbScqbY +ikf/xa1m4gqMjtItzmDw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrfedtjedgfeduucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvvefufffkofgjfhggtgfgsehtkeertdertdejnecuhfhrohhmpeflohhn rghthhgrnhcuvegrlhhmvghlshcuoehjtggrlhhmvghlshesfeiggidtrdhnvghtqeenuc ggtffrrghtthgvrhhnpeeiueeukeeitddvheeiveeiiedvhfeljeeiteeggedtteeiueet iefhudfgvddvleenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfh hrohhmpehjtggrlhhmvghlshesfeiggidtrdhnvght X-ME-Proxy: Feedback-ID: i76614979:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sun, 9 Jun 2024 06:40:19 -0400 (EDT) From: Jonathan Calmels To: brauner@kernel.org, ebiederm@xmission.com, Jonathan Corbet , Paul Moore , James Morris , "Serge E. Hallyn" , KP Singh , Matt Bobrowski , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , John Fastabend , Stanislav Fomichev , Hao Luo , Jiri Olsa , Luis Chamberlain , Kees Cook , Joel Granados , John Johansen , David Howells , Jarkko Sakkinen , Stephen Smalley , Ondrej Mosnacek , Mykola Lysenko , Shuah Khan Cc: containers@lists.linux.dev, Jonathan Calmels , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-security-module@vger.kernel.org, bpf@vger.kernel.org, apparmor@lists.ubuntu.com, keyrings@vger.kernel.org, selinux@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH v2 2/4] capabilities: Add securebit to restrict userns caps Date: Sun, 9 Jun 2024 03:43:35 -0700 Message-ID: <20240609104355.442002-3-jcalmels@3xx0.net> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240609104355.442002-1-jcalmels@3xx0.net> References: <20240609104355.442002-1-jcalmels@3xx0.net> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 This patch adds a new capability security bit designed to constrain a task’s userns capability set to its bounding set. The reason for this is twofold: - This serves as a quick and easy way to lock down a set of capabilities for a task, thus ensuring that any namespace it creates will never be more privileged than itself is. - This helps userspace transition to more secure defaults by not requiring specific logic for the userns capability set, or libcap support. Example: # capsh --secbits=$((1 << 8)) --drop=cap_sys_rawio -- \ -c 'unshare -r grep Cap /proc/self/status' CapInh: 0000000000000000 CapPrm: 000001fffffdffff CapEff: 000001fffffdffff CapBnd: 000001fffffdffff CapAmb: 0000000000000000 CapUNs: 000001fffffdffff Signed-off-by: Jonathan Calmels --- include/linux/securebits.h | 1 + include/uapi/linux/securebits.h | 11 ++++++++++- kernel/user_namespace.c | 5 +++++ 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/include/linux/securebits.h b/include/linux/securebits.h index 656528673983..5f9d85cd69c3 100644 --- a/include/linux/securebits.h +++ b/include/linux/securebits.h @@ -5,4 +5,5 @@ #include #define issecure(X) (issecure_mask(X) & current_cred_xxx(securebits)) +#define iscredsecure(cred, X) (issecure_mask(X) & cred->securebits) #endif /* !_LINUX_SECUREBITS_H */ diff --git a/include/uapi/linux/securebits.h b/include/uapi/linux/securebits.h index d6d98877ff1a..2da3f4be4531 100644 --- a/include/uapi/linux/securebits.h +++ b/include/uapi/linux/securebits.h @@ -52,10 +52,19 @@ #define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \ (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED)) +/* When set, user namespace capabilities are restricted to their parent's bounding set. */ +#define SECURE_USERNS_STRICT_CAPS 8 +#define SECURE_USERNS_STRICT_CAPS_LOCKED 9 /* make bit-8 immutable */ + +#define SECBIT_USERNS_STRICT_CAPS (issecure_mask(SECURE_USERNS_STRICT_CAPS)) +#define SECBIT_USERNS_STRICT_CAPS_LOCKED \ + (issecure_mask(SECURE_USERNS_STRICT_CAPS_LOCKED)) + #define SECURE_ALL_BITS (issecure_mask(SECURE_NOROOT) | \ issecure_mask(SECURE_NO_SETUID_FIXUP) | \ issecure_mask(SECURE_KEEP_CAPS) | \ - issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE)) + issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE) | \ + issecure_mask(SECURE_USERNS_STRICT_CAPS)) #define SECURE_ALL_LOCKS (SECURE_ALL_BITS << 1) #endif /* _UAPI_LINUX_SECUREBITS_H */ diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 7e624607330b..53848e2b68cd 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -42,6 +43,10 @@ static void dec_user_namespaces(struct ucounts *ucounts) static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns) { + /* Limit userns capabilities to our parent's bounding set. */ + if (iscredsecure(cred, SECURE_USERNS_STRICT_CAPS)) + cred->cap_userns = cap_intersect(cred->cap_userns, cred->cap_bset); + /* Start with the capabilities defined in the userns set. */ cred->cap_bset = cred->cap_userns; cred->cap_permitted = cred->cap_userns; From patchwork Sun Jun 9 10:43:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Calmels X-Patchwork-Id: 13691209 Received: from flow3-smtp.messagingengine.com (flow3-smtp.messagingengine.com [103.168.172.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7E63E2C181; Sun, 9 Jun 2024 10:40:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.138 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717929649; cv=none; b=acFueZM3Wi4f5DLB4f00/775V3zHGC0+MGI8YH+ERESMj0+ez6T2WShmYJZiiOw6UKZREsy1G88+neHoKkqexTco2Fi0Qy+fikuC8Ydyk4nOuOjnNDjCGOR14NiZxIUWINL2vaVahf7PzQCHWvya3kfspp/3pbyeuJHacCLAj3c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717929649; c=relaxed/simple; bh=yhrFM9a1AAGa6F/yQlT85ABET6HKqLjbN7uNjulPHjk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MoVLnrZ/cQDYvoP/NKqaXIeUcr9jIAB6IZQucY8to5KiGXAHrR0sDvDDCqbZNWzjgEAPRfD3UWFLnseb7dnxmXjVn0Lsd+cGGYDr0bAww2CDgD0DAXi59LD6RYDiaMzvdzErfOu9sa30/dwP55NH1lKQMhohS6KXHnL55EUFWOQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=3xx0.net; spf=pass smtp.mailfrom=3xx0.net; dkim=pass (2048-bit key) header.d=3xx0.net header.i=@3xx0.net header.b=ljdvgWRE; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=AZAHZSf2; arc=none smtp.client-ip=103.168.172.138 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=3xx0.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=3xx0.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=3xx0.net header.i=@3xx0.net header.b="ljdvgWRE"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="AZAHZSf2" Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailflow.nyi.internal (Postfix) with ESMTP id 9032220022F; Sun, 9 Jun 2024 06:40:46 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Sun, 09 Jun 2024 06:40:46 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=3xx0.net; h=cc :cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm1; t=1717929646; x= 1717933246; bh=zcFWSn/J5QquzegH6FwKfVvJSr+1k1tL286u53jyny8=; b=l jdvgWREhhuLplnI7/5a3ag5MfxGfDUzEgkIhF7BXSEj+JxJFYr3bPkMiI6zekoZF YRGQyWd0ctL4FamRDme6VQ0jIgd6l+eJuSqtUdhq55qHuMyhvaNL+Z5HZDwsa7sx g/7cqWPE4V/ifHXDZk9g+Tu1SiaqfGb45/T7SLjBj7iFhsEnWGHJTn/C6bOINgj4 qroxFSLbFMp1VPo9xnartMdDJoakKC7RQyxk8iG7CQjd86V6xDUYFy7TmUZEyx1L sqTub+Kpz6X0LZHday33Gh4iPP5uszR86vCmoakrb6GX6m6VEa3Ucqp8rugU8FQm ppXotBFeKi3mGa9gg6MrQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1717929646; x= 1717933246; bh=zcFWSn/J5QquzegH6FwKfVvJSr+1k1tL286u53jyny8=; b=A ZAHZSf21tcvL4xlGLhVoX5zjAWQiS6kaChhiGU89AR6UCW87KYqmFGVCFj8dfbZK KevVWAdQFoLAuF+CX5DSpTp8BZEvFpocoVg18C2Uw70FRgDI69h6EzG3Mf9ZhAw2 2j4X4C1ej1p9TViJ9t1TWptbtK+hzybVihLh5E/2c2vm0K9gPfnolUO80yZFNT97 ER4ZaLBrdnVIOCeRB6mMi/ti693mBfP2wBVRayAfL4fuIFYQ2A490lu05Y/EnT8j j95MBgEB5inYPHipLqSUFi+WWBPm4U0Bt3ht8Bzw5extKB4TEPT77/lBnAbR8fEo IyldPdVLS2WfgakGLYKOQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrfedtjedgfeduucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvvefufffkofgjfhgggfestdekredtredttdenucfhrhhomheplfhonhgr thhhrghnucevrghlmhgvlhhsuceojhgtrghlmhgvlhhsseefgiigtddrnhgvtheqnecugg ftrfgrthhtvghrnheptdejhfelheejfeeutdekgeevueetkedtgfelkeejgfffhefgveet teffueegvdeknecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrh homhepjhgtrghlmhgvlhhsseefgiigtddrnhgvth X-ME-Proxy: Feedback-ID: i76614979:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sun, 9 Jun 2024 06:40:41 -0400 (EDT) From: Jonathan Calmels To: brauner@kernel.org, ebiederm@xmission.com, Jonathan Corbet , Paul Moore , James Morris , "Serge E. Hallyn" , KP Singh , Matt Bobrowski , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , John Fastabend , Stanislav Fomichev , Hao Luo , Jiri Olsa , Luis Chamberlain , Kees Cook , Joel Granados , John Johansen , David Howells , Jarkko Sakkinen , Stephen Smalley , Ondrej Mosnacek , Mykola Lysenko , Shuah Khan Cc: containers@lists.linux.dev, Jonathan Calmels , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-security-module@vger.kernel.org, bpf@vger.kernel.org, apparmor@lists.ubuntu.com, keyrings@vger.kernel.org, selinux@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH v2 3/4] capabilities: Add sysctl to mask off userns caps Date: Sun, 9 Jun 2024 03:43:36 -0700 Message-ID: <20240609104355.442002-4-jcalmels@3xx0.net> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240609104355.442002-1-jcalmels@3xx0.net> References: <20240609104355.442002-1-jcalmels@3xx0.net> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 This patch adds a new system-wide userns capability mask designed to mask off capabilities in user namespaces. This mask is controlled through a sysctl and can be set early in the boot process or on the kernel command line to exclude known capabilities from ever being gained in namespaces. Once set, it can be further restricted to exert dynamic policies on the system (e.g. ward off a potential exploit). Changing this mask requires privileges in the initial user namespace over the newly introduced CAP_SYS_CONTROL. Example: # sysctl -qw kernel.cap_userns_mask=0x1fffffdffff && \ unshare -r grep Cap /proc/self/status CapInh: 0000000000000000 CapPrm: 000001fffffdffff CapEff: 000001fffffdffff CapBnd: 000001fffffdffff CapAmb: 0000000000000000 CapUNs: 000001fffffdffff Signed-off-by: Jonathan Calmels --- include/linux/user_namespace.h | 7 ++++ include/uapi/linux/capability.h | 6 ++- kernel/sysctl.c | 10 +++++ kernel/user_namespace.c | 63 +++++++++++++++++++++++++++++ security/selinux/include/classmap.h | 5 ++- 5 files changed, 88 insertions(+), 3 deletions(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 6030a8235617..d958d4819608 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -2,6 +2,7 @@ #ifndef _LINUX_USER_NAMESPACE_H #define _LINUX_USER_NAMESPACE_H +#include #include #include #include @@ -14,6 +15,12 @@ #define UID_GID_MAP_MAX_BASE_EXTENTS 5 #define UID_GID_MAP_MAX_EXTENTS 340 +#ifdef CONFIG_SYSCTL +extern kernel_cap_t cap_userns_mask; +int cap_userns_sysctl_handler(const struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos); +#endif + struct uid_gid_extent { u32 first; u32 lower_first; diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h index 5bb906098697..e2c5e4bb2eb0 100644 --- a/include/uapi/linux/capability.h +++ b/include/uapi/linux/capability.h @@ -418,7 +418,11 @@ struct vfs_ns_cap_data { #define CAP_CHECKPOINT_RESTORE 40 -#define CAP_LAST_CAP CAP_CHECKPOINT_RESTORE +/* Allow setting the system userns capability mask. */ + +#define CAP_SYS_CONTROL 41 + +#define CAP_LAST_CAP CAP_SYS_CONTROL #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index e0b917328cf9..95b27a92c63c 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -62,6 +62,7 @@ #include #include #include +#include #include #include "../lib/kstrtox.h" @@ -1846,6 +1847,15 @@ static struct ctl_table kern_table[] = { .mode = 0444, .proc_handler = proc_dointvec, }, +#ifdef CONFIG_USER_NS + { + .procname = "cap_userns_mask", + .data = &cap_userns_mask, + .maxlen = sizeof(kernel_cap_t), + .mode = 0644, + .proc_handler = cap_userns_sysctl_handler, + }, +#endif #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86) { .procname = "unknown_nmi_panic", diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 53848e2b68cd..e513d87ed102 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -26,6 +26,63 @@ static struct kmem_cache *user_ns_cachep __ro_after_init; static DEFINE_MUTEX(userns_state_mutex); +#ifdef CONFIG_SYSCTL +static DEFINE_SPINLOCK(cap_userns_lock); +kernel_cap_t cap_userns_mask = CAP_FULL_SET; + +int cap_userns_sysctl_handler(const struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + struct ctl_table t; + unsigned long mask_array[2]; + kernel_cap_t new_mask, *mask; + int err; + + if (write && !capable(CAP_SYS_CONTROL)) + return -EPERM; + + /* + * convert from the global kernel_cap_t to the ulong array to print to + * userspace if this is a read. + * + * capabilities are exposed as one 64-bit value or two 32-bit values + * depending on the architecture + */ + mask = table->data; + spin_lock(&cap_userns_lock); + mask_array[0] = (unsigned long) mask->val; + if (BITS_PER_LONG != 64) + mask_array[1] = mask->val >> BITS_PER_LONG; + spin_unlock(&cap_userns_lock); + + t = *table; + t.data = &mask_array; + + /* + * actually read or write and array of ulongs from userspace. Remember + * these are least significant bits first + */ + err = proc_doulongvec_minmax(&t, write, buffer, lenp, ppos); + if (err < 0) + return err; + + new_mask.val = mask_array[0]; + if (BITS_PER_LONG != 64) + new_mask.val += (u64)mask_array[1] << BITS_PER_LONG; + + /* + * Drop everything not in the new_mask (but don't add things) + */ + if (write) { + spin_lock(&cap_userns_lock); + *mask = cap_intersect(*mask, new_mask); + spin_unlock(&cap_userns_lock); + } + + return 0; +} +#endif + static bool new_idmap_permitted(const struct file *file, struct user_namespace *ns, int cap_setid, struct uid_gid_map *map); @@ -46,6 +103,12 @@ static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns) /* Limit userns capabilities to our parent's bounding set. */ if (iscredsecure(cred, SECURE_USERNS_STRICT_CAPS)) cred->cap_userns = cap_intersect(cred->cap_userns, cred->cap_bset); +#ifdef CONFIG_SYSCTL + /* Mask off userns capabilities that are not permitted by the system-wide mask. */ + spin_lock(&cap_userns_lock); + cred->cap_userns = cap_intersect(cred->cap_userns, cap_userns_mask); + spin_unlock(&cap_userns_lock); +#endif /* Start with the capabilities defined in the userns set. */ cred->cap_bset = cred->cap_userns; diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h index 7229c9bf6c27..8f3ede7aac92 100644 --- a/security/selinux/include/classmap.h +++ b/security/selinux/include/classmap.h @@ -34,9 +34,10 @@ #define COMMON_CAP2_PERMS \ "mac_override", "mac_admin", "syslog", "wake_alarm", "block_suspend", \ - "audit_read", "perfmon", "bpf", "checkpoint_restore" + "audit_read", "perfmon", "bpf", "checkpoint_restore", \ + "sys_control" -#if CAP_LAST_CAP > CAP_CHECKPOINT_RESTORE +#if CAP_LAST_CAP > CAP_SYS_CONTROL #error New capability defined, please update COMMON_CAP2_PERMS. #endif From patchwork Sun Jun 9 10:43:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Calmels X-Patchwork-Id: 13691210 X-Patchwork-Delegate: bpf@iogearbox.net Received: from flow3-smtp.messagingengine.com (flow3-smtp.messagingengine.com [103.168.172.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A86EA2C181; Sun, 9 Jun 2024 10:40:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.138 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717929658; cv=none; b=B88DVSSn6NFfVZKO+yOqq4igTnY/LQeok3eZWGnqrXEjX9BZsSsiYKe3sjgfsMuYu9JsFxJdddGnwp8FdxG+5Wzm+/6veE6fXsVLCFourju/b6ccc8hkp8Qr+zp56jh7eSKha8gPaETmqC2EllrJkc9yS2BYTtABuCV8TW1mSDs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717929658; c=relaxed/simple; bh=Te9b2tTgFLIjn+G+t41oNEnaKLeIvcknA3SYrFEQ1WA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UaAlVMZHGXpCRt6hmqsJ26uAjVL1cSedgRv6Eqxhegn0e+PHigmudChHUsaXynCzqdVzb+Og4a8SAy/iQJI1uhHYgugRww8LrwjRerzSOJfQXN0ub8hmSx5Zh4xXt+7J1nADeYx9ukx5+GpMJEG/Tamu8VAD0sp0C1654ep2Kcw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=3xx0.net; spf=pass smtp.mailfrom=3xx0.net; dkim=pass (2048-bit key) header.d=3xx0.net header.i=@3xx0.net header.b=UqNy3PyQ; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=mGpmevGf; arc=none smtp.client-ip=103.168.172.138 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=3xx0.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=3xx0.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=3xx0.net header.i=@3xx0.net header.b="UqNy3PyQ"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="mGpmevGf" Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailflow.nyi.internal (Postfix) with ESMTP id D663B200230; Sun, 9 Jun 2024 06:40:55 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Sun, 09 Jun 2024 06:40:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=3xx0.net; h=cc :cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to; s=fm1; t=1717929655; x= 1717933255; bh=XpZy3c6xoasiD2Syf1e7KNVl4niFLU+xqznRa2cRaQY=; b=U qNy3PyQhOH732E7pX54uA3DCObeosJ+3w1bM4h6PkM2TyOXtPe/X0C7NeDBarAuo bscAip/ZjGHinjSrMO965JZL9l3X+NmdjU8lc4p0JMQ1kCBtPfZzesdnYahUht/Y SmpLd4vgq1rB96M7R1mpl0jYE8AeKKwqc5DVGUObjxfkeNdWEvf1P9G9MFmlOXih WsmRaPVhEutfR5aL4ufVoG7BY3sxlZNu7WEaiifcC+9RTuazDpTIfhDwbyRQdU76 8BB1Epbsh7d5SYG4MbMELRbmH6s79Znf3/qb8BC50Bgvu74JM4HoZd8BXgz0DePe BTd+aBrY9kbdKRaP4bUuw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1717929655; x= 1717933255; bh=XpZy3c6xoasiD2Syf1e7KNVl4niFLU+xqznRa2cRaQY=; b=m GpmevGfJ6mIMqMWXF3GfuRbVuOA2qHUusWXetaTnZatmccDgLfYt9+gS8WJKZ7Vt sVvca/vuyK2rJxBkV+rov0Rv9vhlzkbXlOZeQ3W7UD5V3xUr9bHqYbXdrD90kW8T mfIcvLN9T/J4zqKyxGkMok8jg2c0FsbVkpZoIbqXOmOpRQOnjybVqC8UkVKGPmNB mM/wMt3j1YfbluAoPiUhWXFbadxAJSCEXovKZtA+IhaBy83V4Yz/GJNv6uzAznAR AFcbxiXoutSy5hAYpQ+RtGJH5UywxdNvOTOxhQssELcFsrOOyJbK2WXfuPZGUK4U 3mxF0P6YInPFOZp55XBhQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrfedtjedgfeduucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvvefufffkofgjfhgggfestdekredtredttdenucfhrhhomheplfhonhgr thhhrghnucevrghlmhgvlhhsuceojhgtrghlmhgvlhhsseefgiigtddrnhgvtheqnecugg ftrfgrthhtvghrnheptdejhfelheejfeeutdekgeevueetkedtgfelkeejgfffhefgveet teffueegvdeknecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrh homhepjhgtrghlmhgvlhhsseefgiigtddrnhgvth X-ME-Proxy: Feedback-ID: i76614979:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sun, 9 Jun 2024 06:40:51 -0400 (EDT) From: Jonathan Calmels To: brauner@kernel.org, ebiederm@xmission.com, Jonathan Corbet , Paul Moore , James Morris , "Serge E. Hallyn" , KP Singh , Matt Bobrowski , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , John Fastabend , Stanislav Fomichev , Hao Luo , Jiri Olsa , Luis Chamberlain , Kees Cook , Joel Granados , John Johansen , David Howells , Jarkko Sakkinen , Stephen Smalley , Ondrej Mosnacek , Mykola Lysenko , Shuah Khan Cc: containers@lists.linux.dev, Jonathan Calmels , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-security-module@vger.kernel.org, bpf@vger.kernel.org, apparmor@lists.ubuntu.com, keyrings@vger.kernel.org, selinux@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH v2 4/4] bpf,lsm: Allow editing capabilities in BPF-LSM hooks Date: Sun, 9 Jun 2024 03:43:37 -0700 Message-ID: <20240609104355.442002-5-jcalmels@3xx0.net> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20240609104355.442002-1-jcalmels@3xx0.net> References: <20240609104355.442002-1-jcalmels@3xx0.net> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net This patch allows modifying the various capabilities of the struct cred in BPF-LSM hooks. More specifically, the userns_create hook called prior to creating a new user namespace. With the introduction of userns capabilities, this effectively provides a simple way for LSMs to control the capabilities granted to a user namespace and all its descendants. Update the selftests accordingly by dropping CAP_SYS_ADMIN in namespaces and checking the resulting task's bounding set. Signed-off-by: Jonathan Calmels --- include/linux/lsm_hook_defs.h | 2 +- include/linux/security.h | 4 +- kernel/bpf/bpf_lsm.c | 55 +++++++++++++++++++ security/apparmor/lsm.c | 2 +- security/security.c | 6 +- security/selinux/hooks.c | 2 +- .../selftests/bpf/prog_tests/deny_namespace.c | 12 ++-- .../selftests/bpf/progs/test_deny_namespace.c | 7 ++- 8 files changed, 76 insertions(+), 14 deletions(-) diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index f804b76cde44..58d6d8f2511f 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -250,7 +250,7 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p, struct inode *inode) -LSM_HOOK(int, 0, userns_create, const struct cred *cred) +LSM_HOOK(int, 0, userns_create, struct cred *cred) LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag) LSM_HOOK(void, LSM_RET_VOID, ipc_getsecid, struct kern_ipc_perm *ipcp, u32 *secid) diff --git a/include/linux/security.h b/include/linux/security.h index 21cf70346b33..ffb1b0dd2aef 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -465,7 +465,7 @@ int security_task_kill(struct task_struct *p, struct kernel_siginfo *info, int security_task_prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5); void security_task_to_inode(struct task_struct *p, struct inode *inode); -int security_create_user_ns(const struct cred *cred); +int security_create_user_ns(struct cred *cred); int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag); void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid); int security_msg_msg_alloc(struct msg_msg *msg); @@ -1294,7 +1294,7 @@ static inline int security_task_prctl(int option, unsigned long arg2, static inline void security_task_to_inode(struct task_struct *p, struct inode *inode) { } -static inline int security_create_user_ns(const struct cred *cred) +static inline int security_create_user_ns(struct cred *cred) { return 0; } diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c index 68240c3c6e7d..6edba93ff883 100644 --- a/kernel/bpf/bpf_lsm.c +++ b/kernel/bpf/bpf_lsm.c @@ -382,10 +382,65 @@ bool bpf_lsm_is_trusted(const struct bpf_prog *prog) return !btf_id_set_contains(&untrusted_lsm_hooks, prog->aux->attach_btf_id); } +static int bpf_lsm_btf_struct_access(struct bpf_verifier_log *log, + const struct bpf_reg_state *reg, + int off, int size) +{ + const struct btf_type *cred; + const struct btf_type *t; + s32 type_id; + size_t end; + + type_id = btf_find_by_name_kind(reg->btf, "cred", BTF_KIND_STRUCT); + if (type_id < 0) + return -EINVAL; + + t = btf_type_by_id(reg->btf, reg->btf_id); + cred = btf_type_by_id(reg->btf, type_id); + if (t != cred) { + bpf_log(log, "only read is supported\n"); + return -EACCES; + } + + switch (off) { + case offsetof(struct cred, cap_inheritable): + end = offsetofend(struct cred, cap_inheritable); + break; + case offsetof(struct cred, cap_permitted): + end = offsetofend(struct cred, cap_permitted); + break; + case offsetof(struct cred, cap_effective): + end = offsetofend(struct cred, cap_effective); + break; + case offsetof(struct cred, cap_bset): + end = offsetofend(struct cred, cap_bset); + break; + case offsetof(struct cred, cap_ambient): + end = offsetofend(struct cred, cap_ambient); + break; + case offsetof(struct cred, cap_userns): + end = offsetofend(struct cred, cap_userns); + break; + default: + bpf_log(log, "no write support to cred at off %d\n", off); + return -EACCES; + } + + if (off + size > end) { + bpf_log(log, + "write access at off %d with size %d beyond the member of cred ended at %zu\n", + off, size, end); + return -EACCES; + } + + return 0; +} + const struct bpf_prog_ops lsm_prog_ops = { }; const struct bpf_verifier_ops lsm_verifier_ops = { .get_func_proto = bpf_lsm_func_proto, .is_valid_access = btf_ctx_access, + .btf_struct_access = bpf_lsm_btf_struct_access, }; diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c index 6239777090c4..310c9fa3d4b4 100644 --- a/security/apparmor/lsm.c +++ b/security/apparmor/lsm.c @@ -1036,7 +1036,7 @@ static int apparmor_task_kill(struct task_struct *target, struct kernel_siginfo return error; } -static int apparmor_userns_create(const struct cred *cred) +static int apparmor_userns_create(struct cred *cred) { struct aa_label *label; struct aa_profile *profile; diff --git a/security/security.c b/security/security.c index e5da848c50b9..83cf2025c58e 100644 --- a/security/security.c +++ b/security/security.c @@ -3558,14 +3558,14 @@ void security_task_to_inode(struct task_struct *p, struct inode *inode) } /** - * security_create_user_ns() - Check if creating a new userns is allowed + * security_create_user_ns() - Review permissions prior to userns creation * @cred: prepared creds * - * Check permission prior to creating a new user namespace. + * Check and/or modify permissions prior to creating a new user namespace. * * Return: Returns 0 if successful, otherwise < 0 error code. */ -int security_create_user_ns(const struct cred *cred) +int security_create_user_ns(struct cred *cred) { return call_int_hook(userns_create, cred); } diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 7eed331e90f0..28deb9510d8e 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -4263,7 +4263,7 @@ static void selinux_task_to_inode(struct task_struct *p, spin_unlock(&isec->lock); } -static int selinux_userns_create(const struct cred *cred) +static int selinux_userns_create(struct cred *cred) { u32 sid = current_sid(); diff --git a/tools/testing/selftests/bpf/prog_tests/deny_namespace.c b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c index 1bc6241b755b..1500578f9a30 100644 --- a/tools/testing/selftests/bpf/prog_tests/deny_namespace.c +++ b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c @@ -5,6 +5,8 @@ #include #include "cap_helpers.h" #include +#include +#include static int wait_for_pid(pid_t pid) { @@ -29,7 +31,7 @@ static int wait_for_pid(pid_t pid) * positive return value -> userns creation failed * 0 -> userns creation succeeded */ -static int create_user_ns(void) +static int create_user_ns(bool bpf) { pid_t pid; @@ -40,6 +42,8 @@ static int create_user_ns(void) if (pid == 0) { if (unshare(CLONE_NEWUSER)) _exit(EXIT_FAILURE); + if (bpf && prctl(PR_CAPBSET_READ, CAP_SYS_ADMIN)) + _exit(EXIT_FAILURE); _exit(EXIT_SUCCESS); } @@ -53,11 +57,11 @@ static void test_userns_create_bpf(void) cap_enable_effective(cap_mask, &old_caps); - ASSERT_OK(create_user_ns(), "priv new user ns"); + ASSERT_OK(create_user_ns(true), "priv new user ns"); cap_disable_effective(cap_mask, &old_caps); - ASSERT_EQ(create_user_ns(), EPERM, "unpriv new user ns"); + ASSERT_EQ(create_user_ns(true), EPERM, "unpriv new user ns"); if (cap_mask & old_caps) cap_enable_effective(cap_mask, NULL); @@ -70,7 +74,7 @@ static void test_unpriv_userns_create_no_bpf(void) cap_disable_effective(cap_mask, &old_caps); - ASSERT_OK(create_user_ns(), "no-bpf unpriv new user ns"); + ASSERT_OK(create_user_ns(false), "no-bpf unpriv new user ns"); if (cap_mask & old_caps) cap_enable_effective(cap_mask, NULL); diff --git a/tools/testing/selftests/bpf/progs/test_deny_namespace.c b/tools/testing/selftests/bpf/progs/test_deny_namespace.c index e96b901a733c..051906f80f4c 100644 --- a/tools/testing/selftests/bpf/progs/test_deny_namespace.c +++ b/tools/testing/selftests/bpf/progs/test_deny_namespace.c @@ -9,12 +9,13 @@ typedef struct { unsigned long long val; } kernel_cap_t; struct cred { kernel_cap_t cap_effective; + kernel_cap_t cap_userns; } __attribute__((preserve_access_index)); char _license[] SEC("license") = "GPL"; SEC("lsm.s/userns_create") -int BPF_PROG(test_userns_create, const struct cred *cred, int ret) +int BPF_PROG(test_userns_create, struct cred *cred, int ret) { kernel_cap_t caps = cred->cap_effective; __u64 cap_mask = 1ULL << CAP_SYS_ADMIN; @@ -23,8 +24,10 @@ int BPF_PROG(test_userns_create, const struct cred *cred, int ret) return 0; ret = -EPERM; - if (caps.val & cap_mask) + if (caps.val & cap_mask) { + cred->cap_userns.val &= ~cap_mask; return 0; + } return -EPERM; }