From patchwork Thu May 16 09:22:03 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jonathan Calmels
X-Patchwork-Id: 13665924
From: Jonathan Calmels
To: brauner@kernel.org, ebiederm@xmission.com, Luis Chamberlain, Kees Cook,
 Joel Granados, Serge Hallyn, Paul Moore, James Morris, David Howells,
 Jarkko Sakkinen
Cc: containers@lists.linux.dev, Jonathan Calmels,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-security-module@vger.kernel.org, keyrings@vger.kernel.org
Subject: [PATCH 1/3] capabilities: user namespace capabilities
Date: Thu, 16 May 2024 02:22:03 -0700
Message-ID: <20240516092213.6799-2-jcalmels@3xx0.net>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240516092213.6799-1-jcalmels@3xx0.net>
References: <20240516092213.6799-1-jcalmels@3xx0.net>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

Attackers often rely on user namespaces to get elevated (yet confined)
privileges in order to target specific subsystems (e.g. [1]). Distributions
have been adamant that they need a way to configure these; most of them
carry out-of-tree patches to do so, or simply refuse to enable user
namespaces. As a result, there have been multiple efforts over the years to
introduce various knobs to control and/or disable user namespaces
(e.g. [2][3][4]).

While we acknowledge that there are already ways to control the creation of
such namespaces (the most recent being an LSM hook), there are inherent
issues with these approaches. Preventing user namespace creation is not
fine-grained enough and, in some cases, is incompatible with various
userspace expectations (e.g. container runtimes, browser sandboxing,
service isolation).

This patch addresses these limitations by introducing an additional
capability set used to restrict the permissions granted when creating user
namespaces. This way, processes can apply the principle of least privilege
by configuring only the capabilities they need for their namespaces.

For compatibility reasons, processes always start with a full userns
capability set.

On namespace creation, the userns capability set (pU) is assigned to the
new effective (pE), permitted (pP) and bounding set (X) of the task:

    pU = pE = pP = X

The userns capability set obeys the invariant that no bit can ever be set
if it is not already part of the task’s bounding set. This ensures that no
namespace can ever gain more privileges than its predecessors.
Additionally, if a task is not privileged over CAP_SETPCAP, setting any bit
in the userns set requires its corresponding bit to be set in the permitted
set.

This effectively mimics the inheritable set rules and means that, by
default, only root in the initial user namespace can gain userns
capabilities:

    p’U = (pE & CAP_SETPCAP) ? X : (X & pP)

Note that since userns capabilities are strictly hierarchical, policies can
be enforced at various levels (e.g. init, pam_cap) and inherited by every
child namespace.

Here is a sample program that can be used to verify the functionality:

    /*
     * Test program that drops CAP_SYS_RAWIO from subsequent user namespaces.
     *
     * ./cap_userns_test unshare -r grep Cap /proc/self/status
     * CapInh: 0000000000000000
     * CapPrm: 000001fffffdffff
     * CapEff: 000001fffffdffff
     * CapBnd: 000001fffffdffff
     * CapAmb: 0000000000000000
     * CapUNs: 000001fffffdffff
     */
    #include <err.h>
    #include <linux/capability.h>
    #include <sys/prctl.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
            if (argc < 2)
                    errx(1, "usage: %s command [args...]", argv[0]);

            /* PR_CAP_USERNS* come from the patched <linux/prctl.h>. */
            if (prctl(PR_CAP_USERNS, PR_CAP_USERNS_LOWER, CAP_SYS_RAWIO, 0, 0) < 0)
                    err(1, "cannot drop userns cap");

            execvp(argv[1], argv + 1);
            err(1, "cannot exec");
    }

Link: https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html [1]
Link: https://lore.kernel.org/lkml/1453502345-30416-1-git-send-email-keescook@chromium.org [2]
Link: https://lore.kernel.org/lkml/20220815162028.926858-1-fred@cloudflare.com [3]
Link: https://lore.kernel.org/containers/168547265011.24337.4306067683997517082-0@git.sr.ht [4]
Signed-off-by: Jonathan Calmels
Reviewed-by: Serge Hallyn
---
 fs/proc/array.c              |  9 ++++++
 include/linux/cred.h         |  3 ++
 include/uapi/linux/prctl.h   |  7 +++++
 kernel/cred.c                |  3 ++
 kernel/umh.c                 | 16 ++++++++++
 kernel/user_namespace.c      | 12 +++-----
 security/commoncap.c         | 59 ++++++++++++++++++++++++++++++++++++
 security/keys/process_keys.c |  3 ++
 8 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 34a47fb0c57f..364e8bb19f9d 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -313,6 +313,9 @@ static inline void task_cap(struct seq_file *m, struct task_struct *p)
 	const struct cred *cred;
 	kernel_cap_t cap_inheritable, cap_permitted, cap_effective,
 			cap_bset, cap_ambient;
+#ifdef CONFIG_USER_NS
+	kernel_cap_t cap_userns;
+#endif
 
 	rcu_read_lock();
 	cred = __task_cred(p);
@@ -321,6 +324,9 @@ static inline void task_cap(struct seq_file *m, struct task_struct *p)
 	cap_effective = cred->cap_effective;
 	cap_bset = cred->cap_bset;
 	cap_ambient = cred->cap_ambient;
+#ifdef CONFIG_USER_NS
+	cap_userns = cred->cap_userns;
+#endif
 	rcu_read_unlock();
 
 	render_cap_t(m, "CapInh:\t", &cap_inheritable);
@@ -328,6 +334,9 @@ static inline void task_cap(struct seq_file *m, struct task_struct *p)
 	render_cap_t(m, "CapEff:\t", &cap_effective);
 	render_cap_t(m, "CapBnd:\t", &cap_bset);
 	render_cap_t(m, "CapAmb:\t", &cap_ambient);
+#ifdef CONFIG_USER_NS
+	render_cap_t(m, "CapUNs:\t", &cap_userns);
+#endif
 }
 
 static inline void task_seccomp(struct seq_file *m, struct task_struct *p)
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 2976f534a7a3..adab0031443e 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -124,6 +124,9 @@ struct cred {
 	kernel_cap_t cap_effective; /* caps we can actually use */
 	kernel_cap_t cap_bset; /* capability bounding set */
 	kernel_cap_t cap_ambient; /* Ambient capability set */
+#ifdef CONFIG_USER_NS
+	kernel_cap_t cap_userns; /* User namespace capability set */
+#endif
 #ifdef CONFIG_KEYS
 	unsigned char jit_keyring; /* default keyring to attach requested
 					 * keys to */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 370ed14b1ae0..e09475171f62 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -198,6 +198,13 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER 3
 # define PR_CAP_AMBIENT_CLEAR_ALL 4
 
+/* Control the userns capability set */
+#define PR_CAP_USERNS 48
+# define PR_CAP_USERNS_IS_SET 1
+# define PR_CAP_USERNS_RAISE 2
+# define PR_CAP_USERNS_LOWER 3
+# define PR_CAP_USERNS_CLEAR_ALL 4
+
 /* arm64 Scalable Vector Extension controls */
 /* Flag values must be kept in sync with ptrace NT_ARM_SVE interface */
 #define PR_SVE_SET_VL 50 /* set task vector length */
diff --git a/kernel/cred.c b/kernel/cred.c
index 075cfa7c896f..9912c6f3bc6b 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -56,6 +56,9 @@ struct cred init_cred = {
 	.cap_permitted = CAP_FULL_SET,
 	.cap_effective = CAP_FULL_SET,
 	.cap_bset = CAP_FULL_SET,
+#ifdef CONFIG_USER_NS
+	.cap_userns = CAP_FULL_SET,
+#endif
 	.user = INIT_USER,
 	.user_ns = &init_user_ns,
 	.group_info = &init_groups,
diff --git a/kernel/umh.c b/kernel/umh.c
index 1b13c5d34624..51f1e1d25d49 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -32,6 +32,9 @@
 
 #include
 
+#ifdef CONFIG_USER_NS
+static kernel_cap_t usermodehelper_userns = CAP_FULL_SET;
+#endif
 static kernel_cap_t usermodehelper_bset = CAP_FULL_SET;
 static kernel_cap_t usermodehelper_inheritable = CAP_FULL_SET;
 static DEFINE_SPINLOCK(umh_sysctl_lock);
@@ -94,6 +97,10 @@ static int call_usermodehelper_exec_async(void *data)
 	new->cap_bset = cap_intersect(usermodehelper_bset, new->cap_bset);
 	new->cap_inheritable = cap_intersect(usermodehelper_inheritable,
 					     new->cap_inheritable);
+#ifdef CONFIG_USER_NS
+	new->cap_userns = cap_intersect(usermodehelper_userns,
+					new->cap_userns);
+#endif
 	spin_unlock(&umh_sysctl_lock);
 
 	if (sub_info->init) {
@@ -560,6 +567,15 @@ static struct ctl_table usermodehelper_table[] = {
 		.mode = 0600,
 		.proc_handler = proc_cap_handler,
 	},
+#ifdef CONFIG_USER_NS
+	{
+		.procname = "userns",
+		.data = &usermodehelper_userns,
+		.maxlen = 2 * sizeof(unsigned long),
+		.mode = 0600,
+		.proc_handler = proc_cap_handler,
+	},
+#endif
 	{ }
 };
 
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 0b0b95418b16..7e624607330b 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -42,15 +42,13 @@ static void dec_user_namespaces(struct ucounts *ucounts)
 
 static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns)
 {
-	/* Start with the same capabilities as init but useless for doing
-	 * anything as the capabilities are bound to the new user namespace.
-	 */
-	cred->securebits = SECUREBITS_DEFAULT;
+	/* Start with the capabilities defined in the userns set. */
+	cred->cap_bset = cred->cap_userns;
+	cred->cap_permitted = cred->cap_userns;
+	cred->cap_effective = cred->cap_userns;
 	cred->cap_inheritable = CAP_EMPTY_SET;
-	cred->cap_permitted = CAP_FULL_SET;
-	cred->cap_effective = CAP_FULL_SET;
 	cred->cap_ambient = CAP_EMPTY_SET;
-	cred->cap_bset = CAP_FULL_SET;
+	cred->securebits = SECUREBITS_DEFAULT;
 #ifdef CONFIG_KEYS
 	key_put(cred->request_key_auth);
 	cred->request_key_auth = NULL;
diff --git a/security/commoncap.c b/security/commoncap.c
index 162d96b3a676..b3d3372bf910 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -228,6 +228,28 @@ static inline int cap_inh_is_capped(void)
 	return 1;
 }
 
+/*
+ * Determine whether a userns capability can be raised.
+ * Returns 1 if it can, 0 otherwise.
+ */
+#ifdef CONFIG_USER_NS
+static inline int cap_uns_is_raiseable(unsigned long cap)
+{
+	if (!!cap_raised(current_cred()->cap_userns, cap))
+		return 1;
+	/* a capability cannot be raised unless the current task has it in
+	 * its bounding set and, without CAP_SETPCAP, its permitted set.
+	 */
+	if (!cap_raised(current_cred()->cap_bset, cap))
+		return 0;
+	if (cap_capable(current_cred(), current_cred()->user_ns,
+			CAP_SETPCAP, CAP_OPT_NONE) != 0 &&
+	    !cap_raised(current_cred()->cap_permitted, cap))
+		return 0;
+	return 1;
+}
+#endif
+
 /**
  * cap_capset - Validate and apply proposed changes to current's capabilities
  * @new: The proposed new credentials; alterations should be made here
@@ -1382,6 +1404,43 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 			return commit_creds(new);
 		}
 
+#ifdef CONFIG_USER_NS
+	case PR_CAP_USERNS:
+		if (arg2 == PR_CAP_USERNS_CLEAR_ALL) {
+			if (arg3 | arg4 | arg5)
+				return -EINVAL;
+
+			new = prepare_creds();
+			if (!new)
+				return -ENOMEM;
+			cap_clear(new->cap_userns);
+			return commit_creds(new);
+		}
+
+		if (((!cap_valid(arg3)) | arg4 | arg5))
+			return -EINVAL;
+
+		if (arg2 == PR_CAP_USERNS_IS_SET) {
+			return !!cap_raised(current_cred()->cap_userns, arg3);
+		} else if (arg2 != PR_CAP_USERNS_RAISE &&
+			   arg2 != PR_CAP_USERNS_LOWER) {
+			return -EINVAL;
+		} else {
+			if (arg2 == PR_CAP_USERNS_RAISE &&
+			    !cap_uns_is_raiseable(arg3))
+				return -EPERM;
+
+			new = prepare_creds();
+			if (!new)
+				return -ENOMEM;
+			if (arg2 == PR_CAP_USERNS_RAISE)
+				cap_raise(new->cap_userns, arg3);
+			else
+				cap_lower(new->cap_userns, arg3);
+			return commit_creds(new);
+		}
+#endif
+
 	default:
 		/* No functionality available - continue with default */
 		return -ENOSYS;
diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index b5d5333ab330..e3670d815435 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -944,6 +944,9 @@ void key_change_session_keyring(struct callback_head *twork)
 	new->cap_effective = old->cap_effective;
 	new->cap_ambient = old->cap_ambient;
 	new->cap_bset = old->cap_bset;
+#ifdef CONFIG_USER_NS
+	new->cap_userns = old->cap_userns;
+#endif
 
 	new->jit_keyring = old->jit_keyring;
 	new->thread_keyring = key_get(old->thread_keyring);

From patchwork Thu May 16 09:22:04 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jonathan Calmels
X-Patchwork-Id: 13665925
From: Jonathan Calmels
To: brauner@kernel.org, ebiederm@xmission.com, Luis Chamberlain, Kees Cook,
 Joel Granados, Serge Hallyn, Paul Moore, James Morris, David Howells,
 Jarkko Sakkinen
Cc: containers@lists.linux.dev, Jonathan Calmels,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-security-module@vger.kernel.org, keyrings@vger.kernel.org
Subject: [PATCH 2/3] capabilities: add securebit for strict userns caps
Date: Thu, 16 May 2024 02:22:04 -0700
Message-ID: <20240516092213.6799-3-jcalmels@3xx0.net>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240516092213.6799-1-jcalmels@3xx0.net>
References: <20240516092213.6799-1-jcalmels@3xx0.net>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

This patch adds a new capability security bit designed to constrain a
task’s userns capability set to its bounding set. The reason for this is
twofold:

- This serves as a quick and easy way to lock down a set of capabilities
  for a task, thus ensuring that any namespace it creates will never be
  more privileged than the task itself.
- This helps userspace transition to more secure defaults by not requiring
  specific logic for the userns capability set, or libcap support.

Example:

    # capsh --secbits=$((1 << 8)) --drop=cap_sys_rawio -- \
            -c 'unshare -r grep Cap /proc/self/status'
    CapInh: 0000000000000000
    CapPrm: 000001fffffdffff
    CapEff: 000001fffffdffff
    CapBnd: 000001fffffdffff
    CapAmb: 0000000000000000
    CapUNs: 000001fffffdffff

Signed-off-by: Jonathan Calmels
Reviewed-by: Serge Hallyn
---
 include/linux/securebits.h      |  1 +
 include/uapi/linux/securebits.h | 11 ++++++++++-
 kernel/user_namespace.c         |  5 +++++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/securebits.h b/include/linux/securebits.h
index 656528673983..5f9d85cd69c3 100644
--- a/include/linux/securebits.h
+++ b/include/linux/securebits.h
@@ -5,4 +5,5 @@
 #include
 
 #define issecure(X) (issecure_mask(X) & current_cred_xxx(securebits))
+#define iscredsecure(cred, X) (issecure_mask(X) & cred->securebits)
 #endif /* !_LINUX_SECUREBITS_H */
diff --git a/include/uapi/linux/securebits.h b/include/uapi/linux/securebits.h
index d6d98877ff1a..2da3f4be4531 100644
--- a/include/uapi/linux/securebits.h
+++ b/include/uapi/linux/securebits.h
@@ -52,10 +52,19 @@
 #define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \
 			(issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED))
 
+/* When set, user namespace capabilities are restricted to their parent's bounding set. */
+#define SECURE_USERNS_STRICT_CAPS 8
+#define SECURE_USERNS_STRICT_CAPS_LOCKED 9 /* make bit-8 immutable */
+
+#define SECBIT_USERNS_STRICT_CAPS (issecure_mask(SECURE_USERNS_STRICT_CAPS))
+#define SECBIT_USERNS_STRICT_CAPS_LOCKED \
+			(issecure_mask(SECURE_USERNS_STRICT_CAPS_LOCKED))
+
 #define SECURE_ALL_BITS (issecure_mask(SECURE_NOROOT) | \
 				 issecure_mask(SECURE_NO_SETUID_FIXUP) | \
 				 issecure_mask(SECURE_KEEP_CAPS) | \
-				 issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE))
+				 issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE) | \
+				 issecure_mask(SECURE_USERNS_STRICT_CAPS))
 #define SECURE_ALL_LOCKS (SECURE_ALL_BITS << 1)
 
 #endif /* _UAPI_LINUX_SECUREBITS_H */
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 7e624607330b..53848e2b68cd 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -42,6 +43,10 @@ static void dec_user_namespaces(struct ucounts *ucounts)
 
 static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns)
 {
+	/* Limit userns capabilities to our parent's bounding set. */
+	if (iscredsecure(cred, SECURE_USERNS_STRICT_CAPS))
+		cred->cap_userns = cap_intersect(cred->cap_userns, cred->cap_bset);
+
 	/* Start with the capabilities defined in the userns set. */
 	cred->cap_bset = cred->cap_userns;
 	cred->cap_permitted = cred->cap_userns;

From patchwork Thu May 16 09:22:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jonathan Calmels
X-Patchwork-Id: 13665926
From: Jonathan Calmels
To: brauner@kernel.org, ebiederm@xmission.com, Luis Chamberlain, Kees Cook,
 Joel Granados, Serge Hallyn, Paul Moore, James Morris, David Howells,
 Jarkko Sakkinen
Cc: containers@lists.linux.dev, Jonathan Calmels,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-security-module@vger.kernel.org, keyrings@vger.kernel.org
Subject: [PATCH 3/3] capabilities: add cap userns sysctl mask
Date: Thu, 16 May 2024 02:22:05 -0700
Message-ID: <20240516092213.6799-4-jcalmels@3xx0.net>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240516092213.6799-1-jcalmels@3xx0.net>
References: <20240516092213.6799-1-jcalmels@3xx0.net>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

This patch adds a new system-wide userns capability mask designed to mask
off capabilities in user namespaces.

This mask is controlled through a sysctl and can be set early in the boot
process or on the kernel command line to exclude known capabilities from
ever being gained in namespaces. Once set, it can be further restricted to
exert dynamic policies on the system (e.g. ward off a potential exploit).

Changing this mask requires privileges over CAP_SYS_ADMIN and CAP_SETPCAP
in the initial user namespace.

Example:

    # sysctl -qw kernel.cap_userns_mask=0x1fffffdffff && \
      unshare -r grep Cap /proc/self/status
    CapInh: 0000000000000000
    CapPrm: 000001fffffdffff
    CapEff: 000001fffffdffff
    CapBnd: 000001fffffdffff
    CapAmb: 0000000000000000
    CapUNs: 000001fffffdffff

Signed-off-by: Jonathan Calmels
Reviewed-by: Serge Hallyn
---
 include/linux/user_namespace.h |  7 ++++
 kernel/sysctl.c                | 10 ++++++
 kernel/user_namespace.c        | 66 ++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 6030a8235617..e3478bd54ee5 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_USER_NAMESPACE_H
 #define _LINUX_USER_NAMESPACE_H
 
+#include
 #include
 #include
 #include
@@ -14,6 +15,12 @@
 #define UID_GID_MAP_MAX_BASE_EXTENTS 5
 #define UID_GID_MAP_MAX_EXTENTS 340
 
+#ifdef CONFIG_SYSCTL
+extern kernel_cap_t cap_userns_mask;
+int proc_cap_userns_handler(struct ctl_table *table, int write,
+			    void *buffer, size_t *lenp, loff_t *ppos);
+#endif
+
 struct uid_gid_extent {
 	u32 first;
 	u32 lower_first;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 81cc974913bb..1546eebd6aea 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -62,6 +62,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include "../lib/kstrtox.h"
@@ -1846,6 +1847,15 @@ static struct ctl_table kern_table[] = {
 		.mode = 0444,
 		.proc_handler = proc_dointvec,
 	},
+#ifdef CONFIG_USER_NS
+	{
+		.procname = "cap_userns_mask",
+		.data = &cap_userns_mask,
+		.maxlen = sizeof(kernel_cap_t),
+		.mode = 0644,
+		.proc_handler = proc_cap_userns_handler,
+	},
+#endif
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
 	{
 		.procname = "unknown_nmi_panic",
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 53848e2b68cd..e0cf606e9140 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -26,6 +26,66 @@
 static struct kmem_cache *user_ns_cachep __ro_after_init;
 static DEFINE_MUTEX(userns_state_mutex);
 
+#ifdef CONFIG_SYSCTL
+static DEFINE_SPINLOCK(cap_userns_lock);
+kernel_cap_t cap_userns_mask = CAP_FULL_SET;
+
+int proc_cap_userns_handler(struct ctl_table *table, int write,
+			    void *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct ctl_table t;
+	unsigned long mask_array[2];
+	kernel_cap_t new_mask, *mask;
+	int err;
+
+	if (write && (!capable(CAP_SETPCAP) ||
+		      !capable(CAP_SYS_ADMIN)))
+		return -EPERM;
+
+	/*
+	 * convert from the global kernel_cap_t to the ulong array to print to
+	 * userspace if this is a read.
+	 *
+	 * capabilities are exposed as one 64-bit value or two 32-bit values
+	 * depending on the architecture
+	 */
+	mask = table->data;
+	spin_lock(&cap_userns_lock);
+	mask_array[0] = (unsigned long) mask->val;
+#if BITS_PER_LONG != 64
+	mask_array[1] = mask->val >> BITS_PER_LONG;
+#endif
+	spin_unlock(&cap_userns_lock);
+
+	t = *table;
+	t.data = &mask_array;
+
+	/*
+	 * actually read or write an array of ulongs from userspace. Remember
+	 * these are least significant bits first
+	 */
+	err = proc_doulongvec_minmax(&t, write, buffer, lenp, ppos);
+	if (err < 0)
+		return err;
+
+	new_mask.val = mask_array[0];
+#if BITS_PER_LONG != 64
+	new_mask.val += (u64)mask_array[1] << BITS_PER_LONG;
+#endif
+
+	/*
+	 * Drop everything not in the new_mask (but don't add things)
+	 */
+	if (write) {
+		spin_lock(&cap_userns_lock);
+		*mask = cap_intersect(*mask, new_mask);
+		spin_unlock(&cap_userns_lock);
+	}
+
+	return 0;
+}
+#endif
+
 static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
 				struct uid_gid_map *map);
@@ -46,6 +106,12 @@ static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns)
 	/* Limit userns capabilities to our parent's bounding set. */
 	if (iscredsecure(cred, SECURE_USERNS_STRICT_CAPS))
 		cred->cap_userns = cap_intersect(cred->cap_userns, cred->cap_bset);
+#ifdef CONFIG_SYSCTL
+	/* Mask off userns capabilities that are not permitted by the system-wide mask. */
+	spin_lock(&cap_userns_lock);
+	cred->cap_userns = cap_intersect(cred->cap_userns, cap_userns_mask);
+	spin_unlock(&cap_userns_lock);
+#endif
 
 	/* Start with the capabilities defined in the userns set. */
 	cred->cap_bset = cred->cap_userns;