From: Mahesh Bandewar
To: LKML, James Morris
Cc: Netdev, Kernel-hardening, Linux API, Linux Security, Serge Hallyn, Michael Kerrisk, Kees Cook, "Eric W. Biederman", Eric Dumazet, David Miller, Mahesh Bandewar
Date: Tue, 2 Jan 2018 23:26:52 -0800
Message-Id: <20180103072652.161912-1-mahesh@bandewar.net>
Subject: [kernel-hardening] [PATCHv4 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

From: Mahesh Bandewar

Add a sysctl variable, kernel.controlled_userns_caps_whitelist.

The capability mask is stored in the kernel as a kernel_cap_t (an array
of u32), so this sysctl takes its input as comma-separated hex u32
words. One could argue that, for simplicity, the sysctl should operate
on string inputs instead. However, the value is not expected to change
often during the life of a kernel boot, so it makes more sense to reuse
the widely available bitmap API than to add new string-manipulation
code just to make the interface simpler.
The default value of kernel.controlled_userns_caps_whitelist is
CAP_FULL_SET. This means no capability is controlled by default, which
preserves the existing behavior of user namespaces. An administrator
has to modify this sysctl to control any capability. For example, to
control CAP_NET_RAW the mask needs to be changed as follows:

  # sysctl -q kernel.controlled_userns_caps_whitelist
  kernel.controlled_userns_caps_whitelist = 1f,ffffffff
  # sysctl -w kernel.controlled_userns_caps_whitelist=1f,ffffdfff
  kernel.controlled_userns_caps_whitelist = 1f,ffffdfff

For the bit-to-mask conversion, see include/uapi/linux/capability.h.
CAP_NET_RAW is capability 13, so controlling it means clearing the
0x2000 bit of the low word (ffffffff -> ffffdfff). Any capability that
is not part of this mask is controlled and is not granted to processes
in a controlled user namespace. In the example above, CAP_NET_RAW will
not be available to controlled user namespaces.

Acked-by: Serge Hallyn
Signed-off-by: Mahesh Bandewar
---
v4: Commit message changes.
v3: Added a couple of comments, as requested by Serge Hallyn.
v2: Rebase.
v1: Initial submission.

 Documentation/sysctl/kernel.txt | 21 ++++++++++++++++++++
 include/linux/capability.h      |  3 +++
 kernel/capability.c             | 47 +++++++++++++++++++++++++++++++++++++++++
 kernel/sysctl.c                 |  5 +++++
 4 files changed, 76 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 694968c7523c..6aa1e087afee 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -25,6 +25,7 @@ show up in /proc/sys/kernel:
 - bootloader_version [ X86 only ]
 - callhome [ S390 only ]
 - cap_last_cap
+- controlled_userns_caps_whitelist
 - core_pattern
 - core_pipe_limit
 - core_uses_pid
@@ -187,6 +188,26 @@ CAP_LAST_CAP from the kernel.
 
 ==============================================================
 
+controlled_userns_caps_whitelist
+
+Capability mask that is whitelisted for "controlled" user namespaces.
+Any capability that is missing from this mask is not granted to any
+process attached to a controlled user namespace. For example, if
+CAP_NET_RAW is not part of this mask, processes running inside any
+controlled user namespace cannot perform actions that need
+CAP_NET_RAW. However, processes attached to a parent user-namespace
+hierarchy that is *not* controlled, and that have CAP_NET_RAW, can
+continue performing those actions. User namespaces are marked
+"controlled" at the time of their creation, based on the
+capabilities of the creator: a process that does not have
+CAP_SYS_ADMIN creates controlled user namespaces.
+
+The value is expressed as two comma-separated hex u32 words. This
+sysctl is available in the init namespace, and users with
+CAP_SYS_ADMIN in the init namespace are allowed to change it.
+
+==============================================================
+
 core_pattern:
 
 core_pattern is used to specify a core dumpfile pattern name.
diff --git a/include/linux/capability.h b/include/linux/capability.h
index f640dcbc880c..7d79a4689625 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -14,6 +14,7 @@
 #define _LINUX_CAPABILITY_H
 
 #include 
+#include 
 
 #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3
 
@@ -248,6 +249,8 @@ extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
 
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
+int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
+				 void __user *buff, size_t *lenp, loff_t *ppos);
 
 extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size);
 
diff --git a/kernel/capability.c b/kernel/capability.c
index 1e1c0236f55b..4a859b7d4902 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -29,6 +29,8 @@ EXPORT_SYMBOL(__cap_empty_set);
 
 int file_caps_enabled = 1;
 
+kernel_cap_t controlled_userns_caps_whitelist = CAP_FULL_SET;
+
 static int __init file_caps_disable(char *str)
 {
 	file_caps_enabled = 0;
@@ -507,3 +509,48 @@ bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
 	rcu_read_unlock();
 	return (ret == 0);
 }
+
+/* Controlled-userns capabilities routines */
+#ifdef CONFIG_SYSCTL
+int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
+				 void __user *buff, size_t *lenp, loff_t *ppos)
+{
+	DECLARE_BITMAP(caps_bitmap, CAP_LAST_CAP);
+	struct ctl_table caps_table;
+	char tbuf[NAME_MAX];
+	int ret;
+
+	ret = bitmap_from_u32array(caps_bitmap, CAP_LAST_CAP,
+				   controlled_userns_caps_whitelist.cap,
+				   _KERNEL_CAPABILITY_U32S);
+	if (ret != CAP_LAST_CAP)
+		return -EINVAL;
+
+	scnprintf(tbuf, NAME_MAX, "%*pb", CAP_LAST_CAP, caps_bitmap);
+
+	caps_table.data = tbuf;
+	caps_table.maxlen = NAME_MAX;
+	caps_table.mode = table->mode;
+	ret = proc_dostring(&caps_table, write, buff, lenp, ppos);
+	if (ret)
+		return ret;
+	if (write) {
+		kernel_cap_t tmp;
+
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+
+		ret = bitmap_parse_user(buff, *lenp, caps_bitmap, CAP_LAST_CAP);
+		if (ret)
+			return ret;
+
+		ret = bitmap_to_u32array(tmp.cap, _KERNEL_CAPABILITY_U32S,
+					 caps_bitmap, CAP_LAST_CAP);
+		if (ret != CAP_LAST_CAP)
+			return -EINVAL;
+
+		controlled_userns_caps_whitelist = tmp;
+	}
+	return 0;
+}
+#endif /* CONFIG_SYSCTL */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 557d46728577..759b6c286806 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1217,6 +1217,11 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &one,
 	},
 #endif
+	{
+		.procname	= "controlled_userns_caps_whitelist",
+		.mode		= 0644,
+		.proc_handler	= proc_douserns_caps_whitelist,
+	},
 	{ }
 };