
Integrating Namespaces and Cgroups for Enhanced Resource Management

Message ID: tencent_BFC5A388F2922E5FB6F3FE2E3A3662561809@qq.com (mailing list archive)
State: New
Series: Integrating Namespaces and Cgroups for Enhanced Resource Management

Commit Message

StanPlatinum Aug. 24, 2024, 1:17 p.m. UTC
Dear Community,

We are a system security research team from multiple organizations.

We recently found vulnerabilities in container runtimes (Docker, Kubernetes, Podman, etc.): the 'rolling updates' function is vulnerable to DoS attacks. By exploiting this problem, an attacker can bypass cgroup limits and exhaust the host's memory.

The 'IPC=shareable' option enables multiple containers to share the same IPC namespace. One of these containers (container A) can constantly create IPC resources, within its cgroup's memory restriction, to communicate with another container (container B) in the same IPC namespace. The memory usage of these IPC resources is counted against the cgroup of container A, and this count is cleared when container A exits. However, the IPC resources themselves are not released when container A exits; they are destroyed only when the IPC namespace itself ends. This means these IPC resources continue to occupy memory as long as container B does not exit. An attacker can therefore start two containers in 'IPC=shareable' mode and repeatedly restart one of them to escape the cgroup restriction. The restarted container can keep creating IPC resources until it exhausts the host's memory.
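
For illustration, a minimal userspace sketch of the allocation primitive follows. This is not our actual PoC; the segment size and the lack of cleanup are deliberately simplistic, and a real system may hit SHMMNI or other namespace limits before the memory limit.

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
	for (int i = 0; ; i++) {
		/* A fresh 1 MiB SysV segment each iteration, charged to
		 * this process's (container A's) memory cgroup. */
		int id = shmget(IPC_PRIVATE, 1 << 20, IPC_CREAT | 0600);
		if (id < 0) {
			perror("shmget");	/* e.g. cgroup/namespace limit */
			break;
		}
		/* No shmctl(id, IPC_RMID, NULL): the segment lives on in
		 * the shared IPC namespace after this process and its
		 * container exit, no longer charged to any live cgroup. */
		printf("created segment %d (id=%d)\n", i, id);
	}
	return 0;
}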

The same thing happens with 'rolling updates' in Kubernetes. When the 'spec.containers[*].image' field is modified, Kubernetes updates a pod by creating a container with the new image to replace the old one. The replacement resets the new container's memory counts in the cgroup but does not release the IPC resources allocated by the old container. An attacker can repeat the rolling update and IPC resource allocation to bypass cgroup restrictions, eventually exhausting the host's memory. Our research shows that popular container tools, including Docker, Podman, and Kubernetes, are all affected by this namespace-cgroup desynchronization vulnerability.

We reported these issues to Podman/Kubernetes/Docker. In those discussions we learned that this kind of vulnerability might be rooted in the Linux kernel. The main insight is that the isolation offered by containers (leveraging Linux namespaces and cgroups) is achieved in a highly coordinated way. This foundation for container protection, however, has been shaken by the evolution of computing paradigms, particularly the emergence of serverless computing with its strong demand for resource sharing across namespaces. Such sharing weakens the container isolation model. We conducted a thorough study of these risks, aiming to identify their root causes and understand their implications.


Summary of Namespace-Cgroup Desynchronization Vulnerabilities

While an individual container keeps its namespaces and cgroup in sync, shared namespaces disrupt this balance. Terminating a container does not dissolve a shared namespace, leaving the allocated resources accessible to other containers while no longer tracked by the destroyed cgroup.


Our Approach

This patch therefore addresses these namespace-cgroup desynchronization vulnerabilities by merging namespace-based resource management with cgroup-enforced restrictions. The primary objective is to eliminate the desynchronization between namespaces and cgroups, particularly in multi-container environments where shared namespaces can evade cgroup constraints.


Key Highlights

Unified Resource Tracking: Introduces a unified 'balloon' cgroup to oversee resources that have escaped cgroup limits, ensuring consistent accounting across shared namespaces.

Enhanced Namespace-Cgroup Synchronization: Integrates namespace chains into the cgroup structure, tagging shared resources such as memory and IPC objects so that resources escaping cgroup restrictions can be detected and tracked.


Patch Implementation Details

1) Namespace-Cgroup Linkage and Resource Tagging: Extends the cgroup structure to track the namespaces associated with the processes in a cgroup, ensuring resource accountability, and assigns cgroup tags to virtual resources, enabling precise tracking and management.

To mitigate the vulnerabilities, this patch bridges namespaces and cgroups by placing a cgroup tag on all the resources mentioned above. The tag identifies which cgroup is responsible for charging and restricting a resource, and it can later be used to identify residual resources that belong to an exiting cgroup.
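
Concretely, each taggable kernel object carries a back-pointer to its owning cgroup. The kern_ipc_perm hunk in the diff below is representative; here is the idea in isolation (a sketch, existing fields elided):

struct kern_ipc_perm {
	struct cgroup *cgroup;	/* cgroup charged for this object;
				 * re-pointed at the balloon cgroup when
				 * the owner cgroup is destroyed */
	/* ... existing fields unchanged ... */
};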

2) Balloon Cgroup Setup: Establishes a universal cgroup for residual resources, with configurable limits to prevent system-wide resource overflow.

To re-manage these residual resources, this patch creates a balloon cgroup dedicated to limiting them. First, resources requested by each container are tagged with that container's cgroup. When a cgroup is about to be destroyed, the patch reclaims its resources and transfers them, together with their accounting records, to the balloon cgroup. Later, when a container reuses these resources, the residual resources become owned by the container in need: they are re-tagged, governed by that container's cgroup, and removed from the balloon cgroup.
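
The handover on cgroup removal is condensed below from cgroup_rmdir(), memcg_ipc_check(), and the ipc_{sem,shm,msq}_move_check() helpers in the diff; locking and error handling are omitted:

cgroup_rmdir(cgrp)
  memcg_ipc_check(cgrp)                   /* walk cgrp->ipc_list */
    for each IPC object tagged with cgrp:
      while (check_ballon_limits(size))   /* balloon short on room? */
        free_highest_volume();            /* evict largest namespace */
      perm->cgroup = ballon_cgroup;       /* re-tag as residual */
      /* move the memcg charges from cgrp to the balloon cgroup */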

3) Resource Reallocation: Transfers residual resources back to active containers on reuse, keeping them under cgroup governance.

Resources that are shared between containers through a shared namespace are transferred to the balloon cgroup and tagged as residual when the owning container exits. When these residual resources are reused, they are transferred to the cgroup of the new user.
The balloon cgroup keeps a dedicated per-namespace record that tracks the residual resources associated with each namespace. When the balloon cgroup reaches full capacity, all residual resources of the namespace holding the highest residual volume are released.
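
The eviction policy itself is small; condensed from find_higheset_ipc_ns() and free_highest_volume() in the diff (the misspelled identifiers are the patch's own, and find_highest() here is a renamed condensation), it amounts to:

static struct ipc_ns_list *find_highest(struct ipc_ns_list *head)
{
	struct ipc_ns_list *pos, *max_pos = NULL;
	unsigned long max_volume = 0;

	list_for_each_entry(pos, &head->list, list) {
		if (pos->residual_volume > max_volume) {
			max_volume = pos->residual_volume;
			max_pos = pos;
		}
	}
	/* free_highest_volume() then frees every residual IPC object in
	 * max_pos->ipc_ns and drops the node from the balloon's list. */
	return max_pos;
}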


This patch not only strengthens the Linux kernel's resource management framework but also enhances security and efficiency in containerized environments. We hope the community will take note of this issue and collaborate on enhancing the security of cgroups.

We hope we can discuss this further.

Best regards!



Signed-off-by: StanPlatinum <liuwj0129@foxmail.com>
---
 drivers/base/core.c             |   3 +
 include/linux/cgroup-defs.h     |  36 +++++
 include/linux/cgroup.h          |   2 +
 include/linux/inetdevice.h      |   6 +-
 include/linux/ipc.h             |   3 +
 include/linux/ipc_namespace.h   |  15 ++
 include/linux/memcontrol.h      |  22 ++-
 include/linux/netdevice.h       |   4 +
 include/linux/pid_namespace.h   |   2 +
 include/linux/sem.h             |   3 +-
 include/net/neighbour.h         |   2 +
 include/net/net_namespace.h     |   1 +
 ipc/msg.c                       | 153 +++++++++++++++++-
 ipc/msgutil.c                   |  21 +++
 ipc/namespace.c                 | 106 +++++++++++-
 ipc/sem.c                       | 153 +++++++++++++++++-
 ipc/shm.c                       | 175 +++++++++++++++++++-
 ipc/util.c                      |   2 +-
 ipc/util.h                      |   2 +-
 kernel/cgroup/cgroup-internal.h |   2 +
 kernel/cgroup/cgroup-v1.c       |   2 +-
 kernel/cgroup/cgroup.c          | 276 +++++++++++++++++++++++++++++++-
 kernel/cgroup/pids.c            |   3 +-
 kernel/exit.c                   |  15 ++
 kernel/pid_namespace.c          |  95 +++++++++++
 kernel/sysctl.c                 |   7 +
 mm/memcontrol.c                 | 216 ++++++++++++++++++++++++-
 mm/shmem.c                      |  70 +++++++-
 net/core/dev.c                  | 241 +++++++++++++++++++++++++++-
 net/core/neighbour.c            |  33 ++++
 net/core/net_namespace.c        |  75 +++++++++
 net/ipv4/devinet.c              |  35 ++++
 net/ipv6/addrconf.c             |  58 ++++++-
 33 files changed, 1814 insertions(+), 25 deletions(-)

Comments

Tejun Heo Aug. 26, 2024, 6:29 p.m. UTC | #1
Hello,

On Sat, Aug 24, 2024 at 09:17:11PM +0800, StanPlatinum wrote:
> This patch not only strengthens the Linux kernel's resource management
> framework but also enhances security and efficiency in containerized
> environments. We hope the community will take note of this issue and
> collaborate on enhancing the security of cgroups.

This might be a great idea but it won't get any traction if you post it like
this. If you actually want this to happen, it would be a good idea to look
at how linux kernel development is generally done and try to follow the
conventions.

Thanks.

Patch

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 6ed21587b..1d0b619f4 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -31,6 +31,9 @@ 
 #include <linux/swiotlb.h>
 #include <linux/sysfs.h>
 #include <linux/dma-map-ops.h> /* for dma_default_coherent */
+#include <linux/kobject.h>
+#include <linux/memcontrol.h>
+#include <linux/sysfs.h>
 
 #include "base.h"
 #include "physical_location.h"
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 8a0d5466c..193129e92 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -374,11 +374,47 @@  struct cgroup_freezer_state {
 	 */
 	int nr_frozen_tasks;
 };
+struct ipc_ns_list {
+	struct list_head list;
+	struct ipc_namespace *ipc_ns;
+	unsigned int residual_volume;
+};
+
+struct net_ns_list {
+	struct list_head list;
+	struct net *net_ns;
+	unsigned int residual_volume;
+};
+
+struct pid_ns_list {
+	struct list_head list;
+	struct pid_namespace *pid_ns;
+};
 
+extern struct ipc_ns_list *ipc_ns_list_init(void);
+extern int ipc_ns_list_add_tail(struct ipc_ns_list *head, struct ipc_namespace *ipc_ns, long mem_delta);
+extern int ipc_ns_list_free(struct ipc_ns_list *head);
+extern int ipc_ns_list_delete_elem(struct ipc_ns_list *head, struct ipc_namespace *ipc_ns);
+extern int ipc_ns_list_insert(struct ipc_ns_list *head, struct ipc_namespace *ipc_ns, long mem_delta);
+
+extern struct net_ns_list *net_ns_list_init(void);
+extern int net_ns_list_add_tail(struct net_ns_list *head, struct net *net_ns, long mem_delta);
+extern int net_ns_list_free(struct net_ns_list *head);
+extern int net_ns_list_delete_elem(struct net_ns_list *head, struct net *net_ns);
+extern int net_ns_list_insert(struct net_ns_list *head, struct net *net_ns, long mem_delta);
+
+extern struct pid_ns_list *pid_ns_list_init(void);
+extern int pid_ns_list_add_tail(struct pid_ns_list *head, struct pid_namespace *pid_ns);
+extern int pid_ns_list_free(struct pid_ns_list *head);
+extern int pid_ns_list_delete_elem(struct pid_ns_list *head, struct pid_namespace *pid_ns);
+extern int pid_ns_list_insert(struct pid_ns_list *head, struct pid_namespace *pid_ns);
 struct cgroup {
 	/* self css with NULL ->ss, points back to this cgroup */
 	struct cgroup_subsys_state self;
 
+	struct ipc_ns_list *ipc_list;   /*all ipc_ns the processes in this cgroup belong to */
+	struct net_ns_list *net_list;
+	struct pid_ns_list *pid_list;
 	unsigned long flags;		/* "unsigned long" so bitops work */
 
 	/*
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 3410aecff..00262c054 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -71,6 +71,7 @@  struct css_task_iter {
 extern struct file_system_type cgroup_fs_type;
 extern struct cgroup_root cgrp_dfl_root;
 extern struct css_set init_css_set;
+extern struct cgroup *ballon_cgroup;
 
 #define SUBSYS(_x) extern struct cgroup_subsys _x ## _cgrp_subsys;
 #include <linux/cgroup_subsys.h>
@@ -858,3 +859,4 @@  static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
 #endif /* CONFIG_CGROUP_BPF */
 
 #endif /* _LINUX_CGROUP_H */
+extern unsigned long free_highest_volume(void);
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index ddb27fc0e..e8d19ac6a 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -298,6 +298,8 @@  static __inline__ int inet_mask_len(__be32 mask)
 		return 0;
 	return 32 - ffz(~hmask);
 }
-
-
+extern unsigned long devinet_size(struct ipv4_devconf *ipv4_p);
+extern int devinet_move(struct ipv4_devconf *ipv4_p, struct mem_cgroup *from, struct mem_cgroup *to);
+extern int ipv6_dev_move(struct inet6_dev *in_dev, struct mem_cgroup  *from, struct mem_cgroup  *to);
+extern unsigned long ipv6_size(struct inet6_dev *in_dev);
 #endif /* _LINUX_INETDEVICE_H */
diff --git a/include/linux/ipc.h b/include/linux/ipc.h
index e1c9eea60..16056f1e8 100644
--- a/include/linux/ipc.h
+++ b/include/linux/ipc.h
@@ -10,6 +10,8 @@ 
 
 /* used by in-kernel data structures */
 struct kern_ipc_perm {
+	struct cgroup *cgroup;
+	// unsigned int residual_time;
 	spinlock_t	lock;
 	bool		deleted;
 	int		id;
@@ -25,6 +27,7 @@  struct kern_ipc_perm {
 	struct rhash_head khtnode;
 
 	struct rcu_head rcu;
+
 	refcount_t refcount;
 } ____cacheline_aligned_in_smp __randomize_layout;
 
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index e8240cf26..98ba3b463 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -11,7 +11,9 @@ 
 #include <linux/refcount.h>
 #include <linux/rhashtable-types.h>
 #include <linux/sysctl.h>
+#include <linux/msg.h>
 #include <linux/percpu_counter.h>
+#include <linux/cgroup-defs.h>
 
 struct user_namespace;
 
@@ -211,3 +213,16 @@  static inline bool setup_ipc_sysctls(struct ipc_namespace *ns)
 
 #endif /* CONFIG_SYSVIPC_SYSCTL */
 #endif
+void free_recycle_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
+	       void (*free)(struct ipc_namespace *, struct kern_ipc_perm *));
+extern int ipc_sem_move_check(struct ipc_namespace *ns, struct cgroup *cgrp);
+extern void ipc_sem_free(struct ipc_namespace *ns);
+
+extern int ipc_shm_move_check(struct ipc_namespace *ns, struct cgroup *cgrp);
+extern int ipc_shm_folio_move(struct inode *inode, struct mem_cgroup *from, struct mem_cgroup *to);
+extern unsigned long shm_total_pages(struct inode *inode);
+extern void ipc_shm_free(struct ipc_namespace *ns);
+
+extern int ipc_msq_move_check(struct ipc_namespace *ns, struct cgroup *cgrp);
+extern int ipc_msg_move(struct msg_msg *msg, struct mem_cgroup *from, struct mem_cgroup *to);
+extern void ipc_msq_free(struct ipc_namespace *ns);
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 85dc9b88e..57b1a3c05 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -18,9 +18,11 @@ 
 #include <linux/vmpressure.h>
 #include <linux/eventfd.h>
 #include <linux/mm.h>
+#include <linux/device.h>
 #include <linux/vmstat.h>
 #include <linux/writeback.h>
 #include <linux/page-flags.h>
+#include <linux/cgroup-defs.h>
 
 struct mem_cgroup;
 struct obj_cgroup;
@@ -768,8 +770,8 @@  void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
 }
 #endif
 
-static inline
-struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
+static struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css)
+{
 	return css ? container_of(css, struct mem_cgroup, css) : NULL;
 }
 
@@ -1878,4 +1880,20 @@  static inline void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg,
 }
 #endif
 
+extern int _obj_cgroup_move(struct kmem_cache *s, struct slab *slab,
+					void **p, int objects, struct obj_cgroup *to);
+extern int kmem_cache_move(struct kmem_cache *s, void *addr,
+					 int objects, struct mem_cgroup *to);
+extern int obj_cgroup_move(void *addr, struct mem_cgroup *from, struct mem_cgroup *to);
+extern int kmem_folio_move(struct folio *folio,  struct mem_cgroup *from, struct mem_cgroup *to);
+
+extern struct mem_cgroup *get_cg_from_cgrp(struct cgroup *cgrp, const char *subsys_name);
+
+extern int memory_max_set(struct mem_cgroup *memcg, unsigned long max_pages);
+#define RECYCLE_MEM_MAX (1 << 18)
+extern long recycle_max_limit;
+extern int page_set_shadow_max(struct mem_cgroup *from, struct mem_cgroup *to);
+extern int kmem_cgroup_move(void *addr, struct mem_cgroup *from, struct mem_cgroup *to);
+extern unsigned long caculate_space(void *addr);
+extern int check_ballon_limits(unsigned long mem_delta);
 #endif /* _LINUX_MEMCONTROL_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e6e02184c..75321b3e6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2018,6 +2018,9 @@  struct net_device {
 	char			name[IFNAMSIZ];
 	struct netdev_name_node	*name_node;
 	struct dev_ifalias	__rcu *ifalias;
+
+	struct cgroup *cgroup;
+	// unsigned int residual_time;
 	/*
 	 *	I/O specific fields
 	 *	FIXME: Merge these and struct ifmap into one
@@ -5188,3 +5191,4 @@  extern struct net_device *blackhole_netdev;
 		atomic_long_add((VAL), &(DEV)->stats.__##FIELD)
 
 #endif	/* _LINUX_NETDEVICE_H */
+extern int net_device_move_check(struct net *ns, struct cgroup *cgrp);
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 07481bb87..f630f304d 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -92,3 +92,5 @@  static inline bool task_is_in_init_pid_ns(struct task_struct *tsk)
 }
 
 #endif /* _LINUX_PID_NS_H */
+extern void check_pid_zombie(struct pid_namespace *pid_ns, struct cgroup *cgrp);
+extern int zombie_cgrp_insert_ns(struct task_struct *task, struct cgroup *cgrp);
\ No newline at end of file
diff --git a/include/linux/sem.h b/include/linux/sem.h
index 5608a500c..9646b89e4 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -6,7 +6,7 @@ 
 
 struct task_struct;
 struct sem_undo_list;
-
+struct sem_array;
 #ifdef CONFIG_SYSVIPC
 
 struct sysv_sem {
@@ -16,6 +16,7 @@  struct sysv_sem {
 extern int copy_semundo(unsigned long clone_flags, struct task_struct *tsk);
 extern void exit_sem(struct task_struct *tsk);
 
+extern struct sem_array *find_sem_from_perm(struct kern_ipc_perm *perm);
 #else
 
 struct sysv_sem {
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 2f2a6023f..b5a2c525d 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -605,3 +605,5 @@  static inline void neigh_update_is_router(struct neighbour *neigh, u32 flags,
 	}
 }
 #endif
+extern int neigh_move(struct neigh_parms *neigh_p, struct mem_cgroup *from, struct mem_cgroup *to);
+extern unsigned long neigh_size(struct neigh_parms *neigh_p);
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 78beaa765..f75cbd356 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -542,3 +542,4 @@  static inline void net_ns_init(void) {}
 #endif
 
 #endif /* __NET_NET_NAMESPACE_H */
+extern void net_device_free(struct net *ns);
diff --git a/ipc/msg.c b/ipc/msg.c
index fd08b3cb3..fdcfccf74 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -40,6 +40,7 @@ 
 #include <linux/ipc_namespace.h>
 #include <linux/rhashtable.h>
 #include <linux/percpu_counter.h>
+#include <linux/memcontrol.h>
 
 #include <asm/current.h>
 #include <linux/uaccess.h>
@@ -133,7 +134,8 @@  static void msg_rcu_free(struct rcu_head *head)
 	security_msg_queue_free(&msq->q_perm);
 	kfree(msq);
 }
-
+struct msg_queue *find_msq_from_perm(struct kern_ipc_perm *perm);
+int ipc_msq_move(struct msg_queue *msq, struct mem_cgroup *from, struct mem_cgroup *to);
 /**
  * newque - Create a new msg queue
  * @ns: namespace
@@ -295,9 +297,48 @@  static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
 	ipc_rcu_putref(&msq->q_perm, msg_rcu_free);
 }
 
+int ipc_msq_size(struct msg_queue *msq)
+{
+	return caculate_space(msq);
+}
+
+void msg_tag(struct kern_ipc_perm *ipcp, struct ipc_namespace *ns)
+{
+	struct msg_queue *msq;
+	struct mem_cgroup *memcg_owner, *memcg_recycle;
+	struct cgroup *cgrp_owner;
+
+	memcg_owner = mem_cgroup_from_task(current);
+	if (!memcg_owner)
+		return;
+
+	cgrp_owner = memcg_owner->css.cgroup;
+	if (!cgrp_owner)
+		return;
+
+
+	if (ipcp->cgroup == ballon_cgroup) { //move the resources
+		msq = find_msq_from_perm(ipcp);
+		if (!msq)
+			return;
+		memcg_recycle = get_cg_from_cgrp(ballon_cgroup, "memory");
+		ipc_ns_list_insert(ballon_cgroup->ipc_list, ns, -ipc_msq_size(msq));
+		ipc_msq_move(msq, memcg_recycle, memcg_owner);
+	}
+	ipcp->cgroup = cgrp_owner;
+	ipc_ns_list_insert(cgrp_owner->ipc_list, ns, 0);
+
+}
+extern void ipc_msq_free(struct ipc_namespace *ns)
+{
+	free_recycle_ipcs(ns, &msg_ids(ns), freeque);
+}
 long ksys_msgget(key_t key, int msgflg)
 {
 	struct ipc_namespace *ns;
+	int err;
+	struct kern_ipc_perm *ipcp;
+
 	static const struct ipc_ops msg_ops = {
 		.getnew = newque,
 		.associate = security_msg_queue_associate,
@@ -309,7 +350,20 @@  long ksys_msgget(key_t key, int msgflg)
 	msg_params.key = key;
 	msg_params.flg = msgflg;
 
-	return ipcget(ns, &msg_ids(ns), &msg_ops, &msg_params);
+	err = ipcget(ns, &msg_ids(ns), &msg_ops, &msg_params);
+	if (err >= 0) {
+		down_write(&msg_ids(ns).rwsem);
+		ipcp = ipc_findkey(&msg_ids(ns), key);
+		if (!ipcp)
+			goto no_lock;
+
+		msg_tag(ipcp, ns);
+		ipc_unlock(ipcp);
+
+no_lock:
+		up_write(&msg_ids(ns).rwsem);
+	}
+	return err;
 }
 
 SYSCALL_DEFINE2(msgget, key_t, key, int, msgflg)
@@ -1374,3 +1428,98 @@  void __init msg_init(void)
 				"       key      msqid perms      cbytes       qnum lspid lrpid   uid   gid  cuid  cgid      stime      rtime      ctime\n",
 				IPC_MSG_IDS, sysvipc_msg_proc_show);
 }
+
+struct msg_queue *find_msq_from_perm(struct kern_ipc_perm *perm)
+{
+	struct msg_queue *msq = container_of(perm, struct msg_queue, q_perm);
+	return msq;
+}
+int ipc_msq_move(struct msg_queue *msq, struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	struct msg_msg *msg, *t;
+	int err = 0;
+
+	list_for_each_entry_safe(msg, t, &msq->q_messages, m_list) {
+		err = ipc_msg_move(msg, from, to);
+		if (err < 0)
+			return err;
+	}
+	err = kmem_cgroup_move(msq, from, to);
+	return err;
+}
+
+int ipc_msq_move_check(struct ipc_namespace *ns, struct cgroup *cgrp)
+{
+	struct ipc_ids *ids;
+	struct kern_ipc_perm *perm;
+	int next_id;
+	int total, in_use;
+	struct msg_queue *msq;
+	bool MAX_SET = true;
+	// int clear_times = 5;
+	unsigned long mem_volume, tmp, extra_mount = 0;
+	struct mem_cgroup *memcg_to, *memcg_from = get_cg_from_cgrp(cgrp, "memory");
+	ids = &msg_ids(ns);
+	if (ids == NULL)
+		return 0;
+
+	if (memcg_from == NULL)
+		return 0;
+
+	in_use = ids->in_use;
+	for (total = 0, next_id = 0; total < in_use; next_id++) {
+		perm = idr_find(&ids->ipcs_idr, next_id);
+		if (perm == NULL) {
+			total++;
+			continue;
+		}
+
+		msq = find_msq_from_perm(perm);
+		if (msq == NULL) {
+			total++;
+			continue;
+		}
+
+		if (perm->cgroup == NULL) {
+			down_write(&msg_ids(ns).rwsem);
+			rcu_read_lock();
+			ipc_lock_object(perm);
+			freeque(ns, perm);
+			up_write(&msg_ids(ns).rwsem);
+			total++;
+			continue;
+		}
+
+		// if (perm->cgroup == ballon_cgroup)
+		// 	perm->residual_time++;
+
+		if (perm->cgroup == cgrp) {
+			memcg_to = get_cg_from_cgrp(ballon_cgroup, "memory");
+			if (!memcg_to)
+				return 0;
+
+			if (MAX_SET) {
+				page_set_shadow_max(memcg_from, memcg_to);
+				MAX_SET = false;
+			}
+			mem_volume = ipc_msq_size(msq);
+
+			while (check_ballon_limits(mem_volume))	{//if the left volume is not enough
+				extra_mount = free_highest_volume();
+				while (1) {
+					if (!check_ballon_limits(mem_volume))
+						break;
+					else {
+						cond_resched();
+					}
+				}
+			}
+
+			perm->cgroup = ballon_cgroup;
+			ipc_ns_list_insert(ballon_cgroup->ipc_list, ns, mem_volume);
+			ipc_msq_move(msq, memcg_from, memcg_to);
+		}
+		total++;
+	}
+	return 0;
+}
\ No newline at end of file
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index d0a0e877c..b9f5e34f8 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -15,6 +15,7 @@ 
 #include <linux/proc_ns.h>
 #include <linux/uaccess.h>
 #include <linux/sched.h>
+#include <linux/memcontrol.h>
 
 #include "util.h"
 
@@ -182,3 +183,23 @@  void free_msg(struct msg_msg *msg)
 		seg = tmp;
 	}
 }
+extern int ipc_msg_move(struct msg_msg *msg, struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	struct msg_msgseg *seg;
+	int err;
+
+	seg = msg->next;
+	err = kmem_cgroup_move(msg, from, to);
+	if (err < 0)
+		return err;
+	while (seg != NULL) {
+		struct msg_msgseg *tmp = seg->next;
+
+		cond_resched();
+		err = kmem_cgroup_move(seg, from, to);
+		if (err < 0)
+			return err;
+		seg = tmp;
+	}
+	return err;
+}
diff --git a/ipc/namespace.c b/ipc/namespace.c
index 8316ea585..d7587a003 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -16,6 +16,8 @@ 
 #include <linux/user_namespace.h>
 #include <linux/proc_ns.h>
 #include <linux/sched/task.h>
+#include <linux/cgroup-defs.h>
+#include <linux/cgroup.h>
 
 #include "util.h"
 
@@ -72,7 +74,6 @@  static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 
 	sem_init_ns(ns);
 	shm_init_ns(ns);
-
 	return ns;
 
 fail_mq:
@@ -128,6 +129,85 @@  void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
 	up_write(&ids->rwsem);
 }
 
+struct ipc_ns_list *ipc_ns_list_init(void)
+{
+	struct ipc_ns_list *head;
+
+	head = kmalloc(sizeof(struct ipc_ns_list), GFP_KERNEL);
+	if (head == NULL)
+		return NULL;
+
+	INIT_LIST_HEAD(&(head->list));
+	return head;
+}
+
+/*
+-1: insert failed
+0: insert success
+*/
+int ipc_ns_list_add_tail(struct ipc_ns_list *head, struct ipc_namespace *ipc_ns, long mem_delta)
+{
+	struct ipc_ns_list *new_node;
+
+	new_node = kmalloc(sizeof(struct ipc_ns_list), GFP_KERNEL);
+	if (new_node == NULL)
+		return -1;
+
+	memset(new_node, 0, sizeof(struct ipc_ns_list));
+	new_node->ipc_ns = ipc_ns;
+	new_node->residual_volume = mem_delta;
+
+	list_add_tail(&(new_node->list), &(head->list));
+	return 0;
+}
+int ipc_ns_list_free(struct ipc_ns_list *head)
+{
+	struct ipc_ns_list *p, *q;
+
+    list_for_each_entry_safe(p, q, &(head->list), list) {
+		list_del(&(p->list));
+		kfree(p);
+    }
+	list_del(&(head->list));
+	kfree(head);
+	return 0;
+}
+/*
+-1: list empty
+-2: not found
+*/
+int ipc_ns_list_delete_elem(struct ipc_ns_list *head, struct ipc_namespace *ipc_ns)
+{
+	struct ipc_ns_list *pos, *q;
+
+	if (list_empty(&(head->list)))
+		return -1;
+	list_for_each_entry_safe(pos, q, &(head->list), list) {
+		if (pos->ipc_ns == ipc_ns) {
+			list_del(&(pos->list));
+			kfree(pos);
+		}
+	}
+	return 0;
+}
+int ipc_ns_list_insert(struct ipc_ns_list *head, struct ipc_namespace *ipc_ns, long mem_delta)
+{
+	int err = 1;
+	struct ipc_ns_list *pos;
+	if (!head)
+		return 0;
+	list_for_each_entry(pos, &(head->list), list) {
+		if (pos->ipc_ns == ipc_ns) {
+			pos->residual_volume += mem_delta;
+			err = 0;
+		}
+	}
+	if (err)
+		err = ipc_ns_list_add_tail(head, ipc_ns, mem_delta);
+	return err;
+}
+
+
 static void free_ipc_ns(struct ipc_namespace *ns)
 {
 	/* mq_put_mnt() waits for a grace period as kern_unmount()
@@ -239,3 +319,27 @@  const struct proc_ns_operations ipcns_operations = {
 	.install	= ipcns_install,
 	.owner		= ipcns_owner,
 };
+void free_recycle_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
+	       void (*free)(struct ipc_namespace *, struct kern_ipc_perm *))
+{
+	struct kern_ipc_perm *perm;
+	int next_id;
+	int total, in_use;
+
+	down_write(&ids->rwsem);
+
+	in_use = ids->in_use;
+
+	for (total = 0, next_id = 0; total < in_use; next_id++) {
+		perm = idr_find(&ids->ipcs_idr, next_id);
+		if (perm == NULL)
+			continue;
+		if (perm->cgroup == ballon_cgroup) {
+			rcu_read_lock();
+			ipc_lock_object(perm);
+			free(ns, perm);
+		}
+		total++;
+	}
+	up_write(&ids->rwsem);
+}
diff --git a/ipc/sem.c b/ipc/sem.c
index 00f88aa01..8a5a03659 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -87,7 +87,10 @@ 
 #include <linux/sched/wake_q.h>
 #include <linux/nospec.h>
 #include <linux/rhashtable.h>
-
+#include <linux/cgroup-defs.h>
+#include <linux/memcontrol.h>
+#include <linux/page_counter.h>
+#include <linux/delay.h>
 #include <linux/uaccess.h>
 #include "util.h"
 
@@ -584,6 +587,103 @@  static int newary(struct ipc_namespace *ns, struct ipc_params *params)
 	return sma->sem_perm.id;
 }
 
+struct sem_array *find_sem_from_perm(struct kern_ipc_perm *perm)
+{
+	struct sem_array *sma = container_of(perm, struct sem_array, sem_perm);
+	return sma;
+}
+
+unsigned long ipc_sem_size(struct sem_array *sma)
+{
+	return caculate_space(sma);
+}
+
+int ipc_sem_move(struct sem_array *sma, struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	return obj_cgroup_move(sma, from, to);
+}
+
+void ipc_sem_free(struct ipc_namespace *ns)
+{
+	free_recycle_ipcs(ns, &sem_ids(ns), freeary);
+}
+
+
+int ipc_sem_move_check(struct ipc_namespace *ns, struct cgroup *cgrp)
+{
+	struct ipc_ids *ids;
+	struct kern_ipc_perm *perm;
+	int next_id;
+	int total, in_use;
+	struct sem_array *sma;
+	bool MAX_SET = true;
+	unsigned long mem_volume, extra_mount = 0;
+	struct mem_cgroup *memcg_to, *memcg_from = get_cg_from_cgrp(cgrp, "memory");
+
+	ids =  &sem_ids(ns);
+	if (ids == NULL)
+		return 0;
+
+	if (memcg_from == NULL)
+		return 0;
+
+	in_use = ids->in_use;
+	for (total = 0, next_id = 0; total < in_use; next_id++) {
+		perm = idr_find(&ids->ipcs_idr, next_id);
+		if (perm == NULL) {
+			total++;
+			continue;
+		}
+		sma = find_sem_from_perm(perm);
+
+		if (sma == NULL) {
+			total++;
+			continue;
+		}
+
+		if (perm->cgroup == NULL) {
+			down_write(&sem_ids(ns).rwsem);
+			rcu_read_lock();
+			ipc_lock_object(perm);
+			freeary(ns, perm);
+			up_write(&sem_ids(ns).rwsem);
+			total++;
+			continue;
+		}
+
+		if (perm->cgroup == cgrp) { //belongs to the exiting cgroup.
+			memcg_to = get_cg_from_cgrp(ballon_cgroup, "memory");
+			if (!memcg_to)
+				return 0;
+
+			if (MAX_SET) {
+				page_set_shadow_max(memcg_from, memcg_to);
+				MAX_SET = false;
+			}
+
+			mem_volume = ipc_sem_size(sma);
+			while (check_ballon_limits(mem_volume)) {//if the left volume is not enough
+				extra_mount = free_highest_volume();
+				while (1) {
+					if (!check_ballon_limits(mem_volume))
+						break;
+					else {
+						cond_resched();
+					}
+				}
+			}
+			if (extra_mount > 0)
+				extra_mount -=  mem_volume;
+
+			ipc_ns_list_insert(ballon_cgroup->ipc_list, ns, mem_volume);
+			perm->cgroup = ballon_cgroup;
+			ipc_sem_move(sma, memcg_from, memcg_to);
+		}
+		total++;
+	}
+	return 0;
+}
+
 
 /*
  * Called with sem_ids.rwsem and ipcp locked.
@@ -598,10 +698,46 @@  static int sem_more_checks(struct kern_ipc_perm *ipcp, struct ipc_params *params
 
 	return 0;
 }
+void sem_tag(struct kern_ipc_perm *ipcp, struct ipc_namespace *ns)
+{
+	struct sem_array *sma;
+	struct mem_cgroup *memcg_owner, *memcg_recycle;
+	struct cgroup *cgrp_owner;
+
+	memcg_owner = mem_cgroup_from_task(current);
+	if (!memcg_owner)
+		return;
+
+	cgrp_owner = memcg_owner->css.cgroup;
+	if (!cgrp_owner)
+		return;
+
+	ipc_ns_list_insert(cgrp_owner->ipc_list, ns, 0);
+
+	if (!ipcp)
+		return;
+
+	if (ipcp->cgroup == ballon_cgroup) {
+		sma = find_sem_from_perm(ipcp);
+		if (!sma)
+			return;
+
+		memcg_recycle = get_cg_from_cgrp(ballon_cgroup, "memory");
+		if (!memcg_recycle)
+			return;
+
+		ipc_ns_list_insert(ballon_cgroup->ipc_list, ns, -ipc_sem_size(sma));
+		ipc_sem_move(sma, memcg_recycle, memcg_owner);
+	}
+	ipcp->cgroup = cgrp_owner;
+}
 
 long ksys_semget(key_t key, int nsems, int semflg)
 {
 	struct ipc_namespace *ns;
+	int err;
+	struct kern_ipc_perm *ipcp;
+
 	static const struct ipc_ops sem_ops = {
 		.getnew = newary,
 		.associate = security_sem_associate,
@@ -618,7 +754,20 @@  long ksys_semget(key_t key, int nsems, int semflg)
 	sem_params.flg = semflg;
 	sem_params.u.nsems = nsems;
 
-	return ipcget(ns, &sem_ids(ns), &sem_ops, &sem_params);
+	err = ipcget(ns, &sem_ids(ns), &sem_ops, &sem_params);
+	if (err >= 0) {
+		down_write(&sem_ids(ns).rwsem);
+		ipcp = ipc_findkey(&sem_ids(ns), key);
+		sem_tag(ipcp, ns);
+
+		if (!ipcp)
+			goto no_lock;
+
+		ipc_unlock(ipcp);
+no_lock:
+		up_write(&sem_ids(ns).rwsem);
+	}
+	return err;
 }
 
 SYSCALL_DEFINE3(semget, key_t, key, int, nsems, int, semflg)
diff --git a/ipc/shm.c b/ipc/shm.c
index bd2fcc4d4..98e0788d8 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -44,6 +44,7 @@ 
 #include <linux/mount.h>
 #include <linux/ipc_namespace.h>
 #include <linux/rhashtable.h>
+#include <linux/memcontrol.h>
 
 #include <linux/uaccess.h>
 
@@ -98,6 +99,7 @@  static const struct vm_operations_struct shm_vm_ops;
 	ipc_unlock(&(shp)->shm_perm)
 
 static int newseg(struct ipc_namespace *, struct ipc_params *);
+extern int ipc_shm_move(struct shmid_kernel *shp, struct mem_cgroup *from, struct mem_cgroup *to);
 static void shm_open(struct vm_area_struct *vma);
 static void shm_close(struct vm_area_struct *vma);
 static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp);
@@ -827,10 +829,72 @@  static int shm_more_checks(struct kern_ipc_perm *ipcp, struct ipc_params *params
 
 	return 0;
 }
+struct shmid_kernel *find_shp_from_perm(struct kern_ipc_perm *perm)
+{
+	struct shmid_kernel *shp = container_of(perm, struct shmid_kernel, shm_perm);
+	return shp;
+}
+
+extern unsigned long ipc_shm_size(struct shmid_kernel *shp)
+{
+	unsigned long space = 0;
+	struct file *file;
+	struct inode *inode;
+
+	space += caculate_space(shp);
+
+	file = shp->shm_file;
+	if (!file)
+		return space;
+
+	inode = file->f_inode;
+	if (!inode)
+		return space;
+
+	space += shm_total_pages(inode) * PAGE_SIZE;
+
+	return space;
+}
+
+
+void shm_tag(struct kern_ipc_perm *ipcp, struct ipc_namespace *ns)
+{
+	struct shmid_kernel *shp;
+	struct mem_cgroup *memcg_owner, *memcg_recycle;
+	struct cgroup *cgrp_owner;
+
+	memcg_owner = mem_cgroup_from_task(current);
+	if (!memcg_owner)
+		return;
+
+	cgrp_owner = memcg_owner->css.cgroup;
+	if (!cgrp_owner)
+		return;
+
+	if (ipcp->cgroup == ballon_cgroup) {
+		shp = find_shp_from_perm(ipcp);
+		if (!shp)
+			return;
+		memcg_recycle = get_cg_from_cgrp(ballon_cgroup, "memory");
+		ipc_ns_list_insert(ballon_cgroup->ipc_list, ns, -ipc_shm_size(shp));
+		ipc_shm_move(shp, memcg_recycle, memcg_owner);
+	}
+
+	ipcp->cgroup = cgrp_owner;
+	ipc_ns_list_insert(cgrp_owner->ipc_list, ns, 0);
+}
+
+extern void ipc_shm_free(struct ipc_namespace *ns)
+{
+	free_recycle_ipcs(ns, &shm_ids(ns), do_shm_rmid);
+}
 
 long ksys_shmget(key_t key, size_t size, int shmflg)
 {
 	struct ipc_namespace *ns;
+	int err;
+	struct kern_ipc_perm *ipcp;
+
 	static const struct ipc_ops shm_ops = {
 		.getnew = newseg,
 		.associate = security_shm_associate,
@@ -844,7 +908,20 @@  long ksys_shmget(key_t key, size_t size, int shmflg)
 	shm_params.flg = shmflg;
 	shm_params.u.size = size;
 
-	return ipcget(ns, &shm_ids(ns), &shm_ops, &shm_params);
+	err = ipcget(ns, &shm_ids(ns), &shm_ops, &shm_params);
+	if (err >= 0) {
+		down_write(&shm_ids(ns).rwsem);
+		ipcp = ipc_findkey(&shm_ids(ns), key);
+		if (!ipcp)
+			goto no_lock;
+
+		shm_tag(ipcp, ns);
+
+		ipc_unlock(ipcp);
+no_lock:
+		up_write(&shm_ids(ns).rwsem);
+	}
+	return err;
 }
 
 SYSCALL_DEFINE3(shmget, key_t, key, size_t, size, int, shmflg)
@@ -1881,3 +1958,99 @@  static int sysvipc_shm_proc_show(struct seq_file *s, void *it)
 	return 0;
 }
 #endif
+
+extern int ipc_shm_move(struct shmid_kernel *shp, struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	int err;
+	struct file *file;
+	struct inode *inode;
+
+	file = shp->shm_file;
+	if (!file)
+		return 0;
+
+	inode = file->f_inode;
+	if (!inode)
+		return 0;
+
+	err = obj_cgroup_move(shp, from, to);
+	err = ipc_shm_folio_move(inode, from, to);
+	return err;
+}
+
+
+int ipc_shm_move_check(struct ipc_namespace *ns, struct cgroup *cgrp)
+{
+	struct ipc_ids *ids;
+	struct kern_ipc_perm *perm;
+	int next_id;
+	int total, in_use;
+	struct shmid_kernel *shp;
+	bool MAX_SET = true;
+	unsigned long mem_volume, tmp, extra_mount = 0;
+	// int clear_times = 5;
+	struct mem_cgroup *memcg_to, *memcg_from = get_cg_from_cgrp(cgrp, "memory");
+
+	ids =  &shm_ids(ns);
+
+	if (ids == NULL)
+		return 0;
+
+	if (memcg_from == NULL)
+		return 0;
+
+	in_use = ids->in_use;
+	for (total = 0, next_id = 0; total < in_use; next_id++) {
+		perm = idr_find(&ids->ipcs_idr, next_id);
+		if (perm == NULL) {
+			total++;
+			continue;
+		}
+
+		shp = find_shp_from_perm(perm);
+		if (shp == NULL) {
+			total++;
+			continue;
+		}
+		if (perm->cgroup == NULL) {
+			down_write(&shm_ids(ns).rwsem);
+			rcu_read_lock();
+			ipc_lock_object(perm);
+			do_shm_rmid(ns, perm);
+			up_write(&shm_ids(ns).rwsem);
+			total++;
+			continue;
+		}
+
+		if (perm->cgroup == cgrp) {
+			memcg_to = get_cg_from_cgrp(ballon_cgroup, "memory");
+			if (!memcg_to)
+				return 0;
+
+			if (MAX_SET) {
+				page_set_shadow_max(memcg_from, memcg_to);
+				MAX_SET = false;
+			}
+
+			mem_volume = ipc_shm_size(shp);
+			printk("mem_volume = %lu\n", mem_volume);
+
+			while (check_ballon_limits(mem_volume)) {//if the left volume is not enough
+				extra_mount = free_highest_volume();
+				while (1) {
+					if (!check_ballon_limits(mem_volume))
+						break;
+					else {
+						cond_resched();
+					}
+				}
+			}
+
+			perm->cgroup = ballon_cgroup;
+			ipc_ns_list_insert(ballon_cgroup->ipc_list, ns, mem_volume);
+			ipc_shm_move(shp, memcg_from, memcg_to);
+		}
+		total++;
+	}
+	return 0;
+}
diff --git a/ipc/util.c b/ipc/util.c
index 05cb9de66..6522916a6 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -169,7 +169,7 @@  void __init ipc_init_proc_interface(const char *path, const char *header,
  *
  * Called with writer ipc_ids.rwsem held.
  */
-static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
+extern struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
 {
 	struct kern_ipc_perm *ipcp;
 
diff --git a/ipc/util.h b/ipc/util.h
index b2906e366..740366a36 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -254,7 +254,7 @@  static inline int sem_check_semmni(struct ipc_namespace *ns) {
 	return ((ns->sem_ctls[3] < 0) || (ns->sem_ctls[3] > ipc_mni))
 		? -ERANGE : 0;
 }
-
+extern struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key);
 #ifdef CONFIG_COMPAT
 #include <linux/compat.h>
 struct compat_ipc_perm {
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index 367b0a42a..ac22b690f 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -262,6 +262,8 @@  int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode);
 int cgroup_rmdir(struct kernfs_node *kn);
 int cgroup_show_path(struct seq_file *sf, struct kernfs_node *kf_node,
 		     struct kernfs_root *kf_root);
+int cgroup1_rename(struct kernfs_node *kn, struct kernfs_node *new_parent,
+			  const char *new_name_str);
 
 int __cgroup_task_count(const struct cgroup *cgrp);
 int cgroup_task_count(const struct cgroup *cgrp);
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 52bb5a74a..0ffda2b19 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -824,7 +824,7 @@  void cgroup1_release_agent(struct work_struct *work)
 /*
  * cgroup_rename - Only allow simple rename of directories in place.
  */
-static int cgroup1_rename(struct kernfs_node *kn, struct kernfs_node *new_parent,
+int cgroup1_rename(struct kernfs_node *kn, struct kernfs_node *new_parent,
 			  const char *new_name_str)
 {
 	struct cgroup *cgrp = kn->priv;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c099cf3fa..9ed11aeaa 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -58,8 +58,24 @@ 
 #include <linux/fs_parser.h>
 #include <linux/sched/cputime.h>
 #include <linux/psi.h>
+#include <linux/cgroup-defs.h>
+#include <linux/memcontrol.h>
+#include <linux/ipc_namespace.h>
+#include <linux/slab.h>
+#include <linux/sem.h>
+#include <linux/shm.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <linux/security.h>
+#include <linux/namei.h>
+#include <linux/delay.h>
+#include <linux/netdevice.h>
 #include <net/sock.h>
 
+#include <linux/percpu_counter.h>
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/cgroup.h>
 
@@ -5707,6 +5723,15 @@  static bool cgroup_check_hierarchy_limits(struct cgroup *parent)
 	return ret;
 }
 
+void cgroup_recycle_init(struct cgroup *cgrp)
+{
+	if (!cgrp)
+		return;
+	cgrp->ipc_list = ipc_ns_list_init();
+	cgrp->net_list = net_ns_list_init();
+	cgrp->pid_list = pid_ns_list_init();
+}
+
 int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
 {
 	struct cgroup *parent, *cgrp;
@@ -5730,7 +5755,7 @@  int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
 		ret = PTR_ERR(cgrp);
 		goto out_unlock;
 	}
-
+	cgroup_recycle_init(cgrp);
 	/*
 	 * This extra ref will be put in cgroup_free_fn() and guarantees
 	 * that @cgrp->kn is always accessible.
@@ -5841,6 +5866,195 @@  static void kill_css(struct cgroup_subsys_state *css)
 	percpu_ref_kill_and_confirm(&css->refcnt, css_killed_ref_fn);
 }
 
+struct mem_cgroup *get_cg_from_cgrp(struct cgroup *cgrp, const char *subsys_name)
+{
+	struct cgroup_subsys_state *css;
+	struct mem_cgroup *memcg;
+
+	if (!cgrp)
+		return NULL;
+
+	for (int i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+		if (strcmp(subsys_name, cgroup_subsys_name[i]) == 0) {
+			css = cgrp->subsys[i];
+			break;
+		}
+	}
+	memcg = mem_cgroup_from_css(css);
+	return memcg;
+}
+
+static int memcg_ipc_check(struct cgroup *cgrp)
+{
+	struct ipc_ns_list *head;
+	struct ipc_namespace *ipc_ns;
+	struct ipc_ns_list *pos;
+
+	head = cgrp->ipc_list;
+
+	if (head == NULL)
+		return 0;
+
+	list_for_each_entry(pos, &head->list, list) {
+		ipc_ns = pos->ipc_ns;
+		if (ipc_ns == NULL)
+			continue;
+		if (ipc_ns->used_sems != 0)
+			ipc_sem_move_check(ipc_ns, cgrp);
+
+		if (ipc_ns->shm_tot != 0)
+			ipc_shm_move_check(ipc_ns, cgrp);
+
+		ipc_msq_move_check(ipc_ns, cgrp);
+	}
+
+	ipc_ns_list_free(cgrp->ipc_list);
+	cgrp->ipc_list = NULL;
+	return 0;
+}
+
+static int memcg_net_check(struct cgroup *cgrp)
+{
+	struct net_ns_list *head;
+	struct net *net_ns;
+	struct net_ns_list *pos;
+	head = cgrp->net_list;
+
+	if (head == NULL)
+		return 0;
+
+	list_for_each_entry(pos, &head->list, list) {
+		net_ns = pos->net_ns;
+		if (net_ns == NULL)
+			return 0;
+
+		if (!list_empty(&net_ns->dev_base_head))
+			net_device_move_check(net_ns, cgrp);
+
+	}
+	net_ns_list_free(cgrp->net_list);
+	cgrp->net_list = NULL;
+	return 0;
+}
+
+static int pid_check(struct cgroup *cgrp)
+{
+	struct pid_ns_list *head;
+	struct pid_namespace *pid_ns;
+	struct pid_ns_list *pos;
+
+	head = cgrp->pid_list;
+
+	if (head == NULL)
+		return 0;
+
+	list_for_each_entry(pos, &head->list, list) {
+		pid_ns = pos->pid_ns;
+		if (pid_ns == NULL)
+			continue;
+		check_pid_zombie(pid_ns, cgrp);
+	}
+
+	pid_ns_list_free(cgrp->pid_list);
+	cgrp->pid_list = NULL;
+	return 0;
+}
+
+
+
+static struct ipc_ns_list *find_higheset_ipc_ns(void)
+{
+	struct ipc_ns_list *head;
+	struct ipc_namespace *ipc_ns;
+	struct ipc_ns_list *pos, *max_pos = NULL;
+	unsigned long max_volume = 0;
+
+	head = ballon_cgroup->ipc_list;
+
+	if (head == NULL)
+		return NULL;
+
+	list_for_each_entry(pos, &head->list, list) {
+		ipc_ns = pos->ipc_ns;
+		if (ipc_ns == NULL)
+			continue;
+
+		if (max_volume < pos->residual_volume){
+			max_volume = pos->residual_volume;
+			max_pos = pos;
+		}
+	}
+	return max_pos;
+}
+static struct net_ns_list *find_higheset_net_ns(void)
+{
+	struct net_ns_list *head;
+	struct net *net_ns;
+	struct net_ns_list *pos, *max_pos = NULL;
+	unsigned long max_volume = 0;
+
+	head = ballon_cgroup->net_list;
+
+	if (head == NULL)
+		return NULL;
+
+	list_for_each_entry(pos, &head->list, list) {
+		net_ns = pos->net_ns;
+		if (net_ns == NULL)
+			continue;
+		if (max_volume < pos->residual_volume) {
+			max_volume = pos->residual_volume;
+			max_pos = pos;
+		}
+	}
+	return max_pos;
+}
+
+static void ipc_residual_clear(struct ipc_namespace *ipc_ns)
+{
+	ipc_sem_free(ipc_ns);
+	printk("ipc_residual_clear 1\n");
+	ipc_shm_free(ipc_ns);
+	ipc_msq_free(ipc_ns);
+}
+
+static void net_residual_clear(struct net *net_ns)
+{
+	net_device_free(net_ns);
+}
+
+extern unsigned long free_highest_volume(void)
+{
+	struct ipc_ns_list *ipc_max;
+	struct net_ns_list *net_max;
+	unsigned long space = 0;
+	printk("free_highest_volume 1\n");
+	ipc_max = find_higheset_ipc_ns();
+	net_max = find_higheset_net_ns();
+	printk("free_highest_volume 2\n");
+
+	if (net_max == NULL && ipc_max == NULL)
+		return 0;
+	if (ipc_max == NULL)
+		goto net;
+
+	if (net_max == NULL ||  net_max->residual_volume < ipc_max->residual_volume) {
+		space = ipc_max->residual_volume;
+		ipc_residual_clear(ipc_max->ipc_ns);
+		printk("free_highest_volume 3\n");
+		ipc_ns_list_delete_elem(ballon_cgroup->ipc_list, ipc_max->ipc_ns);
+		printk("free_highest_volume 4\n");
+		return space;
+	}
+net:
+	printk("free_highest_volume 5\n");
+
+	space = net_max->residual_volume;
+	net_residual_clear(net_max->net_ns);
+	net_ns_list_delete_elem(ballon_cgroup->net_list, net_max->net_ns);
+
+	return space;
+}
 /**
  * cgroup_destroy_locked - the first stage of cgroup destruction
  * @cgrp: cgroup to be destroyed
@@ -5945,6 +6159,9 @@  int cgroup_rmdir(struct kernfs_node *kn)
 	cgrp = cgroup_kn_lock_live(kn, false);
 	if (!cgrp)
 		return 0;
+	memcg_ipc_check(cgrp);
+	memcg_net_check(cgrp);
+	pid_check(cgrp);
 
 	ret = cgroup_destroy_locked(cgrp);
 	if (!ret)
@@ -5959,6 +6176,7 @@  static struct kernfs_syscall_ops cgroup_kf_syscall_ops = {
 	.mkdir			= cgroup_mkdir,
 	.rmdir			= cgroup_rmdir,
 	.show_path		= cgroup_show_path,
+	.rename 		= cgroup1_rename,
 };
 
 static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
@@ -6050,6 +6268,50 @@  int __init cgroup_init_early(void)
 	}
 	return 0;
 }
+struct cgroup *ballon_cgroup;
+void recycle_cgroup_create(void)
+{
+	struct kernfs_node *root_kn;
+	struct mem_cgroup *memcg;
+	struct cgroup *cgrp = ERR_PTR(-ENOENT);
+	struct cgroup *root_cgrp;
+	int ret;
+
+	root_cgrp = current_cgns_cgroup_dfl();
+	root_kn = root_cgrp->kn;
+	cgrp = cgroup_create(root_cgrp, "balloon.slice", S_IRWXU | S_IRGRP | S_IROTH);
+	/*
+	 * This extra ref will be put in cgroup_free_fn() and guarantees
+	 * that @cgrp->kn is always accessible.
+	 */
+	kernfs_get(cgrp->kn);
+	ret = cgroup_kn_set_ugid(cgrp->kn);
+	if (ret)
+		goto out_destroy;
+
+	ret = css_populate_dir(&cgrp->self);
+	if (ret)
+		goto out_destroy;
+
+	ret = cgroup_apply_control_enable(cgrp);
+
+	if (ret)
+		goto out_destroy;
+	/* let's create and online css's */
+	kernfs_activate(cgrp->kn);
+	ret = 0;
+	memcg = get_cg_from_cgrp(cgrp, "memory");
+
+	cgroup_recycle_init(cgrp);
+	goto out_unlock;
+
+out_destroy:
+	cgroup_destroy_locked(cgrp);
+
+out_unlock:
+	ballon_cgroup = cgrp;
+}
+EXPORT_SYMBOL(ballon_cgroup);
 
 /**
  * cgroup_init - cgroup initialization
@@ -6151,7 +6413,7 @@  int __init cgroup_init(void)
 #ifdef CONFIG_CPUSETS
 	WARN_ON(register_filesystem(&cpuset_fs_type));
 #endif
-
+	recycle_cgroup_create();
 	return 0;
 }
 
@@ -6510,6 +6772,8 @@  static void cgroup_css_set_put_fork(struct kernel_clone_args *kargs)
 int cgroup_can_fork(struct task_struct *child, struct kernel_clone_args *kargs)
 {
 	struct cgroup_subsys *ss;
+	struct mem_cgroup *memcg_owner;
+	struct cgroup *cgrp_owner;
 	int i, j, ret;
 
 	ret = cgroup_css_set_fork(kargs);
@@ -6522,6 +6786,14 @@  int cgroup_can_fork(struct task_struct *child, struct kernel_clone_args *kargs)
 			goto out_revert;
 	} while_each_subsys_mask();
 
+    memcg_owner = mem_cgroup_from_task(child);
+    if (!memcg_owner)
+       goto out_revert;
+    cgrp_owner = memcg_owner->css.cgroup;
+	ipc_ns_list_insert(cgrp_owner->ipc_list, child->nsproxy->ipc_ns, 0);
+	net_ns_list_insert(cgrp_owner->net_list, child->nsproxy->net_ns, 0);
+	pid_ns_list_insert(cgrp_owner->pid_list, child->nsproxy->pid_ns_for_children);
+
 	return 0;
 
 out_revert:
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 7695e60bc..0328aefb5 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -34,7 +34,7 @@ 
 #include <linux/cgroup.h>
 #include <linux/slab.h>
 #include <linux/sched/task.h>
-
+#include <linux/pid_namespace.h>
 #define PIDS_MAX (PID_MAX_LIMIT + 1ULL)
 #define PIDS_MAX_STR "max"
 
@@ -247,6 +247,7 @@  static int pids_can_fork(struct task_struct *task, struct css_set *cset)
 		css = task_css_check(current, pids_cgrp_id, true);
 	pids = css_pids(css);
 	err = pids_try_charge(pids, 1);
+	zombie_cgrp_insert_ns(task, css->cgroup);
 	if (err) {
 		/* Only log the first time events_limit is incremented. */
 		if (atomic64_inc_return(&pids->events_limit) == 1) {
diff --git a/kernel/exit.c b/kernel/exit.c
index f2afdb0ad..e7abaacfc 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1920,3 +1920,18 @@  __weak __function_aligned void abort(void)
 	panic("Oops failed to kill thread");
 }
 EXPORT_SYMBOL(abort);
+void check_pid_zombie(struct pid_namespace *pid_ns, struct cgroup *cgrp)
+{
+	struct task_struct *tmp, *reaper, *pos;
+	struct cgroup *to;
+
+	reaper = pid_ns->child_reaper;
+	if (list_empty(&reaper->children))
+		return;
+
+	list_for_each_entry_safe(pos, tmp, &reaper->children, sibling) {
+		to = task_cgroup(pos, pids_cgrp_id);
+		if (pos->exit_state == EXIT_ZOMBIE && to == cgrp)
+			release_task(pos);
+	}
+}
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index fc21c5d5f..97ef1261a 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -476,3 +476,98 @@  static __init int pid_namespaces_init(void)
 }
 
 __initcall(pid_namespaces_init);
+struct pid_ns_list *pid_ns_list_init(void)
+{
+	struct pid_ns_list *head;
+
+	head = kmalloc(sizeof(struct pid_ns_list), GFP_KERNEL);
+	if (head == NULL)
+		return NULL;
+
+	INIT_LIST_HEAD(&(head->list));
+	return head;
+}
+
+int pid_ns_list_add_tail(struct pid_ns_list *head, struct pid_namespace *pid_ns)
+{
+	struct pid_ns_list *new_node;
+
+	new_node = kmalloc(sizeof(struct pid_ns_list), GFP_KERNEL);
+	if (new_node == NULL)
+		return -1;
+
+	memset(new_node, 0, sizeof(struct pid_ns_list));
+	new_node->pid_ns = pid_ns;
+
+	list_add_tail(&(new_node->list), &(head->list));
+	return 0;
+}
+
+int pid_ns_list_free(struct pid_ns_list *head)
+{
+	struct pid_ns_list *p, *q;
+
+    list_for_each_entry_safe(p, q, &(head->list), list) {
+		list_del(&(p->list));
+		kfree(p);
+    }
+	list_del(&(head->list));
+	kfree(head);
+	return 0;
+}
+
+int pid_ns_list_delete_elem(struct pid_ns_list *head, struct pid_namespace *pid_ns)
+{
+	struct pid_ns_list *pos;
+
+	if (list_empty(&(head->list)))
+		return -1;
+
+	list_for_each_entry(pos,  &(head->list), list) {
+		if (pos->pid_ns == pid_ns) {
+			list_del(&(pos->list));
+			kfree(pos);
+			return 0;
+		}
+	}
+	return 0;
+}
+
+int pid_ns_list_insert(struct pid_ns_list *head, struct pid_namespace *pid_ns)
+{
+	int err;
+	struct pid_ns_list *pos;
+
+	if (!head)
+		return 0;
+	list_for_each_entry(pos, &(head->list), list) {
+		if (pos->pid_ns == pid_ns)
+			return 0;
+	}
+	err = pid_ns_list_add_tail(head, pid_ns);
+	return err;
+}
+
+extern int zombie_cgrp_insert_ns(struct task_struct *tsk, struct cgroup *cgrp)
+{
+	struct pid *pid;
+	struct pid_namespace *ns;
+	int ret;
+
+	pid = tsk->thread_pid;
+	if (!pid)
+		return 0;
+
+	ns = pid->numbers[pid->level].ns;
+	if (!ns)
+		return 0;
+
+	if (!cgrp)
+		return 0;
+
+	if (!cgrp->pid_list)
+		return 0;
+
+	ret = pid_ns_list_insert(cgrp->pid_list, ns);
+	return ret;
+}
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 1c240d2c9..c77f9fc15 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1645,6 +1645,13 @@  static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "recycle_max_limit",
+		.data		= &recycle_max_limit,
+		.maxlen		= sizeof(long),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 #ifdef CONFIG_PROC_SYSCTL
 	{
 		.procname	= "tainted",
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2eee092f8..db83f660c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -63,6 +63,7 @@ 
 #include <linux/resume_user_mode.h>
 #include <linux/psi.h>
 #include <linux/seq_buf.h>
+#include <linux/kernfs.h>
 #include "internal.h"
 #include <net/sock.h>
 #include <net/ip.h>
@@ -72,6 +73,8 @@ 
 #include <linux/uaccess.h>
 
 #include <trace/events/vmscan.h>
+long recycle_max_limit = RECYCLE_MEM_MAX;
+EXPORT_SYMBOL(recycle_max_limit);
 
 struct cgroup_subsys memory_cgrp_subsys __read_mostly;
 EXPORT_SYMBOL(memory_cgrp_subsys);
@@ -2098,7 +2101,7 @@  void folio_memcg_lock(struct folio *folio)
 	 * The RCU lock is held throughout the transaction.  The fast
 	 * path can get away without acquiring the memcg->move_lock
 	 * because page moving starts with an RCU grace period.
-         */
+     */
 	rcu_read_lock();
 
 	if (mem_cgroup_disabled())
@@ -7814,3 +7817,214 @@  static int __init mem_cgroup_swap_init(void)
 subsys_initcall(mem_cgroup_swap_init);
 
 #endif /* CONFIG_SWAP */
+
+int _obj_cgroup_move(struct kmem_cache *s, struct slab *slab,
+					void **p, int objects, struct obj_cgroup *to)
+{
+	struct obj_cgroup **objcgs;
+	int i;
+
+	objcgs = slab_objcgs(slab);
+	if (!objcgs)
+		return 0;
+
+	for (i = 0; i < objects; i++) {
+		struct obj_cgroup *objcg;
+		unsigned int off;
+
+		off = obj_to_index(s, slab, p[i]);
+		objcg = objcgs[off];
+		if (!objcg)
+			continue;
+		objcgs[off] = to;
+		obj_cgroup_uncharge(objcg, obj_full_size(s));
+		obj_cgroup_charge(to, GFP_KERNEL, obj_full_size(s));
+		mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
+				-obj_full_size(s));
+		mod_objcg_state(to, slab_pgdat(slab), cache_vmstat_idx(s),
+				obj_full_size(s));
+	}
+	return 0;
+}
+
+int kmem_cache_move(struct kmem_cache *s, void *addr, int objects, struct mem_cgroup *to)
+{
+	s = cache_from_obj(s, addr);
+	return _obj_cgroup_move(s, virt_to_slab(addr), &addr, objects, to->objcg);
+}
+int kmem_folio_move(struct folio *folio,  struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	struct obj_cgroup *objcg_to, *objcg_from;
+	int ret;
+	unsigned int order;
+	unsigned long space;
+
+	objcg_to = to->objcg;
+	if (!from || !to)
+		return 0;
+
+	if (!objcg_to)
+		return 0;
+
+	objcg_from = from->objcg;
+	if (!objcg_from)
+		return 0;
+
+	space = mem_cgroup_margin(to);
+	order = folio_order(folio);
+	if (space < (1 << order))
+		return -1;
+
+	__mem_cgroup_uncharge(folio);
+	ret = charge_memcg(folio, to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM);
+
+	return ret;
+}
+
+int kmem_cgroup_move(void *addr, struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	struct folio *folio;
+	struct obj_cgroup *objcg_to, *objcg_from;
+	struct slab *slab;
+	struct kmem_cache *s;
+	unsigned int order;
+	int ret = 0;
+
+	folio = virt_to_folio(addr);
+
+	if (folio == NULL)
+		return 0;
+
+	objcg_to = to->objcg;
+	objcg_from = from->objcg;
+	if (objcg_to == NULL || objcg_from == NULL)
+		return 0;
+
+	if (folio_test_slab(folio)) {
+		slab = folio_slab(folio);
+		if (slab == NULL)
+			return 0;
+
+		s = slab->slab_cache;
+		if (s == NULL)
+			return 0;
+
+		ret = _obj_cgroup_move(s, virt_to_slab((void *)addr), &addr, 1, objcg_to);
+	} else {
+		order = folio_order(folio);
+
+		ret = obj_cgroup_charge_pages(objcg_to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, 1 << order);
+		if (!ret) {
+			folio->memcg_data = (unsigned long)objcg_to |
+				MEMCG_DATA_KMEM;
+			obj_cgroup_uncharge_pages(objcg_from, 1 << order);
+			return 0;
+		}
+	}
+	return ret;
+}
+EXPORT_SYMBOL(kmem_cgroup_move);
+extern int obj_cgroup_move(void *addr, struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	struct vm_struct *area;
+	int i, ret;
+
+	if (is_vmalloc_addr(addr)) {
+		area = find_vm_area(addr);
+		if (area == NULL)
+			return 0;
+
+		for (i = 0; i < area->nr_pages; i++) {
+			struct page *page = area->pages[i];
+
+			if (!page)
+				return 0;
+
+			try_charge(to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, 1);
+			mem_cgroup_move_account(page, 0, from, to);
+		}
+	} else {
+		ret = kmem_cgroup_move(addr, from, to);
+	}
+	return ret;
+}
+
+int memory_max_set(struct mem_cgroup *memcg, unsigned long max_pages)
+{
+	if (memcg->memory.max > recycle_max_limit) {
+		xchg(&memcg->memory.max, max_pages);
+		return 0;
+	}
+	if (max_pages > memcg->memory.max)
+		xchg(&memcg->memory.max, max_pages);
+	return 0;
+}
+int page_set_shadow_max(struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	unsigned long from_max = from->memory.max;
+	unsigned long to_max = to->memory.max;
+	unsigned long tmp_from = recycle_max_limit;
+	unsigned long tmp_to = recycle_max_limit;
+
+	if (to_max <  recycle_max_limit)
+		tmp_to = to_max;
+	if (from_max < recycle_max_limit)
+		tmp_from = from_max;
+	memory_max_set(to, max(tmp_from, tmp_to));
+	return 0;
+}
+/**
+ * caculate_space - Calculate how much memory the given object occupies.
+ * @addr: the address of the given object.
+ *
+ * Returns the maximum amount of memory @addr occupies, in
+ * bytes.
+ */
+unsigned long caculate_space(void *addr)
+{
+	struct vm_struct *area;
+	unsigned long space;
+
+	if (is_vmalloc_addr(addr)) {
+		area = find_vm_area(addr);
+		if (area == NULL)
+			return 0;
+		space = area->nr_pages * PAGE_SIZE;
+	} else {
+		struct folio *folio;
+		struct slab *slab;
+		struct kmem_cache *s;
+		unsigned int order;
+
+		folio = virt_to_folio(addr);
+
+		if (folio == NULL)
+			return 0;
+
+		if (folio_test_slab(folio)) {
+			slab = folio_slab(folio);
+			if (slab == NULL)
+				return 0;
+
+			s = slab->slab_cache;
+			if (s == NULL)
+				return 0;
+
+			space = obj_full_size(s);
+		} else {
+			order = folio_order(folio);
+			space = (1 << order) * PAGE_SIZE;
+		}
+	}
+	return space;
+}
+
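+/*
+ * check_ballon_limits - return 1 when the balloon cgroup's remaining margin
+ * is smaller than @mem_delta bytes, 0 otherwise.
+ */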
+int check_ballon_limits(unsigned long mem_delta)
+{
+	struct mem_cgroup *memcg_ballon = get_cg_from_cgrp(ballon_cgroup, "memory");
+
+	if (!memcg_ballon)
+		return 0;
+
+	if (mem_cgroup_margin(memcg_ballon) * PAGE_SIZE < mem_delta)
+		return 1;
+
+	return 0;
+}
\ No newline at end of file
diff --git a/mm/shmem.c b/mm/shmem.c
index 0005ab2c2..53e2dd795 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -39,6 +39,8 @@ 
 #include <linux/fs_parser.h>
 #include <linux/swapfile.h>
 #include <linux/iversion.h>
+#include <linux/memcontrol.h>
+#include <linux/ipc_namespace.h>
 #include "swap.h"
 
 static struct vfsmount *shm_mnt;
@@ -2972,9 +2974,9 @@  static int shmem_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 		       struct dentry *dentry, umode_t mode)
 {
 	int error;
-
-	if ((error = shmem_mknod(&init_user_ns, dir, dentry,
-				 mode | S_IFDIR, 0)))
+	error = shmem_mknod(&init_user_ns, dir, dentry,
+				 mode | S_IFDIR, 0);
+	if (error)
 		return error;
 	inc_nlink(dir);
 	return 0;
@@ -4349,3 +4351,65 @@  struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 #endif
 }
 EXPORT_SYMBOL_GPL(shmem_read_mapping_page_gfp);
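+
+/*
+ * shm_total_pages - walk @inode's page cache and count the folios currently
+ * backing the shmem segment.
+ */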
+unsigned long shm_total_pages(struct inode *inode)
+{
+	struct address_space *mapping = inode->i_mapping;
+	loff_t lstart = 0;
+	pgoff_t start = (lstart + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	pgoff_t end = -1;
+	struct folio_batch fbatch;
+	struct folio *folio;
+	pgoff_t indices[PAGEVEC_SIZE];
+	pgoff_t index;
+	int i;
+	unsigned long pages = 0;
+
+	folio_batch_init(&fbatch);
+	index = start;
+	while (index < end && find_lock_entries(mapping, &index, end - 1,
+			&fbatch, indices)) {
+		for (i = 0; i < folio_batch_count(&fbatch); i++) {
+			folio = fbatch.folios[i];
+			if (!folio)
+				continue;
+			folio_unlock(folio);
+			folio_put(folio);
+			pages += 1;
+		}
+		fbatch.nr = 0;
+	}
+	return pages;
+}
+
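+/*
+ * ipc_shm_folio_move - recharge every folio backing a SysV shm segment from
+ * @from to @to; after the first error the walk continues but stops moving.
+ */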
+int ipc_shm_folio_move(struct inode *inode, struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	struct address_space *mapping = inode->i_mapping;
+	loff_t lstart = 0;
+	pgoff_t start = (lstart + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	pgoff_t end = -1;
+	struct folio_batch fbatch;
+	pgoff_t indices[PAGEVEC_SIZE];
+	struct folio *folio;
+	pgoff_t index;
+	int i;
+	int err = 0;
+
+	folio_batch_init(&fbatch);
+	index = start;
+	while (index < end && find_lock_entries(mapping, &index, end - 1,
+			&fbatch, indices)) {
+		for (i = 0; i < folio_batch_count(&fbatch); i++) {
+			folio = fbatch.folios[i];
+			if (!folio)
+				continue;
+
+			/* Move the charge before dropping the lock and reference. */
+			if (err >= 0)
+				err = kmem_folio_move(folio, from, to);
+			folio_unlock(folio);
+			folio_put(folio);
+		}
+		fbatch.nr = 0;
+	}
+	return err;
+}
diff --git a/net/core/dev.c b/net/core/dev.c
index fce980d53..9d558c006 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -150,6 +150,12 @@ 
 #include <linux/pm_runtime.h>
 #include <linux/prandom.h>
 #include <linux/once_lite.h>
+#include <linux/memcontrol.h>
+#include <linux/ref_tracker.h>
+#include <linux/inetdevice.h>
+#include <linux/cgroup.h>
+
+#include <net/neighbour.h>
 
 #include "dev.h"
 #include "net-sysfs.h"
@@ -166,7 +172,7 @@  static int call_netdevice_notifiers_extack(unsigned long val,
 					   struct net_device *dev,
 					   struct netlink_ext_ack *extack);
 static struct napi_struct *napi_by_id(unsigned int napi_id);
-
+static int net_device_move(struct net_device *dev, struct mem_cgroup *from, struct mem_cgroup *to);
 /*
  * The @dev_base_head list is protected by @dev_base_lock and the rtnl
  * semaphore.
@@ -381,6 +387,24 @@  static void netdev_name_node_alt_flush(struct net_device *dev)
 static void list_netdevice(struct net_device *dev)
 {
 	struct net *net = dev_net(dev);
+	struct mem_cgroup *memcg;
+	struct cgroup *cgrp;
+
+	/* Tag @dev with the creating task's cgroup and account its netns. */
+	memcg = mem_cgroup_from_task(current);
+	if (memcg) {
+		cgrp = memcg->css.cgroup;
+		if (cgrp) {
+			dev->cgroup = cgrp;
+			if (cgrp->net_list)
+				net_ns_list_insert(cgrp->net_list, net, 0);
+		}
+	}
 
 	ASSERT_RTNL();
 
@@ -717,7 +741,48 @@  int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
 	return ret;
 }
 EXPORT_SYMBOL_GPL(dev_fill_forward_path);
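+
+/*
+ * net_device_size - estimate, in bytes, the memory attributable to @dev:
+ * its queues, ifalias, napi skbs, the device struct itself, and its
+ * IPv4/IPv6 per-device state.
+ */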
+unsigned long net_device_size(struct net_device *dev)
+{
+	char *addr;
+	struct inet6_dev *i6_dev;
+	struct in_device *i4_dev;
+	unsigned long space = 0;
+
+	if (!dev)
+		return 0;
+
+	addr = (char *)dev - dev->padded;
 
+	space += caculate_space(dev->_tx);
+	space += caculate_space(dev->_rx);
+	if (rcu_dereference_protected(dev->ingress_queue, 1))
+		space += caculate_space(rcu_dereference_protected(dev->ingress_queue, 1));
+
+	if (rcu_access_pointer(dev->ifalias))
+		space += caculate_space(rcu_access_pointer(dev->ifalias));
+
+	/* dev->napi_list is embedded, so walk it unconditionally. */
+	{
+		struct napi_struct *p, *n;
+
+		/* Each napi instance pins one skbuff_head_cache object. */
+		list_for_each_entry_safe(p, n, &dev->napi_list, dev_list)
+			space += kmem_cache_size(skbuff_head_cache);
+	}
+	space += caculate_space(addr);
+
+	i4_dev = dev->ip_ptr;
+	if (i4_dev) {
+		space += neigh_size(i4_dev->arp_parms);
+		space += devinet_size(&i4_dev->cnf);
+	}
+
+	i6_dev = __in6_dev_get(dev);
+	if (i6_dev)
+		space += ipv6_size(i6_dev);
+	return space;
+}
+
 /**
  *	__dev_get_by_name	- find a device by its name
  *	@net: the applicable net namespace
@@ -733,9 +798,38 @@  EXPORT_SYMBOL_GPL(dev_fill_forward_path);
 struct net_device *__dev_get_by_name(struct net *net, const char *name)
 {
 	struct netdev_name_node *node_name;
+	struct net_device *dev;
+	struct net *ns;
+	struct mem_cgroup *memcg_owner, *memcg_recycle;
+	struct cgroup *cgrp_owner;
 
 	node_name = netdev_name_node_lookup(net, name);
-	return node_name ? node_name->dev : NULL;
+	if (!node_name)
+		return NULL;
+
+	dev = node_name->dev;
+
+	if (ballon_cgroup && dev->cgroup && dev->cgroup == ballon_cgroup) {
+		memcg_owner = mem_cgroup_from_task(current);
+		if (!memcg_owner)
+			return NULL;
+
+		cgrp_owner = memcg_owner->css.cgroup;
+		if (!cgrp_owner)
+			return NULL;
+
+		memcg_recycle = get_cg_from_cgrp(ballon_cgroup, "memory");
+		if (!memcg_recycle)
+			return NULL;
+
+		/* The balloon entry is keyed by the lookup netns. */
+		net_ns_list_insert(ballon_cgroup->net_list, net, -net_device_size(dev));
+		net_device_move(dev, memcg_recycle, memcg_owner);
+		dev->cgroup = cgrp_owner;
+		ns = current->nsproxy->net_ns;
+		net_ns_list_insert(cgrp_owner->net_list, ns, 0);
+	}
+
+	return dev;
 }
 EXPORT_SYMBOL(__dev_get_by_name);
 
@@ -799,11 +893,34 @@  EXPORT_SYMBOL(dev_get_by_name);
 struct net_device *__dev_get_by_index(struct net *net, int ifindex)
 {
 	struct net_device *dev;
+	struct mem_cgroup *memcg_owner, *memcg_recycle;
+	struct net *ns;
+	struct cgroup *cgrp_owner;
 	struct hlist_head *head = dev_index_hash(net, ifindex);
 
 	hlist_for_each_entry(dev, head, index_hlist)
-		if (dev->ifindex == ifindex)
+		if (dev->ifindex == ifindex) {
+			if (ballon_cgroup && dev->cgroup && dev->cgroup == ballon_cgroup) {
+				memcg_owner = mem_cgroup_from_task(current);
+				if (!memcg_owner)
+					return NULL;
+
+				cgrp_owner = memcg_owner->css.cgroup;
+				if (!cgrp_owner)
+					return NULL;
+
+				memcg_recycle = get_cg_from_cgrp(ballon_cgroup, "memory");
+				if (!memcg_recycle)
+					return NULL;
+
+				net_ns_list_insert(ballon_cgroup->net_list, net, -net_device_size(dev));
+				net_device_move(dev, memcg_recycle, memcg_owner);
+				dev->cgroup = cgrp_owner;
+				ns = current->nsproxy->net_ns;
+				net_ns_list_insert(cgrp_owner->net_list, ns, 0);
+			}
 			return dev;
+		}
 
 	return NULL;
 }
@@ -11435,3 +11552,121 @@  static int __init net_dev_init(void)
 }
 
 subsys_initcall(net_dev_init);
+
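+/*
+ * _net_device_move - recharge every allocation hanging off @dev (queues,
+ * ifalias, napi skbs, the device struct, IPv4/IPv6 per-device state) from
+ * @from to @to.
+ */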
+static int _net_device_move(struct net_device *dev, struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	int err;
+	char *addr = (char *)dev - dev->padded;
+	struct inet6_dev *i6_dev;
+	struct in_device *i4_dev;
+
+	err = obj_cgroup_move(dev->_tx, from, to);
+	if (err < 0)
+		return err;
+	err = obj_cgroup_move(dev->_rx, from, to);
+	if (err < 0)
+		return err;
+
+	if (rcu_dereference_protected(dev->ingress_queue, 1))
+		err = kmem_cgroup_move(rcu_dereference_protected(dev->ingress_queue, 1), from, to);
+
+	if (rcu_access_pointer(dev->ifalias))
+		err = kmem_cgroup_move(rcu_access_pointer(dev->ifalias), from, to);
+
+	/* dev->napi_list is embedded, so walk it unconditionally. */
+	{
+		struct napi_struct *p, *n;
+		struct sk_buff *skb;
+
+		list_for_each_entry_safe(p, n, &dev->napi_list, dev_list) {
+			skb = p->skb;
+			err = kmem_cache_move(skbuff_head_cache, skb, 1, to);
+		}
+	}
+	err = obj_cgroup_move(addr, from, to);
+	if (err < 0)
+		return err;
+
+	i4_dev = dev->ip_ptr;
+	if (i4_dev) {
+		err = neigh_move(i4_dev->arp_parms, from, to);
+		err = devinet_move(&i4_dev->cnf, from, to);
+	}
+	if (err < 0)
+		return err;
+
+	i6_dev = __in6_dev_get(dev);
+	if (i6_dev)
+		err = ipv6_dev_move(i6_dev, from, to);
+
+	return err;
+}
+
+static int net_device_move(struct net_device *dev, struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	return _net_device_move(dev, from, to);
+}
+
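+/*
+ * net_device_move_check - park every device of @cgrp found in @ns in the
+ * balloon cgroup, recharging its memory there, so the memory stays
+ * accounted after the owning cgroup goes away.
+ */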
+int net_device_move_check(struct net *ns, struct cgroup *cgrp)
+{
+	struct net_device *ndev, *aux;
+	bool max_set = true;
+	unsigned long mem_volume = 0, total_mem = 0;
+	struct mem_cgroup *memcg_to, *memcg_from = get_cg_from_cgrp(cgrp, "memory");
+
+	if (!memcg_from)
+		return 0;
+
+	memcg_to = get_cg_from_cgrp(ballon_cgroup, "memory");
+	if (!memcg_to)
+		return 0;
+
+	for_each_netdev_safe(ns, ndev, aux) {
+		if (!ndev || ndev->cgroup != cgrp)
+			continue;
+
+		/* Widen the balloon's limit once, before the first move. */
+		if (max_set) {
+			page_set_shadow_max(memcg_from, memcg_to);
+			max_set = false;
+		}
+
+		mem_volume = net_device_size(ndev);
+		if (check_ballon_limits(mem_volume)) {
+			/* Not enough room left in the balloon: reclaim, then wait. */
+			free_highest_volume();
+			while (check_ballon_limits(mem_volume))
+				cond_resched();
+		}
+		ndev->cgroup = ballon_cgroup;
+		total_mem += mem_volume;
+		net_device_move(ndev, memcg_from, memcg_to);
+	}
+	net_ns_list_insert(ballon_cgroup->net_list, ns, total_mem);
+	return 0;
+}
+
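+/*
+ * net_device_free - unregister every device in @ns still parked in the
+ * balloon cgroup, skipping netns-local devices.
+ */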
+void net_device_free(struct net *ns)
+{
+	struct net_device *ndev, *aux;
+
+	LIST_HEAD(dev_kill_list);
+	rtnl_lock();
+
+	for_each_netdev_safe(ns, ndev, aux) {
+		if (ndev->cgroup != ballon_cgroup)
+			continue;
+		if (ndev->features & NETIF_F_NETNS_LOCAL)
+			continue;
+		if (ndev->rtnl_link_ops && ndev->rtnl_link_ops->dellink)
+			ndev->rtnl_link_ops->dellink(ndev, &dev_kill_list);
+		else
+			unregister_netdevice_queue(ndev, &dev_kill_list);
+	}
+	unregister_netdevice_many(&dev_kill_list);
+	rtnl_unlock();
+}
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 4edd2176e..1283ac941 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -26,6 +26,7 @@ 
 #endif
 #include <linux/times.h>
 #include <net/net_namespace.h>
+#include <linux/memcontrol.h>
 #include <net/neighbour.h>
 #include <net/arp.h>
 #include <net/dst.h>
@@ -3904,3 +3905,35 @@  static int __init neigh_init(void)
 }
 
 subsys_initcall(neigh_init);
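+
+/*
+ * neigh_size - size, in bytes, of the sysctl state attached to @neigh_p.
+ */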
+unsigned long neigh_size(struct neigh_parms *neigh_p)
+{
+	unsigned long space = 0;
+	struct neigh_sysctl_table *neigh_st;
+
+	if (!neigh_p)
+		return 0;
+
+	neigh_st = neigh_p->sysctl_table;
+	if (!neigh_st)
+		return 0;
+
+	space += caculate_space(neigh_st);
+	space += caculate_space(neigh_st->sysctl_header);
+	return space;
+}
+
+int neigh_move(struct neigh_parms *neigh_p, struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	int err;
+	struct neigh_sysctl_table *neigh_st;
+
+	if (!neigh_p)
+		return 0;
+
+	neigh_st = neigh_p->sysctl_table;
+	if (!neigh_st)
+		return 0;
+
+	err = kmem_cgroup_move(neigh_st, from, to);
+	if (err < 0)
+		return err;
+	err = kmem_cgroup_move(neigh_st->sysctl_header, from, to);
+	return err;
+}
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 7b69cf882..c0f76ad01 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -20,6 +20,7 @@ 
 #include <linux/sched/task.h>
 #include <linux/uidgid.h>
 #include <linux/cookie.h>
+#include <linux/cgroup-defs.h>
 
 #include <net/sock.h>
 #include <net/netlink.h>
@@ -1399,3 +1400,77 @@  const struct proc_ns_operations netns_operations = {
 	.owner		= netns_owner,
 };
 #endif
+
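+/*
+ * net_ns_list bookkeeping: one node per net namespace, tracking how much
+ * ballooned memory (residual_volume) is still attributed to it.
+ */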
+struct net_ns_list *net_ns_list_init(void)
+{
+	struct net_ns_list *head;
+
+	head = kmalloc(sizeof(*head), GFP_KERNEL);
+	if (!head)
+		return NULL;
+
+	INIT_LIST_HEAD(&head->list);
+	return head;
+}
+
+int net_ns_list_add_tail(struct net_ns_list *head, struct net *net_ns, long mem_delta)
+{
+	struct net_ns_list *new_node;
+
+	new_node = kzalloc(sizeof(*new_node), GFP_KERNEL);
+	if (!new_node)
+		return -ENOMEM;
+
+	new_node->net_ns = net_ns;
+	new_node->residual_volume = mem_delta;
+
+	list_add_tail(&new_node->list, &head->list);
+	return 0;
+}
+
+int net_ns_list_free(struct net_ns_list *head)
+{
+	struct net_ns_list *p, *q;
+
+	list_for_each_entry_safe(p, q, &head->list, list) {
+		list_del(&p->list);
+		kfree(p);
+	}
+	kfree(head);
+	return 0;
+}
+
+int net_ns_list_delete_elem(struct net_ns_list *head, struct net *net_ns)
+{
+	struct net_ns_list *pos, *q;
+
+	if (list_empty(&head->list))
+		return -1;
+
+	list_for_each_entry_safe(pos, q, &head->list, list) {
+		if (pos->net_ns == net_ns) {
+			list_del(&pos->list);
+			kfree(pos);
+		}
+	}
+	return 0;
+}
+
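+/*
+ * net_ns_list_insert - add @mem_delta to @net_ns's residual volume,
+ * appending a new node when the namespace is not yet on the list.
+ */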
+int net_ns_list_insert(struct net_ns_list *head, struct net *net_ns, long mem_delta)
+{
+	int err = 1;
+	struct net_ns_list *pos;
+
+	if (!head)
+		return 0;
+
+	list_for_each_entry(pos, &head->list, list) {
+		if (pos->net_ns == net_ns) {
+			pos->residual_volume += mem_delta;
+			err = 0;
+		}
+	}
+	if (err)
+		err = net_ns_list_add_tail(head, net_ns, mem_delta);
+	return err;
+}
\ No newline at end of file
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index b0acf6e19..488b8c7a4 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -53,6 +53,7 @@ 
 #endif
 #include <linux/kmod.h>
 #include <linux/netconf.h>
+#include <linux/memcontrol.h>
 
 #include <net/arp.h>
 #include <net/ip.h>
@@ -2793,3 +2794,37 @@  void __init devinet_init(void)
 	rtnl_register(PF_INET, RTM_GETNETCONF, inet_netconf_get_devconf,
 		      inet_netconf_dump_devconf, 0);
 }
+
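+/*
+ * devinet_size - size, in bytes, of the sysctl state attached to @ipv4_p.
+ */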
+unsigned long devinet_size(struct ipv4_devconf *ipv4_p)
+{
+	unsigned long space = 0;
+	struct devinet_sysctl_table *devinet_st;
+
+	if (!ipv4_p)
+		return 0;
+
+	devinet_st = ipv4_p->sysctl;
+	if (!devinet_st)
+		return 0;
+
+	space += caculate_space(devinet_st);
+	space += caculate_space(devinet_st->sysctl_header);
+	return space;
+}
+
+int devinet_move(struct ipv4_devconf *ipv4_p, struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	int err;
+	struct devinet_sysctl_table *devinet_st;
+
+	if (!ipv4_p)
+		return 0;
+
+	devinet_st = ipv4_p->sysctl;
+	if (!devinet_st)
+		return 0;
+
+	err = kmem_cgroup_move(devinet_st, from, to);
+	if (err < 0)
+		return err;
+	err = kmem_cgroup_move(devinet_st->sysctl_header, from, to);
+	return err;
+}
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index faa47f9ea..d4bb74bc9 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -90,7 +90,7 @@ 
 #include <linux/seq_file.h>
 #include <linux/export.h>
 #include <linux/ioam6.h>
-
+#include <linux/memcontrol.h>
 #define	INFINITY_LIFE_TIME	0xFFFFFFFF
 
 #define IPV6_MAX_STRLEN \
@@ -7386,3 +7386,59 @@  void addrconf_cleanup(void)
 
 	destroy_workqueue(addrconf_wq);
 }
+
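+/*
+ * ipv6_size - size, in bytes, of @in_dev and the neighbour, stats and
+ * sysctl state hanging off it.
+ */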
+unsigned long ipv6_size(struct inet6_dev *in_dev)
+{
+	unsigned long space = 0;
+	struct ctl_table *table;
+	struct ipv6_devconf *p = &in_dev->cnf;
+
+	space += caculate_space(in_dev);
+	if (!in_dev->nd_parms)
+		return space;
+
+	space += neigh_size(in_dev->nd_parms);
+	if (!in_dev->stats.icmpv6msgdev)
+		return space;
+
+	space += caculate_space(in_dev->stats.icmpv6msgdev);
+	if (!p->sysctl_header)
+		return space;
+
+	space += caculate_space(p->sysctl_header);
+	table = p->sysctl_header->ctl_table_arg;
+	if (table)
+		space += caculate_space(table);
+
+	return space;
+}
+
+int ipv6_dev_move(struct inet6_dev *in_dev, struct mem_cgroup *from, struct mem_cgroup *to)
+{
+	int err;
+	struct ctl_table *table;
+	struct ipv6_devconf *p = &in_dev->cnf;
+
+	err = kmem_cgroup_move(in_dev, from, to);
+	if (!in_dev->nd_parms)
+		return 0;
+
+	err = neigh_move(in_dev->nd_parms, from, to);
+	if (!in_dev->stats.icmpv6msgdev)
+		return 0;
+
+	err = kmem_cgroup_move(in_dev->stats.icmpv6msgdev, from, to);
+	if (!p->sysctl_header)
+		return 0;
+
+	err = kmem_cgroup_move(p->sysctl_header, from, to);
+	table = p->sysctl_header->ctl_table_arg;
+	if (table)
+		err = kmem_cgroup_move(table, from, to);
+
+	return err;
+}
\ No newline at end of file