From patchwork Thu May 20 08:08:58 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yutian Yang
X-Patchwork-Id: 12269393
From: Yutian Yang
To: mhocko@kernel.org
Cc: hannes@cmpxchg.org, vdavydov.dev@gmail.com, shenwenbo@zju.edu.cn,
 cgroups@vger.kernel.org, linux-mm@kvack.org, ytyang@zju.edu.cn,
 mhocko@suse.com, Yutian Yang
Subject: [PATCH] mm: fix unaccounted time namespace objects
Date: Thu, 20 May 2021 17:08:58 +0900
Message-Id: <20210520080858.25450-1-nglaive@gmail.com>
X-Mailer: git-send-email 2.27.0

This patch adds memcg accounting for time namespace objects. We have
confirmed that unaccounted namespace objects allow memory usage to
exceed the memcg limit.

Our responses to the common concerns about this patch follow.

On whether the concern is practical: we have confirmed that repeatedly
creating new namespaces can push memory usage past the memcg limit.
Although the number of namespaces can be capped by a per-user quota
(e.g., max_time_namespaces), relying on per-user quotas to limit memory
usage is unsafe and impractical, because users may have their own
considerations when setting these limits. In fact, limiting memory
usage is more fundamental than limiting the number of various kernel
objects. I believe this is also why fd tables and pipe buffers are
accounted by memcg even though they are also subject to per-user quotas.
The same reasoning applies to the limits imposed by the pid cgroup
controller. Moreover, the net and uts namespaces are already properly
accounted while the other namespaces are not, which is inconsistent.

As for the other unaccounted allocations (proc_alloc_inum, vvar_page and
likely others), we have not reached them yet because our detection tool
reported many results that require considerable manual effort to go
through. To me, it seems that vvar_page also needs to be patched.

Lastly, our work is based on a detection tool, and we only report
missing-charge sites that we have manually confirmed to be triggerable
from syscalls. Results that are obviously unexploitable, such as the
uncharged ldt_struct, which is allocated per process, are filtered out.
We would like to keep contributing to memcg and plan to submit more
patches in the future.

I reported this patch earlier but did not Cc the public mailing list at
that time. I am therefore starting a new thread and copying our previous
discussion below:

> -----Original Messages-----
> From: "Michal Hocko"
> Sent Time: 2021-04-16 14:29:52 (Friday)
> To: "Yutian Yang"
> Cc: tglx@linutronix.de, "shenwenbo@zju.edu.cn" , "vdavydov.dev@gmail.com"
> Subject: Re: User-controllable memcg-unaccounted objects of time namespace
>
> Thank you for this and other reports which are trying to track memcg
> unaccounted objects. I have few remarks/questions.
>
>
> On Thu 15-04-21 21:29:57, Yutian Yang wrote:
> > Hi, our team has found bugs in time namespace module on Linux kernel v5.10.19, which leads to user-controllable memcg-unaccounted objects.
> > They are caused by the code snippets listed below:
> >
> > /*--------------- kernel/time/namespace.c --------------------*/
> > ......
> > 91 ns = kmalloc(sizeof(*ns), GFP_KERNEL);
> > 92 if (!ns)
> > 93 	goto fail_dec;
> > ......
> > /*----------------------------- end -------------------------------*/
> >
> >
> > The code at line 91 could be triggered by syscall clone if
> > CLONE_NEWTIME flag is set in the parameter. A user could repeatedly
> > make the clone syscall and trigger the bugs to occupy more and
> > more unaccounted memory. In fact, time namespaces objects could be
> > allocated by users and are also controllable by users. As a result,
> > they need to be accounted and we suggest the following patch:
>
> Is this a practical concern? I am not really deeply familiar with
> namespaces but isn't there any cap on how many of them can be created by
> user? If not, isn't that contained by the pid cgroup controller? If even
> that is not the case, care to explain why?
>
> You are referring to struct time_namespace above (that is 88B) but I can
> see there are other unaccounted allocations (proc_alloc_inum, vvar_page
> and likely others) so why the above is more important than those?
>
> Btw. a similar feedback applies to other reports similar to this one. I
> assume you have some sort of tool to explore those potential run aways
> and that is really great but it would be really helpful and highly
> appreciated to analyze those reports and try to provide some sort of
> risk assessment.
>
> Thanks!
> --
> Michal Hocko
> SUSE Labs

Thanks!

Yutian Yang,
Zhejiang University

Signed-off-by: Yutian Yang
---
 kernel/time/namespace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c
index afc65e6be..00c20f7fd 100644
--- a/kernel/time/namespace.c
+++ b/kernel/time/namespace.c
@@ -88,13 +88,13 @@ static struct time_namespace *clone_time_ns(struct user_namespace *user_ns,
 		goto fail;
 
 	err = -ENOMEM;
-	ns = kmalloc(sizeof(*ns), GFP_KERNEL);
+	ns = kmalloc(sizeof(*ns), GFP_KERNEL_ACCOUNT);
 	if (!ns)
 		goto fail_dec;
 
 	kref_init(&ns->kref);
 
-	ns->vvar_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	ns->vvar_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 	if (!ns->vvar_page)
 		goto fail_free;
 
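
For readers less familiar with memcg charging, the effect of the flag
change can be illustrated with a small userspace toy model. This is only
a sketch and not kernel code: struct toy_memcg, the TOY_GFP_* values,
toy_alloc() and toy_alloc_many() are hypothetical names invented for
illustration, and the sizes used with them are arbitrary. It assumes
only the behaviour described above: an accounted allocation is charged
against the group's limit and refused once the limit would be exceeded,
while an unaccounted allocation is never charged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GFP flags; these are NOT the kernel's values. */
enum toy_gfp { TOY_GFP_KERNEL = 0, TOY_GFP_KERNEL_ACCOUNT = 1 };

/* Hypothetical per-group counters standing in for a memcg. */
struct toy_memcg {
	size_t limit;     /* bytes the group may be charged */
	size_t charged;   /* bytes charged so far */
	size_t uncharged; /* bytes allocated for the group but never charged */
};

/*
 * Model one allocation request. Accounted requests are charged and
 * refused once the limit would be exceeded; unaccounted requests
 * always succeed and never touch the charge counter.
 */
static bool toy_alloc(struct toy_memcg *cg, size_t size, enum toy_gfp gfp)
{
	assert(cg->charged <= cg->limit);

	if (gfp == TOY_GFP_KERNEL_ACCOUNT) {
		if (size > cg->limit - cg->charged)
			return false; /* would exceed the limit */
		cg->charged += size;
		return true;
	}

	cg->uncharged += size; /* invisible to the limit check */
	return true;
}

/* Repeat the same request `count` times; return how many succeeded. */
static size_t toy_alloc_many(struct toy_memcg *cg, size_t size,
			     size_t count, enum toy_gfp gfp)
{
	size_t ok = 0;

	for (size_t i = 0; i < count; i++)
		if (toy_alloc(cg, size, gfp))
			ok++;
	return ok;
}
```

In this model, calling toy_alloc_many() with TOY_GFP_KERNEL lets the
uncharged total grow without bound while charged stays at zero, which
mirrors the situation the patch addresses. With TOY_GFP_KERNEL_ACCOUNT
the same loop stops succeeding as soon as the limit is reached.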