From patchwork Wed Mar 3 05:59:13 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12112963
From: Muchun Song
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org, akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, Muchun Song
Subject: [PATCH v2 1/5] mm: memcontrol: introduce obj_cgroup_{un}charge_page
Date: Wed, 3 Mar 2021 13:59:13 +0800
Message-Id: <20210303055917.66054-2-songmuchun@bytedance.com>
In-Reply-To: <20210303055917.66054-1-songmuchun@bytedance.com>
References: <20210303055917.66054-1-songmuchun@bytedance.com>

The unit of slab object charging is bytes, while the unit of kmem page
charging is PAGE_SIZE. If we want to reuse the obj_cgroup APIs to charge
the kmem pages, we have to pass PAGE_SIZE (as the third parameter) to
obj_cgroup_charge(). Because the size is already a whole number of pages,
we can skip touching the objcg stock. So obj_cgroup_{un}charge_page() are
introduced to charge in units of pages. In a later patch, we can also
reuse these two helpers to charge or uncharge a number of kernel pages
to an object cgroup. This is just code movement without any functional
changes.
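For illustration, a minimal (untested) sketch of the intended calling
pattern for the two helpers; it assumes the caller already holds a
reference on the objcg, e.g. obtained via get_obj_cgroup_from_current():

        /* Illustrative only: charge/uncharge an order-2 kmem allocation. */
        static int example_charge(struct obj_cgroup *objcg, gfp_t gfp)
        {
                unsigned int nr_pages = 1 << 2;        /* order-2 */
                int ret;

                ret = obj_cgroup_charge_page(objcg, gfp, nr_pages);
                if (ret)
                        return ret;

                /* ... the allocation lives here ... */

                obj_cgroup_uncharge_page(objcg, nr_pages);
                return 0;
        }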
Signed-off-by: Muchun Song
---
 mm/memcontrol.c | 46 +++++++++++++++++++++++++++++++---------------
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 845eec01ef9d..faae16def127 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3056,6 +3056,34 @@ static void memcg_free_cache_id(int id)
         ida_simple_remove(&memcg_cache_ida, id);
 }
 
+static inline void obj_cgroup_uncharge_page(struct obj_cgroup *objcg,
+                                            unsigned int nr_pages)
+{
+        rcu_read_lock();
+        __memcg_kmem_uncharge(obj_cgroup_memcg(objcg), nr_pages);
+        rcu_read_unlock();
+}
+
+static int obj_cgroup_charge_page(struct obj_cgroup *objcg, gfp_t gfp,
+                                  unsigned int nr_pages)
+{
+        struct mem_cgroup *memcg;
+        int ret;
+
+        rcu_read_lock();
+retry:
+        memcg = obj_cgroup_memcg(objcg);
+        if (unlikely(!css_tryget(&memcg->css)))
+                goto retry;
+        rcu_read_unlock();
+
+        ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+
+        css_put(&memcg->css);
+
+        return ret;
+}
+
 /**
  * __memcg_kmem_charge: charge a number of kernel pages to a memcg
  * @memcg: memory cgroup to charge
@@ -3180,11 +3208,8 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
                 unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
                 unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
 
-                if (nr_pages) {
-                        rcu_read_lock();
-                        __memcg_kmem_uncharge(obj_cgroup_memcg(old), nr_pages);
-                        rcu_read_unlock();
-                }
+                if (nr_pages)
+                        obj_cgroup_uncharge_page(old, nr_pages);
 
                 /*
                  * The leftover is flushed to the centralized per-memcg value.
@@ -3242,7 +3267,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 
 int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 {
-        struct mem_cgroup *memcg;
         unsigned int nr_pages, nr_bytes;
         int ret;
 
@@ -3259,24 +3283,16 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
          * refill_obj_stock(), called from this function or
          * independently later.
          */
-        rcu_read_lock();
-retry:
-        memcg = obj_cgroup_memcg(objcg);
-        if (unlikely(!css_tryget(&memcg->css)))
-                goto retry;
-        rcu_read_unlock();
-
         nr_pages = size >> PAGE_SHIFT;
         nr_bytes = size & (PAGE_SIZE - 1);
         if (nr_bytes)
                 nr_pages += 1;
 
-        ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+        ret = obj_cgroup_charge_page(objcg, gfp, nr_pages);
         if (!ret && nr_bytes)
                 refill_obj_stock(objcg, PAGE_SIZE - nr_bytes);
 
-        css_put(&memcg->css);
         return ret;
 }

From patchwork Wed Mar 3 05:59:14 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12112965
From: Muchun Song
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org, akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, Muchun Song
Subject: [PATCH v2 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
Date: Wed, 3 Mar 2021 13:59:14 +0800
Message-Id: <20210303055917.66054-3-songmuchun@bytedance.com>
In-Reply-To: <20210303055917.66054-1-songmuchun@bytedance.com>
References: <20210303055917.66054-1-songmuchun@bytedance.com>

We want to reuse the obj_cgroup APIs to charge the kmem pages. If we do
that, we should store an object cgroup pointer in page->memcg_data for
the kmem pages. As a result, page->memcg_data can have 3 different
meanings:

1) For slab pages, page->memcg_data points to an object cgroups vector.

2) For kmem pages (excluding slab pages), page->memcg_data points to an
   object cgroup.

3) For user pages (e.g. LRU pages), page->memcg_data points to a memory
   cgroup.

Currently we always get the memory cgroup associated with a page via
page_memcg() or page_memcg_rcu(). page_memcg_check() is special: it has
to be used in cases when it is not known whether a page has an
associated memory cgroup pointer or an object cgroups vector. Because
page->memcg_data of a kmem page will no longer point to a memory cgroup
after a later patch, page_memcg() and page_memcg_rcu() can no longer be
applied to kmem pages. So this patch makes page_memcg() and
page_memcg_rcu() no longer apply to kmem pages. The behavior of
page_memcg_check() is unchanged: it remains applicable to kmem pages.

In the end, there are 3 helpers to get the memcg associated with a page.
Usage is as follows:

1) Get the memory cgroup associated with a non-kmem page (e.g. LRU
   pages):
   - page_memcg()
   - page_memcg_rcu()

2) Get the memory cgroup associated with a page when it is not known
   whether the page has an associated memory cgroup pointer or an
   object cgroups vector. Returns NULL for slab pages or uncharged
   pages; otherwise returns the memory cgroup for charged pages
   (e.g. kmem pages, LRU pages):
   - page_memcg_check()

In some places we use page_memcg() just to check whether a page is
charged, so a page_memcg_charged() helper is introduced to do that; see
the sketch after this paragraph. This is a preparation for reparenting
the kmem pages.
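To make the division of labor concrete, an illustrative (untested)
lookup that follows the rules above; the usual binding-stability
requirements (e.g. the page lock) are assumed to hold:

        static struct mem_cgroup *example_lookup(struct page *page)
        {
                /* Known non-kmem page (e.g. LRU): direct memcg pointer. */
                if (PageLRU(page))
                        return page_memcg(page);

                /* Page type unknown: page_memcg_check() handles any page. */
                return page_memcg_check(page);
        }

When only a charged/uncharged test is needed, page_memcg_charged(page)
answers that without exposing the pointer itself.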
Signed-off-by: Muchun Song
---
 include/linux/memcontrol.h | 36 ++++++++++++++++++++++++++++--------
 mm/memcontrol.c            | 23 +++++++++++++----------
 mm/page_alloc.c            |  4 ++--
 3 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e6dc793d587d..049b80246cbf 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -358,14 +358,26 @@ enum page_memcg_data_flags {
 
 #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
 
+/* Return true for charged page, otherwise false. */
+static inline bool page_memcg_charged(struct page *page)
+{
+        unsigned long memcg_data = page->memcg_data;
+
+        VM_BUG_ON_PAGE(PageSlab(page), page);
+        VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+
+        return !!memcg_data;
+}
+
 /*
- * page_memcg - get the memory cgroup associated with a page
+ * page_memcg - get the memory cgroup associated with a non-kmem page
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
  * or NULL. This function assumes that the page is known to have a
  * proper memory cgroup pointer. It's not safe to call this function
- * against some type of pages, e.g. slab pages or ex-slab pages.
+ * against some type of pages, e.g. slab pages, kmem pages or ex-slab
+ * pages.
  *
  * Any of the following ensures page and memcg binding stability:
  * - the page lock
@@ -378,27 +390,30 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
         unsigned long memcg_data = page->memcg_data;
 
         VM_BUG_ON_PAGE(PageSlab(page), page);
-        VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+        VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_FLAGS_MASK, page);
 
-        return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+        return (struct mem_cgroup *)memcg_data;
 }
 
 /*
- * page_memcg_rcu - locklessly get the memory cgroup associated with a page
+ * page_memcg_rcu - locklessly get the memory cgroup associated with a non-kmem page
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
  * or NULL. This function assumes that the page is known to have a
  * proper memory cgroup pointer. It's not safe to call this function
- * against some type of pages, e.g. slab pages or ex-slab pages.
+ * against some type of pages, e.g. slab pages, kmem pages or ex-slab
+ * pages.
  */
 static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
 {
+        unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
         VM_BUG_ON_PAGE(PageSlab(page), page);
+        VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_FLAGS_MASK, page);
         WARN_ON_ONCE(!rcu_read_lock_held());
 
-        return (struct mem_cgroup *)(READ_ONCE(page->memcg_data) &
-                                     ~MEMCG_DATA_FLAGS_MASK);
+        return (struct mem_cgroup *)memcg_data;
 }
 
 /*
@@ -1072,6 +1087,11 @@ void mem_cgroup_split_huge_fixup(struct page *head);
 
 struct mem_cgroup;
 
+static inline bool page_memcg_charged(struct page *page)
+{
+        return false;
+}
+
 static inline struct mem_cgroup *page_memcg(struct page *page)
 {
         return NULL;

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index faae16def127..86a8db937ec6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -855,10 +855,11 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
                              int val)
 {
         struct page *head = compound_head(page); /* rmap on tail pages */
-        struct mem_cgroup *memcg = page_memcg(head);
+        struct mem_cgroup *memcg;
         pg_data_t *pgdat = page_pgdat(page);
         struct lruvec *lruvec;
 
+        memcg = page_memcg_check(head);
         /* Untracked pages have no memcg, no lruvec. Update only the node */
         if (!memcg) {
                 __mod_node_page_state(pgdat, idx, val);
@@ -3166,12 +3167,13 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-        struct mem_cgroup *memcg = page_memcg(page);
+        struct mem_cgroup *memcg;
         unsigned int nr_pages = 1 << order;
 
-        if (!memcg)
+        if (!page_memcg_charged(page))
                 return;
 
+        memcg = page_memcg_check(page);
         VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
         __memcg_kmem_uncharge(memcg, nr_pages);
         page->memcg_data = 0;
@@ -6827,24 +6829,25 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 {
         unsigned long nr_pages;
+        struct mem_cgroup *memcg;
 
         VM_BUG_ON_PAGE(PageLRU(page), page);
 
-        if (!page_memcg(page))
+        if (!page_memcg_charged(page))
                 return;
 
         /*
          * Nobody should be changing or seriously looking at
-         * page_memcg(page) at this point, we have fully
-         * exclusive access to the page.
+         * page memcg at this point, we have fully exclusive
+         * access to the page.
          */
-
-        if (ug->memcg != page_memcg(page)) {
+        memcg = page_memcg_check(page);
+        if (ug->memcg != memcg) {
                 if (ug->memcg) {
                         uncharge_batch(ug);
                         uncharge_gather_clear(ug);
                 }
-                ug->memcg = page_memcg(page);
+                ug->memcg = memcg;
 
                 /* pairs with css_put in uncharge_batch */
                 css_get(&ug->memcg->css);
@@ -6877,7 +6880,7 @@ void mem_cgroup_uncharge(struct page *page)
                 return;
 
         /* Don't touch page->lru of any random page, pre-check: */
-        if (!page_memcg(page))
+        if (!page_memcg_charged(page))
                 return;
 
         uncharge_gather_clear(&ug);

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f10966e3b4a5..bcb58ae15e24 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1124,7 +1124,7 @@ static inline bool page_expected_state(struct page *page,
         if (unlikely((unsigned long)page->mapping |
                         page_ref_count(page) |
 #ifdef CONFIG_MEMCG
-                        (unsigned long)page_memcg(page) |
+                        page_memcg_charged(page) |
 #endif
                         (page->flags & check_flags)))
                 return false;
@@ -1149,7 +1149,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
                 bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
         }
 #ifdef CONFIG_MEMCG
-        if (unlikely(page_memcg(page)))
+        if (unlikely(page_memcg_charged(page)))
                 bad_reason = "page still charged to cgroup";
 #endif
         return bad_reason;

From patchwork Wed Mar 3 05:59:15 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12112967
From: Muchun Song
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org, akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, Muchun Song
Subject: [PATCH v2 3/5] mm: memcontrol: charge kmem pages by using obj_cgroup APIs
Date: Wed, 3 Mar 2021 13:59:15 +0800
Message-Id: <20210303055917.66054-4-songmuchun@bytedance.com>
In-Reply-To: <20210303055917.66054-1-songmuchun@bytedance.com>
References: <20210303055917.66054-1-songmuchun@bytedance.com>

Since Roman's series "The new cgroup slab memory controller" was
applied, all slab objects are charged via the new obj_cgroup APIs. The
new APIs introduce struct obj_cgroup to charge slab objects, which
prevents long-living objects from pinning the original memory cgroup in
memory. But there are still some corner-case objects (e.g. allocations
larger than order-1 pages on SLUB) that are not charged via the new
APIs. Those objects (including pages allocated directly from the buddy
allocator) are charged as kmem pages and still hold a reference to the
memory cgroup.

This patch charges the kmem pages via the new obj_cgroup APIs. In the
end, page->memcg_data of a kmem page points to an object cgroup. We can
use page_objcg() to get the object cgroup associated with a kmem page.
Alternatively, we can use page_memcg_check() to get the memory cgroup
associated with a kmem page, but the caller must ensure that the
returned memcg won't be released (e.g. acquire the rcu_read_lock or
css_set_lock).
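For illustration, an (untested) sketch of the resulting lookup pattern
for a kmem page; rcu_read_lock is one of the ways mentioned above to
keep the returned memcg alive:

        struct obj_cgroup *objcg;
        struct mem_cgroup *memcg;

        VM_BUG_ON_PAGE(!PageMemcgKmem(page), page);

        rcu_read_lock();
        objcg = page_objcg(page);               /* objcg of the kmem page */
        memcg = obj_cgroup_memcg(objcg);        /* may be reparented later */
        /* ... use memcg ... */
        rcu_read_unlock();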
Signed-off-by: Muchun Song
---
 include/linux/memcontrol.h |  63 +++++++++++++++++------
 mm/memcontrol.c            | 123 +++++++++++++++++++++++++++++++--------------
 2 files changed, 133 insertions(+), 53 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 049b80246cbf..5911b9d107b0 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -370,6 +370,18 @@ static inline bool page_memcg_charged(struct page *page)
 }
 
 /*
+ * After the initialization objcg->memcg is always pointing at
+ * a valid memcg, but can be atomically swapped to the parent memcg.
+ *
+ * The caller must ensure that the returned memcg won't be released:
+ * e.g. acquire the rcu_read_lock or css_set_lock.
+ */
+static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
+{
+        return READ_ONCE(objcg->memcg);
+}
+
+/*
  * page_memcg - get the memory cgroup associated with a non-kmem page
  * @page: a pointer to the page struct
  *
@@ -421,9 +433,10 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
- * or NULL. This function unlike page_memcg() can take any page
+ * or NULL. This function unlike page_memcg() can take any non-kmem page
  * as an argument. It has to be used in cases when it's not known if a page
- * has an associated memory cgroup pointer or an object cgroups vector.
+ * has an associated memory cgroup pointer or an object cgroups vector or
+ * an object cgroup.
  *
  * Any of the following ensures page and memcg binding stability:
  * - the page lock
@@ -442,6 +455,17 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
         if (memcg_data & MEMCG_DATA_OBJCGS)
                 return NULL;
 
+        if (memcg_data & MEMCG_DATA_KMEM) {
+                struct obj_cgroup *objcg;
+
+                /*
+                 * The caller must ensure that the returned memcg won't be
+                 * released: e.g. acquire the rcu_read_lock or css_set_lock.
+                 */
+                objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+                return obj_cgroup_memcg(objcg);
+        }
+
         return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
@@ -500,6 +524,24 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
         return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
+/*
+ * page_objcg - get the object cgroup associated with a kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroup associated with the kmem page,
+ * or NULL. This function assumes that the page is known to have an
+ * associated object cgroup. It's only safe to call this function
+ * against kmem pages (PageMemcgKmem() returns true).
+ */
+static inline struct obj_cgroup *page_objcg(struct page *page)
+{
+        unsigned long memcg_data = page->memcg_data;
+
+        VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+        VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
+
+        return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
 #else
 static inline struct obj_cgroup **page_objcgs(struct page *page)
 {
@@ -510,6 +552,11 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 {
         return NULL;
 }
+
+static inline struct obj_cgroup *page_objcg(struct page *page)
+{
+        return NULL;
+}
 #endif
 
 static __always_inline bool memcg_stat_item_in_bytes(int idx)
@@ -728,18 +775,6 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
         percpu_ref_put(&objcg->refcnt);
 }
 
-/*
- * After the initialization objcg->memcg is always pointing at
- * a valid memcg, but can be atomically swapped to the parent memcg.
- *
- * The caller must ensure that the returned memcg won't be released:
- * e.g. acquire the rcu_read_lock or css_set_lock.
- */
-static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
-{
-        return READ_ONCE(objcg->memcg);
-}
-
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
         if (memcg)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 86a8db937ec6..0cf342d22547 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -856,10 +856,16 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
 {
         struct page *head = compound_head(page); /* rmap on tail pages */
         struct mem_cgroup *memcg;
-        pg_data_t *pgdat = page_pgdat(page);
+        pg_data_t *pgdat;
         struct lruvec *lruvec;
 
-        memcg = page_memcg_check(head);
+        if (PageMemcgKmem(head)) {
+                __mod_lruvec_kmem_state(page_to_virt(head), idx, val);
+                return;
+        }
+
+        pgdat = page_pgdat(head);
+        memcg = page_memcg(head);
         /* Untracked pages have no memcg, no lruvec. Update only the node */
         if (!memcg) {
                 __mod_node_page_state(pgdat, idx, val);
@@ -3144,18 +3150,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
  */
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 {
-        struct mem_cgroup *memcg;
+        struct obj_cgroup *objcg;
         int ret = 0;
 
-        memcg = get_mem_cgroup_from_current();
-        if (memcg && !mem_cgroup_is_root(memcg)) {
-                ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
+        objcg = get_obj_cgroup_from_current();
+        if (objcg) {
+                ret = obj_cgroup_charge_page(objcg, gfp, 1 << order);
                 if (!ret) {
-                        page->memcg_data = (unsigned long)memcg |
+                        page->memcg_data = (unsigned long)objcg |
                                 MEMCG_DATA_KMEM;
                         return 0;
                 }
-                css_put(&memcg->css);
+                obj_cgroup_put(objcg);
         }
         return ret;
 }
@@ -3167,17 +3173,18 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-        struct mem_cgroup *memcg;
+        struct obj_cgroup *objcg;
         unsigned int nr_pages = 1 << order;
 
         if (!page_memcg_charged(page))
                 return;
 
-        memcg = page_memcg_check(page);
-        VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
-        __memcg_kmem_uncharge(memcg, nr_pages);
+        VM_BUG_ON_PAGE(!PageMemcgKmem(page), page);
+
+        objcg = page_objcg(page);
+        obj_cgroup_uncharge_page(objcg, nr_pages);
         page->memcg_data = 0;
-        css_put(&memcg->css);
+        obj_cgroup_put(objcg);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -6794,8 +6801,12 @@ struct uncharge_gather {
         struct mem_cgroup *memcg;
         unsigned long nr_pages;
         unsigned long pgpgout;
-        unsigned long nr_kmem;
         struct page *dummy_page;
+
+#ifdef CONFIG_MEMCG_KMEM
+        struct obj_cgroup *objcg;
+        unsigned long nr_kmem;
+#endif
 };
 
 static inline void uncharge_gather_clear(struct uncharge_gather *ug)
@@ -6807,12 +6818,21 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 {
         unsigned long flags;
 
+#ifdef CONFIG_MEMCG_KMEM
+        if (ug->objcg) {
+                obj_cgroup_uncharge_page(ug->objcg, ug->nr_kmem);
+                /* drop reference from uncharge_kmem_page */
+                obj_cgroup_put(ug->objcg);
+        }
+#endif
+
+        if (!ug->memcg)
+                return;
+
         if (!mem_cgroup_is_root(ug->memcg)) {
                 page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
                 if (do_memsw_account())
                         page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
-                if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
-                        page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
                 memcg_oom_recover(ug->memcg);
         }
 
@@ -6822,26 +6842,40 @@ static void uncharge_batch(const struct uncharge_gather *ug)
         memcg_check_events(ug->memcg, ug->dummy_page);
         local_irq_restore(flags);
 
-        /* drop reference from uncharge_page */
+        /* drop reference from uncharge_user_page */
         css_put(&ug->memcg->css);
 }
 
-static void uncharge_page(struct page *page, struct uncharge_gather *ug)
+#ifdef CONFIG_MEMCG_KMEM
+static void uncharge_kmem_page(struct page *page, struct uncharge_gather *ug)
 {
-        unsigned long nr_pages;
-        struct mem_cgroup *memcg;
+        struct obj_cgroup *objcg = page_objcg(page);
 
-        VM_BUG_ON_PAGE(PageLRU(page), page);
+        if (ug->objcg != objcg) {
+                if (ug->objcg) {
+                        uncharge_batch(ug);
+                        uncharge_gather_clear(ug);
+                }
+                ug->objcg = objcg;
 
-        if (!page_memcg_charged(page))
-                return;
+                /* pairs with obj_cgroup_put in uncharge_batch */
+                obj_cgroup_get(ug->objcg);
+        }
+
+        ug->nr_kmem += compound_nr(page);
+        page->memcg_data = 0;
+        obj_cgroup_put(ug->objcg);
+}
+#else
+static void uncharge_kmem_page(struct page *page, struct uncharge_gather *ug)
+{
+}
+#endif
+
+static void uncharge_user_page(struct page *page, struct uncharge_gather *ug)
+{
+        struct mem_cgroup *memcg = page_memcg(page);
 
-        /*
-         * Nobody should be changing or seriously looking at
-         * page memcg at this point, we have fully exclusive
-         * access to the page.
-         */
-        memcg = page_memcg_check(page);
         if (ug->memcg != memcg) {
                 if (ug->memcg) {
                         uncharge_batch(ug);
@@ -6852,18 +6886,30 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
                 /* pairs with css_put in uncharge_batch */
                 css_get(&ug->memcg->css);
         }
+        ug->pgpgout++;
+        ug->dummy_page = page;
+
+        ug->nr_pages += compound_nr(page);
+        page->memcg_data = 0;
+        css_put(&ug->memcg->css);
+}
 
-        nr_pages = compound_nr(page);
-        ug->nr_pages += nr_pages;
+static void uncharge_page(struct page *page, struct uncharge_gather *ug)
+{
+        VM_BUG_ON_PAGE(PageLRU(page), page);
 
+        if (!page_memcg_charged(page))
+                return;
+
+        /*
+         * Nobody should be changing or seriously looking at
+         * page memcg at this point, we have fully exclusive
+         * access to the page.
+         */
         if (PageMemcgKmem(page))
-                ug->nr_kmem += nr_pages;
+                uncharge_kmem_page(page, ug);
         else
-                ug->pgpgout++;
-
-        ug->dummy_page = page;
-        page->memcg_data = 0;
-        css_put(&ug->memcg->css);
+                uncharge_user_page(page, ug);
 }
 
 /**
@@ -6906,8 +6952,7 @@ void mem_cgroup_uncharge_list(struct list_head *page_list)
         uncharge_gather_clear(&ug);
         list_for_each_entry(page, page_list, lru)
                 uncharge_page(page, &ug);
-        if (ug.memcg)
-                uncharge_batch(&ug);
+        uncharge_batch(&ug);
 }
 
 /**

From patchwork Wed Mar 3 05:59:16 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12112969
From: Muchun Song
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org, akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, Muchun Song
Subject: [PATCH v2 4/5] mm: memcontrol: introduce remote objcg charging API
Date: Wed, 3 Mar 2021 13:59:16 +0800
Message-Id: <20210303055917.66054-5-songmuchun@bytedance.com>
In-Reply-To: <20210303055917.66054-1-songmuchun@bytedance.com>
References: <20210303055917.66054-1-songmuchun@bytedance.com>

The remote memcg charging API is a mechanism to charge pages to a given
memcg. Since all kernel memory is now charged via the obj_cgroup APIs,
what we actually want is to charge kernel memory to a remote object
cgroup instead of a memory cgroup. So introduce remote objcg charging
APIs to charge the kmem pages via the obj_cgroup APIs. If a remote memcg
and a remote objcg are both set, the objcg takes precedence over the
memcg for charging kmem pages. In a later patch, we will use these APIs
to charge kernel memory to the remote objcg.
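For illustration, the intended (untested) usage pattern of the new scope
API, mirroring set_active_memcg():

        struct obj_cgroup *old_objcg;

        old_objcg = set_active_objcg(objcg);
        /* __GFP_ACCOUNT allocations in this scope are charged to @objcg. */
        ptr = kmalloc(size, GFP_KERNEL_ACCOUNT);
        set_active_objcg(old_objcg);    /* scopes nest; restore the old one */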
Signed-off-by: Muchun Song
---
 include/linux/sched.h    |  4 ++++
 include/linux/sched/mm.h | 38 ++++++++++++++++++++++++++++++++++++++
 kernel/fork.c            |  3 +++
 mm/memcontrol.c          | 44 ++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ee46f5cab95b..8edcc71a0a1d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1318,6 +1318,10 @@ struct task_struct {
         /* Used by memcontrol for targeted memcg charge: */
         struct mem_cgroup *active_memcg;
 #endif
+#ifdef CONFIG_MEMCG_KMEM
+        /* Used by memcontrol for targeted objcg charge: */
+        struct obj_cgroup *active_objcg;
+#endif
 
 #ifdef CONFIG_BLK_CGROUP
         struct request_queue *throttle_queue;

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 1ae08b8462a4..be1189598b09 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -330,6 +330,44 @@ set_active_memcg(struct mem_cgroup *memcg)
 }
 #endif
 
+#ifdef CONFIG_MEMCG_KMEM
+DECLARE_PER_CPU(struct obj_cgroup *, int_active_objcg);
+
+/**
+ * set_active_objcg - Starts the remote objcg kmem pages charging scope.
+ * @objcg: objcg to charge.
+ *
+ * This function marks the beginning of the remote objcg charging scope. All the
+ * __GFP_ACCOUNT kmem page allocations till the end of the scope will be charged
+ * to the given objcg.
+ *
+ * NOTE: This function can nest. Users must save the return value and
+ * reset the previous value after their own charging scope is over.
+ *
+ * If remote memcg and objcg are both set, objcg takes precedence over memcg
+ * to charge the kmem pages.
+ */
+static inline struct obj_cgroup *set_active_objcg(struct obj_cgroup *objcg)
+{
+        struct obj_cgroup *old;
+
+        if (in_interrupt()) {
+                old = this_cpu_read(int_active_objcg);
+                this_cpu_write(int_active_objcg, objcg);
+        } else {
+                old = current->active_objcg;
+                current->active_objcg = objcg;
+        }
+
+        return old;
+}
+#else
+static inline struct obj_cgroup *set_active_objcg(struct obj_cgroup *objcg)
+{
+        return NULL;
+}
+#endif
+
 #ifdef CONFIG_MEMBARRIER
 enum {
         MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY = (1U << 0),

diff --git a/kernel/fork.c b/kernel/fork.c
index d66cd1014211..b4b9dd5d122f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -945,6 +945,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #ifdef CONFIG_MEMCG
         tsk->active_memcg = NULL;
 #endif
+#ifdef CONFIG_MEMCG_KMEM
+        tsk->active_objcg = NULL;
+#endif
         return tsk;
 
 free_stack:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0cf342d22547..e48d4ab0af76 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -79,6 +79,11 @@ struct mem_cgroup *root_mem_cgroup __read_mostly;
 /* Active memory cgroup to use from an interrupt context */
 DEFINE_PER_CPU(struct mem_cgroup *, int_active_memcg);
 
+#ifdef CONFIG_MEMCG_KMEM
+/* Active object cgroup to use from an interrupt context */
+DEFINE_PER_CPU(struct obj_cgroup *, int_active_objcg);
+#endif
+
 /* Socket memory accounting disabled? */
 static bool cgroup_memory_nosocket;
 
@@ -1076,7 +1081,7 @@ static __always_inline struct mem_cgroup *get_active_memcg(void)
         return memcg;
 }
 
-static __always_inline bool memcg_kmem_bypass(void)
+static __always_inline bool memcg_charge_bypass(void)
 {
         /* Allow remote memcg charging from any context. */
         if (unlikely(active_memcg()))
@@ -1094,7 +1099,7 @@
  */
 static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
 {
-        if (memcg_kmem_bypass())
+        if (memcg_charge_bypass())
                 return NULL;
 
         if (unlikely(active_memcg()))
@@ -1103,6 +1108,29 @@ static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
         return get_mem_cgroup_from_mm(current->mm);
 }
 
+#ifdef CONFIG_MEMCG_KMEM
+static __always_inline struct obj_cgroup *active_objcg(void)
+{
+        if (in_interrupt())
+                return this_cpu_read(int_active_objcg);
+        else
+                return current->active_objcg;
+}
+
+static __always_inline bool kmem_charge_bypass(void)
+{
+        /* Allow remote charging from any context. */
+        if (unlikely(active_objcg() || active_memcg()))
+                return false;
+
+        /* Memcg to charge can't be determined. */
+        if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
+                return true;
+
+        return false;
+}
+#endif
+
 /**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
@@ -2997,13 +3025,20 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 
 __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 {
-        struct obj_cgroup *objcg = NULL;
+        struct obj_cgroup *objcg;
         struct mem_cgroup *memcg;
 
-        if (memcg_kmem_bypass())
+        if (kmem_charge_bypass())
                 return NULL;
 
         rcu_read_lock();
+        objcg = active_objcg();
+        if (unlikely(objcg)) {
+                /* remote object cgroup must hold a reference. */
+                obj_cgroup_get(objcg);
+                goto out;
+        }
+
         if (unlikely(active_memcg()))
                 memcg = active_memcg();
         else
@@ -3015,6 +3050,7 @@ __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
                         break;
                 objcg = NULL;
         }
+out:
         rcu_read_unlock();
 
         return objcg;

From patchwork Wed Mar 3 05:59:17 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12112971
From: Muchun Song
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org, akpm@linux-foundation.org, shakeelb@google.com, vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, Muchun Song
Subject: [PATCH v2 5/5] mm: memcontrol: use remote objcg charging APIs to charge kernel memory
Date: Wed, 3 Mar 2021 13:59:17 +0800
Message-Id: <20210303055917.66054-6-songmuchun@bytedance.com>
In-Reply-To: <20210303055917.66054-1-songmuchun@bytedance.com>
References: <20210303055917.66054-1-songmuchun@bytedance.com>
Some users of remote memory cgroup charging (e.g. bpf and fsnotify) can
switch to the remote objcg charging APIs, because they only want to
charge kernel memory to a remote memcg. After this, all kernel memory is
charged via the obj_cgroup APIs and no longer holds a refcount to the
memory cgroup, which prevents long-living objects from pinning the
original memory cgroup in memory.
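The conversion pattern is the same for each user; an illustrative
(untested) sketch with a hypothetical subsystem struct foo standing in
for fsnotify_group or bpf_map:

        struct foo {
                struct obj_cgroup *objcg;   /* was: struct mem_cgroup *memcg */
        };

        /* At creation time, remember who should pay for allocations. */
        f->objcg = get_obj_cgroup_from_current();   /* may return NULL */

        /* At allocation time, open a remote charging scope. */
        old_objcg = set_active_objcg(f->objcg);
        event = kmalloc(len, GFP_KERNEL_ACCOUNT);
        set_active_objcg(old_objcg);

        /* At teardown, drop the objcg reference instead of the memcg one. */
        if (f->objcg)
                obj_cgroup_put(f->objcg);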
-	old_memcg = set_active_memcg(group->memcg);
+	old_objcg = set_active_objcg(group->objcg);
 
 	if (fanotify_is_perm_event(mask)) {
 		event = fanotify_alloc_perm_event(path, gfp);
@@ -608,7 +608,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group,
 		event->pid = get_pid(task_tgid(current));
 
 out:
-	set_active_memcg(old_memcg);
+	set_active_objcg(old_objcg);
 	return event;
 }
 
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9e0c1afac8bd..055ca36d4e0e 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -985,7 +985,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	group->fanotify_data.user = user;
 	group->fanotify_data.flags = flags;
 	atomic_inc(&user->fanotify_listeners);
-	group->memcg = get_mem_cgroup_from_mm(current->mm);
+	group->objcg = get_obj_cgroup_from_current();
 
 	group->overflow_event = fanotify_alloc_overflow_event();
 	if (unlikely(!group->overflow_event)) {
diff --git a/fs/notify/group.c b/fs/notify/group.c
index ffd723ffe46d..fac46b92c16f 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -24,7 +24,8 @@ static void fsnotify_final_destroy_group(struct fsnotify_group *group)
 	if (group->ops->free_group_priv)
 		group->ops->free_group_priv(group);
 
-	mem_cgroup_put(group->memcg);
+	if (group->objcg)
+		obj_cgroup_put(group->objcg);
 
 	mutex_destroy(&group->mark_mutex);
 	kfree(group);
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index 1901d799909b..ece6dd561659 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -66,7 +66,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
 	int ret;
 	int len = 0;
 	int alloc_len = sizeof(struct inotify_event_info);
-	struct mem_cgroup *old_memcg;
+	struct obj_cgroup *old_objcg;
 
 	if (name) {
 		len = name->len;
@@ -81,12 +81,12 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
 
 	/*
 	 * Whoever is interested in the event, pays for the allocation. Do not
-	 * trigger OOM killer in the target monitoring memcg as it may have
+	 * trigger OOM killer in the target monitoring objcg as it may have
 	 * security repercussion.
 	 */
-	old_memcg = set_active_memcg(group->memcg);
+	old_objcg = set_active_objcg(group->objcg);
 	event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
-	set_active_memcg(old_memcg);
+	set_active_objcg(old_objcg);
 
 	if (unlikely(!event)) {
 		/*
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index c71be4fb7dc5..5b4de477fcac 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -649,7 +649,7 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events)
 	oevent->name_len = 0;
 
 	group->max_events = max_events;
-	group->memcg = get_mem_cgroup_from_mm(current->mm);
+	group->objcg = get_obj_cgroup_from_current();
 
 	spin_lock_init(&group->inotify_data.idr_lock);
 	idr_init(&group->inotify_data.idr);
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cccaef1088ea..b6894e3cd095 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -158,7 +158,7 @@ struct bpf_map {
 	u32 btf_value_type_id;
 	struct btf *btf;
 #ifdef CONFIG_MEMCG_KMEM
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 #endif
 	char name[BPF_OBJ_NAME_LEN];
 	u32 btf_vmlinux_value_type_id;
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index e5409b83e731..d0303f634da6 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -220,7 +220,7 @@ struct fsnotify_group {
 					 * notification list is too
 					 * full */
 
-	struct mem_cgroup *memcg;	/* memcg to charge allocations */
+	struct obj_cgroup *objcg;	/* objcg to charge allocations */
 
 	/* groups can define private fields here or use the void *private */
 	union {
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5911b9d107b0..d5a286e3394f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1643,6 +1643,7 @@ static inline void memcg_set_shrinker_bit(struct mem_cgroup *memcg,
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
 void __memcg_kmem_uncharge_page(struct page *page, int order);
 
+struct obj_cgroup *get_obj_cgroup_from_mem_cgroup(struct mem_cgroup *memcg);
 struct obj_cgroup *get_obj_cgroup_from_current(void);
 
 int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
@@ -1693,6 +1694,20 @@ static inline int memcg_cache_id(struct mem_cgroup *memcg)
 
 struct mem_cgroup *mem_cgroup_from_obj(void *p);
 
 #else
+static inline
+struct obj_cgroup *get_obj_cgroup_from_mem_cgroup(struct mem_cgroup *memcg)
+{
+	return NULL;
+}
+
+static inline struct obj_cgroup *get_obj_cgroup_from_current(void)
+{
+	return NULL;
+}
+
+static inline void obj_cgroup_put(struct obj_cgroup *objcg)
+{
+}
 
 static inline int memcg_kmem_charge_page(struct page *page, gfp_t gfp,
 					 int order)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c859bc46d06c..0703dc94cb30 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -390,37 +390,38 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 }
 
 #ifdef CONFIG_MEMCG_KMEM
-static void bpf_map_save_memcg(struct bpf_map *map)
+static void bpf_map_save_objcg(struct bpf_map *map)
 {
-	map->memcg = get_mem_cgroup_from_mm(current->mm);
+	map->objcg = get_obj_cgroup_from_current();
 }
 
-static void bpf_map_release_memcg(struct bpf_map *map)
+static void bpf_map_release_objcg(struct bpf_map *map)
 {
-	mem_cgroup_put(map->memcg);
+	if (map->objcg)
+		obj_cgroup_put(map->objcg);
 }
 
 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
 			   int node)
 {
-	struct mem_cgroup *old_memcg;
+	struct obj_cgroup *old_objcg;
 	void *ptr;
 
-	old_memcg = set_active_memcg(map->memcg);
+	old_objcg = set_active_objcg(map->objcg);
 	ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node);
-	set_active_memcg(old_memcg);
+	set_active_objcg(old_objcg);
 
 	return ptr;
 }
 
 void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
 {
-	struct mem_cgroup *old_memcg;
+	struct obj_cgroup *old_objcg;
 	void *ptr;
 
-	old_memcg = set_active_memcg(map->memcg);
+	old_objcg = set_active_objcg(map->objcg);
 	ptr = kzalloc(size, flags | __GFP_ACCOUNT);
-	set_active_memcg(old_memcg);
+	set_active_objcg(old_objcg);
 
 	return ptr;
 }
@@ -428,22 +429,22 @@ void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
 void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
 				    size_t align, gfp_t flags)
 {
-	struct mem_cgroup *old_memcg;
+	struct obj_cgroup *old_objcg;
 	void __percpu *ptr;
 
-	old_memcg = set_active_memcg(map->memcg);
+	old_objcg = set_active_objcg(map->objcg);
 	ptr = __alloc_percpu_gfp(size, align, flags | __GFP_ACCOUNT);
-	set_active_memcg(old_memcg);
+	set_active_objcg(old_objcg);
 
 	return ptr;
 }
 
 #else
-static void bpf_map_save_memcg(struct bpf_map *map)
+static void bpf_map_save_objcg(struct bpf_map *map)
 {
 }
 
-static void bpf_map_release_memcg(struct bpf_map *map)
+static void bpf_map_release_objcg(struct bpf_map *map)
 {
 }
 #endif
@@ -454,7 +455,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
 
 	security_bpf_map_free(map);
-	bpf_map_release_memcg(map);
+	bpf_map_release_objcg(map);
 	/* implementation dependent freeing */
 	map->ops->map_free(map);
 }
@@ -877,7 +878,7 @@ static int map_create(union bpf_attr *attr)
 	if (err)
 		goto free_map_sec;
 
-	bpf_map_save_memcg(map);
+	bpf_map_save_objcg(map);
 
 	err = bpf_map_new_fd(map, f_flags);
 	if (err < 0) {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e48d4ab0af76..367d1a485292 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3023,6 +3023,22 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
 	return page_memcg_check(page);
 }
 
+struct obj_cgroup *get_obj_cgroup_from_mem_cgroup(struct mem_cgroup *memcg)
+{
+	struct obj_cgroup *objcg = NULL;
+
+	rcu_read_lock();
+	for (; !mem_cgroup_is_root(memcg); memcg = parent_mem_cgroup(memcg)) {
+		objcg = rcu_dereference(memcg->objcg);
+		if (objcg && obj_cgroup_tryget(objcg))
+			break;
+		objcg = NULL;
+	}
+	rcu_read_unlock();
+
+	return objcg;
+}
+
 __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 {
 	struct obj_cgroup *objcg;
@@ -5356,16 +5372,33 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	return ERR_PTR(error);
 }
 
+#ifdef CONFIG_MEMCG_KMEM
+static inline struct obj_cgroup *memcg_obj_cgroup(struct mem_cgroup *memcg)
+{
+	return memcg ? memcg->objcg : NULL;
+}
+#else
+static inline struct obj_cgroup *memcg_obj_cgroup(struct mem_cgroup *memcg)
+{
+	return NULL;
+}
+#endif
+
 static struct cgroup_subsys_state * __ref
 mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 {
 	struct mem_cgroup *parent = mem_cgroup_from_css(parent_css);
-	struct mem_cgroup *memcg, *old_memcg;
+	struct mem_cgroup *memcg;
+	struct obj_cgroup *old_objcg;
 	long error = -ENOMEM;
 
-	old_memcg = set_active_memcg(parent);
+	/*
+	 * The @parent cannot be offlined, so @parent->objcg cannot be freed
+	 * under us.
+	 */
+	old_objcg = set_active_objcg(memcg_obj_cgroup(parent));
 	memcg = mem_cgroup_alloc();
-	set_active_memcg(old_memcg);
+	set_active_objcg(old_objcg);
 	if (IS_ERR(memcg))
 		return ERR_CAST(memcg);
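
For completeness, the pattern all of the converted call sites above share
is roughly the sketch below. It is only an illustration, not part of the
patch: alloc_charged_to() is a hypothetical caller, while
get_obj_cgroup_from_mem_cgroup(), set_active_objcg() and obj_cgroup_put()
are the helpers this series relies on.

	#include <linux/memcontrol.h>
	#include <linux/slab.h>

	/*
	 * Hypothetical caller, for illustration only: charge an allocation
	 * to a remote memcg through its obj_cgroup instead of taking a
	 * refcount on the memcg itself.
	 */
	static void *alloc_charged_to(struct mem_cgroup *memcg, size_t size)
	{
		struct obj_cgroup *objcg, *old_objcg;
		void *ptr;

		/* Take a reference on the memcg's (or nearest ancestor's) objcg. */
		objcg = get_obj_cgroup_from_mem_cgroup(memcg);
		old_objcg = set_active_objcg(objcg);

		/* __GFP_ACCOUNT routes the charge to the active objcg. */
		ptr = kmalloc(size, GFP_KERNEL | __GFP_ACCOUNT);

		set_active_objcg(old_objcg);
		if (objcg)
			obj_cgroup_put(objcg);

		return ptr;
	}

Because an objcg is reparented when its memcg is offlined, an allocation
made this way keeps at most the objcg alive; the memcg itself can be
freed, which is the point of the conversion.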