From patchwork Fri Oct 6 18:46:26 2023
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13411835
From: Nhat Pham
To: akpm@linux-foundation.org
Cc: riel@surriel.com, hannes@cmpxchg.org, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
 tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org,
 mike.kravetz@oracle.com, yosryahmed@google.com, fvdl@google.com,
 linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org
Subject: [PATCH v4 1/4] memcontrol: add helpers for hugetlb memcg accounting
Date: Fri, 6 Oct 2023 11:46:26 -0700
Message-Id: <20231006184629.155543-2-nphamcs@gmail.com>
In-Reply-To: <20231006184629.155543-1-nphamcs@gmail.com>
References: <20231006184629.155543-1-nphamcs@gmail.com>

This patch exposes charge committing and cancelling as parts of the
memory controller interface. These functionalities are useful when the
try_charge() and commit_charge() stages have to be separated by other
actions in between (which can fail).
One such example is the new hugetlb accounting behavior in the
following patch.

The patch also adds a helper function to obtain a reference to the
current task's memcg.

Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Signed-off-by: Nhat Pham
---
 include/linux/memcontrol.h | 21 ++++++++++++++
 mm/memcontrol.c            | 59 ++++++++++++++++++++++++++++++--------
 2 files changed, 68 insertions(+), 12 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e0cfab58ab71..42bf7e9b1a2f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -653,6 +653,8 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
 		page_counter_read(&memcg->memory);
 }
 
+void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg);
+
 int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp);
 
 /**
@@ -704,6 +706,8 @@ static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
 	__mem_cgroup_uncharge_list(page_list);
 }
 
+void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages);
+
 void mem_cgroup_migrate(struct folio *old, struct folio *new);
 
 /**
@@ -760,6 +764,8 @@ struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
 
 struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm);
 
+struct mem_cgroup *get_mem_cgroup_from_current(void);
+
 struct lruvec *folio_lruvec_lock(struct folio *folio);
 struct lruvec *folio_lruvec_lock_irq(struct folio *folio);
 struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio,
@@ -1245,6 +1251,11 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
 	return false;
 }
 
+static inline void mem_cgroup_commit_charge(struct folio *folio,
+		struct mem_cgroup *memcg)
+{
+}
+
 static inline int mem_cgroup_charge(struct folio *folio,
 		struct mm_struct *mm, gfp_t gfp)
 {
@@ -1269,6 +1280,11 @@ static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
 {
 }
 
+static inline void mem_cgroup_cancel_charge(struct mem_cgroup *memcg,
+		unsigned int nr_pages)
+{
+}
+
 static inline void mem_cgroup_migrate(struct folio *old, struct folio *new)
 {
 }
@@ -1306,6 +1322,11 @@ static inline struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
 	return NULL;
 }
 
+static inline struct mem_cgroup *get_mem_cgroup_from_current(void)
+{
+	return NULL;
+}
+
 static inline
 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d1a322a75172..0219befeae38 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1086,6 +1086,27 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
 }
 EXPORT_SYMBOL(get_mem_cgroup_from_mm);
 
+/**
+ * get_mem_cgroup_from_current - Obtain a reference on current task's memcg.
+ */
+struct mem_cgroup *get_mem_cgroup_from_current(void)
+{
+	struct mem_cgroup *memcg;
+
+	if (mem_cgroup_disabled())
+		return NULL;
+
+again:
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+	if (!css_tryget(&memcg->css)) {
+		rcu_read_unlock();
+		goto again;
+	}
+	rcu_read_unlock();
+	return memcg;
+}
+
 static __always_inline bool memcg_kmem_bypass(void)
 {
 	/* Allow remote memcg charging from any context. */
@@ -2873,7 +2894,12 @@ static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	return try_charge_memcg(memcg, gfp_mask, nr_pages);
 }
 
-static inline void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
+/**
+ * mem_cgroup_cancel_charge() - cancel an uncommitted try_charge() call.
+ * @memcg: memcg previously charged.
+ * @nr_pages: number of pages previously charged.
+ */
+void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	if (mem_cgroup_is_root(memcg))
 		return;
@@ -2898,6 +2924,22 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
 	folio->memcg_data = (unsigned long)memcg;
 }
 
+/**
+ * mem_cgroup_commit_charge - commit a previously successful try_charge().
+ * @folio: folio to commit the charge to.
+ * @memcg: memcg previously charged.
+ */
+void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg)
+{
+	css_get(&memcg->css);
+	commit_charge(folio, memcg);
+
+	local_irq_disable();
+	mem_cgroup_charge_statistics(memcg, folio_nr_pages(folio));
+	memcg_check_events(memcg, folio_nid(folio));
+	local_irq_enable();
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 /*
  * The allocated objcg pointers array is not accounted directly.
@@ -6105,7 +6147,7 @@ static void __mem_cgroup_clear_mc(void)
 
 	/* we must uncharge all the leftover precharges from mc.to */
 	if (mc.precharge) {
-		cancel_charge(mc.to, mc.precharge);
+		mem_cgroup_cancel_charge(mc.to, mc.precharge);
 		mc.precharge = 0;
 	}
 	/*
@@ -6113,7 +6155,7 @@ static void __mem_cgroup_clear_mc(void)
 	 * we must uncharge here.
 	 */
 	if (mc.moved_charge) {
-		cancel_charge(mc.from, mc.moved_charge);
+		mem_cgroup_cancel_charge(mc.from, mc.moved_charge);
 		mc.moved_charge = 0;
 	}
 	/* we must fixup refcnts and charges */
@@ -7020,20 +7062,13 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
 			gfp_t gfp)
 {
-	long nr_pages = folio_nr_pages(folio);
 	int ret;
 
-	ret = try_charge(memcg, gfp, nr_pages);
+	ret = try_charge(memcg, gfp, folio_nr_pages(folio));
 	if (ret)
 		goto out;
 
-	css_get(&memcg->css);
-	commit_charge(folio, memcg);
-
-	local_irq_disable();
-	mem_cgroup_charge_statistics(memcg, nr_pages);
-	memcg_check_events(memcg, folio_nid(folio));
-	local_irq_enable();
+	mem_cgroup_commit_charge(folio, memcg);
 out:
 	return ret;
 }

From patchwork Fri Oct 6 18:46:27 2023
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13411836
From: Nhat Pham
To: akpm@linux-foundation.org
Cc: riel@surriel.com, hannes@cmpxchg.org, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
 tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org,
 mike.kravetz@oracle.com, yosryahmed@google.com, fvdl@google.com,
 linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org
Subject: [PATCH v4 2/4] memcontrol: only transfer the memcg data for migration
Date: Fri, 6 Oct 2023 11:46:27 -0700
Message-Id: <20231006184629.155543-3-nphamcs@gmail.com>
In-Reply-To: <20231006184629.155543-1-nphamcs@gmail.com>
References: <20231006184629.155543-1-nphamcs@gmail.com>

For most migration use cases, only transfer the memcg data from the old
folio to the new folio, and clear the old folio's memcg data. No
charging and uncharging will be done.
This shaves off some work on the migration path, and avoids the
temporary double charging of a folio during its migration.

The only exception is replace_page_cache_folio(), which will use the old
mem_cgroup_migrate() (now renamed to mem_cgroup_replace_folio). In that
context, the isolation of the old page isn't quite as thorough as with
migration, so we cannot use our new implementation directly.

This patch is the result of the following discussion on the new hugetlb
memcg accounting behavior:

https://lore.kernel.org/lkml/20231003171329.GB314430@monkey/

Suggested-by: Johannes Weiner
Signed-off-by: Nhat Pham
Acked-by: Johannes Weiner
---
 include/linux/memcontrol.h |  7 +++++++
 mm/filemap.c               |  2 +-
 mm/memcontrol.c            | 40 +++++++++++++++++++++++++++++++++++---
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 42bf7e9b1a2f..5daf14da3759 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -708,6 +708,8 @@ static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
 
 void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages);
 
+void mem_cgroup_replace_folio(struct folio *old, struct folio *new);
+
 void mem_cgroup_migrate(struct folio *old, struct folio *new);
 
 /**
@@ -1285,6 +1287,11 @@ static inline void mem_cgroup_cancel_charge(struct mem_cgroup *memcg,
 {
 }
 
+static inline void mem_cgroup_replace_folio(struct folio *old,
+		struct folio *new)
+{
+}
+
 static inline void mem_cgroup_migrate(struct folio *old, struct folio *new)
 {
 }
diff --git a/mm/filemap.c b/mm/filemap.c
index 9481ffaf24e6..673745219c82 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -819,7 +819,7 @@ void replace_page_cache_folio(struct folio *old, struct folio *new)
 	new->mapping = mapping;
 	new->index = offset;
 
-	mem_cgroup_migrate(old, new);
+	mem_cgroup_replace_folio(old, new);
 
 	xas_lock_irq(&xas);
 	xas_store(&xas, new);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0219befeae38..b9c479d768e2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7281,16 +7281,17 @@ void __mem_cgroup_uncharge_list(struct list_head *page_list)
 }
 
 /**
- * mem_cgroup_migrate - Charge a folio's replacement.
+ * mem_cgroup_replace_folio - Charge a folio's replacement.
  * @old: Currently circulating folio.
  * @new: Replacement folio.
  *
  * Charge @new as a replacement folio for @old. @old will
- * be uncharged upon free.
+ * be uncharged upon free. This is only used by the page cache
+ * (in replace_page_cache_folio()).
  *
  * Both folios must be locked, @new->mapping must be set up.
  */
-void mem_cgroup_migrate(struct folio *old, struct folio *new)
+void mem_cgroup_replace_folio(struct folio *old, struct folio *new)
 {
 	struct mem_cgroup *memcg;
 	long nr_pages = folio_nr_pages(new);
@@ -7329,6 +7330,39 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
 	local_irq_restore(flags);
 }
 
+/**
+ * mem_cgroup_migrate - Transfer the memcg data from the old to the new folio.
+ * @old: Currently circulating folio.
+ * @new: Replacement folio.
+ *
+ * Transfer the memcg data from the old folio to the new folio for migration.
+ * The old folio's data info will be cleared. Note that the memory counters
+ * will remain unchanged throughout the process.
+ *
+ * Both folios must be locked, @new->mapping must be set up.
+ */
+void mem_cgroup_migrate(struct folio *old, struct folio *new)
+{
+	struct mem_cgroup *memcg;
+
+	VM_BUG_ON_FOLIO(!folio_test_locked(old), old);
+	VM_BUG_ON_FOLIO(!folio_test_locked(new), new);
+	VM_BUG_ON_FOLIO(folio_test_anon(old) != folio_test_anon(new), new);
+	VM_BUG_ON_FOLIO(folio_nr_pages(old) != folio_nr_pages(new), new);
+
+	if (mem_cgroup_disabled())
+		return;
+
+	memcg = folio_memcg(old);
+	VM_WARN_ON_ONCE_FOLIO(!memcg, old);
+	if (!memcg)
+		return;
+
+	/* Transfer the charge and the css ref */
+	commit_charge(new, memcg);
+	old->memcg_data = 0;
+}
+
 DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key);
 EXPORT_SYMBOL(memcg_sockets_enabled_key);
 

From patchwork Fri Oct 6 18:46:28 2023
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13411837
From: Nhat Pham
To: akpm@linux-foundation.org
Cc: riel@surriel.com, hannes@cmpxchg.org, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
 tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org,
 mike.kravetz@oracle.com, yosryahmed@google.com, fvdl@google.com,
 linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org
Subject: [PATCH v4 3/4] hugetlb: memcg: account hugetlb-backed memory in memory controller
Date: Fri, 6 Oct 2023 11:46:28 -0700
Message-Id: <20231006184629.155543-4-nphamcs@gmail.com>
In-Reply-To: <20231006184629.155543-1-nphamcs@gmail.com>
References: <20231006184629.155543-1-nphamcs@gmail.com>

Currently, hugetlb memory usage is not accounted for in the memory
controller, which could lead to memory overprotection for cgroups with
hugetlb-backed memory. This has been observed in our production system.

For instance, here is one of our use cases: suppose there are two 32G
containers.
The machine is booted with hugetlb_cma=6G, and each container may or may not use up to 3 gigantic pages, depending on the workload within it. The rest is anon, cache, slab, etc. We can set the hugetlb cgroup limit of each cgroup to 3G to enforce hugetlb fairness. But it is very difficult to configure memory.max to keep overall consumption, including anon, cache, slab etc., fair.

What we have had to resort to is to constantly poll hugetlb usage and readjust memory.max. A similar procedure is applied to other memory limits (memory.low, for example). However, this is rather cumbersome and error-prone. Furthermore, when there is a delay in correcting the memory limits (e.g. when hugetlb usage changes between consecutive runs of the userspace agent), the system could be in an over- or underprotected state.

This patch rectifies this issue by charging the memcg when the hugetlb folio is utilized, and uncharging when the folio is freed (analogous to the hugetlb controller). Note that we do not charge when the folio is allocated to the hugetlb pool, because at this point it is not owned by any memcg.

Some caveats to consider:
  * This feature is only available on cgroup v2.
  * There is no hugetlb pool management involved in the memory controller. As stated above, hugetlb folios are only charged towards the memory controller when they are used. Host overcommit management has to consider this when configuring hard limits.
  * Failure to charge towards the memcg results in SIGBUS. This could happen even if the hugetlb pool still has pages (but the cgroup limit is hit and the reclaim attempt fails).
  * When this feature is enabled, hugetlb pages contribute to memory reclaim protection. Tuning of the low and min limits must take hugetlb memory into account.
  * Hugetlb pages utilized while this option is not selected will not be tracked by the memory controller (even if cgroup v2 is remounted later on).
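For illustration only, the charge lifecycle described above (try to charge before taking a page from the pool, keep the charge on success, cancel it on failure) can be modeled in user space roughly as follows. This is a sketch, not kernel code: struct toy_memcg, toy_try_charge(), toy_cancel_charge() and toy_alloc_huge_page() are hypothetical names invented for this example, the pool is reduced to a counter, and reclaim is not modeled at all.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg: a usage counter with a hard limit. */
struct toy_memcg {
	long usage;              /* pages currently charged */
	long limit;              /* hard limit, in pages */
	bool hugetlb_accounting; /* models the memory_hugetlb_accounting option */
};

/* Reserve nr_pages against the limit before any page has been obtained. */
static int toy_try_charge(struct toy_memcg *cg, long nr_pages)
{
	if (!cg || !cg->hugetlb_accounting)
		return -EOPNOTSUPP; /* accounting off: do not fail, do not track */
	if (cg->usage + nr_pages > cg->limit)
		return -ENOMEM;     /* limit hit (this toy model has no reclaim) */
	cg->usage += nr_pages;
	return 0;
}

/* Undo a successful toy_try_charge() when no page could be obtained. */
static void toy_cancel_charge(struct toy_memcg *cg, long nr_pages)
{
	cg->usage -= nr_pages;
}

/*
 * Models the allocation path: charge first, then take a page from the pool.
 * *pool_free is the number of free huge pages left in the (toy) pool.
 * Returns 0 on success or a negative errno value.
 */
static int toy_alloc_huge_page(struct toy_memcg *cg, long *pool_free,
			       long nr_pages)
{
	int charge_ret = toy_try_charge(cg, nr_pages);

	if (charge_ret == -ENOMEM)
		return -ENOMEM; /* fails even if the pool still has pages */

	if (*pool_free == 0) {
		/* Only cancel a charge that was actually taken. */
		if (!charge_ret)
			toy_cancel_charge(cg, nr_pages);
		return -ENOSPC;
	}

	(*pool_free)--;
	/* "Commit": the reservation taken above simply stays in place. */
	return 0;
}
```

In this model a cgroup at its limit gets -ENOMEM even while the pool has free pages, which corresponds to the SIGBUS caveat above, and a cgroup without accounting enabled allocates from the pool without its usage changing.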
Signed-off-by: Nhat Pham Acked-by: Johannes Weiner --- Documentation/admin-guide/cgroup-v2.rst | 29 +++++++++++++++++ include/linux/cgroup-defs.h | 5 +++ include/linux/memcontrol.h | 9 ++++++ kernel/cgroup/cgroup.c | 15 ++++++++- mm/hugetlb.c | 35 ++++++++++++++++----- mm/memcontrol.c | 42 ++++++++++++++++++++++++- mm/migrate.c | 3 +- 7 files changed, 127 insertions(+), 11 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 622a7f28db1f..606b2e0eac4b 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -210,6 +210,35 @@ cgroup v2 currently supports the following mount options. relying on the original semantics (e.g. specifying bogusly high 'bypass' protection values at higher tree levels). + memory_hugetlb_accounting + Count HugeTLB memory usage towards the cgroup's overall + memory usage for the memory controller (for the purpose of + statistics reporting and memory protection). This is a new + behavior that could regress existing setups, so it must be + explicitly opted in with this mount option. + + A few caveats to keep in mind: + + * There is no HugeTLB pool management involved in the memory + controller. The pre-allocated pool does not belong to anyone. + Specifically, when a new HugeTLB folio is allocated to + the pool, it is not accounted for from the perspective of the + memory controller. It is only charged to a cgroup when it is + actually used (for e.g at page fault time). Host memory + overcommit management has to consider this when configuring + hard limits. In general, HugeTLB pool management should be + done via other mechanisms (such as the HugeTLB controller). + * Failure to charge a HugeTLB folio to the memory controller + results in SIGBUS. This could happen even if the HugeTLB pool + still has pages available (but the cgroup limit is hit and + reclaim attempt fails). 
+ * Charging HugeTLB memory towards the memory controller affects + memory protection and reclaim dynamics. Any userspace tuning + (of low, min limits for e.g) needs to take this into account. + * HugeTLB pages utilized while this option is not selected + will not be tracked by the memory controller (even if cgroup + v2 is remounted later on). + Organizing Processes and Threads -------------------------------- diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index f1b3151ac30b..8641f4320c98 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -115,6 +115,11 @@ enum { * Enable recursive subtree protection */ CGRP_ROOT_MEMORY_RECURSIVE_PROT = (1 << 18), + + /* + * Enable hugetlb accounting for the memory controller. + */ + CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING = (1 << 19), }; /* cftype->flags */ diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 5daf14da3759..e3eaa123256b 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -679,6 +679,9 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, return __mem_cgroup_charge(folio, mm, gfp); } +int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp, + long nr_pages); + int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm, gfp_t gfp, swp_entry_t entry); void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry); @@ -1264,6 +1267,12 @@ static inline int mem_cgroup_charge(struct folio *folio, return 0; } +static inline int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, + gfp_t gfp, long nr_pages) +{ + return 0; +} + static inline int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm, gfp_t gfp, swp_entry_t entry) { diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 1fb7f562289d..f11488b18ceb 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1902,6 +1902,7 @@ enum cgroup2_param { Opt_favordynmods, 
Opt_memory_localevents, Opt_memory_recursiveprot, + Opt_memory_hugetlb_accounting, nr__cgroup2_params }; @@ -1910,6 +1911,7 @@ static const struct fs_parameter_spec cgroup2_fs_parameters[] = { fsparam_flag("favordynmods", Opt_favordynmods), fsparam_flag("memory_localevents", Opt_memory_localevents), fsparam_flag("memory_recursiveprot", Opt_memory_recursiveprot), + fsparam_flag("memory_hugetlb_accounting", Opt_memory_hugetlb_accounting), {} }; @@ -1936,6 +1938,9 @@ static int cgroup2_parse_param(struct fs_context *fc, struct fs_parameter *param case Opt_memory_recursiveprot: ctx->flags |= CGRP_ROOT_MEMORY_RECURSIVE_PROT; return 0; + case Opt_memory_hugetlb_accounting: + ctx->flags |= CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING; + return 0; } return -EINVAL; } @@ -1960,6 +1965,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags) cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_RECURSIVE_PROT; else cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_RECURSIVE_PROT; + + if (root_flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING) + cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING; + else + cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING; } } @@ -1973,6 +1983,8 @@ static int cgroup_show_options(struct seq_file *seq, struct kernfs_root *kf_root seq_puts(seq, ",memory_localevents"); if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_RECURSIVE_PROT) seq_puts(seq, ",memory_recursiveprot"); + if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING) + seq_puts(seq, ",memory_hugetlb_accounting"); return 0; } @@ -7050,7 +7062,8 @@ static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr, "nsdelegate\n" "favordynmods\n" "memory_localevents\n" - "memory_recursiveprot\n"); + "memory_recursiveprot\n" + "memory_hugetlb_accounting\n"); } static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index de220e3ff8be..74472e911b0a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1902,6 +1902,7 @@ void 
free_huge_folio(struct folio *folio) pages_per_huge_page(h), folio); hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h), pages_per_huge_page(h), folio); + mem_cgroup_uncharge(folio); if (restore_reserve) h->resv_huge_pages++; @@ -3009,11 +3010,20 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, struct hugepage_subpool *spool = subpool_vma(vma); struct hstate *h = hstate_vma(vma); struct folio *folio; - long map_chg, map_commit; + long map_chg, map_commit, nr_pages = pages_per_huge_page(h); long gbl_chg; - int ret, idx; + int memcg_charge_ret, ret, idx; struct hugetlb_cgroup *h_cg = NULL; + struct mem_cgroup *memcg; bool deferred_reserve; + gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL; + + memcg = get_mem_cgroup_from_current(); + memcg_charge_ret = mem_cgroup_hugetlb_try_charge(memcg, gfp, nr_pages); + if (memcg_charge_ret == -ENOMEM) { + mem_cgroup_put(memcg); + return ERR_PTR(-ENOMEM); + } idx = hstate_index(h); /* @@ -3022,8 +3032,12 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, * code of zero indicates a reservation exists (no change). 
*/ map_chg = gbl_chg = vma_needs_reservation(h, vma, addr); - if (map_chg < 0) + if (map_chg < 0) { + if (!memcg_charge_ret) + mem_cgroup_cancel_charge(memcg, nr_pages); + mem_cgroup_put(memcg); return ERR_PTR(-ENOMEM); + } /* * Processes that did not create the mapping will have no @@ -3034,10 +3048,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, */ if (map_chg || avoid_reserve) { gbl_chg = hugepage_subpool_get_pages(spool, 1); - if (gbl_chg < 0) { - vma_end_reservation(h, vma, addr); - return ERR_PTR(-ENOSPC); - } + if (gbl_chg < 0) + goto out_end_reservation; /* * Even though there was no reservation in the region/reserve @@ -3119,6 +3131,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h), pages_per_huge_page(h), folio); } + + if (!memcg_charge_ret) + mem_cgroup_commit_charge(folio, memcg); + mem_cgroup_put(memcg); + return folio; out_uncharge_cgroup: @@ -3130,7 +3147,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, out_subpool_put: if (map_chg || avoid_reserve) hugepage_subpool_put_pages(spool, 1); +out_end_reservation: vma_end_reservation(h, vma, addr); + if (!memcg_charge_ret) + mem_cgroup_cancel_charge(memcg, nr_pages); + mem_cgroup_put(memcg); return ERR_PTR(-ENOSPC); } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index b9c479d768e2..a3adfecf5977 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -7085,6 +7085,41 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp) return ret; } +/** + * mem_cgroup_hugetlb_try_charge - try to charge the memcg for a hugetlb folio + * @memcg: memcg to charge. + * @gfp: reclaim mode. + * @nr_pages: number of pages to charge. + * + * This function is called when allocating a huge page folio to determine if + * the memcg has the capacity for it. It does not commit the charge yet, + * as the hugetlb folio itself has not been obtained from the hugetlb pool. 
+ * + * Once we have obtained the hugetlb folio, we can call + * mem_cgroup_commit_charge() to commit the charge. If we fail to obtain the + * folio, we should instead call mem_cgroup_cancel_charge() to undo the effect + * of try_charge(). + * + * Returns 0 on success. Otherwise, an error code is returned. + */ +int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp, + long nr_pages) +{ + /* + * If hugetlb memcg charging is not enabled, do not fail hugetlb allocation, + * but do not attempt to commit charge later (or cancel on error) either. + */ + if (mem_cgroup_disabled() || !memcg || + !cgroup_subsys_on_dfl(memory_cgrp_subsys) || + !(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING)) + return -EOPNOTSUPP; + + if (try_charge(memcg, gfp, nr_pages)) + return -ENOMEM; + + return 0; +} + /** * mem_cgroup_swapin_charge_folio - Charge a newly allocated folio for swapin. * @folio: folio to charge. @@ -7354,7 +7389,12 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new) return; memcg = folio_memcg(old); - VM_WARN_ON_ONCE_FOLIO(!memcg, old); + /* + * Note that it is normal to see !memcg for a hugetlb folio. + * For e.g, it could have been allocated when memory_hugetlb_accounting + * was not selected. 
+ */ + VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(old) && !memcg, old); if (!memcg) return; diff --git a/mm/migrate.c b/mm/migrate.c index 7d1804c4a5d9..6034c7ed1d65 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -633,8 +633,7 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio) folio_copy_owner(newfolio, folio); - if (!folio_test_hugetlb(folio)) - mem_cgroup_migrate(folio, newfolio); + mem_cgroup_migrate(folio, newfolio); } EXPORT_SYMBOL(folio_migrate_flags); From patchwork Fri Oct 6 18:46:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nhat Pham X-Patchwork-Id: 13411838 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10A65E81E1E for ; Fri, 6 Oct 2023 18:46:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2B5A98000B; Fri, 6 Oct 2023 14:46:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2646780008; Fri, 6 Oct 2023 14:46:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 090588000B; Fri, 6 Oct 2023 14:46:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id E797180008 for ; Fri, 6 Oct 2023 14:46:38 -0400 (EDT) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id B1E43404F1 for ; Fri, 6 Oct 2023 18:46:38 +0000 (UTC) X-FDA: 81315917676.26.3B4365E Received: from mail-oi1-f169.google.com (mail-oi1-f169.google.com [209.85.167.169]) by imf19.hostedemail.com (Postfix) with ESMTP id F03341A0013 for ; Fri, 6 Oct 2023 18:46:36 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass 
header.d=gmail.com header.s=20230601 header.b=ghMqF+w5; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf19.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.167.169 as permitted sender) smtp.mailfrom=nphamcs@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1696617997; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=vY+Soyh4ezPzEsiTwAjxm3PLfRjJVUJ94CEs1TUSu9g=; b=aKwWCxa5nErHiIkhpgDxZv3EtMtD6cMyCgH4IhJzOkd9sG1Fze7YhmPjLGAM1/8iZ5UaSB +vHu9msh1v1S5oPho6ca8INlEPQwesU6NYYazxpxFGXETjmT8fcJYNajO25m/aX2gNr5a/ 7nF+mmcx1+kVoUmqPgNSqzDKjkrT6SI= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=ghMqF+w5; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf19.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.167.169 as permitted sender) smtp.mailfrom=nphamcs@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1696617997; a=rsa-sha256; cv=none; b=JeIRfQfBCHYFdyOzLqw8p9vnvzAXuvhkQyM90jU7YzKl/0c4xx9BPQccMpbEzUSNvnvI44 2XQNVB7SUww85alPttexHIjWtO1E9DwQXKkCNbw+bWzwypd4c9YUgoMd15iZ2OFuwMJxz5 yFjY21xe/3M26fJEzRWJDgy41M6ISH8= Received: by mail-oi1-f169.google.com with SMTP id 5614622812f47-3ae2896974bso1555940b6e.0 for ; Fri, 06 Oct 2023 11:46:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1696617996; x=1697222796; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=vY+Soyh4ezPzEsiTwAjxm3PLfRjJVUJ94CEs1TUSu9g=; b=ghMqF+w5ebG8v8Y0eBmDaXnjz13ACQ1MI1AlcdZ9U9xssP0F94fP0VxQMyxiyOOmb/ nsnJTOicXZJT6D5kBoLBUrM/ehaPzVYfgm05RqYBu3IU0wtByktdHcBir2D2KNlOYX6o 
20QE622Y+wfnXDSy7oAZ2GMNQNEb4CVwMZHInGuqVC7Ccnpqivo/0StfdhyFOgqq6IJI Onn6J/L86pUTyXaG6mhNr10vkliun6IGp9boJvx96kkYZqRHOv6FF7mmrZ8nrtK9ZT7G WgBZZd9n8tWduC2Qkv1A1cVz+DABmt6THbU6sdpTShSeKGS6L/dYRgy9hCNJM8MqlJfb dOlw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696617996; x=1697222796; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=vY+Soyh4ezPzEsiTwAjxm3PLfRjJVUJ94CEs1TUSu9g=; b=J/9KFCzc5R9sdaKyXeJqAJvjkIB4t73C7+0YwZrWUZUtrua6zP4ble63///2diFTMg spBsBanMOZvDXI499huCtTevyztTqf7Y26eG6o60kg3+xt6o/nSHRG0LMchndc89vAW6 qqRowGh1Wah8ijZY7IIsUxy6J8RgHFmSIF2558L05AL9x5YcFMjKvGKYCPM2I/8mQjdg 4Ybz+1NdmIdtNDnABZa0HUbgv/oyclcSpmNkkhI14HX5WZO+6qgEs1yaL56m4cTxYvTi BEEdTfa7/ynnyPQ5//54Va6S+BLq90RXglXA6rOk2Y/tPp/68S8Ourjozc3vuBe9cb4I 670Q== X-Gm-Message-State: AOJu0YwS9W4qiAkn7dEc1/Xpftb/LSeoF1wzZmgtMCBz6KSHsH0yTEdM +cYKqMOz7MOidrIyGoQAdgM= X-Google-Smtp-Source: AGHT+IEy8b9pbe0uvLgz2KimcopbbkeJwi6+OhwuqZtXmIDIBfGt5NNnJWLdEFwXkYOWepQdee4kyg== X-Received: by 2002:a54:4812:0:b0:3ad:c497:1336 with SMTP id j18-20020a544812000000b003adc4971336mr9048765oij.16.1696617995913; Fri, 06 Oct 2023 11:46:35 -0700 (PDT) Received: from localhost (fwdproxy-prn-013.fbsv.net. 
[2a03:2880:ff:d::face:b00c]) by smtp.gmail.com with ESMTPSA id a24-20020a637058000000b00581048ffc13sm3722085pgn.81.2023.10.06.11.46.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Oct 2023 11:46:35 -0700 (PDT) From: Nhat Pham To: akpm@linux-foundation.org Cc: riel@surriel.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org, mike.kravetz@oracle.com, yosryahmed@google.com, fvdl@google.com, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: [PATCH v4 4/4] selftests: add a selftest to verify hugetlb usage in memcg Date: Fri, 6 Oct 2023 11:46:29 -0700 Message-Id: <20231006184629.155543-5-nphamcs@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231006184629.155543-1-nphamcs@gmail.com> References: <20231006184629.155543-1-nphamcs@gmail.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: F03341A0013 X-Stat-Signature: o8jqqifd8bgejxwpponi3hsmne86eeij X-HE-Tag: 1696617996-597107 X-HE-Meta: 
U2FsdGVkX19+QYJyvjPesYPs8RzorCt62Lx0OSXBrosjP6z3gOLmNSZMqe7MVF2whbic0GJXrmgzjVyLXqlo2mho06Ygyx8qPUgYHIs2eilYIhdSdghzESVfTvZApwOgTkVpdztUSxK2N/zYlJW4TjycAmsMojhnC+H9FIGOlJ7uCGMbt53ZPYy9WHfQkUTmr4810CeLPRAu/HuxNBwJzrr6B6+qYU0J/ww8aBKH7r5xOoGn35S7THmxvsWA6dV+tF/J5NO3vj2eA6JEHpHijJpZJ0uJRrAR2k1HnRu32T7GhydFaVRUYJTAZsON2kbz9yhC6KWjwIF0v8dQqA4wCLthQI6C0/aiv717Th2wQCNNcf48+i+U08LT84JkjvI69CpfR+YwjG1Djkt0fROUndvj8XebnqIFxouSuExd1yKs+uc+qbSeUPwsPXlnClyrfCnQmdaK3kbMNSLCpX3LvYDTW4s9OpujpaDOca4ovNsrFkRTdnmhWKnH7bGhLFe4WpqhxigFFhK35wRF6vY5dHNa8E1qnOoRiFf/sfVhWQ4YBu4rQ7G4NI0Umz+FN5DZxHr3DGy5ask9XUn5g5r2PlTgkmt6Z4Y5ZBQEXIZ+TJzkHTvqyXtdmUMEanAmOoZl72hmDlXY311liChsR9DSdclHVrSbHFObS+2AY/M/8QeQ5Sg+N0nIGbPbSzwDEtFjoXsmGipSoH3H5zdOjJUt5+l1kNvVDp3mvl11u+c+Egnpkvbu+rDDtIlm7sLuZp8DNsltvAwFQ3UcPuYsiQ6DhrSULEKPvp0GlVuv7wu+6P2/Apnne5UuBmOrxMLdkXXvX2h166XBRMOdYAiW1iOnYT27BN0eQ1sliJG6U4lClYrWQzD/ZRw9qEIxDQV6Z0Rg4D5XkovvicYedUvWlawXnZyQ2SOUTyy209D3M4G7sAXGZkywWVSEvZC1jzEm2wh2bwooebE1usFepPd3VKN LPVkuhPV 2tUuDNySS5wXwwEw3nA/TP84a8rkzdX4gu/cSGMmes02MxM1hY76qj0jRqsMpaTvZ0awYlJBZ16FFxYba102SArjT8SoozVbkV4CQRJ1tVYtZ6PsyRWwvBchZRJ5L93zugguwHSyr43N+1KCuQCFGAZeQzs21b0CpRLfIoYZNYrdlEqdFG6NSZ3lAw3Wl5Ss5YlD9g2lz86NoWwLUNzaXk7Qf5d9QBF1PyABjZotiul92lVD1KPvi2Qc1g/9d2gL/izbafU4GL6kmIaLlvHFb+3XIFmDRDxOs1xNLspiW2cyyePJ0s19eKGJoF1VG53HHoGUQpanu9AMPJobm+NWt0Gx8ThSWbky2z+TZIfzwqQhzK3Ldo0PGqqpNshox1R+TRrR7ywgzJjjno7qRaqFoei1oN2cjH35sfDnTW9glVWA2k8w7rLLbUrFM4s/xLRIWa/89N62dmD0XhMF6GP7Q4Wq6GjxuZI0TeaW26h3zC8YnTaw= X-Bogosity: Ham, tests=bogofilter, spamicity=0.436917, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

This patch adds a new kselftest to demonstrate and verify the new hugetlb memcg accounting behavior.
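As a rough guide to what the test checks: with 2MB hugepages, reading the first word of the mapping should fault in one hugepage (memory.current grows by about 2MB), and writing the whole 8MB range should bring the total growth to about 8MB. The arithmetic can be sketched as below. The helpers expected_current() and within_percent() are hypothetical names for this illustration; within_percent() is only written in the spirit of the selftests' values_close() helper, and its exact formula is an assumption.

```c
#include <stdlib.h>

/* Megabytes to bytes. */
#define MB(x) ((long)(x) << 20)

/*
 * Hypothetical tolerance check: true if a and b differ by at most
 * err percent of their sum.
 */
static int within_percent(long a, long b, int err)
{
	return labs(a - b) <= (a + b) / 100 * err;
}

/*
 * Expected memory.current after touched_bytes of a hugetlb mapping have
 * been faulted in. Charging happens in whole hugepages, so the touched
 * size is rounded up to a multiple of hugepage_bytes.
 */
static long expected_current(long baseline, long touched_bytes,
			     long hugepage_bytes)
{
	long pages = (touched_bytes + hugepage_bytes - 1) / hugepage_bytes;

	return baseline + pages * hugepage_bytes;
}
```

For example, expected_current(baseline, sizeof(unsigned int), MB(2)) models the read of the first word, and expected_current(baseline, MB(8), MB(2)) models the write to the whole range; the measured value is then compared with a tolerance rather than for exact equality, since other charges in the cgroup can move memory.current slightly.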
Signed-off-by: Nhat Pham --- MAINTAINERS | 2 + tools/testing/selftests/cgroup/.gitignore | 1 + tools/testing/selftests/cgroup/Makefile | 2 + .../selftests/cgroup/test_hugetlb_memcg.c | 234 ++++++++++++++++++ 4 files changed, 239 insertions(+) create mode 100644 tools/testing/selftests/cgroup/test_hugetlb_memcg.c diff --git a/MAINTAINERS b/MAINTAINERS index bf0f54c24f81..ce9f40bcc2ba 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5269,6 +5269,7 @@ S: Maintained F: mm/memcontrol.c F: mm/swap_cgroup.c F: tools/testing/selftests/cgroup/memcg_protection.m +F: tools/testing/selftests/cgroup/test_hugetlb_memcg.c F: tools/testing/selftests/cgroup/test_kmem.c F: tools/testing/selftests/cgroup/test_memcontrol.c @@ -9652,6 +9653,7 @@ F: include/linux/hugetlb.h F: mm/hugetlb.c F: mm/hugetlb_vmemmap.c F: mm/hugetlb_vmemmap.h +F: tools/testing/selftests/cgroup/test_hugetlb_memcg.c HVA ST MEDIA DRIVER M: Jean-Christophe Trotin diff --git a/tools/testing/selftests/cgroup/.gitignore b/tools/testing/selftests/cgroup/.gitignore index af8c3f30b9c1..2732e0b29271 100644 --- a/tools/testing/selftests/cgroup/.gitignore +++ b/tools/testing/selftests/cgroup/.gitignore @@ -7,4 +7,5 @@ test_kill test_cpu test_cpuset test_zswap +test_hugetlb_memcg wait_inotify diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile index c27f05f6ce9b..00b441928909 100644 --- a/tools/testing/selftests/cgroup/Makefile +++ b/tools/testing/selftests/cgroup/Makefile @@ -14,6 +14,7 @@ TEST_GEN_PROGS += test_kill TEST_GEN_PROGS += test_cpu TEST_GEN_PROGS += test_cpuset TEST_GEN_PROGS += test_zswap +TEST_GEN_PROGS += test_hugetlb_memcg LOCAL_HDRS += $(selfdir)/clone3/clone3_selftests.h $(selfdir)/pidfd/pidfd.h @@ -27,3 +28,4 @@ $(OUTPUT)/test_kill: cgroup_util.c $(OUTPUT)/test_cpu: cgroup_util.c $(OUTPUT)/test_cpuset: cgroup_util.c $(OUTPUT)/test_zswap: cgroup_util.c +$(OUTPUT)/test_hugetlb_memcg: cgroup_util.c diff --git a/tools/testing/selftests/cgroup/test_hugetlb_memcg.c 
b/tools/testing/selftests/cgroup/test_hugetlb_memcg.c new file mode 100644 index 000000000000..f0fefeb4cc24 --- /dev/null +++ b/tools/testing/selftests/cgroup/test_hugetlb_memcg.c @@ -0,0 +1,234 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE + +#include <linux/limits.h> +#include <sys/mman.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <fcntl.h> +#include "../kselftest.h" +#include "cgroup_util.h" + +#define ADDR ((void *)(0x0UL)) +#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB) +/* mapping 8 MBs == 4 hugepages */ +#define LENGTH (8UL*1024*1024) +#define PROTECTION (PROT_READ | PROT_WRITE) + +/* borrowed from mm/hmm-tests.c */ +static long get_hugepage_size(void) +{ + int fd; + char buf[2048]; + int len; + char *p, *q, *path = "/proc/meminfo", *tag = "Hugepagesize:"; + long val; + + fd = open(path, O_RDONLY); + if (fd < 0) { + /* Error opening the file */ + return -1; + } + + len = read(fd, buf, sizeof(buf)); + close(fd); + if (len < 0) { + /* Error in reading the file */ + return -1; + } + if (len == sizeof(buf)) { + /* Error file is too large */ + return -1; + } + buf[len] = '\0'; + + /* Search for a tag if provided */ + if (tag) { + p = strstr(buf, tag); + if (!p) + return -1; /* looks like the line we want isn't there */ + p += strlen(tag); + } else + p = buf; + + val = strtol(p, &q, 0); + if (*q != ' ') { + /* Error parsing the file */ + return -1; + } + + return val; +} + +static int set_file(const char *path, long value) +{ + FILE *file; + int ret; + + file = fopen(path, "w"); + if (!file) + return -1; + ret = fprintf(file, "%ld\n", value); + fclose(file); + return ret; +} + +static int set_nr_hugepages(long value) +{ + return set_file("/proc/sys/vm/nr_hugepages", value); +} + +static unsigned int check_first(char *addr) +{ + return *(unsigned int *)addr; +} + +static void write_data(char *addr) +{ + unsigned long i; + + for (i = 0; i < LENGTH; i++) + *(addr + i) = (char)i; +} + +static int hugetlb_test_program(const char *cgroup, void *arg) +{ + char *test_group = (char *)arg; 
+ void *addr; + long old_current, expected_current, current; + int ret = EXIT_FAILURE; + + old_current = cg_read_long(test_group, "memory.current"); + set_nr_hugepages(20); + current = cg_read_long(test_group, "memory.current"); + if (current - old_current >= MB(2)) { + ksft_print_msg( + "setting nr_hugepages should not increase hugepage usage.\n"); + ksft_print_msg("before: %ld, after: %ld\n", old_current, current); + return EXIT_FAILURE; + } + + addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, 0, 0); + if (addr == MAP_FAILED) { + ksft_print_msg("fail to mmap.\n"); + return EXIT_FAILURE; + } + current = cg_read_long(test_group, "memory.current"); + if (current - old_current >= MB(2)) { + ksft_print_msg("mmap should not increase hugepage usage.\n"); + ksft_print_msg("before: %ld, after: %ld\n", old_current, current); + goto out_failed_munmap; + } + old_current = current; + + /* read the first page */ + check_first(addr); + expected_current = old_current + MB(2); + current = cg_read_long(test_group, "memory.current"); + if (!values_close(expected_current, current, 5)) { + ksft_print_msg("memory usage should increase by around 2MB.\n"); + ksft_print_msg( + "expected memory: %ld, actual memory: %ld\n", + expected_current, current); + goto out_failed_munmap; + } + + /* write to the whole range */ + write_data(addr); + current = cg_read_long(test_group, "memory.current"); + expected_current = old_current + MB(8); + if (!values_close(expected_current, current, 5)) { + ksft_print_msg("memory usage should increase by around 8MB.\n"); + ksft_print_msg( + "expected memory: %ld, actual memory: %ld\n", + expected_current, current); + goto out_failed_munmap; + } + + /* unmap the whole range */ + munmap(addr, LENGTH); + current = cg_read_long(test_group, "memory.current"); + expected_current = old_current; + if (!values_close(expected_current, current, 5)) { + ksft_print_msg("memory usage should go back down.\n"); + ksft_print_msg( + "expected memory: %ld, actual memory: %ld\n", + 
expected_current, current); + return ret; + } + + ret = EXIT_SUCCESS; + return ret; + +out_failed_munmap: + munmap(addr, LENGTH); + return ret; +} + +static int test_hugetlb_memcg(char *root) +{ + int ret = KSFT_FAIL; + char *test_group; + + test_group = cg_name(root, "hugetlb_memcg_test"); + if (!test_group || cg_create(test_group)) { + ksft_print_msg("fail to create cgroup.\n"); + goto out; + } + + if (cg_write(test_group, "memory.max", "100M")) { + ksft_print_msg("fail to set cgroup memory limit.\n"); + goto out; + } + + /* disable swap */ + if (cg_write(test_group, "memory.swap.max", "0")) { + ksft_print_msg("fail to disable swap.\n"); + goto out; + } + + if (!cg_run(test_group, hugetlb_test_program, (void *)test_group)) + ret = KSFT_PASS; +out: + cg_destroy(test_group); + free(test_group); + return ret; +} + +int main(int argc, char **argv) +{ + char root[PATH_MAX]; + int ret = EXIT_SUCCESS, has_memory_hugetlb_acc; + + has_memory_hugetlb_acc = proc_mount_contains("memory_hugetlb_accounting"); + if (has_memory_hugetlb_acc < 0) + ksft_exit_skip("Failed to query cgroup mount option\n"); + else if (!has_memory_hugetlb_acc) + ksft_exit_skip("memory hugetlb accounting is disabled\n"); + + /* Unit is kB! */ + if (get_hugepage_size() != 2048) { + ksft_print_msg("test_hugetlb_memcg requires 2MB hugepages\n"); + ksft_test_result_skip("test_hugetlb_memcg\n"); + return ret; + } + + if (cg_find_unified_root(root, sizeof(root))) + ksft_exit_skip("cgroup v2 isn't mounted\n"); + + switch (test_hugetlb_memcg(root)) { + case KSFT_PASS: + ksft_test_result_pass("test_hugetlb_memcg\n"); + break; + case KSFT_SKIP: + ksft_test_result_skip("test_hugetlb_memcg\n"); + break; + default: + ret = EXIT_FAILURE; + ksft_test_result_fail("test_hugetlb_memcg\n"); + break; + } + + return ret; +}