From patchwork Tue Sep 26 19:49:48 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13399651
From: Nhat Pham
To: akpm@linux-foundation.org
Cc: riel@surriel.com, hannes@cmpxchg.org, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
 tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org,
 mike.kravetz@oracle.com, yosryahmed@google.com, linux-mm@kvack.org,
 kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: [PATCH 1/2] hugetlb: memcg: account hugetlb-backed memory in memory controller
Date: Tue, 26 Sep 2023 12:49:48 -0700
Message-Id: <20230926194949.2637078-2-nphamcs@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230926194949.2637078-1-nphamcs@gmail.com>
References: <20230926194949.2637078-1-nphamcs@gmail.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Currently, hugetlb memory usage is not accounted for in the memory
controller, which could lead to memory overprotection for cgroups with
hugetlb-backed memory. This has been observed in our production system.

This patch rectifies this issue by charging the memcg when the hugetlb
folio is allocated, and uncharging when the folio is freed (analogous to
the hugetlb controller).
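
The charge-on-allocation / uncharge-on-free pairing described above can be
modeled in user space as a rough sketch. This is not kernel code and not
part of the patch: toy_memcg, toy_folio, toy_charge() and toy_uncharge()
are hypothetical names invented for illustration, and the model assumes a
single hard limit with no reclaim.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup: a usage counter with a hard limit. */
struct toy_memcg {
	size_t usage;	/* base pages currently charged */
	size_t limit;	/* maximum base pages allowed */
};

/* Hypothetical stand-in for a hugetlb folio. */
struct toy_folio {
	size_t nr_pages;		/* size in base pages */
	struct toy_memcg *memcg;	/* NULL while the folio is uncharged */
};

/* Charge at allocation time; returns 0 on success, -1 if the limit would be exceeded. */
static int toy_charge(struct toy_folio *folio, struct toy_memcg *memcg)
{
	if (memcg->limit - memcg->usage < folio->nr_pages)
		return -1;
	memcg->usage += folio->nr_pages;
	folio->memcg = memcg;
	return 0;
}

/* Uncharge at free time; a no-op for a folio that was never charged. */
static void toy_uncharge(struct toy_folio *folio)
{
	if (!folio->memcg)
		return;
	assert(folio->memcg->usage >= folio->nr_pages);
	folio->memcg->usage -= folio->nr_pages;
	folio->memcg = NULL;
}
```

With a limit of 1024 base pages, two 512-page folios (2 MiB each with
4 KiB base pages) can be charged and a third is refused. Calling
toy_uncharge() on the refused folio changes nothing, which is why the free
path in this model can uncharge unconditionally.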

Signed-off-by: Nhat Pham
---
 fs/hugetlbfs/inode.c       |  2 +-
 include/linux/hugetlb.h    |  6 ++++--
 include/linux/memcontrol.h |  8 ++++++++
 mm/hugetlb.c               | 23 ++++++++++++++++------
 mm/memcontrol.c            | 40 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 60fce26ff937..034967319955 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -902,7 +902,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 * to keep reservation accounting consistent.
 		 */
 		hugetlb_set_vma_policy(&pseudo_vma, inode, index);
-		folio = alloc_hugetlb_folio(&pseudo_vma, addr, 0);
+		folio = alloc_hugetlb_folio(&pseudo_vma, addr, 0, true);
 		hugetlb_drop_vma_policy(&pseudo_vma);
 		if (IS_ERR(folio)) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a30686e649f7..9b73db1605a2 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -713,7 +713,8 @@ struct huge_bootmem_page {
 
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				unsigned long addr, int avoid_reserve);
+				unsigned long addr, int avoid_reserve,
+				bool restore_reserve_on_memcg_failure);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask, gfp_t gfp_mask);
 struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma,
@@ -1016,7 +1017,8 @@ static inline int isolate_or_dissolve_huge_page(struct page *page,
 
 static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 					   unsigned long addr,
-					   int avoid_reserve)
+					   int avoid_reserve,
+					   bool restore_reserve_on_memcg_failure)
 {
 	return NULL;
 }
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e0cfab58ab71..8094679c99dd 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -677,6 +677,8 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
 	return __mem_cgroup_charge(folio, mm, gfp);
 }
 
+int mem_cgroup_hugetlb_charge_folio(struct folio *folio, gfp_t gfp);
+
 int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
 				  gfp_t gfp, swp_entry_t entry);
 void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry);
@@ -1251,6 +1253,12 @@ static inline int mem_cgroup_charge(struct folio *folio,
 	return 0;
 }
 
+static inline int mem_cgroup_hugetlb_charge_folio(struct folio *folio,
+		gfp_t gfp)
+{
+	return 0;
+}
+
 static inline int mem_cgroup_swapin_charge_folio(struct folio *folio,
 			struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index de220e3ff8be..ff88ea4df11a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1902,6 +1902,7 @@ void free_huge_folio(struct folio *folio)
 				     pages_per_huge_page(h), folio);
 	hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
 					  pages_per_huge_page(h), folio);
+	mem_cgroup_uncharge(folio);
 	if (restore_reserve)
 		h->resv_huge_pages++;
 
@@ -3004,7 +3005,8 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 }
 
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				    unsigned long addr, int avoid_reserve)
+				    unsigned long addr, int avoid_reserve,
+				    bool restore_reserve_on_memcg_failure)
 {
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
@@ -3119,6 +3121,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
 				pages_per_huge_page(h), folio);
 	}
+
+	/* undo allocation if memory controller disallows it. */
+	if (mem_cgroup_hugetlb_charge_folio(folio, GFP_KERNEL)) {
+		if (restore_reserve_on_memcg_failure)
+			restore_reserve_on_error(h, vma, addr, folio);
+		folio_put(folio);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	return folio;
 
 out_uncharge_cgroup:
@@ -5179,7 +5190,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
-				new_folio = alloc_hugetlb_folio(dst_vma, addr, 1);
+				new_folio = alloc_hugetlb_folio(dst_vma, addr, 1, false);
 				if (IS_ERR(new_folio)) {
 					folio_put(pte_folio);
 					ret = PTR_ERR(new_folio);
@@ -5656,7 +5667,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * be acquired again before returning to the caller, as expected.
 	 */
 	spin_unlock(ptl);
-	new_folio = alloc_hugetlb_folio(vma, haddr, outside_reserve);
+	new_folio = alloc_hugetlb_folio(vma, haddr, outside_reserve, true);
 
 	if (IS_ERR(new_folio)) {
 		/*
@@ -5930,7 +5941,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 							VM_UFFD_MISSING);
 		}
 
-		folio = alloc_hugetlb_folio(vma, haddr, 0);
+		folio = alloc_hugetlb_folio(vma, haddr, 0, true);
 		if (IS_ERR(folio)) {
 			/*
 			 * Returning error will result in faulting task being
@@ -6352,7 +6363,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0, true);
 		if (IS_ERR(folio)) {
 			ret = -ENOMEM;
 			goto out;
@@ -6394,7 +6405,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0, false);
 		if (IS_ERR(folio)) {
 			folio_put(*foliop);
 			ret = -ENOMEM;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d1a322a75172..e7ae63f14120 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7050,6 +7050,46 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
 	return ret;
 }
 
+static struct mem_cgroup *get_mem_cgroup_from_current(void)
+{
+	struct mem_cgroup *memcg;
+
+again:
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+	if (!css_tryget(&memcg->css)) {
+		rcu_read_unlock();
+		goto again;
+	}
+	rcu_read_unlock();
+	return memcg;
+}
+
+/**
+ * mem_cgroup_hugetlb_charge_folio - Charge a newly allocated hugetlb folio.
+ * @folio: folio to charge.
+ * @gfp: reclaim mode
+ *
+ * This function charges an allocated hugetlb folio to the memcg of the
+ * current task.
+ *
+ * Returns 0 on success. Otherwise, an error code is returned.
+ */
+int mem_cgroup_hugetlb_charge_folio(struct folio *folio, gfp_t gfp)
+{
+	struct mem_cgroup *memcg;
+	int ret;
+
+	if (mem_cgroup_disabled())
+		return 0;
+
+	memcg = get_mem_cgroup_from_current();
+	ret = charge_memcg(folio, memcg, gfp);
+	mem_cgroup_put(memcg);
+
+	return ret;
+}
+
 /**
  * mem_cgroup_swapin_charge_folio - Charge a newly allocated folio for swapin.
  * @folio: folio to charge.