From: Nhat Pham
To: akpm@linux-foundation.org
Cc: riel@surriel.com, hannes@cmpxchg.org, mhocko@kernel.org,
    roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
    tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org,
    mike.kravetz@oracle.com, yosryahmed@google.com, fvdl@google.com,
    linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
    cgroups@vger.kernel.org
Subject: [PATCH v4 0/4] hugetlb memcg accounting
Date: Fri, 6 Oct 2023 11:46:25 -0700
Message-Id: <20231006184629.155543-1-nphamcs@gmail.com>

Changelog:
v4:
* Add another prep patch to clean up the memory controller migration logic.
* Fix an issue in hugetlb folio migration where the new folio is not
  properly charged (patch 3) (reported by Mike Kravetz) (suggested by
  Johannes Weiner).
v3:
* Add a prep patch at the start of the series to extend the memory
  controller interface with new helper functions for hugetlb accounting.
* Do not account hugetlb memory for the memory controller in cgroup v1
  (patch 2) (suggested by Johannes Weiner).
* Change the gfp flag passed to mem cgroup charging (patch 2) (suggested
  by Michal Hocko).
* Add caveats to the cgroup admin guide and commit changelog (patch 2)
  (suggested by Michal Hocko).
v2:
* Add a cgroup mount option to enable/disable the new hugetlb memcg
  accounting behavior (patch 1) (suggested by Johannes Weiner).
* Add a couple more ksft_print_msg() calls on error to aid debugging when
  the selftest fails (patch 2).

Currently, hugetlb memory usage is not accounted for in the memory
controller, which could lead to memory overprotection for cgroups with
hugetlb-backed memory. This has been observed in our production system.

For instance, here is one of our use cases: suppose there are two 32G
containers. The machine is booted with hugetlb_cma=6G, and each container
may or may not use up to 3 gigantic pages, depending on the workload
within it. The rest is anon, cache, slab, etc. We can set the hugetlb
cgroup limit of each cgroup to 3G to enforce hugetlb fairness. But it is
very difficult to configure memory.max to keep overall consumption,
including anon, cache, slab, etc., fair.

What we have had to resort to is constantly polling hugetlb usage and
readjusting memory.max. A similar procedure is applied to other memory
limits (e.g. memory.low). However, this is rather cumbersome and buggy.
Furthermore, when there is a delay in memory limit correction (e.g. when
hugetlb usage changes between consecutive runs of the userspace agent),
the system could be in an over- or underprotected state.

This patch series rectifies this issue by charging the memcg when the
hugetlb folio is allocated, and uncharging it when the folio is freed. In
addition, a new selftest is added to demonstrate and verify this new
behavior.
Nhat Pham (4):
  memcontrol: add helpers for hugetlb memcg accounting
  memcontrol: only transfer the memcg data for migration
  hugetlb: memcg: account hugetlb-backed memory in memory controller
  selftests: add a selftest to verify hugetlb usage in memcg

 Documentation/admin-guide/cgroup-v2.rst   |  29 +++
 MAINTAINERS                               |   2 +
 include/linux/cgroup-defs.h               |   5 +
 include/linux/memcontrol.h                |  37 +++
 kernel/cgroup/cgroup.c                    |  15 +-
 mm/filemap.c                              |   2 +-
 mm/hugetlb.c                              |  35 ++-
 mm/memcontrol.c                           | 139 +++++++++--
 mm/migrate.c                              |   3 +-
 tools/testing/selftests/cgroup/.gitignore |   1 +
 tools/testing/selftests/cgroup/Makefile   |   2 +
 .../selftests/cgroup/test_hugetlb_memcg.c | 234 ++++++++++++++++++
 12 files changed, 478 insertions(+), 26 deletions(-)
 create mode 100644 tools/testing/selftests/cgroup/test_hugetlb_memcg.c