From patchwork Tue Jun 25 00:58:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710395 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1C74DC30659 for ; Tue, 25 Jun 2024 00:59:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 702AA6B034A; Mon, 24 Jun 2024 20:59:33 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6B2A56B034B; Mon, 24 Jun 2024 20:59:33 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4B6526B034C; Mon, 24 Jun 2024 20:59:33 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 2E5EC6B034A for ; Mon, 24 Jun 2024 20:59:33 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id AE82CA3592 for ; Tue, 25 Jun 2024 00:59:32 +0000 (UTC) X-FDA: 82267602984.02.268E08C Received: from out-188.mta0.migadu.com (out-188.mta0.migadu.com [91.218.175.188]) by imf26.hostedemail.com (Postfix) with ESMTP id 82718140015 for ; Tue, 25 Jun 2024 00:59:30 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=SNHEm2mO; spf=pass (imf26.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.188 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277164; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=0K8TiASsfXHI+86AZ2a2bPWGqGFFcRCb/gQyRRUpsu8=; b=4+V5BYWOpt+6sE2S+TOs74hVBn+29A7rdhgKvehP170l7BiFHWwUuU+lfTdLusxjHZFBBZ V4mGZOKeDazg7uwrLIS8PbBGaGhTYFUISY9/8HwYOZXCxc3GHVz7MAOo/RVuXEa49pT5JR rtYL/xr+U5HFP2XZsHxJgMBUmhrJKMg= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=SNHEm2mO; spf=pass (imf26.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.188 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277164; a=rsa-sha256; cv=none; b=Qh5p73fSbvkvngz0sdvrg2bbJ0HXMIpKmr+Mw4krYY5djgNks/2bDavPVSkeyY5WLvo+K5 VT5FFkn/S/kQvdKM/09zOZnmF8dUBElKAhrAg4EPxI39o4gsNbK9PPngp4QjijzXT9lD6Z tyCXGtquuiK1+/p9uyb2V8bZlUNOtNE= X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277168; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0K8TiASsfXHI+86AZ2a2bPWGqGFFcRCb/gQyRRUpsu8=; b=SNHEm2mO/2MBhZOMRXB4XkfaEiY2p+h8mD8OBd4U1zfPiQ3BuKoVwWbf2ski/Cc+TvOfxT QpO01xxRyLiHWa3dG3T+27zaWcaG96hCPF5P1nHpIMrWd4l1kPGh8RycYujNQQJUiVPl5+ uPIKcwxBcy2hngodxGz3SoUlayh91Iw= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 01/14] mm: memcg: introduce memcontrol-v1.c Date: Mon, 24 Jun 2024 17:58:53 -0700 Message-ID: <20240625005906.106920-2-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 82718140015 X-Stat-Signature: 8gao7kzemxoej4uxta6gbxy5853xg6od X-HE-Tag: 1719277170-269098 X-HE-Meta: U2FsdGVkX18IR+561HQq56O2nSR4lWu8ClLowot67YcN/0wwZxev4RBEybNL+WOibEeUZ2tjhgsGzgsKZyht3+SxNjiM2zaU1Ar044cmUNkoteA4FQ/nat13nS3MbrxpDothCFfWFxgR5kinAGb53msNRvRkZMoexKTb+ftQ541NKpjapJuYT9C3ikY+RPN2DRs8CYLkA/Y8IfLt0N2Wti8pB7WBbdf+fWX2fTyhDWFiQEEW5P1LqLlAbQJUQsKV27kkNyjDucOTHRRUV9vLYU361u/upButC1bAhmxahucrYvMgKOJTjL987fchjN520jrU8/9rjl0YYn7X1Q6LpDR4fvn8xA/P9NEu59uCNl70tE4942I0xGrZlYMDPlc6BFiuLQptqh99z7eg4NtzpKsTKtwPOCpz96ZGgqW5XybmYqNA9SB2c9H96hhLkWR+e/CHtdPoXY6OR0Q21pvmvrTXj1o/sekvfUiJGQM6RcCwjSn+lrObSnYF5ZRckAl86hcu4BisYRy/sjQC1DKsW0waQsp/58+pVEBPJa5QZcBWoyHDAHXjuhVjNaoBNTiiXYpbg/+e5pM9Qd/YqMoEALRCBigBbheYeyDH8yIvQ2J9K3w/6GrBy+uN2FbcfpDSRPF8JZZK1SxPv8yEArueM8IQN9G3ipFyDz9jpg4aq/mLrsA+HdLVHgBbDzud/HetEhYIkOhdSRbkqN6pB/nWgkKnLJYQVHPasvJBE+JRwy5s8yqbCP1DmvXuKkhkL05+8cWdssh0yRyprYxUigwdksBEUH6Z0TUPM6hePX/5IlwO4im4CqPpoW6E7GFfoAqZRlgSVBmmLP5BkTFnlFp2Nq+06cnUe9bri+3PugoaWal1ZpzvieZDj/UF/EFR0OBJ4/g85q6uVqO4q+uZrhq9kTxPcGWjUIHe0Z+RARaq7lHpNSzlqZv+AJ5hd967l61P9UzEOwFMTBvUEWVjar0 WZQXsPik enbVTofd061haBqFnC0/DHL4zWtWgJjQbg+ejCEMEUxfmWENOSdzkR0pIE3HsPfV1S3J5i79cAQGVsgdOZq1/HZNZN1CfOfF9duqVtAZ9fL60Xxc= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: This patch introduces the mm/memcontrol-v1.c source file which will be used for all legacy (cgroup v1) memory cgroup code. It also introduces mm/memcontrol-v1.h to keep declarations shared between mm/memcontrol.c and mm/memcontrol-v1.c. As of now, let's compile it if CONFIG_MEMCG is set, similar to mm/memcontrol.c. Later on it can be switched to use a separate config option, so that the legacy code won't be compiled if not required. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko --- mm/Makefile | 3 ++- mm/memcontrol-v1.c | 3 +++ mm/memcontrol-v1.h | 7 +++++++ 3 files changed, 12 insertions(+), 1 deletion(-) create mode 100644 mm/memcontrol-v1.c create mode 100644 mm/memcontrol-v1.h diff --git a/mm/Makefile b/mm/Makefile index 8fb85acda1b1..124d4dea2035 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -26,6 +26,7 @@ KCOV_INSTRUMENT_page_alloc.o := n KCOV_INSTRUMENT_debug-pagealloc.o := n KCOV_INSTRUMENT_kmemleak.o := n KCOV_INSTRUMENT_memcontrol.o := n +KCOV_INSTRUMENT_memcontrol-v1.o := n KCOV_INSTRUMENT_mmzone.o := n KCOV_INSTRUMENT_vmstat.o := n KCOV_INSTRUMENT_failslab.o := n @@ -95,7 +96,7 @@ obj-$(CONFIG_NUMA) += memory-tiers.o obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o obj-$(CONFIG_PAGE_COUNTER) += page_counter.o -obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o +obj-$(CONFIG_MEMCG) += memcontrol.o memcontrol-v1.o vmpressure.o ifdef CONFIG_SWAP obj-$(CONFIG_MEMCG) += swap_cgroup.o endif diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c new file mode 100644 index 000000000000..a941446ba575 --- /dev/null +++ b/mm/memcontrol-v1.c @@ -0,0 +1,3 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include "memcontrol-v1.h" diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h new file mode 100644 index 000000000000..7c5f094755ff --- /dev/null +++ b/mm/memcontrol-v1.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ + +#ifndef __MM_MEMCONTROL_V1_H +#define __MM_MEMCONTROL_V1_H + + +#endif /* __MM_MEMCONTROL_V1_H */ From patchwork Tue Jun 25 00:58:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710396 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C9279C2BD09 for ; Tue, 25 Jun 2024 00:59:36 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 52DA46B034C; Mon, 24 Jun 2024 20:59:36 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 48ECF6B034D; Mon, 24 Jun 2024 20:59:36 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2BC136B0350; Mon, 24 Jun 2024 20:59:36 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id F229D6B034C for ; Mon, 24 Jun 2024 20:59:35 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 69B29121523 for ; Tue, 25 Jun 2024 00:59:35 +0000 (UTC) X-FDA: 82267603110.04.C38B443 Received: from out-185.mta0.migadu.com (out-185.mta0.migadu.com [91.218.175.185]) by imf21.hostedemail.com (Postfix) with ESMTP id 41D811C0016 for ; Tue, 25 Jun 2024 00:59:32 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=GmhrTbP+; spf=pass (imf21.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.185 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277161; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=rjhMCfmQ5JlzFrU4RV8zWub/Fm9YmIR6Xcl5saEXBXg=; b=Rz1bCRaDk4CbUn820i1dbS+0n/mt4kdJ50RGqIxuea6wRSZUOR6NPe0WE/cU0+B9f5b/vZ fAC39UAgKRw9ppjvUadQkt16P1YpyIjJsi/jnOrjIAEbEq2G+q4WjdX8EXIQcdWJH/Oa4b fDy0JskM/PqH0rKL/ZxZaBcnbTV8Xmk= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=GmhrTbP+; spf=pass (imf21.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.185 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277161; a=rsa-sha256; cv=none; b=rXNYC2kh6VxMwqnzIcTj3sxdb6gy+r0L6LV6k4ef7iEtPKwjIQYD4XaxcmI8lBX5Oqzi/T Xbl4Hews1KSrdqsVZ+stEntxtVZiF3rOXbfOy9EsMoCOsG4vO8j289+7H4/G9mnYhsTRa4 S98giZS7arHi2/loKE9mLWk6nX4QvMs= X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277170; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rjhMCfmQ5JlzFrU4RV8zWub/Fm9YmIR6Xcl5saEXBXg=; b=GmhrTbP+cQCGqW0v1/PBAdYCK9w59SjAQqjOeUubJgdxGDNfRE7O+OSVZO2hMOj0thmYpL 0HIc5CZIo4WY5zNd2rGVoPrZLXvIDmkwoMrtij56Tjl4iPPSRcrbBeSLihGhfEibFgsv8c pFkIK2YNYrNSp9nfGFUPwklXrLeEJ7I= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 02/14] mm: memcg: move soft limit reclaim code to memcontrol-v1.c Date: Mon, 24 Jun 2024 17:58:54 -0700 Message-ID: <20240625005906.106920-3-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Stat-Signature: 9dnub9wdb85m364hrryez9bo5riaub1k X-Rspam-User: X-Rspamd-Queue-Id: 41D811C0016 X-Rspamd-Server: rspam02 X-HE-Tag: 1719277171-557579 X-HE-Meta: U2FsdGVkX18kxX5A2LQQ75hk537FgZYijILPRsR+SvFnZoQUCzOfsGsIzbJj7+ccPCL1TZX4AU2SAXwt2DeY7TkG7bL7neyQSabjISDqiBZT4AauGCI6D19ge5nRqQfGSxqkaeTeLgkGHLneozqipfoz99bxl6tNz1QCqPOxZf6lBDSrf6M6eF05/KXS8VesvVpIlcizPPjdsmWpJI2RVfyAH79ZiDsvNG1LpWlzlba7GykQHhv0J3sy8fmVDv8fD+JfWYITTxEA4ghSh0kBF7EZpT0mxOna6b2rMtcW3wioBV84g1PdHtN9Y275Tke443wrPYahQlewK7dYvTu4Qpmi2XD1aK1oj6gUfVzKs+5Npz/zGGHyogRxGPB2m2Oo3VLTfmUr6eG7DOj9Q+WrjHKvJvxi+U2XhQeH7KFoJe7MCpD2QyqTjcIJwZAzHUnEvLDnZjhMZTT9Oo32x+hYrRaLqYSqxgJaF8ITAXNRE2hnfgMENFmsLhOTMmHi8PyiKB5WAjGUOHqEdpKy9wSJPrNxqBYA056w7VgpyKCUPXa4tMeuDj0d+Zj0JendMYb+rx9KEHgMoLTat3SvwcwljJIZMsGQmSVk64lZP1/73lttaOuaBAOMOPkPPfiaxKyoGeruEWQVbWsGJxE412IWQEACUF4J0kOZo4CqOvfiVQAiTCBQX058z6kd4U+j3mQ0VKRa9Jw2QKVyfKPkljNZmOfO0IS18CVoBWyXy4abRx/aMRf7VMCNkHBlZGNmpqTakJSu6E2kSaseJwIWPneMlbXLyDhle6hNMdAjWPkY4W+c5RoZhU+1SUMlWs8UKEqtddRSARWNlH6H34cw5TuePk5Xv1HONjZ85Gb5Ku7DZL86gxBPrhP3uUBc+nkmWhkouLtOLg3SQF9dT/B35bSHP7ctuX7SS7ADjChtmug26lYxkymExHD16nAMZ1oDbIhoJ/OHCFG0/v2oi78WgfD bzCt3MjG YZmBuheB7CewknKZp+e3ahiBtTw1pa9jTXCVmYOlshewKEePOzQlKe5h71eS8l6LixIq+t64Ot0QVO5wBQwlq4rxi1bMcpT7LevM8mxh+zggOcz8= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Soft limits are cgroup v1-specific and are not supported by cgroup v2, so let's move the corresponding code into memcontrol-v1.c. Aside from simple moving the code, this commits introduces a trivial memcg1_soft_limit_reset() function to reset soft limits and also moves the global soft limit tree initialization code into a new memcg1_init() function. It also moves corresponding declarations shared between memcontrol.c and memcontrol-v1.c into mm/memcontrol-v1.h. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko --- mm/memcontrol-v1.c | 342 +++++++++++++++++++++++++++++++++++++++++++++ mm/memcontrol-v1.h | 7 + mm/memcontrol.c | 337 +------------------------------------------- 3 files changed, 353 insertions(+), 333 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index a941446ba575..2ccb8406fa84 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -1,3 +1,345 @@ // SPDX-License-Identifier: GPL-2.0-or-later +#include +#include +#include + #include "memcontrol-v1.h" + +/* + * Cgroups above their limits are maintained in a RB-Tree, independent of + * their hierarchy representation + */ + +struct mem_cgroup_tree_per_node { + struct rb_root rb_root; + struct rb_node *rb_rightmost; + spinlock_t lock; +}; + +struct mem_cgroup_tree { + struct mem_cgroup_tree_per_node *rb_tree_per_node[MAX_NUMNODES]; +}; + +static struct mem_cgroup_tree soft_limit_tree __read_mostly; + +/* + * Maximum loops in mem_cgroup_soft_reclaim(), used for soft + * limit reclaim to prevent infinite loops, if they ever occur. + */ +#define MEM_CGROUP_MAX_RECLAIM_LOOPS 100 +#define MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS 2 + +static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz, + struct mem_cgroup_tree_per_node *mctz, + unsigned long new_usage_in_excess) +{ + struct rb_node **p = &mctz->rb_root.rb_node; + struct rb_node *parent = NULL; + struct mem_cgroup_per_node *mz_node; + bool rightmost = true; + + if (mz->on_tree) + return; + + mz->usage_in_excess = new_usage_in_excess; + if (!mz->usage_in_excess) + return; + while (*p) { + parent = *p; + mz_node = rb_entry(parent, struct mem_cgroup_per_node, + tree_node); + if (mz->usage_in_excess < mz_node->usage_in_excess) { + p = &(*p)->rb_left; + rightmost = false; + } else { + p = &(*p)->rb_right; + } + } + + if (rightmost) + mctz->rb_rightmost = &mz->tree_node; + + rb_link_node(&mz->tree_node, parent, p); + rb_insert_color(&mz->tree_node, &mctz->rb_root); + mz->on_tree = true; +} + +static void __mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz, + struct mem_cgroup_tree_per_node *mctz) +{ + if (!mz->on_tree) + return; + + if (&mz->tree_node == mctz->rb_rightmost) + mctz->rb_rightmost = rb_prev(&mz->tree_node); + + rb_erase(&mz->tree_node, &mctz->rb_root); + mz->on_tree = false; +} + +static void mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz, + struct mem_cgroup_tree_per_node *mctz) +{ + unsigned long flags; + + spin_lock_irqsave(&mctz->lock, flags); + __mem_cgroup_remove_exceeded(mz, mctz); + spin_unlock_irqrestore(&mctz->lock, flags); +} + +static unsigned long soft_limit_excess(struct mem_cgroup *memcg) +{ + unsigned long nr_pages = page_counter_read(&memcg->memory); + unsigned long soft_limit = READ_ONCE(memcg->soft_limit); + unsigned long excess = 0; + + if (nr_pages > soft_limit) + excess = nr_pages - soft_limit; + + return excess; +} + +void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid) +{ + unsigned long excess; + struct mem_cgroup_per_node *mz; + struct mem_cgroup_tree_per_node *mctz; + + if (lru_gen_enabled()) { + if (soft_limit_excess(memcg)) + lru_gen_soft_reclaim(memcg, nid); + return; + } + + mctz = soft_limit_tree.rb_tree_per_node[nid]; + if (!mctz) + return; + /* + * Necessary to update all ancestors when hierarchy is used. + * because their event counter is not touched. + */ + for (; memcg; memcg = parent_mem_cgroup(memcg)) { + mz = memcg->nodeinfo[nid]; + excess = soft_limit_excess(memcg); + /* + * We have to update the tree if mz is on RB-tree or + * mem is over its softlimit. + */ + if (excess || mz->on_tree) { + unsigned long flags; + + spin_lock_irqsave(&mctz->lock, flags); + /* if on-tree, remove it */ + if (mz->on_tree) + __mem_cgroup_remove_exceeded(mz, mctz); + /* + * Insert again. mz->usage_in_excess will be updated. + * If excess is 0, no tree ops. + */ + __mem_cgroup_insert_exceeded(mz, mctz, excess); + spin_unlock_irqrestore(&mctz->lock, flags); + } + } +} + +void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg) +{ + struct mem_cgroup_tree_per_node *mctz; + struct mem_cgroup_per_node *mz; + int nid; + + for_each_node(nid) { + mz = memcg->nodeinfo[nid]; + mctz = soft_limit_tree.rb_tree_per_node[nid]; + if (mctz) + mem_cgroup_remove_exceeded(mz, mctz); + } +} + +static struct mem_cgroup_per_node * +__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz) +{ + struct mem_cgroup_per_node *mz; + +retry: + mz = NULL; + if (!mctz->rb_rightmost) + goto done; /* Nothing to reclaim from */ + + mz = rb_entry(mctz->rb_rightmost, + struct mem_cgroup_per_node, tree_node); + /* + * Remove the node now but someone else can add it back, + * we will to add it back at the end of reclaim to its correct + * position in the tree. + */ + __mem_cgroup_remove_exceeded(mz, mctz); + if (!soft_limit_excess(mz->memcg) || + !css_tryget(&mz->memcg->css)) + goto retry; +done: + return mz; +} + +static struct mem_cgroup_per_node * +mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz) +{ + struct mem_cgroup_per_node *mz; + + spin_lock_irq(&mctz->lock); + mz = __mem_cgroup_largest_soft_limit_node(mctz); + spin_unlock_irq(&mctz->lock); + return mz; +} + +static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg, + pg_data_t *pgdat, + gfp_t gfp_mask, + unsigned long *total_scanned) +{ + struct mem_cgroup *victim = NULL; + int total = 0; + int loop = 0; + unsigned long excess; + unsigned long nr_scanned; + struct mem_cgroup_reclaim_cookie reclaim = { + .pgdat = pgdat, + }; + + excess = soft_limit_excess(root_memcg); + + while (1) { + victim = mem_cgroup_iter(root_memcg, victim, &reclaim); + if (!victim) { + loop++; + if (loop >= 2) { + /* + * If we have not been able to reclaim + * anything, it might because there are + * no reclaimable pages under this hierarchy + */ + if (!total) + break; + /* + * We want to do more targeted reclaim. + * excess >> 2 is not to excessive so as to + * reclaim too much, nor too less that we keep + * coming back to reclaim from this cgroup + */ + if (total >= (excess >> 2) || + (loop > MEM_CGROUP_MAX_RECLAIM_LOOPS)) + break; + } + continue; + } + total += mem_cgroup_shrink_node(victim, gfp_mask, false, + pgdat, &nr_scanned); + *total_scanned += nr_scanned; + if (!soft_limit_excess(root_memcg)) + break; + } + mem_cgroup_iter_break(root_memcg, victim); + return total; +} + +unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, + gfp_t gfp_mask, + unsigned long *total_scanned) +{ + unsigned long nr_reclaimed = 0; + struct mem_cgroup_per_node *mz, *next_mz = NULL; + unsigned long reclaimed; + int loop = 0; + struct mem_cgroup_tree_per_node *mctz; + unsigned long excess; + + if (lru_gen_enabled()) + return 0; + + if (order > 0) + return 0; + + mctz = soft_limit_tree.rb_tree_per_node[pgdat->node_id]; + + /* + * Do not even bother to check the largest node if the root + * is empty. Do it lockless to prevent lock bouncing. Races + * are acceptable as soft limit is best effort anyway. + */ + if (!mctz || RB_EMPTY_ROOT(&mctz->rb_root)) + return 0; + + /* + * This loop can run a while, specially if mem_cgroup's continuously + * keep exceeding their soft limit and putting the system under + * pressure + */ + do { + if (next_mz) + mz = next_mz; + else + mz = mem_cgroup_largest_soft_limit_node(mctz); + if (!mz) + break; + + reclaimed = mem_cgroup_soft_reclaim(mz->memcg, pgdat, + gfp_mask, total_scanned); + nr_reclaimed += reclaimed; + spin_lock_irq(&mctz->lock); + + /* + * If we failed to reclaim anything from this memory cgroup + * it is time to move on to the next cgroup + */ + next_mz = NULL; + if (!reclaimed) + next_mz = __mem_cgroup_largest_soft_limit_node(mctz); + + excess = soft_limit_excess(mz->memcg); + /* + * One school of thought says that we should not add + * back the node to the tree if reclaim returns 0. + * But our reclaim could return 0, simply because due + * to priority we are exposing a smaller subset of + * memory to reclaim from. Consider this as a longer + * term TODO. + */ + /* If excess == 0, no tree ops */ + __mem_cgroup_insert_exceeded(mz, mctz, excess); + spin_unlock_irq(&mctz->lock); + css_put(&mz->memcg->css); + loop++; + /* + * Could not reclaim anything and there are no more + * mem cgroups to try or we seem to be looping without + * reclaiming anything. + */ + if (!nr_reclaimed && + (next_mz == NULL || + loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS)) + break; + } while (!nr_reclaimed); + if (next_mz) + css_put(&next_mz->memcg->css); + return nr_reclaimed; +} + +static int __init memcg1_init(void) +{ + int node; + + for_each_node(node) { + struct mem_cgroup_tree_per_node *rtpn; + + rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL, node); + + rtpn->rb_root = RB_ROOT; + rtpn->rb_rightmost = NULL; + spin_lock_init(&rtpn->lock); + soft_limit_tree.rb_tree_per_node[node] = rtpn; + } + + return 0; +} +subsys_initcall(memcg1_init); diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 7c5f094755ff..4da6fa561c6d 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -3,5 +3,12 @@ #ifndef __MM_MEMCONTROL_V1_H #define __MM_MEMCONTROL_V1_H +void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid); +void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg); + +static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) +{ + WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX); +} #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 974bd160838c..003e944f34ea 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -72,6 +72,7 @@ #include #include "slab.h" #include "swap.h" +#include "memcontrol-v1.h" #include @@ -108,23 +109,6 @@ static bool do_memsw_account(void) #define THRESHOLDS_EVENTS_TARGET 128 #define SOFTLIMIT_EVENTS_TARGET 1024 -/* - * Cgroups above their limits are maintained in a RB-Tree, independent of - * their hierarchy representation - */ - -struct mem_cgroup_tree_per_node { - struct rb_root rb_root; - struct rb_node *rb_rightmost; - spinlock_t lock; -}; - -struct mem_cgroup_tree { - struct mem_cgroup_tree_per_node *rb_tree_per_node[MAX_NUMNODES]; -}; - -static struct mem_cgroup_tree soft_limit_tree __read_mostly; - /* for OOM */ struct mem_cgroup_eventfd_list { struct list_head list; @@ -199,13 +183,6 @@ static struct move_charge_struct { .waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq), }; -/* - * Maximum loops in mem_cgroup_soft_reclaim(), used for soft - * limit reclaim to prevent infinite loops, if they ever occur. - */ -#define MEM_CGROUP_MAX_RECLAIM_LOOPS 100 -#define MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS 2 - /* for encoding cft->private value on file */ enum res_type { _MEM, @@ -413,169 +390,6 @@ ino_t page_cgroup_ino(struct page *page) return ino; } -static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz, - struct mem_cgroup_tree_per_node *mctz, - unsigned long new_usage_in_excess) -{ - struct rb_node **p = &mctz->rb_root.rb_node; - struct rb_node *parent = NULL; - struct mem_cgroup_per_node *mz_node; - bool rightmost = true; - - if (mz->on_tree) - return; - - mz->usage_in_excess = new_usage_in_excess; - if (!mz->usage_in_excess) - return; - while (*p) { - parent = *p; - mz_node = rb_entry(parent, struct mem_cgroup_per_node, - tree_node); - if (mz->usage_in_excess < mz_node->usage_in_excess) { - p = &(*p)->rb_left; - rightmost = false; - } else { - p = &(*p)->rb_right; - } - } - - if (rightmost) - mctz->rb_rightmost = &mz->tree_node; - - rb_link_node(&mz->tree_node, parent, p); - rb_insert_color(&mz->tree_node, &mctz->rb_root); - mz->on_tree = true; -} - -static void __mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz, - struct mem_cgroup_tree_per_node *mctz) -{ - if (!mz->on_tree) - return; - - if (&mz->tree_node == mctz->rb_rightmost) - mctz->rb_rightmost = rb_prev(&mz->tree_node); - - rb_erase(&mz->tree_node, &mctz->rb_root); - mz->on_tree = false; -} - -static void mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz, - struct mem_cgroup_tree_per_node *mctz) -{ - unsigned long flags; - - spin_lock_irqsave(&mctz->lock, flags); - __mem_cgroup_remove_exceeded(mz, mctz); - spin_unlock_irqrestore(&mctz->lock, flags); -} - -static unsigned long soft_limit_excess(struct mem_cgroup *memcg) -{ - unsigned long nr_pages = page_counter_read(&memcg->memory); - unsigned long soft_limit = READ_ONCE(memcg->soft_limit); - unsigned long excess = 0; - - if (nr_pages > soft_limit) - excess = nr_pages - soft_limit; - - return excess; -} - -static void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid) -{ - unsigned long excess; - struct mem_cgroup_per_node *mz; - struct mem_cgroup_tree_per_node *mctz; - - if (lru_gen_enabled()) { - if (soft_limit_excess(memcg)) - lru_gen_soft_reclaim(memcg, nid); - return; - } - - mctz = soft_limit_tree.rb_tree_per_node[nid]; - if (!mctz) - return; - /* - * Necessary to update all ancestors when hierarchy is used. - * because their event counter is not touched. - */ - for (; memcg; memcg = parent_mem_cgroup(memcg)) { - mz = memcg->nodeinfo[nid]; - excess = soft_limit_excess(memcg); - /* - * We have to update the tree if mz is on RB-tree or - * mem is over its softlimit. - */ - if (excess || mz->on_tree) { - unsigned long flags; - - spin_lock_irqsave(&mctz->lock, flags); - /* if on-tree, remove it */ - if (mz->on_tree) - __mem_cgroup_remove_exceeded(mz, mctz); - /* - * Insert again. mz->usage_in_excess will be updated. - * If excess is 0, no tree ops. - */ - __mem_cgroup_insert_exceeded(mz, mctz, excess); - spin_unlock_irqrestore(&mctz->lock, flags); - } - } -} - -static void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg) -{ - struct mem_cgroup_tree_per_node *mctz; - struct mem_cgroup_per_node *mz; - int nid; - - for_each_node(nid) { - mz = memcg->nodeinfo[nid]; - mctz = soft_limit_tree.rb_tree_per_node[nid]; - if (mctz) - mem_cgroup_remove_exceeded(mz, mctz); - } -} - -static struct mem_cgroup_per_node * -__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz) -{ - struct mem_cgroup_per_node *mz; - -retry: - mz = NULL; - if (!mctz->rb_rightmost) - goto done; /* Nothing to reclaim from */ - - mz = rb_entry(mctz->rb_rightmost, - struct mem_cgroup_per_node, tree_node); - /* - * Remove the node now but someone else can add it back, - * we will to add it back at the end of reclaim to its correct - * position in the tree. - */ - __mem_cgroup_remove_exceeded(mz, mctz); - if (!soft_limit_excess(mz->memcg) || - !css_tryget(&mz->memcg->css)) - goto retry; -done: - return mz; -} - -static struct mem_cgroup_per_node * -mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz) -{ - struct mem_cgroup_per_node *mz; - - spin_lock_irq(&mctz->lock); - mz = __mem_cgroup_largest_soft_limit_node(mctz); - spin_unlock_irq(&mctz->lock); - return mz; -} - /* Subset of node_stat_item for memcg stats */ static const unsigned int memcg_node_stat_items[] = { NR_INACTIVE_ANON, @@ -1980,56 +1794,6 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, return ret; } -static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg, - pg_data_t *pgdat, - gfp_t gfp_mask, - unsigned long *total_scanned) -{ - struct mem_cgroup *victim = NULL; - int total = 0; - int loop = 0; - unsigned long excess; - unsigned long nr_scanned; - struct mem_cgroup_reclaim_cookie reclaim = { - .pgdat = pgdat, - }; - - excess = soft_limit_excess(root_memcg); - - while (1) { - victim = mem_cgroup_iter(root_memcg, victim, &reclaim); - if (!victim) { - loop++; - if (loop >= 2) { - /* - * If we have not been able to reclaim - * anything, it might because there are - * no reclaimable pages under this hierarchy - */ - if (!total) - break; - /* - * We want to do more targeted reclaim. - * excess >> 2 is not to excessive so as to - * reclaim too much, nor too less that we keep - * coming back to reclaim from this cgroup - */ - if (total >= (excess >> 2) || - (loop > MEM_CGROUP_MAX_RECLAIM_LOOPS)) - break; - } - continue; - } - total += mem_cgroup_shrink_node(victim, gfp_mask, false, - pgdat, &nr_scanned); - *total_scanned += nr_scanned; - if (!soft_limit_excess(root_memcg)) - break; - } - mem_cgroup_iter_break(root_memcg, victim); - return total; -} - #ifdef CONFIG_LOCKDEP static struct lockdep_map memcg_oom_lock_dep_map = { .name = "memcg_oom_lock", @@ -3925,88 +3689,6 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg, return ret; } -unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, - gfp_t gfp_mask, - unsigned long *total_scanned) -{ - unsigned long nr_reclaimed = 0; - struct mem_cgroup_per_node *mz, *next_mz = NULL; - unsigned long reclaimed; - int loop = 0; - struct mem_cgroup_tree_per_node *mctz; - unsigned long excess; - - if (lru_gen_enabled()) - return 0; - - if (order > 0) - return 0; - - mctz = soft_limit_tree.rb_tree_per_node[pgdat->node_id]; - - /* - * Do not even bother to check the largest node if the root - * is empty. Do it lockless to prevent lock bouncing. Races - * are acceptable as soft limit is best effort anyway. - */ - if (!mctz || RB_EMPTY_ROOT(&mctz->rb_root)) - return 0; - - /* - * This loop can run a while, specially if mem_cgroup's continuously - * keep exceeding their soft limit and putting the system under - * pressure - */ - do { - if (next_mz) - mz = next_mz; - else - mz = mem_cgroup_largest_soft_limit_node(mctz); - if (!mz) - break; - - reclaimed = mem_cgroup_soft_reclaim(mz->memcg, pgdat, - gfp_mask, total_scanned); - nr_reclaimed += reclaimed; - spin_lock_irq(&mctz->lock); - - /* - * If we failed to reclaim anything from this memory cgroup - * it is time to move on to the next cgroup - */ - next_mz = NULL; - if (!reclaimed) - next_mz = __mem_cgroup_largest_soft_limit_node(mctz); - - excess = soft_limit_excess(mz->memcg); - /* - * One school of thought says that we should not add - * back the node to the tree if reclaim returns 0. - * But our reclaim could return 0, simply because due - * to priority we are exposing a smaller subset of - * memory to reclaim from. Consider this as a longer - * term TODO. - */ - /* If excess == 0, no tree ops */ - __mem_cgroup_insert_exceeded(mz, mctz, excess); - spin_unlock_irq(&mctz->lock); - css_put(&mz->memcg->css); - loop++; - /* - * Could not reclaim anything and there are no more - * mem cgroups to try or we seem to be looping without - * reclaiming anything. - */ - if (!nr_reclaimed && - (next_mz == NULL || - loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS)) - break; - } while (!nr_reclaimed); - if (next_mz) - css_put(&next_mz->memcg->css); - return nr_reclaimed; -} - /* * Reclaims as many pages from the given memcg as possible. * @@ -5784,7 +5466,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) return ERR_CAST(memcg); page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX); - WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX); + memcg1_soft_limit_reset(memcg); #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP) memcg->zswap_max = PAGE_COUNTER_MAX; WRITE_ONCE(memcg->zswap_writeback, @@ -5957,7 +5639,7 @@ static void mem_cgroup_css_reset(struct cgroup_subsys_state *css) page_counter_set_min(&memcg->memory, 0); page_counter_set_low(&memcg->memory, 0); page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX); - WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX); + memcg1_soft_limit_reset(memcg); page_counter_set_high(&memcg->swap, PAGE_COUNTER_MAX); memcg_wb_domain_size_changed(memcg); } @@ -7984,7 +7666,7 @@ __setup("cgroup.memory=", cgroup_memory); */ static int __init mem_cgroup_init(void) { - int cpu, node; + int cpu; /* * Currently s32 type (can refer to struct batched_lruvec_stat) is @@ -8001,17 +7683,6 @@ static int __init mem_cgroup_init(void) INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work, drain_local_stock); - for_each_node(node) { - struct mem_cgroup_tree_per_node *rtpn; - - rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL, node); - - rtpn->rb_root = RB_ROOT; - rtpn->rb_rightmost = NULL; - spin_lock_init(&rtpn->lock); - soft_limit_tree.rb_tree_per_node[node] = rtpn; - } - return 0; } subsys_initcall(mem_cgroup_init); From patchwork Tue Jun 25 00:58:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710397 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6459FC3064D for ; Tue, 25 Jun 2024 00:59:39 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B974F6B0350; Mon, 24 Jun 2024 20:59:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AFBB26B0351; Mon, 24 Jun 2024 20:59:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 923376B0353; Mon, 24 Jun 2024 20:59:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 71AB46B0350 for ; Mon, 24 Jun 2024 20:59:37 -0400 (EDT) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 196991614C9 for ; Tue, 25 Jun 2024 00:59:37 +0000 (UTC) X-FDA: 82267603194.21.A48FDDF Received: from out-176.mta0.migadu.com (out-176.mta0.migadu.com [91.218.175.176]) by imf08.hostedemail.com (Postfix) with ESMTP id C90B4160016 for ; Tue, 25 Jun 2024 00:59:34 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=vC47+CqM; spf=pass (imf08.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.176 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277167; a=rsa-sha256; cv=none; b=zlPb42l9JZjOQ/caaTioEA0luqTrXocBNVquHeUBXgoNiEouXyw4a44eytXfv8CDykLUhG L+Rtfe51ZalGkjtbDsLgzljF92qU8DFfjsbaYQ3qTeSps28FwVhhZCO7g3AKLGS3dRqsvc bC366s9Auk3JtpjAE2jrfl4PdKDsA9c= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=vC47+CqM; spf=pass (imf08.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.176 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277167; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=jqHMldr5yb8EHMopv4QA8bs9iV164cfr+gO2HUVb6OI=; b=yRNiC5rs3sq3VogbB5VqNI4LLlY5RXHLEH32emaFPSBs08kz/nP0HSbgnvqdOhIV17twFY xfttOB+2s44MMwQrIchhe2D22gKA4cUKzNruprWUfSAMFZiS8ZM3cUASenamnEpWYknoZv 18cYlh2cMeRW9IXz3i9sZIksLs2pbxE= X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277173; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jqHMldr5yb8EHMopv4QA8bs9iV164cfr+gO2HUVb6OI=; b=vC47+CqMGWRzP59I6JXMOgvoaLbgn6Avaqr2H6e/hxJishCg/ejeMxkMXhSHpjUjIptXFO LFVdS25i0wSGT3zri597XOuNN81MUCYX5Yb7NzyvW3ye0M4dADWj+tGmjFyCUAN5oPQFgE 8zmBGd0hMknLsB9MyuQU7jNjIDMmrdA= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 03/14] mm: memcg: rename soft limit reclaim-related functions Date: Mon, 24 Jun 2024 17:58:55 -0700 Message-ID: <20240625005906.106920-4-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Stat-Signature: hex1gffagyk89us8ktxnbyf8s8y3eyec X-Rspamd-Queue-Id: C90B4160016 X-Rspam-User: X-Rspamd-Server: rspam10 X-HE-Tag: 1719277174-541185 X-HE-Meta: U2FsdGVkX1/JFK7MY8quX2Wu2vxv+bJ2oEF9DExwQhfozPXgUZsVaBTBONqYSn4RyFPBis89jXVCCUd4xHsJeHXSRL/YFOkO1in7F63haRmaWOkUQNqxUNTBCzUKQl+j3bJPUreav8xU8sgM+QO/m8T66SptB2BryZRzqJ5e7aulvtS7I2gt2f/4/Ua8+E0PXfwdLHqEYXJX3enEhICDvHDB4NbHG5dbufw+vQCWlQ1qIBt9OVSjCHS9kHJXGMAYHsHCV8EqKoKfHZeOw7b+v8ZQPNOySHiTkC6RUcvtIU++Lfc4+ioLWq4V257P4b3dWlszLI4PhBuJQ+Y4R77ts8Hvqr2JvKwUsZLD2itDFUh5s6W/tffGPfzduIX6f4gIEa5nL8Wlm3l4/8jrywjiiigA+sxU6gSaOfvNcRe5wDsZEl1yC++B0UDktvvoXYNRH/7aSJKjLCzDvIB6qUqrjIybt3/DqMR3+qHFrW9TcGOauWq4XmlxetCebWA9Yl74gMjD90gH6GBnDrVvAreE368ByByr0FoieD3iZNjdVJRVSyGDvx01PQLLxi1or6D6sDNkUFTgE7Y3t5DPOdOA3xdQOAMLxXhhZuL1Ru4AnngmIlcjPI/YU3gQAL/zBX3k18brj6er/d7a7l3upa5KFzH5LEhNXK9yzGl4IzEJi+ZmWThfSTlXyvZ5MP6aNg8+2Yi/dOCNX5nOvLdCPNZrVSwzpfGBMItqXy5Er3eyIVqZexU1KJ/UOkxq6RP470MiJ3VwuyDC2tIbFMq6p8HfevksRlp5G+2lpRtAK8xBy0NIncI8cldjcNV5ANxdWJMEEcmB3qI+d44G/DHYKnfNlQ4mDKEpF10kBpejc2yTVYQj34g7LIOFi7SczUFYJoms1i5IUgAQxx+pz/M+kbm+QZeTS1PSGN1udgv7rLP3dCMUHg18i9+k5mMI814Ys4NRQuY6NrNRrokUYRi++Hi r3VmTAfy VWbGxCFXVlR/KplchALyNRjfXbErHdEhuw0ictwiHoPLrojtStqQNbYUGT4q1aFXyJLBrCLF2lc+K4SSf7tkih1SRRg3fxF1Gow6h3Q47mioOqJF+HMG5TRI0Ow== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Rename exported function related to the softlimit reclaim to have memcg1_ prefix. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko --- include/linux/memcontrol.h | 12 ++++++------ mm/memcontrol-v1.c | 6 +++--- mm/memcontrol-v1.h | 4 ++-- mm/memcontrol.c | 4 ++-- mm/vmscan.c | 10 +++++----- 5 files changed, 18 insertions(+), 18 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 7403dd5926eb..83c8327455d8 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1121,9 +1121,9 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm, void split_page_memcg(struct page *head, int old_order, int new_order); -unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, - gfp_t gfp_mask, - unsigned long *total_scanned); +unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, + gfp_t gfp_mask, + unsigned long *total_scanned); #else /* CONFIG_MEMCG */ @@ -1572,9 +1572,9 @@ static inline void split_page_memcg(struct page *head, int old_order, int new_or } static inline -unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, - gfp_t gfp_mask, - unsigned long *total_scanned) +unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, + gfp_t gfp_mask, + unsigned long *total_scanned) { return 0; } diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 2ccb8406fa84..68e2f1a718d3 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -100,7 +100,7 @@ static unsigned long soft_limit_excess(struct mem_cgroup *memcg) return excess; } -void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid) +void memcg1_update_tree(struct mem_cgroup *memcg, int nid) { unsigned long excess; struct mem_cgroup_per_node *mz; @@ -143,7 +143,7 @@ void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid) } } -void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg) +void memcg1_remove_from_trees(struct mem_cgroup *memcg) { struct mem_cgroup_tree_per_node *mctz; struct mem_cgroup_per_node *mz; @@ -243,7 +243,7 @@ static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg, return total; } -unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, +unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, gfp_t gfp_mask, unsigned long *total_scanned) { diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 4da6fa561c6d..e37bc7e8d955 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -3,8 +3,8 @@ #ifndef __MM_MEMCONTROL_V1_H #define __MM_MEMCONTROL_V1_H -void mem_cgroup_update_tree(struct mem_cgroup *memcg, int nid); -void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg); +void memcg1_update_tree(struct mem_cgroup *memcg, int nid); +void memcg1_remove_from_trees(struct mem_cgroup *memcg); static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) { diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 003e944f34ea..3479e1af12d5 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1012,7 +1012,7 @@ static void memcg_check_events(struct mem_cgroup *memcg, int nid) MEM_CGROUP_TARGET_SOFTLIMIT); mem_cgroup_threshold(memcg); if (unlikely(do_softlimit)) - mem_cgroup_update_tree(memcg, nid); + memcg1_update_tree(memcg, nid); } } @@ -5610,7 +5610,7 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css) vmpressure_cleanup(&memcg->vmpressure); cancel_work_sync(&memcg->high_work); - mem_cgroup_remove_from_trees(memcg); + memcg1_remove_from_trees(memcg); free_shrinker_info(memcg); mem_cgroup_free(memcg); } diff --git a/mm/vmscan.c b/mm/vmscan.c index 900bad16e506..3d4c681c6d40 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -6186,9 +6186,9 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc) * and balancing, not for a memcg's limit. */ nr_soft_scanned = 0; - nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone->zone_pgdat, - sc->order, sc->gfp_mask, - &nr_soft_scanned); + nr_soft_reclaimed = memcg1_soft_limit_reclaim(zone->zone_pgdat, + sc->order, sc->gfp_mask, + &nr_soft_scanned); sc->nr_reclaimed += nr_soft_reclaimed; sc->nr_scanned += nr_soft_scanned; /* need some check for avoid more shrink_zone() */ @@ -6952,8 +6952,8 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx) /* Call soft limit reclaim before calling shrink_node. */ sc.nr_scanned = 0; nr_soft_scanned = 0; - nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(pgdat, sc.order, - sc.gfp_mask, &nr_soft_scanned); + nr_soft_reclaimed = memcg1_soft_limit_reclaim(pgdat, sc.order, + sc.gfp_mask, &nr_soft_scanned); sc.nr_reclaimed += nr_soft_reclaimed; /* From patchwork Tue Jun 25 00:58:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710398 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44C3FC2BD09 for ; Tue, 25 Jun 2024 00:59:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 09B706B0351; Mon, 24 Jun 2024 20:59:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 04C576B0353; Mon, 24 Jun 2024 20:59:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C099A6B0354; Mon, 24 Jun 2024 20:59:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 9222C6B0351 for ; Mon, 24 Jun 2024 20:59:39 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 4AA771C1C51 for ; Tue, 25 Jun 2024 00:59:39 +0000 (UTC) X-FDA: 82267603278.06.0842822 Received: from out-179.mta0.migadu.com (out-179.mta0.migadu.com [91.218.175.179]) by imf09.hostedemail.com (Postfix) with ESMTP id E2906140011 for ; Tue, 25 Jun 2024 00:59:36 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=KHi0guIT; spf=pass (imf09.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.179 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277163; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=LverYrd34j0zpBhJ+AEXga+Lutoo9REj+Fp8pHCDQ84=; b=FKxIyws8ytAjq+ZEM8WQXgOxy16N3DdT0y8RIdHxZOmuQIkX1TLQewJQkQ3GlD3RYpeg4E PTI7Iml92134G/T5nyGJq5PkWBqE4BSu2E/e1qOZCxbc2sbGOT1qiHUpIIy7Z4ZeQf1GXS zANg+A5GDxbEwmptEYzN2w0wemll6IU= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277163; a=rsa-sha256; cv=none; b=FuAnFXZnpIPqxJR1NMsc0tCnOTvoKU++6QSyEOzzJiyRqweQ5QtRW/uRQw4elWLPFBN8a7 m4awv59d+wHlNKSUBrxogrZZghdETvJZqC9sd4RdUHmcATYp/Y58wOW0YHSiavcm5ks50g RXAgiMG1EgtYyBPKETUsLvhnho6kXuE= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=KHi0guIT; spf=pass (imf09.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.179 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277175; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LverYrd34j0zpBhJ+AEXga+Lutoo9REj+Fp8pHCDQ84=; b=KHi0guIT/OVgUwFPCNbAUXUMGfHU3m72V1FIE+Z/PTQVJFA4nwMEnPMlyKQUim5WaYO3Ux PiE8k3NXliAjhwjQRb6NO5CbOVQFqd+hXPAcJ/GSvmdNndp/fX+NfUpexf7JLLv7FAoGhG 0NZu4WU+RSBbyL9ffYR+jp9EgmFZIuQ= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 04/14] mm: memcg: move charge migration code to memcontrol-v1.c Date: Mon, 24 Jun 2024 17:58:56 -0700 Message-ID: <20240625005906.106920-5-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Rspamd-Queue-Id: E2906140011 X-Stat-Signature: y4g1n4xonia9pyt94n5m5ddpot3tpb4q X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1719277176-280587 X-HE-Meta: U2FsdGVkX1/yXpPxnt96e/RbLYCRtVejgpqwljs2JcqZaEzLoecpk0f5JhHMAPv73jARt4jRp++EmHcgCvQCyhyT5TVI4qiA4g+/0ieJgH/U+xe4m233ZAw8oVVXGywFRYubra8IrumLoMweie0VVQQyDCS3WoAHi63cslh7Ov+YWmGlSPmGamEjppYNSY93d242EesVdHJR0vKIZJbZZrVt4E9MgaLHaOQt1g/9EFObhxqiZ9vxjdQ2M+T+qBraGFbGb1YrDqxNeBUXA9FGNZ1BdZDyFz28G25QXPz9rKWGSxfZuupGCUqZk15M8RPmjTIjRkqUR+26vVsX4yMctvH53ZHLUFYdjxxkSExqoXSvafMHuzBC8tp7m9+u2fgCTHGZtAYS4sVE4iwNxncyABumnFVz4c2n6+DouWbbkIBmSVsBKcdlMWRDJmSbbzKgP6wNwteNwCBZsD4Q1evd5E2/Hvk/P43oynl08+NidlbCQcZgN0KWnO5gxkse+iuISFItEYe7VhL0sgTfjwkid2Vox4wrR8qbOf54IvWVoqwVaKpzsQFMvkLxu2cbRlwHx6IRqP5pOg/xAEQuPq/9ptN/l/xDX4JvXsPuwO4B+SAlFW5q0XDpQgtE4vqS92vi/jeYVnmt4bNjyiNZdnDGbmsxSZ+OUEmcdeGwNCEhe5AzvR7rF0LxgZUro0r4hf1JXbevDy3t3l0dxmTA6hd+mHDYVB/iTS8Nf5WMB8AxVK/IZadCdTKnmwVul/0yCobf+RaVW3HoKR9f5s8ahu0a1m0vm7Qii14K2vJujEZxK12TQVTpYIajwMBEqhTgq1ArPYgSe9cuPowzsHRlTGuurdBvs/hC4Mre/iofsTn0AgePh7hhm0yrFCZKKTcZmIybUBonxo119I29yKkw6KPGh+28PsYASu6jSNWhYPlv84Uhuypb2nWUS1JPVphv2uYjt5pty0nchN4SE0nVTba Cuhlr65M JX0PzOH0nL5Lhzg1/MxQu/BtiOEiFpc3Xyx1yQTJHbTlpqFwP/G3cOoyipgwhyfOw6xOo4lDwO5kR5NF3/A4yoTjKo8O6CGGagmU5+ENFsWN+jK7uKLM91oXTjbnkjDZOFbC1D1OyCQESzpAgPQGBlTQKJn9J17sJBtsioA5AYX/ARqI= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Unlike the legacy cgroup v1 memory controller, cgroup v2 memory controller doesn't support moving charged pages between cgroups. It's a fairly large and complicated code which created a number of problems in the past. Let's move this code into memcontrol-v1.c. It shaves off 1k lines from memcontrol.c. It's also another step towards making the legacy memory controller code optionally compiled. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko --- mm/memcontrol-v1.c | 981 +++++++++++++++++++++++++++++++++++++++++++ mm/memcontrol-v1.h | 30 ++ mm/memcontrol.c | 1004 +------------------------------------------- 3 files changed, 1019 insertions(+), 996 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 68e2f1a718d3..f4c8bec5ae1b 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -3,7 +3,12 @@ #include #include #include +#include +#include +#include +#include "internal.h" +#include "swap.h" #include "memcontrol-v1.h" /* @@ -30,6 +35,31 @@ static struct mem_cgroup_tree soft_limit_tree __read_mostly; #define MEM_CGROUP_MAX_RECLAIM_LOOPS 100 #define MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS 2 +/* Stuffs for move charges at task migration. */ +/* + * Types of charges to be moved. + */ +#define MOVE_ANON 0x1U +#define MOVE_FILE 0x2U +#define MOVE_MASK (MOVE_ANON | MOVE_FILE) + +/* "mc" and its members are protected by cgroup_mutex */ +static struct move_charge_struct { + spinlock_t lock; /* for from, to */ + struct mm_struct *mm; + struct mem_cgroup *from; + struct mem_cgroup *to; + unsigned long flags; + unsigned long precharge; + unsigned long moved_charge; + unsigned long moved_swap; + struct task_struct *moving_task; /* a task moving charges */ + wait_queue_head_t waitq; /* a waitq for other context */ +} mc = { + .lock = __SPIN_LOCK_UNLOCKED(mc.lock), + .waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq), +}; + static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz, struct mem_cgroup_tree_per_node *mctz, unsigned long new_usage_in_excess) @@ -325,6 +355,957 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, return nr_reclaimed; } +/* + * A routine for checking "mem" is under move_account() or not. + * + * Checking a cgroup is mc.from or mc.to or under hierarchy of + * moving cgroups. This is for waiting at high-memory pressure + * caused by "move". + */ +static bool mem_cgroup_under_move(struct mem_cgroup *memcg) +{ + struct mem_cgroup *from; + struct mem_cgroup *to; + bool ret = false; + /* + * Unlike task_move routines, we access mc.to, mc.from not under + * mutual exclusion by cgroup_mutex. Here, we take spinlock instead. + */ + spin_lock(&mc.lock); + from = mc.from; + to = mc.to; + if (!from) + goto unlock; + + ret = mem_cgroup_is_descendant(from, memcg) || + mem_cgroup_is_descendant(to, memcg); +unlock: + spin_unlock(&mc.lock); + return ret; +} + +bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg) +{ + if (mc.moving_task && current != mc.moving_task) { + if (mem_cgroup_under_move(memcg)) { + DEFINE_WAIT(wait); + prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE); + /* moving charge context might have finished. */ + if (mc.moving_task) + schedule(); + finish_wait(&mc.waitq, &wait); + return true; + } + } + return false; +} + +/** + * folio_memcg_lock - Bind a folio to its memcg. + * @folio: The folio. + * + * This function prevents unlocked LRU folios from being moved to + * another cgroup. + * + * It ensures lifetime of the bound memcg. The caller is responsible + * for the lifetime of the folio. + */ +void folio_memcg_lock(struct folio *folio) +{ + struct mem_cgroup *memcg; + unsigned long flags; + + /* + * The RCU lock is held throughout the transaction. The fast + * path can get away without acquiring the memcg->move_lock + * because page moving starts with an RCU grace period. + */ + rcu_read_lock(); + + if (mem_cgroup_disabled()) + return; +again: + memcg = folio_memcg(folio); + if (unlikely(!memcg)) + return; + +#ifdef CONFIG_PROVE_LOCKING + local_irq_save(flags); + might_lock(&memcg->move_lock); + local_irq_restore(flags); +#endif + + if (atomic_read(&memcg->moving_account) <= 0) + return; + + spin_lock_irqsave(&memcg->move_lock, flags); + if (memcg != folio_memcg(folio)) { + spin_unlock_irqrestore(&memcg->move_lock, flags); + goto again; + } + + /* + * When charge migration first begins, we can have multiple + * critical sections holding the fast-path RCU lock and one + * holding the slowpath move_lock. Track the task who has the + * move_lock for folio_memcg_unlock(). + */ + memcg->move_lock_task = current; + memcg->move_lock_flags = flags; +} + +static void __folio_memcg_unlock(struct mem_cgroup *memcg) +{ + if (memcg && memcg->move_lock_task == current) { + unsigned long flags = memcg->move_lock_flags; + + memcg->move_lock_task = NULL; + memcg->move_lock_flags = 0; + + spin_unlock_irqrestore(&memcg->move_lock, flags); + } + + rcu_read_unlock(); +} + +/** + * folio_memcg_unlock - Release the binding between a folio and its memcg. + * @folio: The folio. + * + * This releases the binding created by folio_memcg_lock(). This does + * not change the accounting of this folio to its memcg, but it does + * permit others to change it. + */ +void folio_memcg_unlock(struct folio *folio) +{ + __folio_memcg_unlock(folio_memcg(folio)); +} + +#ifdef CONFIG_SWAP +/** + * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record. + * @entry: swap entry to be moved + * @from: mem_cgroup which the entry is moved from + * @to: mem_cgroup which the entry is moved to + * + * It succeeds only when the swap_cgroup's record for this entry is the same + * as the mem_cgroup's id of @from. + * + * Returns 0 on success, -EINVAL on failure. + * + * The caller must have charged to @to, IOW, called page_counter_charge() about + * both res and memsw, and called css_get(). + */ +static int mem_cgroup_move_swap_account(swp_entry_t entry, + struct mem_cgroup *from, struct mem_cgroup *to) +{ + unsigned short old_id, new_id; + + old_id = mem_cgroup_id(from); + new_id = mem_cgroup_id(to); + + if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) { + mod_memcg_state(from, MEMCG_SWAP, -1); + mod_memcg_state(to, MEMCG_SWAP, 1); + return 0; + } + return -EINVAL; +} +#else +static inline int mem_cgroup_move_swap_account(swp_entry_t entry, + struct mem_cgroup *from, struct mem_cgroup *to) +{ + return -EINVAL; +} +#endif + +u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + return mem_cgroup_from_css(css)->move_charge_at_immigrate; +} + +#ifdef CONFIG_MMU +int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + + pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. " + "Please report your usecase to linux-mm@kvack.org if you " + "depend on this functionality.\n"); + + if (val & ~MOVE_MASK) + return -EINVAL; + + /* + * No kind of locking is needed in here, because ->can_attach() will + * check this value once in the beginning of the process, and then carry + * on with stale data. This means that changes to this value will only + * affect task migrations starting after the change. + */ + memcg->move_charge_at_immigrate = val; + return 0; +} +#else +int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + return -ENOSYS; +} +#endif + +#ifdef CONFIG_MMU +/* Handlers for move charge at task migration. */ +static int mem_cgroup_do_precharge(unsigned long count) +{ + int ret; + + /* Try a single bulk charge without reclaim first, kswapd may wake */ + ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count); + if (!ret) { + mc.precharge += count; + return ret; + } + + /* Try charges one by one with reclaim, but do not retry */ + while (count--) { + ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1); + if (ret) + return ret; + mc.precharge++; + cond_resched(); + } + return 0; +} + +union mc_target { + struct folio *folio; + swp_entry_t ent; +}; + +enum mc_target_type { + MC_TARGET_NONE = 0, + MC_TARGET_PAGE, + MC_TARGET_SWAP, + MC_TARGET_DEVICE, +}; + +static struct page *mc_handle_present_pte(struct vm_area_struct *vma, + unsigned long addr, pte_t ptent) +{ + struct page *page = vm_normal_page(vma, addr, ptent); + + if (!page) + return NULL; + if (PageAnon(page)) { + if (!(mc.flags & MOVE_ANON)) + return NULL; + } else { + if (!(mc.flags & MOVE_FILE)) + return NULL; + } + get_page(page); + + return page; +} + +#if defined(CONFIG_SWAP) || defined(CONFIG_DEVICE_PRIVATE) +static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, + pte_t ptent, swp_entry_t *entry) +{ + struct page *page = NULL; + swp_entry_t ent = pte_to_swp_entry(ptent); + + if (!(mc.flags & MOVE_ANON)) + return NULL; + + /* + * Handle device private pages that are not accessible by the CPU, but + * stored as special swap entries in the page table. + */ + if (is_device_private_entry(ent)) { + page = pfn_swap_entry_to_page(ent); + if (!get_page_unless_zero(page)) + return NULL; + return page; + } + + if (non_swap_entry(ent)) + return NULL; + + /* + * Because swap_cache_get_folio() updates some statistics counter, + * we call find_get_page() with swapper_space directly. + */ + page = find_get_page(swap_address_space(ent), swap_cache_index(ent)); + entry->val = ent.val; + + return page; +} +#else +static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, + pte_t ptent, swp_entry_t *entry) +{ + return NULL; +} +#endif + +static struct page *mc_handle_file_pte(struct vm_area_struct *vma, + unsigned long addr, pte_t ptent) +{ + unsigned long index; + struct folio *folio; + + if (!vma->vm_file) /* anonymous vma */ + return NULL; + if (!(mc.flags & MOVE_FILE)) + return NULL; + + /* folio is moved even if it's not RSS of this task(page-faulted). */ + /* shmem/tmpfs may report page out on swap: account for that too. */ + index = linear_page_index(vma, addr); + folio = filemap_get_incore_folio(vma->vm_file->f_mapping, index); + if (IS_ERR(folio)) + return NULL; + return folio_file_page(folio, index); +} + +/** + * mem_cgroup_move_account - move account of the folio + * @folio: The folio. + * @compound: charge the page as compound or small page + * @from: mem_cgroup which the folio is moved from. + * @to: mem_cgroup which the folio is moved to. @from != @to. + * + * The folio must be locked and not on the LRU. + * + * This function doesn't do "charge" to new cgroup and doesn't do "uncharge" + * from old cgroup. + */ +static int mem_cgroup_move_account(struct folio *folio, + bool compound, + struct mem_cgroup *from, + struct mem_cgroup *to) +{ + struct lruvec *from_vec, *to_vec; + struct pglist_data *pgdat; + unsigned int nr_pages = compound ? folio_nr_pages(folio) : 1; + int nid, ret; + + VM_BUG_ON(from == to); + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); + VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); + VM_BUG_ON(compound && !folio_test_large(folio)); + + ret = -EINVAL; + if (folio_memcg(folio) != from) + goto out; + + pgdat = folio_pgdat(folio); + from_vec = mem_cgroup_lruvec(from, pgdat); + to_vec = mem_cgroup_lruvec(to, pgdat); + + folio_memcg_lock(folio); + + if (folio_test_anon(folio)) { + if (folio_mapped(folio)) { + __mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages); + __mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages); + if (folio_test_pmd_mappable(folio)) { + __mod_lruvec_state(from_vec, NR_ANON_THPS, + -nr_pages); + __mod_lruvec_state(to_vec, NR_ANON_THPS, + nr_pages); + } + } + } else { + __mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages); + __mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages); + + if (folio_test_swapbacked(folio)) { + __mod_lruvec_state(from_vec, NR_SHMEM, -nr_pages); + __mod_lruvec_state(to_vec, NR_SHMEM, nr_pages); + } + + if (folio_mapped(folio)) { + __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages); + __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages); + } + + if (folio_test_dirty(folio)) { + struct address_space *mapping = folio_mapping(folio); + + if (mapping_can_writeback(mapping)) { + __mod_lruvec_state(from_vec, NR_FILE_DIRTY, + -nr_pages); + __mod_lruvec_state(to_vec, NR_FILE_DIRTY, + nr_pages); + } + } + } + +#ifdef CONFIG_SWAP + if (folio_test_swapcache(folio)) { + __mod_lruvec_state(from_vec, NR_SWAPCACHE, -nr_pages); + __mod_lruvec_state(to_vec, NR_SWAPCACHE, nr_pages); + } +#endif + if (folio_test_writeback(folio)) { + __mod_lruvec_state(from_vec, NR_WRITEBACK, -nr_pages); + __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages); + } + + /* + * All state has been migrated, let's switch to the new memcg. + * + * It is safe to change page's memcg here because the page + * is referenced, charged, isolated, and locked: we can't race + * with (un)charging, migration, LRU putback, or anything else + * that would rely on a stable page's memory cgroup. + * + * Note that folio_memcg_lock is a memcg lock, not a page lock, + * to save space. As soon as we switch page's memory cgroup to a + * new memcg that isn't locked, the above state can change + * concurrently again. Make sure we're truly done with it. + */ + smp_mb(); + + css_get(&to->css); + css_put(&from->css); + + folio->memcg_data = (unsigned long)to; + + __folio_memcg_unlock(from); + + ret = 0; + nid = folio_nid(folio); + + local_irq_disable(); + mem_cgroup_charge_statistics(to, nr_pages); + memcg_check_events(to, nid); + mem_cgroup_charge_statistics(from, -nr_pages); + memcg_check_events(from, nid); + local_irq_enable(); +out: + return ret; +} + +/** + * get_mctgt_type - get target type of moving charge + * @vma: the vma the pte to be checked belongs + * @addr: the address corresponding to the pte to be checked + * @ptent: the pte to be checked + * @target: the pointer the target page or swap ent will be stored(can be NULL) + * + * Context: Called with pte lock held. + * Return: + * * MC_TARGET_NONE - If the pte is not a target for move charge. + * * MC_TARGET_PAGE - If the page corresponding to this pte is a target for + * move charge. If @target is not NULL, the folio is stored in target->folio + * with extra refcnt taken (Caller should release it). + * * MC_TARGET_SWAP - If the swap entry corresponding to this pte is a + * target for charge migration. If @target is not NULL, the entry is + * stored in target->ent. + * * MC_TARGET_DEVICE - Like MC_TARGET_PAGE but page is device memory and + * thus not on the lru. For now such page is charged like a regular page + * would be as it is just special memory taking the place of a regular page. + * See Documentations/vm/hmm.txt and include/linux/hmm.h + */ +static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, + unsigned long addr, pte_t ptent, union mc_target *target) +{ + struct page *page = NULL; + struct folio *folio; + enum mc_target_type ret = MC_TARGET_NONE; + swp_entry_t ent = { .val = 0 }; + + if (pte_present(ptent)) + page = mc_handle_present_pte(vma, addr, ptent); + else if (pte_none_mostly(ptent)) + /* + * PTE markers should be treated as a none pte here, separated + * from other swap handling below. + */ + page = mc_handle_file_pte(vma, addr, ptent); + else if (is_swap_pte(ptent)) + page = mc_handle_swap_pte(vma, ptent, &ent); + + if (page) + folio = page_folio(page); + if (target && page) { + if (!folio_trylock(folio)) { + folio_put(folio); + return ret; + } + /* + * page_mapped() must be stable during the move. This + * pte is locked, so if it's present, the page cannot + * become unmapped. If it isn't, we have only partial + * control over the mapped state: the page lock will + * prevent new faults against pagecache and swapcache, + * so an unmapped page cannot become mapped. However, + * if the page is already mapped elsewhere, it can + * unmap, and there is nothing we can do about it. + * Alas, skip moving the page in this case. + */ + if (!pte_present(ptent) && page_mapped(page)) { + folio_unlock(folio); + folio_put(folio); + return ret; + } + } + + if (!page && !ent.val) + return ret; + if (page) { + /* + * Do only loose check w/o serialization. + * mem_cgroup_move_account() checks the page is valid or + * not under LRU exclusion. + */ + if (folio_memcg(folio) == mc.from) { + ret = MC_TARGET_PAGE; + if (folio_is_device_private(folio) || + folio_is_device_coherent(folio)) + ret = MC_TARGET_DEVICE; + if (target) + target->folio = folio; + } + if (!ret || !target) { + if (target) + folio_unlock(folio); + folio_put(folio); + } + } + /* + * There is a swap entry and a page doesn't exist or isn't charged. + * But we cannot move a tail-page in a THP. + */ + if (ent.val && !ret && (!page || !PageTransCompound(page)) && + mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { + ret = MC_TARGET_SWAP; + if (target) + target->ent = ent; + } + return ret; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +/* + * We don't consider PMD mapped swapping or file mapped pages because THP does + * not support them for now. + * Caller should make sure that pmd_trans_huge(pmd) is true. + */ +static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, + unsigned long addr, pmd_t pmd, union mc_target *target) +{ + struct page *page = NULL; + struct folio *folio; + enum mc_target_type ret = MC_TARGET_NONE; + + if (unlikely(is_swap_pmd(pmd))) { + VM_BUG_ON(thp_migration_supported() && + !is_pmd_migration_entry(pmd)); + return ret; + } + page = pmd_page(pmd); + VM_BUG_ON_PAGE(!page || !PageHead(page), page); + folio = page_folio(page); + if (!(mc.flags & MOVE_ANON)) + return ret; + if (folio_memcg(folio) == mc.from) { + ret = MC_TARGET_PAGE; + if (target) { + folio_get(folio); + if (!folio_trylock(folio)) { + folio_put(folio); + return MC_TARGET_NONE; + } + target->folio = folio; + } + } + return ret; +} +#else +static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, + unsigned long addr, pmd_t pmd, union mc_target *target) +{ + return MC_TARGET_NONE; +} +#endif + +static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, + unsigned long addr, unsigned long end, + struct mm_walk *walk) +{ + struct vm_area_struct *vma = walk->vma; + pte_t *pte; + spinlock_t *ptl; + + ptl = pmd_trans_huge_lock(pmd, vma); + if (ptl) { + /* + * Note their can not be MC_TARGET_DEVICE for now as we do not + * support transparent huge page with MEMORY_DEVICE_PRIVATE but + * this might change. + */ + if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE) + mc.precharge += HPAGE_PMD_NR; + spin_unlock(ptl); + return 0; + } + + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) + return 0; + for (; addr != end; pte++, addr += PAGE_SIZE) + if (get_mctgt_type(vma, addr, ptep_get(pte), NULL)) + mc.precharge++; /* increment precharge temporarily */ + pte_unmap_unlock(pte - 1, ptl); + cond_resched(); + + return 0; +} + +static const struct mm_walk_ops precharge_walk_ops = { + .pmd_entry = mem_cgroup_count_precharge_pte_range, + .walk_lock = PGWALK_RDLOCK, +}; + +static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm) +{ + unsigned long precharge; + + mmap_read_lock(mm); + walk_page_range(mm, 0, ULONG_MAX, &precharge_walk_ops, NULL); + mmap_read_unlock(mm); + + precharge = mc.precharge; + mc.precharge = 0; + + return precharge; +} + +static int mem_cgroup_precharge_mc(struct mm_struct *mm) +{ + unsigned long precharge = mem_cgroup_count_precharge(mm); + + VM_BUG_ON(mc.moving_task); + mc.moving_task = current; + return mem_cgroup_do_precharge(precharge); +} + +/* cancels all extra charges on mc.from and mc.to, and wakes up all waiters. */ +static void __mem_cgroup_clear_mc(void) +{ + struct mem_cgroup *from = mc.from; + struct mem_cgroup *to = mc.to; + + /* we must uncharge all the leftover precharges from mc.to */ + if (mc.precharge) { + mem_cgroup_cancel_charge(mc.to, mc.precharge); + mc.precharge = 0; + } + /* + * we didn't uncharge from mc.from at mem_cgroup_move_account(), so + * we must uncharge here. + */ + if (mc.moved_charge) { + mem_cgroup_cancel_charge(mc.from, mc.moved_charge); + mc.moved_charge = 0; + } + /* we must fixup refcnts and charges */ + if (mc.moved_swap) { + /* uncharge swap account from the old cgroup */ + if (!mem_cgroup_is_root(mc.from)) + page_counter_uncharge(&mc.from->memsw, mc.moved_swap); + + mem_cgroup_id_put_many(mc.from, mc.moved_swap); + + /* + * we charged both to->memory and to->memsw, so we + * should uncharge to->memory. + */ + if (!mem_cgroup_is_root(mc.to)) + page_counter_uncharge(&mc.to->memory, mc.moved_swap); + + mc.moved_swap = 0; + } + memcg_oom_recover(from); + memcg_oom_recover(to); + wake_up_all(&mc.waitq); +} + +static void mem_cgroup_clear_mc(void) +{ + struct mm_struct *mm = mc.mm; + + /* + * we must clear moving_task before waking up waiters at the end of + * task migration. + */ + mc.moving_task = NULL; + __mem_cgroup_clear_mc(); + spin_lock(&mc.lock); + mc.from = NULL; + mc.to = NULL; + mc.mm = NULL; + spin_unlock(&mc.lock); + + mmput(mm); +} + +int mem_cgroup_can_attach(struct cgroup_taskset *tset) +{ + struct cgroup_subsys_state *css; + struct mem_cgroup *memcg = NULL; /* unneeded init to make gcc happy */ + struct mem_cgroup *from; + struct task_struct *leader, *p; + struct mm_struct *mm; + unsigned long move_flags; + int ret = 0; + + /* charge immigration isn't supported on the default hierarchy */ + if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) + return 0; + + /* + * Multi-process migrations only happen on the default hierarchy + * where charge immigration is not used. Perform charge + * immigration if @tset contains a leader and whine if there are + * multiple. + */ + p = NULL; + cgroup_taskset_for_each_leader(leader, css, tset) { + WARN_ON_ONCE(p); + p = leader; + memcg = mem_cgroup_from_css(css); + } + if (!p) + return 0; + + /* + * We are now committed to this value whatever it is. Changes in this + * tunable will only affect upcoming migrations, not the current one. + * So we need to save it, and keep it going. + */ + move_flags = READ_ONCE(memcg->move_charge_at_immigrate); + if (!move_flags) + return 0; + + from = mem_cgroup_from_task(p); + + VM_BUG_ON(from == memcg); + + mm = get_task_mm(p); + if (!mm) + return 0; + /* We move charges only when we move a owner of the mm */ + if (mm->owner == p) { + VM_BUG_ON(mc.from); + VM_BUG_ON(mc.to); + VM_BUG_ON(mc.precharge); + VM_BUG_ON(mc.moved_charge); + VM_BUG_ON(mc.moved_swap); + + spin_lock(&mc.lock); + mc.mm = mm; + mc.from = from; + mc.to = memcg; + mc.flags = move_flags; + spin_unlock(&mc.lock); + /* We set mc.moving_task later */ + + ret = mem_cgroup_precharge_mc(mm); + if (ret) + mem_cgroup_clear_mc(); + } else { + mmput(mm); + } + return ret; +} + +void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) +{ + if (mc.to) + mem_cgroup_clear_mc(); +} + +static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, + unsigned long addr, unsigned long end, + struct mm_walk *walk) +{ + int ret = 0; + struct vm_area_struct *vma = walk->vma; + pte_t *pte; + spinlock_t *ptl; + enum mc_target_type target_type; + union mc_target target; + struct folio *folio; + + ptl = pmd_trans_huge_lock(pmd, vma); + if (ptl) { + if (mc.precharge < HPAGE_PMD_NR) { + spin_unlock(ptl); + return 0; + } + target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); + if (target_type == MC_TARGET_PAGE) { + folio = target.folio; + if (folio_isolate_lru(folio)) { + if (!mem_cgroup_move_account(folio, true, + mc.from, mc.to)) { + mc.precharge -= HPAGE_PMD_NR; + mc.moved_charge += HPAGE_PMD_NR; + } + folio_putback_lru(folio); + } + folio_unlock(folio); + folio_put(folio); + } else if (target_type == MC_TARGET_DEVICE) { + folio = target.folio; + if (!mem_cgroup_move_account(folio, true, + mc.from, mc.to)) { + mc.precharge -= HPAGE_PMD_NR; + mc.moved_charge += HPAGE_PMD_NR; + } + folio_unlock(folio); + folio_put(folio); + } + spin_unlock(ptl); + return 0; + } + +retry: + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + if (!pte) + return 0; + for (; addr != end; addr += PAGE_SIZE) { + pte_t ptent = ptep_get(pte++); + bool device = false; + swp_entry_t ent; + + if (!mc.precharge) + break; + + switch (get_mctgt_type(vma, addr, ptent, &target)) { + case MC_TARGET_DEVICE: + device = true; + fallthrough; + case MC_TARGET_PAGE: + folio = target.folio; + /* + * We can have a part of the split pmd here. Moving it + * can be done but it would be too convoluted so simply + * ignore such a partial THP and keep it in original + * memcg. There should be somebody mapping the head. + */ + if (folio_test_large(folio)) + goto put; + if (!device && !folio_isolate_lru(folio)) + goto put; + if (!mem_cgroup_move_account(folio, false, + mc.from, mc.to)) { + mc.precharge--; + /* we uncharge from mc.from later. */ + mc.moved_charge++; + } + if (!device) + folio_putback_lru(folio); +put: /* get_mctgt_type() gets & locks the page */ + folio_unlock(folio); + folio_put(folio); + break; + case MC_TARGET_SWAP: + ent = target.ent; + if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { + mc.precharge--; + mem_cgroup_id_get_many(mc.to, 1); + /* we fixup other refcnts and charges later. */ + mc.moved_swap++; + } + break; + default: + break; + } + } + pte_unmap_unlock(pte - 1, ptl); + cond_resched(); + + if (addr != end) { + /* + * We have consumed all precharges we got in can_attach(). + * We try charge one by one, but don't do any additional + * charges to mc.to if we have failed in charge once in attach() + * phase. + */ + ret = mem_cgroup_do_precharge(1); + if (!ret) + goto retry; + } + + return ret; +} + +static const struct mm_walk_ops charge_walk_ops = { + .pmd_entry = mem_cgroup_move_charge_pte_range, + .walk_lock = PGWALK_RDLOCK, +}; + +static void mem_cgroup_move_charge(void) +{ + lru_add_drain_all(); + /* + * Signal folio_memcg_lock() to take the memcg's move_lock + * while we're moving its pages to another memcg. Then wait + * for already started RCU-only updates to finish. + */ + atomic_inc(&mc.from->moving_account); + synchronize_rcu(); +retry: + if (unlikely(!mmap_read_trylock(mc.mm))) { + /* + * Someone who are holding the mmap_lock might be waiting in + * waitq. So we cancel all extra charges, wake up all waiters, + * and retry. Because we cancel precharges, we might not be able + * to move enough charges, but moving charge is a best-effort + * feature anyway, so it wouldn't be a big problem. + */ + __mem_cgroup_clear_mc(); + cond_resched(); + goto retry; + } + /* + * When we have consumed all precharges and failed in doing + * additional charge, the page walk just aborts. + */ + walk_page_range(mc.mm, 0, ULONG_MAX, &charge_walk_ops, NULL); + mmap_read_unlock(mc.mm); + atomic_dec(&mc.from->moving_account); +} + +void mem_cgroup_move_task(void) +{ + if (mc.to) { + mem_cgroup_move_charge(); + mem_cgroup_clear_mc(); + } +} + +#else /* !CONFIG_MMU */ +static int mem_cgroup_can_attach(struct cgroup_taskset *tset) +{ + return 0; +} +static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) +{ +} +static void mem_cgroup_move_task(void) +{ +} +#endif + static int __init memcg1_init(void) { int node; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index e37bc7e8d955..55e7c4f90c39 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -11,4 +11,34 @@ static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX); } +void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages); +void memcg_check_events(struct mem_cgroup *memcg, int nid); +void memcg_oom_recover(struct mem_cgroup *memcg); +int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, + unsigned int nr_pages); + +static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, + unsigned int nr_pages) +{ + if (mem_cgroup_is_root(memcg)) + return 0; + + return try_charge_memcg(memcg, gfp_mask, nr_pages); +} + +void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n); +void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n); + +bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg); +struct cgroup_taskset; +int mem_cgroup_can_attach(struct cgroup_taskset *tset); +void mem_cgroup_cancel_attach(struct cgroup_taskset *tset); +void mem_cgroup_move_task(void); + +struct cftype; +u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, + struct cftype *cft); +int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val); + #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3479e1af12d5..3332c89cae2e 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -28,7 +28,6 @@ #include #include #include -#include #include #include #include @@ -45,7 +44,6 @@ #include #include #include -#include #include #include #include @@ -71,7 +69,6 @@ #include #include #include "slab.h" -#include "swap.h" #include "memcontrol-v1.h" #include @@ -158,31 +155,6 @@ struct mem_cgroup_event { static void mem_cgroup_threshold(struct mem_cgroup *memcg); static void mem_cgroup_oom_notify(struct mem_cgroup *memcg); -/* Stuffs for move charges at task migration. */ -/* - * Types of charges to be moved. - */ -#define MOVE_ANON 0x1U -#define MOVE_FILE 0x2U -#define MOVE_MASK (MOVE_ANON | MOVE_FILE) - -/* "mc" and its members are protected by cgroup_mutex */ -static struct move_charge_struct { - spinlock_t lock; /* for from, to */ - struct mm_struct *mm; - struct mem_cgroup *from; - struct mem_cgroup *to; - unsigned long flags; - unsigned long precharge; - unsigned long moved_charge; - unsigned long moved_swap; - struct task_struct *moving_task; /* a task moving charges */ - wait_queue_head_t waitq; /* a waitq for other context */ -} mc = { - .lock = __SPIN_LOCK_UNLOCKED(mc.lock), - .waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq), -}; - /* for encoding cft->private value on file */ enum res_type { _MEM, @@ -955,8 +927,7 @@ static unsigned long memcg_events_local(struct mem_cgroup *memcg, int event) return READ_ONCE(memcg->vmstats->events_local[i]); } -static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, - int nr_pages) +void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages) { /* pagein of a big page is an event. So, ignore page size */ if (nr_pages > 0) @@ -998,7 +969,7 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, * Check events in order. * */ -static void memcg_check_events(struct mem_cgroup *memcg, int nid) +void memcg_check_events(struct mem_cgroup *memcg, int nid) { if (IS_ENABLED(CONFIG_PREEMPT_RT)) return; @@ -1467,51 +1438,6 @@ static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg) return margin; } -/* - * A routine for checking "mem" is under move_account() or not. - * - * Checking a cgroup is mc.from or mc.to or under hierarchy of - * moving cgroups. This is for waiting at high-memory pressure - * caused by "move". - */ -static bool mem_cgroup_under_move(struct mem_cgroup *memcg) -{ - struct mem_cgroup *from; - struct mem_cgroup *to; - bool ret = false; - /* - * Unlike task_move routines, we access mc.to, mc.from not under - * mutual exclusion by cgroup_mutex. Here, we take spinlock instead. - */ - spin_lock(&mc.lock); - from = mc.from; - to = mc.to; - if (!from) - goto unlock; - - ret = mem_cgroup_is_descendant(from, memcg) || - mem_cgroup_is_descendant(to, memcg); -unlock: - spin_unlock(&mc.lock); - return ret; -} - -static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg) -{ - if (mc.moving_task && current != mc.moving_task) { - if (mem_cgroup_under_move(memcg)) { - DEFINE_WAIT(wait); - prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE); - /* moving charge context might have finished. */ - if (mc.moving_task) - schedule(); - finish_wait(&mc.waitq, &wait); - return true; - } - } - return false; -} - struct memory_stat { const char *name; unsigned int idx; @@ -1904,7 +1830,7 @@ static int memcg_oom_wake_function(wait_queue_entry_t *wait, return autoremove_wake_function(wait, mode, sync, arg); } -static void memcg_oom_recover(struct mem_cgroup *memcg) +void memcg_oom_recover(struct mem_cgroup *memcg) { /* * For the following lockless ->under_oom test, the only required @@ -2093,87 +2019,6 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg) pr_cont(" are going to be killed due to memory.oom.group set\n"); } -/** - * folio_memcg_lock - Bind a folio to its memcg. - * @folio: The folio. - * - * This function prevents unlocked LRU folios from being moved to - * another cgroup. - * - * It ensures lifetime of the bound memcg. The caller is responsible - * for the lifetime of the folio. - */ -void folio_memcg_lock(struct folio *folio) -{ - struct mem_cgroup *memcg; - unsigned long flags; - - /* - * The RCU lock is held throughout the transaction. The fast - * path can get away without acquiring the memcg->move_lock - * because page moving starts with an RCU grace period. - */ - rcu_read_lock(); - - if (mem_cgroup_disabled()) - return; -again: - memcg = folio_memcg(folio); - if (unlikely(!memcg)) - return; - -#ifdef CONFIG_PROVE_LOCKING - local_irq_save(flags); - might_lock(&memcg->move_lock); - local_irq_restore(flags); -#endif - - if (atomic_read(&memcg->moving_account) <= 0) - return; - - spin_lock_irqsave(&memcg->move_lock, flags); - if (memcg != folio_memcg(folio)) { - spin_unlock_irqrestore(&memcg->move_lock, flags); - goto again; - } - - /* - * When charge migration first begins, we can have multiple - * critical sections holding the fast-path RCU lock and one - * holding the slowpath move_lock. Track the task who has the - * move_lock for folio_memcg_unlock(). - */ - memcg->move_lock_task = current; - memcg->move_lock_flags = flags; -} - -static void __folio_memcg_unlock(struct mem_cgroup *memcg) -{ - if (memcg && memcg->move_lock_task == current) { - unsigned long flags = memcg->move_lock_flags; - - memcg->move_lock_task = NULL; - memcg->move_lock_flags = 0; - - spin_unlock_irqrestore(&memcg->move_lock, flags); - } - - rcu_read_unlock(); -} - -/** - * folio_memcg_unlock - Release the binding between a folio and its memcg. - * @folio: The folio. - * - * This releases the binding created by folio_memcg_lock(). This does - * not change the accounting of this folio to its memcg, but it does - * permit others to change it. - */ -void folio_memcg_unlock(struct folio *folio) -{ - __folio_memcg_unlock(folio_memcg(folio)); -} - struct memcg_stock_pcp { local_lock_t stock_lock; struct mem_cgroup *cached; /* this never be root cgroup */ @@ -2653,8 +2498,8 @@ void mem_cgroup_handle_over_high(gfp_t gfp_mask) css_put(&memcg->css); } -static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, - unsigned int nr_pages) +int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, + unsigned int nr_pages) { unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages); int nr_retries = MAX_RECLAIM_RETRIES; @@ -2849,15 +2694,6 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, return 0; } -static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, - unsigned int nr_pages) -{ - if (mem_cgroup_is_root(memcg)) - return 0; - - return try_charge_memcg(memcg, gfp_mask, nr_pages); -} - /** * mem_cgroup_cancel_charge() - cancel an uncommitted try_charge() call. * @memcg: memcg previously charged. @@ -3595,43 +3431,6 @@ void split_page_memcg(struct page *head, int old_order, int new_order) css_get_many(&memcg->css, old_nr / new_nr - 1); } -#ifdef CONFIG_SWAP -/** - * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record. - * @entry: swap entry to be moved - * @from: mem_cgroup which the entry is moved from - * @to: mem_cgroup which the entry is moved to - * - * It succeeds only when the swap_cgroup's record for this entry is the same - * as the mem_cgroup's id of @from. - * - * Returns 0 on success, -EINVAL on failure. - * - * The caller must have charged to @to, IOW, called page_counter_charge() about - * both res and memsw, and called css_get(). - */ -static int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) -{ - unsigned short old_id, new_id; - - old_id = mem_cgroup_id(from); - new_id = mem_cgroup_id(to); - - if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) { - mod_memcg_state(from, MEMCG_SWAP, -1); - mod_memcg_state(to, MEMCG_SWAP, 1); - return 0; - } - return -EINVAL; -} -#else -static inline int mem_cgroup_move_swap_account(swp_entry_t entry, - struct mem_cgroup *from, struct mem_cgroup *to) -{ - return -EINVAL; -} -#endif static DEFINE_MUTEX(memcg_max_mutex); @@ -4015,42 +3814,6 @@ static ssize_t mem_cgroup_reset(struct kernfs_open_file *of, char *buf, return nbytes; } -static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, - struct cftype *cft) -{ - return mem_cgroup_from_css(css)->move_charge_at_immigrate; -} - -#ifdef CONFIG_MMU -static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(css); - - pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. " - "Please report your usecase to linux-mm@kvack.org if you " - "depend on this functionality.\n"); - - if (val & ~MOVE_MASK) - return -EINVAL; - - /* - * No kind of locking is needed in here, because ->can_attach() will - * check this value once in the beginning of the process, and then carry - * on with stale data. This means that changes to this value will only - * affect task migrations starting after the change. - */ - memcg->move_charge_at_immigrate = val; - return 0; -} -#else -static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - return -ENOSYS; -} -#endif - #ifdef CONFIG_NUMA #define LRU_ALL_FILE (BIT(LRU_INACTIVE_FILE) | BIT(LRU_ACTIVE_FILE)) @@ -5261,13 +5024,13 @@ static void mem_cgroup_id_remove(struct mem_cgroup *memcg) } } -static void __maybe_unused mem_cgroup_id_get_many(struct mem_cgroup *memcg, - unsigned int n) +void __maybe_unused mem_cgroup_id_get_many(struct mem_cgroup *memcg, + unsigned int n) { refcount_add(n, &memcg->id.ref); } -static void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n) +void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n) { if (refcount_sub_and_test(n, &memcg->id.ref)) { mem_cgroup_id_remove(memcg); @@ -5747,757 +5510,6 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu) atomic64_set(&memcg->vmstats->stats_updates, 0); } -#ifdef CONFIG_MMU -/* Handlers for move charge at task migration. */ -static int mem_cgroup_do_precharge(unsigned long count) -{ - int ret; - - /* Try a single bulk charge without reclaim first, kswapd may wake */ - ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count); - if (!ret) { - mc.precharge += count; - return ret; - } - - /* Try charges one by one with reclaim, but do not retry */ - while (count--) { - ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1); - if (ret) - return ret; - mc.precharge++; - cond_resched(); - } - return 0; -} - -union mc_target { - struct folio *folio; - swp_entry_t ent; -}; - -enum mc_target_type { - MC_TARGET_NONE = 0, - MC_TARGET_PAGE, - MC_TARGET_SWAP, - MC_TARGET_DEVICE, -}; - -static struct page *mc_handle_present_pte(struct vm_area_struct *vma, - unsigned long addr, pte_t ptent) -{ - struct page *page = vm_normal_page(vma, addr, ptent); - - if (!page) - return NULL; - if (PageAnon(page)) { - if (!(mc.flags & MOVE_ANON)) - return NULL; - } else { - if (!(mc.flags & MOVE_FILE)) - return NULL; - } - get_page(page); - - return page; -} - -#if defined(CONFIG_SWAP) || defined(CONFIG_DEVICE_PRIVATE) -static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, - pte_t ptent, swp_entry_t *entry) -{ - struct page *page = NULL; - swp_entry_t ent = pte_to_swp_entry(ptent); - - if (!(mc.flags & MOVE_ANON)) - return NULL; - - /* - * Handle device private pages that are not accessible by the CPU, but - * stored as special swap entries in the page table. - */ - if (is_device_private_entry(ent)) { - page = pfn_swap_entry_to_page(ent); - if (!get_page_unless_zero(page)) - return NULL; - return page; - } - - if (non_swap_entry(ent)) - return NULL; - - /* - * Because swap_cache_get_folio() updates some statistics counter, - * we call find_get_page() with swapper_space directly. - */ - page = find_get_page(swap_address_space(ent), swap_cache_index(ent)); - entry->val = ent.val; - - return page; -} -#else -static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, - pte_t ptent, swp_entry_t *entry) -{ - return NULL; -} -#endif - -static struct page *mc_handle_file_pte(struct vm_area_struct *vma, - unsigned long addr, pte_t ptent) -{ - unsigned long index; - struct folio *folio; - - if (!vma->vm_file) /* anonymous vma */ - return NULL; - if (!(mc.flags & MOVE_FILE)) - return NULL; - - /* folio is moved even if it's not RSS of this task(page-faulted). */ - /* shmem/tmpfs may report page out on swap: account for that too. */ - index = linear_page_index(vma, addr); - folio = filemap_get_incore_folio(vma->vm_file->f_mapping, index); - if (IS_ERR(folio)) - return NULL; - return folio_file_page(folio, index); -} - -/** - * mem_cgroup_move_account - move account of the folio - * @folio: The folio. - * @compound: charge the page as compound or small page - * @from: mem_cgroup which the folio is moved from. - * @to: mem_cgroup which the folio is moved to. @from != @to. - * - * The folio must be locked and not on the LRU. - * - * This function doesn't do "charge" to new cgroup and doesn't do "uncharge" - * from old cgroup. - */ -static int mem_cgroup_move_account(struct folio *folio, - bool compound, - struct mem_cgroup *from, - struct mem_cgroup *to) -{ - struct lruvec *from_vec, *to_vec; - struct pglist_data *pgdat; - unsigned int nr_pages = compound ? folio_nr_pages(folio) : 1; - int nid, ret; - - VM_BUG_ON(from == to); - VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); - VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); - VM_BUG_ON(compound && !folio_test_large(folio)); - - ret = -EINVAL; - if (folio_memcg(folio) != from) - goto out; - - pgdat = folio_pgdat(folio); - from_vec = mem_cgroup_lruvec(from, pgdat); - to_vec = mem_cgroup_lruvec(to, pgdat); - - folio_memcg_lock(folio); - - if (folio_test_anon(folio)) { - if (folio_mapped(folio)) { - __mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages); - __mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages); - if (folio_test_pmd_mappable(folio)) { - __mod_lruvec_state(from_vec, NR_ANON_THPS, - -nr_pages); - __mod_lruvec_state(to_vec, NR_ANON_THPS, - nr_pages); - } - } - } else { - __mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages); - __mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages); - - if (folio_test_swapbacked(folio)) { - __mod_lruvec_state(from_vec, NR_SHMEM, -nr_pages); - __mod_lruvec_state(to_vec, NR_SHMEM, nr_pages); - } - - if (folio_mapped(folio)) { - __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages); - __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages); - } - - if (folio_test_dirty(folio)) { - struct address_space *mapping = folio_mapping(folio); - - if (mapping_can_writeback(mapping)) { - __mod_lruvec_state(from_vec, NR_FILE_DIRTY, - -nr_pages); - __mod_lruvec_state(to_vec, NR_FILE_DIRTY, - nr_pages); - } - } - } - -#ifdef CONFIG_SWAP - if (folio_test_swapcache(folio)) { - __mod_lruvec_state(from_vec, NR_SWAPCACHE, -nr_pages); - __mod_lruvec_state(to_vec, NR_SWAPCACHE, nr_pages); - } -#endif - if (folio_test_writeback(folio)) { - __mod_lruvec_state(from_vec, NR_WRITEBACK, -nr_pages); - __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages); - } - - /* - * All state has been migrated, let's switch to the new memcg. - * - * It is safe to change page's memcg here because the page - * is referenced, charged, isolated, and locked: we can't race - * with (un)charging, migration, LRU putback, or anything else - * that would rely on a stable page's memory cgroup. - * - * Note that folio_memcg_lock is a memcg lock, not a page lock, - * to save space. As soon as we switch page's memory cgroup to a - * new memcg that isn't locked, the above state can change - * concurrently again. Make sure we're truly done with it. - */ - smp_mb(); - - css_get(&to->css); - css_put(&from->css); - - folio->memcg_data = (unsigned long)to; - - __folio_memcg_unlock(from); - - ret = 0; - nid = folio_nid(folio); - - local_irq_disable(); - mem_cgroup_charge_statistics(to, nr_pages); - memcg_check_events(to, nid); - mem_cgroup_charge_statistics(from, -nr_pages); - memcg_check_events(from, nid); - local_irq_enable(); -out: - return ret; -} - -/** - * get_mctgt_type - get target type of moving charge - * @vma: the vma the pte to be checked belongs - * @addr: the address corresponding to the pte to be checked - * @ptent: the pte to be checked - * @target: the pointer the target page or swap ent will be stored(can be NULL) - * - * Context: Called with pte lock held. - * Return: - * * MC_TARGET_NONE - If the pte is not a target for move charge. - * * MC_TARGET_PAGE - If the page corresponding to this pte is a target for - * move charge. If @target is not NULL, the folio is stored in target->folio - * with extra refcnt taken (Caller should release it). - * * MC_TARGET_SWAP - If the swap entry corresponding to this pte is a - * target for charge migration. If @target is not NULL, the entry is - * stored in target->ent. - * * MC_TARGET_DEVICE - Like MC_TARGET_PAGE but page is device memory and - * thus not on the lru. For now such page is charged like a regular page - * would be as it is just special memory taking the place of a regular page. - * See Documentations/vm/hmm.txt and include/linux/hmm.h - */ -static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, - unsigned long addr, pte_t ptent, union mc_target *target) -{ - struct page *page = NULL; - struct folio *folio; - enum mc_target_type ret = MC_TARGET_NONE; - swp_entry_t ent = { .val = 0 }; - - if (pte_present(ptent)) - page = mc_handle_present_pte(vma, addr, ptent); - else if (pte_none_mostly(ptent)) - /* - * PTE markers should be treated as a none pte here, separated - * from other swap handling below. - */ - page = mc_handle_file_pte(vma, addr, ptent); - else if (is_swap_pte(ptent)) - page = mc_handle_swap_pte(vma, ptent, &ent); - - if (page) - folio = page_folio(page); - if (target && page) { - if (!folio_trylock(folio)) { - folio_put(folio); - return ret; - } - /* - * page_mapped() must be stable during the move. This - * pte is locked, so if it's present, the page cannot - * become unmapped. If it isn't, we have only partial - * control over the mapped state: the page lock will - * prevent new faults against pagecache and swapcache, - * so an unmapped page cannot become mapped. However, - * if the page is already mapped elsewhere, it can - * unmap, and there is nothing we can do about it. - * Alas, skip moving the page in this case. - */ - if (!pte_present(ptent) && page_mapped(page)) { - folio_unlock(folio); - folio_put(folio); - return ret; - } - } - - if (!page && !ent.val) - return ret; - if (page) { - /* - * Do only loose check w/o serialization. - * mem_cgroup_move_account() checks the page is valid or - * not under LRU exclusion. - */ - if (folio_memcg(folio) == mc.from) { - ret = MC_TARGET_PAGE; - if (folio_is_device_private(folio) || - folio_is_device_coherent(folio)) - ret = MC_TARGET_DEVICE; - if (target) - target->folio = folio; - } - if (!ret || !target) { - if (target) - folio_unlock(folio); - folio_put(folio); - } - } - /* - * There is a swap entry and a page doesn't exist or isn't charged. - * But we cannot move a tail-page in a THP. - */ - if (ent.val && !ret && (!page || !PageTransCompound(page)) && - mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) { - ret = MC_TARGET_SWAP; - if (target) - target->ent = ent; - } - return ret; -} - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -/* - * We don't consider PMD mapped swapping or file mapped pages because THP does - * not support them for now. - * Caller should make sure that pmd_trans_huge(pmd) is true. - */ -static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) -{ - struct page *page = NULL; - struct folio *folio; - enum mc_target_type ret = MC_TARGET_NONE; - - if (unlikely(is_swap_pmd(pmd))) { - VM_BUG_ON(thp_migration_supported() && - !is_pmd_migration_entry(pmd)); - return ret; - } - page = pmd_page(pmd); - VM_BUG_ON_PAGE(!page || !PageHead(page), page); - folio = page_folio(page); - if (!(mc.flags & MOVE_ANON)) - return ret; - if (folio_memcg(folio) == mc.from) { - ret = MC_TARGET_PAGE; - if (target) { - folio_get(folio); - if (!folio_trylock(folio)) { - folio_put(folio); - return MC_TARGET_NONE; - } - target->folio = folio; - } - } - return ret; -} -#else -static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma, - unsigned long addr, pmd_t pmd, union mc_target *target) -{ - return MC_TARGET_NONE; -} -#endif - -static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, - unsigned long addr, unsigned long end, - struct mm_walk *walk) -{ - struct vm_area_struct *vma = walk->vma; - pte_t *pte; - spinlock_t *ptl; - - ptl = pmd_trans_huge_lock(pmd, vma); - if (ptl) { - /* - * Note their can not be MC_TARGET_DEVICE for now as we do not - * support transparent huge page with MEMORY_DEVICE_PRIVATE but - * this might change. - */ - if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE) - mc.precharge += HPAGE_PMD_NR; - spin_unlock(ptl); - return 0; - } - - pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (!pte) - return 0; - for (; addr != end; pte++, addr += PAGE_SIZE) - if (get_mctgt_type(vma, addr, ptep_get(pte), NULL)) - mc.precharge++; /* increment precharge temporarily */ - pte_unmap_unlock(pte - 1, ptl); - cond_resched(); - - return 0; -} - -static const struct mm_walk_ops precharge_walk_ops = { - .pmd_entry = mem_cgroup_count_precharge_pte_range, - .walk_lock = PGWALK_RDLOCK, -}; - -static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm) -{ - unsigned long precharge; - - mmap_read_lock(mm); - walk_page_range(mm, 0, ULONG_MAX, &precharge_walk_ops, NULL); - mmap_read_unlock(mm); - - precharge = mc.precharge; - mc.precharge = 0; - - return precharge; -} - -static int mem_cgroup_precharge_mc(struct mm_struct *mm) -{ - unsigned long precharge = mem_cgroup_count_precharge(mm); - - VM_BUG_ON(mc.moving_task); - mc.moving_task = current; - return mem_cgroup_do_precharge(precharge); -} - -/* cancels all extra charges on mc.from and mc.to, and wakes up all waiters. */ -static void __mem_cgroup_clear_mc(void) -{ - struct mem_cgroup *from = mc.from; - struct mem_cgroup *to = mc.to; - - /* we must uncharge all the leftover precharges from mc.to */ - if (mc.precharge) { - mem_cgroup_cancel_charge(mc.to, mc.precharge); - mc.precharge = 0; - } - /* - * we didn't uncharge from mc.from at mem_cgroup_move_account(), so - * we must uncharge here. - */ - if (mc.moved_charge) { - mem_cgroup_cancel_charge(mc.from, mc.moved_charge); - mc.moved_charge = 0; - } - /* we must fixup refcnts and charges */ - if (mc.moved_swap) { - /* uncharge swap account from the old cgroup */ - if (!mem_cgroup_is_root(mc.from)) - page_counter_uncharge(&mc.from->memsw, mc.moved_swap); - - mem_cgroup_id_put_many(mc.from, mc.moved_swap); - - /* - * we charged both to->memory and to->memsw, so we - * should uncharge to->memory. - */ - if (!mem_cgroup_is_root(mc.to)) - page_counter_uncharge(&mc.to->memory, mc.moved_swap); - - mc.moved_swap = 0; - } - memcg_oom_recover(from); - memcg_oom_recover(to); - wake_up_all(&mc.waitq); -} - -static void mem_cgroup_clear_mc(void) -{ - struct mm_struct *mm = mc.mm; - - /* - * we must clear moving_task before waking up waiters at the end of - * task migration. - */ - mc.moving_task = NULL; - __mem_cgroup_clear_mc(); - spin_lock(&mc.lock); - mc.from = NULL; - mc.to = NULL; - mc.mm = NULL; - spin_unlock(&mc.lock); - - mmput(mm); -} - -static int mem_cgroup_can_attach(struct cgroup_taskset *tset) -{ - struct cgroup_subsys_state *css; - struct mem_cgroup *memcg = NULL; /* unneeded init to make gcc happy */ - struct mem_cgroup *from; - struct task_struct *leader, *p; - struct mm_struct *mm; - unsigned long move_flags; - int ret = 0; - - /* charge immigration isn't supported on the default hierarchy */ - if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) - return 0; - - /* - * Multi-process migrations only happen on the default hierarchy - * where charge immigration is not used. Perform charge - * immigration if @tset contains a leader and whine if there are - * multiple. - */ - p = NULL; - cgroup_taskset_for_each_leader(leader, css, tset) { - WARN_ON_ONCE(p); - p = leader; - memcg = mem_cgroup_from_css(css); - } - if (!p) - return 0; - - /* - * We are now committed to this value whatever it is. Changes in this - * tunable will only affect upcoming migrations, not the current one. - * So we need to save it, and keep it going. - */ - move_flags = READ_ONCE(memcg->move_charge_at_immigrate); - if (!move_flags) - return 0; - - from = mem_cgroup_from_task(p); - - VM_BUG_ON(from == memcg); - - mm = get_task_mm(p); - if (!mm) - return 0; - /* We move charges only when we move a owner of the mm */ - if (mm->owner == p) { - VM_BUG_ON(mc.from); - VM_BUG_ON(mc.to); - VM_BUG_ON(mc.precharge); - VM_BUG_ON(mc.moved_charge); - VM_BUG_ON(mc.moved_swap); - - spin_lock(&mc.lock); - mc.mm = mm; - mc.from = from; - mc.to = memcg; - mc.flags = move_flags; - spin_unlock(&mc.lock); - /* We set mc.moving_task later */ - - ret = mem_cgroup_precharge_mc(mm); - if (ret) - mem_cgroup_clear_mc(); - } else { - mmput(mm); - } - return ret; -} - -static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) -{ - if (mc.to) - mem_cgroup_clear_mc(); -} - -static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, - unsigned long addr, unsigned long end, - struct mm_walk *walk) -{ - int ret = 0; - struct vm_area_struct *vma = walk->vma; - pte_t *pte; - spinlock_t *ptl; - enum mc_target_type target_type; - union mc_target target; - struct folio *folio; - - ptl = pmd_trans_huge_lock(pmd, vma); - if (ptl) { - if (mc.precharge < HPAGE_PMD_NR) { - spin_unlock(ptl); - return 0; - } - target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); - if (target_type == MC_TARGET_PAGE) { - folio = target.folio; - if (folio_isolate_lru(folio)) { - if (!mem_cgroup_move_account(folio, true, - mc.from, mc.to)) { - mc.precharge -= HPAGE_PMD_NR; - mc.moved_charge += HPAGE_PMD_NR; - } - folio_putback_lru(folio); - } - folio_unlock(folio); - folio_put(folio); - } else if (target_type == MC_TARGET_DEVICE) { - folio = target.folio; - if (!mem_cgroup_move_account(folio, true, - mc.from, mc.to)) { - mc.precharge -= HPAGE_PMD_NR; - mc.moved_charge += HPAGE_PMD_NR; - } - folio_unlock(folio); - folio_put(folio); - } - spin_unlock(ptl); - return 0; - } - -retry: - pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); - if (!pte) - return 0; - for (; addr != end; addr += PAGE_SIZE) { - pte_t ptent = ptep_get(pte++); - bool device = false; - swp_entry_t ent; - - if (!mc.precharge) - break; - - switch (get_mctgt_type(vma, addr, ptent, &target)) { - case MC_TARGET_DEVICE: - device = true; - fallthrough; - case MC_TARGET_PAGE: - folio = target.folio; - /* - * We can have a part of the split pmd here. Moving it - * can be done but it would be too convoluted so simply - * ignore such a partial THP and keep it in original - * memcg. There should be somebody mapping the head. - */ - if (folio_test_large(folio)) - goto put; - if (!device && !folio_isolate_lru(folio)) - goto put; - if (!mem_cgroup_move_account(folio, false, - mc.from, mc.to)) { - mc.precharge--; - /* we uncharge from mc.from later. */ - mc.moved_charge++; - } - if (!device) - folio_putback_lru(folio); -put: /* get_mctgt_type() gets & locks the page */ - folio_unlock(folio); - folio_put(folio); - break; - case MC_TARGET_SWAP: - ent = target.ent; - if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) { - mc.precharge--; - mem_cgroup_id_get_many(mc.to, 1); - /* we fixup other refcnts and charges later. */ - mc.moved_swap++; - } - break; - default: - break; - } - } - pte_unmap_unlock(pte - 1, ptl); - cond_resched(); - - if (addr != end) { - /* - * We have consumed all precharges we got in can_attach(). - * We try charge one by one, but don't do any additional - * charges to mc.to if we have failed in charge once in attach() - * phase. - */ - ret = mem_cgroup_do_precharge(1); - if (!ret) - goto retry; - } - - return ret; -} - -static const struct mm_walk_ops charge_walk_ops = { - .pmd_entry = mem_cgroup_move_charge_pte_range, - .walk_lock = PGWALK_RDLOCK, -}; - -static void mem_cgroup_move_charge(void) -{ - lru_add_drain_all(); - /* - * Signal folio_memcg_lock() to take the memcg's move_lock - * while we're moving its pages to another memcg. Then wait - * for already started RCU-only updates to finish. - */ - atomic_inc(&mc.from->moving_account); - synchronize_rcu(); -retry: - if (unlikely(!mmap_read_trylock(mc.mm))) { - /* - * Someone who are holding the mmap_lock might be waiting in - * waitq. So we cancel all extra charges, wake up all waiters, - * and retry. Because we cancel precharges, we might not be able - * to move enough charges, but moving charge is a best-effort - * feature anyway, so it wouldn't be a big problem. - */ - __mem_cgroup_clear_mc(); - cond_resched(); - goto retry; - } - /* - * When we have consumed all precharges and failed in doing - * additional charge, the page walk just aborts. - */ - walk_page_range(mc.mm, 0, ULONG_MAX, &charge_walk_ops, NULL); - mmap_read_unlock(mc.mm); - atomic_dec(&mc.from->moving_account); -} - -static void mem_cgroup_move_task(void) -{ - if (mc.to) { - mem_cgroup_move_charge(); - mem_cgroup_clear_mc(); - } -} - -#else /* !CONFIG_MMU */ -static int mem_cgroup_can_attach(struct cgroup_taskset *tset) -{ - return 0; -} -static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) -{ -} -static void mem_cgroup_move_task(void) -{ -} -#endif - #ifdef CONFIG_MEMCG_KMEM static void mem_cgroup_fork(struct task_struct *task) { From patchwork Tue Jun 25 00:58:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710399 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3F44C30659 for ; Tue, 25 Jun 2024 00:59:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A13EE6B0353; Mon, 24 Jun 2024 20:59:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 99C296B0356; Mon, 24 Jun 2024 20:59:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 727EB6B0355; Mon, 24 Jun 2024 20:59:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 4EF6E6B0353 for ; Mon, 24 Jun 2024 20:59:41 -0400 (EDT) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 0C648C14D3 for ; Tue, 25 Jun 2024 00:59:41 +0000 (UTC) X-FDA: 82267603362.25.15D7523 Received: from out-188.mta0.migadu.com (out-188.mta0.migadu.com [91.218.175.188]) by imf27.hostedemail.com (Postfix) with ESMTP id 065C440004 for ; Tue, 25 Jun 2024 00:59:38 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=SFL+k+TQ; spf=pass (imf27.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.188 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277160; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=v/tf3lBw3jG/rT7MJZGDLV/4uYe72LY+pJRmiKuAdfM=; b=hT6OUlhjYQmnOsbXvIWzjbnVzlmDxCZ25UUVP+4Ab5GfRiSZIkY4CFaVRzoRHVv969Zv5l cmlXH+IZH87vLpKVzEHnfFORHtYlju6XUIF0737VN3smXmodCSISACPGCczS+i/d4VBKhU PvGAn53k9jTfjXXDxfT7ybGK1wppSEY= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=SFL+k+TQ; spf=pass (imf27.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.188 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277160; a=rsa-sha256; cv=none; b=fNLkFubE3AlJ3fl9iodpi8qEfr3OytnS0PYhlYDFgT3jaqigrRDnMNCl4NJxA9Z+lfSc0G AZBMLFxPknWLxxmCuW9XCt1PcOrUCBg/Wnv/BHJ4MJu8tCXJw9fsgVMqDm2qV0ggeRuXV6 77YKzYMTZ/0Wlh3CMO1ebnR6epFjUTo= X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277177; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=v/tf3lBw3jG/rT7MJZGDLV/4uYe72LY+pJRmiKuAdfM=; b=SFL+k+TQiNlRjgoH6x50pq3n0QiBjYIdOlFT3DK3/1aCGerHYnl1GSqWHiUELtnGjVl6cf EHBcQ4kFu/PRip99CGJsa6MpFTpsdvUA9eQKJSLiBJ56N+W0FFhsvBi6fisG0L48+JrNBH AoKWwYioVwfLl56ihuUsHU1AN4cR280= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 05/14] mm: memcg: rename charge move-related functions Date: Mon, 24 Jun 2024 17:58:57 -0700 Message-ID: <20240625005906.106920-6-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 065C440004 X-Stat-Signature: qym3dik9drwpzu6wdybxxxqm9d9xshpg X-Rspam-User: X-HE-Tag: 1719277178-64412 X-HE-Meta: U2FsdGVkX1+H0AWMbuY5D37UEV84FSLBV8rZoYKrR8hp2Ig1TgHArwJxLJZTNllzjIMTuUjbwYa9rhAXg8WEaq25LmXpyYk15m0GnulThKpoA25ePP2KBSIQo43OREZY3i1mRtq/HZWeufqhj7ijaJ7jPl+DZhkYQe7/c+oGyQP/Z3g8IihOURJnkTKF6+5IoE7FZNd7kyQ0+YYSfwVx8Dzf4NWmciCNKA0vniT7sZP8fx0cG2JiPZyani7b8VOpi8l93Z6VynyGcBjPHEZdYANW3Xi/05aq2NRf81zMgCRZo/zfIvKtCIsoU7ybeI9Qgl2BbYkf1yKk7VnRaI0BLw3IbAtW0A3JkgJiydBrSeG1x4/M5YKkKo6vv6yN7/cmmeuIBbnDOwceDg2AKYwVQaK0CO9D95kbAh8fEYADF0M2pPimaMXLm0uUBtw+u6QWO7uq0nsBj8hyU/lgqB+8i6O1a+Woyf7HtDgNGjTih7UhMtjYi73npNfSuIsjRED3DGNaW0tvrDuT+Xh2gnaXnYiEtvgtU643n27vfoXIFv/YDYCjI9coOpAUPln+zptkRSjsniR+9S2FJ+SAPYRbmxCT8tddST1+5swKNfQ2hENWfZNdE1jdBGo5v2clp+yZ7GzuEU62hoTc0MfIv89+3RE9PRLGD7ms2Q+JM8JDKKjSVZNeyiGdymq36yqfum8JZPTxlLKmPAGP8BHZoDwb4Jw96oyh2eZMq3gCGCIYR+RiK8Ht3a4mDFuDbQ49Czruj/UOWwI7tUav4abAAO6uVrCtpL2dynFI6wa98qCCwXIbnA5nypZZGoUXe1FbO+zHjoMz1HrGW/2ePKIeYuBmCy3hGb6h4hw3QqkR3ti0UL7z3uvs2dh+vAxxtnIe5JOFbUgqxNBw/a4tQLMkrcPsX0tUENpP0pftcBy3m2ZvNtvUkXOI8klESz8nzK7FD7GpiZlaPjicYV3RprfFk6C zetmGjfh VEpddA6BRb78tClqxDgZ5HZl2CKnOpIbPcdLzrHjVUmGw2LpjPM4MTqWnPzysSiHWQJu+SF8+PGhcQqL9cZva/0wLMAnHXD4dLPhWSGHSWtMAr9w= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Rename exported function related to the charge move to have the memcg1_ prefix. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko --- mm/memcontrol-v1.c | 14 +++++++------- mm/memcontrol-v1.h | 8 ++++---- mm/memcontrol.c | 8 ++++---- 3 files changed, 15 insertions(+), 15 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index f4c8bec5ae1b..c25e038ac874 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -384,7 +384,7 @@ static bool mem_cgroup_under_move(struct mem_cgroup *memcg) return ret; } -bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg) +bool memcg1_wait_acct_move(struct mem_cgroup *memcg) { if (mc.moving_task && current != mc.moving_task) { if (mem_cgroup_under_move(memcg)) { @@ -1056,7 +1056,7 @@ static void mem_cgroup_clear_mc(void) mmput(mm); } -int mem_cgroup_can_attach(struct cgroup_taskset *tset) +int memcg1_can_attach(struct cgroup_taskset *tset) { struct cgroup_subsys_state *css; struct mem_cgroup *memcg = NULL; /* unneeded init to make gcc happy */ @@ -1126,7 +1126,7 @@ int mem_cgroup_can_attach(struct cgroup_taskset *tset) return ret; } -void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) +void memcg1_cancel_attach(struct cgroup_taskset *tset) { if (mc.to) mem_cgroup_clear_mc(); @@ -1285,7 +1285,7 @@ static void mem_cgroup_move_charge(void) atomic_dec(&mc.from->moving_account); } -void mem_cgroup_move_task(void) +void memcg1_move_task(void) { if (mc.to) { mem_cgroup_move_charge(); @@ -1294,14 +1294,14 @@ void mem_cgroup_move_task(void) } #else /* !CONFIG_MMU */ -static int mem_cgroup_can_attach(struct cgroup_taskset *tset) +int memcg1_can_attach(struct cgroup_taskset *tset) { return 0; } -static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset) +void memcg1_cancel_attach(struct cgroup_taskset *tset) { } -static void mem_cgroup_move_task(void) +void memcg1_move_task(void) { } #endif diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 55e7c4f90c39..d377c0be9880 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -29,11 +29,11 @@ static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n); void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n); -bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg); +bool memcg1_wait_acct_move(struct mem_cgroup *memcg); struct cgroup_taskset; -int mem_cgroup_can_attach(struct cgroup_taskset *tset); -void mem_cgroup_cancel_attach(struct cgroup_taskset *tset); -void mem_cgroup_move_task(void); +int memcg1_can_attach(struct cgroup_taskset *tset); +void memcg1_cancel_attach(struct cgroup_taskset *tset); +void memcg1_move_task(void); struct cftype; u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3332c89cae2e..da2c0fa0de1b 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2582,7 +2582,7 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, * At task move, charge accounts can be doubly counted. So, it's * better to wait until the end of task_move if something is going on. */ - if (mem_cgroup_wait_acct_move(mem_over_limit)) + if (memcg1_wait_acct_move(mem_over_limit)) goto retry; if (nr_retries--) @@ -6030,12 +6030,12 @@ struct cgroup_subsys memory_cgrp_subsys = { .css_free = mem_cgroup_css_free, .css_reset = mem_cgroup_css_reset, .css_rstat_flush = mem_cgroup_css_rstat_flush, - .can_attach = mem_cgroup_can_attach, + .can_attach = memcg1_can_attach, #if defined(CONFIG_LRU_GEN) || defined(CONFIG_MEMCG_KMEM) .attach = mem_cgroup_attach, #endif - .cancel_attach = mem_cgroup_cancel_attach, - .post_attach = mem_cgroup_move_task, + .cancel_attach = memcg1_cancel_attach, + .post_attach = memcg1_move_task, #ifdef CONFIG_MEMCG_KMEM .fork = mem_cgroup_fork, .exit = mem_cgroup_exit, From patchwork Tue Jun 25 00:58:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710400 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2740BC2BD09 for ; Tue, 25 Jun 2024 00:59:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1D3496B0356; Mon, 24 Jun 2024 20:59:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 181F36B0357; Mon, 24 Jun 2024 20:59:44 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EA5BE6B0358; Mon, 24 Jun 2024 20:59:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id C08616B0356 for ; Mon, 24 Jun 2024 20:59:43 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 6B2A3C14F0 for ; Tue, 25 Jun 2024 00:59:43 +0000 (UTC) X-FDA: 82267603446.05.D10CFDA Received: from out-175.mta0.migadu.com (out-175.mta0.migadu.com [91.218.175.175]) by imf23.hostedemail.com (Postfix) with ESMTP id 0E3A5140008 for ; Tue, 25 Jun 2024 00:59:40 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=sRsGEcY6; spf=pass (imf23.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.175 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277168; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=uHE77+TOfsb44R4Zdivy+arvI6EdnE8krvjgQcOnIU4=; b=rWqmq8CXKnbzQN8P4HXGPhfiSUj4Yi3lSEgKIF0+37oyjEaqDpOrHvK+dGYTVOMpKMrtRs Tib1doY9AG/2RapBPFHYcDOyli7sbzuVM5/dM2X8HsJGaheeGHzOjJNQQDUUlNZOEJ0TP+ 2vHM0vQX+TLJHCRWoxhXg6crV2DSuOI= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277168; a=rsa-sha256; cv=none; b=y7zm+1vy016YEVu6OQlQydPlKf5+986qp6uwQsjILpAiOAgSKANQkEa20Wyvz7EkZa7YFt gbi1jfKMCS56KAoP6tYNXYf6J6UPwz27XesGuD3nqWc81z44nChxYHTuAlQHmYKDMD2E7R +Gl/2xTpWZMlGTcfB/i43WQ52kwNNeU= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=sRsGEcY6; spf=pass (imf23.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.175 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277179; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uHE77+TOfsb44R4Zdivy+arvI6EdnE8krvjgQcOnIU4=; b=sRsGEcY6QL5IC0YQkXYUoQ7Kg1j8SVrZCi5BN8/Fx9mXprwavTJIVwpDVmJNHe5gWla/RV 8nn2dOVKeiKzenbNl2HTsZiqfYJ82CVfF58+i6cK4RJJRCAnC+GhVA9S8dt5ckBi7XAQGX Am4uKaptrsTYH28tdXRhRb6+sbe1C1o= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 06/14] mm: memcg: move legacy memcg event code into memcontrol-v1.c Date: Mon, 24 Jun 2024 17:58:58 -0700 Message-ID: <20240625005906.106920-7-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Rspamd-Queue-Id: 0E3A5140008 X-Stat-Signature: 1yqjhyauw5o1guga7yzhht8fenwwq8od X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1719277180-91717 X-HE-Meta: U2FsdGVkX1/NogXbRY/dWEf6oi0KnCbe2WOr5f3zfuOxW/fDwgiDZuhpZnkQgHUWvoFOqFZ8Pfbbqa9mAdAHgoR4zwrvq0E/laI8O7+t31ESDBcKp9dQMPD6/EoOlrbfqgFPG8nKYRY1lHzWAkfL6NoD82E0C24T0iIfVpincJfmlquD3Zze5tcI54lXG5Hcu2T3wGcPrerg03Ix7OnneCb454PxdcYzp2JBeXM3D3qRbmXuTG6ZuKcFO4Bc3lZ0deaTG1+UwcYPLOKhJFc7er80Ellb9zGYg3nLXO3FvYFIpl6y6IYnhX2/t4pK3jW/zzggTCHCBYg4HRLyvySOIZ90cxrGipL7KLpq1FK1UKnmgcgZXfQSN6yh/eDqTKTVt1H4AR+omTn8sOvQJUKjYaHmoGbHSFgovhfl+YBkUvFRSk/QOjYddigxRkGXnhzm0sjXStgBV9IH+M0A+tGta6oJk1mAJcapsD9INT6ofhyjnuBffo/LWpIHtOU6D4OvZtmnsJ3O7BWQjlJKNc1PSddbQGzvuz7KItzZiqHmEpdAUsR7xCnsCjwxW89FjmvwJ74uIgGmLT23cR+Vsw+PwEkkka+tiBoEsFnfYBf9feBh5mhVQPdQDieq/Il9wJoIx3YkFaWrt9lUAj0fmMCdAUBYkO/Hw/NK4rJusY0tmYbMeot/sLrms9g/vGzTHdX4S9oVJuY/Oin5SjOf5uSQXISB50L0A517xVWRMp4oYCzUTGyo1scHR6NFPXB20ujWXNFt4kWqk03dkP+YUl/CwiFbZFdpvqtNfJGX3LNb8NC6okIlCWxcVrKqKv003qS7YPZklSd8uceOO9q+CXjmE2OYgJbq9hl0itquoKtCTY6L9GWaWbKQEV7n2J3M4e7LY+KEGdIkwvLz8x1vRmc0ptIIh1HrpLXh+5LKVNOWs3Tu+P1qJdoQ6k406O+zCLmxHDins47EanGIZ9D7pey EPAH98oa bR+gPJ4THNhoQ+/OWOX4HRFjI8lOBVuVVRhZZVbHlenVKhEqUXC6N0IZhPE3OeyEKdCamVl7tWswOZOoh2ekmElzSGKJ7FMW6IerYTOWy02dO7ctK83z1tK7TBodqjSrD6oBnwWkiFzsTiDdfr2f2Ex6b91glvKJsUDPW2HLDgRaefXg= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Cgroup v1's memory controller contains a pretty complicated event notifications mechanism which is not used on cgroup v2. Let's move the corresponding code into memcontrol-v1.c. Please, note, that mem_cgroup_event_ratelimit() remains in memcontrol.c, otherwise it would require exporting too many details on memcg stats outside of memcontrol.c. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko --- include/linux/memcontrol.h | 12 - mm/memcontrol-v1.c | 653 +++++++++++++++++++++++++++++++++++ mm/memcontrol-v1.h | 51 +++ mm/memcontrol.c | 687 +------------------------------------ 4 files changed, 709 insertions(+), 694 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 83c8327455d8..588179d29849 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -69,18 +69,6 @@ struct mem_cgroup_id { refcount_t ref; }; -/* - * Per memcg event counter is incremented at every pagein/pageout. With THP, - * it will be incremented by the number of pages. This counter is used - * to trigger some periodic events. This is straightforward and better - * than using jiffies etc. to handle periodic memcg event. - */ -enum mem_cgroup_events_target { - MEM_CGROUP_TARGET_THRESH, - MEM_CGROUP_TARGET_SOFTLIMIT, - MEM_CGROUP_NTARGETS, -}; - struct memcg_vmstats_percpu; struct memcg_vmstats; struct lruvec_stats_percpu; diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index c25e038ac874..4b2290ceace6 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -6,6 +6,10 @@ #include #include #include +#include +#include +#include +#include #include "internal.h" #include "swap.h" @@ -60,6 +64,54 @@ static struct move_charge_struct { .waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq), }; +/* for OOM */ +struct mem_cgroup_eventfd_list { + struct list_head list; + struct eventfd_ctx *eventfd; +}; + +/* + * cgroup_event represents events which userspace want to receive. + */ +struct mem_cgroup_event { + /* + * memcg which the event belongs to. + */ + struct mem_cgroup *memcg; + /* + * eventfd to signal userspace about the event. + */ + struct eventfd_ctx *eventfd; + /* + * Each of these stored in a list by the cgroup. + */ + struct list_head list; + /* + * register_event() callback will be used to add new userspace + * waiter for changes related to this event. Use eventfd_signal() + * on eventfd to send notification to userspace. + */ + int (*register_event)(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args); + /* + * unregister_event() callback will be called when userspace closes + * the eventfd or on cgroup removing. This callback must be set, + * if you want provide notification functionality. + */ + void (*unregister_event)(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd); + /* + * All fields below needed to unregister event when + * userspace closes eventfd. + */ + poll_table pt; + wait_queue_head_t *wqh; + wait_queue_entry_t wait; + struct work_struct remove; +}; + +extern spinlock_t memcg_oom_lock; + static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz, struct mem_cgroup_tree_per_node *mctz, unsigned long new_usage_in_excess) @@ -1306,6 +1358,607 @@ void memcg1_move_task(void) } #endif +static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap) +{ + struct mem_cgroup_threshold_ary *t; + unsigned long usage; + int i; + + rcu_read_lock(); + if (!swap) + t = rcu_dereference(memcg->thresholds.primary); + else + t = rcu_dereference(memcg->memsw_thresholds.primary); + + if (!t) + goto unlock; + + usage = mem_cgroup_usage(memcg, swap); + + /* + * current_threshold points to threshold just below or equal to usage. + * If it's not true, a threshold was crossed after last + * call of __mem_cgroup_threshold(). + */ + i = t->current_threshold; + + /* + * Iterate backward over array of thresholds starting from + * current_threshold and check if a threshold is crossed. + * If none of thresholds below usage is crossed, we read + * only one element of the array here. + */ + for (; i >= 0 && unlikely(t->entries[i].threshold > usage); i--) + eventfd_signal(t->entries[i].eventfd); + + /* i = current_threshold + 1 */ + i++; + + /* + * Iterate forward over array of thresholds starting from + * current_threshold+1 and check if a threshold is crossed. + * If none of thresholds above usage is crossed, we read + * only one element of the array here. + */ + for (; i < t->size && unlikely(t->entries[i].threshold <= usage); i++) + eventfd_signal(t->entries[i].eventfd); + + /* Update current_threshold */ + t->current_threshold = i - 1; +unlock: + rcu_read_unlock(); +} + +static void mem_cgroup_threshold(struct mem_cgroup *memcg) +{ + while (memcg) { + __mem_cgroup_threshold(memcg, false); + if (do_memsw_account()) + __mem_cgroup_threshold(memcg, true); + + memcg = parent_mem_cgroup(memcg); + } +} + +/* + * Check events in order. + * + */ +void memcg_check_events(struct mem_cgroup *memcg, int nid) +{ + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + return; + + /* threshold event is triggered in finer grain than soft limit */ + if (unlikely(mem_cgroup_event_ratelimit(memcg, + MEM_CGROUP_TARGET_THRESH))) { + bool do_softlimit; + + do_softlimit = mem_cgroup_event_ratelimit(memcg, + MEM_CGROUP_TARGET_SOFTLIMIT); + mem_cgroup_threshold(memcg); + if (unlikely(do_softlimit)) + memcg1_update_tree(memcg, nid); + } +} + +static int compare_thresholds(const void *a, const void *b) +{ + const struct mem_cgroup_threshold *_a = a; + const struct mem_cgroup_threshold *_b = b; + + if (_a->threshold > _b->threshold) + return 1; + + if (_a->threshold < _b->threshold) + return -1; + + return 0; +} + +static int mem_cgroup_oom_notify_cb(struct mem_cgroup *memcg) +{ + struct mem_cgroup_eventfd_list *ev; + + spin_lock(&memcg_oom_lock); + + list_for_each_entry(ev, &memcg->oom_notify, list) + eventfd_signal(ev->eventfd); + + spin_unlock(&memcg_oom_lock); + return 0; +} + +void mem_cgroup_oom_notify(struct mem_cgroup *memcg) +{ + struct mem_cgroup *iter; + + for_each_mem_cgroup_tree(iter, memcg) + mem_cgroup_oom_notify_cb(iter); +} + +static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args, enum res_type type) +{ + struct mem_cgroup_thresholds *thresholds; + struct mem_cgroup_threshold_ary *new; + unsigned long threshold; + unsigned long usage; + int i, size, ret; + + ret = page_counter_memparse(args, "-1", &threshold); + if (ret) + return ret; + + mutex_lock(&memcg->thresholds_lock); + + if (type == _MEM) { + thresholds = &memcg->thresholds; + usage = mem_cgroup_usage(memcg, false); + } else if (type == _MEMSWAP) { + thresholds = &memcg->memsw_thresholds; + usage = mem_cgroup_usage(memcg, true); + } else + BUG(); + + /* Check if a threshold crossed before adding a new one */ + if (thresholds->primary) + __mem_cgroup_threshold(memcg, type == _MEMSWAP); + + size = thresholds->primary ? thresholds->primary->size + 1 : 1; + + /* Allocate memory for new array of thresholds */ + new = kmalloc(struct_size(new, entries, size), GFP_KERNEL); + if (!new) { + ret = -ENOMEM; + goto unlock; + } + new->size = size; + + /* Copy thresholds (if any) to new array */ + if (thresholds->primary) + memcpy(new->entries, thresholds->primary->entries, + flex_array_size(new, entries, size - 1)); + + /* Add new threshold */ + new->entries[size - 1].eventfd = eventfd; + new->entries[size - 1].threshold = threshold; + + /* Sort thresholds. Registering of new threshold isn't time-critical */ + sort(new->entries, size, sizeof(*new->entries), + compare_thresholds, NULL); + + /* Find current threshold */ + new->current_threshold = -1; + for (i = 0; i < size; i++) { + if (new->entries[i].threshold <= usage) { + /* + * new->current_threshold will not be used until + * rcu_assign_pointer(), so it's safe to increment + * it here. + */ + ++new->current_threshold; + } else + break; + } + + /* Free old spare buffer and save old primary buffer as spare */ + kfree(thresholds->spare); + thresholds->spare = thresholds->primary; + + rcu_assign_pointer(thresholds->primary, new); + + /* To be sure that nobody uses thresholds */ + synchronize_rcu(); + +unlock: + mutex_unlock(&memcg->thresholds_lock); + + return ret; +} + +static int mem_cgroup_usage_register_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args) +{ + return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEM); +} + +static int memsw_cgroup_usage_register_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args) +{ + return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEMSWAP); +} + +static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, enum res_type type) +{ + struct mem_cgroup_thresholds *thresholds; + struct mem_cgroup_threshold_ary *new; + unsigned long usage; + int i, j, size, entries; + + mutex_lock(&memcg->thresholds_lock); + + if (type == _MEM) { + thresholds = &memcg->thresholds; + usage = mem_cgroup_usage(memcg, false); + } else if (type == _MEMSWAP) { + thresholds = &memcg->memsw_thresholds; + usage = mem_cgroup_usage(memcg, true); + } else + BUG(); + + if (!thresholds->primary) + goto unlock; + + /* Check if a threshold crossed before removing */ + __mem_cgroup_threshold(memcg, type == _MEMSWAP); + + /* Calculate new number of threshold */ + size = entries = 0; + for (i = 0; i < thresholds->primary->size; i++) { + if (thresholds->primary->entries[i].eventfd != eventfd) + size++; + else + entries++; + } + + new = thresholds->spare; + + /* If no items related to eventfd have been cleared, nothing to do */ + if (!entries) + goto unlock; + + /* Set thresholds array to NULL if we don't have thresholds */ + if (!size) { + kfree(new); + new = NULL; + goto swap_buffers; + } + + new->size = size; + + /* Copy thresholds and find current threshold */ + new->current_threshold = -1; + for (i = 0, j = 0; i < thresholds->primary->size; i++) { + if (thresholds->primary->entries[i].eventfd == eventfd) + continue; + + new->entries[j] = thresholds->primary->entries[i]; + if (new->entries[j].threshold <= usage) { + /* + * new->current_threshold will not be used + * until rcu_assign_pointer(), so it's safe to increment + * it here. + */ + ++new->current_threshold; + } + j++; + } + +swap_buffers: + /* Swap primary and spare array */ + thresholds->spare = thresholds->primary; + + rcu_assign_pointer(thresholds->primary, new); + + /* To be sure that nobody uses thresholds */ + synchronize_rcu(); + + /* If all events are unregistered, free the spare array */ + if (!new) { + kfree(thresholds->spare); + thresholds->spare = NULL; + } +unlock: + mutex_unlock(&memcg->thresholds_lock); +} + +static void mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd) +{ + return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEM); +} + +static void memsw_cgroup_usage_unregister_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd) +{ + return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEMSWAP); +} + +static int mem_cgroup_oom_register_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd, const char *args) +{ + struct mem_cgroup_eventfd_list *event; + + event = kmalloc(sizeof(*event), GFP_KERNEL); + if (!event) + return -ENOMEM; + + spin_lock(&memcg_oom_lock); + + event->eventfd = eventfd; + list_add(&event->list, &memcg->oom_notify); + + /* already in OOM ? */ + if (memcg->under_oom) + eventfd_signal(eventfd); + spin_unlock(&memcg_oom_lock); + + return 0; +} + +static void mem_cgroup_oom_unregister_event(struct mem_cgroup *memcg, + struct eventfd_ctx *eventfd) +{ + struct mem_cgroup_eventfd_list *ev, *tmp; + + spin_lock(&memcg_oom_lock); + + list_for_each_entry_safe(ev, tmp, &memcg->oom_notify, list) { + if (ev->eventfd == eventfd) { + list_del(&ev->list); + kfree(ev); + } + } + + spin_unlock(&memcg_oom_lock); +} + +/* + * DO NOT USE IN NEW FILES. + * + * "cgroup.event_control" implementation. + * + * This is way over-engineered. It tries to support fully configurable + * events for each user. Such level of flexibility is completely + * unnecessary especially in the light of the planned unified hierarchy. + * + * Please deprecate this and replace with something simpler if at all + * possible. + */ + +/* + * Unregister event and free resources. + * + * Gets called from workqueue. + */ +static void memcg_event_remove(struct work_struct *work) +{ + struct mem_cgroup_event *event = + container_of(work, struct mem_cgroup_event, remove); + struct mem_cgroup *memcg = event->memcg; + + remove_wait_queue(event->wqh, &event->wait); + + event->unregister_event(memcg, event->eventfd); + + /* Notify userspace the event is going away. */ + eventfd_signal(event->eventfd); + + eventfd_ctx_put(event->eventfd); + kfree(event); + css_put(&memcg->css); +} + +/* + * Gets called on EPOLLHUP on eventfd when user closes it. + * + * Called with wqh->lock held and interrupts disabled. + */ +static int memcg_event_wake(wait_queue_entry_t *wait, unsigned mode, + int sync, void *key) +{ + struct mem_cgroup_event *event = + container_of(wait, struct mem_cgroup_event, wait); + struct mem_cgroup *memcg = event->memcg; + __poll_t flags = key_to_poll(key); + + if (flags & EPOLLHUP) { + /* + * If the event has been detached at cgroup removal, we + * can simply return knowing the other side will cleanup + * for us. + * + * We can't race against event freeing since the other + * side will require wqh->lock via remove_wait_queue(), + * which we hold. + */ + spin_lock(&memcg->event_list_lock); + if (!list_empty(&event->list)) { + list_del_init(&event->list); + /* + * We are in atomic context, but cgroup_event_remove() + * may sleep, so we have to call it in workqueue. + */ + schedule_work(&event->remove); + } + spin_unlock(&memcg->event_list_lock); + } + + return 0; +} + +static void memcg_event_ptable_queue_proc(struct file *file, + wait_queue_head_t *wqh, poll_table *pt) +{ + struct mem_cgroup_event *event = + container_of(pt, struct mem_cgroup_event, pt); + + event->wqh = wqh; + add_wait_queue(wqh, &event->wait); +} + +/* + * DO NOT USE IN NEW FILES. + * + * Parse input and register new cgroup event handler. + * + * Input must be in format ' '. + * Interpretation of args is defined by control file implementation. + */ +ssize_t memcg_write_event_control(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct cgroup_subsys_state *css = of_css(of); + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + struct mem_cgroup_event *event; + struct cgroup_subsys_state *cfile_css; + unsigned int efd, cfd; + struct fd efile; + struct fd cfile; + struct dentry *cdentry; + const char *name; + char *endp; + int ret; + + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + return -EOPNOTSUPP; + + buf = strstrip(buf); + + efd = simple_strtoul(buf, &endp, 10); + if (*endp != ' ') + return -EINVAL; + buf = endp + 1; + + cfd = simple_strtoul(buf, &endp, 10); + if ((*endp != ' ') && (*endp != '\0')) + return -EINVAL; + buf = endp + 1; + + event = kzalloc(sizeof(*event), GFP_KERNEL); + if (!event) + return -ENOMEM; + + event->memcg = memcg; + INIT_LIST_HEAD(&event->list); + init_poll_funcptr(&event->pt, memcg_event_ptable_queue_proc); + init_waitqueue_func_entry(&event->wait, memcg_event_wake); + INIT_WORK(&event->remove, memcg_event_remove); + + efile = fdget(efd); + if (!efile.file) { + ret = -EBADF; + goto out_kfree; + } + + event->eventfd = eventfd_ctx_fileget(efile.file); + if (IS_ERR(event->eventfd)) { + ret = PTR_ERR(event->eventfd); + goto out_put_efile; + } + + cfile = fdget(cfd); + if (!cfile.file) { + ret = -EBADF; + goto out_put_eventfd; + } + + /* the process need read permission on control file */ + /* AV: shouldn't we check that it's been opened for read instead? */ + ret = file_permission(cfile.file, MAY_READ); + if (ret < 0) + goto out_put_cfile; + + /* + * The control file must be a regular cgroup1 file. As a regular cgroup + * file can't be renamed, it's safe to access its name afterwards. + */ + cdentry = cfile.file->f_path.dentry; + if (cdentry->d_sb->s_type != &cgroup_fs_type || !d_is_reg(cdentry)) { + ret = -EINVAL; + goto out_put_cfile; + } + + /* + * Determine the event callbacks and set them in @event. This used + * to be done via struct cftype but cgroup core no longer knows + * about these events. The following is crude but the whole thing + * is for compatibility anyway. + * + * DO NOT ADD NEW FILES. + */ + name = cdentry->d_name.name; + + if (!strcmp(name, "memory.usage_in_bytes")) { + event->register_event = mem_cgroup_usage_register_event; + event->unregister_event = mem_cgroup_usage_unregister_event; + } else if (!strcmp(name, "memory.oom_control")) { + event->register_event = mem_cgroup_oom_register_event; + event->unregister_event = mem_cgroup_oom_unregister_event; + } else if (!strcmp(name, "memory.pressure_level")) { + event->register_event = vmpressure_register_event; + event->unregister_event = vmpressure_unregister_event; + } else if (!strcmp(name, "memory.memsw.usage_in_bytes")) { + event->register_event = memsw_cgroup_usage_register_event; + event->unregister_event = memsw_cgroup_usage_unregister_event; + } else { + ret = -EINVAL; + goto out_put_cfile; + } + + /* + * Verify @cfile should belong to @css. Also, remaining events are + * automatically removed on cgroup destruction but the removal is + * asynchronous, so take an extra ref on @css. + */ + cfile_css = css_tryget_online_from_dir(cdentry->d_parent, + &memory_cgrp_subsys); + ret = -EINVAL; + if (IS_ERR(cfile_css)) + goto out_put_cfile; + if (cfile_css != css) { + css_put(cfile_css); + goto out_put_cfile; + } + + ret = event->register_event(memcg, event->eventfd, buf); + if (ret) + goto out_put_css; + + vfs_poll(efile.file, &event->pt); + + spin_lock_irq(&memcg->event_list_lock); + list_add(&event->list, &memcg->event_list); + spin_unlock_irq(&memcg->event_list_lock); + + fdput(cfile); + fdput(efile); + + return nbytes; + +out_put_css: + css_put(css); +out_put_cfile: + fdput(cfile); +out_put_eventfd: + eventfd_ctx_put(event->eventfd); +out_put_efile: + fdput(efile); +out_kfree: + kfree(event); + + return ret; +} + +void memcg1_css_offline(struct mem_cgroup *memcg) +{ + struct mem_cgroup_event *event, *tmp; + + /* + * Unregister events and notify userspace. + * Notify userspace about cgroup removing only after rmdir of cgroup + * directory to avoid race between userspace and kernelspace. + */ + spin_lock_irq(&memcg->event_list_lock); + list_for_each_entry_safe(event, tmp, &memcg->event_list, list) { + list_del_init(&event->list); + schedule_work(&event->remove); + } + spin_unlock_irq(&memcg->event_list_lock); +} + static int __init memcg1_init(void) { int node; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index d377c0be9880..524a2c76ffc9 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -41,4 +41,55 @@ u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, struct cftype *cft, u64 val); +/* + * Per memcg event counter is incremented at every pagein/pageout. With THP, + * it will be incremented by the number of pages. This counter is used + * to trigger some periodic events. This is straightforward and better + * than using jiffies etc. to handle periodic memcg event. + */ +enum mem_cgroup_events_target { + MEM_CGROUP_TARGET_THRESH, + MEM_CGROUP_TARGET_SOFTLIMIT, + MEM_CGROUP_NTARGETS, +}; + +/* Whether legacy memory+swap accounting is active */ +static bool do_memsw_account(void) +{ + return !cgroup_subsys_on_dfl(memory_cgrp_subsys); +} + +/* + * Iteration constructs for visiting all cgroups (under a tree). If + * loops are exited prematurely (break), mem_cgroup_iter_break() must + * be used for reference counting. + */ +#define for_each_mem_cgroup_tree(iter, root) \ + for (iter = mem_cgroup_iter(root, NULL, NULL); \ + iter != NULL; \ + iter = mem_cgroup_iter(root, iter, NULL)) + +#define for_each_mem_cgroup(iter) \ + for (iter = mem_cgroup_iter(NULL, NULL, NULL); \ + iter != NULL; \ + iter = mem_cgroup_iter(NULL, iter, NULL)) + +void memcg1_css_offline(struct mem_cgroup *memcg); + +/* for encoding cft->private value on file */ +enum res_type { + _MEM, + _MEMSWAP, + _KMEM, + _TCP, +}; + +bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, + enum mem_cgroup_events_target target); +unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap); +void mem_cgroup_oom_notify(struct mem_cgroup *memcg); +ssize_t memcg_write_event_control(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off); + + #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index da2c0fa0de1b..bd4b26a73596 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -46,9 +46,6 @@ #include #include #include -#include -#include -#include #include #include #include @@ -59,7 +56,6 @@ #include #include #include -#include #include #include #include @@ -97,91 +93,13 @@ static bool cgroup_memory_nobpf __ro_after_init; static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq); #endif -/* Whether legacy memory+swap accounting is active */ -static bool do_memsw_account(void) -{ - return !cgroup_subsys_on_dfl(memory_cgrp_subsys); -} - #define THRESHOLDS_EVENTS_TARGET 128 #define SOFTLIMIT_EVENTS_TARGET 1024 -/* for OOM */ -struct mem_cgroup_eventfd_list { - struct list_head list; - struct eventfd_ctx *eventfd; -}; - -/* - * cgroup_event represents events which userspace want to receive. - */ -struct mem_cgroup_event { - /* - * memcg which the event belongs to. - */ - struct mem_cgroup *memcg; - /* - * eventfd to signal userspace about the event. - */ - struct eventfd_ctx *eventfd; - /* - * Each of these stored in a list by the cgroup. - */ - struct list_head list; - /* - * register_event() callback will be used to add new userspace - * waiter for changes related to this event. Use eventfd_signal() - * on eventfd to send notification to userspace. - */ - int (*register_event)(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args); - /* - * unregister_event() callback will be called when userspace closes - * the eventfd or on cgroup removing. This callback must be set, - * if you want provide notification functionality. - */ - void (*unregister_event)(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd); - /* - * All fields below needed to unregister event when - * userspace closes eventfd. - */ - poll_table pt; - wait_queue_head_t *wqh; - wait_queue_entry_t wait; - struct work_struct remove; -}; - -static void mem_cgroup_threshold(struct mem_cgroup *memcg); -static void mem_cgroup_oom_notify(struct mem_cgroup *memcg); - -/* for encoding cft->private value on file */ -enum res_type { - _MEM, - _MEMSWAP, - _KMEM, - _TCP, -}; - #define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val)) #define MEMFILE_TYPE(val) ((val) >> 16 & 0xffff) #define MEMFILE_ATTR(val) ((val) & 0xffff) -/* - * Iteration constructs for visiting all cgroups (under a tree). If - * loops are exited prematurely (break), mem_cgroup_iter_break() must - * be used for reference counting. - */ -#define for_each_mem_cgroup_tree(iter, root) \ - for (iter = mem_cgroup_iter(root, NULL, NULL); \ - iter != NULL; \ - iter = mem_cgroup_iter(root, iter, NULL)) - -#define for_each_mem_cgroup(iter) \ - for (iter = mem_cgroup_iter(NULL, NULL, NULL); \ - iter != NULL; \ - iter = mem_cgroup_iter(NULL, iter, NULL)) - static inline bool task_is_dying(void) { return tsk_is_oom_victim(current) || fatal_signal_pending(current) || @@ -940,8 +858,8 @@ void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages) __this_cpu_add(memcg->vmstats_percpu->nr_page_events, nr_pages); } -static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, - enum mem_cgroup_events_target target) +bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, + enum mem_cgroup_events_target target) { unsigned long val, next; @@ -965,28 +883,6 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, return false; } -/* - * Check events in order. - * - */ -void memcg_check_events(struct mem_cgroup *memcg, int nid) -{ - if (IS_ENABLED(CONFIG_PREEMPT_RT)) - return; - - /* threshold event is triggered in finer grain than soft limit */ - if (unlikely(mem_cgroup_event_ratelimit(memcg, - MEM_CGROUP_TARGET_THRESH))) { - bool do_softlimit; - - do_softlimit = mem_cgroup_event_ratelimit(memcg, - MEM_CGROUP_TARGET_SOFTLIMIT); - mem_cgroup_threshold(memcg); - if (unlikely(do_softlimit)) - memcg1_update_tree(memcg, nid); - } -} - struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p) { /* @@ -1726,7 +1622,7 @@ static struct lockdep_map memcg_oom_lock_dep_map = { }; #endif -static DEFINE_SPINLOCK(memcg_oom_lock); +DEFINE_SPINLOCK(memcg_oom_lock); /* * Check OOM-Killer is already running under our hierarchy. @@ -3545,7 +3441,7 @@ static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css, return -EINVAL; } -static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) +unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) { unsigned long val; @@ -4046,331 +3942,6 @@ static int mem_cgroup_swappiness_write(struct cgroup_subsys_state *css, return 0; } -static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap) -{ - struct mem_cgroup_threshold_ary *t; - unsigned long usage; - int i; - - rcu_read_lock(); - if (!swap) - t = rcu_dereference(memcg->thresholds.primary); - else - t = rcu_dereference(memcg->memsw_thresholds.primary); - - if (!t) - goto unlock; - - usage = mem_cgroup_usage(memcg, swap); - - /* - * current_threshold points to threshold just below or equal to usage. - * If it's not true, a threshold was crossed after last - * call of __mem_cgroup_threshold(). - */ - i = t->current_threshold; - - /* - * Iterate backward over array of thresholds starting from - * current_threshold and check if a threshold is crossed. - * If none of thresholds below usage is crossed, we read - * only one element of the array here. - */ - for (; i >= 0 && unlikely(t->entries[i].threshold > usage); i--) - eventfd_signal(t->entries[i].eventfd); - - /* i = current_threshold + 1 */ - i++; - - /* - * Iterate forward over array of thresholds starting from - * current_threshold+1 and check if a threshold is crossed. - * If none of thresholds above usage is crossed, we read - * only one element of the array here. - */ - for (; i < t->size && unlikely(t->entries[i].threshold <= usage); i++) - eventfd_signal(t->entries[i].eventfd); - - /* Update current_threshold */ - t->current_threshold = i - 1; -unlock: - rcu_read_unlock(); -} - -static void mem_cgroup_threshold(struct mem_cgroup *memcg) -{ - while (memcg) { - __mem_cgroup_threshold(memcg, false); - if (do_memsw_account()) - __mem_cgroup_threshold(memcg, true); - - memcg = parent_mem_cgroup(memcg); - } -} - -static int compare_thresholds(const void *a, const void *b) -{ - const struct mem_cgroup_threshold *_a = a; - const struct mem_cgroup_threshold *_b = b; - - if (_a->threshold > _b->threshold) - return 1; - - if (_a->threshold < _b->threshold) - return -1; - - return 0; -} - -static int mem_cgroup_oom_notify_cb(struct mem_cgroup *memcg) -{ - struct mem_cgroup_eventfd_list *ev; - - spin_lock(&memcg_oom_lock); - - list_for_each_entry(ev, &memcg->oom_notify, list) - eventfd_signal(ev->eventfd); - - spin_unlock(&memcg_oom_lock); - return 0; -} - -static void mem_cgroup_oom_notify(struct mem_cgroup *memcg) -{ - struct mem_cgroup *iter; - - for_each_mem_cgroup_tree(iter, memcg) - mem_cgroup_oom_notify_cb(iter); -} - -static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args, enum res_type type) -{ - struct mem_cgroup_thresholds *thresholds; - struct mem_cgroup_threshold_ary *new; - unsigned long threshold; - unsigned long usage; - int i, size, ret; - - ret = page_counter_memparse(args, "-1", &threshold); - if (ret) - return ret; - - mutex_lock(&memcg->thresholds_lock); - - if (type == _MEM) { - thresholds = &memcg->thresholds; - usage = mem_cgroup_usage(memcg, false); - } else if (type == _MEMSWAP) { - thresholds = &memcg->memsw_thresholds; - usage = mem_cgroup_usage(memcg, true); - } else - BUG(); - - /* Check if a threshold crossed before adding a new one */ - if (thresholds->primary) - __mem_cgroup_threshold(memcg, type == _MEMSWAP); - - size = thresholds->primary ? thresholds->primary->size + 1 : 1; - - /* Allocate memory for new array of thresholds */ - new = kmalloc(struct_size(new, entries, size), GFP_KERNEL); - if (!new) { - ret = -ENOMEM; - goto unlock; - } - new->size = size; - - /* Copy thresholds (if any) to new array */ - if (thresholds->primary) - memcpy(new->entries, thresholds->primary->entries, - flex_array_size(new, entries, size - 1)); - - /* Add new threshold */ - new->entries[size - 1].eventfd = eventfd; - new->entries[size - 1].threshold = threshold; - - /* Sort thresholds. Registering of new threshold isn't time-critical */ - sort(new->entries, size, sizeof(*new->entries), - compare_thresholds, NULL); - - /* Find current threshold */ - new->current_threshold = -1; - for (i = 0; i < size; i++) { - if (new->entries[i].threshold <= usage) { - /* - * new->current_threshold will not be used until - * rcu_assign_pointer(), so it's safe to increment - * it here. - */ - ++new->current_threshold; - } else - break; - } - - /* Free old spare buffer and save old primary buffer as spare */ - kfree(thresholds->spare); - thresholds->spare = thresholds->primary; - - rcu_assign_pointer(thresholds->primary, new); - - /* To be sure that nobody uses thresholds */ - synchronize_rcu(); - -unlock: - mutex_unlock(&memcg->thresholds_lock); - - return ret; -} - -static int mem_cgroup_usage_register_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args) -{ - return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEM); -} - -static int memsw_cgroup_usage_register_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args) -{ - return __mem_cgroup_usage_register_event(memcg, eventfd, args, _MEMSWAP); -} - -static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, enum res_type type) -{ - struct mem_cgroup_thresholds *thresholds; - struct mem_cgroup_threshold_ary *new; - unsigned long usage; - int i, j, size, entries; - - mutex_lock(&memcg->thresholds_lock); - - if (type == _MEM) { - thresholds = &memcg->thresholds; - usage = mem_cgroup_usage(memcg, false); - } else if (type == _MEMSWAP) { - thresholds = &memcg->memsw_thresholds; - usage = mem_cgroup_usage(memcg, true); - } else - BUG(); - - if (!thresholds->primary) - goto unlock; - - /* Check if a threshold crossed before removing */ - __mem_cgroup_threshold(memcg, type == _MEMSWAP); - - /* Calculate new number of threshold */ - size = entries = 0; - for (i = 0; i < thresholds->primary->size; i++) { - if (thresholds->primary->entries[i].eventfd != eventfd) - size++; - else - entries++; - } - - new = thresholds->spare; - - /* If no items related to eventfd have been cleared, nothing to do */ - if (!entries) - goto unlock; - - /* Set thresholds array to NULL if we don't have thresholds */ - if (!size) { - kfree(new); - new = NULL; - goto swap_buffers; - } - - new->size = size; - - /* Copy thresholds and find current threshold */ - new->current_threshold = -1; - for (i = 0, j = 0; i < thresholds->primary->size; i++) { - if (thresholds->primary->entries[i].eventfd == eventfd) - continue; - - new->entries[j] = thresholds->primary->entries[i]; - if (new->entries[j].threshold <= usage) { - /* - * new->current_threshold will not be used - * until rcu_assign_pointer(), so it's safe to increment - * it here. - */ - ++new->current_threshold; - } - j++; - } - -swap_buffers: - /* Swap primary and spare array */ - thresholds->spare = thresholds->primary; - - rcu_assign_pointer(thresholds->primary, new); - - /* To be sure that nobody uses thresholds */ - synchronize_rcu(); - - /* If all events are unregistered, free the spare array */ - if (!new) { - kfree(thresholds->spare); - thresholds->spare = NULL; - } -unlock: - mutex_unlock(&memcg->thresholds_lock); -} - -static void mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd) -{ - return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEM); -} - -static void memsw_cgroup_usage_unregister_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd) -{ - return __mem_cgroup_usage_unregister_event(memcg, eventfd, _MEMSWAP); -} - -static int mem_cgroup_oom_register_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd, const char *args) -{ - struct mem_cgroup_eventfd_list *event; - - event = kmalloc(sizeof(*event), GFP_KERNEL); - if (!event) - return -ENOMEM; - - spin_lock(&memcg_oom_lock); - - event->eventfd = eventfd; - list_add(&event->list, &memcg->oom_notify); - - /* already in OOM ? */ - if (memcg->under_oom) - eventfd_signal(eventfd); - spin_unlock(&memcg_oom_lock); - - return 0; -} - -static void mem_cgroup_oom_unregister_event(struct mem_cgroup *memcg, - struct eventfd_ctx *eventfd) -{ - struct mem_cgroup_eventfd_list *ev, *tmp; - - spin_lock(&memcg_oom_lock); - - list_for_each_entry_safe(ev, tmp, &memcg->oom_notify, list) { - if (ev->eventfd == eventfd) { - list_del(&ev->list); - kfree(ev); - } - } - - spin_unlock(&memcg_oom_lock); -} - static int mem_cgroup_oom_control_read(struct seq_file *sf, void *v) { struct mem_cgroup *memcg = mem_cgroup_from_seq(sf); @@ -4611,243 +4182,6 @@ static void memcg_wb_domain_size_changed(struct mem_cgroup *memcg) #endif /* CONFIG_CGROUP_WRITEBACK */ -/* - * DO NOT USE IN NEW FILES. - * - * "cgroup.event_control" implementation. - * - * This is way over-engineered. It tries to support fully configurable - * events for each user. Such level of flexibility is completely - * unnecessary especially in the light of the planned unified hierarchy. - * - * Please deprecate this and replace with something simpler if at all - * possible. - */ - -/* - * Unregister event and free resources. - * - * Gets called from workqueue. - */ -static void memcg_event_remove(struct work_struct *work) -{ - struct mem_cgroup_event *event = - container_of(work, struct mem_cgroup_event, remove); - struct mem_cgroup *memcg = event->memcg; - - remove_wait_queue(event->wqh, &event->wait); - - event->unregister_event(memcg, event->eventfd); - - /* Notify userspace the event is going away. */ - eventfd_signal(event->eventfd); - - eventfd_ctx_put(event->eventfd); - kfree(event); - css_put(&memcg->css); -} - -/* - * Gets called on EPOLLHUP on eventfd when user closes it. - * - * Called with wqh->lock held and interrupts disabled. - */ -static int memcg_event_wake(wait_queue_entry_t *wait, unsigned mode, - int sync, void *key) -{ - struct mem_cgroup_event *event = - container_of(wait, struct mem_cgroup_event, wait); - struct mem_cgroup *memcg = event->memcg; - __poll_t flags = key_to_poll(key); - - if (flags & EPOLLHUP) { - /* - * If the event has been detached at cgroup removal, we - * can simply return knowing the other side will cleanup - * for us. - * - * We can't race against event freeing since the other - * side will require wqh->lock via remove_wait_queue(), - * which we hold. - */ - spin_lock(&memcg->event_list_lock); - if (!list_empty(&event->list)) { - list_del_init(&event->list); - /* - * We are in atomic context, but cgroup_event_remove() - * may sleep, so we have to call it in workqueue. - */ - schedule_work(&event->remove); - } - spin_unlock(&memcg->event_list_lock); - } - - return 0; -} - -static void memcg_event_ptable_queue_proc(struct file *file, - wait_queue_head_t *wqh, poll_table *pt) -{ - struct mem_cgroup_event *event = - container_of(pt, struct mem_cgroup_event, pt); - - event->wqh = wqh; - add_wait_queue(wqh, &event->wait); -} - -/* - * DO NOT USE IN NEW FILES. - * - * Parse input and register new cgroup event handler. - * - * Input must be in format ' '. - * Interpretation of args is defined by control file implementation. - */ -static ssize_t memcg_write_event_control(struct kernfs_open_file *of, - char *buf, size_t nbytes, loff_t off) -{ - struct cgroup_subsys_state *css = of_css(of); - struct mem_cgroup *memcg = mem_cgroup_from_css(css); - struct mem_cgroup_event *event; - struct cgroup_subsys_state *cfile_css; - unsigned int efd, cfd; - struct fd efile; - struct fd cfile; - struct dentry *cdentry; - const char *name; - char *endp; - int ret; - - if (IS_ENABLED(CONFIG_PREEMPT_RT)) - return -EOPNOTSUPP; - - buf = strstrip(buf); - - efd = simple_strtoul(buf, &endp, 10); - if (*endp != ' ') - return -EINVAL; - buf = endp + 1; - - cfd = simple_strtoul(buf, &endp, 10); - if ((*endp != ' ') && (*endp != '\0')) - return -EINVAL; - buf = endp + 1; - - event = kzalloc(sizeof(*event), GFP_KERNEL); - if (!event) - return -ENOMEM; - - event->memcg = memcg; - INIT_LIST_HEAD(&event->list); - init_poll_funcptr(&event->pt, memcg_event_ptable_queue_proc); - init_waitqueue_func_entry(&event->wait, memcg_event_wake); - INIT_WORK(&event->remove, memcg_event_remove); - - efile = fdget(efd); - if (!efile.file) { - ret = -EBADF; - goto out_kfree; - } - - event->eventfd = eventfd_ctx_fileget(efile.file); - if (IS_ERR(event->eventfd)) { - ret = PTR_ERR(event->eventfd); - goto out_put_efile; - } - - cfile = fdget(cfd); - if (!cfile.file) { - ret = -EBADF; - goto out_put_eventfd; - } - - /* the process need read permission on control file */ - /* AV: shouldn't we check that it's been opened for read instead? */ - ret = file_permission(cfile.file, MAY_READ); - if (ret < 0) - goto out_put_cfile; - - /* - * The control file must be a regular cgroup1 file. As a regular cgroup - * file can't be renamed, it's safe to access its name afterwards. - */ - cdentry = cfile.file->f_path.dentry; - if (cdentry->d_sb->s_type != &cgroup_fs_type || !d_is_reg(cdentry)) { - ret = -EINVAL; - goto out_put_cfile; - } - - /* - * Determine the event callbacks and set them in @event. This used - * to be done via struct cftype but cgroup core no longer knows - * about these events. The following is crude but the whole thing - * is for compatibility anyway. - * - * DO NOT ADD NEW FILES. - */ - name = cdentry->d_name.name; - - if (!strcmp(name, "memory.usage_in_bytes")) { - event->register_event = mem_cgroup_usage_register_event; - event->unregister_event = mem_cgroup_usage_unregister_event; - } else if (!strcmp(name, "memory.oom_control")) { - event->register_event = mem_cgroup_oom_register_event; - event->unregister_event = mem_cgroup_oom_unregister_event; - } else if (!strcmp(name, "memory.pressure_level")) { - event->register_event = vmpressure_register_event; - event->unregister_event = vmpressure_unregister_event; - } else if (!strcmp(name, "memory.memsw.usage_in_bytes")) { - event->register_event = memsw_cgroup_usage_register_event; - event->unregister_event = memsw_cgroup_usage_unregister_event; - } else { - ret = -EINVAL; - goto out_put_cfile; - } - - /* - * Verify @cfile should belong to @css. Also, remaining events are - * automatically removed on cgroup destruction but the removal is - * asynchronous, so take an extra ref on @css. - */ - cfile_css = css_tryget_online_from_dir(cdentry->d_parent, - &memory_cgrp_subsys); - ret = -EINVAL; - if (IS_ERR(cfile_css)) - goto out_put_cfile; - if (cfile_css != css) { - css_put(cfile_css); - goto out_put_cfile; - } - - ret = event->register_event(memcg, event->eventfd, buf); - if (ret) - goto out_put_css; - - vfs_poll(efile.file, &event->pt); - - spin_lock_irq(&memcg->event_list_lock); - list_add(&event->list, &memcg->event_list); - spin_unlock_irq(&memcg->event_list_lock); - - fdput(cfile); - fdput(efile); - - return nbytes; - -out_put_css: - css_put(css); -out_put_cfile: - fdput(cfile); -out_put_eventfd: - eventfd_ctx_put(event->eventfd); -out_put_efile: - fdput(efile); -out_kfree: - kfree(event); - - return ret; -} - #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) static int mem_cgroup_slab_show(struct seq_file *m, void *p) { @@ -5314,19 +4648,8 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css) static void mem_cgroup_css_offline(struct cgroup_subsys_state *css) { struct mem_cgroup *memcg = mem_cgroup_from_css(css); - struct mem_cgroup_event *event, *tmp; - /* - * Unregister events and notify userspace. - * Notify userspace about cgroup removing only after rmdir of cgroup - * directory to avoid race between userspace and kernelspace. - */ - spin_lock_irq(&memcg->event_list_lock); - list_for_each_entry_safe(event, tmp, &memcg->event_list, list) { - list_del_init(&event->list); - schedule_work(&event->remove); - } - spin_unlock_irq(&memcg->event_list_lock); + memcg1_css_offline(memcg); page_counter_set_min(&memcg->memory, 0); page_counter_set_low(&memcg->memory, 0); From patchwork Tue Jun 25 00:58:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710401 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 08530C3064D for ; Tue, 25 Jun 2024 00:59:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CA82B6B0357; Mon, 24 Jun 2024 20:59:45 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C5A0F6B0358; Mon, 24 Jun 2024 20:59:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AD4D36B0359; Mon, 24 Jun 2024 20:59:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 8C1006B0357 for ; Mon, 24 Jun 2024 20:59:45 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 08ADEA38D4 for ; Tue, 25 Jun 2024 00:59:45 +0000 (UTC) X-FDA: 82267603530.11.C199F68 Received: from out-173.mta0.migadu.com (out-173.mta0.migadu.com [91.218.175.173]) by imf22.hostedemail.com (Postfix) with ESMTP id EE743C0006 for ; Tue, 25 Jun 2024 00:59:42 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=Zz5Mz7eu; spf=pass (imf22.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.173 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277164; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=hnnziElivuQ/LCoNmnyRRYzwKafGd1VzRo+k4WxWNEA=; b=ZqHOKkfDGke2ucRwAttLoptMyS4V3RsH2iffAcfeICWQaYrWMk6s72NzorOsc3w/+NRIY5 CgfMU4+KSjGBfhQRlcqy9MTHRdLdWTIkNVqgFDXRULgzldEWLN8QhMjA9gnLZSFb1ukZgB n9ZP7DCqD0W0nhMg8tYtNewvzd6WZN0= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=Zz5Mz7eu; spf=pass (imf22.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.173 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277164; a=rsa-sha256; cv=none; b=E7m/LCasOzflgjKOlgwTASZW0RkKcU1JTlbOAxR/VeQefXJUbrRkXrxRko9IJ/uN+5oZjR n1L8KLxgWgoLBKfkmSFJ2DLpRntJRNC/RMnQb6+Ohn719ZAzkmf55QeF/3pRMXK3W6SZ0I soecVFp4s/Z4vt9KbATqEcqZAVZAXGQ= X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277181; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hnnziElivuQ/LCoNmnyRRYzwKafGd1VzRo+k4WxWNEA=; b=Zz5Mz7euqVmt7XORkaBlHGPy1k/atuJWJKdYE/ol1NG1b2dsF0yeJlI/cbm0Uxkz0ZqpZn pLewHnkD7xPEgvO0eWlj+ZwU0Gqm2DsLuhAD54EVH+gmRtJ+lvmB6Qm5PKalMHH1WgMkBw 5Z2h3LBBZWTLBVkhIT9tXB/VOGV2AtU= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 07/14] mm: memcg: rename memcg_check_events() Date: Mon, 24 Jun 2024 17:58:59 -0700 Message-ID: <20240625005906.106920-8-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: EE743C0006 X-Stat-Signature: s3z1k7h4fk7phm5988fwoydoxn5rsckw X-Rspam-User: X-HE-Tag: 1719277182-939321 X-HE-Meta: U2FsdGVkX1/xJMXGGgWik+ZJRU6/plzFeVumurvjB9ueFtqxSho3Nx0clqOBJuKWmmN8tS7qNYBqLqBJHeiiz+JaflvfddKPpCgLC0a3o5Z2ACQBE+zofRl4Qr/81xmxqmCKoyx3JxTE3RgJiVCXVKR3nk/cZc5YMTBsk/W5Ty9ljhpwTF0SLon4LuDd1zwk5FahekILhF8abUcueguCzDS4nJXSht+JADCQZV81ENsFPPXDd/SnULmwijWjoEHzd7uloV9ysHbcIeCfqO0JlVwaFxnkf4aIuHNgtGWCSENrVedNCzKvpFewhYjuBX6f+Hi6NpTFcZD8Q5vXVHtpcs9eyu3hD2H8hHnce4ydpPek3IlW6OYGjOrXUrnqxp95mjekb2GPBz8YOPD8VHThoyX9/FVb1Z/ErWN2m1mcisbHgXShdENqNYQEPfDHsuNxjMkN180WzgUbvt7GiPswoVqrjMj24ARs1V8zm9g9ZORlHoUJwMEMIz5dReHPQkOlsQWGOWxfnf5+NXgG0ppbOrWRb2tdNr0B+YA7Zts7t2zG5VXaALq4rseIMxkLm6v77RG4wc5TPnd9WpnFhHnXmriXHW/uAEJKJuVeb0DxDlVXZchUfuSZRQPwpI+jR6lxhGQMejjMU+dqwQK5TIas5WMVuy1IM5IJQdnwP5xaxgVrgekMnEB49f5i6CvXbzZXDVOBztNQYlLgEXuWQuw2DrZwBCA5fcGpjjNVc7A9p09Cn7+e/Z8VXgmQhGM75fSXS59mKIwkMp3LYJhL50mHCQX7n5JiMYXKjVBsm3ypKYXxQG5r+t37V84WH2JbXoGOa/0+XLfm7qU6+tAGWV0QE+8Tv7Q0URg7NKPHtTqrtp7+9sP/EdhWxwZG31nNnsrN8xXkGwsJ5dcfCF4ymJhStygFzivOVEzdwYHEA5JXGefKbE7kMAyQ6zcNFc8aIxO97pBxi7Ne4IrOSSL5K9e Z5lbZsS+ UmJt+tq3fl9B5vSZF16EN40lk/iIo3OQq4IKI8DinAc5rjDXYA+iNrrxDpB/wrRzAKRnJ/TsOgp5X6jWvyiLBgI5AvHCmuJh0SyJxhlmQyPD3VH7xV8pq0VS6UwrsGWniF0cZXYwD+4nstbIzOcnJEeZjBXnZbbPAoAMH X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Rename memcg_check_events() into memcg1_check_events() for consistency with other cgroup v1-specific functions. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko --- mm/memcontrol-v1.c | 6 +++--- mm/memcontrol-v1.h | 2 +- mm/memcontrol.c | 8 ++++---- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 4b2290ceace6..d7b5c4c14732 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -835,9 +835,9 @@ static int mem_cgroup_move_account(struct folio *folio, local_irq_disable(); mem_cgroup_charge_statistics(to, nr_pages); - memcg_check_events(to, nid); + memcg1_check_events(to, nid); mem_cgroup_charge_statistics(from, -nr_pages); - memcg_check_events(from, nid); + memcg1_check_events(from, nid); local_irq_enable(); out: return ret; @@ -1424,7 +1424,7 @@ static void mem_cgroup_threshold(struct mem_cgroup *memcg) * Check events in order. * */ -void memcg_check_events(struct mem_cgroup *memcg, int nid) +void memcg1_check_events(struct mem_cgroup *memcg, int nid) { if (IS_ENABLED(CONFIG_PREEMPT_RT)) return; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 524a2c76ffc9..ef1b7037cbdc 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -12,7 +12,7 @@ static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) } void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages); -void memcg_check_events(struct mem_cgroup *memcg, int nid); +void memcg1_check_events(struct mem_cgroup *memcg, int nid); void memcg_oom_recover(struct mem_cgroup *memcg); int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, unsigned int nr_pages); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index bd4b26a73596..92fb72bbd494 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2632,7 +2632,7 @@ void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg) local_irq_disable(); mem_cgroup_charge_statistics(memcg, folio_nr_pages(folio)); - memcg_check_events(memcg, folio_nid(folio)); + memcg1_check_events(memcg, folio_nid(folio)); local_irq_enable(); } @@ -5697,7 +5697,7 @@ static void uncharge_batch(const struct uncharge_gather *ug) local_irq_save(flags); __count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout); __this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_memory); - memcg_check_events(ug->memcg, ug->nid); + memcg1_check_events(ug->memcg, ug->nid); local_irq_restore(flags); /* drop reference from uncharge_folio */ @@ -5836,7 +5836,7 @@ void mem_cgroup_replace_folio(struct folio *old, struct folio *new) local_irq_save(flags); mem_cgroup_charge_statistics(memcg, nr_pages); - memcg_check_events(memcg, folio_nid(new)); + memcg1_check_events(memcg, folio_nid(new)); local_irq_restore(flags); } @@ -6104,7 +6104,7 @@ void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry) memcg_stats_lock(); mem_cgroup_charge_statistics(memcg, -nr_entries); memcg_stats_unlock(); - memcg_check_events(memcg, folio_nid(folio)); + memcg1_check_events(memcg, folio_nid(folio)); css_put(&memcg->css); } From patchwork Tue Jun 25 00:59:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710402 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E618C2BD09 for ; Tue, 25 Jun 2024 00:59:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DBCC86B0359; Mon, 24 Jun 2024 20:59:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D42C06B035A; Mon, 24 Jun 2024 20:59:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B22786B035E; Mon, 24 Jun 2024 20:59:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 8B2E06B0359 for ; Mon, 24 Jun 2024 20:59:47 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 29FA6121520 for ; Tue, 25 Jun 2024 00:59:47 +0000 (UTC) X-FDA: 82267603614.10.D645D9E Received: from out-172.mta0.migadu.com (out-172.mta0.migadu.com [91.218.175.172]) by imf16.hostedemail.com (Postfix) with ESMTP id EF778180012 for ; Tue, 25 Jun 2024 00:59:44 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=m8+KDAZP; spf=pass (imf16.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.172 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277177; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=gDpuG7ZRLEX7/v6od4u5N3qPX9lXtC5VgDycpruJVCk=; b=e/J0DAgaqOADJs3lOLrW4qwmGfTGFWZcQ9ABSCUpE0k6I5Djuf08BPN8oWpboqkfrcq6SQ Tf3SlMCf6Mxf46vLC7eTWj+ejfmNEQYQGyCuuqrhQ/bNtIKy3JVP+2Zr99vuCz6iYCzqmL C8m5EQHDsWl6+R+HLCLGZDzrmQlhiBA= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=m8+KDAZP; spf=pass (imf16.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.172 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277177; a=rsa-sha256; cv=none; b=paUTBmSSrnkDYr+qawQmuLimMTujyuGJyD5S3s5PyN1mhsEmrjwT0BWF2ufsCSAl9EpmGz cgstAjqlrX6ajIp/SXCDOdyEGKJQsE1YdF7MiX/LHHXYBukPUfSwG5Li5vEgduhkEGnkuZ 7QRnTpi8yD2RNX4l0oWu/1ponh4YGX4= X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277183; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gDpuG7ZRLEX7/v6od4u5N3qPX9lXtC5VgDycpruJVCk=; b=m8+KDAZPBXlAxzuey8qbrsfLxOAcJ26BTSgsLqwb8PF3wE0mB/XAoZ0f8zjH/qfkiv9YU2 QpNerO2d/+fK0XMGbpKDloXTC34+z0o4ohNH/pSLyRw/C/HsCR4F2/QiYTAlTujw4orXLN zxSDNGBTrTL2ylcvCaGz0tNPRfr3oS8= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 08/14] mm: memcg: move cgroup v1 oom handling code into memcontrol-v1.c Date: Mon, 24 Jun 2024 17:59:00 -0700 Message-ID: <20240625005906.106920-9-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: EF778180012 X-Stat-Signature: md8ex6qxric7jon5oz7j1ot5qpzwypxi X-Rspam-User: X-HE-Tag: 1719277184-387994 X-HE-Meta: U2FsdGVkX1+/Kt71XbjbRZisqGiGOtNYsEYldg+dJa0uLLspLucUzvVmuJuEv9FgptJhj3VEgvcXq+WJYX5TZ+/fRhJOResyjvO2uiuGGkh1mol36oAsz0k/Stlkg0jO8+i+flVQ0lpBIAFt/QCKt2LttvfhS4gx87mV1du80zI+rIaotM0g+o+IgJmb5OVnD5h4FDbg+gkbGrK6AkLau6NLc1S5RrNdaVDmENNXQ3dKwIL059KyLoN8xl3DS9yewAH5WOWegvPygP+2t+vQVDPP1KTFUeYIyb+WoCdZ6PKFsOtvmnaGgJJs+Vxeassm1LbGsxR1FaiY241u/V/80lNas6tOktQhOj2QeACfWdp4vJmRZCaIX6wC/gp6/s3DW5oqPFRzKrmmJGob0n/T1YpJu9JnmEylNW5t0MfvIpYMateeVjVWL2xRgadu6VBxpRLyA2JLBdS8h6/yObRyIFl+zEy52OLPT5NdLclhYQkkDj5cJacWQCBNxVw1eNd1zbiQi4oxDezpstDRYjhQ4AsSnkf7o2mItbbQrHz+nXAme4suxZ7tkyree9NOUkIgsYfTUsAT2G9k02/6ugMq/rPG2IDoRP+ncReuDpTBch0nv6uliFPuOTQ3h7oLBN6fbVeyAuzKelA+E5eu4IT0ngO0m6CQPYj5mQu1RsMLnk52dt+4tas6BJ7FiI6ML1NW1o1pEZspuClxkcxHc+tOVte5zjGHGp5XEtqhD2NbPYbwjuQH7J9CpIDKwzajpooVIOrbYNve0B1/OkoMVpYY4ee7c0Kc8R8Qlhd9tUzKJBtfNZiKOC4VUfgHo7sSpWwyYU68zNmAvRjU8yAKHQPyfVjpn9Rab9qxQxFPJkQrHFLaYw1Pubz0wrwpxWFu6cMX+Ciz+aZH1HZZPk+toBnIMFy4vlmIlXF60zC4JiBzBo9Vz5TENG5LyvFkAiccUvR1OMDMfh3D8gjN2v+RU5V ljNDyf1v 2H3TmUjRXB+bhrHP72ftlwLau/H0AycENpEhbFASyMhjBWGXB8GJ2+J7/Lv9g2vJMbDnzp87p13jkY7UdLpt5JJfYlu8V4+qZyornT4wcnDFr4e0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Cgroup v1 supports a complicated OOM handling in userspace mechanism, which is not supported by cgroup v2. Let's move the corresponding code into memcontrol-v1.c. Aside from mechanical code movement this patch introduces two new functions: memcg1_oom_prepare() and memcg1_oom_finish(). Those are implementing cgroup v1-specific parts of the common memcg OOM handling path. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko --- mm/memcontrol-v1.c | 229 ++++++++++++++++++++++++++++++++++++++++++++- mm/memcontrol-v1.h | 3 +- mm/memcontrol.c | 216 +----------------------------------------- 3 files changed, 231 insertions(+), 217 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index d7b5c4c14732..253d49d5fb12 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -110,7 +110,13 @@ struct mem_cgroup_event { struct work_struct remove; }; -extern spinlock_t memcg_oom_lock; +#ifdef CONFIG_LOCKDEP +static struct lockdep_map memcg_oom_lock_dep_map = { + .name = "memcg_oom_lock", +}; +#endif + +DEFINE_SPINLOCK(memcg_oom_lock); static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz, struct mem_cgroup_tree_per_node *mctz, @@ -1469,7 +1475,7 @@ static int mem_cgroup_oom_notify_cb(struct mem_cgroup *memcg) return 0; } -void mem_cgroup_oom_notify(struct mem_cgroup *memcg) +static void mem_cgroup_oom_notify(struct mem_cgroup *memcg) { struct mem_cgroup *iter; @@ -1959,6 +1965,225 @@ void memcg1_css_offline(struct mem_cgroup *memcg) spin_unlock_irq(&memcg->event_list_lock); } +/* + * Check OOM-Killer is already running under our hierarchy. + * If someone is running, return false. + */ +static bool mem_cgroup_oom_trylock(struct mem_cgroup *memcg) +{ + struct mem_cgroup *iter, *failed = NULL; + + spin_lock(&memcg_oom_lock); + + for_each_mem_cgroup_tree(iter, memcg) { + if (iter->oom_lock) { + /* + * this subtree of our hierarchy is already locked + * so we cannot give a lock. + */ + failed = iter; + mem_cgroup_iter_break(memcg, iter); + break; + } else + iter->oom_lock = true; + } + + if (failed) { + /* + * OK, we failed to lock the whole subtree so we have + * to clean up what we set up to the failing subtree + */ + for_each_mem_cgroup_tree(iter, memcg) { + if (iter == failed) { + mem_cgroup_iter_break(memcg, iter); + break; + } + iter->oom_lock = false; + } + } else + mutex_acquire(&memcg_oom_lock_dep_map, 0, 1, _RET_IP_); + + spin_unlock(&memcg_oom_lock); + + return !failed; +} + +static void mem_cgroup_oom_unlock(struct mem_cgroup *memcg) +{ + struct mem_cgroup *iter; + + spin_lock(&memcg_oom_lock); + mutex_release(&memcg_oom_lock_dep_map, _RET_IP_); + for_each_mem_cgroup_tree(iter, memcg) + iter->oom_lock = false; + spin_unlock(&memcg_oom_lock); +} + +static void mem_cgroup_mark_under_oom(struct mem_cgroup *memcg) +{ + struct mem_cgroup *iter; + + spin_lock(&memcg_oom_lock); + for_each_mem_cgroup_tree(iter, memcg) + iter->under_oom++; + spin_unlock(&memcg_oom_lock); +} + +static void mem_cgroup_unmark_under_oom(struct mem_cgroup *memcg) +{ + struct mem_cgroup *iter; + + /* + * Be careful about under_oom underflows because a child memcg + * could have been added after mem_cgroup_mark_under_oom. + */ + spin_lock(&memcg_oom_lock); + for_each_mem_cgroup_tree(iter, memcg) + if (iter->under_oom > 0) + iter->under_oom--; + spin_unlock(&memcg_oom_lock); +} + +static DECLARE_WAIT_QUEUE_HEAD(memcg_oom_waitq); + +struct oom_wait_info { + struct mem_cgroup *memcg; + wait_queue_entry_t wait; +}; + +static int memcg_oom_wake_function(wait_queue_entry_t *wait, + unsigned mode, int sync, void *arg) +{ + struct mem_cgroup *wake_memcg = (struct mem_cgroup *)arg; + struct mem_cgroup *oom_wait_memcg; + struct oom_wait_info *oom_wait_info; + + oom_wait_info = container_of(wait, struct oom_wait_info, wait); + oom_wait_memcg = oom_wait_info->memcg; + + if (!mem_cgroup_is_descendant(wake_memcg, oom_wait_memcg) && + !mem_cgroup_is_descendant(oom_wait_memcg, wake_memcg)) + return 0; + return autoremove_wake_function(wait, mode, sync, arg); +} + +void memcg_oom_recover(struct mem_cgroup *memcg) +{ + /* + * For the following lockless ->under_oom test, the only required + * guarantee is that it must see the state asserted by an OOM when + * this function is called as a result of userland actions + * triggered by the notification of the OOM. This is trivially + * achieved by invoking mem_cgroup_mark_under_oom() before + * triggering notification. + */ + if (memcg && memcg->under_oom) + __wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg); +} + +/** + * mem_cgroup_oom_synchronize - complete memcg OOM handling + * @handle: actually kill/wait or just clean up the OOM state + * + * This has to be called at the end of a page fault if the memcg OOM + * handler was enabled. + * + * Memcg supports userspace OOM handling where failed allocations must + * sleep on a waitqueue until the userspace task resolves the + * situation. Sleeping directly in the charge context with all kinds + * of locks held is not a good idea, instead we remember an OOM state + * in the task and mem_cgroup_oom_synchronize() has to be called at + * the end of the page fault to complete the OOM handling. + * + * Returns %true if an ongoing memcg OOM situation was detected and + * completed, %false otherwise. + */ +bool mem_cgroup_oom_synchronize(bool handle) +{ + struct mem_cgroup *memcg = current->memcg_in_oom; + struct oom_wait_info owait; + bool locked; + + /* OOM is global, do not handle */ + if (!memcg) + return false; + + if (!handle) + goto cleanup; + + owait.memcg = memcg; + owait.wait.flags = 0; + owait.wait.func = memcg_oom_wake_function; + owait.wait.private = current; + INIT_LIST_HEAD(&owait.wait.entry); + + prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE); + mem_cgroup_mark_under_oom(memcg); + + locked = mem_cgroup_oom_trylock(memcg); + + if (locked) + mem_cgroup_oom_notify(memcg); + + schedule(); + mem_cgroup_unmark_under_oom(memcg); + finish_wait(&memcg_oom_waitq, &owait.wait); + + if (locked) + mem_cgroup_oom_unlock(memcg); +cleanup: + current->memcg_in_oom = NULL; + css_put(&memcg->css); + return true; +} + + +bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked) +{ + /* + * We are in the middle of the charge context here, so we + * don't want to block when potentially sitting on a callstack + * that holds all kinds of filesystem and mm locks. + * + * cgroup1 allows disabling the OOM killer and waiting for outside + * handling until the charge can succeed; remember the context and put + * the task to sleep at the end of the page fault when all locks are + * released. + * + * On the other hand, in-kernel OOM killer allows for an async victim + * memory reclaim (oom_reaper) and that means that we are not solely + * relying on the oom victim to make a forward progress and we can + * invoke the oom killer here. + * + * Please note that mem_cgroup_out_of_memory might fail to find a + * victim and then we have to bail out from the charge path. + */ + if (READ_ONCE(memcg->oom_kill_disable)) { + if (current->in_user_fault) { + css_get(&memcg->css); + current->memcg_in_oom = memcg; + } + return false; + } + + mem_cgroup_mark_under_oom(memcg); + + *locked = mem_cgroup_oom_trylock(memcg); + + if (*locked) + mem_cgroup_oom_notify(memcg); + + mem_cgroup_unmark_under_oom(memcg); + + return true; +} + +void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked) +{ + if (locked) + mem_cgroup_oom_unlock(memcg); +} + static int __init memcg1_init(void) { int node; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index ef1b7037cbdc..3de956b2422f 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -87,9 +87,10 @@ enum res_type { bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, enum mem_cgroup_events_target target); unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap); -void mem_cgroup_oom_notify(struct mem_cgroup *memcg); ssize_t memcg_write_event_control(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off); +bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked); +void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked); #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 92fb72bbd494..8abd364ac837 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1616,130 +1616,6 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, return ret; } -#ifdef CONFIG_LOCKDEP -static struct lockdep_map memcg_oom_lock_dep_map = { - .name = "memcg_oom_lock", -}; -#endif - -DEFINE_SPINLOCK(memcg_oom_lock); - -/* - * Check OOM-Killer is already running under our hierarchy. - * If someone is running, return false. - */ -static bool mem_cgroup_oom_trylock(struct mem_cgroup *memcg) -{ - struct mem_cgroup *iter, *failed = NULL; - - spin_lock(&memcg_oom_lock); - - for_each_mem_cgroup_tree(iter, memcg) { - if (iter->oom_lock) { - /* - * this subtree of our hierarchy is already locked - * so we cannot give a lock. - */ - failed = iter; - mem_cgroup_iter_break(memcg, iter); - break; - } else - iter->oom_lock = true; - } - - if (failed) { - /* - * OK, we failed to lock the whole subtree so we have - * to clean up what we set up to the failing subtree - */ - for_each_mem_cgroup_tree(iter, memcg) { - if (iter == failed) { - mem_cgroup_iter_break(memcg, iter); - break; - } - iter->oom_lock = false; - } - } else - mutex_acquire(&memcg_oom_lock_dep_map, 0, 1, _RET_IP_); - - spin_unlock(&memcg_oom_lock); - - return !failed; -} - -static void mem_cgroup_oom_unlock(struct mem_cgroup *memcg) -{ - struct mem_cgroup *iter; - - spin_lock(&memcg_oom_lock); - mutex_release(&memcg_oom_lock_dep_map, _RET_IP_); - for_each_mem_cgroup_tree(iter, memcg) - iter->oom_lock = false; - spin_unlock(&memcg_oom_lock); -} - -static void mem_cgroup_mark_under_oom(struct mem_cgroup *memcg) -{ - struct mem_cgroup *iter; - - spin_lock(&memcg_oom_lock); - for_each_mem_cgroup_tree(iter, memcg) - iter->under_oom++; - spin_unlock(&memcg_oom_lock); -} - -static void mem_cgroup_unmark_under_oom(struct mem_cgroup *memcg) -{ - struct mem_cgroup *iter; - - /* - * Be careful about under_oom underflows because a child memcg - * could have been added after mem_cgroup_mark_under_oom. - */ - spin_lock(&memcg_oom_lock); - for_each_mem_cgroup_tree(iter, memcg) - if (iter->under_oom > 0) - iter->under_oom--; - spin_unlock(&memcg_oom_lock); -} - -static DECLARE_WAIT_QUEUE_HEAD(memcg_oom_waitq); - -struct oom_wait_info { - struct mem_cgroup *memcg; - wait_queue_entry_t wait; -}; - -static int memcg_oom_wake_function(wait_queue_entry_t *wait, - unsigned mode, int sync, void *arg) -{ - struct mem_cgroup *wake_memcg = (struct mem_cgroup *)arg; - struct mem_cgroup *oom_wait_memcg; - struct oom_wait_info *oom_wait_info; - - oom_wait_info = container_of(wait, struct oom_wait_info, wait); - oom_wait_memcg = oom_wait_info->memcg; - - if (!mem_cgroup_is_descendant(wake_memcg, oom_wait_memcg) && - !mem_cgroup_is_descendant(oom_wait_memcg, wake_memcg)) - return 0; - return autoremove_wake_function(wait, mode, sync, arg); -} - -void memcg_oom_recover(struct mem_cgroup *memcg) -{ - /* - * For the following lockless ->under_oom test, the only required - * guarantee is that it must see the state asserted by an OOM when - * this function is called as a result of userland actions - * triggered by the notification of the OOM. This is trivially - * achieved by invoking mem_cgroup_mark_under_oom() before - * triggering notification. - */ - if (memcg && memcg->under_oom) - __wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg); -} - /* * Returns true if successfully killed one or more processes. Though in some * corner cases it can return true even without killing any process. @@ -1753,104 +1629,16 @@ static bool mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order) memcg_memory_event(memcg, MEMCG_OOM); - /* - * We are in the middle of the charge context here, so we - * don't want to block when potentially sitting on a callstack - * that holds all kinds of filesystem and mm locks. - * - * cgroup1 allows disabling the OOM killer and waiting for outside - * handling until the charge can succeed; remember the context and put - * the task to sleep at the end of the page fault when all locks are - * released. - * - * On the other hand, in-kernel OOM killer allows for an async victim - * memory reclaim (oom_reaper) and that means that we are not solely - * relying on the oom victim to make a forward progress and we can - * invoke the oom killer here. - * - * Please note that mem_cgroup_out_of_memory might fail to find a - * victim and then we have to bail out from the charge path. - */ - if (READ_ONCE(memcg->oom_kill_disable)) { - if (current->in_user_fault) { - css_get(&memcg->css); - current->memcg_in_oom = memcg; - } + if (!memcg1_oom_prepare(memcg, &locked)) return false; - } - - mem_cgroup_mark_under_oom(memcg); - locked = mem_cgroup_oom_trylock(memcg); - - if (locked) - mem_cgroup_oom_notify(memcg); - - mem_cgroup_unmark_under_oom(memcg); ret = mem_cgroup_out_of_memory(memcg, mask, order); - if (locked) - mem_cgroup_oom_unlock(memcg); + memcg1_oom_finish(memcg, locked); return ret; } -/** - * mem_cgroup_oom_synchronize - complete memcg OOM handling - * @handle: actually kill/wait or just clean up the OOM state - * - * This has to be called at the end of a page fault if the memcg OOM - * handler was enabled. - * - * Memcg supports userspace OOM handling where failed allocations must - * sleep on a waitqueue until the userspace task resolves the - * situation. Sleeping directly in the charge context with all kinds - * of locks held is not a good idea, instead we remember an OOM state - * in the task and mem_cgroup_oom_synchronize() has to be called at - * the end of the page fault to complete the OOM handling. - * - * Returns %true if an ongoing memcg OOM situation was detected and - * completed, %false otherwise. - */ -bool mem_cgroup_oom_synchronize(bool handle) -{ - struct mem_cgroup *memcg = current->memcg_in_oom; - struct oom_wait_info owait; - bool locked; - - /* OOM is global, do not handle */ - if (!memcg) - return false; - - if (!handle) - goto cleanup; - - owait.memcg = memcg; - owait.wait.flags = 0; - owait.wait.func = memcg_oom_wake_function; - owait.wait.private = current; - INIT_LIST_HEAD(&owait.wait.entry); - - prepare_to_wait(&memcg_oom_waitq, &owait.wait, TASK_KILLABLE); - mem_cgroup_mark_under_oom(memcg); - - locked = mem_cgroup_oom_trylock(memcg); - - if (locked) - mem_cgroup_oom_notify(memcg); - - schedule(); - mem_cgroup_unmark_under_oom(memcg); - finish_wait(&memcg_oom_waitq, &owait.wait); - - if (locked) - mem_cgroup_oom_unlock(memcg); -cleanup: - current->memcg_in_oom = NULL; - css_put(&memcg->css); - return true; -} - /** * mem_cgroup_get_oom_group - get a memory cgroup to clean up after OOM * @victim: task to be killed by the OOM killer From patchwork Tue Jun 25 00:59:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710403 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 134D5C2BD09 for ; Tue, 25 Jun 2024 01:00:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A3A8C6B035A; Mon, 24 Jun 2024 20:59:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 996EB6B035E; Mon, 24 Jun 2024 20:59:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 79C876B035F; Mon, 24 Jun 2024 20:59:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 5BFB76B035A for ; Mon, 24 Jun 2024 20:59:49 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 15FBA1A1586 for ; Tue, 25 Jun 2024 00:59:49 +0000 (UTC) X-FDA: 82267603698.08.A3D371F Received: from out-173.mta0.migadu.com (out-173.mta0.migadu.com [91.218.175.173]) by imf04.hostedemail.com (Postfix) with ESMTP id E465440013 for ; Tue, 25 Jun 2024 00:59:46 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=eV89MCas; spf=pass (imf04.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.173 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277176; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=fbzH39LFdgO7g4Zh8QO5WwcNo4wg+o7OIMv0JFj+P7c=; b=Ls5TxD5m02xPkzJT0JSQDa7sonrjfnsNz/Mi6EJROMO54B+ChGxLA3m/ba2kYbl1Oq7sDQ loZk4FFyhMB1z4QkM/bdG4fmbAm2Lf8qIfzOxtOwyTAgE76zEgQzhHKgINGy5059VcvgDs jBNYz1IM6p0hh2AOJdXbRs/RkZsp0kI= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=eV89MCas; spf=pass (imf04.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.173 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277176; a=rsa-sha256; cv=none; b=0Nmqmi6HEM1xH5kfyb0VWzXiryT4fe0ek5EiBTmvosTI6UhsrJVoo6rsSsE7Y11v33TFxI cqyRbejUYOO5IJcDCYvnJ7yhM6SSUzT2pPhhQwTSBF25l+XxCIXR8Pw0z1qPIHhX4DC3Jd vYbCQINLTvUEzpqgmiRm02qF2yUb2PI= X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277185; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fbzH39LFdgO7g4Zh8QO5WwcNo4wg+o7OIMv0JFj+P7c=; b=eV89MCast7EgJhdesN7qSvQtr85YMmOO0k2w6ijFUtydlNQ78AauHRfRhUjsPEeOfbC5Gx dtcK/V9nKa9SNmF/5FFvf0Xxi6O7J2yw96sE17lemuK/TV1CjppWoMQQb/mE6M7L50bNMj 8VHqifkS/V/BI3Pm3oEoZbviAukkS9Q= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 09/14] mm: memcg: rename memcg_oom_recover() Date: Mon, 24 Jun 2024 17:59:01 -0700 Message-ID: <20240625005906.106920-10-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Stat-Signature: hh4ughr9wfdmx7ypjfgs8bte53jqea6m X-Rspam-User: X-Rspamd-Queue-Id: E465440013 X-Rspamd-Server: rspam02 X-HE-Tag: 1719277186-466499 X-HE-Meta: U2FsdGVkX18GMwXfASD3TzKdT2bKtRtS4sFjZc2tzEOk+BgjKAQ4GPKwor+j6CNsPLVxX/9t7ui+LYEHtDKdAzD5DwtK0kF5dHce6WIRJNYhHZZ1IlLL1KP6cpVC4482S6D+eSGtBPOc0tg/1TI28Tpq0JfC2MrIXF3i4VITkwg+ZxGOyq3Um45npsyL/C1dXx/00aG0IFmWzkGquduWgGjW4I70iknX4Hf07Quje6jxZHjrqGFM1i+luhOZrDszvn4M6UoKkTIQ52eN3rWF+oH+8Ztr9VAbZQfrbYxUOY8dJzyYCjkLa4epKqvwufqVSeDUUBUOm2B6YxFelPFo2C2sGzXZTUJLteihauRhRWdEm+LrrkX5QITmaI9gak0attfckuDR+K296VocR+MiVP7RFfvrvSlw302X1V3alha0gJcfvIMAw8MT6vSZSskm/7Y6AJIsYhSITDTF5oA3jFsU0x8JYMuLfYOqP5cHp2Og7BxRNiNQ9wO9BIrcEUAGtc0EV7MWgdY6YLVZFIWjK5bN4Vg7s63PW5VaQKbss9z+/zHGFHgAvPH6Lemau3rXMaBMjw8+rahYk4Rts00qnIT9V8D7a28nGefHajxG8Z5UJOv+7XHuzc6mvrz6OwMLt2JP6y1g6/bi09ZxqVgz7cgKNJ52Z6uL0p77+fVFEvSRl68FO703Njw6I37BVsNCC27nSiOAh+xR3KG+dLaYQz/+HyvtBjaciGgKpCkglBTMUYbZqvEg8hiUR7zINeoZ3P7zGzDsOnnPzFhs4EeoNEC2teIG86XqUWOc37b9d2lL7ETOVooeXVEF73vweoe51Wh+NQOMNe12Ujak36NJDlFgLJrZtRBUY7qnAGkvQZiHc4xkDfpgiWDHAfAwXfO21wof58uyDIWozdmFv92W+NkEyufvDlrRzCenKPnZ1s1WyCWdoajZGUwx7sxZkUMUTupnrEe4e3wGf/wdNP7 T2q7BNNV +vGpTY1KC4B3RHUNSUjlN4+99NAqkJEK2IR3YtPzCniMkrfByLEaf8uo3a34rJwePrKOZyqmzZQFV71Uu+tdckQwO/ZBhPraPbjNGWcQbJ1V6u/t8V3geuqzBbqqy0BC3MB5J6e6NJsyrDaQkiSfGCpNQwRz2ntXI+GI7 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Rename memcg_oom_recover() into memcg1_oom_recover() for consistency with other memory cgroup v1-related functions. Move the declaration in mm/memcontrol-v1.h to be nearby other memcg v1 oom handling functions. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko --- mm/memcontrol-v1.c | 6 +++--- mm/memcontrol-v1.h | 2 +- mm/memcontrol.c | 6 +++--- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 253d49d5fb12..1d5608ee1606 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -1090,8 +1090,8 @@ static void __mem_cgroup_clear_mc(void) mc.moved_swap = 0; } - memcg_oom_recover(from); - memcg_oom_recover(to); + memcg1_oom_recover(from); + memcg1_oom_recover(to); wake_up_all(&mc.waitq); } @@ -2067,7 +2067,7 @@ static int memcg_oom_wake_function(wait_queue_entry_t *wait, return autoremove_wake_function(wait, mode, sync, arg); } -void memcg_oom_recover(struct mem_cgroup *memcg) +void memcg1_oom_recover(struct mem_cgroup *memcg) { /* * For the following lockless ->under_oom test, the only required diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 3de956b2422f..972c493a8ae3 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -13,7 +13,6 @@ static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages); void memcg1_check_events(struct mem_cgroup *memcg, int nid); -void memcg_oom_recover(struct mem_cgroup *memcg); int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, unsigned int nr_pages); @@ -92,5 +91,6 @@ ssize_t memcg_write_event_control(struct kernfs_open_file *of, bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked); void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked); +void memcg1_oom_recover(struct mem_cgroup *memcg); #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8abd364ac837..37e0af5b26f3 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3167,7 +3167,7 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg, } while (true); if (!ret && enlarge) - memcg_oom_recover(memcg); + memcg1_oom_recover(memcg); return ret; } @@ -3752,7 +3752,7 @@ static int mem_cgroup_oom_control_write(struct cgroup_subsys_state *css, WRITE_ONCE(memcg->oom_kill_disable, val); if (!val) - memcg_oom_recover(memcg); + memcg1_oom_recover(memcg); return 0; } @@ -5479,7 +5479,7 @@ static void uncharge_batch(const struct uncharge_gather *ug) page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory); if (ug->nr_kmem) memcg_account_kmem(ug->memcg, -ug->nr_kmem); - memcg_oom_recover(ug->memcg); + memcg1_oom_recover(ug->memcg); } local_irq_save(flags); From patchwork Tue Jun 25 00:59:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710404 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BA722C2BD09 for ; Tue, 25 Jun 2024 01:00:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 669896B00A7; Mon, 24 Jun 2024 20:59:52 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5EFE16B00E9; Mon, 24 Jun 2024 20:59:52 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3F3596B035F; Mon, 24 Jun 2024 20:59:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 115686B00A7 for ; Mon, 24 Jun 2024 20:59:52 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id CAC761C2782 for ; Tue, 25 Jun 2024 00:59:51 +0000 (UTC) X-FDA: 82267603782.03.5231302 Received: from out-187.mta0.migadu.com (out-187.mta0.migadu.com [91.218.175.187]) by imf18.hostedemail.com (Postfix) with ESMTP id 748431C0005 for ; Tue, 25 Jun 2024 00:59:49 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=CS6G31DD; spf=pass (imf18.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.187 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277175; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=GPeyaNOSxBqGrmhIQ0Yy2cBzUve/uQScgREmCFFE80A=; b=iivNhtH4gMSkwmJEiUNVdHBVRHp3ZkfuBOpo+h00PZFmth3UKDHV0OzomTXIX9ShDciiZr ao+pJQp8J61ZwoXNqv0Ufx484RpqV9RvoXKCKqTsK4lwXm9VscRrzQH65YVYQ4K4TmjUkp a1hQL2RphXC+kf4lVhDMI2/xAOawQOA= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277175; a=rsa-sha256; cv=none; b=fQ7lKiMIbCX9TzzJXMbCpTEHDrVwAVENkolZX2ddqiRGzBRoKVs0WxmGFBFsgn5FLa8Ksi RySVmWpWAuvEQYFeqBBeYPYRundl6deKN/Lrllj9581JH5jtU3jUgXx9Xmnnz0Oz4+shcn T9XZku6gRouW8of3lb0JZB58ye7OObQ= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=CS6G31DD; spf=pass (imf18.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.187 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277188; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GPeyaNOSxBqGrmhIQ0Yy2cBzUve/uQScgREmCFFE80A=; b=CS6G31DDySi/Xt0aJx00BATVEUW2DIQKSsJ0IKZnXL6+TmgPsBUi5GK+Ap2FWWoXcGDcAR Q4KTj0NOAajPbPkhkSn27NKhCZY26SsS3jEniRpROz7KSL7rrGiIDH8W+6CcJ4LUZoUIrC cpLmvnsleHo/2OgjosFJ77JwOftICeI= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 10/14] mm: memcg: move cgroup v1 interface files to memcontrol-v1.c Date: Mon, 24 Jun 2024 17:59:02 -0700 Message-ID: <20240625005906.106920-11-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 748431C0005 X-Stat-Signature: ie3aqcd3faatnomcuuej73uhkphnrfte X-HE-Tag: 1719277189-156521 X-HE-Meta: U2FsdGVkX1/pvICdb6R+X/NUCiStAN2N0oKSwxaTsHtdNYtdSbBLeEE9I4nX7rrmoMZR/Hbg3dFC2gC1oYBhRkZ0cqK6BwoHyR+i+4oYqBfq2GNYPHprfvybn1D+6tDgjpNX/os4uIWUBJ9lKsMJg8RbWArxWgu6tyNwNe83mEvRtlpwS74hLutnuh3/KPi6Phcx4rwG0BlwsxXQDQYtyTRC9KyRXenZswwbW9c4LOzwM/qpW60Jando2SmwPWXwBEGYKbHhWAJrWYAFM6jhwfNziW3EMTLnpZEukov/78KOeKxe8rnCOZLma1VfpTIlaD42PzfqTLcWphTeelMHcXQs8NlPjHr5A+I3igYCZFhpBuaR4TVcPLZ4mZ3E5vhVCzFrGSQaMbNooQIyro+nIHevMkN3PMqI8T18j/SiqwF01ZAlm9tTgmRLkoL+c6T92IuUszfBXT1/mjuKOclb4W9QQUG6+op+ofFhp4ujWMheiWZ9FzDnINkjJW950zB9v2+vYeNRadE4IabnjcBPqd0f21Tv/6s0TE8LKiqd7UFT+y/mZnTGMB4gAniDK9/mL8HHq9aUubOgAl8wWH/P2rUK2tzD1Re96BfjEWL1999O6WX/L6e/1NnxV4nLsb4V/odZxzbvDAp348jGU4wBM9RUPP01XwMvjjPSmLvzPSBp+NCkrPtmjn8QsJtLvW8SuTWEp+iPDQaUUOr0oLpdpnMxjjLcr0vshlpetwm0GQ3v8JM3DoswJGIJEJp9kJ6YDEGiJRkpd2+RmKlqXnl3MIe6aAhvv+OvXYb1owCeJ2b1hTrjrgzl92iZTGCWfQkOS5yThroIDL8hOCFg1FZLXCNvXuVezyKCVcz3XNoV7l2+Sr7Yfn4/asTTmcZX8KvnJvQKNiK39dXqxHmHg/yR9OWJk10CWJ8O79e02jrogRG3BocfMssjHXz5wRzxyypu+8uQKRIMsmjwJzR3ps+ mAZLBoQN J2bxP1+EntPBOgVoYwsO0oAK+X5+NkdSGWidDAEmwJcdxG9vkXs9ErgkyKzvcwSiJnRV8L8CObU3GCbhkWf+MJtrOuSYrDVdSDEO286K6aDHrjAzk/b9Pks+B6IF99CTMJLhsm6dFlUv7O3cIZbBXJYu+68utRqAFBHfAl5NP5ic3Tpg= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Move legacy cgroup v1 memory controller interfaces and corresponding code into memcontrol-v1.c. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko --- mm/memcontrol-v1.c | 739 ++++++++++++++++++++++++++++++++++++++++++++- mm/memcontrol-v1.h | 29 +- mm/memcontrol.c | 721 +------------------------------------------ 3 files changed, 767 insertions(+), 722 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 1d5608ee1606..1b7337d0170d 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -10,6 +10,7 @@ #include #include #include +#include #include "internal.h" #include "swap.h" @@ -110,6 +111,18 @@ struct mem_cgroup_event { struct work_struct remove; }; +#define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val)) +#define MEMFILE_TYPE(val) ((val) >> 16 & 0xffff) +#define MEMFILE_ATTR(val) ((val) & 0xffff) + +enum { + RES_USAGE, + RES_LIMIT, + RES_MAX_USAGE, + RES_FAILCNT, + RES_SOFT_LIMIT, +}; + #ifdef CONFIG_LOCKDEP static struct lockdep_map memcg_oom_lock_dep_map = { .name = "memcg_oom_lock", @@ -577,14 +590,14 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry, } #endif -u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, +static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, struct cftype *cft) { return mem_cgroup_from_css(css)->move_charge_at_immigrate; } #ifdef CONFIG_MMU -int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, +static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, struct cftype *cft, u64 val) { struct mem_cgroup *memcg = mem_cgroup_from_css(css); @@ -606,7 +619,7 @@ int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, return 0; } #else -int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, +static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, struct cftype *cft, u64 val) { return -ENOSYS; @@ -1803,8 +1816,8 @@ static void memcg_event_ptable_queue_proc(struct file *file, * Input must be in format ' '. * Interpretation of args is defined by control file implementation. */ -ssize_t memcg_write_event_control(struct kernfs_open_file *of, - char *buf, size_t nbytes, loff_t off) +static ssize_t memcg_write_event_control(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) { struct cgroup_subsys_state *css = of_css(of); struct mem_cgroup *memcg = mem_cgroup_from_css(css); @@ -2184,6 +2197,722 @@ void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked) mem_cgroup_oom_unlock(memcg); } +static DEFINE_MUTEX(memcg_max_mutex); + +static int mem_cgroup_resize_max(struct mem_cgroup *memcg, + unsigned long max, bool memsw) +{ + bool enlarge = false; + bool drained = false; + int ret; + bool limits_invariant; + struct page_counter *counter = memsw ? &memcg->memsw : &memcg->memory; + + do { + if (signal_pending(current)) { + ret = -EINTR; + break; + } + + mutex_lock(&memcg_max_mutex); + /* + * Make sure that the new limit (memsw or memory limit) doesn't + * break our basic invariant rule memory.max <= memsw.max. + */ + limits_invariant = memsw ? max >= READ_ONCE(memcg->memory.max) : + max <= memcg->memsw.max; + if (!limits_invariant) { + mutex_unlock(&memcg_max_mutex); + ret = -EINVAL; + break; + } + if (max > counter->max) + enlarge = true; + ret = page_counter_set_max(counter, max); + mutex_unlock(&memcg_max_mutex); + + if (!ret) + break; + + if (!drained) { + drain_all_stock(memcg); + drained = true; + continue; + } + + if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, + memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, NULL)) { + ret = -EBUSY; + break; + } + } while (true); + + if (!ret && enlarge) + memcg1_oom_recover(memcg); + + return ret; +} + +/* + * Reclaims as many pages from the given memcg as possible. + * + * Caller is responsible for holding css reference for memcg. + */ +static int mem_cgroup_force_empty(struct mem_cgroup *memcg) +{ + int nr_retries = MAX_RECLAIM_RETRIES; + + /* we call try-to-free pages for make this cgroup empty */ + lru_add_drain_all(); + + drain_all_stock(memcg); + + /* try to free all pages in this cgroup */ + while (nr_retries && page_counter_read(&memcg->memory)) { + if (signal_pending(current)) + return -EINTR; + + if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, + MEMCG_RECLAIM_MAY_SWAP, NULL)) + nr_retries--; + } + + return 0; +} + +static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, + loff_t off) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + + if (mem_cgroup_is_root(memcg)) + return -EINVAL; + return mem_cgroup_force_empty(memcg) ?: nbytes; +} + +static u64 mem_cgroup_hierarchy_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + return 1; +} + +static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + if (val == 1) + return 0; + + pr_warn_once("Non-hierarchical mode is deprecated. " + "Please report your usecase to linux-mm@kvack.org if you " + "depend on this functionality.\n"); + + return -EINVAL; +} + +static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + struct page_counter *counter; + + switch (MEMFILE_TYPE(cft->private)) { + case _MEM: + counter = &memcg->memory; + break; + case _MEMSWAP: + counter = &memcg->memsw; + break; + case _KMEM: + counter = &memcg->kmem; + break; + case _TCP: + counter = &memcg->tcpmem; + break; + default: + BUG(); + } + + switch (MEMFILE_ATTR(cft->private)) { + case RES_USAGE: + if (counter == &memcg->memory) + return (u64)mem_cgroup_usage(memcg, false) * PAGE_SIZE; + if (counter == &memcg->memsw) + return (u64)mem_cgroup_usage(memcg, true) * PAGE_SIZE; + return (u64)page_counter_read(counter) * PAGE_SIZE; + case RES_LIMIT: + return (u64)counter->max * PAGE_SIZE; + case RES_MAX_USAGE: + return (u64)counter->watermark * PAGE_SIZE; + case RES_FAILCNT: + return counter->failcnt; + case RES_SOFT_LIMIT: + return (u64)READ_ONCE(memcg->soft_limit) * PAGE_SIZE; + default: + BUG(); + } +} + +/* + * This function doesn't do anything useful. Its only job is to provide a read + * handler for a file so that cgroup_file_mode() will add read permissions. + */ +static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m, + __always_unused void *v) +{ + return -EINVAL; +} + +static int memcg_update_tcp_max(struct mem_cgroup *memcg, unsigned long max) +{ + int ret; + + mutex_lock(&memcg_max_mutex); + + ret = page_counter_set_max(&memcg->tcpmem, max); + if (ret) + goto out; + + if (!memcg->tcpmem_active) { + /* + * The active flag needs to be written after the static_key + * update. This is what guarantees that the socket activation + * function is the last one to run. See mem_cgroup_sk_alloc() + * for details, and note that we don't mark any socket as + * belonging to this memcg until that flag is up. + * + * We need to do this, because static_keys will span multiple + * sites, but we can't control their order. If we mark a socket + * as accounted, but the accounting functions are not patched in + * yet, we'll lose accounting. + * + * We never race with the readers in mem_cgroup_sk_alloc(), + * because when this value change, the code to process it is not + * patched in yet. + */ + static_branch_inc(&memcg_sockets_enabled_key); + memcg->tcpmem_active = true; + } +out: + mutex_unlock(&memcg_max_mutex); + return ret; +} + +/* + * The user of this function is... + * RES_LIMIT. + */ +static ssize_t mem_cgroup_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + unsigned long nr_pages; + int ret; + + buf = strstrip(buf); + ret = page_counter_memparse(buf, "-1", &nr_pages); + if (ret) + return ret; + + switch (MEMFILE_ATTR(of_cft(of)->private)) { + case RES_LIMIT: + if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */ + ret = -EINVAL; + break; + } + switch (MEMFILE_TYPE(of_cft(of)->private)) { + case _MEM: + ret = mem_cgroup_resize_max(memcg, nr_pages, false); + break; + case _MEMSWAP: + ret = mem_cgroup_resize_max(memcg, nr_pages, true); + break; + case _KMEM: + pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. " + "Writing any value to this file has no effect. " + "Please report your usecase to linux-mm@kvack.org if you " + "depend on this functionality.\n"); + ret = 0; + break; + case _TCP: + ret = memcg_update_tcp_max(memcg, nr_pages); + break; + } + break; + case RES_SOFT_LIMIT: + if (IS_ENABLED(CONFIG_PREEMPT_RT)) { + ret = -EOPNOTSUPP; + } else { + WRITE_ONCE(memcg->soft_limit, nr_pages); + ret = 0; + } + break; + } + return ret ?: nbytes; +} + +static ssize_t mem_cgroup_reset(struct kernfs_open_file *of, char *buf, + size_t nbytes, loff_t off) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + struct page_counter *counter; + + switch (MEMFILE_TYPE(of_cft(of)->private)) { + case _MEM: + counter = &memcg->memory; + break; + case _MEMSWAP: + counter = &memcg->memsw; + break; + case _KMEM: + counter = &memcg->kmem; + break; + case _TCP: + counter = &memcg->tcpmem; + break; + default: + BUG(); + } + + switch (MEMFILE_ATTR(of_cft(of)->private)) { + case RES_MAX_USAGE: + page_counter_reset_watermark(counter); + break; + case RES_FAILCNT: + counter->failcnt = 0; + break; + default: + BUG(); + } + + return nbytes; +} + +#ifdef CONFIG_NUMA + +#define LRU_ALL_FILE (BIT(LRU_INACTIVE_FILE) | BIT(LRU_ACTIVE_FILE)) +#define LRU_ALL_ANON (BIT(LRU_INACTIVE_ANON) | BIT(LRU_ACTIVE_ANON)) +#define LRU_ALL ((1 << NR_LRU_LISTS) - 1) + +/* static unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, */ +/* int nid, unsigned int lru_mask, bool tree) */ +/* { */ +/* struct lruvec *lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid)); */ +/* unsigned long nr = 0; */ +/* enum lru_list lru; */ + +/* VM_BUG_ON((unsigned)nid >= nr_node_ids); */ + +/* for_each_lru(lru) { */ +/* if (!(BIT(lru) & lru_mask)) */ +/* continue; */ +/* if (tree) */ +/* nr += lruvec_page_state(lruvec, NR_LRU_BASE + lru); */ +/* else */ +/* nr += lruvec_page_state_local(lruvec, NR_LRU_BASE + lru); */ +/* } */ +/* return nr; */ +/* } */ + +/* static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, */ +/* unsigned int lru_mask, */ +/* bool tree) */ +/* { */ +/* unsigned long nr = 0; */ +/* enum lru_list lru; */ + +/* for_each_lru(lru) { */ +/* if (!(BIT(lru) & lru_mask)) */ +/* continue; */ +/* if (tree) */ +/* nr += memcg_page_state(memcg, NR_LRU_BASE + lru); */ +/* else */ +/* nr += memcg_page_state_local(memcg, NR_LRU_BASE + lru); */ +/* } */ +/* return nr; */ +/* } */ + +static int memcg_numa_stat_show(struct seq_file *m, void *v) +{ + struct numa_stat { + const char *name; + unsigned int lru_mask; + }; + + static const struct numa_stat stats[] = { + { "total", LRU_ALL }, + { "file", LRU_ALL_FILE }, + { "anon", LRU_ALL_ANON }, + { "unevictable", BIT(LRU_UNEVICTABLE) }, + }; + const struct numa_stat *stat; + int nid; + struct mem_cgroup *memcg = mem_cgroup_from_seq(m); + + mem_cgroup_flush_stats(memcg); + + for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) { + seq_printf(m, "%s=%lu", stat->name, + mem_cgroup_nr_lru_pages(memcg, stat->lru_mask, + false)); + for_each_node_state(nid, N_MEMORY) + seq_printf(m, " N%d=%lu", nid, + mem_cgroup_node_nr_lru_pages(memcg, nid, + stat->lru_mask, false)); + seq_putc(m, '\n'); + } + + for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) { + + seq_printf(m, "hierarchical_%s=%lu", stat->name, + mem_cgroup_nr_lru_pages(memcg, stat->lru_mask, + true)); + for_each_node_state(nid, N_MEMORY) + seq_printf(m, " N%d=%lu", nid, + mem_cgroup_node_nr_lru_pages(memcg, nid, + stat->lru_mask, true)); + seq_putc(m, '\n'); + } + + return 0; +} +#endif /* CONFIG_NUMA */ + +static const unsigned int memcg1_stats[] = { + NR_FILE_PAGES, + NR_ANON_MAPPED, +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + NR_ANON_THPS, +#endif + NR_SHMEM, + NR_FILE_MAPPED, + NR_FILE_DIRTY, + NR_WRITEBACK, + WORKINGSET_REFAULT_ANON, + WORKINGSET_REFAULT_FILE, +#ifdef CONFIG_SWAP + MEMCG_SWAP, + NR_SWAPCACHE, +#endif +}; + +static const char *const memcg1_stat_names[] = { + "cache", + "rss", +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + "rss_huge", +#endif + "shmem", + "mapped_file", + "dirty", + "writeback", + "workingset_refault_anon", + "workingset_refault_file", +#ifdef CONFIG_SWAP + "swap", + "swapcached", +#endif +}; + +/* Universal VM events cgroup1 shows, original sort order */ +static const unsigned int memcg1_events[] = { + PGPGIN, + PGPGOUT, + PGFAULT, + PGMAJFAULT, +}; + +void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) +{ + unsigned long memory, memsw; + struct mem_cgroup *mi; + unsigned int i; + + BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) != ARRAY_SIZE(memcg1_stats)); + + mem_cgroup_flush_stats(memcg); + + for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) { + unsigned long nr; + + nr = memcg_page_state_local_output(memcg, memcg1_stats[i]); + seq_buf_printf(s, "%s %lu\n", memcg1_stat_names[i], nr); + } + + for (i = 0; i < ARRAY_SIZE(memcg1_events); i++) + seq_buf_printf(s, "%s %lu\n", vm_event_name(memcg1_events[i]), + memcg_events_local(memcg, memcg1_events[i])); + + for (i = 0; i < NR_LRU_LISTS; i++) + seq_buf_printf(s, "%s %lu\n", lru_list_name(i), + memcg_page_state_local(memcg, NR_LRU_BASE + i) * + PAGE_SIZE); + + /* Hierarchical information */ + memory = memsw = PAGE_COUNTER_MAX; + for (mi = memcg; mi; mi = parent_mem_cgroup(mi)) { + memory = min(memory, READ_ONCE(mi->memory.max)); + memsw = min(memsw, READ_ONCE(mi->memsw.max)); + } + seq_buf_printf(s, "hierarchical_memory_limit %llu\n", + (u64)memory * PAGE_SIZE); + seq_buf_printf(s, "hierarchical_memsw_limit %llu\n", + (u64)memsw * PAGE_SIZE); + + for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) { + unsigned long nr; + + nr = memcg_page_state_output(memcg, memcg1_stats[i]); + seq_buf_printf(s, "total_%s %llu\n", memcg1_stat_names[i], + (u64)nr); + } + + for (i = 0; i < ARRAY_SIZE(memcg1_events); i++) + seq_buf_printf(s, "total_%s %llu\n", + vm_event_name(memcg1_events[i]), + (u64)memcg_events(memcg, memcg1_events[i])); + + for (i = 0; i < NR_LRU_LISTS; i++) + seq_buf_printf(s, "total_%s %llu\n", lru_list_name(i), + (u64)memcg_page_state(memcg, NR_LRU_BASE + i) * + PAGE_SIZE); + +#ifdef CONFIG_DEBUG_VM + { + pg_data_t *pgdat; + struct mem_cgroup_per_node *mz; + unsigned long anon_cost = 0; + unsigned long file_cost = 0; + + for_each_online_pgdat(pgdat) { + mz = memcg->nodeinfo[pgdat->node_id]; + + anon_cost += mz->lruvec.anon_cost; + file_cost += mz->lruvec.file_cost; + } + seq_buf_printf(s, "anon_cost %lu\n", anon_cost); + seq_buf_printf(s, "file_cost %lu\n", file_cost); + } +#endif +} + +static u64 mem_cgroup_swappiness_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + + return mem_cgroup_swappiness(memcg); +} + +static int mem_cgroup_swappiness_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + + if (val > MAX_SWAPPINESS) + return -EINVAL; + + if (!mem_cgroup_is_root(memcg)) + WRITE_ONCE(memcg->swappiness, val); + else + WRITE_ONCE(vm_swappiness, val); + + return 0; +} + +static int mem_cgroup_oom_control_read(struct seq_file *sf, void *v) +{ + struct mem_cgroup *memcg = mem_cgroup_from_seq(sf); + + seq_printf(sf, "oom_kill_disable %d\n", READ_ONCE(memcg->oom_kill_disable)); + seq_printf(sf, "under_oom %d\n", (bool)memcg->under_oom); + seq_printf(sf, "oom_kill %lu\n", + atomic_long_read(&memcg->memory_events[MEMCG_OOM_KILL])); + return 0; +} + +static int mem_cgroup_oom_control_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + + /* cannot set to root cgroup and only 0 and 1 are allowed */ + if (mem_cgroup_is_root(memcg) || !((val == 0) || (val == 1))) + return -EINVAL; + + WRITE_ONCE(memcg->oom_kill_disable, val); + if (!val) + memcg1_oom_recover(memcg); + + return 0; +} + +#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) +static int mem_cgroup_slab_show(struct seq_file *m, void *p) +{ + /* + * Deprecated. + * Please, take a look at tools/cgroup/memcg_slabinfo.py . + */ + return 0; +} +#endif + +struct cftype mem_cgroup_legacy_files[] = { + { + .name = "usage_in_bytes", + .private = MEMFILE_PRIVATE(_MEM, RES_USAGE), + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "max_usage_in_bytes", + .private = MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "limit_in_bytes", + .private = MEMFILE_PRIVATE(_MEM, RES_LIMIT), + .write = mem_cgroup_write, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "soft_limit_in_bytes", + .private = MEMFILE_PRIVATE(_MEM, RES_SOFT_LIMIT), + .write = mem_cgroup_write, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "failcnt", + .private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "stat", + .seq_show = memory_stat_show, + }, + { + .name = "force_empty", + .write = mem_cgroup_force_empty_write, + }, + { + .name = "use_hierarchy", + .write_u64 = mem_cgroup_hierarchy_write, + .read_u64 = mem_cgroup_hierarchy_read, + }, + { + .name = "cgroup.event_control", /* XXX: for compat */ + .write = memcg_write_event_control, + .flags = CFTYPE_NO_PREFIX | CFTYPE_WORLD_WRITABLE, + }, + { + .name = "swappiness", + .read_u64 = mem_cgroup_swappiness_read, + .write_u64 = mem_cgroup_swappiness_write, + }, + { + .name = "move_charge_at_immigrate", + .read_u64 = mem_cgroup_move_charge_read, + .write_u64 = mem_cgroup_move_charge_write, + }, + { + .name = "oom_control", + .seq_show = mem_cgroup_oom_control_read, + .write_u64 = mem_cgroup_oom_control_write, + }, + { + .name = "pressure_level", + .seq_show = mem_cgroup_dummy_seq_show, + }, +#ifdef CONFIG_NUMA + { + .name = "numa_stat", + .seq_show = memcg_numa_stat_show, + }, +#endif + { + .name = "kmem.limit_in_bytes", + .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT), + .write = mem_cgroup_write, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "kmem.usage_in_bytes", + .private = MEMFILE_PRIVATE(_KMEM, RES_USAGE), + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "kmem.failcnt", + .private = MEMFILE_PRIVATE(_KMEM, RES_FAILCNT), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "kmem.max_usage_in_bytes", + .private = MEMFILE_PRIVATE(_KMEM, RES_MAX_USAGE), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, +#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) + { + .name = "kmem.slabinfo", + .seq_show = mem_cgroup_slab_show, + }, +#endif + { + .name = "kmem.tcp.limit_in_bytes", + .private = MEMFILE_PRIVATE(_TCP, RES_LIMIT), + .write = mem_cgroup_write, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "kmem.tcp.usage_in_bytes", + .private = MEMFILE_PRIVATE(_TCP, RES_USAGE), + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "kmem.tcp.failcnt", + .private = MEMFILE_PRIVATE(_TCP, RES_FAILCNT), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "kmem.tcp.max_usage_in_bytes", + .private = MEMFILE_PRIVATE(_TCP, RES_MAX_USAGE), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { }, /* terminate */ +}; + +struct cftype memsw_files[] = { + { + .name = "memsw.usage_in_bytes", + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE), + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "memsw.max_usage_in_bytes", + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "memsw.limit_in_bytes", + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT), + .write = mem_cgroup_write, + .read_u64 = mem_cgroup_read_u64, + }, + { + .name = "memsw.failcnt", + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT), + .write = mem_cgroup_reset, + .read_u64 = mem_cgroup_read_u64, + }, + { }, /* terminate */ +}; + static int __init memcg1_init(void) { int node; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 972c493a8ae3..7be4670d9abb 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -3,6 +3,8 @@ #ifndef __MM_MEMCONTROL_V1_H #define __MM_MEMCONTROL_V1_H +#include + void memcg1_update_tree(struct mem_cgroup *memcg, int nid); void memcg1_remove_from_trees(struct mem_cgroup *memcg); @@ -34,12 +36,6 @@ int memcg1_can_attach(struct cgroup_taskset *tset); void memcg1_cancel_attach(struct cgroup_taskset *tset); void memcg1_move_task(void); -struct cftype; -u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css, - struct cftype *cft); -int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val); - /* * Per memcg event counter is incremented at every pagein/pageout. With THP, * it will be incremented by the number of pages. This counter is used @@ -86,11 +82,28 @@ enum res_type { bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, enum mem_cgroup_events_target target); unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap); -ssize_t memcg_write_event_control(struct kernfs_open_file *of, - char *buf, size_t nbytes, loff_t off); bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked); void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked); void memcg1_oom_recover(struct mem_cgroup *memcg); +void drain_all_stock(struct mem_cgroup *root_memcg); +unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, + unsigned int lru_mask, bool tree); +unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, + int nid, unsigned int lru_mask, + bool tree); + +unsigned long memcg_events(struct mem_cgroup *memcg, int event); +unsigned long memcg_events_local(struct mem_cgroup *memcg, int event); +unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx); +unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item); +unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item); +int memory_stat_show(struct seq_file *m, void *v); + +void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s); + +extern struct cftype memsw_files[]; +extern struct cftype mem_cgroup_legacy_files[]; + #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 37e0af5b26f3..c7341e811945 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -96,10 +96,6 @@ static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq); #define THRESHOLDS_EVENTS_TARGET 128 #define SOFTLIMIT_EVENTS_TARGET 1024 -#define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val)) -#define MEMFILE_TYPE(val) ((val) >> 16 & 0xffff) -#define MEMFILE_ATTR(val) ((val) & 0xffff) - static inline bool task_is_dying(void) { return tsk_is_oom_victim(current) || fatal_signal_pending(current) || @@ -676,7 +672,7 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx, } /* idx can be of type enum memcg_stat_item or node_stat_item. */ -static unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx) +unsigned long memcg_page_state_local(struct mem_cgroup *memcg, int idx) { long x; int i = memcg_stats_index(idx); @@ -825,7 +821,7 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx, memcg_stats_unlock(); } -static unsigned long memcg_events(struct mem_cgroup *memcg, int event) +unsigned long memcg_events(struct mem_cgroup *memcg, int event) { int i = memcg_events_index(event); @@ -835,7 +831,7 @@ static unsigned long memcg_events(struct mem_cgroup *memcg, int event) return READ_ONCE(memcg->vmstats->events[i]); } -static unsigned long memcg_events_local(struct mem_cgroup *memcg, int event) +unsigned long memcg_events_local(struct mem_cgroup *memcg, int event) { int i = memcg_events_index(event); @@ -1420,15 +1416,13 @@ static int memcg_page_state_output_unit(int item) } } -static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg, - int item) +unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item) { return memcg_page_state(memcg, item) * memcg_page_state_output_unit(item); } -static inline unsigned long memcg_page_state_local_output( - struct mem_cgroup *memcg, int item) +unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item) { return memcg_page_state_local(memcg, item) * memcg_page_state_output_unit(item); @@ -1487,8 +1481,6 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) WARN_ON_ONCE(seq_buf_has_overflowed(s)); } -static void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s); - static void memory_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) { if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) @@ -1861,7 +1853,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages) * Drains all per-CPU charge caches for given root_memcg resp. subtree * of the hierarchy under it. */ -static void drain_all_stock(struct mem_cgroup *root_memcg) +void drain_all_stock(struct mem_cgroup *root_memcg) { int cpu, curcpu; @@ -3115,120 +3107,6 @@ void split_page_memcg(struct page *head, int old_order, int new_order) css_get_many(&memcg->css, old_nr / new_nr - 1); } - -static DEFINE_MUTEX(memcg_max_mutex); - -static int mem_cgroup_resize_max(struct mem_cgroup *memcg, - unsigned long max, bool memsw) -{ - bool enlarge = false; - bool drained = false; - int ret; - bool limits_invariant; - struct page_counter *counter = memsw ? &memcg->memsw : &memcg->memory; - - do { - if (signal_pending(current)) { - ret = -EINTR; - break; - } - - mutex_lock(&memcg_max_mutex); - /* - * Make sure that the new limit (memsw or memory limit) doesn't - * break our basic invariant rule memory.max <= memsw.max. - */ - limits_invariant = memsw ? max >= READ_ONCE(memcg->memory.max) : - max <= memcg->memsw.max; - if (!limits_invariant) { - mutex_unlock(&memcg_max_mutex); - ret = -EINVAL; - break; - } - if (max > counter->max) - enlarge = true; - ret = page_counter_set_max(counter, max); - mutex_unlock(&memcg_max_mutex); - - if (!ret) - break; - - if (!drained) { - drain_all_stock(memcg); - drained = true; - continue; - } - - if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, - memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, NULL)) { - ret = -EBUSY; - break; - } - } while (true); - - if (!ret && enlarge) - memcg1_oom_recover(memcg); - - return ret; -} - -/* - * Reclaims as many pages from the given memcg as possible. - * - * Caller is responsible for holding css reference for memcg. - */ -static int mem_cgroup_force_empty(struct mem_cgroup *memcg) -{ - int nr_retries = MAX_RECLAIM_RETRIES; - - /* we call try-to-free pages for make this cgroup empty */ - lru_add_drain_all(); - - drain_all_stock(memcg); - - /* try to free all pages in this cgroup */ - while (nr_retries && page_counter_read(&memcg->memory)) { - if (signal_pending(current)) - return -EINTR; - - if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, - MEMCG_RECLAIM_MAY_SWAP, NULL)) - nr_retries--; - } - - return 0; -} - -static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of, - char *buf, size_t nbytes, - loff_t off) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); - - if (mem_cgroup_is_root(memcg)) - return -EINVAL; - return mem_cgroup_force_empty(memcg) ?: nbytes; -} - -static u64 mem_cgroup_hierarchy_read(struct cgroup_subsys_state *css, - struct cftype *cft) -{ - return 1; -} - -static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - if (val == 1) - return 0; - - pr_warn_once("Non-hierarchical mode is deprecated. " - "Please report your usecase to linux-mm@kvack.org if you " - "depend on this functionality.\n"); - - return -EINVAL; -} - unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) { unsigned long val; @@ -3251,67 +3129,6 @@ unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) return val; } -enum { - RES_USAGE, - RES_LIMIT, - RES_MAX_USAGE, - RES_FAILCNT, - RES_SOFT_LIMIT, -}; - -static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css, - struct cftype *cft) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(css); - struct page_counter *counter; - - switch (MEMFILE_TYPE(cft->private)) { - case _MEM: - counter = &memcg->memory; - break; - case _MEMSWAP: - counter = &memcg->memsw; - break; - case _KMEM: - counter = &memcg->kmem; - break; - case _TCP: - counter = &memcg->tcpmem; - break; - default: - BUG(); - } - - switch (MEMFILE_ATTR(cft->private)) { - case RES_USAGE: - if (counter == &memcg->memory) - return (u64)mem_cgroup_usage(memcg, false) * PAGE_SIZE; - if (counter == &memcg->memsw) - return (u64)mem_cgroup_usage(memcg, true) * PAGE_SIZE; - return (u64)page_counter_read(counter) * PAGE_SIZE; - case RES_LIMIT: - return (u64)counter->max * PAGE_SIZE; - case RES_MAX_USAGE: - return (u64)counter->watermark * PAGE_SIZE; - case RES_FAILCNT: - return counter->failcnt; - case RES_SOFT_LIMIT: - return (u64)READ_ONCE(memcg->soft_limit) * PAGE_SIZE; - default: - BUG(); - } -} - -/* - * This function doesn't do anything useful. Its only job is to provide a read - * handler for a file so that cgroup_file_mode() will add read permissions. - */ -static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m, - __always_unused void *v) -{ - return -EINVAL; -} - #ifdef CONFIG_MEMCG_KMEM static int memcg_online_kmem(struct mem_cgroup *memcg) { @@ -3373,139 +3190,9 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg) } #endif /* CONFIG_MEMCG_KMEM */ -static int memcg_update_tcp_max(struct mem_cgroup *memcg, unsigned long max) -{ - int ret; - - mutex_lock(&memcg_max_mutex); - - ret = page_counter_set_max(&memcg->tcpmem, max); - if (ret) - goto out; - - if (!memcg->tcpmem_active) { - /* - * The active flag needs to be written after the static_key - * update. This is what guarantees that the socket activation - * function is the last one to run. See mem_cgroup_sk_alloc() - * for details, and note that we don't mark any socket as - * belonging to this memcg until that flag is up. - * - * We need to do this, because static_keys will span multiple - * sites, but we can't control their order. If we mark a socket - * as accounted, but the accounting functions are not patched in - * yet, we'll lose accounting. - * - * We never race with the readers in mem_cgroup_sk_alloc(), - * because when this value change, the code to process it is not - * patched in yet. - */ - static_branch_inc(&memcg_sockets_enabled_key); - memcg->tcpmem_active = true; - } -out: - mutex_unlock(&memcg_max_mutex); - return ret; -} - -/* - * The user of this function is... - * RES_LIMIT. - */ -static ssize_t mem_cgroup_write(struct kernfs_open_file *of, - char *buf, size_t nbytes, loff_t off) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); - unsigned long nr_pages; - int ret; - - buf = strstrip(buf); - ret = page_counter_memparse(buf, "-1", &nr_pages); - if (ret) - return ret; - - switch (MEMFILE_ATTR(of_cft(of)->private)) { - case RES_LIMIT: - if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */ - ret = -EINVAL; - break; - } - switch (MEMFILE_TYPE(of_cft(of)->private)) { - case _MEM: - ret = mem_cgroup_resize_max(memcg, nr_pages, false); - break; - case _MEMSWAP: - ret = mem_cgroup_resize_max(memcg, nr_pages, true); - break; - case _KMEM: - pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. " - "Writing any value to this file has no effect. " - "Please report your usecase to linux-mm@kvack.org if you " - "depend on this functionality.\n"); - ret = 0; - break; - case _TCP: - ret = memcg_update_tcp_max(memcg, nr_pages); - break; - } - break; - case RES_SOFT_LIMIT: - if (IS_ENABLED(CONFIG_PREEMPT_RT)) { - ret = -EOPNOTSUPP; - } else { - WRITE_ONCE(memcg->soft_limit, nr_pages); - ret = 0; - } - break; - } - return ret ?: nbytes; -} - -static ssize_t mem_cgroup_reset(struct kernfs_open_file *of, char *buf, - size_t nbytes, loff_t off) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); - struct page_counter *counter; - - switch (MEMFILE_TYPE(of_cft(of)->private)) { - case _MEM: - counter = &memcg->memory; - break; - case _MEMSWAP: - counter = &memcg->memsw; - break; - case _KMEM: - counter = &memcg->kmem; - break; - case _TCP: - counter = &memcg->tcpmem; - break; - default: - BUG(); - } - - switch (MEMFILE_ATTR(of_cft(of)->private)) { - case RES_MAX_USAGE: - page_counter_reset_watermark(counter); - break; - case RES_FAILCNT: - counter->failcnt = 0; - break; - default: - BUG(); - } - - return nbytes; -} - -#ifdef CONFIG_NUMA - -#define LRU_ALL_FILE (BIT(LRU_INACTIVE_FILE) | BIT(LRU_ACTIVE_FILE)) -#define LRU_ALL_ANON (BIT(LRU_INACTIVE_ANON) | BIT(LRU_ACTIVE_ANON)) -#define LRU_ALL ((1 << NR_LRU_LISTS) - 1) - -static unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, - int nid, unsigned int lru_mask, bool tree) +unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, + int nid, unsigned int lru_mask, + bool tree) { struct lruvec *lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid)); unsigned long nr = 0; @@ -3524,9 +3211,8 @@ static unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg, return nr; } -static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, - unsigned int lru_mask, - bool tree) +unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, + unsigned int lru_mask, bool tree) { unsigned long nr = 0; enum lru_list lru; @@ -3542,221 +3228,6 @@ static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, return nr; } -static int memcg_numa_stat_show(struct seq_file *m, void *v) -{ - struct numa_stat { - const char *name; - unsigned int lru_mask; - }; - - static const struct numa_stat stats[] = { - { "total", LRU_ALL }, - { "file", LRU_ALL_FILE }, - { "anon", LRU_ALL_ANON }, - { "unevictable", BIT(LRU_UNEVICTABLE) }, - }; - const struct numa_stat *stat; - int nid; - struct mem_cgroup *memcg = mem_cgroup_from_seq(m); - - mem_cgroup_flush_stats(memcg); - - for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) { - seq_printf(m, "%s=%lu", stat->name, - mem_cgroup_nr_lru_pages(memcg, stat->lru_mask, - false)); - for_each_node_state(nid, N_MEMORY) - seq_printf(m, " N%d=%lu", nid, - mem_cgroup_node_nr_lru_pages(memcg, nid, - stat->lru_mask, false)); - seq_putc(m, '\n'); - } - - for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) { - - seq_printf(m, "hierarchical_%s=%lu", stat->name, - mem_cgroup_nr_lru_pages(memcg, stat->lru_mask, - true)); - for_each_node_state(nid, N_MEMORY) - seq_printf(m, " N%d=%lu", nid, - mem_cgroup_node_nr_lru_pages(memcg, nid, - stat->lru_mask, true)); - seq_putc(m, '\n'); - } - - return 0; -} -#endif /* CONFIG_NUMA */ - -static const unsigned int memcg1_stats[] = { - NR_FILE_PAGES, - NR_ANON_MAPPED, -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - NR_ANON_THPS, -#endif - NR_SHMEM, - NR_FILE_MAPPED, - NR_FILE_DIRTY, - NR_WRITEBACK, - WORKINGSET_REFAULT_ANON, - WORKINGSET_REFAULT_FILE, -#ifdef CONFIG_SWAP - MEMCG_SWAP, - NR_SWAPCACHE, -#endif -}; - -static const char *const memcg1_stat_names[] = { - "cache", - "rss", -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - "rss_huge", -#endif - "shmem", - "mapped_file", - "dirty", - "writeback", - "workingset_refault_anon", - "workingset_refault_file", -#ifdef CONFIG_SWAP - "swap", - "swapcached", -#endif -}; - -/* Universal VM events cgroup1 shows, original sort order */ -static const unsigned int memcg1_events[] = { - PGPGIN, - PGPGOUT, - PGFAULT, - PGMAJFAULT, -}; - -static void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) -{ - unsigned long memory, memsw; - struct mem_cgroup *mi; - unsigned int i; - - BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) != ARRAY_SIZE(memcg1_stats)); - - mem_cgroup_flush_stats(memcg); - - for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) { - unsigned long nr; - - nr = memcg_page_state_local_output(memcg, memcg1_stats[i]); - seq_buf_printf(s, "%s %lu\n", memcg1_stat_names[i], nr); - } - - for (i = 0; i < ARRAY_SIZE(memcg1_events); i++) - seq_buf_printf(s, "%s %lu\n", vm_event_name(memcg1_events[i]), - memcg_events_local(memcg, memcg1_events[i])); - - for (i = 0; i < NR_LRU_LISTS; i++) - seq_buf_printf(s, "%s %lu\n", lru_list_name(i), - memcg_page_state_local(memcg, NR_LRU_BASE + i) * - PAGE_SIZE); - - /* Hierarchical information */ - memory = memsw = PAGE_COUNTER_MAX; - for (mi = memcg; mi; mi = parent_mem_cgroup(mi)) { - memory = min(memory, READ_ONCE(mi->memory.max)); - memsw = min(memsw, READ_ONCE(mi->memsw.max)); - } - seq_buf_printf(s, "hierarchical_memory_limit %llu\n", - (u64)memory * PAGE_SIZE); - seq_buf_printf(s, "hierarchical_memsw_limit %llu\n", - (u64)memsw * PAGE_SIZE); - - for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) { - unsigned long nr; - - nr = memcg_page_state_output(memcg, memcg1_stats[i]); - seq_buf_printf(s, "total_%s %llu\n", memcg1_stat_names[i], - (u64)nr); - } - - for (i = 0; i < ARRAY_SIZE(memcg1_events); i++) - seq_buf_printf(s, "total_%s %llu\n", - vm_event_name(memcg1_events[i]), - (u64)memcg_events(memcg, memcg1_events[i])); - - for (i = 0; i < NR_LRU_LISTS; i++) - seq_buf_printf(s, "total_%s %llu\n", lru_list_name(i), - (u64)memcg_page_state(memcg, NR_LRU_BASE + i) * - PAGE_SIZE); - -#ifdef CONFIG_DEBUG_VM - { - pg_data_t *pgdat; - struct mem_cgroup_per_node *mz; - unsigned long anon_cost = 0; - unsigned long file_cost = 0; - - for_each_online_pgdat(pgdat) { - mz = memcg->nodeinfo[pgdat->node_id]; - - anon_cost += mz->lruvec.anon_cost; - file_cost += mz->lruvec.file_cost; - } - seq_buf_printf(s, "anon_cost %lu\n", anon_cost); - seq_buf_printf(s, "file_cost %lu\n", file_cost); - } -#endif -} - -static u64 mem_cgroup_swappiness_read(struct cgroup_subsys_state *css, - struct cftype *cft) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(css); - - return mem_cgroup_swappiness(memcg); -} - -static int mem_cgroup_swappiness_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(css); - - if (val > MAX_SWAPPINESS) - return -EINVAL; - - if (!mem_cgroup_is_root(memcg)) - WRITE_ONCE(memcg->swappiness, val); - else - WRITE_ONCE(vm_swappiness, val); - - return 0; -} - -static int mem_cgroup_oom_control_read(struct seq_file *sf, void *v) -{ - struct mem_cgroup *memcg = mem_cgroup_from_seq(sf); - - seq_printf(sf, "oom_kill_disable %d\n", READ_ONCE(memcg->oom_kill_disable)); - seq_printf(sf, "under_oom %d\n", (bool)memcg->under_oom); - seq_printf(sf, "oom_kill %lu\n", - atomic_long_read(&memcg->memory_events[MEMCG_OOM_KILL])); - return 0; -} - -static int mem_cgroup_oom_control_write(struct cgroup_subsys_state *css, - struct cftype *cft, u64 val) -{ - struct mem_cgroup *memcg = mem_cgroup_from_css(css); - - /* cannot set to root cgroup and only 0 and 1 are allowed */ - if (mem_cgroup_is_root(memcg) || !((val == 0) || (val == 1))) - return -EINVAL; - - WRITE_ONCE(memcg->oom_kill_disable, val); - if (!val) - memcg1_oom_recover(memcg); - - return 0; -} - #ifdef CONFIG_CGROUP_WRITEBACK #include @@ -3970,147 +3441,6 @@ static void memcg_wb_domain_size_changed(struct mem_cgroup *memcg) #endif /* CONFIG_CGROUP_WRITEBACK */ -#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) -static int mem_cgroup_slab_show(struct seq_file *m, void *p) -{ - /* - * Deprecated. - * Please, take a look at tools/cgroup/memcg_slabinfo.py . - */ - return 0; -} -#endif - -static int memory_stat_show(struct seq_file *m, void *v); - -static struct cftype mem_cgroup_legacy_files[] = { - { - .name = "usage_in_bytes", - .private = MEMFILE_PRIVATE(_MEM, RES_USAGE), - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "max_usage_in_bytes", - .private = MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "limit_in_bytes", - .private = MEMFILE_PRIVATE(_MEM, RES_LIMIT), - .write = mem_cgroup_write, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "soft_limit_in_bytes", - .private = MEMFILE_PRIVATE(_MEM, RES_SOFT_LIMIT), - .write = mem_cgroup_write, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "failcnt", - .private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "stat", - .seq_show = memory_stat_show, - }, - { - .name = "force_empty", - .write = mem_cgroup_force_empty_write, - }, - { - .name = "use_hierarchy", - .write_u64 = mem_cgroup_hierarchy_write, - .read_u64 = mem_cgroup_hierarchy_read, - }, - { - .name = "cgroup.event_control", /* XXX: for compat */ - .write = memcg_write_event_control, - .flags = CFTYPE_NO_PREFIX | CFTYPE_WORLD_WRITABLE, - }, - { - .name = "swappiness", - .read_u64 = mem_cgroup_swappiness_read, - .write_u64 = mem_cgroup_swappiness_write, - }, - { - .name = "move_charge_at_immigrate", - .read_u64 = mem_cgroup_move_charge_read, - .write_u64 = mem_cgroup_move_charge_write, - }, - { - .name = "oom_control", - .seq_show = mem_cgroup_oom_control_read, - .write_u64 = mem_cgroup_oom_control_write, - }, - { - .name = "pressure_level", - .seq_show = mem_cgroup_dummy_seq_show, - }, -#ifdef CONFIG_NUMA - { - .name = "numa_stat", - .seq_show = memcg_numa_stat_show, - }, -#endif - { - .name = "kmem.limit_in_bytes", - .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT), - .write = mem_cgroup_write, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "kmem.usage_in_bytes", - .private = MEMFILE_PRIVATE(_KMEM, RES_USAGE), - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "kmem.failcnt", - .private = MEMFILE_PRIVATE(_KMEM, RES_FAILCNT), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "kmem.max_usage_in_bytes", - .private = MEMFILE_PRIVATE(_KMEM, RES_MAX_USAGE), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, -#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_SLUB_DEBUG) - { - .name = "kmem.slabinfo", - .seq_show = mem_cgroup_slab_show, - }, -#endif - { - .name = "kmem.tcp.limit_in_bytes", - .private = MEMFILE_PRIVATE(_TCP, RES_LIMIT), - .write = mem_cgroup_write, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "kmem.tcp.usage_in_bytes", - .private = MEMFILE_PRIVATE(_TCP, RES_USAGE), - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "kmem.tcp.failcnt", - .private = MEMFILE_PRIVATE(_TCP, RES_FAILCNT), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "kmem.tcp.max_usage_in_bytes", - .private = MEMFILE_PRIVATE(_TCP, RES_MAX_USAGE), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { }, /* terminate */ -}; - /* * Private memory cgroup IDR * @@ -4902,7 +4232,7 @@ static int memory_events_local_show(struct seq_file *m, void *v) return 0; } -static int memory_stat_show(struct seq_file *m, void *v) +int memory_stat_show(struct seq_file *m, void *v) { struct mem_cgroup *memcg = mem_cgroup_from_seq(m); char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL); @@ -6133,33 +5463,6 @@ static struct cftype swap_files[] = { { } /* terminate */ }; -static struct cftype memsw_files[] = { - { - .name = "memsw.usage_in_bytes", - .private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE), - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "memsw.max_usage_in_bytes", - .private = MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "memsw.limit_in_bytes", - .private = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT), - .write = mem_cgroup_write, - .read_u64 = mem_cgroup_read_u64, - }, - { - .name = "memsw.failcnt", - .private = MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT), - .write = mem_cgroup_reset, - .read_u64 = mem_cgroup_read_u64, - }, - { }, /* terminate */ -}; - #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP) /** * obj_cgroup_may_zswap - check if this cgroup can zswap From patchwork Tue Jun 25 00:59:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710405 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B677C30659 for ; Tue, 25 Jun 2024 01:00:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 30C786B0360; Mon, 24 Jun 2024 20:59:54 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 293F16B0361; Mon, 24 Jun 2024 20:59:54 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0246C6B0363; Mon, 24 Jun 2024 20:59:53 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id D4BF46B0360 for ; Mon, 24 Jun 2024 20:59:53 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 8AE24A14A2 for ; Tue, 25 Jun 2024 00:59:53 +0000 (UTC) X-FDA: 82267603866.01.F18B30B Received: from out-177.mta0.migadu.com (out-177.mta0.migadu.com [91.218.175.177]) by imf20.hostedemail.com (Postfix) with ESMTP id 7C2D31C0006 for ; Tue, 25 Jun 2024 00:59:51 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=PlGf6lnG; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf20.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.177 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277180; a=rsa-sha256; cv=none; b=ymSEyPGK7szd8WihXzQs82NvvgaeW4x/ss1VwZoNhRQWbePliFUNFTrXiSYSEpB5acN26v wYKADHaTSIeYgsElIfTtU1j9wTvJmWTscBwMHke2mgE59L5UB2Q004q3H6ICyiItC0MbGY 6CNVzAhok3xKI+czh8KxAmpbVUwR/0Y= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=PlGf6lnG; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf20.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.177 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277180; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=i6OtryeQEKVoJSpeL/4FN+6EUWrG1/pihifLngpHrqA=; b=b77t9QE7QtPlXv6MRUaY7O5sUn4WMc0yfFh+v8Qlr4ooQ0LwSJOj09Grdx28qrMLC+l9Qt 7aXTAGmes3O3hWLFoQZ3W04r65F5Ib4N8m++jurmeVT+hRLDUXeyl6YiUdBBaUREZdgDvM syRxULOJqLrVQw4rb+XkuyVZXY/y3cE= X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277190; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=i6OtryeQEKVoJSpeL/4FN+6EUWrG1/pihifLngpHrqA=; b=PlGf6lnGIQXNmfnq2Lb4Tkf4shhTkmXLgelP2ND3Ym1XBUrQgwU2RRuocobaKhgUDIawY9 y/qwRqVY0IWNNQJ78VvWbqijJuUQHx0Fx3ff0YC5Hnm0PZzzLh8G6Fp4u+r+ci0iSk4sBa ygiGKYxBilotTIeiKryEUQaV54ffcHg= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 11/14] mm: memcg: make memcg1_update_tree() static Date: Mon, 24 Jun 2024 17:59:03 -0700 Message-ID: <20240625005906.106920-12-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Rspamd-Queue-Id: 7C2D31C0006 X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: fw3n9ti5f4rtdf8pf31rug1ou7ifxu4n X-HE-Tag: 1719277191-424723 X-HE-Meta: U2FsdGVkX19WgEUlv0acH7u293XBdObsX0fZKuje2v2NzSY0aoexqoUwKoLvJf7hBcxAk2emfU0KhSctaLdM/zA1HciOVxMzjtJ5FY71L06yuWidHrQg91x+vmYbrzsvj0TYF0X60QV2wWDdOM9P1BPC8c9Lg0qUyxHQgNIn8rMfmy4rBB4N3/0x14g1Iz6kFqlWRWsCZo/NHHMCPMvY8GSU7+RKcBBZ64DpTuVR5tZDI/cw1Y5078N1sxmybuXWVRfMaqbL55Xs7sSM6bdgGuIZewOo3+38hnenh+q/QC6KCH8FSUxRGxYYoWndwFJqYmAWFs4SuYLt1t/yVgygBJ8wTK1ZxZDIU5DIzuj/Sd11OqCSGdkntaiDVTRB2q4FkvIxLlHB4LOZ2UB6OZun8a5bRkpyu7wxLAjAEu4tKEQ190LIdmfnw6CBPnPY5bzYMPtRrh1JS8oC1o2qZOojvzEjjEi27qtdGrK3T3R2gzXCUz/VjdutPGBCPX0kKTmDGqmdfXNPGEQEloMG5RRjBIMYGwnjrdUzVn7xeTSDKgysSLOiTLxbgpiyENfjgtntgaa/oDPfCssp56Xr69RfMqis83dJecATNsEGKEfPXXNlbRPMTSIum8TEMZmSRnTk5d/FaRNPOCypQt/wi8QzmWKyy0XRw5t+OMpvnXKVRYXsms5fpDyB0dk4jLxKO48KrY75Qt92Op4cbfU6pzo0X5CV+4plQSUwLj7dfy1MpE1JXYZGO/x2jYYcHdrllwxhmxdMM9FubuVQky+Dgyy8OxUf4qT3qUml3//qOEXysfDJcVKyWzZeJwsSPNAmJGdHg+myOjXNaQ7jK2v31IexR8u/v28majMRG0go7rs8dwwHj9TvSw+sxRxUUHKQhunmpqE6YFUVMSAb2q7VkSEz5pp/ted8rMCek8d9wyTR0oF/aGLCS2y4CHVP1w1N8EtWLjttG+Ylo9Bzh+wAmEE LhWT7AMR v9jLlldcKSX+nSG6iI21mJ96CaHdMt+RPR2Q0y4HDIxwFvT5RLBeeIgiq6iBbFnQtmHZ8K/oddtE6vwT7n0oFeuHYWET+IltFUScYmjSgFAA21R0S++e3sS8AyR3xl4gbbNftOnf4Y/k41aMy1bCq69mnPmUrZ+2VujOb X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: memcg1_update_tree() is not used outside of mm/memcontrol-v1.c anymore, define it as static and remove the declaration from the header file. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko --- mm/memcontrol-v1.c | 2 +- mm/memcontrol-v1.h | 1 - 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 1b7337d0170d..f89de413004b 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -201,7 +201,7 @@ static unsigned long soft_limit_excess(struct mem_cgroup *memcg) return excess; } -void memcg1_update_tree(struct mem_cgroup *memcg, int nid) +static void memcg1_update_tree(struct mem_cgroup *memcg, int nid) { unsigned long excess; struct mem_cgroup_per_node *mz; diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 7be4670d9abb..7d6ac4a4fb36 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -5,7 +5,6 @@ #include -void memcg1_update_tree(struct mem_cgroup *memcg, int nid); void memcg1_remove_from_trees(struct mem_cgroup *memcg); static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) From patchwork Tue Jun 25 00:59:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710406 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 050E3C3064D for ; Tue, 25 Jun 2024 01:00:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 30B296B0363; Mon, 24 Jun 2024 20:59:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 293E26B0366; Mon, 24 Jun 2024 20:59:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0E6796B0367; Mon, 24 Jun 2024 20:59:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id E08046B0363 for ; Mon, 24 Jun 2024 20:59:55 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 9006B41482 for ; Tue, 25 Jun 2024 00:59:55 +0000 (UTC) X-FDA: 82267603950.28.459C689 Received: from out-177.mta0.migadu.com (out-177.mta0.migadu.com [91.218.175.177]) by imf03.hostedemail.com (Postfix) with ESMTP id 6DA8920013 for ; Tue, 25 Jun 2024 00:59:53 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=ISVeAmKN; spf=pass (imf03.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.177 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277174; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=zJ3ODeZAUzmBmpS974GYslSkPfIUFwRZ/UoK6P7Hc/E=; b=RmrlOJqigljC8FsxfICM2I2OcAhAG6+eGBDpcN6R7m9XEKwbf7NCKLyDzZoH9IHaKNOTfW f7unNImvLjObx7sET0b20kCmzi9jAt7BZqJ8agMt72RKvLYisM0NKzbatHIIr1UycD+OQN xbvBqWi4qJVr3v09UKEYLhFcGTBPu5I= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=ISVeAmKN; spf=pass (imf03.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.177 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277174; a=rsa-sha256; cv=none; b=fP1y/9o2ybGpVKFd3flO/iV/RXQZm4CMf1S3C9glmsh1C0Kr75ZmHHMpeaZyj+wY+1XX2V lsCSH7U7ZbKsXLsOBPpG5NWMFmzx5vPn57nBrRFnM1Y17LOXvRAKsoK2N+KC2p+wh+HaCp vDHm1eiUHlooB83jPyktvz7xydUK7rA= X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277192; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zJ3ODeZAUzmBmpS974GYslSkPfIUFwRZ/UoK6P7Hc/E=; b=ISVeAmKN40Ye3C2e8nOk/sLPBa9vn5C3IabRiHVDyM7fVcdHK4iwaSqjQTJhI4w7xXS49+ TcCWJ2eM9yk2nhKW0+YUkPaWA+XyEtwmvHmLWK2fKGT31wVjvJohmZARjIB/joYkrMpIKJ rjUqnhP5spuFw+Z/tNtF8zHs3mgraZA= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 12/14] mm: memcg: group cgroup v1 memcg related declarations Date: Mon, 24 Jun 2024 17:59:04 -0700 Message-ID: <20240625005906.106920-13-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 6DA8920013 X-Stat-Signature: w8iikgsxfadhfuadtb9frgqb8fhaf6jg X-Rspam-User: X-HE-Tag: 1719277193-941900 X-HE-Meta: U2FsdGVkX19+QevMawLyNBupxf1TYxm1robvf/KLvFfMHdqSsJh8DybGVWxoO6qfYMVx3uDGQYYb2pyZu0qC9Sj/GBNMdEBPC/zulCjgbDqCBR8FbiJ7blNmdvJsRj4GArsHApfquC1oT4G8nleXKZ3rByuBEkM3kyhGyjIQ8jF8vgTrWMyeiv2H7L2unWuwrEde7PgfYl/tJmFD6+sgKm97Nf1vvn6PHKxOwq/1Z7e3y5xsQfMYx74rJ8PkyIMCQj0Hj5EE6zHReZttG6TUvoL4RpjM7gYMNJTzEG5acsZ71CldvAZJnWFIO+D8RjLyYiZATWNL5pIwR57nDx+EB9EDfBVxI0WEFcP19oc0XpJoaPhsYPbk6HIuEGlhr65WJNevl0/740ljWaEfBUmmNWBXbXFzGbYEn2u43dBPM52G4n8y2gVHpsk3rOir3FbWsEf7JhOq9LSPazaZrPHGrj1Ppl8vm7OviXOy5FnJ3Pd76fwFNTUyrdGTCIpOZqGL61oAlJP6mPfVjLlwx/GOMA1jwOlOA/LDCWk6Lj2qHq/X+RxwFZApu7Typ9b32cIjaOCmd8nusSK5Bi0dyJAoyCZZ20pGoOKT65Lv5v4X2D26Gtfb712ApZAv7BI9DztlknbbFMTHMYcaOPD+2UQtV0ZhWULCufZUMxEf2AP0odJt93nuIJa9Jg7tGB12eJIyQuyUSTOpNEuFJl0rljD1o3mSay8xUbw5rpAytbpJQU3c3We1k4nJNFdg2A3uCaogoAWZTYDEj2DDXRUF3Ti+qEm71kp9mnLdZ53me+pHJw6o278o5QboblNBj57BRamv3ksKyJrSq4JujkWeUwamr0BJ2yc6eI7UV4Y6mZdx4fU1QemkUvVPzsupt9D62vcQfeaQglk3NnwkSildVYUzvDCfLWhWmwFkI5wcbeJeTubSg2CGzlGwNeWpuRcVzB39odvY11Xr6Ybn/0sF4nd vXPWu5qx 7ZoDbOjkVWv3IwnxzhNlWeBq7kFqACZSFGy4T0WtW4wx7/1Re5Gjlbmb3a+mXzFgVtZoJH/CIxUb2yBbBQEi/SLKsbERt3YHPLVk55QKyT2JRRT0NTeAAxa6eDQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Group all cgroup v1-related declarations at the end of memcontrol.h and mm/memcontrol-v1.h with an intention to put them all together under a config option later on. It should make things easier to follow and maintain too. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko --- include/linux/memcontrol.h | 144 +++++++++++++++++++------------------ mm/memcontrol-v1.h | 89 ++++++++++++----------- 2 files changed, 123 insertions(+), 110 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 588179d29849..a70d64ed04f5 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -950,39 +950,13 @@ static inline void mem_cgroup_exit_user_fault(void) current->in_user_fault = 0; } -static inline bool task_in_memcg_oom(struct task_struct *p) -{ - return p->memcg_in_oom; -} - -bool mem_cgroup_oom_synchronize(bool wait); struct mem_cgroup *mem_cgroup_get_oom_group(struct task_struct *victim, struct mem_cgroup *oom_domain); void mem_cgroup_print_oom_group(struct mem_cgroup *memcg); -void folio_memcg_lock(struct folio *folio); -void folio_memcg_unlock(struct folio *folio); - void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx, int val); -/* try to stablize folio_memcg() for all the pages in a memcg */ -static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg) -{ - rcu_read_lock(); - - if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account)) - return true; - - rcu_read_unlock(); - return false; -} - -static inline void mem_cgroup_unlock_pages(void) -{ - rcu_read_unlock(); -} - /* idx can be of type enum memcg_stat_item or node_stat_item */ static inline void mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx, int val) @@ -1109,10 +1083,6 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm, void split_page_memcg(struct page *head, int old_order, int new_order); -unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, - gfp_t gfp_mask, - unsigned long *total_scanned); - #else /* CONFIG_MEMCG */ #define MEM_CGROUP_ID_SHIFT 0 @@ -1423,26 +1393,6 @@ mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg) { } -static inline void folio_memcg_lock(struct folio *folio) -{ -} - -static inline void folio_memcg_unlock(struct folio *folio) -{ -} - -static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg) -{ - /* to match folio_memcg_rcu() */ - rcu_read_lock(); - return true; -} - -static inline void mem_cgroup_unlock_pages(void) -{ - rcu_read_unlock(); -} - static inline void mem_cgroup_handle_over_high(gfp_t gfp_mask) { } @@ -1455,16 +1405,6 @@ static inline void mem_cgroup_exit_user_fault(void) { } -static inline bool task_in_memcg_oom(struct task_struct *p) -{ - return false; -} - -static inline bool mem_cgroup_oom_synchronize(bool wait) -{ - return false; -} - static inline struct mem_cgroup *mem_cgroup_get_oom_group( struct task_struct *victim, struct mem_cgroup *oom_domain) { @@ -1558,14 +1498,6 @@ void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx) static inline void split_page_memcg(struct page *head, int old_order, int new_order) { } - -static inline -unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, - gfp_t gfp_mask, - unsigned long *total_scanned) -{ - return 0; -} #endif /* CONFIG_MEMCG */ /* @@ -1916,4 +1848,80 @@ static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg) } #endif + +/* Cgroup v1-related declarations */ + +#ifdef CONFIG_MEMCG +unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, + gfp_t gfp_mask, + unsigned long *total_scanned); + +bool mem_cgroup_oom_synchronize(bool wait); + +static inline bool task_in_memcg_oom(struct task_struct *p) +{ + return p->memcg_in_oom; +} + +void folio_memcg_lock(struct folio *folio); +void folio_memcg_unlock(struct folio *folio); + +/* try to stablize folio_memcg() for all the pages in a memcg */ +static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg) +{ + rcu_read_lock(); + + if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account)) + return true; + + rcu_read_unlock(); + return false; +} + +static inline void mem_cgroup_unlock_pages(void) +{ + rcu_read_unlock(); +} + +#else /* CONFIG_MEMCG */ +static inline +unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, + gfp_t gfp_mask, + unsigned long *total_scanned) +{ + return 0; +} + +static inline void folio_memcg_lock(struct folio *folio) +{ +} + +static inline void folio_memcg_unlock(struct folio *folio) +{ +} + +static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg) +{ + /* to match folio_memcg_rcu() */ + rcu_read_lock(); + return true; +} + +static inline void mem_cgroup_unlock_pages(void) +{ + rcu_read_unlock(); +} + +static inline bool task_in_memcg_oom(struct task_struct *p) +{ + return false; +} + +static inline bool mem_cgroup_oom_synchronize(bool wait) +{ + return false; +} + +#endif /* CONFIG_MEMCG */ + #endif /* _LINUX_MEMCONTROL_H */ diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 7d6ac4a4fb36..89d420793048 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -5,15 +5,9 @@ #include -void memcg1_remove_from_trees(struct mem_cgroup *memcg); - -static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) -{ - WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX); -} +/* Cgroup v1 and v2 common declarations */ void mem_cgroup_charge_statistics(struct mem_cgroup *memcg, int nr_pages); -void memcg1_check_events(struct mem_cgroup *memcg, int nid); int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, unsigned int nr_pages); @@ -29,30 +23,6 @@ static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n); void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n); -bool memcg1_wait_acct_move(struct mem_cgroup *memcg); -struct cgroup_taskset; -int memcg1_can_attach(struct cgroup_taskset *tset); -void memcg1_cancel_attach(struct cgroup_taskset *tset); -void memcg1_move_task(void); - -/* - * Per memcg event counter is incremented at every pagein/pageout. With THP, - * it will be incremented by the number of pages. This counter is used - * to trigger some periodic events. This is straightforward and better - * than using jiffies etc. to handle periodic memcg event. - */ -enum mem_cgroup_events_target { - MEM_CGROUP_TARGET_THRESH, - MEM_CGROUP_TARGET_SOFTLIMIT, - MEM_CGROUP_NTARGETS, -}; - -/* Whether legacy memory+swap accounting is active */ -static bool do_memsw_account(void) -{ - return !cgroup_subsys_on_dfl(memory_cgrp_subsys); -} - /* * Iteration constructs for visiting all cgroups (under a tree). If * loops are exited prematurely (break), mem_cgroup_iter_break() must @@ -68,24 +38,28 @@ static bool do_memsw_account(void) iter != NULL; \ iter = mem_cgroup_iter(NULL, iter, NULL)) -void memcg1_css_offline(struct mem_cgroup *memcg); +/* Whether legacy memory+swap accounting is active */ +static bool do_memsw_account(void) +{ + return !cgroup_subsys_on_dfl(memory_cgrp_subsys); +} -/* for encoding cft->private value on file */ -enum res_type { - _MEM, - _MEMSWAP, - _KMEM, - _TCP, +/* + * Per memcg event counter is incremented at every pagein/pageout. With THP, + * it will be incremented by the number of pages. This counter is used + * to trigger some periodic events. This is straightforward and better + * than using jiffies etc. to handle periodic memcg event. + */ +enum mem_cgroup_events_target { + MEM_CGROUP_TARGET_THRESH, + MEM_CGROUP_TARGET_SOFTLIMIT, + MEM_CGROUP_NTARGETS, }; bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg, enum mem_cgroup_events_target target); unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap); -bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked); -void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked); -void memcg1_oom_recover(struct mem_cgroup *memcg); - void drain_all_stock(struct mem_cgroup *root_memcg); unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg, unsigned int lru_mask, bool tree); @@ -100,6 +74,37 @@ unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item); unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item); int memory_stat_show(struct seq_file *m, void *v); +/* Cgroup v1-specific declarations */ + +void memcg1_remove_from_trees(struct mem_cgroup *memcg); + +static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) +{ + WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX); +} + +bool memcg1_wait_acct_move(struct mem_cgroup *memcg); + +struct cgroup_taskset; +int memcg1_can_attach(struct cgroup_taskset *tset); +void memcg1_cancel_attach(struct cgroup_taskset *tset); +void memcg1_move_task(void); +void memcg1_css_offline(struct mem_cgroup *memcg); + +/* for encoding cft->private value on file */ +enum res_type { + _MEM, + _MEMSWAP, + _KMEM, + _TCP, +}; + +bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked); +void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked); +void memcg1_oom_recover(struct mem_cgroup *memcg); + +void memcg1_check_events(struct mem_cgroup *memcg, int nid); + void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s); extern struct cftype memsw_files[]; From patchwork Tue Jun 25 00:59:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710407 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A997FC2BD09 for ; Tue, 25 Jun 2024 01:00:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 643046B0367; Mon, 24 Jun 2024 20:59:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5CA5B6B0368; Mon, 24 Jun 2024 20:59:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3CE3C6B0369; Mon, 24 Jun 2024 20:59:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 12BDA6B0367 for ; Mon, 24 Jun 2024 20:59:58 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 9C6461A157A for ; Tue, 25 Jun 2024 00:59:57 +0000 (UTC) X-FDA: 82267604034.01.02FCCA2 Received: from out-186.mta0.migadu.com (out-186.mta0.migadu.com [91.218.175.186]) by imf18.hostedemail.com (Postfix) with ESMTP id 7ACF01C0006 for ; Tue, 25 Jun 2024 00:59:55 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b="xZ/trUm1"; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf18.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.186 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277179; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=aKa/FfyHPEH4o1sFdYYdtwpHqrQqG+YVicTF0vyr7Es=; b=zq7gRRvlXq6CYhdDzRL/NGnCn/Yz29to1vSOEnKEi+ZpzIy5C/gv5caxbewF34ZUOfAj1C 8N4GUBJzy0DPt2aUM4XKkYMBp8RQGc12j5XVT+PkoxzuZ0y36X3UqWHz417dcvGsGcJOC0 Webl5/1/u2a45TuxdGG+KYy+TyTMLT0= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277179; a=rsa-sha256; cv=none; b=GVvy3CJ7WtttcRmvIJzBW6OwC6QMg6bd7ATGZde2b3VvjHxQzJFPLe5YXbHYdyLAJYQVOH r3SoSh23NyKpQUuxy72frHE0xh5+xDc0TP4VWgP4HlKokZsuYpD6PrFDO2bFhzXgURt9ft ytmXGV/tQPhzCUxPCui4BDFy3yxE+KU= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b="xZ/trUm1"; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf18.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.186 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277194; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aKa/FfyHPEH4o1sFdYYdtwpHqrQqG+YVicTF0vyr7Es=; b=xZ/trUm1bPRxjkxHdfwXYw23D8g3nT4RuyvbkwX3GrpmF6OUjQPtABVAGUzjlLOXtMZzFy N73Jm3sr4Sw/t537Ch/wnspIKfD3obUNU29cQ4gbrd662+v0CBRE7p65+BiGKRMKacyg7m gBsWfxX0p2NJAop6892EaPa5rPKrxWk= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 13/14] mm: memcg: put cgroup v1-related members of task_struct under config option Date: Mon, 24 Jun 2024 17:59:05 -0700 Message-ID: <20240625005906.106920-14-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 7ACF01C0006 X-Stat-Signature: ampt5mt3egcbcgnqs5s6emdu4ijrkff4 X-Rspam-User: X-HE-Tag: 1719277195-623815 X-HE-Meta: U2FsdGVkX18Ti9deGHlcNedNqBvNYXMLu688uqaUtZJQxYCi5QBwJi80GUp1DyGb75XJSSfJ6QP3pRXi3cv/OhrYeRKbYrNe+7S1jiosqY6TN0j9i6/eZd5h/Aqp4luDbmJp8d5WKZRy30dg3+4rLP57QEvMooaM/EkVbr1tEKFexLGDTUaYHU2M8TB2t6SQDpb4EJHgHhAn543Xlu8OFX43FpDpZhLjCmjcWtK1MN9G9vnWsxeGwfkbaNERE/P5J/H5yiyrdYWB0QtqExuu/2CuAlWlobgrAn38EAaETBfjRcm2TRCkFLO8kzMQDwlow4GMhzpr19kgJUI5MpGvXpoS1ayQDUh71FpKOuxZr520UzIgXDvjVUzE0XMJ3QqjNUS1LFZz8Cq/gZelNJi0djbmuiBl6hU0QX/TLSkjn7hgqSHUaS+lHIyJWlXLuD8ZbxXj/QbWvZYudJqF5dCD0ufJDMwID2R+lyWAFYiEJHVETZ33NwZDvbmaFbmpbcFItiU9wdxbCM4uF4DVKpZ+Dm1Eg/xsHTIAd9Npq0v5knkA04R2FkYlog++mBgzxawsJcKFqe12K6UyfKdD30o+xNBwJERnjXrZ0dYXL87XbbrKPsF/yDZ/VMADaT2VA0V5rVzr19O1ZDpDZBgnyPI9TdRaXE4DOwZsz8wnE3lalbOEajDLdVORTV1SYmt9TDMgI8ZkxmP5SOww2EmoqFv1LOSqVJ06AnhfeoMa5X1VZ4bLHryPJlFxqemDsoXutm8lpZuELewtOuWEdFRTO6+DS7Koztg10JlMyPFGF1EgbXviKgDmsXumY3OI9qmwu1q6qEHSmbf30Y48OdP8ElqDm3AhfrFhnMN26acNiXv+EzN5sB5+2agJfPdeNRhr4PSI6DbyE2n7jl7wYt5SrIbUHSS6BzWsz8QDTS+NneLkj3N6iGGSx5ThYCcY1KvteLblX8CkXm8nrL8Ra0f/Dex fcop44ZV 7H7XK3FxzPUgAi7ywHB9g+SvGMXskk6dO3ExdxLS76lA3gXwrFqOE/2Ki8DV93/ZkbCDxb7s5ItDOpA7EOSpi2I7Dk2Dd3nsaYqTmMbolJSfCEpS6s/j65QZpxiJZFjyFbSU3zfrprAyrq3+3BiDmODjzgZnztaqLjVcS X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Guard cgroup v1-related members of task_struct under the CONFIG_MEMCG_V1 config option, so that users who adopted cgroup v2 don't have to waste the memory for fields which are never accessed. Signed-off-by: Roman Gushchin Acked-by: Michal Hocko --- include/linux/memcontrol.h | 6 +++--- init/Kconfig | 9 +++++++++ mm/Makefile | 3 ++- mm/memcontrol-v1.h | 21 ++++++++++++++++++++- mm/memcontrol.c | 10 +++++++--- 5 files changed, 41 insertions(+), 8 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index a70d64ed04f5..796cfa842346 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1851,7 +1851,7 @@ static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg) /* Cgroup v1-related declarations */ -#ifdef CONFIG_MEMCG +#ifdef CONFIG_MEMCG_V1 unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, gfp_t gfp_mask, unsigned long *total_scanned); @@ -1883,7 +1883,7 @@ static inline void mem_cgroup_unlock_pages(void) rcu_read_unlock(); } -#else /* CONFIG_MEMCG */ +#else /* CONFIG_MEMCG_V1 */ static inline unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order, gfp_t gfp_mask, @@ -1922,6 +1922,6 @@ static inline bool mem_cgroup_oom_synchronize(bool wait) return false; } -#endif /* CONFIG_MEMCG */ +#endif /* CONFIG_MEMCG_V1 */ #endif /* _LINUX_MEMCONTROL_H */ diff --git a/init/Kconfig b/init/Kconfig index febdea2afc3b..5191b6435b4e 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -969,6 +969,15 @@ config MEMCG help Provides control over the memory footprint of tasks in a cgroup. +config MEMCG_V1 + bool "Legacy memory controller" + depends on MEMCG + default n + help + Legacy cgroup v1 memory controller. + + San N is unsure. + config MEMCG_KMEM bool depends on MEMCG diff --git a/mm/Makefile b/mm/Makefile index 124d4dea2035..d2915f8c9dc0 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -96,7 +96,8 @@ obj-$(CONFIG_NUMA) += memory-tiers.o obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o obj-$(CONFIG_PAGE_COUNTER) += page_counter.o -obj-$(CONFIG_MEMCG) += memcontrol.o memcontrol-v1.o vmpressure.o +obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o +obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o ifdef CONFIG_SWAP obj-$(CONFIG_MEMCG) += swap_cgroup.o endif diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h index 89d420793048..64b053d7f131 100644 --- a/mm/memcontrol-v1.h +++ b/mm/memcontrol-v1.h @@ -75,7 +75,7 @@ unsigned long memcg_page_state_local_output(struct mem_cgroup *memcg, int item); int memory_stat_show(struct seq_file *m, void *v); /* Cgroup v1-specific declarations */ - +#ifdef CONFIG_MEMCG_V1 void memcg1_remove_from_trees(struct mem_cgroup *memcg); static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) @@ -110,4 +110,23 @@ void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s); extern struct cftype memsw_files[]; extern struct cftype mem_cgroup_legacy_files[]; +#else /* CONFIG_MEMCG_V1 */ + +static inline void memcg1_remove_from_trees(struct mem_cgroup *memcg) {} +static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) {} +static inline bool memcg1_wait_acct_move(struct mem_cgroup *memcg) { return false; } +static inline void memcg1_css_offline(struct mem_cgroup *memcg) {} + +static inline bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked) { return true; } +static inline void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked) {} +static inline void memcg1_oom_recover(struct mem_cgroup *memcg) {} + +static inline void memcg1_check_events(struct mem_cgroup *memcg, int nid) {} + +static inline void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) {} + +extern struct cftype memsw_files[]; +extern struct cftype mem_cgroup_legacy_files[]; +#endif /* CONFIG_MEMCG_V1 */ + #endif /* __MM_MEMCONTROL_V1_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c7341e811945..d2e1f8baeae8 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4471,18 +4471,20 @@ struct cgroup_subsys memory_cgrp_subsys = { .css_free = mem_cgroup_css_free, .css_reset = mem_cgroup_css_reset, .css_rstat_flush = mem_cgroup_css_rstat_flush, - .can_attach = memcg1_can_attach, #if defined(CONFIG_LRU_GEN) || defined(CONFIG_MEMCG_KMEM) .attach = mem_cgroup_attach, #endif - .cancel_attach = memcg1_cancel_attach, - .post_attach = memcg1_move_task, #ifdef CONFIG_MEMCG_KMEM .fork = mem_cgroup_fork, .exit = mem_cgroup_exit, #endif .dfl_cftypes = memory_files, +#ifdef CONFIG_MEMCG_V1 + .can_attach = memcg1_can_attach, + .cancel_attach = memcg1_cancel_attach, + .post_attach = memcg1_move_task, .legacy_cftypes = mem_cgroup_legacy_files, +#endif .early_init = 0, }; @@ -5653,7 +5655,9 @@ static int __init mem_cgroup_swap_init(void) return 0; WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, swap_files)); +#ifdef CONFIG_MEMCG_V1 WARN_ON(cgroup_add_legacy_cftypes(&memory_cgrp_subsys, memsw_files)); +#endif #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP) WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, zswap_files)); #endif From patchwork Tue Jun 25 00:59:06 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 13710408 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8E8B5C2BD09 for ; Tue, 25 Jun 2024 01:00:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 13AD06B00D9; Mon, 24 Jun 2024 21:00:00 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0BF886B036A; Mon, 24 Jun 2024 20:59:59 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DE2636B0369; Mon, 24 Jun 2024 20:59:59 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id B4D8C6B009A for ; Mon, 24 Jun 2024 20:59:59 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 7E389121527 for ; Tue, 25 Jun 2024 00:59:59 +0000 (UTC) X-FDA: 82267604118.20.F79E337 Received: from out-186.mta0.migadu.com (out-186.mta0.migadu.com [91.218.175.186]) by imf25.hostedemail.com (Postfix) with ESMTP id 60FCEA0012 for ; Tue, 25 Jun 2024 00:59:57 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b="cUo/vEeX"; spf=pass (imf25.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.186 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719277184; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=IUWh34mKvDA5ZX4ELdIfwkO9YFkJ3Clj/+3O87GvqhQ=; b=3u4H64d43w8FAGCvnI5Cyi3icYEMI4YKZg1x2zMu294H11LUg/CKUEUuh1DF4DPxZVqhPx R5hkW7xMIs6+zzJPhaJh+8vpVDosKcJuiA1EpwHbwR777pA9YiuTISCT8wF4ER92GUmipG 841EPjQ5/vXwhW1M5NzQ8xuO0cOWiY0= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719277184; a=rsa-sha256; cv=none; b=lo0hYUuXr46R4UPWFmmVHoohKrhgVoAbwwN2DyzojitkFJC8+L/NQhXD55ClH2cxyBybub 7AMGChgPXEyazKf1qow9fw79R/aLPICdTqOBEkEoprjTsRP/Cm2YLz8ayt8RagvBtmMotb PaG5mfrsnk1ertZnqJVYPX28pHbffPo= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b="cUo/vEeX"; spf=pass (imf25.hostedemail.com: domain of roman.gushchin@linux.dev designates 91.218.175.186 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev X-Envelope-To: akpm@linux-foundation.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1719277196; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IUWh34mKvDA5ZX4ELdIfwkO9YFkJ3Clj/+3O87GvqhQ=; b=cUo/vEeXYjeXJIY2wMDo5RPQ1Z5aqrE///lMqhQ/e22bOBg2cLi6CMQ29cSj37a4yT3qEg DL2MBC7H/y1neuQrubwTKXQ6XXXymtTO5RkYGmrEqVJC7icFj4khgDg89CwZfQ2yUNBiF6 cpLXh1XL96ClOj+cqZSLEtHEHkRQoCk= X-Envelope-To: hannes@cmpxchg.org X-Envelope-To: mhocko@kernel.org X-Envelope-To: shakeel.butt@linux.dev X-Envelope-To: muchun.song@linux.dev X-Envelope-To: linux-kernel@vger.kernel.org X-Envelope-To: cgroups@vger.kernel.org X-Envelope-To: linux-mm@kvack.org X-Envelope-To: roman.gushchin@linux.dev X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Andrew Morton Cc: Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Roman Gushchin Subject: [PATCH v2 14/14] MAINTAINERS: add mm/memcontrol-v1.c/h to the list of maintained files Date: Mon, 24 Jun 2024 17:59:06 -0700 Message-ID: <20240625005906.106920-15-roman.gushchin@linux.dev> In-Reply-To: <20240625005906.106920-1-roman.gushchin@linux.dev> References: <20240625005906.106920-1-roman.gushchin@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Rspamd-Queue-Id: 60FCEA0012 X-Stat-Signature: tdt5rh3b87uno3c8zex7bc4er9gkgqgj X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1719277197-188803 X-HE-Meta: U2FsdGVkX1893nmLkeqLMpnsyEN9WLpunYihbl0CK5aMMRS044U2VN55C5fqzZSfBNcDpOtvg/FZE+91VaI1zp04KjYmihHCqzVceXOtlMt/9PSnnT/Y2AnW2nawOjV2hkEySdbtMSMPGgpXqLyxmIi4n4rLzl8dbOKAcGeaUW/eP8SyXcdDiXgYrlRjTMdvYwJerDJLa1ZERzf0GsNAx5GsRMbQ9yuJc5/A+wD8vaf17FePfuvIMhF7jPJXdQbIr8mH9cIwFhfLfK8KHIJ/0YHSgGGx6Sk7/+hDiQ8luiXhcEJlH9V2fHjvH09tZRslaJdutwVmjesigBI+t88p7eAcK7XfwXLCLiMyQuhgXijYxaXeVjYA3lsjyvZ1A2F68Z4Ngvs+PJAT21NjKZaiNY6mn7WxB5eGo9fBxMaXzMJX4IpXQdiMU9pZczKVtv9U/mozLBgBUuCarP7pjXo1GLrhFSfNKikQsWJz1wu48nqomswlWj7lafexRMTmnwF42Lt36pHSWgxEVGgpJ1vBa+nEFNUCTNCLh6UZ0L3YHO9TvKYiq4yecMfpcjhZ2i9r9x3LYiuvBkdVec7v8/BBrjhV8H0syraCLT9bR2C/grgjVal1cs7nD7ZktQpWU0gUQG1CVRjFtNsyqQYmKO0xPT2Z3b/lRdGNvaHj3y8prboP7d42sJcokcqpVq2+/Ouz8USTRXKxTRBqzN/+wi4jt/rVmtpol/dBxI0tgVVaTbNinojERsHj/KeUrL82lm+7Xqjnnr7FF8rMtwvHqK2/ta1jiYJ+4zFtOOrGfL5yVnOmXrq3pR9G/9tKJzwLEYMzMlNyiJBj1BKBIBW46inFdaNgeS/f/YLQela6utkn25F2b06HNom/1zkXYsvxAP/NlDqxv/3oL+J525cRmyrE6YUUwo08mJankb2y/zlpPKjYUrgsXSU435T9OnsZiuQvOGd+lIYmnPevGM4agzz sXJCPrOa iVLeZqeGqpIR0ZLWxt5dNVBbh99XGaU/xP54wynmiedvnPKToIhFoxK61QvJvk095JSP07cXzOU4uC/Ukz+ogPrvef3OyUT52GHM69gVyi+63OIZZJh0SIvrIg96a/8/CYvrQLn0m6M8XujJfL1vPlpnmEVViE6/pRohPbj2N32jMyzw/+DRT/jEeBEPSSA9RRNtk X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Signed-off-by: Roman Gushchin --- MAINTAINERS | 2 ++ 1 file changed, 2 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 7ad96cbb9f28..52a4089746b3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5582,6 +5582,8 @@ L: linux-mm@kvack.org S: Maintained F: include/linux/memcontrol.h F: mm/memcontrol.c +F: mm/memcontrol-v1.c +F: mm/memcontrol-v1.h F: mm/swap_cgroup.c F: samples/cgroup/* F: tools/testing/selftests/cgroup/memcg_protection.m