From patchwork Thu Sep 26 22:55:28 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ivan Shapovalov
X-Patchwork-Id: 13813723
From: Ivan Shapovalov
To: linux-kernel@vger.kernel.org
Cc: Mike Yuan, Ivan Shapovalov, Tejun Heo, Zefan Li, Johannes Weiner,
 Michal Koutný, Jonathan Corbet, Yosry Ahmed, Nhat Pham, Chengming Zhou,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 Chris Li, cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org
Subject: [PATCH] zswap: improve memory.zswap.writeback inheritance
Date: Fri, 27 Sep 2024 00:55:28 +0200
Message-ID: <20240926225531.700742-1-intelfx@intelfx.name>
X-Mailer: git-send-email 2.46.1.5.g6e13a7cc9d

Improve the inheritance behavior of the `memory.zswap.writeback` cgroup
attribute introduced during the 6.11 cycle. Specifically, in 6.11 we
walk the parent cgroups until we find a _disabled_ writeback, which does
not allow the user to selectively enable zswap writeback while having it
disabled by default.

Instead, introduce a third value for the `memory.zswap.writeback` cgroup
attribute meaning "follow the parent", and use it as the default value
for all cgroups. Additionally, introduce a `zswap.writeback_enabled`
module parameter which is used as the "parent" of the root cgroup,
thus making it the global default.

Reads from the `memory.zswap.writeback` cgroup attribute are coerced to
the effective value (0 or 1) to maintain backwards compatibility.
However, it is possible to write -1 to that attribute to make the cgroup
follow the parent again.

Fixes: e39925734909 ("mm/memcontrol: respect zswap.writeback setting from parent cg too")
Fixes: 501a06fe8e4c ("zswap: memcontrol: implement zswap writeback disabling")
Signed-off-by: Ivan Shapovalov
---
 Documentation/admin-guide/cgroup-v2.rst | 17 +++++++++++------
 Documentation/admin-guide/mm/zswap.rst  |  9 ++++++++-
 include/linux/memcontrol.h              |  3 ++-
 include/linux/zswap.h                   |  6 ++++++
 mm/memcontrol.c                         | 24 +++++++++++++++++-------
 mm/zswap.c                              |  9 +++++++++
 6 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 95c18bc17083..eea580490679 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1717,10 +1717,12 @@ The following nested keys are defined.
 	entries fault back in or are written out to disk.
 
   memory.zswap.writeback
-	A read-write single value file. The default value is "1".
-	Note that this setting is hierarchical, i.e. the writeback would be
-	implicitly disabled for child cgroups if the upper hierarchy
-	does so.
+	A read-write single value file. The default is to follow the parent
+	cgroup configuration, and the root cgroup follows the global
+	``zswap.writeback_enabled`` module parameter (which is 1 by default).
+	Thus, this setting is hierarchical, i.e. the writeback setting for
+	a child cgroup can be implicitly controlled at runtime by changing
+	any parent value or the global module parameter.
 
 	When this is set to 0, all swapping attempts to swapping devices
 	are disabled. This included both zswap writebacks, and swapping due
@@ -1729,8 +1731,11 @@ The following nested keys are defined.
 	reclaim inefficiency after disabling writeback (because the same
 	pages might be rejected again and again).
 
-	Note that this is subtly different from setting memory.swap.max to
-	0, as it still allows for pages to be written to the zswap pool.
+	Note that this is different from setting memory.swap.max to 0,
+	as it still allows for pages to be written to the zswap pool.
+
+	This can also be set to -1, which would make the cgroup (and its
+	future children) follow the parent/global value again.
 
   memory.pressure
 	A read-only nested-keyed file.
diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index 3598dcd7dbe7..20267b8893db 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -126,10 +126,17 @@ Setting this parameter to 100 will disable the hysteresis.
 
 Some users cannot tolerate the swapping that comes with zswap store failures
 and zswap writebacks. Swapping can be disabled entirely (without disabling
-zswap itself) on a cgroup-basis as follows::
+zswap itself) either globally, using the ``writeback_enabled`` sysfs attribute,
+or on a per-cgroup basis, e.g.::
+
+	echo 0 > /sys/module/zswap/parameters/writeback_enabled
 
 	echo 0 > /sys/fs/cgroup//memory.zswap.writeback
 
+All cgroups follow (i.e. dynamically inherit) the parent configuration
+by default, and the root cgroup follows the module parameter (which can
+thus be considered the global default).
+
 Note that if the store failures are recurring (for e.g if the pages are
 incompressible), users can observe reclaim inefficiency after disabling
 writeback (because the same pages might be rejected again and again).
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0e5bf25d324f..ca0510057040 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -202,8 +202,9 @@ struct mem_cgroup {
 	/*
 	 * Prevent pages from this memcg from being written back from zswap to
 	 * swap, and from being swapped out on zswap store failures.
+	 * (< 0: follow the parent/global default)
 	 */
-	bool zswap_writeback;
+	int zswap_writeback;
 #endif
 
 	/* vmpressure notifications */
diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 6cecb4a4f68b..7d121fdb3521 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -37,6 +37,7 @@ void zswap_lruvec_state_init(struct lruvec *lruvec);
 void zswap_folio_swapin(struct folio *folio);
 bool zswap_is_enabled(void);
 bool zswap_never_enabled(void);
+bool zswap_writeback_is_enabled(void);
 #else
 
 struct zswap_lruvec_state {};
@@ -71,6 +72,11 @@ static inline bool zswap_never_enabled(void)
 	return true;
 }
 
+static inline bool zswap_writeback_is_enabled(void)
+{
+	return true;
+}
+
 #endif
 
 #endif /* _LINUX_ZSWAP_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d563fb515766..1e0aca42e5a7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3613,7 +3613,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	memcg1_soft_limit_reset(memcg);
 #ifdef CONFIG_ZSWAP
 	memcg->zswap_max = PAGE_COUNTER_MAX;
-	WRITE_ONCE(memcg->zswap_writeback, true);
+	WRITE_ONCE(memcg->zswap_writeback, -1);
 #endif
 	page_counter_set_high(&memcg->swap, PAGE_COUNTER_MAX);
 	if (parent) {
@@ -5318,15 +5318,25 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
 
 bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
 {
+	int memcg_zswap_writeback;
+
 	/* if zswap is disabled, do not block pages going to the swapping device */
 	if (!zswap_is_enabled())
 		return true;
 
-	for (; memcg; memcg = parent_mem_cgroup(memcg))
-		if (!READ_ONCE(memcg->zswap_writeback))
-			return false;
+	/*
+	 * -1 means "follow the parent" (root cgroup follows the global default).
+	 * Walk cgroups until we find something, otherwise return the global default.
+	 */
+	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
+		memcg_zswap_writeback = READ_ONCE(memcg->zswap_writeback);
+		if (memcg_zswap_writeback >= 0)
+			goto found;
+	}
+	return zswap_writeback_is_enabled();
 
-	return true;
+found:
+	return !!memcg_zswap_writeback;
 }
 
 static u64 zswap_current_read(struct cgroup_subsys_state *css,
@@ -5365,7 +5375,7 @@ static int zswap_writeback_show(struct seq_file *m, void *v)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
 
-	seq_printf(m, "%d\n", READ_ONCE(memcg->zswap_writeback));
+	seq_printf(m, "%d\n", mem_cgroup_zswap_writeback_enabled(memcg));
 	return 0;
 }
 
@@ -5379,7 +5389,7 @@ static ssize_t zswap_writeback_write(struct kernfs_open_file *of,
 	if (parse_ret)
 		return parse_ret;
 
-	if (zswap_writeback != 0 && zswap_writeback != 1)
+	if (zswap_writeback < -1 || zswap_writeback > 1)
 		return -EINVAL;
 
 	WRITE_ONCE(memcg->zswap_writeback, zswap_writeback);
diff --git a/mm/zswap.c b/mm/zswap.c
index adeaf9c97fde..724d8a02d61c 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -129,6 +129,10 @@ static bool zswap_shrinker_enabled = IS_ENABLED(
 		CONFIG_ZSWAP_SHRINKER_DEFAULT_ON);
 module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);
 
+/* Enable/disable zswap writeback globally */
+static bool zswap_writeback_enabled = true;
+module_param_named(writeback_enabled, zswap_writeback_enabled, bool, 0644);
+
 bool zswap_is_enabled(void)
 {
 	return zswap_enabled;
@@ -139,6 +143,11 @@ bool zswap_never_enabled(void)
 	return !static_branch_maybe(CONFIG_ZSWAP_DEFAULT_ON, &zswap_ever_enabled);
 }
 
+bool zswap_writeback_is_enabled(void)
+{
+	return zswap_writeback_enabled;
+}
+
 /*********************************
 * data structures
 **********************************/
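
For reviewers who want to reason about the new lookup outside the kernel, here is a minimal userspace sketch of the tri-state resolution described above. It is not part of the patch. `struct cg`, `cg_set_writeback()`, `cg_writeback_enabled()` and `global_writeback_enabled` are hypothetical stand-ins for `struct mem_cgroup`, `zswap_writeback_write()`, `mem_cgroup_zswap_writeback_enabled()` and the `writeback_enabled` module parameter. The sketch deliberately omits the `zswap_is_enabled()` early return and the `READ_ONCE()`/`WRITE_ONCE()` accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup: only what the walk needs. */
struct cg {
	struct cg *parent;	/* NULL for the root cgroup */
	int writeback;		/* -1: follow parent, 0: disabled, 1: enabled */
};

/* Hypothetical stand-in for the writeback_enabled module parameter. */
static bool global_writeback_enabled = true;

/* Same range check as the patched write handler: accept only -1, 0, 1. */
static int cg_set_writeback(struct cg *cg, int val)
{
	if (val < -1 || val > 1)
		return -1;	/* the kernel handler returns -EINVAL here */
	cg->writeback = val;
	return 0;
}

/* The nearest explicit (>= 0) setting wins; otherwise the global default. */
static bool cg_writeback_enabled(const struct cg *cg)
{
	for (; cg; cg = cg->parent) {
		assert(cg->writeback >= -1 && cg->writeback <= 1);
		if (cg->writeback >= 0)
			return cg->writeback != 0;
	}
	return global_writeback_enabled;
}
```

With the global default off and every cgroup left at -1, the whole tree resolves to disabled. Writing 1 to a single child then enables writeback for that child only, which is the selective enablement that the 6.11 behavior does not allow.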