From patchwork Mon Jul 15 20:36:26 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Finkel
X-Patchwork-Id: 13733897
From: David Finkel
To: Muchun Song, Andrew Morton
Cc: core-services@vimeo.com, Jonathan Corbet, Michal Hocko,
 Roman Gushchin, Shakeel Butt, Shuah Khan, Johannes Weiner, Tejun Heo,
 Zefan Li, cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-kselftest@vger.kernel.org, David Finkel
Subject: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers
Date: Mon, 15 Jul 2024 16:36:26 -0400
Message-Id: <20240715203625.1462309-2-davidf@vimeo.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240715203625.1462309-1-davidf@vimeo.com>
References: <20240715203625.1462309-1-davidf@vimeo.com>

Other mechanisms for querying the peak memory usage of either a process
or a v1 memory cgroup allow for resetting the high watermark. Restore
parity with those mechanisms.

For example:
 - Any write to memory.max_usage_in_bytes in a cgroup v1 mount resets
   the high watermark.
 - Writing "5" to the clear_refs pseudo-file in a process's proc
   directory resets the peak RSS.

This change copies the cgroup v1 behavior so any write to the
memory.peak and memory.swap.peak pseudo-files resets the high watermark
to the current usage.

This behavior is particularly useful for work scheduling systems that
need to track memory usage of worker processes/cgroups per-work-item.
Since memory can't be squeezed like CPU can (the OOM-killer has
opinions), these systems need to track the peak memory usage to compute
system/container fullness when binpacking work items.

Signed-off-by: David Finkel
Acked-by: Michal Hocko
---
 Documentation/admin-guide/cgroup-v2.rst       | 20 +++---
 mm/memcontrol.c                               | 23 ++++++
 .../selftests/cgroup/test_memcontrol.c        | 72 ++++++++++++++++---
 3 files changed, 99 insertions(+), 16 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 8fbb0519d556..201d8e5d9f82 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1322,11 +1322,13 @@ PAGE_SIZE multiple when read back.
 	reclaim induced by memory.reclaim.

   memory.peak
-	A read-only single value file which exists on non-root
-	cgroups.
+	A read-write single value file which exists on non-root cgroups.
+
+	The max memory usage recorded for the cgroup and its descendants since
+	either the creation of the cgroup or the most recent reset.

-	The max memory usage recorded for the cgroup and its
-	descendants since the creation of the cgroup.
+	Any non-empty write to this file resets it to the current memory usage.
+	All content written is completely ignored.

   memory.oom.group
 	A read-write single value file which exists on non-root
@@ -1652,11 +1654,13 @@ PAGE_SIZE multiple when read back.
 	Healthy workloads are not expected to reach this limit.

   memory.swap.peak
-	A read-only single value file which exists on non-root
-	cgroups.
+	A read-write single value file which exists on non-root cgroups.
+
+	The max swap usage recorded for the cgroup and its descendants since
+	the creation of the cgroup or the most recent reset.

-	The max swap usage recorded for the cgroup and its
-	descendants since the creation of the cgroup.
+	Any non-empty write to this file resets it to the current swap usage.
+	All content written is completely ignored.

   memory.swap.max
 	A read-write single value file which exists on non-root
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8f2f1bb18c9c..abfa547615d6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -25,6 +25,7 @@
  * Copyright (C) 2020 Alibaba, Inc, Alex Shi
  */

+#include
 #include
 #include
 #include
@@ -6915,6 +6916,16 @@ static u64 memory_peak_read(struct cgroup_subsys_state *css,
 	return (u64)memcg->memory.watermark * PAGE_SIZE;
 }

+static ssize_t memory_peak_write(struct kernfs_open_file *of,
+				 char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+	page_counter_reset_watermark(&memcg->memory);
+
+	return nbytes;
+}
+
 static int memory_min_show(struct seq_file *m, void *v)
 {
 	return seq_puts_memcg_tunable(m,
@@ -7232,6 +7243,7 @@ static struct cftype memory_files[] = {
 		.name = "peak",
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.read_u64 = memory_peak_read,
+		.write = memory_peak_write,
 	},
 	{
 		.name = "min",
@@ -8201,6 +8213,16 @@ static u64 swap_peak_read(struct cgroup_subsys_state *css,
 	return (u64)memcg->swap.watermark * PAGE_SIZE;
 }

+static ssize_t swap_peak_write(struct kernfs_open_file *of,
+			       char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+	page_counter_reset_watermark(&memcg->swap);
+
+	return nbytes;
+}
+
 static int swap_high_show(struct seq_file *m, void *v)
 {
 	return seq_puts_memcg_tunable(m,
@@ -8283,6 +8305,7 @@ static struct cftype swap_files[] = {
 		.name = "swap.peak",
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.read_u64 = swap_peak_read,
+		.write = swap_peak_write,
 	},
 	{
 		.name = "swap.events",
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 41ae8047b889..681972de673b 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -161,12 +161,12 @@ static int alloc_pagecache_50M_check(const char *cgroup, void *arg)
 /*
  * This test create a memory cgroup, allocates
  * some anonymous memory and some pagecache
- * and check memory.current and some memory.stat values.
+ * and checks memory.current, memory.peak, and some memory.stat values.
  */
-static int test_memcg_current(const char *root)
+static int test_memcg_current_peak(const char *root)
 {
 	int ret = KSFT_FAIL;
-	long current;
+	long current, peak, peak_reset;
 	char *memcg;

 	memcg = cg_name(root, "memcg_test");
@@ -180,12 +180,32 @@ static int test_memcg_current(const char *root)
 	if (current != 0)
 		goto cleanup;

+	peak = cg_read_long(memcg, "memory.peak");
+	if (peak != 0)
+		goto cleanup;
+
 	if (cg_run(memcg, alloc_anon_50M_check, NULL))
 		goto cleanup;

+	peak = cg_read_long(memcg, "memory.peak");
+	if (peak < MB(50))
+		goto cleanup;
+
+	peak_reset = cg_write(memcg, "memory.peak", "\n");
+	if (peak_reset != 0)
+		goto cleanup;
+
+	peak = cg_read_long(memcg, "memory.peak");
+	if (peak > MB(30))
+		goto cleanup;
+
 	if (cg_run(memcg, alloc_pagecache_50M_check, NULL))
 		goto cleanup;

+	peak = cg_read_long(memcg, "memory.peak");
+	if (peak < MB(50))
+		goto cleanup;
+
 	ret = KSFT_PASS;

 cleanup:
@@ -817,13 +837,14 @@ static int alloc_anon_50M_check_swap(const char *cgroup, void *arg)

 /*
  * This test checks that memory.swap.max limits the amount of
- * anonymous memory which can be swapped out.
+ * anonymous memory which can be swapped out. Additionally, it verifies that
+ * memory.swap.peak reflects the high watermark and can be reset.
  */
-static int test_memcg_swap_max(const char *root)
+static int test_memcg_swap_max_peak(const char *root)
 {
 	int ret = KSFT_FAIL;
 	char *memcg;
-	long max;
+	long max, peak;

 	if (!is_swap_enabled())
 		return KSFT_SKIP;
@@ -840,6 +861,12 @@ static int test_memcg_swap_max(const char *root)
 		goto cleanup;
 	}

+	if (cg_read_long(memcg, "memory.swap.peak"))
+		goto cleanup;
+
+	if (cg_read_long(memcg, "memory.peak"))
+		goto cleanup;
+
 	if (cg_read_strcmp(memcg, "memory.max", "max\n"))
 		goto cleanup;

@@ -862,6 +889,27 @@ static int test_memcg_swap_max(const char *root)
 	if (cg_read_key_long(memcg, "memory.events", "oom_kill ") != 1)
 		goto cleanup;

+	peak = cg_read_long(memcg, "memory.peak");
+	if (peak < MB(29))
+		goto cleanup;
+
+	peak = cg_read_long(memcg, "memory.swap.peak");
+	if (peak < MB(29))
+		goto cleanup;
+
+	if (cg_write(memcg, "memory.swap.peak", "\n"))
+		goto cleanup;
+
+	if (cg_read_long(memcg, "memory.swap.peak") > MB(10))
+		goto cleanup;
+
+
+	if (cg_write(memcg, "memory.peak", "\n"))
+		goto cleanup;
+
+	if (cg_read_long(memcg, "memory.peak"))
+		goto cleanup;
+
 	if (cg_run(memcg, alloc_anon_50M_check_swap, (void *)MB(30)))
 		goto cleanup;

@@ -869,6 +917,14 @@ static int test_memcg_swap_max(const char *root)
 	if (max <= 0)
 		goto cleanup;

+	peak = cg_read_long(memcg, "memory.peak");
+	if (peak < MB(29))
+		goto cleanup;
+
+	peak = cg_read_long(memcg, "memory.swap.peak");
+	if (peak < MB(19))
+		goto cleanup;
+
 	ret = KSFT_PASS;

 cleanup:
@@ -1295,7 +1351,7 @@ struct memcg_test {
 	const char *name;
 } tests[] = {
 	T(test_memcg_subtree_control),
-	T(test_memcg_current),
+	T(test_memcg_current_peak),
 	T(test_memcg_min),
 	T(test_memcg_low),
 	T(test_memcg_high),
@@ -1303,7 +1359,7 @@ struct memcg_test {
 	T(test_memcg_max),
 	T(test_memcg_reclaim),
 	T(test_memcg_oom_events),
-	T(test_memcg_swap_max),
+	T(test_memcg_swap_max_peak),
 	T(test_memcg_sock),
 	T(test_memcg_oom_group_leaf_events),
 	T(test_memcg_oom_group_parent_events),
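
For readers who want to exercise the new write handlers from userspace, here is a minimal sketch of how a work scheduler might read and reset the watermark between work items. It assumes the semantics described in this patch (any non-empty write to memory.peak or memory.swap.peak resets the watermark to the current usage, and the written content is ignored). The helper names peak_read() and peak_reset() are hypothetical and are not part of the patch or of any kernel or libc API.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the single decimal value (in bytes) exposed by
 * a cgroup v2 file such as memory.peak or memory.swap.peak.
 * Returns 0 on success and -1 on failure.
 */
int peak_read(const char *path, long long *value)
{
	char buf[64];
	char *end;
	ssize_t len;
	int fd;

	assert(path != NULL && value != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	len = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (len <= 0)
		return -1;
	buf[len] = '\0';

	errno = 0;
	*value = strtoll(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -1;

	return 0;
}

/*
 * Hypothetical helper: reset the watermark by writing a single newline.
 * This relies on the behavior proposed in this patch, where the content
 * of the write is ignored. Returns 0 on success and -1 on failure.
 */
int peak_reset(const char *path)
{
	ssize_t written;
	int fd;

	assert(path != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	written = write(fd, "\n", 1);
	close(fd);

	return written == 1 ? 0 : -1;
}
```

A scheduler could call peak_reset() on a worker cgroup's memory.peak before dispatching a work item and peak_read() on the same file once the item finishes. The path would typically look like /sys/fs/cgroup/<cgroup>/memory.peak, assuming cgroup v2 is mounted at /sys/fs/cgroup; the cgroup name is up to the caller.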