Date: Fri, 2 Dec 2022 17:11:19 -0800
Message-ID: <20221203011120.2361610-1-almasrymina@google.com>
Subject: [PATCH v1] [mm-unstable] mm: Fix memcg reclaim on memory tiered systems
From: Mina Almasry
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
    Muchun Song, Huang Ying, Yang Shi, Yosry Ahmed, weixugc@google.com,
    fvdl@google.com, Mina Almasry, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org

Commit 3f1509c57b1b ("Revert "mm/vmscan: never demote for memcg
reclaim"") enabled demotion in memcg reclaim, which is the right thing
to do. However, I suspect it introduced a regression in the behavior of
try_to_free_mem_cgroup_pages().

The callers of try_to_free_mem_cgroup_pages() expect it to attempt to
reclaim - not demote - nr_pages from the cgroup, i.e. the memory usage
of the cgroup should drop by nr_pages. The callers also expect
try_to_free_mem_cgroup_pages() to return the number of pages reclaimed,
not demoted.

However, try_to_free_mem_cgroup_pages() unconditionally counts demoted
pages as reclaimed pages. So in practice, when it is called, it will
often demote nr_pages and return the number of demoted pages to the
caller. Demoted pages don't lower the memcg usage, so I think
try_to_free_mem_cgroup_pages() is not actually doing what its callers
want it to do.

I suspect various things work suboptimally on memory tiered systems, or
don't work at all, due to this:

- memory.high enforcement likely doesn't work (it just demotes nr_pages
  instead of lowering the memcg usage by nr_pages).
- try_charge_memcg() will keep retrying the charge while
  try_to_free_mem_cgroup_pages() is just demoting pages and not
  actually making any room for the charge.
- memory.reclaim has a wonky interface. It advertises to the user that
  it reclaims the provided amount, but it will actually demote that
  amount.

There may be more effects of this issue.

To fix these issues, I propose that shrink_folio_list() only count pages
demoted from inside of sc->nodemask to outside of sc->nodemask as
'reclaimed'.

For callers such as reclaim_high() or try_charge_memcg() that set
sc->nodemask to NULL, try_to_free_mem_cgroup_pages() will try to
actually reclaim nr_pages and return the number of pages reclaimed. No
demoted pages would count towards the nr_pages requirement.

For callers such as memory_reclaim() that set sc->nodemask,
try_to_free_mem_cgroup_pages() will free nr_pages from that nodemask
with either reclaim or demotion.

Tested this change using the memory.reclaim interface. With this change,

	echo "1m" > memory.reclaim

will cause freeing of 1m of memory from the cgroup regardless of the
demotions happening inside.

	echo "1m nodes=0" > memory.reclaim

will cause freeing of 1m of node 0 by demotion if a demotion target is
available, and by reclaim if no demotion target is available.

Signed-off-by: Mina Almasry
---

This is developed on top of mm-unstable largely because I need the
memory.reclaim nodes= arg to test it properly.
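
For illustration only, the counting rule described above can be modelled in a few lines of userspace C. This is a rough sketch, not kernel code: the sketch_* names, the 64-bit mask standing in for nodemask_t, and the has_mask flag standing in for "sc->nodemask != NULL" are all assumptions made up for this example. The explicit check for a missing demotion target is also a choice of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker for "no demotion target"; not the kernel's constant. */
#define SKETCH_NO_NODE (-1)

/* Hypothetical stand-in for nodemask_t: bit n set means node n is in the mask. */
typedef uint64_t sketch_nodemask_t;

static bool sketch_node_isset(int node, sketch_nodemask_t mask)
{
	if (node < 0 || node >= 64)
		return false;
	return (mask >> node) & 1u;
}

/*
 * Return how many of nr_demoted folios count as reclaimed.
 *
 * has_mask models "sc->nodemask != NULL", src is the node being shrunk
 * and dst is its demotion target (SKETCH_NO_NODE if there is none).
 * Demotions count only when they move folios from inside the mask to
 * outside of it.
 */
static unsigned int sketch_demoted_counted(bool has_mask,
					   sketch_nodemask_t mask,
					   int src, int dst,
					   unsigned int nr_demoted)
{
	assert(src >= 0);

	if (!has_mask)			/* e.g. memory.high, charge path */
		return 0;
	if (dst == SKETCH_NO_NODE)	/* sketch-only guard: nothing to demote to */
		return 0;
	if (sketch_node_isset(src, mask) && !sketch_node_isset(dst, mask))
		return nr_demoted;
	return 0;
}
```

With a mask containing only node 0 and a demotion path from node 0 to node 1, the demoted folios count towards the request. With no mask, or with a mask covering both nodes, they do not, so the caller has to make progress through actual reclaim.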
---
 mm/vmscan.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--
2.39.0.rc0.267.gcb52ba06e7-goog

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2b42ac9ad755..8f6e993b870d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1653,6 +1653,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 	LIST_HEAD(free_folios);
 	LIST_HEAD(demote_folios);
 	unsigned int nr_reclaimed = 0;
+	unsigned int nr_demoted = 0;
 	unsigned int pgactivate = 0;
 	bool do_demote_pass;
 	struct swap_iocb *plug = NULL;
@@ -2085,7 +2086,17 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 	/* 'folio_list' is always empty here */
 
 	/* Migrate folios selected for demotion */
-	nr_reclaimed += demote_folio_list(&demote_folios, pgdat);
+	nr_demoted = demote_folio_list(&demote_folios, pgdat);
+
+	/*
+	 * Only count demoted folios as reclaimed if we demoted them from
+	 * inside of the nodemask to outside of the nodemask, hence reclaiming
+	 * pages in the nodemask.
+	 */
+	if (sc->nodemask && node_isset(pgdat->node_id, *sc->nodemask) &&
+	    !node_isset(next_demotion_node(pgdat->node_id), *sc->nodemask))
+		nr_reclaimed += nr_demoted;
+
 	/* Folios that could not be demoted are still in @demote_folios */
 	if (!list_empty(&demote_folios)) {
 		/* Folios which weren't demoted go back on @folio_list */