From patchwork Wed Feb 17 15:32:37 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12091801
From: Muchun Song
To: hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com,
 akpm@linux-foundation.org
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Muchun Song, Shakeel Butt, Michal Hocko,
 stable@vger.kernel.org
Subject: [PATCH v3] mm: memcontrol: fix swap undercounting in cgroup2
Date: Wed, 17 Feb 2021 23:32:37 +0800
Message-Id: <20210217153237.92484-1-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.21.0 (Apple Git-122)

When pages are swapped in, the VM may retain the swap copy to avoid
repeated writes in the future. It's also retained if shared pages are
faulted back in some processes, but not in others. During that time we
have an in-memory copy of the page, as well as an on-swap copy. Cgroup1
and cgroup2 handle these overlapping lifetimes slightly differently
due to the nature of how they account memory and swap:

Cgroup1 has a unified memory+swap counter that tracks a data page
regardless of whether it's in-core or swapped out. On swapin, we
transfer the charge from the swap entry to the newly allocated
swapcache page, even though the swap entry might stick around for a
while. That's why we have a mem_cgroup_uncharge_swap() call inside
mem_cgroup_charge().

Cgroup2 tracks memory and swap as separate, independent resources and
thus has split memory and swap counters.
On swapin, we charge the newly allocated swapcache page as memory,
while the swap slot in turn must remain charged to the swap counter as
long as it's allocated too.

The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol:
make swap tracking an integral part of memory control"), because it
accidentally removed the do_memsw_account() check in the branch inside
mem_cgroup_charge() that was supposed to tell the difference between
the charge transfer in cgroup1 and the separate counters in cgroup2.

As a result, cgroup2 currently undercounts retained swap to varying
degrees: swap slots are cached up to 50% of the configured limit or
total available swap space; partially faulted back shared pages are
only limited by physical capacity. This in turn allows cgroups to
significantly overconsume their allotted swap space.

Add the do_memsw_account() check back to fix this problem.

Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control")
Signed-off-by: Muchun Song
Acked-by: Johannes Weiner
Reviewed-by: Shakeel Butt
Acked-by: Michal Hocko
Cc: stable@vger.kernel.org # 5.8+
---
v3:
 - Replace !cgroup_subsys_on_dfl(memory_cgrp_subsys) with
   do_memsw_account(). Thanks to Shakeel.

v2:
 - Update commit log and add a comment to the code. Many thanks to
   Johannes.

 mm/memcontrol.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ed5cc78a8dbf..b5a66b98af74 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6771,7 +6771,19 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 	memcg_check_events(memcg, page);
 	local_irq_enable();
 
-	if (PageSwapCache(page)) {
+	/*
+	 * Cgroup1's unified memory+swap counter has been charged with the
+	 * new swapcache page, finish the transfer by uncharging the swap
+	 * slot. The swap slot would also get uncharged when it dies, but
+	 * it can stick around indefinitely and we'd count the page twice
+	 * the entire time.
+	 *
+	 * Cgroup2 has separate resource counters for memory and swap,
+	 * so this is a non-issue here. Memory and swap charge lifetimes
+	 * correspond 1:1 to page and swap slot lifetimes: we charge the
+	 * page to memory here, and uncharge swap when the slot is freed.
+	 */
+	if (do_memsw_account() && PageSwapCache(page)) {
 		swp_entry_t entry = { .val = page_private(page) };
 		/*
 		 * The swap entry might not get freed for a long time,