From: Cai Xinchen <caixinchen1@huawei.com>
Date: Mon, 20 Mar 2023 03:06:47 +0000
Subject: [PATCH 0/1] Fix vmstat_percpu incorrect subtraction after reparent
Message-ID: <20230320030648.50663-1-caixinchen1@huawei.com>
Hello,

I saw the patch series "Use obj_cgroup APIs to charge the LRU pages":
Link: https://lore.kernel.org/all/20220621125658.64935-1-songmuchun@bytedance.com/

Two problems remain. Consider the following hierarchy:

            root
           /    \
          A      B
         / \      \
        C   E      D

1. In some reparenting cases, page cache that is still being used by another memcg (D) ends up charged to A, the parent of the dying memcg E. D gets to use the page for free while A is taxed for it. The page may be shared by many memcgs, so it is hard to select which memcg it should be recharged to.
The recharge method itself is also a question. For example, suppose the user runs rmdir on E. If we recharge the page to D, some pages of the processes attached to D may be reclaimed. The user may be confused to see that removing E causes the processes attached to D to reclaim their pages and run slower. Also, in cgroup v2 a page is charged to the memcg that allocates it, and its stats are counted to that memcg's parent as well. Reparenting seems consistent with that rule.

2. The stats problem of vmstats_percpu. When memcg C is offlined, its pages are reparented to its parent memcg P. At that point P->vmstats (hierarchical) includes those pages, but P->vmstats_percpu (non-hierarchical) does not. When those pages are uncharged, P->vmstats (hierarchical) decreases, which is correct, but P->vmstats_percpu (non-hierarchical) also decreases, which is wrong, because those stats were never added to P->vmstats_percpu in the first place. If the reparented memory exceeds the original non-hierarchical memory in P, some entries shown in memory.stat, such as cache, will read zero (a negative value is shown as 0).

I think propagating the vmstats_percpu stats of the dying memcg to its parent can solve this problem. If we do not propagate them and the reparented memory exceeds the original non-hierarchical memory in P, then a derived value such as (hierarchical_usage - non-hierarchical_usage - children_hierarchical_usage) may be meaningless, because the non-hierarchical usage is shown as 0 while the underlying value is actually negative.

I would also like to ask for your opinions on problem 1: how should charging pages to a memcg be defined once that memcg has died?

Cai Xinchen (1):
  mm: memcontrol: fix vmstats_percpu state incorrect subtraction after reparent

 kernel/cgroup/cgroup.c |  5 +++++
 mm/memcontrol.c        | 43 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)
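To make problem 2 concrete, below is a minimal userspace C model of the two counters, written under stated assumptions; it is not the patch itself. struct memcg_sim, mod_state(), propagate_local_stats() and run_scenario() are hypothetical names. A single hier field stands in for P->vmstats, a small per-CPU array stands in for P->vmstats_percpu, and hierarchical updates are applied eagerly rather than the way the kernel actually aggregates them. The scenario: P owns 10 pages, its child C owns 30, C goes offline, and the 30 reparented pages are later uncharged against P.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical model, not mm/memcontrol.c:
 *   hier    ~ memcg->vmstats        (hierarchical, includes descendants)
 *   local[] ~ memcg->vmstats_percpu (non-hierarchical, one slot per CPU)
 */
#define NR_CPUS_SIM 2

struct memcg_sim {
	struct memcg_sim *parent;
	long hier;
	long local[NR_CPUS_SIM];
};

/* Charge (nr > 0) or uncharge (nr < 0) on a CPU: update the owner's
 * per-CPU counter and the hierarchical counter of the owner and all
 * of its ancestors. */
static void mod_state(struct memcg_sim *memcg, int cpu, long nr)
{
	memcg->local[cpu] += nr;
	for (struct memcg_sim *m = memcg; m; m = m->parent)
		m->hier += nr;
}

/* Raw sum of the per-CPU deltas; may be negative. */
static long local_raw(const struct memcg_sim *memcg)
{
	long x = 0;

	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		x += memcg->local[cpu];
	return x;
}

/* Value as reported to the user: negative sums are shown as 0. */
static long local_shown(const struct memcg_sim *memcg)
{
	long x = local_raw(memcg);

	return x < 0 ? 0 : x;
}

/* Sketch of the proposed idea: when the child goes offline, fold its
 * non-hierarchical per-CPU counts into the parent. The parent's
 * hierarchical count already includes them, so it is left unchanged. */
static void propagate_local_stats(struct memcg_sim *child)
{
	assert(child->parent != NULL);
	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++) {
		child->parent->local[cpu] += child->local[cpu];
		child->local[cpu] = 0;
	}
}

/* Runs the scenario and returns P's raw non-hierarchical count. */
static long run_scenario(int propagate)
{
	struct memcg_sim p = { 0 };
	struct memcg_sim c = { .parent = &p };

	mod_state(&p, 0, 10);	/* P's own pages */
	mod_state(&c, 1, 30);	/* C's pages, also counted in P->hier */

	/* C is offlined and its pages are reparented to P. */
	if (propagate)
		propagate_local_stats(&c);

	/* The reparented pages are later uncharged against P. */
	mod_state(&p, 0, -30);

	printf("propagate=%d: P hier=%ld, P local raw=%ld, shown=%ld\n",
	       propagate, p.hier, local_raw(&p), local_shown(&p));
	return local_raw(&p);
}
```

In this model, run_scenario(0) leaves P's hierarchical count at a correct 10 but its raw non-hierarchical count at -20, which would be shown as 0. run_scenario(1) leaves both at 10. Even with propagation, one per-CPU slot of P ends up at -20; only the sum across CPUs is meaningful.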