From patchwork Tue Dec 24 07:53:22 2019
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 11309155
From: Yafang Shao
To: hannes@cmpxchg.org, david@fromorbit.com, mhocko@kernel.org,
    vdavydov.dev@gmail.com, akpm@linux-foundation.org, viro@zeniv.linux.org.uk
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Yafang Shao
Subject: [PATCH v2 1/5] mm, memcg: reduce size of struct mem_cgroup by using
 bit field
Date: Tue, 24 Dec 2019 02:53:22 -0500
Message-Id: <1577174006-13025-2-git-send-email-laoar.shao@gmail.com>
In-Reply-To: <1577174006-13025-1-git-send-email-laoar.shao@gmail.com>
References: <1577174006-13025-1-git-send-email-laoar.shao@gmail.com>

Some members of struct mem_cgroup can only be either 0 (false) or
1 (true), so we can define them as bit fields to reduce the size of the
struct. With this patch the size of struct mem_cgroup can be reduced by
64 bytes in theory, but since there are some MEMCG_PADDING()s the real
saving depends on the cacheline size. Either way, this patch makes
struct mem_cgroup smaller.

Signed-off-by: Yafang Shao
---
 include/linux/memcontrol.h | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a7a0a1a5..612a457 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -229,20 +229,26 @@ struct mem_cgroup {
 	/*
 	 * Should the accounting and control be hierarchical, per subtree?
 	 */
-	bool use_hierarchy;
+	unsigned int use_hierarchy : 1;
 
 	/*
 	 * Should the OOM killer kill all belonging tasks, had it kill one?
 	 */
-	bool oom_group;
+	unsigned int oom_group : 1;
 
 	/* protected by memcg_oom_lock */
-	bool oom_lock;
-	int under_oom;
+	unsigned int oom_lock : 1;
 
-	int swappiness;
 	/* OOM-Killer disable */
-	int oom_kill_disable;
+	unsigned int oom_kill_disable : 1;
+
+	/* Legacy tcp memory accounting */
+	unsigned int tcpmem_active : 1;
+	unsigned int tcpmem_pressure : 1;
+
+	int under_oom;
+
+	int swappiness;
 
 	/* memory.events and memory.events.local */
 	struct cgroup_file events_file;
@@ -297,9 +303,6 @@ struct mem_cgroup {
 
 	unsigned long socket_pressure;
 
-	/* Legacy tcp memory accounting */
-	bool tcpmem_active;
-	int tcpmem_pressure;
 
 #ifdef CONFIG_MEMCG_KMEM
 	/* Index in the kmem_cache->memcg_params.memcg_caches array */
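
The saving in the hunks above comes purely from packing the one-bit
flags into a single word. A minimal userspace sketch (toy structs, not
the kernel's real layout) illustrates the effect the changelog
estimates:

  /* Illustration only: packing boolean flags into single-bit fields. */
  #include <stdio.h>

  struct flags_plain {                    /* roughly the old style */
          _Bool use_hierarchy;
          _Bool oom_group;
          _Bool oom_lock;
          int   oom_kill_disable;
          _Bool tcpmem_active;
          int   tcpmem_pressure;
  };

  struct flags_packed {                   /* roughly the new style */
          unsigned int use_hierarchy : 1;
          unsigned int oom_group : 1;
          unsigned int oom_lock : 1;
          unsigned int oom_kill_disable : 1;
          unsigned int tcpmem_active : 1;
          unsigned int tcpmem_pressure : 1;
  };

  int main(void)
  {
          /* Typically 16 vs. 4 bytes on x86-64; exact numbers are ABI dependent. */
          printf("plain:  %zu bytes\n", sizeof(struct flags_plain));
          printf("packed: %zu bytes\n", sizeof(struct flags_packed));
          return 0;
  }

In struct mem_cgroup itself the flags sit between other members and are
padded to cachelines by MEMCG_PADDING(), which is why the real saving
depends on the cacheline size, as the changelog notes.
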
From patchwork Tue Dec 24 07:53:23 2019
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 11309159
From: Yafang Shao
To: hannes@cmpxchg.org, david@fromorbit.com, mhocko@kernel.org,
    vdavydov.dev@gmail.com, akpm@linux-foundation.org, viro@zeniv.linux.org.uk
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Yafang Shao,
    Roman Gushchin
Subject: [PATCH v2 2/5] mm, memcg: introduce MEMCG_PROT_SKIP for memcg zero
 usage case
Date: Tue, 24 Dec 2019 02:53:23 -0500
Message-Id: <1577174006-13025-3-git-send-email-laoar.shao@gmail.com>
In-Reply-To: <1577174006-13025-1-git-send-email-laoar.shao@gmail.com>
References: <1577174006-13025-1-git-send-email-laoar.shao@gmail.com>

If the usage of a memcg is zero, we don't need to do useless work to
scan it. This is a minor optimization.

Cc: Roman Gushchin
Signed-off-by: Yafang Shao
---
 include/linux/memcontrol.h | 1 +
 mm/memcontrol.c            | 2 +-
 mm/vmscan.c                | 6 ++++++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 612a457..1a315c7 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -54,6 +54,7 @@ enum mem_cgroup_protection {
 	MEMCG_PROT_NONE,
 	MEMCG_PROT_LOW,
 	MEMCG_PROT_MIN,
+	MEMCG_PROT_SKIP,	/* For zero usage case */
 };
 
 struct mem_cgroup_reclaim_cookie {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c5b5f74..f35fcca 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6292,7 +6292,7 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 
 	usage = page_counter_read(&memcg->memory);
 	if (!usage)
-		return MEMCG_PROT_NONE;
+		return MEMCG_PROT_SKIP;
 
 	emin = memcg->memory.min;
 	elow = memcg->memory.low;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5a6445e..3c4c2da 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2677,6 +2677,12 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 			 * thresholds (see get_scan_count).
 			 */
 			break;
+		case MEMCG_PROT_SKIP:
+			/*
+			 * Skip scanning this memcg if the usage of it is
+			 * zero.
+			 */
+			continue;
 		}
 
 		reclaimed = sc->nr_reclaimed;
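
The ordering of the checks is what makes this work: the zero-usage test
in mem_cgroup_protected() runs before the min/low comparison, so an
empty memcg is reported as MEMCG_PROT_SKIP and the reclaim loop simply
continues past it. A simplified stand-alone model of that decision (not
the kernel code; the thresholds here are plain numbers) might look
like:

  /* Toy model of the protection classification; illustration only. */
  #include <stdio.h>

  enum mem_cgroup_protection {
          MEMCG_PROT_NONE, MEMCG_PROT_LOW, MEMCG_PROT_MIN, MEMCG_PROT_SKIP
  };

  static enum mem_cgroup_protection classify(unsigned long usage,
                                             unsigned long emin,
                                             unsigned long elow)
  {
          if (!usage)
                  return MEMCG_PROT_SKIP;  /* nothing charged: not worth scanning */
          if (usage <= emin)
                  return MEMCG_PROT_MIN;
          if (usage <= elow)
                  return MEMCG_PROT_LOW;
          return MEMCG_PROT_NONE;
  }

  int main(void)
  {
          /* usage == 0 short-circuits before any protection math is done */
          printf("%d\n", classify(0, 512, 1024));    /* MEMCG_PROT_SKIP */
          printf("%d\n", classify(800, 512, 1024));  /* MEMCG_PROT_LOW  */
          return 0;
  }
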
From patchwork Tue Dec 24 07:53:24 2019
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 11309163
From: Yafang Shao
To: hannes@cmpxchg.org, david@fromorbit.com, mhocko@kernel.org,
    vdavydov.dev@gmail.com, akpm@linux-foundation.org, viro@zeniv.linux.org.uk
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Yafang Shao,
    Chris Down
Subject: [PATCH v2 3/5] mm, memcg: reset memcg's memory.{min, low} for
 reclaiming itself
Date: Tue, 24 Dec 2019 02:53:24 -0500
Message-Id: <1577174006-13025-4-git-send-email-laoar.shao@gmail.com>
In-Reply-To: <1577174006-13025-1-git-send-email-laoar.shao@gmail.com>
References: <1577174006-13025-1-git-send-email-laoar.shao@gmail.com>

memory.{emin, elow} are set in mem_cgroup_protected(), and their values
are not changed until the next recalculation in this function. After
either or both of them are set, the next reclaimer to reclaim this
memcg may be a different one, e.g. this memcg may also be the root
memcg of the new reclaimer, and then mem_cgroup_protection() in
get_scan_count() will use the stale values to calculate the scan count,
which is not correct. We should reset them to zero in that case.

Here's an example of this issue.

    root_mem_cgroup
         /
        A    memory.max=1024M memory.min=512M memory.current=800M

Once kswapd is woken up, it will try to scan all MEMCGs, including A,
and it will assign memory.emin of A with 512M. After that, A may reach
its hard limit (memory.max) and start memcg reclaim. Because A is the
root of this reclaim, its memory.emin is not recalculated, so the stale
value 512M will be used by mem_cgroup_protection() in get_scan_count()
to compute the scan count. That is not correct.

Cc: Chris Down
Signed-off-by: Yafang Shao
---
 mm/memcontrol.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f35fcca..2e78931 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6287,8 +6287,17 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 	if (!root)
 		root = root_mem_cgroup;
 
-	if (memcg == root)
+	if (memcg == root) {
+		/*
+		 * Reset memory.(emin, elow) for reclaiming the memcg
+		 * itself.
+		 */
+		if (memcg != root_mem_cgroup) {
+			memcg->memory.emin = 0;
+			memcg->memory.elow = 0;
+		}
 		return MEMCG_PROT_NONE;
+	}
 
 	usage = page_counter_read(&memcg->memory);
 	if (!usage)
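
The fix only touches the case where the memcg being examined is itself
the root of the current reclaim pass. A stand-alone sketch of that rule
with the changelog's numbers (field names follow the patch, everything
else is illustrative):

  /* Toy model of the reset; illustration only, not the kernel code. */
  #include <stdio.h>

  struct toy_memcg {
          unsigned long emin, elow;  /* effective protections, in MB here */
          int is_global_root;
  };

  /* Called when 'memcg' is the root of the current reclaim pass. */
  static void reset_protection(struct toy_memcg *memcg)
  {
          if (!memcg->is_global_root) {
                  /* A memcg reclaiming itself must not be shielded by its own min/low. */
                  memcg->emin = 0;
                  memcg->elow = 0;
          }
  }

  int main(void)
  {
          /* The changelog's example: kswapd left A.emin at 512 (MB). */
          struct toy_memcg A = { .emin = 512, .elow = 0, .is_global_root = 0 };

          /* A hits memory.max and reclaims itself; without the reset, the
           * stale 512MB would still shrink the scan target in get_scan_count(). */
          reset_protection(&A);
          printf("emin=%lu elow=%lu\n", A.emin, A.elow);  /* 0 0 */
          return 0;
  }
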
From patchwork Tue Dec 24 07:53:25 2019
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 11309167
From: Yafang Shao
To: hannes@cmpxchg.org, david@fromorbit.com, mhocko@kernel.org,
    vdavydov.dev@gmail.com, akpm@linux-foundation.org, viro@zeniv.linux.org.uk
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Yafang Shao,
    Dave Chinner
Subject: [PATCH v2 4/5] mm: make memcg visible to lru walker isolation
 function
Date: Tue, 24 Dec 2019 02:53:25 -0500
Message-Id: <1577174006-13025-5-git-send-email-laoar.shao@gmail.com>
In-Reply-To: <1577174006-13025-1-git-send-email-laoar.shao@gmail.com>
References: <1577174006-13025-1-git-send-email-laoar.shao@gmail.com>

The lru walker isolation function may use the memcg to do something,
e.g. the inode isolation function will use the memcg to do inode
protection in a follow-up patch. So make the memcg visible to the lru
walker isolation function.

One thing to emphasize in this patch: it replaces
for_each_memcg_cache_index() with for_each_mem_cgroup() in
list_lru_walk_node(). There is a gap between these two macros, as
for_each_mem_cgroup() depends on CONFIG_MEMCG while the other depends
on CONFIG_MEMCG_KMEM. But since list_lru_memcg_aware() returns false if
CONFIG_MEMCG_KMEM is not configured, this replacement is safe.

Cc: Dave Chinner
Signed-off-by: Yafang Shao
---
 include/linux/memcontrol.h | 21 +++++++++++++++++++++
 mm/list_lru.c              | 22 ++++++++++++----------
 mm/memcontrol.c            | 15 ---------------
 3 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1a315c7..f36ada9 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -449,6 +449,21 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *,
 int mem_cgroup_scan_tasks(struct mem_cgroup *,
 			  int (*)(struct task_struct *, void *), void *);
 
+/*
+ * Iteration constructs for visiting all cgroups (under a tree). If
+ * loops are exited prematurely (break), mem_cgroup_iter_break() must
+ * be used for reference counting.
+ */
+#define for_each_mem_cgroup_tree(iter, root)		\
+	for (iter = mem_cgroup_iter(root, NULL, NULL);	\
+	     iter != NULL;				\
+	     iter = mem_cgroup_iter(root, iter, NULL))
+
+#define for_each_mem_cgroup(iter)			\
+	for (iter = mem_cgroup_iter(NULL, NULL, NULL);	\
+	     iter != NULL;				\
+	     iter = mem_cgroup_iter(NULL, iter, NULL))
+
 static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
 {
 	if (mem_cgroup_disabled())
@@ -949,6 +964,12 @@ static inline int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 	return 0;
 }
 
+#define for_each_mem_cgroup_tree(iter)		\
+	for (iter = NULL; iter; )
+
+#define for_each_mem_cgroup(iter)		\
+	for (iter = NULL; iter; )
+
 static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
 {
 	return 0;
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 0f1f6b0..536830d 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -207,11 +207,11 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 EXPORT_SYMBOL_GPL(list_lru_count_node);
 
 static unsigned long
-__list_lru_walk_one(struct list_lru_node *nlru, int memcg_idx,
+__list_lru_walk_one(struct list_lru_node *nlru, struct mem_cgroup *memcg,
 		    list_lru_walk_cb isolate, void *cb_arg,
 		    unsigned long *nr_to_walk)
 {
-
+	int memcg_idx = memcg_cache_id(memcg);
 	struct list_lru_one *l;
 	struct list_head *item, *n;
 	unsigned long isolated = 0;
@@ -273,7 +273,7 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 	unsigned long ret;
 
 	spin_lock(&nlru->lock);
-	ret = __list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg,
+	ret = __list_lru_walk_one(nlru, memcg, isolate, cb_arg,
 				  nr_to_walk);
 	spin_unlock(&nlru->lock);
 	return ret;
@@ -289,7 +289,7 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 	unsigned long ret;
 
 	spin_lock_irq(&nlru->lock);
-	ret = __list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg,
+	ret = __list_lru_walk_one(nlru, memcg, isolate, cb_arg,
 				  nr_to_walk);
 	spin_unlock_irq(&nlru->lock);
 	return ret;
@@ -299,17 +299,15 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
 		      list_lru_walk_cb isolate, void *cb_arg,
 		      unsigned long *nr_to_walk)
 {
+	struct mem_cgroup *memcg;
 	long isolated = 0;
-	int memcg_idx;
 
-	isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
-				      nr_to_walk);
-	if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) {
-		for_each_memcg_cache_index(memcg_idx) {
+	if (list_lru_memcg_aware(lru)) {
+		for_each_mem_cgroup(memcg) {
 			struct list_lru_node *nlru = &lru->node[nid];
 
 			spin_lock(&nlru->lock);
-			isolated += __list_lru_walk_one(nlru, memcg_idx,
+			isolated += __list_lru_walk_one(nlru, memcg,
 							isolate, cb_arg,
 							nr_to_walk);
 			spin_unlock(&nlru->lock);
@@ -317,7 +315,11 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
 			if (*nr_to_walk <= 0)
 				break;
 		}
+	} else {
+		isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
+					      nr_to_walk);
 	}
+
 	return isolated;
 }
 EXPORT_SYMBOL_GPL(list_lru_walk_node);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2e78931..2fc2bf4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -222,21 +222,6 @@ enum res_type {
 /* Used for OOM nofiier */
 #define OOM_CONTROL		(0)
 
-/*
- * Iteration constructs for visiting all cgroups (under a tree). If
- * loops are exited prematurely (break), mem_cgroup_iter_break() must
- * be used for reference counting.
- */
-#define for_each_mem_cgroup_tree(iter, root)		\
-	for (iter = mem_cgroup_iter(root, NULL, NULL);	\
-	     iter != NULL;				\
-	     iter = mem_cgroup_iter(root, iter, NULL))
-
-#define for_each_mem_cgroup(iter)			\
-	for (iter = mem_cgroup_iter(NULL, NULL, NULL);	\
-	     iter != NULL;				\
-	     iter = mem_cgroup_iter(NULL, iter, NULL))
-
 static inline bool should_force_charge(void)
 {
 	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
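
The comment carried over to the header states the one rule callers of
these macros must follow: the iterator takes a css reference at each
step, so a loop that exits early has to hand the last memcg back via
mem_cgroup_iter_break(). A hedged usage sketch (the surrounding
function and do_something() are made up; only the iterator calls are
from the patch):

  /* Hypothetical kernel-side caller; illustration only. */
  static unsigned long walk_some_memcgs(unsigned long budget)
  {
          struct mem_cgroup *memcg;
          unsigned long done = 0;

          for_each_mem_cgroup(memcg) {          /* walks the whole hierarchy */
                  done += do_something(memcg);  /* do_something() is made up */
                  if (done >= budget) {
                          /* Early exit: drop the reference the iterator holds.
                           * The first argument is the loop's root (NULL here). */
                          mem_cgroup_iter_break(NULL, memcg);
                          break;
                  }
          }
          return done;
  }
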
From patchwork Tue Dec 24 07:53:26 2019
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 11309171
From: Yafang Shao
To: hannes@cmpxchg.org, david@fromorbit.com, mhocko@kernel.org,
    vdavydov.dev@gmail.com, akpm@linux-foundation.org, viro@zeniv.linux.org.uk
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Yafang Shao,
    Roman Gushchin, Chris Down, Dave Chinner
Subject: [PATCH v2 5/5] memcg, inode: protect page cache from freeing inode
Date: Tue, 24 Dec 2019 02:53:26 -0500
Message-Id: <1577174006-13025-6-git-send-email-laoar.shao@gmail.com>
In-Reply-To: <1577174006-13025-1-git-send-email-laoar.shao@gmail.com>
References: <1577174006-13025-1-git-send-email-laoar.shao@gmail.com>

On my server there are some running MEMCGs protected by
memory.{min, low}, but I found that the usage of these MEMCGs abruptly
became very small, far below the protection limit. It confused me, and
finally I found the cause was inode stealing. Once an inode is freed,
all of its page cache is dropped as well, no matter how much page cache
it has. So if we intend to protect the page cache in a memcg, we must
protect its host (the inode) first. Otherwise the memcg protection can
easily be bypassed by freeing inodes, especially if there are big files
in this memcg.

Suppose we have a memcg with the following stats,

    memory.current = 1024M
    memory.min     = 512M

and in this memcg there is an inode with 800M of page cache. Once this
memcg is scanned by kswapd or another regular reclaimer,

    kswapd                               <<<< can be any of the regular reclaimers
      shrink_node_memcgs
        switch (mem_cgroup_protected())  <<<< not protected
        case MEMCG_PROT_NONE:            <<<< will scan this memcg
            break;
        shrink_lruvec()                  <<<< reclaims the page cache
        shrink_slab()                    <<<< may free this inode and drop all of
                                              its page cache (800M)

So we must protect the inode first if we want to protect page cache.

The inherent mismatch between memcg and inode is troublesome. One inode
can be shared by different MEMCGs, but that is a very rare case. If an
inode is shared, its page cache may be charged to different MEMCGs.
Currently there is no perfect solution for this kind of issue, but the
inode majority-writer ownership switching can help more or less.

Cc: Roman Gushchin
Cc: Chris Down
Cc: Dave Chinner
Signed-off-by: Yafang Shao
Reported-by: kbuild test robot
Reported-by: kbuild test robot
---
 fs/inode.c                 | 25 +++++++++++++++++++++++--
 include/linux/memcontrol.h | 11 ++++++++++-
 mm/memcontrol.c            | 43 +++++++++++++++++++++++++++++++++++++++++++
 mm/vmscan.c                |  5 +++++
 4 files changed, 81 insertions(+), 3 deletions(-)
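
Plugging the changelog's numbers into the new memcg_can_reclaim_inode()
check (added to mm/memcontrol.c below) shows the intended behaviour;
the figures are only illustrative:

    cgroup_size = mem_cgroup_size(memcg)                         = 1024M
    protection  = mem_cgroup_protection(memcg, in_low_reclaim)   =  512M
    inode->i_data.nrpages                                        =  800M

    nrpages + protection = 1312M > cgroup_size
        -> reclaimable = false, so inode_lru_isolate() returns LRU_ROTATE
           and the inode (with its 800M of page cache) survives the shrinker.

Had the inode held only 100M of page cache, 100M + 512M would not
exceed 1024M; the inode would still be freed, because the memcg would
remain above its protection even after losing that page cache.
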
diff --git a/fs/inode.c b/fs/inode.c
index fef457a..4f4b2f3 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -54,6 +54,13 @@
  *   inode_hash_lock
  */
 
+struct inode_head {
+	struct list_head *freeable;
+#ifdef CONFIG_MEMCG_KMEM
+	struct mem_cgroup *memcg;
+#endif
+};
+
 static unsigned int i_hash_mask __read_mostly;
 static unsigned int i_hash_shift __read_mostly;
 static struct hlist_head *inode_hashtable __read_mostly;
@@ -724,8 +731,10 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 static enum lru_status inode_lru_isolate(struct list_head *item,
 		struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
 {
-	struct list_head *freeable = arg;
+	struct inode_head *ihead = (struct inode_head *)arg;
+	struct list_head *freeable = ihead->freeable;
 	struct inode	*inode = container_of(item, struct inode, i_lru);
+	struct mem_cgroup *memcg = NULL;
 
 	/*
 	 * we are inverting the lru lock/inode->i_lock here, so use a trylock.
@@ -734,6 +743,15 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
 	if (!spin_trylock(&inode->i_lock))
 		return LRU_SKIP;
 
+#ifdef CONFIG_MEMCG_KMEM
+	memcg = ihead->memcg;
+#endif
+	if (memcg && inode->i_data.nrpages &&
+	    !(memcg_can_reclaim_inode(memcg, inode))) {
+		spin_unlock(&inode->i_lock);
+		return LRU_ROTATE;
+	}
+
 	/*
 	 * Referenced or dirty inodes are still in use. Give them another pass
 	 * through the LRU as we canot reclaim them now.
@@ -789,11 +807,14 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
  */
 long prune_icache_sb(struct super_block *sb, struct shrink_control *sc)
 {
+	struct inode_head ihead;
 	LIST_HEAD(freeable);
 	long freed;
 
+	ihead.freeable = &freeable;
+	ihead.memcg = sc->memcg;
 	freed = list_lru_shrink_walk(&sb->s_inode_lru, sc,
-				     inode_lru_isolate, &freeable);
+				     inode_lru_isolate, &ihead);
 	dispose_list(&freeable);
 	return freed;
 }
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f36ada9..d1d4175 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -247,6 +247,9 @@ struct mem_cgroup {
 	unsigned int tcpmem_active : 1;
 	unsigned int tcpmem_pressure : 1;
 
+	/* Soft protection will be ignored if it's true */
+	unsigned int in_low_reclaim : 1;
+
 	int under_oom;
 
 	int swappiness;
@@ -363,7 +366,7 @@ static inline unsigned long mem_cgroup_protection(struct mem_cgroup *memcg,
 
 enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 						struct mem_cgroup *memcg);
-
+bool memcg_can_reclaim_inode(struct mem_cgroup *memcg, struct inode *inode);
 int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 			  gfp_t gfp_mask, struct mem_cgroup **memcgp,
 			  bool compound);
@@ -865,6 +868,12 @@ static inline enum mem_cgroup_protection mem_cgroup_protected(
 	return MEMCG_PROT_NONE;
 }
 
+static inline bool memcg_can_reclaim_inode(struct mem_cgroup *memcg,
+					   struct inode *inode)
+{
+	return true;
+}
+
 static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 					gfp_t gfp_mask,
 					struct mem_cgroup **memcgp,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2fc2bf4..c3498fd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6340,6 +6340,49 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 }
 
 /**
+ * Once an inode is freed, all its belonging page caches will be dropped as
+ * well, even if there're lots of page caches. So if we intend to protect
+ * page caches in a memcg, we must protect their host(the inode) first.
+ * Otherwise the memcg protection can be easily bypassed with freeing inode,
+ * especially if there're big files in this memcg.
+ * Note that it may happen that the page caches are already charged to the
+ * memcg, but the inode hasn't been added to this memcg yet. In this case,
+ * this inode is not protected.
+ * The inherent mismatch between memcg and inode is a trouble. One inode
+ * can be shared by different MEMCGs, but it is a very rare case. If
+ * an inode is shared, its belonging page caches may be charged to
+ * different MEMCGs. Currently there's no perfect solution to fix this
+ * kind of issue, but the inode majority-writer ownership switching can
+ * help it more or less.
+ */
+bool memcg_can_reclaim_inode(struct mem_cgroup *memcg,
+			     struct inode *inode)
+{
+	unsigned long cgroup_size;
+	unsigned long protection;
+	bool reclaimable = true;
+
+	if (memcg == root_mem_cgroup)
+		goto out;
+
+	protection = mem_cgroup_protection(memcg, memcg->in_low_reclaim);
+	if (!protection)
+		goto out;
+
+	/*
+	 * Don't protect this inode if the usage of this memcg is still
+	 * above the protection after reclaiming this inode and all its
+	 * belonging page caches.
+	 */
+	cgroup_size = mem_cgroup_size(memcg);
+	if (inode->i_data.nrpages + protection > cgroup_size)
+		reclaimable = false;
+
+out:
+	return reclaimable;
+}
+
+/**
  * mem_cgroup_try_charge - try charging a page
  * @page: page to charge
  * @mm: mm context of the victim
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3c4c2da..ecc5c1d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2666,6 +2666,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 				sc->memcg_low_skipped = 1;
 				continue;
 			}
+
+			memcg->in_low_reclaim = 1;
 			memcg_memory_event(memcg, MEMCG_LOW);
 			break;
 		case MEMCG_PROT_NONE:
@@ -2693,6 +2695,9 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
 			    sc->priority);
 
+		if (memcg->in_low_reclaim)
+			memcg->in_low_reclaim = 0;
+
 		/* Record the group's reclaim efficiency */
 		vmpressure(sc->gfp_mask, memcg, false,
 			   sc->nr_scanned - scanned,