From patchwork Sun Feb 23 09:31:32 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 11398745
From: Yafang Shao
To: dchinner@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org,
 vdavydov.dev@gmail.com, guro@fb.com, akpm@linux-foundation.org,
 viro@zeniv.linux.org.uk
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Yafang Shao
Subject: [PATCH v4 1/3] mm, list_lru: make memcg visible to lru walker
 isolation function
Date: Sun, 23 Feb 2020 04:31:32 -0500
Message-Id: <1582450294-18038-2-git-send-email-laoar.shao@gmail.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582450294-18038-1-git-send-email-laoar.shao@gmail.com>
References: <1582450294-18038-1-git-send-email-laoar.shao@gmail.com>

The lru walker isolation function may need the memcg that is being
walked, e.g. the inode isolation function will use the memcg to protect
inodes in a followup patch. So make the memcg visible to the lru walker
isolation function.

Note that this patch replaces for_each_memcg_cache_index() with
for_each_mem_cgroup() in list_lru_walk_node(). The two macros have
different config dependencies: for_each_mem_cgroup() depends on
CONFIG_MEMCG, while for_each_memcg_cache_index() depends on
CONFIG_MEMCG_KMEM. But as list_lru_memcg_aware() returns false if
CONFIG_MEMCG_KMEM is not configured, the replacement is safe.

Another difference between the two macros is that
for_each_memcg_cache_index() excludes the root_mem_cgroup because its
kmemcg_id is -1, while for_each_mem_cgroup() includes the
root_mem_cgroup. So the root_mem_cgroup must be skipped explicitly in
the for loop.

Cc: Dave Chinner
Signed-off-by: Yafang Shao
---
 include/linux/memcontrol.h | 21 +++++++++++++++++++++
 mm/list_lru.c              | 47 +++++++++++++++++++++++++++-------------------
 mm/memcontrol.c            | 15 ---------------
 3 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e8734da..6554284 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -445,6 +445,21 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *,
 int mem_cgroup_scan_tasks(struct mem_cgroup *,
 			  int (*)(struct task_struct *, void *), void *);
 
+/*
+ * Iteration constructs for visiting all cgroups (under a tree).  If
+ * loops are exited prematurely (break), mem_cgroup_iter_break() must
+ * be used for reference counting.
+ */
+#define for_each_mem_cgroup_tree(iter, root)		\
+	for (iter = mem_cgroup_iter(root, NULL, NULL);	\
+	     iter != NULL;				\
+	     iter = mem_cgroup_iter(root, iter, NULL))
+
+#define for_each_mem_cgroup(iter)			\
+	for (iter = mem_cgroup_iter(NULL, NULL, NULL);	\
+	     iter != NULL;				\
+	     iter = mem_cgroup_iter(NULL, iter, NULL))
+
 static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
 {
 	if (mem_cgroup_disabled())
@@ -945,6 +960,12 @@ static inline int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 	return 0;
 }
 
+#define for_each_mem_cgroup_tree(iter, root)	\
+	for (iter = NULL; iter; )
+
+#define for_each_mem_cgroup(iter)	\
+	for (iter = NULL; iter; )
+
 static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
 {
 	return 0;
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 249468d..6fd6dfa 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -207,11 +207,11 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 EXPORT_SYMBOL_GPL(list_lru_count_node);
 
 static unsigned long
-__list_lru_walk_one(struct list_lru_node *nlru, int memcg_idx,
+__list_lru_walk_one(struct list_lru_node *nlru, struct mem_cgroup *memcg,
 		    list_lru_walk_cb isolate, void *cb_arg,
 		    unsigned long *nr_to_walk)
 {
-
+	int memcg_idx = memcg_cache_id(memcg);
 	struct list_lru_one *l;
 	struct list_head *item, *n;
 	unsigned long isolated = 0;
@@ -273,7 +273,7 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 	unsigned long ret;
 
 	spin_lock(&nlru->lock);
-	ret = __list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg,
+	ret = __list_lru_walk_one(nlru, memcg, isolate, cb_arg,
 				  nr_to_walk);
 	spin_unlock(&nlru->lock);
 	return ret;
@@ -289,7 +289,7 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 	unsigned long ret;
 
 	spin_lock_irq(&nlru->lock);
-	ret = __list_lru_walk_one(nlru, memcg_cache_id(memcg), isolate, cb_arg,
+	ret = __list_lru_walk_one(nlru, memcg, isolate, cb_arg,
 				  nr_to_walk);
 	spin_unlock_irq(&nlru->lock);
 	return ret;
@@ -299,25 +299,34 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
 				 list_lru_walk_cb isolate, void *cb_arg,
 				 unsigned long *nr_to_walk)
 {
-	long isolated = 0;
-	int memcg_idx;
+	struct list_lru_node *nlru;
+	struct mem_cgroup *memcg;
+	long isolated;
 
-	isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
-				      nr_to_walk);
-	if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) {
-		for_each_memcg_cache_index(memcg_idx) {
-			struct list_lru_node *nlru = &lru->node[nid];
+	/* iterate the global lru first */
+	isolated = list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
+				     nr_to_walk);
 
-			spin_lock(&nlru->lock);
-			isolated += __list_lru_walk_one(nlru, memcg_idx,
-							isolate, cb_arg,
-							nr_to_walk);
-			spin_unlock(&nlru->lock);
+	if (!list_lru_memcg_aware(lru))
+		goto out;
 
-			if (*nr_to_walk <= 0)
-				break;
-		}
+	nlru = &lru->node[nid];
+	for_each_mem_cgroup(memcg) {
+		/* already scanned the root memcg above */
+		if (mem_cgroup_is_root(memcg))
+			continue;
+
+		if (*nr_to_walk <= 0)
+			break;
+
+		spin_lock(&nlru->lock);
+		isolated += __list_lru_walk_one(nlru, memcg,
+						isolate, cb_arg,
+						nr_to_walk);
+		spin_unlock(&nlru->lock);
 	}
+
+out:
 	return isolated;
 }
 EXPORT_SYMBOL_GPL(list_lru_walk_node);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 63bb6a2..e1c8c42 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -222,21 +222,6 @@ enum res_type {
 /* Used for OOM nofiier */
 #define OOM_CONTROL		(0)
 
-/*
- * Iteration constructs for visiting all cgroups (under a tree).  If
- * loops are exited prematurely (break), mem_cgroup_iter_break() must
- * be used for reference counting.
- */
-#define for_each_mem_cgroup_tree(iter, root)		\
-	for (iter = mem_cgroup_iter(root, NULL, NULL);	\
-	     iter != NULL;				\
-	     iter = mem_cgroup_iter(root, iter, NULL))
-
-#define for_each_mem_cgroup(iter)			\
-	for (iter = mem_cgroup_iter(NULL, NULL, NULL);	\
-	     iter != NULL;				\
-	     iter = mem_cgroup_iter(NULL, iter, NULL))
-
 static inline bool should_force_charge(void)
 {
 	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||

From patchwork Sun Feb 23 09:31:33 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 11398749
From: Yafang Shao
To: dchinner@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org,
 vdavydov.dev@gmail.com, guro@fb.com, akpm@linux-foundation.org,
 viro@zeniv.linux.org.uk
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Yafang Shao
Subject: [PATCH v4 2/3] mm, shrinker: make memcg low reclaim visible to lru
 walker isolation function
Date: Sun, 23 Feb 2020 04:31:33 -0500
Message-Id: <1582450294-18038-3-git-send-email-laoar.shao@gmail.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582450294-18038-1-git-send-email-laoar.shao@gmail.com>
References: <1582450294-18038-1-git-send-email-laoar.shao@gmail.com>

Introduce a new member, memcg_low_reclaim, in struct shrink_control. It
is derived from struct scan_control and tells the shrinker whether the
reclaim session is under memcg low reclaim. The followup patch will use
this new member.

Cc: Dave Chinner
Signed-off-by: Yafang Shao
---
 include/linux/shrinker.h |  3 +++
 mm/vmscan.c              | 27 ++++++++++++++++-----------
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 0f80123..dc42ae5 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -31,6 +31,9 @@ struct shrink_control {
 
 	/* current memcg being shrunk (for memcg aware shrinkers) */
 	struct mem_cgroup *memcg;
+
+	/* derived from struct scan_control */
+	bool memcg_low_reclaim;
 };
 
 #define SHRINK_STOP (~0UL)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f14c8c6..c6e1ad8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -625,10 +625,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 
 /**
  * shrink_slab - shrink slab caches
- * @gfp_mask: allocation context
- * @nid: node whose slab caches to target
  * @memcg: memory cgroup whose slab caches to target
- * @priority: the reclaim priority
+ * @sc: scan_control struct for this reclaim session
+ * @nid: node whose slab caches to target
  *
  * Call the shrink functions to age shrinkable caches.
  *
@@ -638,15 +637,18 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
  * @memcg specifies the memory cgroup to target. Unaware shrinkers
  * are called only if it is the root cgroup.
  *
- * @priority is sc->priority, we take the number of objects and >> by priority
- * in order to get the scan target.
+ * @sc is the scan_control struct, we take the number of objects
+ * and >> by sc->priority in order to get the scan target.
  *
  * Returns the number of reclaimed slab objects.
  */
-static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
-				 struct mem_cgroup *memcg,
-				 int priority)
+static unsigned long shrink_slab(struct mem_cgroup *memcg,
+				 struct scan_control *sc,
+				 int nid)
 {
+	bool memcg_low_reclaim = sc->memcg_low_reclaim;
+	gfp_t gfp_mask = sc->gfp_mask;
+	int priority = sc->priority;
 	unsigned long ret, freed = 0;
 	struct shrinker *shrinker;
 
@@ -668,6 +670,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 			.gfp_mask = gfp_mask,
 			.nid = nid,
 			.memcg = memcg,
+			.memcg_low_reclaim = memcg_low_reclaim,
 		};
 
 		ret = do_shrink_slab(&sc, shrinker, priority);
@@ -694,6 +697,9 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 void drop_slab_node(int nid)
 {
 	unsigned long freed;
+	struct scan_control sc = {
+		.gfp_mask = GFP_KERNEL,
+	};
 
 	do {
 		struct mem_cgroup *memcg = NULL;
@@ -701,7 +707,7 @@ void drop_slab_node(int nid)
 		freed = 0;
 		memcg = mem_cgroup_iter(NULL, NULL, NULL);
 		do {
-			freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
+			freed += shrink_slab(memcg, &sc, nid);
 		} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);
 	} while (freed > 10);
 }
@@ -2673,8 +2679,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 		shrink_lruvec(lruvec, sc);
 
-		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
-			    sc->priority);
+		shrink_slab(memcg, sc, pgdat->node_id);
 
 		/* Record the group's reclaim efficiency */
 		vmpressure(sc->gfp_mask, memcg, false,

From patchwork Sun Feb 23 09:31:34 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 11398753
From: Yafang Shao
To: dchinner@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org,
 vdavydov.dev@gmail.com, guro@fb.com, akpm@linux-foundation.org,
 viro@zeniv.linux.org.uk
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Yafang Shao
Subject: [PATCH v4 3/3] inode: protect page cache from freeing inode
Date: Sun, 23 Feb 2020 04:31:34 -0500
Message-Id: <1582450294-18038-4-git-send-email-laoar.shao@gmail.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582450294-18038-1-git-send-email-laoar.shao@gmail.com>
References: <1582450294-18038-1-git-send-email-laoar.shao@gmail.com>

On my server there are some running memcgs protected by
memory.{min, low}, but I found that the usage of these memcgs abruptly
dropped far below the protection limit. It confused me, and I finally
found that the cause was inode stealing.

Once an inode is freed, all page cache pages belonging to it are dropped
as well, no matter how many there are. So if we intend to protect the
page cache in a memcg, we must protect its host (the inode) first.
Otherwise the memcg protection can easily be bypassed by freeing inodes,
especially if there are big files in this memcg.

Suppose we have a memcg whose state is:

	memory.current = 1024M
	memory.min = 512M

and in this memcg there is an inode with 800M of page cache. Once this
memcg is scanned by kswapd or another regular reclaimer:

    kswapd <<<< It can be any of the regular reclaimers.
        shrink_node_memcgs
            switch (mem_cgroup_protected()) <<<< Not protected
                case MEMCG_PROT_NONE:  <<<< Will scan this memcg
                        break;
            shrink_lruvec() <<<< Reclaim the page caches
            shrink_slab()   <<<< It may free this inode and drop all its
                                 page caches (800M).

So we must protect the inode first if we want to protect page caches.
Note that this inode may be a cold inode (at the tail of the list lru),
because memcg protection protects all slabs and page cache pages whether
they are cold or hot. IOW, this is a memcg-protection-specific issue.

The inherent mismatch between memcg and inode is troublesome. One inode
can be shared by different memcgs, but that is a very rare case. If an
inode is shared, the page cache pages belonging to it may be charged to
different memcgs. Currently there is no perfect solution for this kind
of issue, but inode majority-writer ownership switching can mitigate it
to some extent.

Cc: Dave Chinner
Cc: Johannes Weiner
Signed-off-by: Yafang Shao
---
 fs/inode.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 73 insertions(+), 3 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 7d57068..6373cd0 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -55,6 +55,12 @@
  *   inode_hash_lock
  */
 
+struct inode_isolate_control {
+	struct list_head *freeable;
+	struct mem_cgroup *memcg;	/* derived from shrink_control */
+	bool memcg_low_reclaim;		/* derived from scan_control */
+};
+
 static unsigned int i_hash_mask __read_mostly;
 static unsigned int i_hash_shift __read_mostly;
 static struct hlist_head *inode_hashtable __read_mostly;
@@ -714,6 +720,59 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 	return busy;
 }
 
+#ifdef CONFIG_MEMCG_KMEM
+/*
+ * Once an inode is freed, all page cache pages belonging to it are dropped
+ * as well, no matter how many there are. So if we intend to protect the
+ * page cache in a memcg, we must protect its host (the inode) first.
+ * Otherwise the memcg protection can easily be bypassed by freeing inodes,
+ * especially if there are big files in this memcg.
+ * Note that it may happen that the page caches are already charged to the
+ * memcg, but the inode hasn't been added to this memcg yet. In this case,
+ * this inode is not protected.
+ * The inherent mismatch between memcg and inode is troublesome. One inode
+ * can be shared by different memcgs, but that is a very rare case. If
+ * an inode is shared, the page cache pages belonging to it may be charged
+ * to different memcgs. Currently there is no perfect solution for this
+ * kind of issue, but inode majority-writer ownership switching can
+ * mitigate it to some extent.
+ */
+static bool memcg_can_reclaim_inode(struct inode *inode,
+				    struct inode_isolate_control *iic)
+{
+	unsigned long protection;
+	struct mem_cgroup *memcg;
+	bool reclaimable = true;
+
+	if (!inode->i_data.nrpages)
+		goto out;
+
+	/* Excludes freeing inode via drop_caches */
+	if (!current->reclaim_state)
+		goto out;
+
+	memcg = iic->memcg;
+	if (!memcg || memcg == root_mem_cgroup)
+		goto out;
+
+	protection = mem_cgroup_protection(memcg, iic->memcg_low_reclaim);
+	if (!protection)
+		goto out;
+
+	if (inode->i_data.nrpages)
+		reclaimable = false;
+
+out:
+	return reclaimable;
+}
+#else /* CONFIG_MEMCG_KMEM */
+static bool memcg_can_reclaim_inode(struct inode *inode,
+				    struct inode_isolate_control *iic)
+{
+	return true;
+}
+#endif /* CONFIG_MEMCG_KMEM */
+
 /*
  * Isolate the inode from the LRU in preparation for freeing it.
  *
@@ -732,8 +791,9 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 static enum lru_status inode_lru_isolate(struct list_head *item,
 		struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
 {
-	struct list_head *freeable = arg;
-	struct inode	*inode = container_of(item, struct inode, i_lru);
+	struct inode_isolate_control *iic = arg;
+	struct list_head *freeable = iic->freeable;
+	struct inode *inode = container_of(item, struct inode, i_lru);
 
 	/*
 	 * we are inverting the lru lock/inode->i_lock here, so use a trylock.
@@ -742,6 +802,11 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
 	if (!spin_trylock(&inode->i_lock))
 		return LRU_SKIP;
 
+	if (!memcg_can_reclaim_inode(inode, iic)) {
+		spin_unlock(&inode->i_lock);
+		return LRU_ROTATE;
+	}
+
 	/*
 	 * Referenced or dirty inodes are still in use. Give them another pass
 	 * through the LRU as we canot reclaim them now.
@@ -799,9 +864,14 @@ long prune_icache_sb(struct super_block *sb, struct shrink_control *sc)
 {
 	LIST_HEAD(freeable);
 	long freed;
+	struct inode_isolate_control iic = {
+		.freeable = &freeable,
+		.memcg = sc->memcg,
+		.memcg_low_reclaim = sc->memcg_low_reclaim,
+	};
 
 	freed = list_lru_shrink_walk(&sb->s_inode_lru, sc,
-				     inode_lru_isolate, &freeable);
+				     inode_lru_isolate, &iic);
 	dispose_list(&freeable);
 	return freed;
 }
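
Illustration only, not part of the series: the order of checks in
memcg_can_reclaim_inode() can be tried out in userspace with a small
model. This is a minimal sketch under stated assumptions. The names
model_inode, model_memcg, model_can_reclaim_inode and model_self_check
are hypothetical; nrpages stands in for inode->i_data.nrpages,
in_reclaim for a non-NULL current->reclaim_state, and protection for the
value that mem_cgroup_protection() would return for the current reclaim
pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the part of struct inode the check reads. */
struct model_inode {
	unsigned long nrpages;		/* models inode->i_data.nrpages */
};

/* Hypothetical stand-in for a memcg as seen by the check. */
struct model_memcg {
	bool is_root;			/* models memcg == root_mem_cgroup */
	unsigned long protection;	/* models mem_cgroup_protection() */
};

/*
 * Follows the same order of checks as memcg_can_reclaim_inode() in the
 * patch. Returns true when the inode may be reclaimed.
 */
bool model_can_reclaim_inode(const struct model_inode *inode,
			     const struct model_memcg *memcg,
			     bool in_reclaim)
{
	/* An inode without page cache has nothing to protect. */
	if (!inode->nrpages)
		return true;

	/* Not in a reclaim context (e.g. drop_caches): do not interfere. */
	if (!in_reclaim)
		return true;

	/* No memcg, or the root memcg: no protection applies. */
	if (!memcg || memcg->is_root)
		return true;

	/* The memcg has no effective protection in this pass. */
	if (!memcg->protection)
		return true;

	/* Protected memcg and the inode still hosts page cache: keep it. */
	return false;
}

/* Replays the changelog scenario with arbitrary units. */
void model_self_check(void)
{
	struct model_inode big_file = { .nrpages = 800 };
	struct model_memcg protected_cg = { .is_root = false, .protection = 512 };
	struct model_memcg plain_cg = { .is_root = false, .protection = 0 };

	assert(!model_can_reclaim_inode(&big_file, &protected_cg, true));
	assert(model_can_reclaim_inode(&big_file, &plain_cg, true));
	assert(model_can_reclaim_inode(&big_file, &protected_cg, false));
}
```

In the model, a false return corresponds to inode_lru_isolate()
returning LRU_ROTATE, so the inode stays on the LRU instead of being
moved to the freeable list.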