From patchwork Mon Dec 16 09:26:17 2019
X-Patchwork-Submitter: Alex Shi
X-Patchwork-Id: 11293635
From: Alex Shi <alex.shi@linux.alibaba.com>
To: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com,
khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com, willy@infradead.org, shakeelb@google.com, hannes@cmpxchg.org
Cc: Alex Shi <alex.shi@linux.alibaba.com>, yun.wang@linux.alibaba.com
Subject: [PATCH v6 01/10] mm/vmscan: remove unnecessary lruvec adding
Date: Mon, 16 Dec 2019 17:26:17 +0800
Message-Id: <1576488386-32544-2-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>

We don't have to add a freeable page to the LRU only to remove it from there again. Deleting the page from the incoming list up front saves a couple of list operations and makes the movement clearer. The SetPageLRU does need to stay here for list integrity; otherwise:

 #0 move_pages_to_lru               #1 release_pages
 if (put_page_testzero())
                                    if (put_page_testzero())
                                            !PageLRU //skip lru_lock
                                            list_add(&page->lru,);
 else
         list_add(&page->lru,) //corrupt

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: yun.wang@linux.alibaba.com
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/vmscan.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 74e8edce83ca..7de2bb126b40 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1852,26 +1852,18 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
 	while (!list_empty(list)) {
 		page = lru_to_page(list);
 		VM_BUG_ON_PAGE(PageLRU(page), page);
+		list_del(&page->lru);
 		if (unlikely(!page_evictable(page))) {
-			list_del(&page->lru);
 			spin_unlock_irq(&pgdat->lru_lock);
 			putback_lru_page(page);
 			spin_lock_irq(&pgdat->lru_lock);
 			continue;
 		}
 
-		lruvec = mem_cgroup_page_lruvec(page, pgdat);
 		SetPageLRU(page);
-		lru = page_lru(page);
-
-		nr_pages = hpage_nr_pages(page);
-		update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
-		list_move(&page->lru, &lruvec->lists[lru]);
 
 		if (put_page_testzero(page)) {
 			__ClearPageLRU(page);
 			__ClearPageActive(page);
-			del_page_from_lru_list(page, lruvec, lru);
 
 			if (unlikely(PageCompound(page))) {
 				spin_unlock_irq(&pgdat->lru_lock);
@@ -1880,6 +1872,12 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
 			} else
 				list_add(&page->lru, &pages_to_free);
 		} else {
+			lruvec = mem_cgroup_page_lruvec(page, pgdat);
+			lru = page_lru(page);
+			nr_pages = hpage_nr_pages(page);
+
+			update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
+			list_add(&page->lru, &lruvec->lists[lru]);
 			nr_moved += nr_pages;
 		}
 	}
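To make the ordering above concrete, here is a minimal userspace sketch of the reworked flow; the fake_page type and fake_* helpers are hypothetical stand-ins, not kernel code. It roughly models the key point: the LRU flag is published before the reference is dropped, so a releaser that still sees the flag clear can only be racing while this path holds its own reference.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for struct page: a refcount plus an "on LRU" flag. */
struct fake_page {
	atomic_int refcount;
	atomic_bool on_lru;
};

/* Models put_page_testzero(): drop one reference, report if it was the last. */
static bool fake_put_page_testzero(struct fake_page *p)
{
	return atomic_fetch_sub(&p->refcount, 1) == 1;
}

/*
 * Models the patched move_pages_to_lru() order: SetPageLRU() first,
 * put_page_testzero() second. A concurrent release_pages() only skips
 * the lru_lock while it sees on_lru == false, so publishing the flag
 * before dropping our reference closes the double list_add() window.
 */
static void fake_move_page_to_lru(struct fake_page *p)
{
	atomic_store(&p->on_lru, true);			/* SetPageLRU() */

	if (fake_put_page_testzero(p)) {
		atomic_store(&p->on_lru, false);	/* __ClearPageLRU() */
		puts("last reference: page is freed, not re-added");
	} else {
		puts("still referenced: page stays linked on the LRU");
	}
}

int main(void)
{
	struct fake_page p = { 2, false };

	fake_move_page_to_lru(&p);	/* refcount 2 -> 1: kept */
	fake_move_page_to_lru(&p);	/* refcount 1 -> 0: freed */
	return 0;
}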
From patchwork Mon Dec 16 09:26:18 2019
X-Patchwork-Submitter: Alex Shi
X-Patchwork-Id: 11293645
From: Alex Shi <alex.shi@linux.alibaba.com>
To: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com, willy@infradead.org, shakeelb@google.com, hannes@cmpxchg.org
Cc: Alex Shi, Michal Hocko, Vladimir Davydov, Roman Gushchin, Chris Down, Thomas Gleixner, Vlastimil Babka, Qian Cai, Andrey Ryabinin, "Kirill A.
Shutemov" , =?utf-8?b?SsOpcsO0?= =?utf-8?b?bWUgR2xpc3Nl?= , Andrea Arcangeli , David Rientjes , "Aneesh Kumar K.V" , swkhack , "Potyra, Stefan" , Mike Rapoport , Stephen Rothwell , Colin Ian King , Jason Gunthorpe , Mauro Carvalho Chehab , Peng Fan , Nikolay Borisov , Ira Weiny , Kirill Tkhai , Yafang Shao Subject: [PATCH v6 02/10] mm/lru: replace pgdat lru_lock with lruvec lock Date: Mon, 16 Dec 2019 17:26:18 +0800 Message-Id: <1576488386-32544-3-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com> References: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This patchset move lru_lock into lruvec, give a lru_lock for each of lruvec, thus bring a lru_lock for each of memcg per node. This is the main patch to replace per node lru_lock with per memcg lruvec lock. We introduces function lock_page_lruvec, which will lock the page's memcg and then memcg's lruvec->lru_lock. (Thanks Johannes Weiner, Hugh Dickins and Konstantin Khlebnikov suggestion/reminder on them) According to Daniel Jordan's suggestion, I run 208 'dd' with on 104 containers on a 2s * 26cores * HT box with a modefied case: https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice With this and later patches, the readtwice performance increases about 80% with containers, but w/o memcg the readtwice performance drops about 5%.(and another 5% drops with the last debug patch). Signed-off-by: Alex Shi Cc: Johannes Weiner Cc: Michal Hocko Cc: Vladimir Davydov Cc: Andrew Morton Cc: Roman Gushchin Cc: Shakeel Butt Cc: Chris Down Cc: Thomas Gleixner Cc: Mel Gorman Cc: Vlastimil Babka Cc: Qian Cai Cc: Andrey Ryabinin Cc: "Kirill A. 
Shutemov" Cc: "Jérôme Glisse" Cc: Andrea Arcangeli Cc: Yang Shi Cc: David Rientjes Cc: "Aneesh Kumar K.V" Cc: swkhack Cc: "Potyra, Stefan" Cc: Mike Rapoport Cc: Stephen Rothwell Cc: Colin Ian King Cc: Jason Gunthorpe Cc: Mauro Carvalho Chehab Cc: Matthew Wilcox Cc: Peng Fan Cc: Nikolay Borisov Cc: Ira Weiny Cc: Kirill Tkhai Cc: Yafang Shao Cc: Konstantin Khlebnikov Cc: Hugh Dickins Cc: Tejun Heo Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: cgroups@vger.kernel.org --- include/linux/memcontrol.h | 27 ++++++++++++++++ include/linux/mmzone.h | 2 ++ mm/compaction.c | 55 +++++++++++++++++++++----------- mm/huge_memory.c | 18 ++++------- mm/memcontrol.c | 67 ++++++++++++++++++++++++++++++--------- mm/mlock.c | 32 +++++++++---------- mm/mmzone.c | 1 + mm/page_idle.c | 7 ++-- mm/swap.c | 75 ++++++++++++++++++------------------------- mm/vmscan.c | 79 ++++++++++++++++++++++++++-------------------- 10 files changed, 217 insertions(+), 146 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index a7a0a1a5c8d5..8389b9b927ef 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -417,6 +417,10 @@ static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg, } struct lruvec *mem_cgroup_page_lruvec(struct page *, struct pglist_data *); +struct lruvec *lock_page_lruvec_irq(struct page *); +struct lruvec *lock_page_lruvec_irqsave(struct page *, unsigned long*); +void unlock_page_lruvec_irq(struct lruvec *); +void unlock_page_lruvec_irqrestore(struct lruvec *, unsigned long); struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p); @@ -900,6 +904,29 @@ static inline struct lruvec *mem_cgroup_page_lruvec(struct page *page, { return &pgdat->__lruvec; } +#define lock_page_lruvec_irq(page) \ +({ \ + struct pglist_data *pgdat = page_pgdat(page); \ + spin_lock_irq(&pgdat->__lruvec.lru_lock); \ + &pgdat->__lruvec; \ +}) + +#define lock_page_lruvec_irqsave(page, flagsp) \ +({ \ + struct pglist_data *pgdat = page_pgdat(page); \ + spin_lock_irqsave(&pgdat->__lruvec.lru_lock, *flagsp); \ + &pgdat->__lruvec; \ +}) + +#define unlock_page_lruvec_irq(lruvec) \ +({ \ + spin_unlock_irq(&lruvec->lru_lock); \ +}) + +#define unlock_page_lruvec_irqrestore(lruvec, flags) \ +({ \ + spin_unlock_irqrestore(&lruvec->lru_lock, flags); \ +}) static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg) { diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 89d8ff06c9ce..c5455675acf2 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -311,6 +311,8 @@ struct lruvec { unsigned long refaults; /* Various lruvec state flags (enum lruvec_flags) */ unsigned long flags; + /* per lruvec lru_lock for memcg */ + spinlock_t lru_lock; #ifdef CONFIG_MEMCG struct pglist_data *pgdat; #endif diff --git a/mm/compaction.c b/mm/compaction.c index 672d3c78c6ab..8c0a2da217d8 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -786,7 +786,7 @@ static bool too_many_isolated(pg_data_t *pgdat) unsigned long nr_scanned = 0, nr_isolated = 0; struct lruvec *lruvec; unsigned long flags = 0; - bool locked = false; + struct lruvec *locked_lruvec = NULL; struct page *page = NULL, *valid_page = NULL; unsigned long start_pfn = low_pfn; bool skip_on_failure = false; @@ -846,11 +846,20 @@ static bool too_many_isolated(pg_data_t *pgdat) * contention, to give chance to IRQs. Abort completely if * a fatal signal is pending. 
*/ - if (!(low_pfn % SWAP_CLUSTER_MAX) - && compact_unlock_should_abort(&pgdat->lru_lock, - flags, &locked, cc)) { - low_pfn = 0; - goto fatal_pending; + if (!(low_pfn % SWAP_CLUSTER_MAX)) { + if (locked_lruvec) { + unlock_page_lruvec_irqrestore(locked_lruvec, flags); + locked_lruvec = NULL; + } + + if (fatal_signal_pending(current)) { + cc->contended = true; + + low_pfn = 0; + goto fatal_pending; + } + + cond_resched(); } if (!pfn_valid_within(low_pfn)) @@ -919,10 +928,9 @@ static bool too_many_isolated(pg_data_t *pgdat) */ if (unlikely(__PageMovable(page)) && !PageIsolated(page)) { - if (locked) { - spin_unlock_irqrestore(&pgdat->lru_lock, - flags); - locked = false; + if (locked_lruvec) { + unlock_page_lruvec_irqrestore(locked_lruvec, flags); + locked_lruvec = NULL; } if (!isolate_movable_page(page, isolate_mode)) @@ -948,10 +956,20 @@ static bool too_many_isolated(pg_data_t *pgdat) if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page)) goto isolate_fail; + lruvec = mem_cgroup_page_lruvec(page, pgdat); + /* If we already hold the lock, we can skip some rechecking */ - if (!locked) { - locked = compact_lock_irqsave(&pgdat->lru_lock, - &flags, cc); + if (lruvec != locked_lruvec) { + struct mem_cgroup *memcg = lock_page_memcg(page); + + if (locked_lruvec) { + unlock_page_lruvec_irqrestore(locked_lruvec, flags); + locked_lruvec = NULL; + } + /* reget lruvec with a locked memcg */ + lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page)); + compact_lock_irqsave(&lruvec->lru_lock, &flags, cc); + locked_lruvec = lruvec; /* Try get exclusive access under lock */ if (!skip_updated) { @@ -975,7 +993,6 @@ static bool too_many_isolated(pg_data_t *pgdat) } } - lruvec = mem_cgroup_page_lruvec(page, pgdat); /* Try isolate the page */ if (__isolate_lru_page(page, isolate_mode) != 0) @@ -1016,9 +1033,9 @@ static bool too_many_isolated(pg_data_t *pgdat) * page anyway. 
*/ if (nr_isolated) { - if (locked) { - spin_unlock_irqrestore(&pgdat->lru_lock, flags); - locked = false; + if (locked_lruvec) { + unlock_page_lruvec_irqrestore(locked_lruvec, flags); + locked_lruvec = NULL; } putback_movable_pages(&cc->migratepages); cc->nr_migratepages = 0; @@ -1043,8 +1060,8 @@ static bool too_many_isolated(pg_data_t *pgdat) low_pfn = end_pfn; isolate_abort: - if (locked) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + if (locked_lruvec) + unlock_page_lruvec_irqrestore(locked_lruvec, flags); /* * Updated the cached scanner pfn once the pageblock has been scanned diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 41a0fbddc96b..160c845290cf 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2495,17 +2495,13 @@ static void __split_huge_page_tail(struct page *head, int tail, } static void __split_huge_page(struct page *page, struct list_head *list, - pgoff_t end, unsigned long flags) + struct lruvec *lruvec, pgoff_t end, unsigned long flags) { struct page *head = compound_head(page); - pg_data_t *pgdat = page_pgdat(head); - struct lruvec *lruvec; struct address_space *swap_cache = NULL; unsigned long offset = 0; int i; - lruvec = mem_cgroup_page_lruvec(head, pgdat); - /* complete memcg works before add pages to LRU */ mem_cgroup_split_huge_fixup(head); @@ -2554,7 +2550,7 @@ static void __split_huge_page(struct page *page, struct list_head *list, xa_unlock(&head->mapping->i_pages); } - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + unlock_page_lruvec_irqrestore(lruvec, flags); remap_page(head); @@ -2693,13 +2689,13 @@ bool can_split_huge_page(struct page *page, int *pextra_pins) int split_huge_page_to_list(struct page *page, struct list_head *list) { struct page *head = compound_head(page); - struct pglist_data *pgdata = NODE_DATA(page_to_nid(head)); struct deferred_split *ds_queue = get_deferred_split_queue(page); struct anon_vma *anon_vma = NULL; struct address_space *mapping = NULL; + struct lruvec *lruvec; int count, mapcount, extra_pins, ret; bool mlocked; - unsigned long flags; + unsigned long uninitialized_var(flags); pgoff_t end; VM_BUG_ON_PAGE(is_huge_zero_page(page), page); @@ -2766,7 +2762,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) lru_add_drain(); /* prevent PageLRU to go away from under us, and freeze lru stats */ - spin_lock_irqsave(&pgdata->lru_lock, flags); + lruvec = lock_page_lruvec_irqsave(head, &flags); if (mapping) { XA_STATE(xas, &mapping->i_pages, page_index(head)); @@ -2797,7 +2793,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) } spin_unlock(&ds_queue->split_queue_lock); - __split_huge_page(page, list, end, flags); + __split_huge_page(page, list, lruvec, end, flags); if (PageSwapCache(head)) { swp_entry_t entry = { .val = page_private(head) }; @@ -2816,7 +2812,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) spin_unlock(&ds_queue->split_queue_lock); fail: if (mapping) xa_unlock(&mapping->i_pages); - spin_unlock_irqrestore(&pgdata->lru_lock, flags); + unlock_page_lruvec_irqrestore(lruvec, flags); remap_page(head); ret = -EBUSY; } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c5b5f74cfd4d..f5d41ccd30e0 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1217,7 +1217,7 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd goto out; } - memcg = page->mem_cgroup; + memcg = READ_ONCE(page->mem_cgroup); /* * Swapcache readahead pages are added to the LRU - and * possibly migrated - before they are charged. 
@@ -1238,6 +1238,42 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd return lruvec; } +struct lruvec *lock_page_lruvec_irq(struct page *page) +{ + struct lruvec *lruvec; + struct mem_cgroup *memcg; + + memcg = lock_page_memcg(page); + lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page)); + spin_lock_irq(&lruvec->lru_lock); + + return lruvec; +} + +struct lruvec *lock_page_lruvec_irqsave(struct page *page, unsigned long *flags) +{ + struct lruvec *lruvec; + struct mem_cgroup *memcg; + + memcg = lock_page_memcg(page); + lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page)); + spin_lock_irqsave(&lruvec->lru_lock, *flags); + + return lruvec; +} + +void unlock_page_lruvec_irq(struct lruvec *lruvec) +{ + spin_unlock_irq(&lruvec->lru_lock); + __unlock_page_memcg(lruvec_memcg(lruvec)); +} + +void unlock_page_lruvec_irqrestore(struct lruvec *lruvec, unsigned long flags) +{ + spin_unlock_irqrestore(&lruvec->lru_lock, flags); + __unlock_page_memcg(lruvec_memcg(lruvec)); +} + /** * mem_cgroup_update_lru_size - account for adding or removing an lru page * @lruvec: mem_cgroup per zone lru vector @@ -2570,41 +2606,42 @@ static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages) css_put_many(&memcg->css, nr_pages); } -static void lock_page_lru(struct page *page, int *isolated) +static struct lruvec *lock_page_lru(struct page *page, int *isolated) { - pg_data_t *pgdat = page_pgdat(page); + struct lruvec *lruvec = lock_page_lruvec_irq(page); - spin_lock_irq(&pgdat->lru_lock); if (PageLRU(page)) { - struct lruvec *lruvec; - lruvec = mem_cgroup_page_lruvec(page, pgdat); ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_lru(page)); *isolated = 1; } else *isolated = 0; + + return lruvec; } -static void unlock_page_lru(struct page *page, int isolated) +static void unlock_page_lru(struct page *page, int isolated, + struct lruvec *locked_lruvec) { - pg_data_t *pgdat = page_pgdat(page); + struct lruvec *lruvec; - if (isolated) { - struct lruvec *lruvec; + unlock_page_lruvec_irq(locked_lruvec); + lruvec = lock_page_lruvec_irq(page); - lruvec = mem_cgroup_page_lruvec(page, pgdat); + if (isolated) { VM_BUG_ON_PAGE(PageLRU(page), page); SetPageLRU(page); add_page_to_lru_list(page, lruvec, page_lru(page)); } - spin_unlock_irq(&pgdat->lru_lock); + unlock_page_lruvec_irq(lruvec); } static void commit_charge(struct page *page, struct mem_cgroup *memcg, bool lrucare) { int isolated; + struct lruvec *lruvec; VM_BUG_ON_PAGE(page->mem_cgroup, page); @@ -2613,7 +2650,7 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg, * may already be on some other mem_cgroup's LRU. Take care of it. */ if (lrucare) - lock_page_lru(page, &isolated); + lruvec = lock_page_lru(page, &isolated); /* * Nobody should be changing or seriously looking at @@ -2632,7 +2669,7 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg, page->mem_cgroup = memcg; if (lrucare) - unlock_page_lru(page, isolated); + unlock_page_lru(page, isolated, lruvec); } #ifdef CONFIG_MEMCG_KMEM @@ -2928,7 +2965,7 @@ void __memcg_kmem_uncharge(struct page *page, int order) /* * Because tail pages are not marked as "used", set it. We're under - * pgdat->lru_lock and migration entries setup in all page mappings. + * lruvec->lru_lock and migration entries setup in all page mappings. 
*/ void mem_cgroup_split_huge_fixup(struct page *head) { diff --git a/mm/mlock.c b/mm/mlock.c index a72c1eeded77..10d15f58b061 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -106,12 +106,10 @@ void mlock_vma_page(struct page *page) * Isolate a page from LRU with optional get_page() pin. * Assumes lru_lock already held and page already pinned. */ -static bool __munlock_isolate_lru_page(struct page *page, bool getpage) +static bool __munlock_isolate_lru_page(struct page *page, + struct lruvec *lruvec, bool getpage) { if (PageLRU(page)) { - struct lruvec *lruvec; - - lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); if (getpage) get_page(page); ClearPageLRU(page); @@ -182,7 +180,7 @@ static void __munlock_isolation_failed(struct page *page) unsigned int munlock_vma_page(struct page *page) { int nr_pages; - pg_data_t *pgdat = page_pgdat(page); + struct lruvec *lruvec; /* For try_to_munlock() and to serialize with page migration */ BUG_ON(!PageLocked(page)); @@ -194,7 +192,7 @@ unsigned int munlock_vma_page(struct page *page) * might otherwise copy PageMlocked to part of the tail pages before * we clear it in the head page. It also stabilizes hpage_nr_pages(). */ - spin_lock_irq(&pgdat->lru_lock); + lruvec = lock_page_lruvec_irq(page); if (!TestClearPageMlocked(page)) { /* Potentially, PTE-mapped THP: do not skip the rest PTEs */ @@ -205,15 +203,15 @@ unsigned int munlock_vma_page(struct page *page) nr_pages = hpage_nr_pages(page); __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); - if (__munlock_isolate_lru_page(page, true)) { - spin_unlock_irq(&pgdat->lru_lock); + if (__munlock_isolate_lru_page(page, lruvec, true)) { + unlock_page_lruvec_irq(lruvec); __munlock_isolated_page(page); goto out; } __munlock_isolation_failed(page); unlock_out: - spin_unlock_irq(&pgdat->lru_lock); + unlock_page_lruvec_irq(lruvec); out: return nr_pages - 1; @@ -291,28 +289,29 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) { int i; int nr = pagevec_count(pvec); - int delta_munlocked = -nr; struct pagevec pvec_putback; + struct lruvec *lruvec = NULL; int pgrescued = 0; pagevec_init(&pvec_putback); /* Phase 1: page isolation */ - spin_lock_irq(&zone->zone_pgdat->lru_lock); for (i = 0; i < nr; i++) { struct page *page = pvec->pages[i]; + lruvec = lock_page_lruvec_irq(page); + if (TestClearPageMlocked(page)) { /* * We already have pin from follow_page_mask() * so we can spare the get_page() here. 
*/ - if (__munlock_isolate_lru_page(page, false)) + if (__munlock_isolate_lru_page(page, lruvec, false)) { + __mod_zone_page_state(zone, NR_MLOCK, -1); + unlock_page_lruvec_irq(lruvec); continue; - else + } else __munlock_isolation_failed(page); - } else { - delta_munlocked++; } /* @@ -323,9 +322,8 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) */ pagevec_add(&pvec_putback, pvec->pages[i]); pvec->pages[i] = NULL; + unlock_page_lruvec_irq(lruvec); } - __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked); - spin_unlock_irq(&zone->zone_pgdat->lru_lock); /* Now we can release pins of pages that we are not munlocking */ pagevec_release(&pvec_putback); diff --git a/mm/mmzone.c b/mm/mmzone.c index 4686fdc23bb9..3750a90ed4a0 100644 --- a/mm/mmzone.c +++ b/mm/mmzone.c @@ -91,6 +91,7 @@ void lruvec_init(struct lruvec *lruvec) enum lru_list lru; memset(lruvec, 0, sizeof(struct lruvec)); + spin_lock_init(&lruvec->lru_lock); for_each_lru(lru) INIT_LIST_HEAD(&lruvec->lists[lru]); diff --git a/mm/page_idle.c b/mm/page_idle.c index 295512465065..d2d868ca2bf7 100644 --- a/mm/page_idle.c +++ b/mm/page_idle.c @@ -31,7 +31,7 @@ static struct page *page_idle_get_page(unsigned long pfn) { struct page *page; - pg_data_t *pgdat; + struct lruvec *lruvec; if (!pfn_valid(pfn)) return NULL; @@ -41,13 +41,12 @@ static struct page *page_idle_get_page(unsigned long pfn) !get_page_unless_zero(page)) return NULL; - pgdat = page_pgdat(page); - spin_lock_irq(&pgdat->lru_lock); + lruvec = lock_page_lruvec_irq(page); if (unlikely(!PageLRU(page))) { put_page(page); page = NULL; } - spin_unlock_irq(&pgdat->lru_lock); + unlock_page_lruvec_irq(lruvec); return page; } diff --git a/mm/swap.c b/mm/swap.c index 5341ae93861f..97e108be4f92 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -60,16 +60,14 @@ static void __page_cache_release(struct page *page) { if (PageLRU(page)) { - pg_data_t *pgdat = page_pgdat(page); struct lruvec *lruvec; - unsigned long flags; + unsigned long flags = 0; - spin_lock_irqsave(&pgdat->lru_lock, flags); - lruvec = mem_cgroup_page_lruvec(page, pgdat); + lruvec = lock_page_lruvec_irqsave(page, &flags); VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + unlock_page_lruvec_irqrestore(lruvec, flags); } __ClearPageWaiters(page); } @@ -192,26 +190,18 @@ static void pagevec_lru_move_fn(struct pagevec *pvec, void *arg) { int i; - struct pglist_data *pgdat = NULL; - struct lruvec *lruvec; + struct lruvec *lruvec = NULL; unsigned long flags = 0; for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - struct pglist_data *pagepgdat = page_pgdat(page); - if (pagepgdat != pgdat) { - if (pgdat) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); - pgdat = pagepgdat; - spin_lock_irqsave(&pgdat->lru_lock, flags); - } + lruvec = lock_page_lruvec_irqsave(page, &flags); - lruvec = mem_cgroup_page_lruvec(page, pgdat); (*move_fn)(page, lruvec, arg); + unlock_page_lruvec_irqrestore(lruvec, flags); } - if (pgdat) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + release_pages(pvec->pages, pvec->nr); pagevec_reinit(pvec); } @@ -324,12 +314,12 @@ static inline void activate_page_drain(int cpu) void activate_page(struct page *page) { - pg_data_t *pgdat = page_pgdat(page); + struct lruvec *lruvec; page = compound_head(page); - spin_lock_irq(&pgdat->lru_lock); - __activate_page(page, mem_cgroup_page_lruvec(page, pgdat), NULL); - spin_unlock_irq(&pgdat->lru_lock); + 
lruvec = lock_page_lruvec_irq(page); + __activate_page(page, lruvec, NULL); + unlock_page_lruvec_irq(lruvec); } #endif @@ -780,8 +770,7 @@ void release_pages(struct page **pages, int nr) { int i; LIST_HEAD(pages_to_free); - struct pglist_data *locked_pgdat = NULL; - struct lruvec *lruvec; + struct lruvec *lruvec = NULL; unsigned long uninitialized_var(flags); unsigned int uninitialized_var(lock_batch); @@ -791,21 +780,20 @@ void release_pages(struct page **pages, int nr) /* * Make sure the IRQ-safe lock-holding time does not get * excessive with a continuous string of pages from the - * same pgdat. The lock is held only if pgdat != NULL. + * same lruvec. The lock is held only if lruvec != NULL. */ - if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); - locked_pgdat = NULL; + if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) { + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = NULL; } if (is_huge_zero_page(page)) continue; if (is_zone_device_page(page)) { - if (locked_pgdat) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, - flags); - locked_pgdat = NULL; + if (lruvec) { + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = NULL; } /* * ZONE_DEVICE pages that return 'false' from @@ -822,27 +810,24 @@ void release_pages(struct page **pages, int nr) continue; if (PageCompound(page)) { - if (locked_pgdat) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); - locked_pgdat = NULL; + if (lruvec) { + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = NULL; } __put_compound_page(page); continue; } if (PageLRU(page)) { - struct pglist_data *pgdat = page_pgdat(page); + struct lruvec *new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - if (pgdat != locked_pgdat) { - if (locked_pgdat) - spin_unlock_irqrestore(&locked_pgdat->lru_lock, - flags); + if (new_lruvec != lruvec) { + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); lock_batch = 0; - locked_pgdat = pgdat; - spin_lock_irqsave(&locked_pgdat->lru_lock, flags); + lruvec = lock_page_lruvec_irqsave(page, &flags); } - lruvec = mem_cgroup_page_lruvec(page, locked_pgdat); VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); @@ -854,8 +839,8 @@ void release_pages(struct page **pages, int nr) list_add(&page->lru, &pages_to_free); } - if (locked_pgdat) - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); mem_cgroup_uncharge_list(&pages_to_free); free_unref_page_list(&pages_to_free); @@ -893,7 +878,7 @@ void lru_add_page_tail(struct page *page, struct page *page_tail, VM_BUG_ON_PAGE(!PageHead(page), page); VM_BUG_ON_PAGE(PageCompound(page_tail), page); VM_BUG_ON_PAGE(PageLRU(page_tail), page); - lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); + lockdep_assert_held(&lruvec->lru_lock); if (!list) SetPageLRU(page_tail); diff --git a/mm/vmscan.c b/mm/vmscan.c index 7de2bb126b40..017901112789 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1766,11 +1766,9 @@ int isolate_lru_page(struct page *page) WARN_RATELIMIT(PageTail(page), "trying to isolate tail page"); if (PageLRU(page)) { - pg_data_t *pgdat = page_pgdat(page); struct lruvec *lruvec; - spin_lock_irq(&pgdat->lru_lock); - lruvec = mem_cgroup_page_lruvec(page, pgdat); + lruvec = lock_page_lruvec_irq(page); if (PageLRU(page)) { int lru = page_lru(page); get_page(page); @@ -1778,7 +1776,7 @@ int isolate_lru_page(struct page *page) del_page_from_lru_list(page, lruvec, 
lru); ret = 0; } - spin_unlock_irq(&pgdat->lru_lock); + unlock_page_lruvec_irq(lruvec); } return ret; } @@ -1843,7 +1841,6 @@ static int too_many_isolated(struct pglist_data *pgdat, int file, static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, struct list_head *list) { - struct pglist_data *pgdat = lruvec_pgdat(lruvec); int nr_pages, nr_moved = 0; LIST_HEAD(pages_to_free); struct page *page; @@ -1854,9 +1851,9 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, VM_BUG_ON_PAGE(PageLRU(page), page); list_del(&page->lru); if (unlikely(!page_evictable(page))) { - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); putback_lru_page(page); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); continue; } SetPageLRU(page); @@ -1866,19 +1863,35 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, __ClearPageActive(page); if (unlikely(PageCompound(page))) { - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); (*get_compound_page_dtor(page))(page); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); } else list_add(&page->lru, &pages_to_free); } else { - lruvec = mem_cgroup_page_lruvec(page, pgdat); + struct mem_cgroup *memcg = lock_page_memcg(page); + struct lruvec *plv; + bool relocked = false; + + plv = mem_cgroup_lruvec(memcg, page_pgdat(page)); + /* page's lruvec changed in memcg moving */ + if (plv != lruvec) { + spin_unlock_irq(&lruvec->lru_lock); + spin_lock_irq(&plv->lru_lock); + relocked = true; + } + lru = page_lru(page); nr_pages = hpage_nr_pages(page); - - update_lru_size(lruvec, lru, page_zonenum(page), nr_pages); - list_add(&page->lru, &lruvec->lists[lru]); + update_lru_size(plv, lru, page_zonenum(page), nr_pages); + list_add(&page->lru, &plv->lists[lru]); nr_moved += nr_pages; + + if (relocked) { + spin_unlock_irq(&plv->lru_lock); + spin_lock_irq(&lruvec->lru_lock); + } + __unlock_page_memcg(memcg); } } @@ -1937,7 +1950,7 @@ static int current_may_throttle(void) lru_add_drain(); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list, &nr_scanned, sc, lru); @@ -1949,15 +1962,14 @@ static int current_may_throttle(void) if (!cgroup_reclaim(sc)) __count_vm_events(item, nr_scanned); __count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned); - spin_unlock_irq(&pgdat->lru_lock); - + spin_unlock_irq(&lruvec->lru_lock); if (nr_taken == 0) return 0; nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, 0, &stat, false); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); item = current_is_kswapd() ? 
PGSTEAL_KSWAPD : PGSTEAL_DIRECT; if (!cgroup_reclaim(sc)) @@ -1970,7 +1982,7 @@ static int current_may_throttle(void) __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_list(&page_list); free_unref_page_list(&page_list); @@ -2023,7 +2035,7 @@ static void shrink_active_list(unsigned long nr_to_scan, lru_add_drain(); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold, &nr_scanned, sc, lru); @@ -2034,7 +2046,7 @@ static void shrink_active_list(unsigned long nr_to_scan, __count_vm_events(PGREFILL, nr_scanned); __count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); while (!list_empty(&l_hold)) { cond_resched(); @@ -2080,7 +2092,7 @@ static void shrink_active_list(unsigned long nr_to_scan, /* * Move pages back to the lru list. */ - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); /* * Count referenced pages from currently used mappings as rotated, * even though only some of them are actually re-activated. This @@ -2098,7 +2110,7 @@ static void shrink_active_list(unsigned long nr_to_scan, __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_list(&l_active); free_unref_page_list(&l_active); @@ -2325,7 +2337,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) + lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) { reclaim_stat->recent_scanned[0] /= 2; reclaim_stat->recent_rotated[0] /= 2; @@ -2346,7 +2358,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, fp = file_prio * (reclaim_stat->recent_scanned[1] + 1); fp /= reclaim_stat->recent_rotated[1] + 1; - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); fraction[0] = ap; fraction[1] = fp; @@ -4324,24 +4336,21 @@ int page_evictable(struct page *page) */ void check_move_unevictable_pages(struct pagevec *pvec) { - struct lruvec *lruvec; - struct pglist_data *pgdat = NULL; + struct lruvec *lruvec = NULL; int pgscanned = 0; int pgrescued = 0; int i; for (i = 0; i < pvec->nr; i++) { struct page *page = pvec->pages[i]; - struct pglist_data *pagepgdat = page_pgdat(page); + struct lruvec *new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); pgscanned++; - if (pagepgdat != pgdat) { - if (pgdat) - spin_unlock_irq(&pgdat->lru_lock); - pgdat = pagepgdat; - spin_lock_irq(&pgdat->lru_lock); + if (lruvec != new_lruvec) { + if (lruvec) + unlock_page_lruvec_irq(lruvec); + lruvec = lock_page_lruvec_irq(page); } - lruvec = mem_cgroup_page_lruvec(page, pgdat); if (!PageLRU(page) || !PageUnevictable(page)) continue; @@ -4357,10 +4366,10 @@ void check_move_unevictable_pages(struct pagevec *pvec) } } - if (pgdat) { + if (lruvec) { __count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued); __count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); - spin_unlock_irq(&pgdat->lru_lock); + unlock_page_lruvec_irq(lruvec); } } EXPORT_SYMBOL_GPL(check_move_unevictable_pages);
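As a rough userspace analogy of the new API shape introduced above (toy_* types and a fixed array of per-memcg lruvecs are hypothetical; the real lock_page_lruvec_irq() additionally pins the page's memcg with lock_page_memcg() and disables IRQs):

#include <pthread.h>
#include <stdio.h>

#define NR_MEMCGS 4

/* One lock per "lruvec" rather than one lock per node. */
struct toy_lruvec {
	pthread_mutex_t lru_lock;
};

struct toy_page {
	int memcg_id;	/* which memcg's lruvec the page sits on */
};

static struct toy_lruvec lruvecs[NR_MEMCGS];

/* Rough shape of lock_page_lruvec_irq(): find the page's lruvec, take
 * its lock, and return the locked lruvec for a later matching unlock. */
static struct toy_lruvec *toy_lock_page_lruvec(struct toy_page *page)
{
	struct toy_lruvec *lruvec = &lruvecs[page->memcg_id];

	pthread_mutex_lock(&lruvec->lru_lock);
	return lruvec;
}

static void toy_unlock_lruvec(struct toy_lruvec *lruvec)
{
	pthread_mutex_unlock(&lruvec->lru_lock);
}

int main(void)
{
	struct toy_page a = { .memcg_id = 1 }, b = { .memcg_id = 3 };
	struct toy_lruvec *la, *lb;
	int i;

	for (i = 0; i < NR_MEMCGS; i++)
		pthread_mutex_init(&lruvecs[i].lru_lock, NULL);

	la = toy_lock_page_lruvec(&a);
	lb = toy_lock_page_lruvec(&b);	/* different memcg: no contention */
	printf("memcg 1 and memcg 3 hold independent lru locks\n");
	toy_unlock_lruvec(lb);
	toy_unlock_lruvec(la);
	return 0;
}

This is also why the caller now keeps the returned lruvec pointer around: the unlock must go to whichever lruvec was actually locked, not be recomputed from the page.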
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11293637 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D4AF613B6 for ; Mon, 16 Dec 2019 09:27:30 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A9DBF20725 for ; Mon, 16 Dec 2019 09:27:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A9DBF20725 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 12EE18E0008; Mon, 16 Dec 2019 04:27:26 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 06BBD8E0003; Mon, 16 Dec 2019 04:27:26 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E9AA18E0008; Mon, 16 Dec 2019 04:27:25 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0104.hostedemail.com [216.40.44.104]) by kanga.kvack.org (Postfix) with ESMTP id D2FF48E0003 for ; Mon, 16 Dec 2019 04:27:25 -0500 (EST) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with SMTP id 993AA181AC1E9 for ; Mon, 16 Dec 2019 09:27:25 +0000 (UTC) X-FDA: 76270476450.09.women83_40b0639020727 X-Spam-Summary: 2,0,0,150eab5a5d234a5c,d41d8cd98f00b204,alex.shi@linux.alibaba.com,:cgroups@vger.kernel.org:linux-kernel@vger.kernel.org::akpm@linux-foundation.org:mgorman@techsingularity.net:tj@kernel.org:hughd@google.com:khlebnikov@yandex-team.ru:daniel.m.jordan@oracle.com:yang.shi@linux.alibaba.com:willy@infradead.org:shakeelb@google.com:hannes@cmpxchg.org:alex.shi@linux.alibaba.com:mhocko@kernel.org:vdavydov.dev@gmail.com:guro@fb.com:chris@chrisdown.name:tglx@linutronix.de:vbabka@suse.cz:aryabinin@virtuozzo.com:swkhack@gmail.com:stefan.potyra@elektrobit.com:jgg@ziepe.ca:mchehab+samsung@kernel.org:peng.fan@nxp.com:nborisov@suse.com:ira.weiny@intel.com:ktkhai@virtuozzo.com:laoar.shao@gmail.com,RULES_HIT:41:355:379:541:800:960:973:981:988:989:1260:1261:1345:1359:1431:1437:1535:1543:1711:1730:1747:1777:1792:2198:2199:2393:2553:2559:2562:2898:3138:3139:3140:3141:3142:3167:3354:3865:3868:3871:4321:4605:5007:6261:6737:6742:7514:8957:9207:10004:11026:11473:11658:11914:12043:12048 :12291:1 X-HE-Tag: women83_40b0639020727 X-Filterd-Recvd-Size: 5682 Received: from out30-131.freemail.mail.aliyun.com (out30-131.freemail.mail.aliyun.com [115.124.30.131]) by imf26.hostedemail.com (Postfix) with ESMTP for ; Mon, 16 Dec 2019 09:27:24 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R661e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e01422;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=30;SR=0;TI=SMTPD_---0Tl4JZ2F_1576488438; Received: from localhost(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0Tl4JZ2F_1576488438) by smtp.aliyun-inc.com(127.0.0.1); Mon, 16 Dec 2019 17:27:18 +0800 From: Alex Shi To: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, 
yang.shi@linux.alibaba.com, willy@infradead.org, shakeelb@google.com, hannes@cmpxchg.org
Cc: Alex Shi, Michal Hocko, Vladimir Davydov, Roman Gushchin, Chris Down, Thomas Gleixner, Vlastimil Babka, Andrey Ryabinin, swkhack, "Potyra, Stefan", Jason Gunthorpe, Mauro Carvalho Chehab, Peng Fan, Nikolay Borisov, Ira Weiny, Kirill Tkhai, Yafang Shao
Subject: [PATCH v6 03/10] mm/lru: introduce the relock_page_lruvec function
Date: Mon, 16 Dec 2019 17:26:19 +0800
Message-Id: <1576488386-32544-4-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>

During lruvec locking, a new page's lruvec may be the same as the previous one, so we can save a re-lock and only change the lock iff the lruvec is new. The function is named relock_page_lruvec following Hugh Dickins' patch.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Andrew Morton
Cc: Roman Gushchin
Cc: Shakeel Butt
Cc: Chris Down
Cc: Thomas Gleixner
Cc: Vlastimil Babka
Cc: Andrey Ryabinin
Cc: swkhack
Cc: "Potyra, Stefan"
Cc: Jason Gunthorpe
Cc: Matthew Wilcox
Cc: Mauro Carvalho Chehab
Cc: Peng Fan
Cc: Nikolay Borisov
Cc: Ira Weiny
Cc: Kirill Tkhai
Cc: Yang Shi
Cc: Yafang Shao
Cc: Mel Gorman
Cc: Konstantin Khlebnikov
Cc: Hugh Dickins
Cc: Tejun Heo
Cc: linux-kernel@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
---
 include/linux/memcontrol.h | 36 ++++++++++++++++++++++++++++++++++++
 mm/vmscan.c                |  8 ++------
 2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8389b9b927ef..09e861df48e8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1299,6 +1299,42 @@ static inline void dec_lruvec_page_state(struct page *page,
 	mod_lruvec_page_state(page, idx, -1);
 }
 
+/* Don't lock again iff page's lruvec locked */
+static inline struct lruvec *relock_page_lruvec_irq(struct page *page,
+					struct lruvec *locked_lruvec)
+{
+	struct pglist_data *pgdat = page_pgdat(page);
+	struct lruvec *lruvec;
+
+	lruvec = mem_cgroup_page_lruvec(page, pgdat);
+
+	if (likely(locked_lruvec == lruvec))
+		return lruvec;
+
+	if (unlikely(locked_lruvec))
+		unlock_page_lruvec_irq(locked_lruvec);
+
+	return lock_page_lruvec_irq(page);
+}
+
+/* Don't lock again iff page's lruvec locked */
+static inline struct lruvec *relock_page_lruvec_irqsave(struct page *page,
+		struct lruvec *locked_lruvec, unsigned long *flags)
+{
+	struct pglist_data *pgdat = page_pgdat(page);
+	struct lruvec *lruvec;
+
+	lruvec = mem_cgroup_page_lruvec(page, pgdat);
+
+	if (likely(locked_lruvec == lruvec))
+		return lruvec;
+
+	if (unlikely(locked_lruvec))
+		unlock_page_lruvec_irqrestore(locked_lruvec, *flags);
+
+	return lock_page_lruvec_irqsave(page, flags);
+}
+
 #ifdef CONFIG_CGROUP_WRITEBACK
 
 struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 017901112789..be8bca45f7c6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4343,14 +4343,10 @@ void check_move_unevictable_pages(struct pagevec *pvec)
 
 	for (i = 0; i < pvec->nr; i++) {
 		struct page *page = pvec->pages[i];
-		struct lruvec *new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
 
 		pgscanned++;
-		if (lruvec != new_lruvec) {
-			if (lruvec)
-				unlock_page_lruvec_irq(lruvec);
-			lruvec = lock_page_lruvec_irq(page);
-		}
+
+		lruvec = relock_page_lruvec_irq(page, lruvec);
 
 		if (!PageLRU(page) || !PageUnevictable(page))
 			continue;
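The same toy model used for patch 02 (again, hypothetical userspace code, not the kernel implementation) shows the relock idea: the lock is only cycled when the page belongs to a different lruvec, so a run of same-memcg pages pays for a single lock acquisition.

#include <pthread.h>
#include <stddef.h>

#define NR_MEMCGS 4

struct toy_lruvec { pthread_mutex_t lru_lock; };
struct toy_page { int memcg_id; };

static struct toy_lruvec lruvecs[NR_MEMCGS];

/* Mirrors relock_page_lruvec_irq(): keep the current lock if the page
 * belongs to the same lruvec, otherwise swap to the page's lruvec. */
static struct toy_lruvec *toy_relock_page_lruvec(struct toy_page *page,
						 struct toy_lruvec *locked)
{
	struct toy_lruvec *want = &lruvecs[page->memcg_id];

	if (locked == want)
		return locked;			/* fast path: already held */

	if (locked)
		pthread_mutex_unlock(&locked->lru_lock);
	pthread_mutex_lock(&want->lru_lock);
	return want;
}

int main(void)
{
	struct toy_page pages[] = { {1}, {1}, {1}, {2} };
	struct toy_lruvec *locked = NULL;
	size_t i;

	for (i = 0; i < NR_MEMCGS; i++)
		pthread_mutex_init(&lruvecs[i].lru_lock, NULL);

	/* Three pages from memcg 1 take the lock once; only the switch
	 * to memcg 2 pays for an unlock/lock pair. */
	for (i = 0; i < sizeof(pages) / sizeof(pages[0]); i++)
		locked = toy_relock_page_lruvec(&pages[i], locked);

	if (locked)
		pthread_mutex_unlock(&locked->lru_lock);
	return 0;
}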
From patchwork Mon Dec 16 09:26:20 2019
X-Patchwork-Submitter: Alex Shi
X-Patchwork-Id: 11293641
From: Alex Shi <alex.shi@linux.alibaba.com>
To: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com, willy@infradead.org, shakeelb@google.com, hannes@cmpxchg.org
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Subject: [PATCH v6 04/10] mm/mlock: optimize munlock_pagevec by relocking
Date: Mon, 16 Dec 2019 17:26:20 +0800
Message-Id: <1576488386-32544-5-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>

During pagevec locking, a new page's lruvec may be the same as the previous one, so we can save a re-lock and only change the lock iff the lruvec is new.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Johannes Weiner
Cc: Hugh Dickins
Cc: linux-kernel@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Andrew Morton
---
 mm/mlock.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 10d15f58b061..050f999eadb1 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -289,6 +289,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 {
 	int i;
 	int nr = pagevec_count(pvec);
+	int delta_munlocked = -nr;
 	struct pagevec pvec_putback;
 	struct lruvec *lruvec = NULL;
 	int pgrescued = 0;
@@ -299,20 +300,19 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 	for (i = 0; i < nr; i++) {
 		struct page *page = pvec->pages[i];
 
-		lruvec = lock_page_lruvec_irq(page);
+		lruvec = relock_page_lruvec_irq(page, lruvec);
 
 		if (TestClearPageMlocked(page)) {
 			/*
 			 * We already have pin from follow_page_mask()
 			 * so we can spare the get_page() here.
 			 */
-			if (__munlock_isolate_lru_page(page, lruvec, false)) {
-				__mod_zone_page_state(zone, NR_MLOCK, -1);
-				unlock_page_lruvec_irq(lruvec);
+			if (__munlock_isolate_lru_page(page, lruvec, false))
 				continue;
-			} else
+			else
 				__munlock_isolation_failed(page);
-		}
+		} else
+			delta_munlocked++;
 
 		/*
 		 * We won't be munlocking this page in the next phase
@@ -322,8 +322,10 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 		 */
 		pagevec_add(&pvec_putback, pvec->pages[i]);
 		pvec->pages[i] = NULL;
-		unlock_page_lruvec_irq(lruvec);
 	}
+	__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
+	if (lruvec)
+		unlock_page_lruvec_irq(lruvec);
 
 	/* Now we can release pins of pages that we are not munlocking */
 	pagevec_release(&pvec_putback);
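The accounting side of this patch follows a common batching pattern, sketched below with hypothetical toy_* names in plain userspace C: start from -nr, add one back for every page that turns out not to be mlocked, and fold the result into the shared counter with one update instead of one per page.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the zone's NR_MLOCK vmstat counter. */
static atomic_long nr_mlock = 100;

/* Pretend every other page in the batch is really mlocked. */
static bool toy_test_clear_mlocked(int i)
{
	return (i & 1) == 0;
}

/*
 * Mirrors the __munlock_pagevec() pattern: assume all nr pages will be
 * munlocked (-nr), credit one back for each page that was not mlocked
 * after all, then apply the final delta with a single shared update.
 */
static void toy_munlock_batch(int nr)
{
	long delta_munlocked = -nr;
	int i;

	for (i = 0; i < nr; i++) {
		if (!toy_test_clear_mlocked(i))
			delta_munlocked++;	/* page was not mlocked */
	}
	atomic_fetch_add(&nr_mlock, delta_munlocked);
}

int main(void)
{
	toy_munlock_batch(8);	/* 4 of 8 pages munlocked: 100 -> 96 */
	printf("NR_MLOCK now %ld\n", (long)atomic_load(&nr_mlock));
	return 0;
}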
From patchwork Mon Dec 16 09:26:21 2019
X-Patchwork-Submitter: Alex Shi
X-Patchwork-Id: 11293651
From: Alex Shi <alex.shi@linux.alibaba.com>
To: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com, willy@infradead.org, shakeelb@google.com, hannes@cmpxchg.org
Cc: Alex Shi <alex.shi@linux.alibaba.com>, Thomas Gleixner, Mauro Carvalho Chehab, Yafang Shao
Subject: [PATCH v6 05/10] mm/swap: only change the lru_lock iff page's lruvec is different
Date: Mon, 16 Dec 2019 17:26:21 +0800
Message-Id: <1576488386-32544-6-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>

Since we introduced relock_page_lruvec, we can use it in more places to cut down on repeated lru_lock lock/unlock cycles.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Johannes Weiner
Cc: Andrew Morton
Cc: Thomas Gleixner
Cc: Matthew Wilcox
Cc: Mauro Carvalho Chehab
Cc: Yafang Shao
Cc: Mel Gorman
Cc: Konstantin Khlebnikov
Cc: Hugh Dickins
Cc: linux-kernel@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
---
 mm/swap.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 97e108be4f92..84a845968e1d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -196,11 +196,12 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 	for (i = 0; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
 
-		lruvec = lock_page_lruvec_irqsave(page, &flags);
+		lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags);
 
 		(*move_fn)(page, lruvec, arg);
-		unlock_page_lruvec_irqrestore(lruvec, flags);
 	}
+	if (lruvec)
+		unlock_page_lruvec_irqrestore(lruvec, flags);
 
 	release_pages(pvec->pages, pvec->nr);
 	pagevec_reinit(pvec);
@@ -819,14 +820,11 @@ void release_pages(struct page **pages, int nr)
 		}
 
 		if (PageLRU(page)) {
-			struct lruvec *new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
+			struct lruvec *pre_lruvec = lruvec;
 
-			if (new_lruvec != lruvec) {
-				if (lruvec)
-					unlock_page_lruvec_irqrestore(lruvec, flags);
+			lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags);
+			if (pre_lruvec != lruvec)
 				lock_batch = 0;
-				lruvec = lock_page_lruvec_irqsave(page, &flags);
-			}
 
 			VM_BUG_ON_PAGE(!PageLRU(page), page);
 			__ClearPageLRU(page);
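One subtlety in the release_pages() hunk deserves a sketch (toy types, with the actual locking elided): the lock_batch budget that bounds IRQ-off hold time must be reset only when the relock actually switched lruvecs, not on every page.

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32

struct toy_lruvec { int id; };

/* Stand-in for relock_page_lruvec_irqsave(): hand back the lruvec the
 * page belongs to; the real helper also swaps spinlocks as needed. */
static struct toy_lruvec *toy_relock(struct toy_lruvec *want,
				     struct toy_lruvec *locked)
{
	(void)locked;
	return want;
}

int main(void)
{
	struct toy_lruvec a = { 1 }, b = { 2 };
	struct toy_lruvec *order[] = { &a, &a, &b, &b, &b };
	struct toy_lruvec *lruvec = NULL;
	int lock_batch = 0;
	size_t i;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		struct toy_lruvec *pre_lruvec;

		/* Drop the lock periodically so a long run of pages from
		 * one lruvec cannot hold off IRQs indefinitely. */
		if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX)
			lruvec = NULL;		/* models the unlock */

		pre_lruvec = lruvec;
		lruvec = toy_relock(order[i], lruvec);
		if (pre_lruvec != lruvec)
			lock_batch = 0;		/* fresh lock, fresh budget */

		printf("page %zu on lruvec %d, lock_batch %d\n",
		       i, lruvec->id, lock_batch);
	}
	return 0;
}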
From patchwork Mon Dec 16 09:26:22 2019
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: [PATCH v6 06/10] mm/pgdat: remove pgdat lru_lock
Date: Mon, 16 Dec 2019 17:26:22 +0800
Message-Id: <1576488386-32544-7-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>

Now that pgdat->lru_lock has been replaced by the lruvec lock, it is no
longer used anywhere; remove the field and its initialization.

Signed-off-by: Alex Shi
Cc: Andrew Morton
Cc: Vlastimil Babka
Cc: Dan Williams
Cc: Michal Hocko
Cc: Mel Gorman
Cc: Wei Yang
Cc: Arun KS
Cc: Oscar Salvador
Cc: Mike Rapoport
Cc: Alexander Duyck
Cc: Pavel Tatashin
Cc: Alexander Potapenko
Cc: Konstantin Khlebnikov
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Tejun Heo
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: cgroups@vger.kernel.org
---
 include/linux/mmzone.h | 1 -
 mm/page_alloc.c        | 1 -
 2 files changed, 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c5455675acf2..7db0cec19aa0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -769,7 +769,6 @@ struct deferred_split {

 	/* Write-intensive fields used by page reclaim */
 	ZONE_PADDING(_pad1_)
-	spinlock_t		lru_lock;

 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 	/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4785a8a2040e..352f2a3d67b3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6712,7 +6712,6 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
 	init_waitqueue_head(&pgdat->pfmemalloc_wait);

 	pgdat_page_ext_init(pgdat);
-	spin_lock_init(&pgdat->lru_lock);
 	lruvec_init(&pgdat->__lruvec);
 }
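With this, the only lru_lock left is the one embedded in struct lruvec by the
earlier patches of this series. For orientation, the resulting layout is
roughly the following sketch (the field set is illustrative, not the exact
upstream definition):

struct lruvec {
	struct list_head	lists[NR_LRU_LISTS];
	/* per-memcg, per-node lock that replaces pgdat->lru_lock */
	spinlock_t		lru_lock;
	/* ... reclaim statistics and other fields elided ... */
};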
From patchwork Mon Dec 16 09:26:23 2019
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: [PATCH v6 07/10] mm/lru: revise the comments of lru_lock
Date: Mon, 16 Dec 2019 17:26:23 +0800
Message-Id: <1576488386-32544-8-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>

From: Hugh Dickins <hughd@google.com>

Since we changed pgdat->lru_lock to lruvec->lru_lock, fix the comments
in the code that still refer to the old lock, along with some stale
zone->lru_lock comments left over from even earlier.

Signed-off-by: Hugh Dickins
Signed-off-by: Alex Shi
Cc: Andrew Morton
Cc: Jason Gunthorpe
Cc: Dan Williams
Cc: Vlastimil Babka
Cc: Ira Weiny
Cc: Jesper Dangaard Brouer
Cc: Andrey Ryabinin
Cc: Jann Horn
Cc: Logan Gunthorpe
Cc: Souptick Joarder
Cc: Ralph Campbell
Cc: "Tobin C. Harding"
Cc: Michal Hocko
Cc: Oscar Salvador
Cc: Mel Gorman
Cc: Wei Yang
Cc: Johannes Weiner
Cc: Arun KS
Cc: Matthew Wilcox
Cc: "Darrick J. Wong"
Cc: Amir Goldstein
Cc: Dave Chinner
Cc: Josef Bacik
Cc: "Kirill A. Shutemov"
Cc: "Jérôme Glisse"
Cc: Mike Kravetz
Cc: Kirill Tkhai
Cc: Daniel Jordan
Cc: Yafang Shao
Cc: Yang Shi
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
---
 Documentation/admin-guide/cgroup-v1/memcg_test.rst | 15 +++------------
 Documentation/admin-guide/cgroup-v1/memory.rst     |  6 +++---
 Documentation/trace/events-kmem.rst                |  2 +-
 Documentation/vm/unevictable-lru.rst               | 22 ++++++++--------------
 include/linux/mm_types.h                           |  2 +-
 include/linux/mmzone.h                             |  2 +-
 mm/filemap.c                                       |  4 ++--
 mm/rmap.c                                          |  2 +-
 mm/vmscan.c                                        | 12 ++++++++----
 9 files changed, 28 insertions(+), 39 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/memcg_test.rst b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
index 3f7115e07b5d..0b9f91589d3d 100644
--- a/Documentation/admin-guide/cgroup-v1/memcg_test.rst
+++ b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
@@ -133,18 +133,9 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.

 8. LRU
 ======
-	Each memcg has its own private LRU. Now, its handling is under global
-	VM's control (means that it's handled under global pgdat->lru_lock).
-	Almost all routines around memcg's LRU is called by global LRU's
-	list management functions under pgdat->lru_lock.
-
-	A special function is mem_cgroup_isolate_pages(). This scans
-	memcg's private LRU and call __isolate_lru_page() to extract a page
-	from LRU.
-
-	(By __isolate_lru_page(), the page is removed from both of global and
-	private LRU.)
-
+	Each memcg has its own vector of LRUs (inactive anon, active anon,
+	inactive file, active file, unevictable) of pages from each node,
+	each LRU handled under a single lru_lock for that memcg and node.

 9. Typical Tests.
 =================
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 0ae4f564c2d6..60d97e8b7f3c 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -297,13 +297,13 @@ When oom event notifier is registered, event will be delivered.
 	PG_locked.
 	mm->page_table_lock
-	pgdat->lru_lock
+	lruvec->lru_lock
 	lock_page_cgroup.

 	In many cases, just lock_page_cgroup() is called.

-	per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
-	pgdat->lru_lock, it has no lock of its own.
+	per-node-per-cgroup LRU (cgroup's private LRU) is just guarded by
+	lruvec->lru_lock, it has no lock of its own.

 2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
 -----------------------------------------------
diff --git a/Documentation/trace/events-kmem.rst b/Documentation/trace/events-kmem.rst
index 555484110e36..68fa75247488 100644
--- a/Documentation/trace/events-kmem.rst
+++ b/Documentation/trace/events-kmem.rst
@@ -69,7 +69,7 @@ When pages are freed in batch, the also mm_page_free_batched is triggered.
 Broadly speaking, pages are taken off the LRU lock in bulk and
 freed in batch with a page list. Significant amounts of activity here could
 indicate that the system is under memory pressure and can also indicate
-contention on the zone->lru_lock.
+contention on the lruvec->lru_lock.

 4. Per-CPU Allocator Activity
 =============================
diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst
index 17d0861b0f1d..0e1490524f53 100644
--- a/Documentation/vm/unevictable-lru.rst
+++ b/Documentation/vm/unevictable-lru.rst
@@ -33,7 +33,7 @@ reclaim in Linux.
 The problems have been observed at customer sites on large memory x86_64
 systems.  To illustrate this with an example, a non-NUMA x86_64 platform with 128GB of
-main memory will have over 32 million 4k pages in a single zone. When a large
+main memory will have over 32 million 4k pages in a single node. When a large
 fraction of these pages are not evictable for any reason [see below], vmscan
 will spend a lot of time scanning the LRU lists looking for the small fraction
 of pages that are evictable. This can result in a situation where all CPUs are
@@ -55,7 +55,7 @@ unevictable, either by definition or by circumstance, in the future.
 The Unevictable Page List
 -------------------------

-The Unevictable LRU infrastructure consists of an additional, per-zone, LRU list
+The Unevictable LRU infrastructure consists of an additional, per-node, LRU list
 called the "unevictable" list and an associated page flag, PG_unevictable, to
 indicate that the page is being managed on the unevictable list.
@@ -84,15 +84,9 @@ The unevictable list does not differentiate between file-backed and
 anonymous, swap-backed pages.  This differentiation is only important while the
 pages are, in fact, evictable.

-The unevictable list benefits from the "arrayification" of the per-zone LRU
+The unevictable list benefits from the "arrayification" of the per-node LRU
 lists and statistics originally proposed and posted by Christoph Lameter.

-The unevictable list does not use the LRU pagevec mechanism. Rather,
-unevictable pages are placed directly on the page's zone's unevictable list
-under the zone lru_lock. This allows us to prevent the stranding of pages on
-the unevictable list when one task has the page isolated from the LRU and other
-tasks are changing the "evictability" state of the page.
-

 Memory Control Group Interaction
 --------------------------------
@@ -101,8 +95,8 @@ The unevictable LRU facility interacts with the memory control group [aka
 memory controller; see Documentation/admin-guide/cgroup-v1/memory.rst] by
 extending the lru_list enum.

-The memory controller data structure automatically gets a per-zone unevictable
-list as a result of the "arrayification" of the per-zone LRU lists (one per
+The memory controller data structure automatically gets a per-node unevictable
+list as a result of the "arrayification" of the per-node LRU lists (one per
 lru_list enum element).  The memory controller tracks the movement of pages to
 and from the unevictable list.
@@ -196,7 +190,7 @@ for the sake of expediency, to leave a unevictable page on one of the regular
 active/inactive LRU lists for vmscan to deal with.  vmscan checks for such
 pages in all of the shrink_{active|inactive|page}_list() functions and will
 "cull" such pages that it encounters: that is, it diverts those pages to the
-unevictable list for the zone being scanned.
+unevictable list for the node being scanned.

 There may be situations where a page is mapped into a VM_LOCKED VMA, but the
 page is not marked as PG_mlocked.  Such pages will make it all the way to
@@ -328,7 +322,7 @@ If the page was NOT already mlocked, mlock_vma_page() attempts to isolate the
 page from the LRU, as it is likely on the appropriate active or inactive list
 at that time.  If the isolate_lru_page() succeeds, mlock_vma_page() will put
 back the page - by calling putback_lru_page() - which will notice that the page
-is now mlocked and divert the page to the zone's unevictable list. If
+is now mlocked and divert the page to the node's unevictable list. If
 mlock_vma_page() is unable to isolate the page from the LRU, vmscan will
 handle it later if and when it attempts to reclaim the page.
@@ -603,7 +597,7 @@ Some examples of these unevictable pages on the LRU lists are:
     unevictable list in mlock_vma_page().

 shrink_inactive_list() also diverts any unevictable pages that it finds on the
-inactive lists to the appropriate zone's unevictable list.
+inactive lists to the appropriate node's unevictable list.

 shrink_inactive_list() should only see SHM_LOCK'd pages that became SHM_LOCK'd
 after shrink_active_list() had moved them to the inactive list, or pages mapped
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 270aa8fd2800..ff08a6a8145c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -78,7 +78,7 @@ struct page {
 		struct {	/* Page cache and anonymous pages */
 			/**
 			 * @lru: Pageout list, eg. active_list protected by
-			 * pgdat->lru_lock.  Sometimes used as a generic list
+			 * lruvec->lru_lock.  Sometimes used as a generic list
 			 * by the page owner.
 			 */
 			struct list_head lru;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7db0cec19aa0..d73be191e9f8 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -159,7 +159,7 @@ static inline bool free_area_empty(struct free_area *area, int migratetype)
 struct pglist_data;

 /*
- * zone->lock and the zone lru_lock are two of the hottest locks in the kernel.
+ * zone->lock and the lru_lock are two of the hottest locks in the kernel.
  * So add a wild amount of padding here to ensure that they fall into separate
  * cachelines.  There are very few zone structures in the machine, so space
  * consumption is not a concern here.
diff --git a/mm/filemap.c b/mm/filemap.c
index bf6aa30be58d..6dcdf06660fb 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -101,8 +101,8 @@
  *    ->swap_lock		(try_to_unmap_one)
  *    ->private_lock		(try_to_unmap_one)
  *    ->i_pages lock		(try_to_unmap_one)
- *    ->pgdat->lru_lock		(follow_page->mark_page_accessed)
- *    ->pgdat->lru_lock		(check_pte_range->isolate_lru_page)
+ *    ->lruvec->lru_lock	(follow_page->mark_page_accessed)
+ *    ->lruvec->lru_lock	(check_pte_range->isolate_lru_page)
  *    ->private_lock		(page_remove_rmap->set_page_dirty)
  *    ->i_pages lock		(page_remove_rmap->set_page_dirty)
  *    bdi.wb->list_lock		(page_remove_rmap->set_page_dirty)
diff --git a/mm/rmap.c b/mm/rmap.c
index b3e381919835..39052794cb46 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -27,7 +27,7 @@
  *         mapping->i_mmap_rwsem
  *           anon_vma->rwsem
  *             mm->page_table_lock or pte_lock
- *               pgdat->lru_lock (in mark_page_accessed, isolate_lru_page)
+ *               lruvec->lru_lock (in mark_page_accessed, isolate_lru_page)
  *                 swap_lock (in swap_duplicate, swap_info_get)
  *                   mmlist_lock (in mmput, drain_mmlist and others)
  *                     mapping->private_lock (in __set_page_dirty_buffers)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index be8bca45f7c6..21b6a9f681ff 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1626,14 +1626,16 @@ static __always_inline void update_lru_sizes(struct lruvec *lruvec,
 }

 /**
- * pgdat->lru_lock is heavily contended.  Some of the functions that
+ * Isolate pages from the lruvec, filling @dst with up to nr_to_scan pages.
+ *
+ * lruvec->lru_lock is heavily contended.  Some of the functions that
  * shrink the lists perform better by taking out a batch of pages
  * and working on them outside the LRU lock.
  *
  * For pagecache intensive workloads, this function is the hottest
  * spot in the kernel (apart from copy_*_user functions).
  *
- * Appropriate locks must be held before calling this function.
+ * lru_lock must be held before calling this function.
  *
  * @nr_to_scan:	The number of eligible pages to look through on the list.
  * @lruvec:	The LRU vector to pull pages from.
@@ -1820,14 +1822,16 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,

 /*
  * This moves pages from @list to corresponding LRU list.
+ * The pages from @list are off any lruvec when this is called; by the
+ * end of the function, @list is reused as the pages_to_free list.
  *
  * We move them the other way if the page is referenced by one or more
  * processes, from rmap.
  *
  * If the pages are mostly unmapped, the processing is fast and it is
- * appropriate to hold zone_lru_lock across the whole operation.  But if
+ * appropriate to hold lru_lock across the whole operation.  But if
  * the pages are mapped, the processing is slow (page_referenced()) so we
- * should drop zone_lru_lock around each page.  It's impossible to balance
+ * should drop lru_lock around each page.  It's impossible to balance
  * this, so instead we remove the pages from the LRU while processing them.
  * It is safe to rely on PG_active against the non-LRU pages in here because
  * nobody will play with that bit on a non-LRU page.

From patchwork Mon Dec 16 09:26:24 2019
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: [PATCH v6 08/10] mm/lru: debug checking for page memcg moving and lock_page_memcg
Date: Mon, 16 Dec 2019 17:26:24 +0800
Message-Id: <1576488386-32544-9-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>

The extra irq disable/enable and VM_BUG_ON checking cost about 5% of
readtwice performance on a 2-socket, 26-cores-per-socket HT box. These
are temporary debugging aids and will need to be removed later.

Signed-off-by: Alex Shi
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Andrew Morton
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/compaction.c |  4 ++++
 mm/memcontrol.c | 13 +++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 8c0a2da217d8..f47820355b66 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -971,6 +971,10 @@ static bool too_many_isolated(pg_data_t *pgdat)
 			compact_lock_irqsave(&lruvec->lru_lock, &flags, cc);
 			locked_lruvec = lruvec;

+#ifdef CONFIG_MEMCG
+			if (!mem_cgroup_disabled())
+				VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != page->mem_cgroup, page);
+#endif
 			/* Try get exclusive access under lock */
 			if (!skip_updated) {
 				skip_updated = true;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f5d41ccd30e0..138f298b694f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1247,6 +1247,10 @@ struct lruvec *lock_page_lruvec_irq(struct page *page)
 	lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
 	spin_lock_irq(&lruvec->lru_lock);

+#ifdef CONFIG_MEMCG
+	if (!mem_cgroup_disabled())
+		VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != page->mem_cgroup, page);
+#endif
 	return lruvec;
 }

@@ -1259,6 +1263,10 @@ struct lruvec *lock_page_lruvec_irqsave(struct page *page, unsigned long *flags)
 	lruvec = mem_cgroup_lruvec(memcg, page_pgdat(page));
 	spin_lock_irqsave(&lruvec->lru_lock, *flags);

+#ifdef CONFIG_MEMCG
+	if (!mem_cgroup_disabled())
+		VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != page->mem_cgroup, page);
+#endif
 	return lruvec;
 }

@@ -2014,6 +2022,11 @@ struct mem_cgroup *lock_page_memcg(struct page *page)
 	if (unlikely(!memcg))
 		return NULL;

+	/* temporary lockdep checking, to be removed */
+	local_irq_save(flags);
+	might_lock(&memcg->move_lock);
+	local_irq_restore(flags);
+
 	if (atomic_read(&memcg->moving_account) <= 0)
 		return memcg;
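The same three-line check is open-coded at each locking site above; it could
equally live in a single helper along these lines (sketch only; the helper
name is illustrative and not part of this patch):

static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page)
{
#ifdef CONFIG_MEMCG
	/* the lruvec we locked must belong to the page's current memcg */
	if (!mem_cgroup_disabled())
		VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != page->mem_cgroup, page);
#endif
}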
From patchwork Mon Dec 16 09:26:25 2019
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: [PATCH v6 09/10] mm/memcg: fold lock in lock_page_lru
Date: Mon, 16 Dec 2019 17:26:25 +0800
Message-Id: <1576488386-32544-10-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>

According to the calling path of commit_charge(), lrucare handling is
only needed when the page is on an LRU list, so fold the lruvec locking
under the PageLRU check. No functional change.

Signed-off-by: Alex Shi
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Andrew Morton
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/memcontrol.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 138f298b694f..f8e279487e1d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2621,9 +2621,10 @@ static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)

 static struct lruvec *lock_page_lru(struct page *page, int *isolated)
 {
-	struct lruvec *lruvec = lock_page_lruvec_irq(page);
+	struct lruvec *lruvec = NULL;

 	if (PageLRU(page)) {
+		lruvec = lock_page_lruvec_irq(page);
 		ClearPageLRU(page);
 		del_page_from_lru_list(page, lruvec, page_lru(page));

@@ -2637,17 +2638,18 @@ static struct lruvec *lock_page_lru(struct page *page, int *isolated)
 static void unlock_page_lru(struct page *page, int isolated,
 				struct lruvec *locked_lruvec)
 {
-	struct lruvec *lruvec;
+	if (isolated) {
+		struct lruvec *lruvec;

-	unlock_page_lruvec_irq(locked_lruvec);
-	lruvec = lock_page_lruvec_irq(page);
+		if (locked_lruvec)
+			unlock_page_lruvec_irq(locked_lruvec);
+		lruvec = lock_page_lruvec_irq(page);

-	if (isolated) {
 		VM_BUG_ON_PAGE(PageLRU(page), page);
 		SetPageLRU(page);
 		add_page_to_lru_list(page, lruvec, page_lru(page));
+		unlock_page_lruvec_irq(lruvec);
 	}
-	unlock_page_lruvec_irq(lruvec);
 }

 static void commit_charge(struct page *page, struct mem_cgroup *memcg,
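For context, the commit_charge() caller pattern this folds into looks roughly
like the following simplified sketch (the real function also sets up the
memcg binding details, elided here):

static void commit_charge_sketch(struct page *page, struct mem_cgroup *memcg,
				 bool lrucare)
{
	int isolated = 0;
	struct lruvec *lruvec = NULL;

	if (lrucare)	/* lruvec stays NULL when the page was not on an LRU */
		lruvec = lock_page_lru(page, &isolated);

	page->mem_cgroup = memcg;	/* the actual commit */

	if (lrucare)	/* re-adds the page only if it had been isolated */
		unlock_page_lru(page, isolated, lruvec);
}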
From patchwork Mon Dec 16 09:26:26 2019
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: [PATCH v6 10/10] mm: revise the comments of mem_cgroup_page_lruvec
Date: Mon, 16 Dec 2019 17:26:26 +0800
Message-Id: <1576488386-32544-11-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>
References: <1576488386-32544-1-git-send-email-alex.shi@linux.alibaba.com>

From: Johannes Weiner <hannes@cmpxchg.org>

Better document the mem_cgroup_page_lruvec() caller requirements.

Suggested-by: Shakeel Butt
Signed-off-by: Johannes Weiner
Signed-off-by: Alex Shi
Cc: Andrew Morton
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/memcontrol.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f8e279487e1d..552de2e7da0e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1202,9 +1202,18 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
  * @page: the page
  * @pgdat: pgdat of the page
  *
- * This function is only safe when following the LRU page isolation
- * and putback protocol: the LRU lock must be held, and the page must
- * either be PageLRU() or the caller must have isolated/allocated it.
+ * NOTE: The returned lruvec is only stable if the calling context has
+ * the page->mem_cgroup pinned!  This is accomplished by satisfying one
+ * of the following criteria:
+ *
+ * a) have the @page locked
+ * b) have an exclusive reference to @page (e.g. refcount 0)
+ * c) hold the lru_lock and "own" the PageLRU (meaning either ensure
+ *    it's set, or be the one to hold the page in isolation)
+ *
+ * Otherwise, the page could be freed or moved out of the memcg,
+ * thereby releasing its reference on the memcg and potentially
+ * freeing it and its lruvecs in the process.
  */
 struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgdat)
 {