From patchwork Thu Nov 5 08:55:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883677 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 038D9921 for ; Thu, 5 Nov 2020 08:56:24 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8C3F6206C0 for ; Thu, 5 Nov 2020 08:56:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8C3F6206C0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8B4AE6B00A4; Thu, 5 Nov 2020 03:56:17 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 3D66C6B00A7; Thu, 5 Nov 2020 03:56:17 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1D2426B00A3; Thu, 5 Nov 2020 03:56:17 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0188.hostedemail.com [216.40.44.188]) by kanga.kvack.org (Postfix) with ESMTP id 8C00D6B00A2 for ; Thu, 5 Nov 2020 03:56:16 -0500 (EST) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 2F9108249980 for ; Thu, 5 Nov 2020 08:56:16 +0000 (UTC) X-FDA: 77449757952.06.shame32_481575e272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin06.hostedemail.com (Postfix) with ESMTP id 0A4D610038297 for ; Thu, 5 Nov 2020 08:56:16 +0000 (UTC) X-Spam-Summary: 1,0,0,526cc2b8a186e9c7,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:41:69:355:379:541:800:960:973:988:989:1260:1261:1345:1359:1381:1431:1437:1535:1544:1711:1730:1747:1777:1792:2198:2199:2393:2559:2562:2731:2890:3138:3139:3140:3141:3142:3355:3865:3868:3870:3871:3872:4042:4321:4605:5007:6261:6737:6738:7903:8957:9010:9592:10004:11026:11232:11473:11658:11914:12043:12048:12291:12296:12297:12438:12555:12683:12895:12986:13161:13229:13846:14096:14181:14394:14721:14915:21060:21080:21450:21451:21627:21740:21987:30001:30054,0,RBL:115.124.30.131:@linux.alibaba.com:.lbl8.mailshell.net-62.20.2.100 64.201.201.201;04yfue3j1aozyzkoy1wg1if88ex4aocgb96j696bgg9nrf58go4x64xx5ttwusd.czzetspk3yq44dnunhsquaihwiqoux7xrn9hmigwpf38se6nasobghnsax3q7x9.1-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: shame32_481575e272c8 X-Filterd-Recvd-Size: 5692 Received: from out30-131.freemail.mail.aliyun.com (out30-131.freemail.mail.aliyun.com [115.124.30.131]) by imf31.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:14 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R201e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04400;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=21;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:08 +0800 From: Alex Shi To: akpm@linux-foundation.org, 
mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Subject: [PATCH v21 01/19] mm/thp: move lru_add_page_tail func to huge_memory.c Date: Thu, 5 Nov 2020 16:55:31 +0800 Message-Id: <1604566549-62481-2-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The function is only used in huge_memory.c; defining it in another file under a CONFIG_TRANSPARENT_HUGEPAGE restriction just looks weird. Let's move it into huge_memory.c and make it static, as Hugh Dickins suggested. Signed-off-by: Alex Shi Reviewed-by: Kirill A. Shutemov Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Andrew Morton Cc: Johannes Weiner Cc: Matthew Wilcox Cc: Hugh Dickins Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org --- include/linux/swap.h | 2 -- mm/huge_memory.c | 30 ++++++++++++++++++++++++++++++ mm/swap.c | 33 --------------------------------- 3 files changed, 30 insertions(+), 35 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 667935c0dbd4..5e1e967c225f 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -338,8 +338,6 @@ extern void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages); extern void lru_note_cost_page(struct page *); extern void lru_cache_add(struct page *); -extern void lru_add_page_tail(struct page *page, struct page *page_tail, - struct lruvec *lruvec, struct list_head *head); extern void mark_page_accessed(struct page *); extern void lru_add_drain(void); extern void lru_add_drain_cpu(int cpu); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 08a183f6c3ab..8f16e991f7cc 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2348,6 +2348,36 @@ static void remap_page(struct page *page, unsigned int nr) } } +static void lru_add_page_tail(struct page *page, struct page *page_tail, + struct lruvec *lruvec, struct list_head *list) +{ + VM_BUG_ON_PAGE(!PageHead(page), page); + VM_BUG_ON_PAGE(PageCompound(page_tail), page); + VM_BUG_ON_PAGE(PageLRU(page_tail), page); + lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); + + if (!list) + SetPageLRU(page_tail); + + if (likely(PageLRU(page))) + list_add_tail(&page_tail->lru, &page->lru); + else if (list) { + /* page reclaim is reclaiming a huge page */ + get_page(page_tail); + list_add_tail(&page_tail->lru, list); + } else { + /* + * Head page has not yet been counted, as an hpage, + * so we must account for each subpage individually. + * + * Put page_tail on the list at the correct position + * so they all end up in order.
+ */ + add_page_to_lru_list_tail(page_tail, lruvec, + page_lru(page_tail)); + } +} + static void __split_huge_page_tail(struct page *head, int tail, struct lruvec *lruvec, struct list_head *list) { diff --git a/mm/swap.c b/mm/swap.c index 29220174433b..8a578381c2fc 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -977,39 +977,6 @@ void __pagevec_release(struct pagevec *pvec) } EXPORT_SYMBOL(__pagevec_release); -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -/* used by __split_huge_page_refcount() */ -void lru_add_page_tail(struct page *page, struct page *page_tail, - struct lruvec *lruvec, struct list_head *list) -{ - VM_BUG_ON_PAGE(!PageHead(page), page); - VM_BUG_ON_PAGE(PageCompound(page_tail), page); - VM_BUG_ON_PAGE(PageLRU(page_tail), page); - lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); - - if (!list) - SetPageLRU(page_tail); - - if (likely(PageLRU(page))) - list_add_tail(&page_tail->lru, &page->lru); - else if (list) { - /* page reclaim is reclaiming a huge page */ - get_page(page_tail); - list_add_tail(&page_tail->lru, list); - } else { - /* - * Head page has not yet been counted, as an hpage, - * so we must account for each subpage individually. - * - * Put page_tail on the list at the correct position - * so they all end up in order. - */ - add_page_to_lru_list_tail(page_tail, lruvec, - page_lru(page_tail)); - } -} -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ - static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, void *arg) { From patchwork Thu Nov 5 08:55:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883683 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 81BDC921 for ; Thu, 5 Nov 2020 08:56:30 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3967A2222B for ; Thu, 5 Nov 2020 08:56:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3967A2222B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 29AAA6B00A7; Thu, 5 Nov 2020 03:56:19 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 22B2A6B00A8; Thu, 5 Nov 2020 03:56:18 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E30576B00A9; Thu, 5 Nov 2020 03:56:18 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0001.hostedemail.com [216.40.44.1]) by kanga.kvack.org (Postfix) with ESMTP id A08576B00A7 for ; Thu, 5 Nov 2020 03:56:18 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 3D9A78249980 for ; Thu, 5 Nov 2020 08:56:18 +0000 (UTC) X-FDA: 77449758036.23.army47_1d0748f272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin23.hostedemail.com (Postfix) with ESMTP id 160E837609 for ; Thu, 5 Nov 2020 08:56:18 +0000 (UTC) X-Spam-Summary: 
1,0,0,f12f2f3cfcd8ccef,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:41:69:355:379:541:800:960:973:988:989:1260:1261:1345:1359:1381:1431:1437:1534:1542:1711:1730:1747:1777:1792:2198:2199:2393:2559:2562:2731:2890:3138:3139:3140:3141:3142:3353:3865:3867:3870:3871:3872:4042:4321:5007:6261:6737:6738:7903:8957:9010:10004:11026:11232:11473:11658:11914:12043:12048:12296:12297:12438:12555:12895:12986:13846:14096:14181:14394:14721:14915:21060:21080:21450:21451:21627:21740:30001:30054,0,RBL:115.124.30.131:@linux.alibaba.com:.lbl8.mailshell.net-62.20.2.100 64.201.201.201;04yghcthhsfh5eeh39h4hirr4s3w1ock86ttx77wmzyzbe9d6m4i5zxtwn3ma4s.p4xqpmn8n9rwqq4jsjnu8fa796n8kax5w4ufi7uz147tqjqny1efshppazsfedi.h-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:67,LUA_SUMMARY:none X-HE-Tag: army47_1d0748f272c8 X-Filterd-Recvd-Size: 3896 Received: from out30-131.freemail.mail.aliyun.com (out30-131.freemail.mail.aliyun.com [115.124.30.131]) by imf30.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:16 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R181e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e01424;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=21;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:08 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Subject: [PATCH v21 02/19] mm/thp: use head for head page in lru_add_page_tail Date: Thu, 5 Nov 2020 16:55:32 +0800 Message-Id: <1604566549-62481-3-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Since the first parameter is only used by head page, it's better to make it explicit. Signed-off-by: Alex Shi Reviewed-by: Kirill A. 
Shutemov Reviewed-by: Matthew Wilcox (Oracle) Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Andrew Morton Cc: Johannes Weiner Cc: Matthew Wilcox Cc: Hugh Dickins Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org --- mm/huge_memory.c | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 8f16e991f7cc..60726eb26840 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2348,33 +2348,32 @@ static void remap_page(struct page *page, unsigned int nr) } } -static void lru_add_page_tail(struct page *page, struct page *page_tail, +static void lru_add_page_tail(struct page *head, struct page *tail, struct lruvec *lruvec, struct list_head *list) { - VM_BUG_ON_PAGE(!PageHead(page), page); - VM_BUG_ON_PAGE(PageCompound(page_tail), page); - VM_BUG_ON_PAGE(PageLRU(page_tail), page); + VM_BUG_ON_PAGE(!PageHead(head), head); + VM_BUG_ON_PAGE(PageCompound(tail), head); + VM_BUG_ON_PAGE(PageLRU(tail), head); lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); if (!list) - SetPageLRU(page_tail); + SetPageLRU(tail); - if (likely(PageLRU(page))) - list_add_tail(&page_tail->lru, &page->lru); + if (likely(PageLRU(head))) + list_add_tail(&tail->lru, &head->lru); else if (list) { /* page reclaim is reclaiming a huge page */ - get_page(page_tail); - list_add_tail(&page_tail->lru, list); + get_page(tail); + list_add_tail(&tail->lru, list); } else { /* * Head page has not yet been counted, as an hpage, * so we must account for each subpage individually. * - * Put page_tail on the list at the correct position + * Put tail on the list at the correct position * so they all end up in order. */ - add_page_to_lru_list_tail(page_tail, lruvec, - page_lru(page_tail)); + add_page_to_lru_list_tail(tail, lruvec, page_lru(tail)); } } From patchwork Thu Nov 5 08:55:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883671 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E344921 for ; Thu, 5 Nov 2020 08:56:18 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BD42522264 for ; Thu, 5 Nov 2020 08:56:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BD42522264 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9C3156B00A1; Thu, 5 Nov 2020 03:56:16 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 9581A6B00A3; Thu, 5 Nov 2020 03:56:16 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7762B6B00A3; Thu, 5 Nov 2020 03:56:16 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0086.hostedemail.com [216.40.44.86]) by kanga.kvack.org (Postfix) with ESMTP id 437376B00A1 for ; Thu, 5 Nov 2020 03:56:16 -0500 (EST) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id DD3563635 for ; Thu, 5 Nov 2020 08:56:15 +0000 (UTC) X-FDA: 
77449757910.28.lace43_06174a0272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin28.hostedemail.com (Postfix) with ESMTP id A28B66D62 for ; Thu, 5 Nov 2020 08:56:15 +0000 (UTC) X-Spam-Summary: 1,0,0,6b5019a3934ca280,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:41:69:152:355:379:541:800:960:966:973:988:989:1260:1261:1277:1311:1313:1314:1345:1359:1431:1437:1515:1516:1518:1534:1542:1593:1594:1711:1730:1747:1777:1792:2196:2198:2199:2200:2393:2559:2562:2693:2731:2890:2915:3138:3139:3140:3141:3142:3353:3865:3867:3868:3870:3871:3872:3874:4042:4250:4385:5007:6261:6737:6738:7903:8957:9010:9592:10004:10400:10450:10455:11026:11232:11473:11658:11914:12043:12048:12294:12297:12438:12555:12895:12986:13846:13894:14096:14097:14181:14394:14659:14721:14915:19904:19999:21060:21080:21220:21450:21451:21627:21740:30001:30054,0,RBL:115.124.30.42:@linux.alibaba.com:.lbl8.mailshell.net-64.201.201.201 62.20.2.100;04yruqibyq6gwh76gxw43a6489niaocppargtnuzmu5trkni51ootujbsqsikz7.i7i8xpk76f167pn9c9ac84ypxjnkhzg41o16dozpagw49ry7488u9y64hw7p61i.n-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNS BL:none, X-HE-Tag: lace43_06174a0272c8 X-Filterd-Recvd-Size: 3939 Received: from out30-42.freemail.mail.aliyun.com (out30-42.freemail.mail.aliyun.com [115.124.30.42]) by imf50.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:13 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R181e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=alimailimapcm10staff010182156082;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=22;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:09 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Cc: =?utf-8?q?Mika_Penttil=C3=A4?= Subject: [PATCH v21 03/19] mm/thp: Simplify lru_add_page_tail() Date: Thu, 5 Nov 2020 16:55:33 +0800 Message-Id: <1604566549-62481-4-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Simplify lru_add_page_tail(), there are actually only two cases possible: split_huge_page_to_list(), with list supplied and head isolated from lru by its caller; or split_huge_page(), with NULL list and head on lru - because when head is racily isolated from lru, the isolator's reference will stop the split from getting any further than its page_ref_freeze(). So decide between the two cases by "list", but add VM_WARN_ON()s to verify that they match our lru expectations. [Hugh Dickins: rewrite commit log] Signed-off-by: Alex Shi Reviewed-by: Kirill A. Shutemov Acked-by: Hugh Dickins Cc: Kirill A. 
Shutemov Cc: Andrew Morton Cc: Johannes Weiner Cc: Matthew Wilcox Cc: Hugh Dickins Cc: Mika Penttilä Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org --- mm/huge_memory.c | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 60726eb26840..79318d7f7d5d 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2356,24 +2356,16 @@ static void lru_add_page_tail(struct page *head, struct page *tail, VM_BUG_ON_PAGE(PageLRU(tail), head); lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); - if (!list) - SetPageLRU(tail); - - if (likely(PageLRU(head))) - list_add_tail(&tail->lru, &head->lru); - else if (list) { + if (list) { /* page reclaim is reclaiming a huge page */ + VM_WARN_ON(PageLRU(head)); get_page(tail); list_add_tail(&tail->lru, list); } else { - /* - * Head page has not yet been counted, as an hpage, - * so we must account for each subpage individually. - * - * Put tail on the list at the correct position - * so they all end up in order. - */ - add_page_to_lru_list_tail(tail, lruvec, page_lru(tail)); + /* head is still on lru (and we have it frozen) */ + VM_WARN_ON(!PageLRU(head)); + SetPageLRU(tail); + list_add_tail(&tail->lru, &head->lru); } } From patchwork Thu Nov 5 08:55:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883675 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D0E99697 for ; Thu, 5 Nov 2020 08:56:23 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8DA21206FB for ; Thu, 5 Nov 2020 08:56:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8DA21206FB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5106D6B00A3; Thu, 5 Nov 2020 03:56:17 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 2BBA76B00A4; Thu, 5 Nov 2020 03:56:17 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0E90F6B00A6; Thu, 5 Nov 2020 03:56:16 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0024.hostedemail.com [216.40.44.24]) by kanga.kvack.org (Postfix) with ESMTP id 8FD126B00A4 for ; Thu, 5 Nov 2020 03:56:16 -0500 (EST) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 250EE181AC9C6 for ; Thu, 5 Nov 2020 08:56:16 +0000 (UTC) X-FDA: 77449757952.05.glove52_3e0ef36272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin05.hostedemail.com (Postfix) with ESMTP id 07F4E1802C523 for ; Thu, 5 Nov 2020 08:56:16 +0000 (UTC) X-Spam-Summary: 
From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Cc: Andrea Arcangeli Subject: [PATCH v21 04/19] mm/thp: narrow lru locking Date: Thu, 5 Nov 2020 16:55:34 +0800 Message-Id: <1604566549-62481-5-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: lru_lock and page cache xa_lock have no obvious reason to be taken one way round or the other: until now, lru_lock has been taken before page cache xa_lock when splitting a THP, but nothing else takes them together. Reverse that ordering: let's narrow the lru locking - but leave local_irq_disable() to block interrupts throughout, like before. Hugh Dickins' point: split_huge_page_to_list() was already silly to be using the _irqsave variant, since it has just been taking sleeping locks, so it would already be broken if entered with interrupts enabled. So we can stop passing the flags argument down to __split_huge_page(). Why change the lock ordering here? That was hard to decide. One reason: when this series reaches per-memcg lru locking, it relies on the THP's memcg being stable when taking the lru_lock: that is now done after the THP's refcount has been frozen, which ensures the page's memcg cannot change.
Another reason: previously, lock_page_memcg()'s move_lock was presumed to nest inside lru_lock; but now lru_lock must nest inside (page cache lock inside) move_lock, so it becomes possible to use lock_page_memcg() to stabilize page memcg before taking its lru_lock. That is not the mechanism used in this series, but it is an option we want to keep open. [Hugh Dickins: rewrite commit log] Signed-off-by: Alex Shi Reviewed-by: Kirill A. Shutemov Acked-by: Hugh Dickins Cc: Hugh Dickins Cc: Kirill A. Shutemov Cc: Andrea Arcangeli Cc: Johannes Weiner Cc: Matthew Wilcox Cc: Andrew Morton Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org --- mm/huge_memory.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 79318d7f7d5d..b70ec0c6076b 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2435,7 +2435,7 @@ static void __split_huge_page_tail(struct page *head, int tail, } static void __split_huge_page(struct page *page, struct list_head *list, - pgoff_t end, unsigned long flags) + pgoff_t end) { struct page *head = compound_head(page); pg_data_t *pgdat = page_pgdat(head); @@ -2445,8 +2445,6 @@ static void __split_huge_page(struct page *page, struct list_head *list, unsigned int nr = thp_nr_pages(head); int i; - lruvec = mem_cgroup_page_lruvec(head, pgdat); - /* complete memcg works before add pages to LRU */ mem_cgroup_split_huge_fixup(head); @@ -2458,6 +2456,11 @@ static void __split_huge_page(struct page *page, struct list_head *list, xa_lock(&swap_cache->i_pages); } + /* prevent PageLRU to go away from under us, and freeze lru stats */ + spin_lock(&pgdat->lru_lock); + + lruvec = mem_cgroup_page_lruvec(head, pgdat); + for (i = nr - 1; i >= 1; i--) { __split_huge_page_tail(head, i, lruvec, list); /* Some pages can be beyond i_size: drop them from page cache */ @@ -2477,6 +2480,8 @@ static void __split_huge_page(struct page *page, struct list_head *list, } ClearPageCompound(head); + spin_unlock(&pgdat->lru_lock); + /* Caller disabled irqs, so they are still disabled here */ split_page_owner(head, nr); @@ -2494,8 +2499,7 @@ static void __split_huge_page(struct page *page, struct list_head *list, page_ref_add(head, 2); xa_unlock(&head->mapping->i_pages); } - - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + local_irq_enable(); remap_page(head, nr); @@ -2641,12 +2645,10 @@ bool can_split_huge_page(struct page *page, int *pextra_pins) int split_huge_page_to_list(struct page *page, struct list_head *list) { struct page *head = compound_head(page); - struct pglist_data *pgdata = NODE_DATA(page_to_nid(head)); struct deferred_split *ds_queue = get_deferred_split_queue(head); struct anon_vma *anon_vma = NULL; struct address_space *mapping = NULL; int count, mapcount, extra_pins, ret; - unsigned long flags; pgoff_t end; VM_BUG_ON_PAGE(is_huge_zero_page(head), head); @@ -2707,9 +2709,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) unmap_page(head); VM_BUG_ON_PAGE(compound_mapcount(head), head); - /* prevent PageLRU to go away from under us, and freeze lru stats */ - spin_lock_irqsave(&pgdata->lru_lock, flags); - + /* block interrupt reentry in xa_lock and spinlock */ + local_irq_disable(); if (mapping) { XA_STATE(xas, &mapping->i_pages, page_index(head)); @@ -2739,7 +2740,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) __dec_lruvec_page_state(head, NR_FILE_THPS); } - __split_huge_page(page, list, end, flags); + __split_huge_page(page, list, end); ret = 0; } 
else { if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { @@ -2753,7 +2754,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) spin_unlock(&ds_queue->split_queue_lock); fail: if (mapping) xa_unlock(&mapping->i_pages); - spin_unlock_irqrestore(&pgdata->lru_lock, flags); + local_irq_enable(); remap_page(head, thp_nr_pages(head)); ret = -EBUSY; } From patchwork Thu Nov 5 08:55:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883685 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 26B72697 for ; Thu, 5 Nov 2020 08:56:32 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D3D26206FB for ; Thu, 5 Nov 2020 08:56:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D3D26206FB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id CAFD96B00A8; Thu, 5 Nov 2020 03:56:19 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C33D46B00AA; Thu, 5 Nov 2020 03:56:19 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AFA5F6B00AB; Thu, 5 Nov 2020 03:56:19 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0245.hostedemail.com [216.40.44.245]) by kanga.kvack.org (Postfix) with ESMTP id 767BD6B00A8 for ; Thu, 5 Nov 2020 03:56:19 -0500 (EST) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 24D19181AC9C6 for ; Thu, 5 Nov 2020 08:56:19 +0000 (UTC) X-FDA: 77449758078.26.wind54_4608ee6272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id 0338D1804B65C for ; Thu, 5 Nov 2020 08:56:18 +0000 (UTC) X-Spam-Summary: 1,0,0,003d4024c3cda000,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:41:69:355:379:541:800:960:966:968:973:988:989:1260:1261:1345:1359:1381:1431:1437:1534:1543:1711:1730:1747:1777:1792:1801:2196:2199:2393:2559:2562:2898:3138:3139:3140:3141:3142:3353:3865:3866:3867:3868:3872:4321:4385:4605:5007:6261:6737:6738:8957:9010:9121:9592:10004:11026:11473:11658:11914:12043:12048:12291:12296:12297:12438:12555:12683:12895:12986:13161:13229:13846:14181:14394:14721:14915:21060:21080:21451:21627:21740:21987:30054:30070,0,RBL:115.124.30.133:@linux.alibaba.com:.lbl8.mailshell.net-64.201.201.201 62.20.2.100;04yg7xx7bbyicgir59dyop6qaxqoryphmyah9jbhwhectg4sdsa7rs7rim33skt.563kxogyzefr8e6o7jquny4wpa7xpkha8ka4cge4oh78tcpyd8emy9k714khsj4.n-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: wind54_4608ee6272c8 X-Filterd-Recvd-Size: 4819 Received: from out30-133.freemail.mail.aliyun.com (out30-133.freemail.mail.aliyun.com [115.124.30.133]) by imf39.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:16 +0000 (UTC) X-Alimail-AntiSpam: 
AC=PASS;BC=-1|-1;BR=01201311R131e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04400;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=21;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:10 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Subject: [PATCH v21 05/19] mm/vmscan: remove unnecessary lruvec adding Date: Thu, 5 Nov 2020 16:55:35 +0800 Message-Id: <1604566549-62481-6-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We don't have to add a freeable page into lru and then remove from it. This change saves a couple of actions and makes the moving more clear. The SetPageLRU needs to be kept before put_page_testzero for list integrity, otherwise: #0 move_pages_to_lru #1 release_pages if !put_page_testzero if (put_page_testzero()) !PageLRU //skip lru_lock SetPageLRU() list_add(&page->lru,) list_add(&page->lru,) [akpm@linux-foundation.org: coding style fixes] Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Andrew Morton Cc: Johannes Weiner Cc: Tejun Heo Cc: Matthew Wilcox Cc: Hugh Dickins Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Acked-by: Vlastimil Babka --- mm/vmscan.c | 38 +++++++++++++++++++++++++------------- 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 12a4873942e2..b9935668d121 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1852,26 +1852,30 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, while (!list_empty(list)) { page = lru_to_page(list); VM_BUG_ON_PAGE(PageLRU(page), page); + list_del(&page->lru); if (unlikely(!page_evictable(page))) { - list_del(&page->lru); spin_unlock_irq(&pgdat->lru_lock); putback_lru_page(page); spin_lock_irq(&pgdat->lru_lock); continue; } - lruvec = mem_cgroup_page_lruvec(page, pgdat); + /* + * The SetPageLRU needs to be kept here for list integrity. 
+ * Otherwise: + * #0 move_pages_to_lru #1 release_pages + * if !put_page_testzero + * if (put_page_testzero()) + * !PageLRU //skip lru_lock + * SetPageLRU() + * list_add(&page->lru,) + * list_add(&page->lru,) + */ SetPageLRU(page); - lru = page_lru(page); - nr_pages = thp_nr_pages(page); - update_lru_size(lruvec, lru, page_zonenum(page), nr_pages); - list_move(&page->lru, &lruvec->lists[lru]); - - if (put_page_testzero(page)) { + if (unlikely(put_page_testzero(page))) { __ClearPageLRU(page); __ClearPageActive(page); - del_page_from_lru_list(page, lruvec, lru); if (unlikely(PageCompound(page))) { spin_unlock_irq(&pgdat->lru_lock); @@ -1879,11 +1883,19 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, spin_lock_irq(&pgdat->lru_lock); } else list_add(&page->lru, &pages_to_free); - } else { - nr_moved += nr_pages; - if (PageActive(page)) - workingset_age_nonresident(lruvec, nr_pages); + + continue; } + + lruvec = mem_cgroup_page_lruvec(page, pgdat); + lru = page_lru(page); + nr_pages = thp_nr_pages(page); + + update_lru_size(lruvec, lru, page_zonenum(page), nr_pages); + list_add(&page->lru, &lruvec->lists[lru]); + nr_moved += nr_pages; + if (PageActive(page)) + workingset_age_nonresident(lruvec, nr_pages); } /* From patchwork Thu Nov 5 08:55:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883679 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 004A9921 for ; Thu, 5 Nov 2020 08:56:26 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id AB3F5206FB for ; Thu, 5 Nov 2020 08:56:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AB3F5206FB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B419B6B00A5; Thu, 5 Nov 2020 03:56:17 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 81A5B6B00A8; Thu, 5 Nov 2020 03:56:17 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 443A76B00A6; Thu, 5 Nov 2020 03:56:17 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0251.hostedemail.com [216.40.44.251]) by kanga.kvack.org (Postfix) with ESMTP id E845C6B00A5 for ; Thu, 5 Nov 2020 03:56:16 -0500 (EST) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 80A01180AD80F for ; Thu, 5 Nov 2020 08:56:16 +0000 (UTC) X-FDA: 77449757952.01.cork70_02139d9272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin01.hostedemail.com (Postfix) with ESMTP id 6450A1004C6DE for ; Thu, 5 Nov 2020 08:56:16 +0000 (UTC) X-Spam-Summary: 
From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Cc: Minchan Kim Subject: [PATCH v21 06/19] mm/rmap: stop store reordering issue on page->mapping Date: Thu, 5 Nov 2020 16:55:36 +0800 Message-Id: <1604566549-62481-7-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hugh Dickins and Minchan Kim observed a long-standing issue, discussed in the thread below, but the fix mentioned there was never applied. https://lore.kernel.org/lkml/20150504031722.GA2768@blaptop/ Store reordering may cause a problem in this scenario:

CPU 0                                        CPU 1
do_anonymous_page
  page_add_new_anon_rmap()
    page->mapping = anon_vma + PAGE_MAPPING_ANON
  lru_cache_add_inactive_or_unevictable()
    spin_lock(lruvec->lock)
    SetPageLRU()
    spin_unlock(lruvec->lock)
                                             /* idle tracking judged it as an LRU
                                              * page, so pass the page into
                                              * page_idle_clear_pte_refs
                                              */
                                             page_idle_clear_pte_refs
                                               rmap_walk
                                                 if PageAnon(page)

Johannes gave detailed examples of how the store reordering could cause trouble: the concern is that the SetPageLRU store may be reordered ahead of the 'page->mapping' store, so CPU 1, after observing PageLRU set on the page, could then observe page->mapping as: 1.
anon_vma + PAGE_MAPPING_ANON That's the in-order scenario and is fine. 2. NULL That's possible if the page->mapping store gets reordered to occur after SetPageLRU. That's fine too because we check for it. 3. anon_vma without the PAGE_MAPPING_ANON bit That would be a problem and could lead to all kinds of undesirable behavior including crashes and data corruption. Is it possible? AFAICT the compiler is allowed to tear the store to page->mapping and I don't see anything that would prevent it. That said, I also don't see how the reader testing PageLRU under the lru_lock would prevent that in the first place. AFAICT we need that WRITE_ONCE() around the page->mapping assignment. Signed-off-by: Alex Shi Cc: Johannes Weiner Cc: Andrew Morton Cc: Hugh Dickins Cc: Matthew Wilcox Cc: Minchan Kim Cc: Vladimir Davydov Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Alex Shi Acked-by: Johannes Weiner Acked-by: Hugh Dickins --- mm/rmap.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/mm/rmap.c b/mm/rmap.c index 1b84945d655c..078d54da59d4 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1054,8 +1054,13 @@ static void __page_set_anon_rmap(struct page *page, if (!exclusive) anon_vma = anon_vma->root; + /* + * Prevent page->mapping from pointing to an anon_vma without + * the PAGE_MAPPING_ANON bit set. This could happen if the + * compiler stores anon_vma and then adds PAGE_MAPPING_ANON to it. + */ anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON; - page->mapping = (struct address_space *) anon_vma; + WRITE_ONCE(page->mapping, (struct address_space *) anon_vma); page->index = linear_page_index(vma, address); } From patchwork Thu Nov 5 08:55:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883681 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 21294697 for ; Thu, 5 Nov 2020 08:56:28 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id AA2F9206FB for ; Thu, 5 Nov 2020 08:56:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AA2F9206FB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4945B6B00A6; Thu, 5 Nov 2020 03:56:18 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 41FDD6B00A7; Thu, 5 Nov 2020 03:56:18 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1AB526B00A8; Thu, 5 Nov 2020 03:56:18 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0196.hostedemail.com [216.40.44.196]) by kanga.kvack.org (Postfix) with ESMTP id DAADA6B00A6 for ; Thu, 5 Nov 2020 03:56:17 -0500 (EST) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 7E2163D15 for ; Thu, 5 Nov 2020 08:56:17 +0000 (UTC) X-FDA: 77449757994.17.twig78_500bc2b272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin17.hostedemail.com (Postfix) with ESMTP id 665E1180D0181 for ; 
Thu, 5 Nov 2020 08:56:17 +0000 (UTC) X-Spam-Summary: 1,0,0,480cbe612c2c23eb,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:41:355:379:541:800:960:967:973:988:989:1260:1261:1345:1359:1431:1437:1534:1541:1711:1730:1747:1777:1792:2194:2198:2199:2200:2393:2525:2559:2563:2682:2685:2859:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3352:3865:3866:3867:3868:3870:3871:3872:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4321:5007:6261:6737:6738:7514:7576:7903:8957:9025:10004:11026:11658:11914:12043:12048:12114:12296:12297:12438:12555:12679:12696:12737:12895:12986:13069:13311:13357:13846:14096:14181:14384:14394:14721:14915:21060:21080:21451:21627:21811:21990:30054:30070,0,RBL:115.124.30.132:@linux.alibaba.com:.lbl8.mailshell.net-62.20.2.100 64.201.201.201;04yf5denwseczi1grb48yet8q55ftoph9odr85o63x1eec3sz8ap3kejphhuqsj.reefphgrc8ctk7p3tw3m8taudrbyesht9iagnteqwpf86i5d9xa31wkjehxtpoa.k-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF: not bulk X-HE-Tag: twig78_500bc2b272c8 X-Filterd-Recvd-Size: 3369 Received: from out30-132.freemail.mail.aliyun.com (out30-132.freemail.mail.aliyun.com [115.124.30.132]) by imf46.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:15 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R671e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04394;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=23;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:11 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Cc: Vlastimil Babka , Minchan Kim Subject: [PATCH v21 07/19] mm: page_idle_get_page() does not need lru_lock Date: Thu, 5 Nov 2020 16:55:37 +0800 Message-Id: <1604566549-62481-8-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000001, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Hugh Dickins It is necessary for page_idle_get_page() to recheck PageLRU() after get_page_unless_zero(), but holding lru_lock around that serves no useful purpose, and adds to lru_lock contention: delete it. See https://lore.kernel.org/lkml/20150504031722.GA2768@blaptop for the discussion that led to lru_lock there; but __page_set_anon_rmap() now uses WRITE_ONCE(), and I see no other risk in page_idle_clear_pte_refs() using rmap_walk() (beyond the risk of racing PageAnon->PageKsm, mostly but not entirely prevented by page_count() check in ksm.c's write_protect_page(): that risk being shared with page_referenced() and not helped by lru_lock). 
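To sketch the pattern relied on here (an illustration only, with a hypothetical helper name; the actual change is in the diff below): get_page_unless_zero() pins the struct page, so the PageLRU() recheck that follows needs no lru_lock - if the page was isolated in the meantime, a stale positive costs nothing worse than a put_page().

	static struct page *try_pin_lru_page(unsigned long pfn)
	{
		struct page *page = pfn_to_online_page(pfn);

		/* unlocked filter: may race with isolation, which is fine */
		if (!page || !PageLRU(page) || !get_page_unless_zero(page))
			return NULL;

		/* the reference pins the page, so recheck without lru_lock */
		if (unlikely(!PageLRU(page))) {
			put_page(page);
			page = NULL;
		}
		return page;
	}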
Signed-off-by: Hugh Dickins Signed-off-by: Alex Shi Cc: Andrew Morton Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Minchan Kim Cc: Alex Shi Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Acked-by: Johannes Weiner Acked-by: "Huang, Ying" Acked-by: Vlastimil Babka --- mm/page_idle.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/mm/page_idle.c b/mm/page_idle.c index 057c61df12db..64e5344a992c 100644 --- a/mm/page_idle.c +++ b/mm/page_idle.c @@ -32,19 +32,15 @@ static struct page *page_idle_get_page(unsigned long pfn) { struct page *page = pfn_to_online_page(pfn); - pg_data_t *pgdat; if (!page || !PageLRU(page) || !get_page_unless_zero(page)) return NULL; - pgdat = page_pgdat(page); - spin_lock_irq(&pgdat->lru_lock); if (unlikely(!PageLRU(page))) { put_page(page); page = NULL; } - spin_unlock_irq(&pgdat->lru_lock); return page; } From patchwork Thu Nov 5 08:55:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883699 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9FD24921 for ; Thu, 5 Nov 2020 08:56:41 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6598D2073A for ; Thu, 5 Nov 2020 08:56:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6598D2073A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BC5AA6B00B1; Thu, 5 Nov 2020 03:56:23 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id B2DE36B00B5; Thu, 5 Nov 2020 03:56:23 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7EC436B00B2; Thu, 5 Nov 2020 03:56:23 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0073.hostedemail.com [216.40.44.73]) by kanga.kvack.org (Postfix) with ESMTP id 37FFD6B00B3 for ; Thu, 5 Nov 2020 03:56:23 -0500 (EST) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id D3C8F583A for ; Thu, 5 Nov 2020 08:56:22 +0000 (UTC) X-FDA: 77449758204.01.loss39_4215c4f272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin01.hostedemail.com (Postfix) with ESMTP id A6C6A1004C6DE for ; Thu, 5 Nov 2020 08:56:22 +0000 (UTC) X-Spam-Summary: 1,0,0,2a6ba52918f6507f,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1261:1345:1359:1431:1437:1534:1541:1711:1714:1730:1747:1777:1792:2393:2559:2562:3138:3139:3140:3141:3142:3351:3870:3876:4321:5007:6261:6737:6738:7514:10004:11026:11473:11658:11914:12043:12048:12296:12297:12438:12555:12895:13069:13311:13357:13846:14096:14181:14384:14394:14721:14915:21060:21080:21450:21451:21627:21990:30054:30070,0,RBL:115.124.30.45:@linux.alibaba.com:.lbl8.mailshell.net-64.201.201.201 
From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Cc: Michal Hocko Subject: [PATCH v21 08/19] mm/memcg: add debug checking in lock_page_memcg Date: Thu, 5 Nov 2020 16:55:38 +0800 Message-Id: <1604566549-62481-9-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add debug checking in lock_page_memcg(), so that we get a warning if anything is wrong here.
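As a rough sketch of what such a might_lock() annotation buys (hypothetical locks, not code from this patch): under CONFIG_PROVE_LOCKING, might_lock() makes lockdep record a may-acquire of the lock even on paths that never actually take it, so a caller holding locks in the wrong order is reported without the bad ordering having to trigger at runtime.

	#include <linux/lockdep.h>
	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(inner_lock);	/* stands in for memcg->move_lock */
	static DEFINE_SPINLOCK(outer_lock);	/* some hypothetical caller-held lock */

	static void maybe_take_inner(bool slow_path)
	{
		/* tell lockdep we may take inner_lock, even on the fast path */
		might_lock(&inner_lock);

		if (!slow_path)
			return;

		spin_lock(&inner_lock);
		/* ... slow path work ... */
		spin_unlock(&inner_lock);
	}

	static void caller(void)
	{
		spin_lock(&outer_lock);
		/*
		 * lockdep now sees a potential outer_lock -> inner_lock
		 * dependency although inner_lock is never taken here; if
		 * the reverse ordering exists anywhere else, it warns.
		 */
		maybe_take_inner(false);
		spin_unlock(&outer_lock);
	}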
Suggested-by: Johannes Weiner Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Johannes Weiner Cc: Michal Hocko Cc: Vladimir Davydov Cc: Andrew Morton Cc: cgroups@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org --- mm/memcontrol.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index b2aa3b73ab82..157b745031a4 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2121,6 +2121,12 @@ struct mem_cgroup *lock_page_memcg(struct page *page) if (unlikely(!memcg)) return NULL; +#ifdef CONFIG_PROVE_LOCKING + local_irq_save(flags); + might_lock(&memcg->move_lock); + local_irq_restore(flags); +#endif + if (atomic_read(&memcg->moving_account) <= 0) return memcg; From patchwork Thu Nov 5 08:55:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883689 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E7729921 for ; Thu, 5 Nov 2020 08:56:35 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 922502068D for ; Thu, 5 Nov 2020 08:56:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 922502068D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B768E6B00B4; Thu, 5 Nov 2020 03:56:22 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id A8A456B00B3; Thu, 5 Nov 2020 03:56:22 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8DD0E6B00B2; Thu, 5 Nov 2020 03:56:22 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0009.hostedemail.com [216.40.44.9]) by kanga.kvack.org (Postfix) with ESMTP id 4F80C6B00AE for ; Thu, 5 Nov 2020 03:56:22 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id E28E8180AD807 for ; Thu, 5 Nov 2020 08:56:21 +0000 (UTC) X-FDA: 77449758162.22.hen81_0708818272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id B877D18038E60 for ; Thu, 5 Nov 2020 08:56:21 +0000 (UTC) X-Spam-Summary: 1,0,0,254b45523700786a,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:2:41:69:355:379:541:800:960:966:973:981:988:989:1260:1261:1345:1359:1381:1431:1437:1535:1605:1730:1747:1777:1792:2196:2199:2393:2553:2559:2562:2904:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:4051:4120:4250:4385:4605:5007:6261:6737:6738:7901:7903:8660:8957:9010:9592:10004:11026:11473:11658:11914:12043:12048:12114:12257:12291:12296:12297:12438:12555:12683:12895:13148:13230:13846:14096:14394:14915:21060:21063:21080:21451:21627:21939:21966:21987:30054:30064:30070:30090,0,RBL:115.124.30.132:@linux.alibaba.com:.lbl8.mailshell.net-64.201.201.201 
62.20.2.100;04yrmt85af1ypbf8mhe37yhy4y46dopwph7s98xo9rozhg3oy1sma46w97pbp7o.fyk143jhyyhbpxudnt6jcke34u5kx9rhsojjdp18joo6n9w1chquahsg64axb16.a-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:69,LUA_SUMMARY:none X-HE-Tag: hen81_0708818272c8 X-Filterd-Recvd-Size: 9925 Received: from out30-132.freemail.mail.aliyun.com (out30-132.freemail.mail.aliyun.com [115.124.30.132]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:20 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R101e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04407;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=21;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:11 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Subject: [PATCH v21 09/19] mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn Date: Thu, 5 Nov 2020 16:55:39 +0800 Message-Id: <1604566549-62481-10-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Fold the PGROTATED event counting into the pagevec_move_tail_fn callback, as the other callbacks of pagevec_lru_move_fn already do. This lets us drop the pagevec_move_tail() wrapper. Now all users of pagevec_lru_move_fn look the same and its third parameter is no longer needed; callers simply pass the callback. No functional change.
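The cleanup follows a common shape: once every callback accounts for its own work, the opaque accumulator argument threaded through the walker can be dropped. A small stand-alone sketch of that shape, in plain C with invented names rather than code from mm/swap.c:

#include <stdio.h>

static int moved;			/* counter owned by the callback itself */

/* the per-item callback does its own accounting ... */
static void move_item(int item)
{
	moved++;
}

/* ... so the walker no longer threads a "void *arg" cookie through */
static void walk(const int *items, int n, void (*fn)(int))
{
	int i;

	for (i = 0; i < n; i++)
		fn(items[i]);
}

int main(void)
{
	int items[3] = { 10, 20, 30 };

	walk(items, 3, move_item);
	printf("moved %d items\n", moved);	/* prints "moved 3 items" */
	return 0;
}

In the kernel code the role of walk() is played by pagevec_lru_move_fn, and the counter is the PGROTATED vm event bumped inside pagevec_move_tail_fn.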
[lkp@intel.com: found a build issue in the original patch, thanks] Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Andrew Morton Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org --- mm/swap.c | 65 ++++++++++++++++++++++----------------------------------------- 1 file changed, 23 insertions(+), 42 deletions(-) diff --git a/mm/swap.c b/mm/swap.c index 8a578381c2fc..ce8c97146e0d 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -204,8 +204,7 @@ int get_kernel_page(unsigned long start, int write, struct page **pages) EXPORT_SYMBOL_GPL(get_kernel_page); static void pagevec_lru_move_fn(struct pagevec *pvec, - void (*move_fn)(struct page *page, struct lruvec *lruvec, void *arg), - void *arg) + void (*move_fn)(struct page *page, struct lruvec *lruvec)) { int i; struct pglist_data *pgdat = NULL; @@ -224,7 +223,7 @@ static void pagevec_lru_move_fn(struct pagevec *pvec, } lruvec = mem_cgroup_page_lruvec(page, pgdat); - (*move_fn)(page, lruvec, arg); + (*move_fn)(page, lruvec); } if (pgdat) spin_unlock_irqrestore(&pgdat->lru_lock, flags); @@ -232,35 +231,22 @@ static void pagevec_lru_move_fn(struct pagevec *pvec, pagevec_reinit(pvec); } -static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec, - void *arg) +static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec) { - int *pgmoved = arg; - if (PageLRU(page) && !PageUnevictable(page)) { del_page_from_lru_list(page, lruvec, page_lru(page)); ClearPageActive(page); add_page_to_lru_list_tail(page, lruvec, page_lru(page)); - (*pgmoved) += thp_nr_pages(page); + __count_vm_events(PGROTATED, thp_nr_pages(page)); } } /* - * pagevec_move_tail() must be called with IRQ disabled. - * Otherwise this may cause nasty races. - */ -static void pagevec_move_tail(struct pagevec *pvec) -{ - int pgmoved = 0; - - pagevec_lru_move_fn(pvec, pagevec_move_tail_fn, &pgmoved); - __count_vm_events(PGROTATED, pgmoved); -} - -/* * Writeback is about to end against a page which has been marked for immediate * reclaim. If it still appears to be reclaimable, move it to the tail of the * inactive list. + * + * rotate_reclaimable_page() must disable IRQs, to prevent nasty races. 
*/ void rotate_reclaimable_page(struct page *page) { @@ -273,7 +259,7 @@ void rotate_reclaimable_page(struct page *page) local_lock_irqsave(&lru_rotate.lock, flags); pvec = this_cpu_ptr(&lru_rotate.pvec); if (!pagevec_add(pvec, page) || PageCompound(page)) - pagevec_move_tail(pvec); + pagevec_lru_move_fn(pvec, pagevec_move_tail_fn); local_unlock_irqrestore(&lru_rotate.lock, flags); } } @@ -315,8 +301,7 @@ void lru_note_cost_page(struct page *page) page_is_file_lru(page), thp_nr_pages(page)); } -static void __activate_page(struct page *page, struct lruvec *lruvec, - void *arg) +static void __activate_page(struct page *page, struct lruvec *lruvec) { if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) { int lru = page_lru_base_type(page); @@ -340,7 +325,7 @@ static void activate_page_drain(int cpu) struct pagevec *pvec = &per_cpu(lru_pvecs.activate_page, cpu); if (pagevec_count(pvec)) - pagevec_lru_move_fn(pvec, __activate_page, NULL); + pagevec_lru_move_fn(pvec, __activate_page); } static bool need_activate_page_drain(int cpu) @@ -358,7 +343,7 @@ static void activate_page(struct page *page) pvec = this_cpu_ptr(&lru_pvecs.activate_page); get_page(page); if (!pagevec_add(pvec, page) || PageCompound(page)) - pagevec_lru_move_fn(pvec, __activate_page, NULL); + pagevec_lru_move_fn(pvec, __activate_page); local_unlock(&lru_pvecs.lock); } } @@ -374,7 +359,7 @@ static void activate_page(struct page *page) page = compound_head(page); spin_lock_irq(&pgdat->lru_lock); - __activate_page(page, mem_cgroup_page_lruvec(page, pgdat), NULL); + __activate_page(page, mem_cgroup_page_lruvec(page, pgdat)); spin_unlock_irq(&pgdat->lru_lock); } #endif @@ -525,8 +510,7 @@ void lru_cache_add_inactive_or_unevictable(struct page *page, * be write it out by flusher threads as this is much more effective * than the single-page writeout from reclaim. 
*/ -static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec, - void *arg) +static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec) { int lru; bool active; @@ -573,8 +557,7 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec, } } -static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec, - void *arg) +static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec) { if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) { int lru = page_lru_base_type(page); @@ -591,8 +574,7 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec, } } -static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec, - void *arg) +static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec) { if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) && !PageSwapCache(page) && !PageUnevictable(page)) { @@ -636,21 +618,21 @@ void lru_add_drain_cpu(int cpu) /* No harm done if a racing interrupt already did this */ local_lock_irqsave(&lru_rotate.lock, flags); - pagevec_move_tail(pvec); + pagevec_lru_move_fn(pvec, pagevec_move_tail_fn); local_unlock_irqrestore(&lru_rotate.lock, flags); } pvec = &per_cpu(lru_pvecs.lru_deactivate_file, cpu); if (pagevec_count(pvec)) - pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL); + pagevec_lru_move_fn(pvec, lru_deactivate_file_fn); pvec = &per_cpu(lru_pvecs.lru_deactivate, cpu); if (pagevec_count(pvec)) - pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); + pagevec_lru_move_fn(pvec, lru_deactivate_fn); pvec = &per_cpu(lru_pvecs.lru_lazyfree, cpu); if (pagevec_count(pvec)) - pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL); + pagevec_lru_move_fn(pvec, lru_lazyfree_fn); activate_page_drain(cpu); } @@ -679,7 +661,7 @@ void deactivate_file_page(struct page *page) pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate_file); if (!pagevec_add(pvec, page) || PageCompound(page)) - pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL); + pagevec_lru_move_fn(pvec, lru_deactivate_file_fn); local_unlock(&lru_pvecs.lock); } } @@ -701,7 +683,7 @@ void deactivate_page(struct page *page) pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate); get_page(page); if (!pagevec_add(pvec, page) || PageCompound(page)) - pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); + pagevec_lru_move_fn(pvec, lru_deactivate_fn); local_unlock(&lru_pvecs.lock); } } @@ -723,7 +705,7 @@ void mark_page_lazyfree(struct page *page) pvec = this_cpu_ptr(&lru_pvecs.lru_lazyfree); get_page(page); if (!pagevec_add(pvec, page) || PageCompound(page)) - pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL); + pagevec_lru_move_fn(pvec, lru_lazyfree_fn); local_unlock(&lru_pvecs.lock); } } @@ -977,8 +959,7 @@ void __pagevec_release(struct pagevec *pvec) } EXPORT_SYMBOL(__pagevec_release); -static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, - void *arg) +static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec) { enum lru_list lru; int was_unevictable = TestClearPageUnevictable(page); @@ -1037,7 +1018,7 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, */ void __pagevec_lru_add(struct pagevec *pvec) { - pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, NULL); + pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn); } /** From patchwork Thu Nov 5 08:55:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883709 Return-Path: Received: from mail.kernel.org 
(pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DB717921 for ; Thu, 5 Nov 2020 08:56:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 94A3C2068D for ; Thu, 5 Nov 2020 08:56:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 94A3C2068D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id DBACA6B00BE; Thu, 5 Nov 2020 03:56:30 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D496B6B00BF; Thu, 5 Nov 2020 03:56:30 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B4E1B6B00C1; Thu, 5 Nov 2020 03:56:30 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0093.hostedemail.com [216.40.44.93]) by kanga.kvack.org (Postfix) with ESMTP id 769D56B00BE for ; Thu, 5 Nov 2020 03:56:30 -0500 (EST) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 1943C8249980 for ; Thu, 5 Nov 2020 08:56:30 +0000 (UTC) X-FDA: 77449758540.29.pest07_1f06f76272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin29.hostedemail.com (Postfix) with ESMTP id EE974180868DF for ; Thu, 5 Nov 2020 08:56:29 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:30054:30090,0,RBL:47.88.44.36:@linux.alibaba.com:.lbl8.mailshell.net-62.18.0.100 64.10.201.10;04y8yfrxepf775xd734u6djbmg6mhocs8hw1rhjhez5egkuyi38sh7thxn9apei.krdkmppf9fn3byx5qc4ewerh57oy9wjegfppwaon6ezpf9e5ny73jddddtammy6.q-lbl8.mailshell.net-223.238.255.100;47.88.44.36-irl.urbl.hostedemail.com-127.0.0.150,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:68,LUA_SUMMARY:none X-HE-Tag: pest07_1f06f76272c8 X-Filterd-Recvd-Size: 4272 Received: from out4436.biz.mail.alibaba.com (out4436.biz.mail.alibaba.com [47.88.44.36]) by imf27.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:28 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R451e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04426;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=21;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:12 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Subject: [PATCH v21 10/19] mm/lru: move lock into lru_note_cost Date: Thu, 5 Nov 2020 16:55:40 +0800 Message-Id: <1604566549-62481-11-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 
In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We have to move lru_lock into lru_note_cost, since it cycle up on memcg tree, for future per lruvec lru_lock replace. It's a bit ugly and may cost a bit more locking, but benefit from multiple memcg locking could cover the lost. Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Johannes Weiner Cc: Andrew Morton Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org --- mm/swap.c | 3 +++ mm/vmscan.c | 4 +--- mm/workingset.c | 2 -- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/mm/swap.c b/mm/swap.c index ce8c97146e0d..2681d9023998 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -268,7 +268,9 @@ void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages) { do { unsigned long lrusize; + struct pglist_data *pgdat = lruvec_pgdat(lruvec); + spin_lock_irq(&pgdat->lru_lock); /* Record cost event */ if (file) lruvec->file_cost += nr_pages; @@ -292,6 +294,7 @@ void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages) lruvec->file_cost /= 2; lruvec->anon_cost /= 2; } + spin_unlock_irq(&pgdat->lru_lock); } while ((lruvec = parent_lruvec(lruvec))); } diff --git a/mm/vmscan.c b/mm/vmscan.c index b9935668d121..d771f812e983 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1973,19 +1973,17 @@ static int current_may_throttle(void) &stat, false); spin_lock_irq(&pgdat->lru_lock); - move_pages_to_lru(lruvec, &page_list); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - lru_note_cost(lruvec, file, stat.nr_pageout); item = current_is_kswapd() ? 
PGSTEAL_KSWAPD : PGSTEAL_DIRECT; if (!cgroup_reclaim(sc)) __count_vm_events(item, nr_reclaimed); __count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed); __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed); - spin_unlock_irq(&pgdat->lru_lock); + lru_note_cost(lruvec, file, stat.nr_pageout); mem_cgroup_uncharge_list(&page_list); free_unref_page_list(&page_list); diff --git a/mm/workingset.c b/mm/workingset.c index 130348cbf40a..a915a812c363 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -381,9 +381,7 @@ void workingset_refault(struct page *page, void *shadow) if (workingset) { SetPageWorkingset(page); /* XXX: Move to lru_cache_add() when it supports new vs putback */ - spin_lock_irq(&page_pgdat(page)->lru_lock); lru_note_cost_page(page); - spin_unlock_irq(&page_pgdat(page)->lru_lock); inc_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + file); } out: From patchwork Thu Nov 5 08:55:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883703 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 02BFC697 for ; Thu, 5 Nov 2020 08:56:45 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B55B1206C0 for ; Thu, 5 Nov 2020 08:56:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B55B1206C0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0ED2D6B00BA; Thu, 5 Nov 2020 03:56:29 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 026B16B00BC; Thu, 5 Nov 2020 03:56:28 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DC0F66B00BD; Thu, 5 Nov 2020 03:56:28 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0203.hostedemail.com [216.40.44.203]) by kanga.kvack.org (Postfix) with ESMTP id 979E76B00BA for ; Thu, 5 Nov 2020 03:56:28 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 3B82E582B for ; Thu, 5 Nov 2020 08:56:28 +0000 (UTC) X-FDA: 77449758456.23.stamp78_5903f19272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin23.hostedemail.com (Postfix) with ESMTP id 1971937606 for ; Thu, 5 Nov 2020 08:56:28 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:30054:30070,0,RBL:47.88.44.36:@linux.alibaba.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100;04yrfxpm4tdxkdxdqnt3xk5o9y6f5yp1j75os8bjhzcdmwjispjx4mhb5qm3tqh.oghoepwhdicqpkguunbhredbjuqtytittdjq8xz9m7tndin5btnrrp7k4qz8oge.o-lbl8.mailshell.net-223.238.255.100;47.88.44.36-irl.urbl.hostedemail.com-127.0.0.150,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: stamp78_5903f19272c8 X-Filterd-Recvd-Size: 3100 Received: from out4436.biz.mail.alibaba.com (out4436.biz.mail.alibaba.com [47.88.44.36]) by imf31.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 
08:56:27 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R591e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04420;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=23;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:12 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Cc: Alexander Duyck , Michal Hocko Subject: [PATCH v21 11/19] mm/vmscan: remove lruvec reget in move_pages_to_lru Date: Thu, 5 Nov 2020 16:55:41 +0800 Message-Id: <1604566549-62481-12-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Isolated page shouldn't be recharged by memcg since the memcg migration isn't possible at the time. All pages were isolated from the same lruvec (and isolation inhibits memcg migration). So remove unnecessary regetting. Thanks to Alexander Duyck for pointing this out. Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Alexander Duyck Cc: Andrew Morton Cc: Konstantin Khlebnikov Cc: Michal Hocko Cc: Hugh Dickins Cc: Johannes Weiner Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Cc: cgroups@vger.kernel.org --- mm/vmscan.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index d771f812e983..cb2f6256a7d6 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1887,7 +1887,12 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, continue; } - lruvec = mem_cgroup_page_lruvec(page, pgdat); + /* + * All pages were isolated from the same lruvec (and isolation + * inhibits memcg migration). 
+ */ + VM_BUG_ON_PAGE(mem_cgroup_page_lruvec(page, page_pgdat(page)) + != lruvec, page); lru = page_lru(page); nr_pages = thp_nr_pages(page); From patchwork Thu Nov 5 08:55:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883693 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C22D7697 for ; Thu, 5 Nov 2020 08:56:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8106A22256 for ; Thu, 5 Nov 2020 08:56:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8106A22256 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7F0D76B00B3; Thu, 5 Nov 2020 03:56:23 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 7A2636B00B6; Thu, 5 Nov 2020 03:56:23 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 61E426B00B5; Thu, 5 Nov 2020 03:56:23 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0048.hostedemail.com [216.40.44.48]) by kanga.kvack.org (Postfix) with ESMTP id 271786B00B1 for ; Thu, 5 Nov 2020 03:56:23 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id B9A578249980 for ; Thu, 5 Nov 2020 08:56:22 +0000 (UTC) X-FDA: 77449758204.15.jail04_0616ddb272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin15.hostedemail.com (Postfix) with ESMTP id 9ECAF1814B0C7 for ; Thu, 5 Nov 2020 08:56:22 +0000 (UTC) X-Spam-Summary: 1,0,0,f1e0cc20f944e582,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:41:69:355:379:541:800:960:968:973:988:989:1260:1261:1345:1359:1431:1437:1534:1542:1711:1730:1747:1777:1792:2198:2199:2380:2393:2559:2562:2897:3138:3139:3140:3141:3142:3353:3865:3866:3867:3868:3870:3871:3872:4321:5007:6119:6261:6737:6738:8957:9040:9592:10004:11026:11232:11473:11638:11639:11658:11914:12043:12048:12296:12297:12438:12555:12895:12986:13846:14096:14181:14394:14721:14915:21060:21080:21451:21627:21990:30054:30070:30079,0,RBL:115.124.30.133:@linux.alibaba.com:.lbl8.mailshell.net-62.20.2.100 64.201.201.201;04yf78osdr7o88663a85c79mrut3cycuuea3jgwekgcs6m8wdzsrtrkjycqbtiw.gw6eiq1decb9ggmbjj9e4a4ushdkxpyw4xomq6w3oqej8g6cms51yu4wxpf3zn7.r-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:68,LUA_SUMMARY:none X-HE-Tag: jail04_0616ddb272c8 X-Filterd-Recvd-Size: 4170 Received: from out30-133.freemail.mail.aliyun.com (out30-133.freemail.mail.aliyun.com [115.124.30.133]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:20 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R181e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04400;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=23;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com 
fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:13 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Cc: "Kirill A. Shutemov" , Vlastimil Babka Subject: [PATCH v21 12/19] mm/mlock: remove lru_lock on TestClearPageMlocked Date: Thu, 5 Nov 2020 16:55:42 +0800 Message-Id: <1604566549-62481-13-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In munlock_vma_page, the comments claimed that lru_lock was needed for serialization with split_huge_page. But the page must be PageLocked there, just as the pages in the split_huge_page series of functions are, so PageLocked is enough to serialize both. Furthermore, Hugh Dickins pointed out that before splitting in split_huge_page_to_list, the page is unmap_page()ed to remove the pmd/ptes, which protects the page from munlock. Thus there is no need to guard __split_huge_page_tail for the mlock cleanup; just keep the lru_lock there for isolation purposes. LKP found a preemption issue with __mod_zone_page_state, which needs to be changed to mod_zone_page_state. Thanks! Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Kirill A. Shutemov Cc: Vlastimil Babka Cc: Andrew Morton Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Acked-by: Vlastimil Babka --- mm/mlock.c | 26 +++++--------------------- 1 file changed, 5 insertions(+), 21 deletions(-) diff --git a/mm/mlock.c b/mm/mlock.c index 884b1216da6a..796c726a0407 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -187,40 +187,24 @@ static void __munlock_isolation_failed(struct page *page) unsigned int munlock_vma_page(struct page *page) { int nr_pages; - pg_data_t *pgdat = page_pgdat(page); /* For try_to_munlock() and to serialize with page migration */ BUG_ON(!PageLocked(page)); - VM_BUG_ON_PAGE(PageTail(page), page); - /* - * Serialize with any parallel __split_huge_page_refcount() which - * might otherwise copy PageMlocked to part of the tail pages before - * we clear it in the head page. It also stabilizes thp_nr_pages().
- */ - spin_lock_irq(&pgdat->lru_lock); - if (!TestClearPageMlocked(page)) { /* Potentially, PTE-mapped THP: do not skip the rest PTEs */ - nr_pages = 1; - goto unlock_out; + return 0; } nr_pages = thp_nr_pages(page); - __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); + mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); - if (__munlock_isolate_lru_page(page, true)) { - spin_unlock_irq(&pgdat->lru_lock); + if (!isolate_lru_page(page)) __munlock_isolated_page(page); - goto out; - } - __munlock_isolation_failed(page); - -unlock_out: - spin_unlock_irq(&pgdat->lru_lock); + else + __munlock_isolation_failed(page); -out: return nr_pages - 1; } From patchwork Thu Nov 5 08:55:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883705 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D5A51921 for ; Thu, 5 Nov 2020 08:56:46 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8D6D6206C0 for ; Thu, 5 Nov 2020 08:56:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8D6D6206C0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E80566B00BC; Thu, 5 Nov 2020 03:56:29 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E083F6B00BD; Thu, 5 Nov 2020 03:56:29 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B4BD36B00BF; Thu, 5 Nov 2020 03:56:29 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0215.hostedemail.com [216.40.44.215]) by kanga.kvack.org (Postfix) with ESMTP id 6AA2E6B00BC for ; Thu, 5 Nov 2020 03:56:29 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 14F9F582B for ; Thu, 5 Nov 2020 08:56:29 +0000 (UTC) X-FDA: 77449758498.25.crush46_4201d37272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin25.hostedemail.com (Postfix) with ESMTP id E89691804E3A0 for ; Thu, 5 Nov 2020 08:56:28 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:30012:30054:30070,0,RBL:47.88.44.36:@linux.alibaba.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100;04y8c8oxor91p8smjg1woq1kpr4w7ycuf7bt38cdtkz3erus3erddaet7nku8me.mq6jjpwgd5ij4uaafm5xej846bxi61y68xezumgsx9rs8xyq56foboqzpe97e4b.s-lbl8.mailshell.net-223.238.255.100;47.88.44.36-irl.urbl.hostedemail.com-127.0.0.150,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:71,LUA_SUMMARY:none X-HE-Tag: crush46_4201d37272c8 X-Filterd-Recvd-Size: 3623 Received: from out4436.biz.mail.alibaba.com (out4436.biz.mail.alibaba.com [47.88.44.36]) by imf43.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:27 +0000 (UTC) X-Alimail-AntiSpam: 
AC=PASS;BC=-1|-1;BR=01201311R331e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04407;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=23;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:13 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Cc: "Kirill A. Shutemov" , Vlastimil Babka Subject: [PATCH v21 13/19] mm/mlock: remove __munlock_isolate_lru_page Date: Thu, 5 Nov 2020 16:55:43 +0800 Message-Id: <1604566549-62481-14-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The func only has one caller, remove it to clean up code and simplify code. Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Hugh Dickins Cc: Kirill A. Shutemov Cc: Vlastimil Babka Cc: Andrew Morton Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Acked-by: Vlastimil Babka --- mm/mlock.c | 31 +++++++++---------------------- 1 file changed, 9 insertions(+), 22 deletions(-) diff --git a/mm/mlock.c b/mm/mlock.c index 796c726a0407..d487aa864e86 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -106,26 +106,6 @@ void mlock_vma_page(struct page *page) } /* - * Isolate a page from LRU with optional get_page() pin. - * Assumes lru_lock already held and page already pinned. - */ -static bool __munlock_isolate_lru_page(struct page *page, bool getpage) -{ - if (PageLRU(page)) { - struct lruvec *lruvec; - - lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - if (getpage) - get_page(page); - ClearPageLRU(page); - del_page_from_lru_list(page, lruvec, page_lru(page)); - return true; - } - - return false; -} - -/* * Finish munlock after successful page isolation * * Page must be locked. This is a wrapper for try_to_munlock() @@ -296,9 +276,16 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) * We already have pin from follow_page_mask() * so we can spare the get_page() here. 
*/ - if (__munlock_isolate_lru_page(page, false)) + if (PageLRU(page)) { + struct lruvec *lruvec; + + ClearPageLRU(page); + lruvec = mem_cgroup_page_lruvec(page, + page_pgdat(page)); + del_page_from_lru_list(page, lruvec, + page_lru(page)); continue; - else + } else __munlock_isolation_failed(page); } else { delta_munlocked++; From patchwork Thu Nov 5 08:55:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883701 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4AB12921 for ; Thu, 5 Nov 2020 08:56:43 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 01AA42068D for ; Thu, 5 Nov 2020 08:56:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 01AA42068D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6B93F6B00B8; Thu, 5 Nov 2020 03:56:27 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 669536B00B9; Thu, 5 Nov 2020 03:56:27 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4E1E26B00BA; Thu, 5 Nov 2020 03:56:27 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0026.hostedemail.com [216.40.44.26]) by kanga.kvack.org (Postfix) with ESMTP id 1BD926B00B8 for ; Thu, 5 Nov 2020 03:56:27 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 9BE9D8249980 for ; Thu, 5 Nov 2020 08:56:26 +0000 (UTC) X-FDA: 77449758372.23.sack29_39182fd272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin23.hostedemail.com (Postfix) with ESMTP id 81E8D37606 for ; Thu, 5 Nov 2020 08:56:26 +0000 (UTC) X-Spam-Summary: 1,0,0,a908e3dc995bf04d,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:2:41:69:355:379:541:800:960:966:967:968:973:988:989:1260:1261:1345:1359:1431:1437:1535:1605:1730:1747:1777:1792:2196:2199:2393:2525:2553:2559:2563:2682:2685:2693:2859:2898:2901:2914:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4049:4119:4250:4321:4385:4605:5007:6261:6737:6738:7514:7903:8603:8957:9010:9025:9592:10004:11026:11232:11473:11638:11639:11658:11914:12043:12048:12296:12297:12438:12555:12683:12895:12986:13161:13172:13229:13845:13846:14093:14096:14394:14915:21060:21080:21451:21627:21740:21809:21811:21987:21990:30012:30054:30064:30070:30090,0,RBL:115.124.30.44:@linux.alibaba.com:.lbl8.mailshell.net-62.20.2.100 64.201.201.201;04y8o7bdgf8iqeys3dn6jchk471idycm1bfxd5zs4ms3zscntofo5xngpj94x98.kjdzxtx9xntf8yatpszdn7xgazjco68i4usxfwpwzdhr4s3etwps4gst5b68weq.r-lbl8.mailshel l.net-22 X-HE-Tag: sack29_39182fd272c8 X-Filterd-Recvd-Size: 8402 Received: from out30-44.freemail.mail.aliyun.com (out30-44.freemail.mail.aliyun.com [115.124.30.44]) by imf13.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:24 +0000 (UTC) X-Alimail-AntiSpam: 
AC=PASS;BC=-1|-1;BR=01201311R331e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04426;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=22;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:14 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Cc: Michal Hocko Subject: [PATCH v21 14/19] mm/lru: introduce TestClearPageLRU Date: Thu, 5 Nov 2020 16:55:44 +0800 Message-Id: <1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently lru_lock still guards both the lru list and the page's lru bit; that is fine. But if we want to use a per-lruvec lock for the page, we need to pin down the page's lruvec/memcg while locking, and just taking the lruvec lock first can be undermined by the page's memcg charge/migration. To fix this problem, we clear the lru bit outside the lock and use it as a pin, blocking page isolation during a memcg change. The standard steps of page isolation now become:
1. get_page();             # pin the page so it cannot be freed
2. TestClearPageLRU();     # block other isolation, e.g. a memcg change
3. spin_lock on lru_lock;  # serialize lru list access
4. delete page from lru list;
This patch starts with the first part: TestClearPageLRU, which combines the PageLRU check and ClearPageLRU into one macro function, TestClearPageLRU. This function will be used as the page isolation precondition, to prevent other isolations elsewhere. There may then be !PageLRU pages on the lru list, so the BUG() checks need to be removed accordingly. There are two rules for the lru bit now:
1. The lru bit still indicates whether a page is on an lru list; only for a temporary moment (during isolation) may a page sit on the lru list with the lru bit cleared. A page with the lru bit set must still be on the lru list.
2. The lru bit has to be cleared before the page is deleted from the lru list.
As Andrew Morton mentioned, this change would dirty the cacheline for a page that isn't on the LRU.
But the lost would be acceptable in Rong Chen report: https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/ Suggested-by: Johannes Weiner Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Hugh Dickins Cc: Johannes Weiner Cc: Michal Hocko Cc: Vladimir Davydov Cc: Andrew Morton Cc: linux-kernel@vger.kernel.org Cc: cgroups@vger.kernel.org Cc: linux-mm@kvack.org Acked-by: Vlastimil Babka --- include/linux/page-flags.h | 1 + mm/mlock.c | 3 +-- mm/vmscan.c | 39 +++++++++++++++++++-------------------- 3 files changed, 21 insertions(+), 22 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 291dc247dc79..6426f2f03611 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -335,6 +335,7 @@ static inline void page_init_poison(struct page *page, size_t size) PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD) __CLEARPAGEFLAG(Dirty, dirty, PF_HEAD) PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD) + TESTCLEARFLAG(LRU, lru, PF_HEAD) PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD) TESTCLEARFLAG(Active, active, PF_HEAD) PAGEFLAG(Workingset, workingset, PF_HEAD) diff --git a/mm/mlock.c b/mm/mlock.c index d487aa864e86..7b0e6334be6f 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -276,10 +276,9 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) * We already have pin from follow_page_mask() * so we can spare the get_page() here. */ - if (PageLRU(page)) { + if (TestClearPageLRU(page)) { struct lruvec *lruvec; - ClearPageLRU(page); lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); del_page_from_lru_list(page, lruvec, diff --git a/mm/vmscan.c b/mm/vmscan.c index cb2f6256a7d6..ab7a0104d1e1 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1542,7 +1542,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone, */ int __isolate_lru_page(struct page *page, isolate_mode_t mode) { - int ret = -EINVAL; + int ret = -EBUSY; /* Only take pages on the LRU. */ if (!PageLRU(page)) @@ -1552,8 +1552,6 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode) if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE)) return ret; - ret = -EBUSY; - /* * To minimise LRU disruption, the caller can indicate that it only * wants to isolate pages it will be able to operate on without @@ -1600,8 +1598,10 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode) * sure the page is not being freed elsewhere -- the * page release code relies on it. 
*/ - ClearPageLRU(page); - ret = 0; + if (TestClearPageLRU(page)) + ret = 0; + else + put_page(page); } return ret; @@ -1667,8 +1667,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, page = lru_to_page(src); prefetchw_prev_lru_page(page, src, flags); - VM_BUG_ON_PAGE(!PageLRU(page), page); - nr_pages = compound_nr(page); total_scan += nr_pages; @@ -1765,21 +1763,18 @@ int isolate_lru_page(struct page *page) VM_BUG_ON_PAGE(!page_count(page), page); WARN_RATELIMIT(PageTail(page), "trying to isolate tail page"); - if (PageLRU(page)) { + if (TestClearPageLRU(page)) { pg_data_t *pgdat = page_pgdat(page); struct lruvec *lruvec; - spin_lock_irq(&pgdat->lru_lock); + get_page(page); lruvec = mem_cgroup_page_lruvec(page, pgdat); - if (PageLRU(page)) { - int lru = page_lru(page); - get_page(page); - ClearPageLRU(page); - del_page_from_lru_list(page, lruvec, lru); - ret = 0; - } + spin_lock_irq(&pgdat->lru_lock); + del_page_from_lru_list(page, lruvec, page_lru(page)); spin_unlock_irq(&pgdat->lru_lock); + ret = 0; } + return ret; } @@ -4293,6 +4288,10 @@ void check_move_unevictable_pages(struct pagevec *pvec) nr_pages = thp_nr_pages(page); pgscanned += nr_pages; + /* block memcg migration during page moving between lru */ + if (!TestClearPageLRU(page)) + continue; + if (pagepgdat != pgdat) { if (pgdat) spin_unlock_irq(&pgdat->lru_lock); @@ -4301,10 +4300,7 @@ void check_move_unevictable_pages(struct pagevec *pvec) } lruvec = mem_cgroup_page_lruvec(page, pgdat); - if (!PageLRU(page) || !PageUnevictable(page)) - continue; - - if (page_evictable(page)) { + if (page_evictable(page) && PageUnevictable(page)) { enum lru_list lru = page_lru_base_type(page); VM_BUG_ON_PAGE(PageActive(page), page); @@ -4313,12 +4309,15 @@ void check_move_unevictable_pages(struct pagevec *pvec) add_page_to_lru_list(page, lruvec, lru); pgrescued += nr_pages; } + SetPageLRU(page); } if (pgdat) { __count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued); __count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); spin_unlock_irq(&pgdat->lru_lock); + } else if (pgscanned) { + count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); } } EXPORT_SYMBOL_GPL(check_move_unevictable_pages); From patchwork Thu Nov 5 08:55:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883687 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 41FE5697 for ; Thu, 5 Nov 2020 08:56:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DD43A20728 for ; Thu, 5 Nov 2020 08:56:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DD43A20728 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0EB5B6B00AD; Thu, 5 Nov 2020 03:56:22 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 074E96B00AF; Thu, 5 Nov 2020 03:56:21 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E14CC6B00AD; Thu, 5 Nov 2020 03:56:21 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com 
(smtprelay0227.hostedemail.com [216.40.44.227]) by kanga.kvack.org (Postfix) with ESMTP id 92B856B00AD for ; Thu, 5 Nov 2020 03:56:21 -0500 (EST) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 23E07583A for ; Thu, 5 Nov 2020 08:56:21 +0000 (UTC) X-FDA: 77449758162.26.sack44_02136f4272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id 01B261804B65C for ; Thu, 5 Nov 2020 08:56:20 +0000 (UTC) X-Spam-Summary: 1,0,0,4730ffd4e421c3c3,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:2:41:69:355:379:541:800:960:966:973:988:989:1260:1261:1345:1359:1381:1431:1437:1535:1605:1730:1747:1777:1792:2194:2196:2198:2199:2200:2201:2393:2553:2559:2562:2693:2731:2895:2898:2899:3138:3139:3140:3141:3142:3369:3622:3865:3866:3867:3868:3870:3871:3872:4050:4120:4250:4321:4385:4605:5007:6119:6261:6737:6738:7903:8603:8957:9592:10004:11026:11232:11658:11914:12043:12048:12291:12296:12297:12438:12555:12679:12683:12895:13153:13161:13228:13229:13846:14394:14915:21060:21080:21451:21611:21627:21740:21987:21990:30012:30054:30070:30090,0,RBL:115.124.30.132:@linux.alibaba.com:.lbl8.mailshell.net-64.201.201.201 62.20.2.100;04yfjj68dh4no7oznpkchhqmfwmosycd56jx6ag3wy7tu8w4ocsp8hy3b3nrjbo.g9nagz73ktwfzh1zimnxrdr9ghnbo4ok5tsowaj8g9tudetqj86nquejxmm6pbr.s-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rule s:0:0:0, X-HE-Tag: sack44_02136f4272c8 X-Filterd-Recvd-Size: 9175 Received: from out30-132.freemail.mail.aliyun.com (out30-132.freemail.mail.aliyun.com [115.124.30.132]) by imf29.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:19 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R101e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=alimailimapcm10staff010182156082;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=21;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:14 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Subject: [PATCH v21 15/19] mm/compaction: do page isolation first in compaction Date: Thu, 5 Nov 2020 16:55:45 +0800 Message-Id: <1604566549-62481-16-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently, compaction would get the lru_lock and then do page isolation which works fine with pgdat->lru_lock, since any page isoltion would compete for the lru_lock. 
If we want to change to memcg lru_lock, we have to isolate the page before getting lru_lock, thus isoltion would block page's memcg change which relay on page isoltion too. Then we could safely use per memcg lru_lock later. The new page isolation use previous introduced TestClearPageLRU() + pgdat lru locking which will be changed to memcg lru lock later. Hugh Dickins fixed following bugs in this patch's early version: Fix lots of crashes under compaction load: isolate_migratepages_block() must clean up appropriately when rejecting a page, setting PageLRU again if it had been cleared; and a put_page() after get_page_unless_zero() cannot safely be done while holding locked_lruvec - it may turn out to be the final put_page(), which will take an lruvec lock when PageLRU. And move __isolate_lru_page_prepare back after get_page_unless_zero to make trylock_page() safe: trylock_page() is not safe to use at this time: its setting PG_locked can race with the page being freed or allocated ("Bad page"), and can also erase flags being set by one of those "sole owners" of a freshly allocated page who use non-atomic __SetPageFlag(). Suggested-by: Johannes Weiner Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Andrew Morton Cc: Matthew Wilcox Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Acked-by: Vlastimil Babka --- include/linux/swap.h | 2 +- mm/compaction.c | 42 +++++++++++++++++++++++++++++++++--------- mm/vmscan.c | 43 ++++++++++++++++++++++--------------------- 3 files changed, 56 insertions(+), 31 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 5e1e967c225f..596bc2f4d9b0 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -356,7 +356,7 @@ extern void lru_cache_add_inactive_or_unevictable(struct page *page, extern unsigned long zone_reclaimable_pages(struct zone *zone); extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask); -extern int __isolate_lru_page(struct page *page, isolate_mode_t mode); +extern int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode); extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, unsigned long nr_pages, gfp_t gfp_mask, diff --git a/mm/compaction.c b/mm/compaction.c index ee1f8439369e..7b1cf48884dd 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -886,6 +886,7 @@ static bool too_many_isolated(pg_data_t *pgdat) if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages)) { if (!cc->ignore_skip_hint && get_pageblock_skip(page)) { low_pfn = end_pfn; + page = NULL; goto isolate_abort; } valid_page = page; @@ -967,6 +968,21 @@ static bool too_many_isolated(pg_data_t *pgdat) if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page)) goto isolate_fail; + /* + * Be careful not to clear PageLRU until after we're + * sure the page is not being freed elsewhere -- the + * page release code relies on it. 
+ */ + if (unlikely(!get_page_unless_zero(page))) + goto isolate_fail; + + if (__isolate_lru_page_prepare(page, isolate_mode) != 0) + goto isolate_fail_put; + + /* Try isolate the page */ + if (!TestClearPageLRU(page)) + goto isolate_fail_put; + /* If we already hold the lock, we can skip some rechecking */ if (!locked) { locked = compact_lock_irqsave(&pgdat->lru_lock, @@ -979,10 +995,6 @@ static bool too_many_isolated(pg_data_t *pgdat) goto isolate_abort; } - /* Recheck PageLRU and PageCompound under lock */ - if (!PageLRU(page)) - goto isolate_fail; - /* * Page become compound since the non-locked check, * and it's on LRU. It can only be a THP so the order @@ -990,16 +1002,13 @@ static bool too_many_isolated(pg_data_t *pgdat) */ if (unlikely(PageCompound(page) && !cc->alloc_contig)) { low_pfn += compound_nr(page) - 1; - goto isolate_fail; + SetPageLRU(page); + goto isolate_fail_put; } } lruvec = mem_cgroup_page_lruvec(page, pgdat); - /* Try isolate the page */ - if (__isolate_lru_page(page, isolate_mode) != 0) - goto isolate_fail; - /* The whole page is taken off the LRU; skip the tail pages. */ if (PageCompound(page)) low_pfn += compound_nr(page) - 1; @@ -1028,6 +1037,15 @@ static bool too_many_isolated(pg_data_t *pgdat) } continue; + +isolate_fail_put: + /* Avoid potential deadlock in freeing page under lru_lock */ + if (locked) { + spin_unlock_irqrestore(&pgdat->lru_lock, flags); + locked = false; + } + put_page(page); + isolate_fail: if (!skip_on_failure) continue; @@ -1064,9 +1082,15 @@ static bool too_many_isolated(pg_data_t *pgdat) if (unlikely(low_pfn > end_pfn)) low_pfn = end_pfn; + page = NULL; + isolate_abort: if (locked) spin_unlock_irqrestore(&pgdat->lru_lock, flags); + if (page) { + SetPageLRU(page); + put_page(page); + } /* * Updated the cached scanner pfn once the pageblock has been scanned diff --git a/mm/vmscan.c b/mm/vmscan.c index ab7a0104d1e1..0be55d875fde 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1540,7 +1540,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone, * * returns 0 on success, -ve errno on failure. */ -int __isolate_lru_page(struct page *page, isolate_mode_t mode) +int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode) { int ret = -EBUSY; @@ -1592,22 +1592,9 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode) if ((mode & ISOLATE_UNMAPPED) && page_mapped(page)) return ret; - if (likely(get_page_unless_zero(page))) { - /* - * Be careful not to clear PageLRU until after we're - * sure the page is not being freed elsewhere -- the - * page release code relies on it. - */ - if (TestClearPageLRU(page)) - ret = 0; - else - put_page(page); - } - - return ret; + return 0; } - /* * Update LRU sizes after isolating pages. The LRU size updates must * be complete before mem_cgroup_update_lru_size due to a sanity check. @@ -1687,20 +1674,34 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, * only when the page is being freed somewhere else. */ scan += nr_pages; - switch (__isolate_lru_page(page, mode)) { + switch (__isolate_lru_page_prepare(page, mode)) { case 0: + /* + * Be careful not to clear PageLRU until after we're + * sure the page is not being freed elsewhere -- the + * page release code relies on it. + */ + if (unlikely(!get_page_unless_zero(page))) + goto busy; + + if (!TestClearPageLRU(page)) { + /* + * This page may in other isolation path, + * but we still hold lru_lock. 
+ */ + put_page(page); + goto busy; + } + nr_taken += nr_pages; nr_zone_taken[page_zonenum(page)] += nr_pages; list_move(&page->lru, dst); break; - case -EBUSY: + default: +busy: /* else it is being freed elsewhere */ list_move(&page->lru, src); - continue; - - default: - BUG(); } } From patchwork Thu Nov 5 08:55:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883691 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ED00C921 for ; Thu, 5 Nov 2020 08:56:37 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A8D1A221F8 for ; Thu, 5 Nov 2020 08:56:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A8D1A221F8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 119F96B00B0; Thu, 5 Nov 2020 03:56:23 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E55356B00B2; Thu, 5 Nov 2020 03:56:22 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B725B6B00B1; Thu, 5 Nov 2020 03:56:22 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0089.hostedemail.com [216.40.44.89]) by kanga.kvack.org (Postfix) with ESMTP id 791856B00B0 for ; Thu, 5 Nov 2020 03:56:22 -0500 (EST) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 1F666583A for ; Thu, 5 Nov 2020 08:56:22 +0000 (UTC) X-FDA: 77449758204.05.fuel64_0f0015a272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin05.hostedemail.com (Postfix) with ESMTP id F32E21802C539 for ; Thu, 5 Nov 2020 08:56:21 +0000 (UTC) X-Spam-Summary: 1,0,0,85530a059da2e498,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:41:355:379:541:800:960:966:973:988:989:1260:1261:1345:1359:1381:1431:1437:1535:1544:1711:1730:1747:1777:1792:2196:2199:2393:2559:2562:2736:2898:3138:3139:3140:3141:3142:3355:3865:3867:3868:3870:3871:3872:3874:4117:4321:4385:5007:6261:6737:6738:7903:8957:9592:10004:11026:11658:11914:12043:12048:12291:12296:12297:12438:12555:12683:12895:12986:13161:13221:13229:13846:14181:14394:14721:14915:21060:21080:21450:21451:21627:21740:21987:21990:30054:30070,0,RBL:115.124.30.44:@linux.alibaba.com:.lbl8.mailshell.net-64.201.201.201 62.20.2.100;04yrtkcz38jni3gwemqqd7qdh174cyccrzb5e146pqnasz544d3femox57dqrja.m5qm8ojm9iwrfn6rjnmzyzbexfeuoykanhxde156d8oxx4g6dmquu6ry7d934fy.y-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:70,LUA_SUMMARY:none X-HE-Tag: fuel64_0f0015a272c8 X-Filterd-Recvd-Size: 6346 Received: from out30-44.freemail.mail.aliyun.com (out30-44.freemail.mail.aliyun.com [115.124.30.44]) by imf50.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:20 +0000 (UTC) X-Alimail-AntiSpam: 
AC=PASS;BC=-1|-1;BR=01201311R151e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04357;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=21;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:14 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Subject: [PATCH v21 16/19] mm/swap.c: serialize memcg changes in pagevec_lru_move_fn Date: Thu, 5 Nov 2020 16:55:46 +0800 Message-Id: <1604566549-62481-17-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hugh Dickins found a memcg change bug in the original version: if we want to change the pgdat->lru_lock to the memcg's lruvec lock, we have to serialize mem_cgroup_move_account against pagevec_lru_move_fn. The possible bad scenario is:

    cpu 0                                       cpu 1
    lruvec = mem_cgroup_page_lruvec()
                                                if (!isolate_lru_page())
                                                        mem_cgroup_move_account
    spin_lock_irqsave(&lruvec->lru_lock)   <== wrong lock

So we need TestClearPageLRU to block isolate_lru_page(); that serializes the memcg change, and the PageLRU check in the move_fn callees is removed as a consequence. __pagevec_lru_add_fn() is different from the others, because the pages it deals with are, by definition, not yet on the lru. TestClearPageLRU is not needed and would not work, so __pagevec_lru_add() goes its own way. 
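A minimal sketch of the serialization this relies on (illustrative only, not part of the patch; the real change is in the mm/swap.c hunk below, and the lru_lock handling, which is still per-node at this point in the series, is elided):

	/* sketch: name and structure are simplified for illustration */
	static void pagevec_lru_move_fn_sketch(struct pagevec *pvec,
			void (*move_fn)(struct page *page, struct lruvec *lruvec))
	{
		int i;

		for (i = 0; i < pagevec_count(pvec); i++) {
			struct page *page = pvec->pages[i];
			struct lruvec *lruvec;

			/*
			 * Clearing PageLRU makes a concurrent isolate_lru_page()
			 * fail its own TestClearPageLRU; mem_cgroup_move_account()
			 * only runs after a successful isolation, so the page's
			 * memcg (and hence its lruvec) stays stable while move_fn
			 * runs.
			 */
			if (!TestClearPageLRU(page))
				continue;

			lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
			(*move_fn)(page, lruvec);

			SetPageLRU(page);
		}
	}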
Reported-by: Hugh Dickins Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Andrew Morton Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Acked-by: Vlastimil Babka --- mm/swap.c | 44 +++++++++++++++++++++++++++++++++++--------- 1 file changed, 35 insertions(+), 9 deletions(-) diff --git a/mm/swap.c b/mm/swap.c index 2681d9023998..1838a9535703 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -222,8 +222,14 @@ static void pagevec_lru_move_fn(struct pagevec *pvec, spin_lock_irqsave(&pgdat->lru_lock, flags); } + /* block memcg migration during page moving between lru */ + if (!TestClearPageLRU(page)) + continue; + lruvec = mem_cgroup_page_lruvec(page, pgdat); (*move_fn)(page, lruvec); + + SetPageLRU(page); } if (pgdat) spin_unlock_irqrestore(&pgdat->lru_lock, flags); @@ -233,7 +239,7 @@ static void pagevec_lru_move_fn(struct pagevec *pvec, static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec) { - if (PageLRU(page) && !PageUnevictable(page)) { + if (!PageUnevictable(page)) { del_page_from_lru_list(page, lruvec, page_lru(page)); ClearPageActive(page); add_page_to_lru_list_tail(page, lruvec, page_lru(page)); @@ -306,7 +312,7 @@ void lru_note_cost_page(struct page *page) static void __activate_page(struct page *page, struct lruvec *lruvec) { - if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) { + if (!PageActive(page) && !PageUnevictable(page)) { int lru = page_lru_base_type(page); int nr_pages = thp_nr_pages(page); @@ -362,7 +368,8 @@ static void activate_page(struct page *page) page = compound_head(page); spin_lock_irq(&pgdat->lru_lock); - __activate_page(page, mem_cgroup_page_lruvec(page, pgdat)); + if (PageLRU(page)) + __activate_page(page, mem_cgroup_page_lruvec(page, pgdat)); spin_unlock_irq(&pgdat->lru_lock); } #endif @@ -519,9 +526,6 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec) bool active; int nr_pages = thp_nr_pages(page); - if (!PageLRU(page)) - return; - if (PageUnevictable(page)) return; @@ -562,7 +566,7 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec) static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec) { - if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) { + if (PageActive(page) && !PageUnevictable(page)) { int lru = page_lru_base_type(page); int nr_pages = thp_nr_pages(page); @@ -579,7 +583,7 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec) static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec) { - if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) && + if (PageAnon(page) && PageSwapBacked(page) && !PageSwapCache(page) && !PageUnevictable(page)) { bool active = PageActive(page); int nr_pages = thp_nr_pages(page); @@ -1021,7 +1025,29 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec) */ void __pagevec_lru_add(struct pagevec *pvec) { - pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn); + int i; + struct pglist_data *pgdat = NULL; + struct lruvec *lruvec; + unsigned long flags = 0; + + for (i = 0; i < pagevec_count(pvec); i++) { + struct page *page = pvec->pages[i]; + struct pglist_data *pagepgdat = page_pgdat(page); + + if (pagepgdat != pgdat) { + if (pgdat) + spin_unlock_irqrestore(&pgdat->lru_lock, flags); + pgdat = pagepgdat; + spin_lock_irqsave(&pgdat->lru_lock, flags); + } + + lruvec = mem_cgroup_page_lruvec(page, pgdat); + __pagevec_lru_add_fn(page, lruvec); + } + if (pgdat) + spin_unlock_irqrestore(&pgdat->lru_lock, flags); + 
release_pages(pvec->pages, pvec->nr); + pagevec_reinit(pvec); } /** From patchwork Thu Nov 5 08:55:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883713 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 42815697 for ; Thu, 5 Nov 2020 08:56:55 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id CD66720825 for ; Thu, 5 Nov 2020 08:56:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CD66720825 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E327A6B00C7; Thu, 5 Nov 2020 03:56:36 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id DEE866B00C9; Thu, 5 Nov 2020 03:56:36 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BC6E86B00CA; Thu, 5 Nov 2020 03:56:36 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0147.hostedemail.com [216.40.44.147]) by kanga.kvack.org (Postfix) with ESMTP id 5B8AC6B00C7 for ; Thu, 5 Nov 2020 03:56:36 -0500 (EST) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 01809582D for ; Thu, 5 Nov 2020 08:56:36 +0000 (UTC) X-FDA: 77449758792.17.mist54_4b07dcc272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin17.hostedemail.com (Postfix) with ESMTP id DC604180D0184 for ; Thu, 5 Nov 2020 08:56:35 +0000 (UTC) X-Spam-Summary: 1,0,0,4685e75fb818280f,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:69:327:355:379:541:960:966:967:973:988:989:1260:1261:1345:1359:1431:1437:1605:1730:1747:1777:1792:2194:2195:2196:2198:2199:2200:2201:2202:2393:2525:2538:2559:2563:2682:2685:2731:2740:2859:2890:2898:2901:2933:2937:2939:2942:2945:2947:2951:2954:3022:3138:3139:3140:3141:3142:3608:3865:3866:3867:3868:3870:3871:3872:3874:3934:3936:3938:3941:3944:3947:3950:3953:3956:3959:4042:4250:4321:4385:4605:5007:6261:6737:6738:7514:7875:7903:8603:8957:8985:9010:9025:9207:9592:10004:11026:11914:12043:12048:12291:12296:12297:12438:12555:12683:12895:12986:13146:13153:13228:13230:13846:13868:14096:14394:14915:21060:21080:21324:21450:21451:21611:21627:21740:21796:21811:21966:21987:21990:30001:30036:30054:30055:30056:30064:30067:30070:30090,0,RBL:115.124.30.54:@linux.alibaba.com:.lbl8.mailshell.net-62.20.2.100 64.201.201.201;04yfdkuwzekbrnub6z9qbdc3j46o9ycwwadz4tk7ap4uzf3gufeinampj7egxq7.9n57m4kqfcff8nhyzxd bnbgzxx5 X-HE-Tag: mist54_4b07dcc272c8 X-Filterd-Recvd-Size: 32156 Received: from out30-54.freemail.mail.aliyun.com (out30-54.freemail.mail.aliyun.com [115.124.30.54]) by imf40.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:25 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R101e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04420;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=23;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) 
by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:15 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Cc: Michal Hocko , Yang Shi Subject: [PATCH v21 17/19] mm/lru: replace pgdat lru_lock with lruvec lock Date: Thu, 5 Nov 2020 16:55:47 +0800 Message-Id: <1604566549-62481-18-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This patch moves the per-node lru_lock into the lruvec, giving each memcg one lru_lock per node. So on a large machine, memcgs no longer contend on the per-node pgdat->lru_lock; each can go fast under its own lru_lock. After moving the memcg charge before LRU insertion, page isolation serializes the page's memcg, so the per-memcg lruvec lock is stable and can replace the per-node lru lock. In isolate_migratepages_block(), compact_unlock_should_abort() and lock_page_lruvec_irqsave() are open coded to work with compact_control. Also add a debug function to the locking which may give some clues if something gets out of hand. Daniel Jordan's testing shows a 62% improvement on a modified readtwice case on his 2P * 10 core * 2 HT Broadwell box. https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/ On a large machine with memcg enabled but not used, looking up the page's lruvec walks a few extra pointers, which may increase lru_lock holding time and cause a slight regression. Hugh Dickins helped on the patch polish, thanks! 
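For illustration (not part of the patch), the conversion pattern at a typical call site looks roughly like this, based on the __page_cache_release()/release_pages() hunks below; VM_BUG_ON checks, PageLRU clearing and statistics updates are omitted:

	/* before: every LRU operation serializes on the node's lock */
	spin_lock_irqsave(&page_pgdat(page)->lru_lock, flags);
	lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
	del_page_from_lru_list(page, lruvec, page_off_lru(page));
	spin_unlock_irqrestore(&page_pgdat(page)->lru_lock, flags);

	/* after: lock only the lruvec of this page's memcg on this node */
	lruvec = lock_page_lruvec_irqsave(page, &flags);
	del_page_from_lru_list(page, lruvec, page_off_lru(page));
	unlock_page_lruvec_irqrestore(lruvec, flags);

lock_page_lruvec_irqsave() resolves the page's memcg lruvec under RCU and takes that lruvec's lru_lock, so callers no longer reference pgdat->lru_lock at all.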
Signed-off-by: Alex Shi Acked-by: Hugh Dickins Cc: Rong Chen Cc: Hugh Dickins Cc: Andrew Morton Cc: Johannes Weiner Cc: Michal Hocko Cc: Vladimir Davydov Cc: Yang Shi Cc: Matthew Wilcox Cc: Konstantin Khlebnikov Cc: Tejun Heo Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: cgroups@vger.kernel.org Signed-off-by: Alex Shi Acked-by: Hugh Dickins Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Acked-by: Vlastimil Babka --- include/linux/memcontrol.h | 58 +++++++++++++++++++++++ include/linux/mmzone.h | 3 +- mm/compaction.c | 56 ++++++++++++++-------- mm/huge_memory.c | 11 ++--- mm/memcontrol.c | 73 ++++++++++++++++++++++++++-- mm/mlock.c | 22 ++++++--- mm/mmzone.c | 1 + mm/page_alloc.c | 1 - mm/swap.c | 116 ++++++++++++++++++++++----------------------- mm/vmscan.c | 55 ++++++++++----------- 10 files changed, 270 insertions(+), 126 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 0f4dd7829fb2..6ecb08ff4ad1 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -666,6 +666,19 @@ static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg, struct mem_cgroup *get_mem_cgroup_from_page(struct page *page); +struct lruvec *lock_page_lruvec(struct page *page); +struct lruvec *lock_page_lruvec_irq(struct page *page); +struct lruvec *lock_page_lruvec_irqsave(struct page *page, + unsigned long *flags); + +#ifdef CONFIG_DEBUG_VM +void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page); +#else +static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page) +{ +} +#endif + static inline struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){ return css ? container_of(css, struct mem_cgroup, css) : NULL; @@ -1233,6 +1246,31 @@ static inline void mem_cgroup_put(struct mem_cgroup *memcg) { } +static inline struct lruvec *lock_page_lruvec(struct page *page) +{ + struct pglist_data *pgdat = page_pgdat(page); + + spin_lock(&pgdat->__lruvec.lru_lock); + return &pgdat->__lruvec; +} + +static inline struct lruvec *lock_page_lruvec_irq(struct page *page) +{ + struct pglist_data *pgdat = page_pgdat(page); + + spin_lock_irq(&pgdat->__lruvec.lru_lock); + return &pgdat->__lruvec; +} + +static inline struct lruvec *lock_page_lruvec_irqsave(struct page *page, + unsigned long *flagsp) +{ + struct pglist_data *pgdat = page_pgdat(page); + + spin_lock_irqsave(&pgdat->__lruvec.lru_lock, *flagsp); + return &pgdat->__lruvec; +} + static inline struct mem_cgroup * mem_cgroup_iter(struct mem_cgroup *root, struct mem_cgroup *prev, @@ -1476,6 +1514,10 @@ static inline void count_memcg_page_event(struct page *page, void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx) { } + +static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page) +{ +} #endif /* CONFIG_MEMCG */ /* idx can be of type enum memcg_stat_item or node_stat_item */ @@ -1605,6 +1647,22 @@ static inline struct lruvec *parent_lruvec(struct lruvec *lruvec) return mem_cgroup_lruvec(memcg, lruvec_pgdat(lruvec)); } +static inline void unlock_page_lruvec(struct lruvec *lruvec) +{ + spin_unlock(&lruvec->lru_lock); +} + +static inline void unlock_page_lruvec_irq(struct lruvec *lruvec) +{ + spin_unlock_irq(&lruvec->lru_lock); +} + +static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec, + unsigned long flags) +{ + spin_unlock_irqrestore(&lruvec->lru_lock, flags); +} + #ifdef CONFIG_CGROUP_WRITEBACK struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb); diff --git 
a/include/linux/mmzone.h b/include/linux/mmzone.h index fb3bf696c05e..0afba4ea2a21 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -276,6 +276,8 @@ enum lruvec_flags { struct lruvec { struct list_head lists[NR_LRU_LISTS]; + /* per lruvec lru_lock for memcg */ + spinlock_t lru_lock; /* * These track the cost of reclaiming one LRU - file or anon - * over the other. As the observed cost of reclaiming one LRU @@ -796,7 +798,6 @@ struct deferred_split { /* Write-intensive fields used by page reclaim */ ZONE_PADDING(_pad1_) - spinlock_t lru_lock; #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT /* diff --git a/mm/compaction.c b/mm/compaction.c index 7b1cf48884dd..9cfe90961493 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -804,7 +804,7 @@ static bool too_many_isolated(pg_data_t *pgdat) unsigned long nr_scanned = 0, nr_isolated = 0; struct lruvec *lruvec; unsigned long flags = 0; - bool locked = false; + struct lruvec *locked = NULL; struct page *page = NULL, *valid_page = NULL; unsigned long start_pfn = low_pfn; bool skip_on_failure = false; @@ -864,11 +864,20 @@ static bool too_many_isolated(pg_data_t *pgdat) * contention, to give chance to IRQs. Abort completely if * a fatal signal is pending. */ - if (!(low_pfn % SWAP_CLUSTER_MAX) - && compact_unlock_should_abort(&pgdat->lru_lock, - flags, &locked, cc)) { - low_pfn = 0; - goto fatal_pending; + if (!(low_pfn % SWAP_CLUSTER_MAX)) { + if (locked) { + unlock_page_lruvec_irqrestore(locked, flags); + locked = NULL; + } + + if (fatal_signal_pending(current)) { + cc->contended = true; + + low_pfn = 0; + goto fatal_pending; + } + + cond_resched(); } if (!pfn_valid_within(low_pfn)) @@ -940,9 +949,8 @@ static bool too_many_isolated(pg_data_t *pgdat) if (unlikely(__PageMovable(page)) && !PageIsolated(page)) { if (locked) { - spin_unlock_irqrestore(&pgdat->lru_lock, - flags); - locked = false; + unlock_page_lruvec_irqrestore(locked, flags); + locked = NULL; } if (!isolate_movable_page(page, isolate_mode)) @@ -983,10 +991,19 @@ static bool too_many_isolated(pg_data_t *pgdat) if (!TestClearPageLRU(page)) goto isolate_fail_put; + rcu_read_lock(); + lruvec = mem_cgroup_page_lruvec(page, pgdat); + /* If we already hold the lock, we can skip some rechecking */ - if (!locked) { - locked = compact_lock_irqsave(&pgdat->lru_lock, - &flags, cc); + if (lruvec != locked) { + if (locked) + unlock_page_lruvec_irqrestore(locked, flags); + + compact_lock_irqsave(&lruvec->lru_lock, &flags, cc); + locked = lruvec; + rcu_read_unlock(); + + lruvec_memcg_debug(lruvec, page); /* Try get exclusive access under lock */ if (!skip_updated) { @@ -1005,9 +1022,8 @@ static bool too_many_isolated(pg_data_t *pgdat) SetPageLRU(page); goto isolate_fail_put; } - } - - lruvec = mem_cgroup_page_lruvec(page, pgdat); + } else + rcu_read_unlock(); /* The whole page is taken off the LRU; skip the tail pages. 
*/ if (PageCompound(page)) @@ -1041,8 +1057,8 @@ static bool too_many_isolated(pg_data_t *pgdat) isolate_fail_put: /* Avoid potential deadlock in freeing page under lru_lock */ if (locked) { - spin_unlock_irqrestore(&pgdat->lru_lock, flags); - locked = false; + unlock_page_lruvec_irqrestore(locked, flags); + locked = NULL; } put_page(page); @@ -1057,8 +1073,8 @@ static bool too_many_isolated(pg_data_t *pgdat) */ if (nr_isolated) { if (locked) { - spin_unlock_irqrestore(&pgdat->lru_lock, flags); - locked = false; + unlock_page_lruvec_irqrestore(locked, flags); + locked = NULL; } putback_movable_pages(&cc->migratepages); cc->nr_migratepages = 0; @@ -1086,7 +1102,7 @@ static bool too_many_isolated(pg_data_t *pgdat) isolate_abort: if (locked) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + unlock_page_lruvec_irqrestore(locked, flags); if (page) { SetPageLRU(page); put_page(page); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index b70ec0c6076b..94e42dba052a 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2354,7 +2354,7 @@ static void lru_add_page_tail(struct page *head, struct page *tail, VM_BUG_ON_PAGE(!PageHead(head), head); VM_BUG_ON_PAGE(PageCompound(tail), head); VM_BUG_ON_PAGE(PageLRU(tail), head); - lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); + lockdep_assert_held(&lruvec->lru_lock); if (list) { /* page reclaim is reclaiming a huge page */ @@ -2438,7 +2438,6 @@ static void __split_huge_page(struct page *page, struct list_head *list, pgoff_t end) { struct page *head = compound_head(page); - pg_data_t *pgdat = page_pgdat(head); struct lruvec *lruvec; struct address_space *swap_cache = NULL; unsigned long offset = 0; @@ -2456,10 +2455,8 @@ static void __split_huge_page(struct page *page, struct list_head *list, xa_lock(&swap_cache->i_pages); } - /* prevent PageLRU to go away from under us, and freeze lru stats */ - spin_lock(&pgdat->lru_lock); - - lruvec = mem_cgroup_page_lruvec(head, pgdat); + /* lock lru list/PageCompound, ref freezed by page_ref_freeze */ + lruvec = lock_page_lruvec(head); for (i = nr - 1; i >= 1; i--) { __split_huge_page_tail(head, i, lruvec, list); @@ -2480,7 +2477,7 @@ static void __split_huge_page(struct page *page, struct list_head *list, } ClearPageCompound(head); - spin_unlock(&pgdat->lru_lock); + unlock_page_lruvec(lruvec); /* Caller disabled irqs, so they are still disabled here */ split_page_owner(head, nr); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 157b745031a4..91226af58ce8 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -20,6 +20,9 @@ * Lockless page tracking & accounting * Unified hierarchy configuration model * Copyright (C) 2015 Red Hat, Inc., Johannes Weiner + * + * Per memcg lru locking + * Copyright (C) 2020 Alibaba, Inc, Alex Shi */ #include @@ -1305,6 +1308,19 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg, return ret; } +#ifdef CONFIG_DEBUG_VM +void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page) +{ + if (mem_cgroup_disabled()) + return; + + if (!page->mem_cgroup) + VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != root_mem_cgroup, page); + else + VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != page->mem_cgroup, page); +} +#endif + /** * mem_cgroup_page_lruvec - return lruvec for isolating/putting an LRU page * @page: the page @@ -1343,6 +1359,59 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd } /** + * lock_page_lruvec - lock and return lruvec for a given page. 
+ * @page: the page + * + * This series functions should be used in either conditions: + * PageLRU is cleared or unset + * or page is locked. + */ +struct lruvec *lock_page_lruvec(struct page *page) +{ + struct lruvec *lruvec; + struct pglist_data *pgdat = page_pgdat(page); + + rcu_read_lock(); + lruvec = mem_cgroup_page_lruvec(page, pgdat); + spin_lock(&lruvec->lru_lock); + rcu_read_unlock(); + + lruvec_memcg_debug(lruvec, page); + + return lruvec; +} + +struct lruvec *lock_page_lruvec_irq(struct page *page) +{ + struct lruvec *lruvec; + struct pglist_data *pgdat = page_pgdat(page); + + rcu_read_lock(); + lruvec = mem_cgroup_page_lruvec(page, pgdat); + spin_lock_irq(&lruvec->lru_lock); + rcu_read_unlock(); + + lruvec_memcg_debug(lruvec, page); + + return lruvec; +} + +struct lruvec *lock_page_lruvec_irqsave(struct page *page, unsigned long *flags) +{ + struct lruvec *lruvec; + struct pglist_data *pgdat = page_pgdat(page); + + rcu_read_lock(); + lruvec = mem_cgroup_page_lruvec(page, pgdat); + spin_lock_irqsave(&lruvec->lru_lock, *flags); + rcu_read_unlock(); + + lruvec_memcg_debug(lruvec, page); + + return lruvec; +} + +/** * mem_cgroup_update_lru_size - account for adding or removing an lru page * @lruvec: mem_cgroup per zone lru vector * @lru: index of lru list the page is sitting on @@ -3245,10 +3314,8 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size) #endif /* CONFIG_MEMCG_KMEM */ #ifdef CONFIG_TRANSPARENT_HUGEPAGE - /* - * Because tail pages are not marked as "used", set it. We're under - * pgdat->lru_lock and migration entries setup in all page mappings. + * Because page->mem_cgroup is not set on compound tails, set it now. */ void mem_cgroup_split_huge_fixup(struct page *head) { diff --git a/mm/mlock.c b/mm/mlock.c index 7b0e6334be6f..ab164a675c25 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -262,12 +262,12 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) int nr = pagevec_count(pvec); int delta_munlocked = -nr; struct pagevec pvec_putback; + struct lruvec *lruvec = NULL; int pgrescued = 0; pagevec_init(&pvec_putback); /* Phase 1: page isolation */ - spin_lock_irq(&zone->zone_pgdat->lru_lock); for (i = 0; i < nr; i++) { struct page *page = pvec->pages[i]; @@ -277,10 +277,16 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) * so we can spare the get_page() here. 
*/ if (TestClearPageLRU(page)) { - struct lruvec *lruvec; + struct lruvec *new_lruvec; + + new_lruvec = mem_cgroup_page_lruvec(page, + page_pgdat(page)); + if (new_lruvec != lruvec) { + if (lruvec) + unlock_page_lruvec_irq(lruvec); + lruvec = lock_page_lruvec_irq(page); + } - lruvec = mem_cgroup_page_lruvec(page, - page_pgdat(page)); del_page_from_lru_list(page, lruvec, page_lru(page)); continue; @@ -299,8 +305,12 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) pagevec_add(&pvec_putback, pvec->pages[i]); pvec->pages[i] = NULL; } - __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked); - spin_unlock_irq(&zone->zone_pgdat->lru_lock); + if (lruvec) { + __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked); + unlock_page_lruvec_irq(lruvec); + } else if (delta_munlocked) { + mod_zone_page_state(zone, NR_MLOCK, delta_munlocked); + } /* Now we can release pins of pages that we are not munlocking */ pagevec_release(&pvec_putback); diff --git a/mm/mmzone.c b/mm/mmzone.c index 4686fdc23bb9..3750a90ed4a0 100644 --- a/mm/mmzone.c +++ b/mm/mmzone.c @@ -91,6 +91,7 @@ void lruvec_init(struct lruvec *lruvec) enum lru_list lru; memset(lruvec, 0, sizeof(struct lruvec)); + spin_lock_init(&lruvec->lru_lock); for_each_lru(lru) INIT_LIST_HEAD(&lruvec->lists[lru]); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d77220615fd5..74bf7f4c6317 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6808,7 +6808,6 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat) init_waitqueue_head(&pgdat->pfmemalloc_wait); pgdat_page_ext_init(pgdat); - spin_lock_init(&pgdat->lru_lock); lruvec_init(&pgdat->__lruvec); } diff --git a/mm/swap.c b/mm/swap.c index 1838a9535703..ed033f7c4f2d 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -79,16 +79,14 @@ static DEFINE_PER_CPU(struct lru_pvecs, lru_pvecs) = { static void __page_cache_release(struct page *page) { if (PageLRU(page)) { - pg_data_t *pgdat = page_pgdat(page); struct lruvec *lruvec; unsigned long flags; - spin_lock_irqsave(&pgdat->lru_lock, flags); - lruvec = mem_cgroup_page_lruvec(page, pgdat); + lruvec = lock_page_lruvec_irqsave(page, &flags); VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + unlock_page_lruvec_irqrestore(lruvec, flags); } __ClearPageWaiters(page); } @@ -207,32 +205,30 @@ static void pagevec_lru_move_fn(struct pagevec *pvec, void (*move_fn)(struct page *page, struct lruvec *lruvec)) { int i; - struct pglist_data *pgdat = NULL; - struct lruvec *lruvec; + struct lruvec *lruvec = NULL; unsigned long flags = 0; for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - struct pglist_data *pagepgdat = page_pgdat(page); - - if (pagepgdat != pgdat) { - if (pgdat) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); - pgdat = pagepgdat; - spin_lock_irqsave(&pgdat->lru_lock, flags); - } + struct lruvec *new_lruvec; /* block memcg migration during page moving between lru */ if (!TestClearPageLRU(page)) continue; - lruvec = mem_cgroup_page_lruvec(page, pgdat); + new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); + if (lruvec != new_lruvec) { + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = lock_page_lruvec_irqsave(page, &flags); + } + (*move_fn)(page, lruvec); SetPageLRU(page); } - if (pgdat) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); release_pages(pvec->pages, pvec->nr); 
pagevec_reinit(pvec); } @@ -274,9 +270,15 @@ void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages) { do { unsigned long lrusize; - struct pglist_data *pgdat = lruvec_pgdat(lruvec); - spin_lock_irq(&pgdat->lru_lock); + /* + * Hold lruvec->lru_lock is safe here, since + * 1) The pinned lruvec in reclaim, or + * 2) From a pre-LRU page during refault (which also holds the + * rcu lock, so would be safe even if the page was on the LRU + * and could move simultaneously to a new lruvec). + */ + spin_lock_irq(&lruvec->lru_lock); /* Record cost event */ if (file) lruvec->file_cost += nr_pages; @@ -300,7 +302,7 @@ void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages) lruvec->file_cost /= 2; lruvec->anon_cost /= 2; } - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); } while ((lruvec = parent_lruvec(lruvec))); } @@ -364,13 +366,15 @@ static inline void activate_page_drain(int cpu) static void activate_page(struct page *page) { - pg_data_t *pgdat = page_pgdat(page); + struct lruvec *lruvec; page = compound_head(page); - spin_lock_irq(&pgdat->lru_lock); - if (PageLRU(page)) - __activate_page(page, mem_cgroup_page_lruvec(page, pgdat)); - spin_unlock_irq(&pgdat->lru_lock); + if (TestClearPageLRU(page)) { + lruvec = lock_page_lruvec_irq(page); + __activate_page(page, lruvec); + unlock_page_lruvec_irq(lruvec); + SetPageLRU(page); + } } #endif @@ -860,8 +864,7 @@ void release_pages(struct page **pages, int nr) { int i; LIST_HEAD(pages_to_free); - struct pglist_data *locked_pgdat = NULL; - struct lruvec *lruvec; + struct lruvec *lruvec = NULL; unsigned long flags; unsigned int lock_batch; @@ -871,11 +874,11 @@ void release_pages(struct page **pages, int nr) /* * Make sure the IRQ-safe lock-holding time does not get * excessive with a continuous string of pages from the - * same pgdat. The lock is held only if pgdat != NULL. + * same lruvec. The lock is held only if lruvec != NULL. 
*/ - if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); - locked_pgdat = NULL; + if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) { + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = NULL; } page = compound_head(page); @@ -883,10 +886,9 @@ void release_pages(struct page **pages, int nr) continue; if (is_zone_device_page(page)) { - if (locked_pgdat) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, - flags); - locked_pgdat = NULL; + if (lruvec) { + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = NULL; } /* * ZONE_DEVICE pages that return 'false' from @@ -907,27 +909,27 @@ void release_pages(struct page **pages, int nr) continue; if (PageCompound(page)) { - if (locked_pgdat) { - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); - locked_pgdat = NULL; + if (lruvec) { + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = NULL; } __put_compound_page(page); continue; } if (PageLRU(page)) { - struct pglist_data *pgdat = page_pgdat(page); + struct lruvec *new_lruvec; - if (pgdat != locked_pgdat) { - if (locked_pgdat) - spin_unlock_irqrestore(&locked_pgdat->lru_lock, + new_lruvec = mem_cgroup_page_lruvec(page, + page_pgdat(page)); + if (new_lruvec != lruvec) { + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); lock_batch = 0; - locked_pgdat = pgdat; - spin_lock_irqsave(&locked_pgdat->lru_lock, flags); + lruvec = lock_page_lruvec_irqsave(page, &flags); } - lruvec = mem_cgroup_page_lruvec(page, locked_pgdat); VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); del_page_from_lru_list(page, lruvec, page_off_lru(page)); @@ -937,8 +939,8 @@ void release_pages(struct page **pages, int nr) list_add(&page->lru, &pages_to_free); } - if (locked_pgdat) - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); mem_cgroup_uncharge_list(&pages_to_free); free_unref_page_list(&pages_to_free); @@ -1026,26 +1028,24 @@ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec) void __pagevec_lru_add(struct pagevec *pvec) { int i; - struct pglist_data *pgdat = NULL; - struct lruvec *lruvec; + struct lruvec *lruvec = NULL; unsigned long flags = 0; for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - struct pglist_data *pagepgdat = page_pgdat(page); + struct lruvec *new_lruvec; - if (pagepgdat != pgdat) { - if (pgdat) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); - pgdat = pagepgdat; - spin_lock_irqsave(&pgdat->lru_lock, flags); + new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); + if (lruvec != new_lruvec) { + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); + lruvec = lock_page_lruvec_irqsave(page, &flags); } - lruvec = mem_cgroup_page_lruvec(page, pgdat); __pagevec_lru_add_fn(page, lruvec); } - if (pgdat) - spin_unlock_irqrestore(&pgdat->lru_lock, flags); + if (lruvec) + unlock_page_lruvec_irqrestore(lruvec, flags); release_pages(pvec->pages, pvec->nr); pagevec_reinit(pvec); } diff --git a/mm/vmscan.c b/mm/vmscan.c index 0be55d875fde..2953ddec88a0 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1765,14 +1765,12 @@ int isolate_lru_page(struct page *page) WARN_RATELIMIT(PageTail(page), "trying to isolate tail page"); if (TestClearPageLRU(page)) { - pg_data_t *pgdat = page_pgdat(page); struct lruvec *lruvec; get_page(page); - lruvec = mem_cgroup_page_lruvec(page, pgdat); - spin_lock_irq(&pgdat->lru_lock); + lruvec = lock_page_lruvec_irq(page); del_page_from_lru_list(page, lruvec, 
page_lru(page)); - spin_unlock_irq(&pgdat->lru_lock); + unlock_page_lruvec_irq(lruvec); ret = 0; } @@ -1839,7 +1837,6 @@ static int too_many_isolated(struct pglist_data *pgdat, int file, static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, struct list_head *list) { - struct pglist_data *pgdat = lruvec_pgdat(lruvec); int nr_pages, nr_moved = 0; LIST_HEAD(pages_to_free); struct page *page; @@ -1850,9 +1847,9 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, VM_BUG_ON_PAGE(PageLRU(page), page); list_del(&page->lru); if (unlikely(!page_evictable(page))) { - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); putback_lru_page(page); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); continue; } @@ -1874,9 +1871,9 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, __ClearPageActive(page); if (unlikely(PageCompound(page))) { - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); destroy_compound_page(page); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); } else list_add(&page->lru, &pages_to_free); @@ -1953,7 +1950,7 @@ static int current_may_throttle(void) lru_add_drain(); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list, &nr_scanned, sc, lru); @@ -1965,7 +1962,7 @@ static int current_may_throttle(void) __count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned); __count_vm_events(PGSCAN_ANON + file, nr_scanned); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); if (nr_taken == 0) return 0; @@ -1973,7 +1970,7 @@ static int current_may_throttle(void) nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, 0, &stat, false); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); move_pages_to_lru(lruvec, &page_list); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); @@ -1982,7 +1979,7 @@ static int current_may_throttle(void) __count_vm_events(item, nr_reclaimed); __count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed); __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); lru_note_cost(lruvec, file, stat.nr_pageout); mem_cgroup_uncharge_list(&page_list); @@ -2035,7 +2032,7 @@ static void shrink_active_list(unsigned long nr_to_scan, lru_add_drain(); - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold, &nr_scanned, sc, lru); @@ -2046,7 +2043,7 @@ static void shrink_active_list(unsigned long nr_to_scan, __count_vm_events(PGREFILL, nr_scanned); __count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); while (!list_empty(&l_hold)) { cond_resched(); @@ -2092,7 +2089,7 @@ static void shrink_active_list(unsigned long nr_to_scan, /* * Move pages back to the lru list. 
*/ - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&lruvec->lru_lock); nr_activate = move_pages_to_lru(lruvec, &l_active); nr_deactivate = move_pages_to_lru(lruvec, &l_inactive); @@ -2103,7 +2100,7 @@ static void shrink_active_list(unsigned long nr_to_scan, __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate); __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_list(&l_active); free_unref_page_list(&l_active); @@ -2693,10 +2690,10 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc) /* * Determine the scan balance between anon and file LRUs. */ - spin_lock_irq(&pgdat->lru_lock); + spin_lock_irq(&target_lruvec->lru_lock); sc->anon_cost = target_lruvec->anon_cost; sc->file_cost = target_lruvec->file_cost; - spin_unlock_irq(&pgdat->lru_lock); + spin_unlock_irq(&target_lruvec->lru_lock); /* * Target desirable inactive:active list ratios for the anon @@ -4272,16 +4269,15 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order) */ void check_move_unevictable_pages(struct pagevec *pvec) { - struct lruvec *lruvec; - struct pglist_data *pgdat = NULL; + struct lruvec *lruvec = NULL; int pgscanned = 0; int pgrescued = 0; int i; for (i = 0; i < pvec->nr; i++) { struct page *page = pvec->pages[i]; - struct pglist_data *pagepgdat = page_pgdat(page); int nr_pages; + struct lruvec *new_lruvec; if (PageTransTail(page)) continue; @@ -4293,13 +4289,12 @@ void check_move_unevictable_pages(struct pagevec *pvec) if (!TestClearPageLRU(page)) continue; - if (pagepgdat != pgdat) { - if (pgdat) - spin_unlock_irq(&pgdat->lru_lock); - pgdat = pagepgdat; - spin_lock_irq(&pgdat->lru_lock); + new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); + if (lruvec != new_lruvec) { + if (lruvec) + unlock_page_lruvec_irq(lruvec); + lruvec = lock_page_lruvec_irq(page); } - lruvec = mem_cgroup_page_lruvec(page, pgdat); if (page_evictable(page) && PageUnevictable(page)) { enum lru_list lru = page_lru_base_type(page); @@ -4313,10 +4308,10 @@ void check_move_unevictable_pages(struct pagevec *pvec) SetPageLRU(page); } - if (pgdat) { + if (lruvec) { __count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued); __count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); - spin_unlock_irq(&pgdat->lru_lock); + unlock_page_lruvec_irq(lruvec); } else if (pgscanned) { count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); } From patchwork Thu Nov 5 08:55:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883711 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3E5DF921 for ; Thu, 5 Nov 2020 08:56:53 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E29FD206FB for ; Thu, 5 Nov 2020 08:56:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E29FD206FB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5F6BE6B00C3; Thu, 5 Nov 2020 03:56:34 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 57B416B00C4; Thu, 5 Nov 2020 03:56:34 -0500 (EST) X-Original-To: 
int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3AB0D6B00C5; Thu, 5 Nov 2020 03:56:34 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0134.hostedemail.com [216.40.44.134]) by kanga.kvack.org (Postfix) with ESMTP id EFCB56B00C3 for ; Thu, 5 Nov 2020 03:56:33 -0500 (EST) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id A04BE180AD811 for ; Thu, 5 Nov 2020 08:56:33 +0000 (UTC) X-FDA: 77449758666.09.bun90_070134d272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin09.hostedemail.com (Postfix) with ESMTP id 83E15180AD80F for ; Thu, 5 Nov 2020 08:56:33 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:30012:30054:30056:30070,0,RBL:47.88.44.36:@linux.alibaba.com:.lbl8.mailshell.net-64.10.201.10 62.18.0.100;04yfmtrbpz86eik8x5wos31w6jys8opz78u9cfhsi7dxbndfdqzcddumzwnarsr.zmqjrwh5339y5ruhjqfya5g66dxjg6rge54kd84qb9srubx7ffkabxoqfjb3hs6.h-lbl8.mailshell.net-223.238.255.100;47.88.44.36-irl.urbl.hostedemail.com-127.0.0.150,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:none,Custom_rules:0:0:0,LFtime:69,LUA_SUMMARY:none X-HE-Tag: bun90_070134d272c8 X-Filterd-Recvd-Size: 9548 Received: from out4436.biz.mail.alibaba.com (out4436.biz.mail.alibaba.com [47.88.44.36]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:32 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R381e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04394;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=24;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:15 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Cc: Alexander Duyck , Thomas Gleixner , Andrey Ryabinin Subject: [PATCH v21 18/19] mm/lru: introduce the relock_page_lruvec function Date: Thu, 5 Nov 2020 16:55:48 +0800 Message-Id: <1604566549-62481-19-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Alexander Duyck Use this new function to replace repeated same code, no func change. When testing for relock we can avoid the need for RCU locking if we simply compare the page pgdat and memcg pointers versus those that the lruvec is holding. By doing this we can avoid the extra pointer walks and accesses of the memory cgroup. In addition we can avoid the checks entirely if lruvec is currently NULL. 
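For illustration (not part of the patch), the callers converted below end up following roughly this pattern; the IRQ-flag variants and the per-page work differ by call site:

	struct lruvec *lruvec = NULL;
	int i;

	for (i = 0; i < pagevec_count(pvec); i++) {
		struct page *page = pvec->pages[i];

		/*
		 * relock_page_lruvec_irq() keeps the lock already held when
		 * lruvec_holds_page_lru_lock() says this page maps to the same
		 * lruvec; otherwise it drops the old lock and takes the new one.
		 */
		lruvec = relock_page_lruvec_irq(page, lruvec);

		/* ... operate on page under lruvec->lru_lock ... */
	}
	if (lruvec)
		unlock_page_lruvec_irq(lruvec);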
Signed-off-by: Alexander Duyck Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Cc: Johannes Weiner Cc: Andrew Morton Cc: Thomas Gleixner Cc: Andrey Ryabinin Cc: Matthew Wilcox Cc: Mel Gorman Cc: Konstantin Khlebnikov Cc: Hugh Dickins Cc: Tejun Heo Cc: linux-kernel@vger.kernel.org Cc: cgroups@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Alexander Duyck Signed-off-by: Alex Shi Acked-by: Hugh Dickins Acked-by: Johannes Weiner Acked-by: Johannes Weiner Acked-by: Vlastimil Babka --- include/linux/memcontrol.h | 52 ++++++++++++++++++++++++++++++++++++++++++++++ mm/mlock.c | 11 +--------- mm/swap.c | 33 +++++++---------------------- mm/vmscan.c | 12 ++--------- 4 files changed, 62 insertions(+), 46 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 6ecb08ff4ad1..ba4050154fea 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -660,6 +660,22 @@ static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg, struct lruvec *mem_cgroup_page_lruvec(struct page *, struct pglist_data *); +static inline bool lruvec_holds_page_lru_lock(struct page *page, + struct lruvec *lruvec) +{ + pg_data_t *pgdat = page_pgdat(page); + const struct mem_cgroup *memcg; + struct mem_cgroup_per_node *mz; + + if (mem_cgroup_disabled()) + return lruvec == &pgdat->__lruvec; + + mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec); + memcg = page->mem_cgroup ? : root_mem_cgroup; + + return lruvec->pgdat == pgdat && mz->memcg == memcg; +} + struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p); struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm); @@ -1221,6 +1237,14 @@ static inline struct lruvec *mem_cgroup_page_lruvec(struct page *page, return &pgdat->__lruvec; } +static inline bool lruvec_holds_page_lru_lock(struct page *page, + struct lruvec *lruvec) +{ + pg_data_t *pgdat = page_pgdat(page); + + return lruvec == &pgdat->__lruvec; +} + static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg) { return NULL; @@ -1663,6 +1687,34 @@ static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec, spin_unlock_irqrestore(&lruvec->lru_lock, flags); } +/* Don't lock again iff page's lruvec locked */ +static inline struct lruvec *relock_page_lruvec_irq(struct page *page, + struct lruvec *locked_lruvec) +{ + if (locked_lruvec) { + if (lruvec_holds_page_lru_lock(page, locked_lruvec)) + return locked_lruvec; + + unlock_page_lruvec_irq(locked_lruvec); + } + + return lock_page_lruvec_irq(page); +} + +/* Don't lock again iff page's lruvec locked */ +static inline struct lruvec *relock_page_lruvec_irqsave(struct page *page, + struct lruvec *locked_lruvec, unsigned long *flags) +{ + if (locked_lruvec) { + if (lruvec_holds_page_lru_lock(page, locked_lruvec)) + return locked_lruvec; + + unlock_page_lruvec_irqrestore(locked_lruvec, *flags); + } + + return lock_page_lruvec_irqsave(page, flags); +} + #ifdef CONFIG_CGROUP_WRITEBACK struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb); diff --git a/mm/mlock.c b/mm/mlock.c index ab164a675c25..55b3b3672977 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -277,16 +277,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) * so we can spare the get_page() here. 
*/ if (TestClearPageLRU(page)) { - struct lruvec *new_lruvec; - - new_lruvec = mem_cgroup_page_lruvec(page, - page_pgdat(page)); - if (new_lruvec != lruvec) { - if (lruvec) - unlock_page_lruvec_irq(lruvec); - lruvec = lock_page_lruvec_irq(page); - } - + lruvec = relock_page_lruvec_irq(page, lruvec); del_page_from_lru_list(page, lruvec, page_lru(page)); continue; diff --git a/mm/swap.c b/mm/swap.c index ed033f7c4f2d..c593ba596dea 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -210,19 +210,12 @@ static void pagevec_lru_move_fn(struct pagevec *pvec, for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - struct lruvec *new_lruvec; /* block memcg migration during page moving between lru */ if (!TestClearPageLRU(page)) continue; - new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - if (lruvec != new_lruvec) { - if (lruvec) - unlock_page_lruvec_irqrestore(lruvec, flags); - lruvec = lock_page_lruvec_irqsave(page, &flags); - } - + lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags); (*move_fn)(page, lruvec); SetPageLRU(page); @@ -918,17 +911,12 @@ void release_pages(struct page **pages, int nr) } if (PageLRU(page)) { - struct lruvec *new_lruvec; - - new_lruvec = mem_cgroup_page_lruvec(page, - page_pgdat(page)); - if (new_lruvec != lruvec) { - if (lruvec) - unlock_page_lruvec_irqrestore(lruvec, - flags); + struct lruvec *prev_lruvec = lruvec; + + lruvec = relock_page_lruvec_irqsave(page, lruvec, + &flags); + if (prev_lruvec != lruvec) lock_batch = 0; - lruvec = lock_page_lruvec_irqsave(page, &flags); - } VM_BUG_ON_PAGE(!PageLRU(page), page); __ClearPageLRU(page); @@ -1033,15 +1021,8 @@ void __pagevec_lru_add(struct pagevec *pvec) for (i = 0; i < pagevec_count(pvec); i++) { struct page *page = pvec->pages[i]; - struct lruvec *new_lruvec; - - new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - if (lruvec != new_lruvec) { - if (lruvec) - unlock_page_lruvec_irqrestore(lruvec, flags); - lruvec = lock_page_lruvec_irqsave(page, &flags); - } + lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags); __pagevec_lru_add_fn(page, lruvec); } if (lruvec) diff --git a/mm/vmscan.c b/mm/vmscan.c index 2953ddec88a0..3b09a39de8cd 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1884,8 +1884,7 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, * All pages were isolated from the same lruvec (and isolation * inhibits memcg migration). 
*/ - VM_BUG_ON_PAGE(mem_cgroup_page_lruvec(page, page_pgdat(page)) - != lruvec, page); + VM_BUG_ON_PAGE(!lruvec_holds_page_lru_lock(page, lruvec), page); lru = page_lru(page); nr_pages = thp_nr_pages(page); @@ -4277,7 +4276,6 @@ void check_move_unevictable_pages(struct pagevec *pvec) for (i = 0; i < pvec->nr; i++) { struct page *page = pvec->pages[i]; int nr_pages; - struct lruvec *new_lruvec; if (PageTransTail(page)) continue; @@ -4289,13 +4287,7 @@ void check_move_unevictable_pages(struct pagevec *pvec) if (!TestClearPageLRU(page)) continue; - new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - if (lruvec != new_lruvec) { - if (lruvec) - unlock_page_lruvec_irq(lruvec); - lruvec = lock_page_lruvec_irq(page); - } - + lruvec = relock_page_lruvec_irq(page, lruvec); if (page_evictable(page) && PageUnevictable(page)) { enum lru_list lru = page_lru_base_type(page); From patchwork Thu Nov 5 08:55:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Shi X-Patchwork-Id: 11883707 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2CD2F697 for ; Thu, 5 Nov 2020 08:56:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BC047206FB for ; Thu, 5 Nov 2020 08:56:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BC047206FB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 2A6126B00BD; Thu, 5 Nov 2020 03:56:30 -0500 (EST) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id F18006B00BF; Thu, 5 Nov 2020 03:56:29 -0500 (EST) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D939F6B00C0; Thu, 5 Nov 2020 03:56:29 -0500 (EST) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0214.hostedemail.com [216.40.44.214]) by kanga.kvack.org (Postfix) with ESMTP id 8F44F6B00BD for ; Thu, 5 Nov 2020 03:56:29 -0500 (EST) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 32971181AC9C6 for ; Thu, 5 Nov 2020 08:56:29 +0000 (UTC) X-FDA: 77449758498.27.trees19_190e19d272c8 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin27.hostedemail.com (Postfix) with ESMTP id 12A3B3D669 for ; Thu, 5 Nov 2020 08:56:29 +0000 (UTC) X-Spam-Summary: 
1,0,0,1423cdf6e37820bb,d41d8cd98f00b204,alex.shi@linux.alibaba.com,,RULES_HIT:4:41:69:355:379:541:800:960:966:968:973:988:989:1260:1261:1345:1359:1431:1437:1605:1730:1747:1777:1792:1801:1981:2194:2196:2198:2199:2200:2201:2393:2553:2559:2562:2692:2693:2731:2736:2737:2903:2916:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3874:4250:4321:4385:4605:5007:6119:6261:6630:6737:6738:7576:7875:7903:7974:8660:9010:9592:10004:11026:11232:11473:11658:11914:12043:12048:12291:12295:12296:12297:12438:12555:12679:12683:12895:12986:13148:13149:13156:13228:13230:13846:13869:13972:14096:14394:14915:21060:21067:21080:21324:21433:21451:21611:21627:21740:21939:21990:30005:30012:30034:30045:30051:30054:30070:30079:30085:30090,0,RBL:115.124.30.132:@linux.alibaba.com:.lbl8.mailshell.net-64.201.201.201 62.20.2.100;04yrqjgcg9kaweormdynduojybwjqyc4t1z7pzu96cbriefbpjwc5asxphbixr4.fd137jmnyqmxayjb4ktmrpnab46q97d968xf5mk871dfuf4ew3pctwb46ife1nk.g-lbl8.mailshell.net-223.238.255. 100,Cach X-HE-Tag: trees19_190e19d272c8 X-Filterd-Recvd-Size: 17151 Received: from out30-132.freemail.mail.aliyun.com (out30-132.freemail.mail.aliyun.com [115.124.30.132]) by imf50.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Nov 2020 08:56:26 +0000 (UTC) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R301e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04400;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=23;SR=0;TI=SMTPD_---0UEJC3Fv_1604566567; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0UEJC3Fv_1604566567) by smtp.aliyun-inc.com(127.0.0.1); Thu, 05 Nov 2020 16:56:16 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com Cc: Andrey Ryabinin , Jann Horn Subject: [PATCH v21 19/19] mm/lru: revise the comments of lru_lock Date: Thu, 5 Nov 2020 16:55:49 +0800 Message-Id: <1604566549-62481-20-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Hugh Dickins Since we changed the pgdat->lru_lock to lruvec->lru_lock, it's time to fix the incorrect comments in code. Also fixed some zone->lru_lock comment error from ancient time. etc. I struggled to understand the comment above move_pages_to_lru() (surely it never calls page_referenced()), and eventually realized that most of it had got separated from shrink_active_list(): move that comment back. 
Signed-off-by: Hugh Dickins Signed-off-by: Alex Shi Acked-by: Johannes Weiner Cc: Andrew Morton Cc: Tejun Heo Cc: Andrey Ryabinin Cc: Jann Horn Cc: Mel Gorman Cc: Johannes Weiner Cc: Matthew Wilcox Cc: Hugh Dickins Cc: cgroups@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Acked-by: Vlastimil Babka --- Documentation/admin-guide/cgroup-v1/memcg_test.rst | 15 ++------ Documentation/admin-guide/cgroup-v1/memory.rst | 21 +++++------ Documentation/trace/events-kmem.rst | 2 +- Documentation/vm/unevictable-lru.rst | 22 +++++------- include/linux/mm_types.h | 2 +- include/linux/mmzone.h | 3 +- mm/filemap.c | 4 +-- mm/rmap.c | 4 +-- mm/vmscan.c | 41 ++++++++++++---------- 9 files changed, 50 insertions(+), 64 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v1/memcg_test.rst b/Documentation/admin-guide/cgroup-v1/memcg_test.rst index 3f7115e07b5d..0b9f91589d3d 100644 --- a/Documentation/admin-guide/cgroup-v1/memcg_test.rst +++ b/Documentation/admin-guide/cgroup-v1/memcg_test.rst @@ -133,18 +133,9 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. 8. LRU ====== - Each memcg has its own private LRU. Now, its handling is under global - VM's control (means that it's handled under global pgdat->lru_lock). - Almost all routines around memcg's LRU is called by global LRU's - list management functions under pgdat->lru_lock. - - A special function is mem_cgroup_isolate_pages(). This scans - memcg's private LRU and call __isolate_lru_page() to extract a page - from LRU. - - (By __isolate_lru_page(), the page is removed from both of global and - private LRU.) - + Each memcg has its own vector of LRUs (inactive anon, active anon, + inactive file, active file, unevictable) of pages from each node, + each LRU handled under a single lru_lock for that memcg and node. 9. Typical Tests. ================= diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst index 12757e63b26c..24450696579f 100644 --- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -285,20 +285,17 @@ When oom event notifier is registered, event will be delivered. 2.6 Locking ----------- - lock_page_cgroup()/unlock_page_cgroup() should not be called under - the i_pages lock. +Lock order is as follows: - Other lock order is following: + Page lock (PG_locked bit of page->flags) + mm->page_table_lock or split pte_lock + lock_page_memcg (memcg->move_lock) + mapping->i_pages lock + lruvec->lru_lock. - PG_locked. - mm->page_table_lock - pgdat->lru_lock - lock_page_cgroup. - - In many cases, just lock_page_cgroup() is called. - - per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by - pgdat->lru_lock, it has no lock of its own. +Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by +lruvec->lru_lock; PG_lru bit of page->flags is cleared before +isolating a page from its LRU under lruvec->lru_lock. 2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM) ----------------------------------------------- diff --git a/Documentation/trace/events-kmem.rst b/Documentation/trace/events-kmem.rst index 555484110e36..68fa75247488 100644 --- a/Documentation/trace/events-kmem.rst +++ b/Documentation/trace/events-kmem.rst @@ -69,7 +69,7 @@ When pages are freed in batch, the also mm_page_free_batched is triggered. Broadly speaking, pages are taken off the LRU lock in bulk and freed in batch with a page list. 
Significant amounts of activity here could indicate that the system is under memory pressure and can also indicate -contention on the zone->lru_lock. +contention on the lruvec->lru_lock. 4. Per-CPU Allocator Activity ============================= diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst index 17d0861b0f1d..0e1490524f53 100644 --- a/Documentation/vm/unevictable-lru.rst +++ b/Documentation/vm/unevictable-lru.rst @@ -33,7 +33,7 @@ reclaim in Linux. The problems have been observed at customer sites on large memory x86_64 systems. To illustrate this with an example, a non-NUMA x86_64 platform with 128GB of -main memory will have over 32 million 4k pages in a single zone. When a large +main memory will have over 32 million 4k pages in a single node. When a large fraction of these pages are not evictable for any reason [see below], vmscan will spend a lot of time scanning the LRU lists looking for the small fraction of pages that are evictable. This can result in a situation where all CPUs are @@ -55,7 +55,7 @@ unevictable, either by definition or by circumstance, in the future. The Unevictable Page List ------------------------- -The Unevictable LRU infrastructure consists of an additional, per-zone, LRU list +The Unevictable LRU infrastructure consists of an additional, per-node, LRU list called the "unevictable" list and an associated page flag, PG_unevictable, to indicate that the page is being managed on the unevictable list. @@ -84,15 +84,9 @@ The unevictable list does not differentiate between file-backed and anonymous, swap-backed pages. This differentiation is only important while the pages are, in fact, evictable. -The unevictable list benefits from the "arrayification" of the per-zone LRU +The unevictable list benefits from the "arrayification" of the per-node LRU lists and statistics originally proposed and posted by Christoph Lameter. -The unevictable list does not use the LRU pagevec mechanism. Rather, -unevictable pages are placed directly on the page's zone's unevictable list -under the zone lru_lock. This allows us to prevent the stranding of pages on -the unevictable list when one task has the page isolated from the LRU and other -tasks are changing the "evictability" state of the page. - Memory Control Group Interaction -------------------------------- @@ -101,8 +95,8 @@ The unevictable LRU facility interacts with the memory control group [aka memory controller; see Documentation/admin-guide/cgroup-v1/memory.rst] by extending the lru_list enum. -The memory controller data structure automatically gets a per-zone unevictable -list as a result of the "arrayification" of the per-zone LRU lists (one per +The memory controller data structure automatically gets a per-node unevictable +list as a result of the "arrayification" of the per-node LRU lists (one per lru_list enum element). The memory controller tracks the movement of pages to and from the unevictable list. @@ -196,7 +190,7 @@ for the sake of expediency, to leave a unevictable page on one of the regular active/inactive LRU lists for vmscan to deal with. vmscan checks for such pages in all of the shrink_{active|inactive|page}_list() functions and will "cull" such pages that it encounters: that is, it diverts those pages to the -unevictable list for the zone being scanned. +unevictable list for the node being scanned. There may be situations where a page is mapped into a VM_LOCKED VMA, but the page is not marked as PG_mlocked. 
Such pages will make it all the way to @@ -328,7 +322,7 @@ If the page was NOT already mlocked, mlock_vma_page() attempts to isolate the page from the LRU, as it is likely on the appropriate active or inactive list at that time. If the isolate_lru_page() succeeds, mlock_vma_page() will put back the page - by calling putback_lru_page() - which will notice that the page -is now mlocked and divert the page to the zone's unevictable list. If +is now mlocked and divert the page to the node's unevictable list. If mlock_vma_page() is unable to isolate the page from the LRU, vmscan will handle it later if and when it attempts to reclaim the page. @@ -603,7 +597,7 @@ Some examples of these unevictable pages on the LRU lists are: unevictable list in mlock_vma_page(). shrink_inactive_list() also diverts any unevictable pages that it finds on the -inactive lists to the appropriate zone's unevictable list. +inactive lists to the appropriate node's unevictable list. shrink_inactive_list() should only see SHM_LOCK'd pages that became SHM_LOCK'd after shrink_active_list() had moved them to the inactive list, or pages mapped diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 6a6b078b9d6a..82c788917319 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -78,7 +78,7 @@ struct page { struct { /* Page cache and anonymous pages */ /** * @lru: Pageout list, eg. active_list protected by - * pgdat->lru_lock. Sometimes used as a generic list + * lruvec->lru_lock. Sometimes used as a generic list * by the page owner. */ struct list_head lru; diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 0afba4ea2a21..1299b8ce64d3 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -113,8 +113,7 @@ static inline bool free_area_empty(struct free_area *area, int migratetype) struct pglist_data; /* - * zone->lock and the zone lru_lock are two of the hottest locks in the kernel. - * So add a wild amount of padding here to ensure that they fall into separate + * Add a wild amount of padding here to ensure datas fall into separate * cachelines. There are very few zone structures in the machine, so space * consumption is not a concern here. 
*/ diff --git a/mm/filemap.c b/mm/filemap.c index d90614f501da..426d547cf19e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -102,8 +102,8 @@ * ->swap_lock (try_to_unmap_one) * ->private_lock (try_to_unmap_one) * ->i_pages lock (try_to_unmap_one) - * ->pgdat->lru_lock (follow_page->mark_page_accessed) - * ->pgdat->lru_lock (check_pte_range->isolate_lru_page) + * ->lruvec->lru_lock (follow_page->mark_page_accessed) + * ->lruvec->lru_lock (check_pte_range->isolate_lru_page) * ->private_lock (page_remove_rmap->set_page_dirty) * ->i_pages lock (page_remove_rmap->set_page_dirty) * bdi.wb->list_lock (page_remove_rmap->set_page_dirty) diff --git a/mm/rmap.c b/mm/rmap.c index 078d54da59d4..73788505aa0a 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -28,12 +28,12 @@ * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) * anon_vma->rwsem * mm->page_table_lock or pte_lock - * pgdat->lru_lock (in mark_page_accessed, isolate_lru_page) * swap_lock (in swap_duplicate, swap_info_get) * mmlist_lock (in mmput, drain_mmlist and others) * mapping->private_lock (in __set_page_dirty_buffers) - * mem_cgroup_{begin,end}_page_stat (memcg->move_lock) + * lock_page_memcg move_lock (in __set_page_dirty_buffers) * i_pages lock (widely used) + * lruvec->lru_lock (in lock_page_lruvec_irq) * inode->i_lock (in set_page_dirty's __mark_inode_dirty) * bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty) * sb_lock (within inode_lock in fs/fs-writeback.c) diff --git a/mm/vmscan.c b/mm/vmscan.c index 3b09a39de8cd..1c343adbbbe3 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1614,14 +1614,16 @@ static __always_inline void update_lru_sizes(struct lruvec *lruvec, } /** - * pgdat->lru_lock is heavily contended. Some of the functions that + * Isolating page from the lruvec to fill in @dst list by nr_to_scan times. + * + * lruvec->lru_lock is heavily contended. Some of the functions that * shrink the lists perform better by taking out a batch of pages * and working on them outside the LRU lock. * * For pagecache intensive workloads, this function is the hottest * spot in the kernel (apart from copy_*_user functions). * - * Appropriate locks must be held before calling this function. + * Lru_lock must be held before calling this function. * * @nr_to_scan: The number of eligible pages to look through on the list. * @lruvec: The LRU vector to pull pages from. @@ -1815,25 +1817,11 @@ static int too_many_isolated(struct pglist_data *pgdat, int file, } /* - * This moves pages from @list to corresponding LRU list. - * - * We move them the other way if the page is referenced by one or more - * processes, from rmap. - * - * If the pages are mostly unmapped, the processing is fast and it is - * appropriate to hold zone_lru_lock across the whole operation. But if - * the pages are mapped, the processing is slow (page_referenced()) so we - * should drop zone_lru_lock around each page. It's impossible to balance - * this, so instead we remove the pages from the LRU while processing them. - * It is safe to rely on PG_active against the non-LRU pages in here because - * nobody will play with that bit on a non-LRU page. - * - * The downside is that we have to touch page->_refcount against each page. - * But we had to alter page->flags anyway. + * move_pages_to_lru() moves pages from private @list to appropriate LRU list. + * On return, @list is reused as a list of pages to be freed by the caller. * * Returns the number of pages moved to the given lruvec. 
*/ - static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, struct list_head *list) { @@ -2012,6 +2000,23 @@ static int current_may_throttle(void) return nr_reclaimed; } +/* + * shrink_active_list() moves pages from the active LRU to the inactive LRU. + * + * We move them the other way if the page is referenced by one or more + * processes. + * + * If the pages are mostly unmapped, the processing is fast and it is + * appropriate to hold lru_lock across the whole operation. But if + * the pages are mapped, the processing is slow (page_referenced()), so + * we should drop lru_lock around each page. It's impossible to balance + * this, so instead we remove the pages from the LRU while processing them. + * It is safe to rely on PG_active against the non-LRU pages in here because + * nobody will play with that bit on a non-LRU page. + * + * The downside is that we have to touch page->_refcount against each page. + * But we had to alter page->flags anyway. + */ static void shrink_active_list(unsigned long nr_to_scan, struct lruvec *lruvec, struct scan_control *sc,